Citation: UNSPECIFIED.
Hammond_park_Morepork_dataset.zip - Supplemental Material
Download (3MB)
Abstract
This dataset contains 3493 mel spectrograms and associated labels for the purpose of developing machine learning models.
Readme (in archive file)
The purpose of this dataset is to allow you to develop models that detect the audio call of the Morepork (Ruru).
The dataset contains mel spectrograms of parts of recordings obtained from a Bird Recorder (The Cacophony Project, https://www.2040.co.nz/collections/cacophonometer-bird-monitoring).
You can split the dataset into training and validation sets to develop your model. On request (tim.hunt@wintec.ac.nz, timhot@hotmail.com or https://cacophony.org.nz/contact), Tim Hunt can verify your model against separate test data (march_2020) that is not contained in this dataset.
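The split described above could be done in several ways; the following is a minimal sketch, assuming both arrays have already been loaded with numpy.load and share the same length. The function name `train_validation_split`, the 20% validation fraction and the fixed seed are illustrative choices, not part of the dataset.

```python
import numpy as np


def train_validation_split(spectrograms, labels, validation_fraction=0.2, seed=0):
    """Shuffle once, then split both arrays with the same permutation.

    validation_fraction and seed are illustrative defaults (assumptions).
    """
    spectrograms = np.asarray(spectrograms)
    labels = np.asarray(labels)
    if len(spectrograms) != len(labels):
        raise ValueError("spectrograms and labels must have the same length")

    # One shared permutation keeps each spectrogram paired with its label.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))

    n_val = int(round(len(labels) * validation_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (spectrograms[train_idx], labels[train_idx],
            spectrograms[val_idx], labels[val_idx])
```

A plain random split is the simplest option. If the label classes turn out to be unbalanced, a stratified split may be a better fit.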
The recorder was located in Hammond Park, Hamilton, New Zealand.
Use numpy.load to import the two files.
array_of_all_labels.npy:
Contains a list of the labels, in the same order as the spectrograms (named "mfccs" in the code) in array_of_all_mfccs.npy
array_of_all_mfccs.npy:
Contains mel spectrograms in the frequency range 600 to 1200 Hz
Each spectrogram represents 1.2 seconds
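As a hedged sketch of the loading step, the helpers below read the two files from a folder, check that labels and spectrograms line up, and rescale the stored 0-255 values to floats. The function names (`load_dataset`, `to_model_input`, `label_counts`) and the folder argument are assumptions for illustration. The label values themselves are not documented here, so the code does not assume any.

```python
from pathlib import Path

import numpy as np


def load_dataset(folder):
    """Load both arrays from `folder` (the unzipped archive; path is an assumption)."""
    folder = Path(folder)
    # If the labels were saved as an object array, np.load would also
    # need allow_pickle=True; that is not known from the readme.
    labels = np.load(folder / "array_of_all_labels.npy")
    spectrograms = np.load(folder / "array_of_all_mfccs.npy")
    if len(labels) != len(spectrograms):
        raise ValueError("labels and spectrograms are not the same length")
    return spectrograms, labels


def to_model_input(spectrograms):
    """Rescale the stored 0-255 integer values to float32 in the range [0, 1]."""
    return spectrograms.astype(np.float32) / 255.0


def label_counts(labels):
    """Return a {label: count} dict, useful for spotting class imbalance."""
    values, counts = np.unique(labels, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))
```

Typical use would be `spectrograms, labels = load_dataset("path/to/unzipped/archive")`, followed by `print(spectrograms.shape, label_counts(labels))` to inspect the data before training.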
Python code used to create mel spectrograms:
import librosa
import numpy as np

import parameters  # the author's own settings module, not included in this archive (assumed import)


def load_training_data_audio(recording_id, start_time, y_full_recording, sr):
    # Only load the audio file when the caller has not already passed in the recording.
    if y_full_recording is None:
        print(f"Recording {recording_id} has changed - going to load from file - start time is {start_time}")
        recordings_folder_with_path = parameters.base_folder_for_recordings + '/' + parameters.downloaded_recordings_folder
        filename = str(recording_id) + ".m4a"
        audio_in_path = recordings_folder_with_path + "/" + filename
        y_full_recording, sr = librosa.load(audio_in_path, sr=None, mono=True)
    else:
        print(f"Recording id is still {recording_id} and start time is now {start_time}")

    duration_secs = 1.2  # seems to give a spectrogram width of 32
    start_time_seconds_float = float(start_time)
    start_position_array = int(sr * start_time_seconds_float)
    end_position_array = start_position_array + int(sr * duration_secs)

    if end_position_array > y_full_recording.shape[0]:
        print('Clip would end after end of recording')
        return None, None, None  # three values, because callers unpack three

    y_part = y_full_recording[start_position_array:end_position_array]

    # Using Dennis's approach for calculating nfft
    slices_per_second = 13  # chosen to give a spectrogram length of 32 as have 32 mels and want a square image
    nfft = int(sr / slices_per_second)
    hop_length = int(nfft / 2)

    specgram = librosa.feature.melspectrogram(
        y=y_part,
        sr=sr,
        n_fft=nfft,
        hop_length=hop_length,
        n_mels=32,
        fmin=600,
        fmax=1200)
    print(specgram.shape)

    # Despite the variable name, these are dB-scaled mel spectrograms, not MFCCs.
    mfccs = librosa.power_to_db(specgram, ref=np.max)

    # Have been having memory issues - so will save spectrogram as 0-255 integer values
    # https://stackoverflow.com/questions/1735025/how-to-normalize-a-numpy-array-to-within-a-certain-range
    mfccs = (255 * (mfccs - np.min(mfccs)) / np.ptp(mfccs)).astype(np.uint8)

    if mfccs.shape[1] < 32:  # all must be the same size
        print("mfccs.shape is less than 32", mfccs.shape)
        return None, None, None  # just throw it away (same arity as the other returns)

    if mfccs.shape[1] > 32:  # all must be the same size
        print("mfccs.shape > 32 - will resize", mfccs.shape)
        mfccs = mfccs[:, :32]
        print("mfccs.shape 32?", mfccs.shape)

    mfccs = np.expand_dims(mfccs, axis=2)  # add a channel axis: (32, 32) -> (32, 32, 1)
    return mfccs, sr, y_full_recording
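The comment about `slices_per_second = 13` can be checked with a little arithmetic. The hop is half of `nfft`, so there are roughly 26 hops per second, and a 1.2 second clip spans about 1.2 × 26 = 31.2 hops: 31 whole hops, which gives 32 frames. The sketch below reproduces that calculation, assuming librosa's default centred STFT (the signal is padded by `n_fft // 2` on each side). The sample rates are hypothetical, since the recordings are loaded at their native rate, which is not stated here.

```python
def spectrogram_frame_count(sr, duration_secs=1.2, slices_per_second=13):
    """Estimate the number of spectrogram columns for one clip.

    Mirrors the nfft / hop_length arithmetic in the code above and
    assumes a centred STFT that pads n_fft // 2 samples on each side.
    """
    n_samples = int(sr * duration_secs)
    n_fft = int(sr / slices_per_second)
    hop_length = int(n_fft / 2)
    padded_length = n_samples + 2 * (n_fft // 2)
    return 1 + (padded_length - n_fft) // hop_length


# Hypothetical sample rates; the real recordings' rate is not given in the readme.
for sr in (8000, 16000, 22050, 44100, 48000):
    print(sr, spectrogram_frame_count(sr))
```

Under these assumptions the frame count does not depend on the sample rate, which fits the code's choice of deriving `nfft` from `sr` rather than fixing it.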
Item Type: | Dataset |
---|---|
Uncontrolled Keywords: | dataset, Hammond Park Morepork |
Subjects: | T Technology > T Technology (General) |
Divisions: | Schools > Centre for Business, Information Technology and Enterprise > School of Information Technology |
Depositing User: | Tim Hunt |
Date Deposited: | 03 Nov 2020 02:47 |
Last Modified: | 21 Jul 2023 08:58 |
URI: | http://researcharchive.wintec.ac.nz/id/eprint/7522 |