Hammond Park Morepork (Ruru) Dataset

Hunt, Tim D. (2020) Hammond Park Morepork (Ruru) Dataset. Dataset (Unpublished)

Archive (ZIP) - Supplemental Material

Official URL: https://cacophony.org.nz/

Abstract or Summary

This dataset contains 3493 mel spectrograms and associated labels for the purpose of developing machine learning models.

Readme (in archive file)

The purpose of this dataset is to allow you to develop models to detect the audio call of the Morepork (Ruru). The dataset contains mel spectrograms of parts of recordings obtained from a Bird Recorder (The Cacophony Project, https://www.2040.co.nz/collections/cacophonometer-bird-monitoring).

You can split the dataset into training and validation sets to develop your model. On request to tim.hunt@wintec.ac.nz, timhot@hotmail.com or https://cacophony.org.nz/contact, Tim Hunt can verify your model against separate test data (march_2020) not contained in this dataset.

The recorder was located in Hammond Park, Hamilton, New Zealand.

Use numpy.load to import the two files:

array_of_all_labels.npy: Contains a list of the labels, in the same order as the mfccs in array_of_all_mfccs.npy.
array_of_all_mfccs.npy: Contains mel spectrograms in the frequency range 600 to 1200 Hz. Each spectrogram represents 1.2 seconds. (Despite the "mfccs" name, the code below stores mel spectrograms in decibels, not MFCCs.)

Python code used to create mel spectrograms:

    import librosa
    import numpy as np

    # Assumption: `parameters` is the author's own settings module; it is not part of this record.
    import parameters


    def load_training_data_audio(recording_id, start_time, y_full_recording, sr):
        # Load the recording from file only if it is not already in memory.
        if y_full_recording is None:
            print(f"Recording {recording_id} has changed - going to load from file - start time is {start_time}")
            recordings_folder_with_path = parameters.base_folder_for_recordings + '/' + parameters.downloaded_recordings_folder
            filename = str(recording_id) + ".m4a"
            audio_in_path = recordings_folder_with_path + "/" + filename
            y_full_recording, sr = librosa.load(audio_in_path, sr=None, mono=True)
        else:
            print(f"Recording id is still {recording_id} and start time is now {start_time}")

        duration_secs = 1.2  # seems to give a spectrogram size

        # Convert the clip start time and duration into sample positions.
        start_time_seconds_float = float(start_time)
        start_position_array = int(sr * start_time_seconds_float)
        end_position_array = start_position_array + int(sr * duration_secs)

        if end_position_array > y_full_recording.shape[0]:
            print('Clip would end after end of recording')
            return None, None, None  # three values, so callers can always unpack the result

        y_part = y_full_recording[start_position_array:end_position_array]

        # Using Dennis's approach for calculating nfft
        slices_per_second = 13  # chosen to give a spectrogram length of 32 as have 32 mels and want a square image
        nfft = int(sr / slices_per_second)
        hop_length = int(nfft / 2)
        # 1.2 s spans about 31.2 hops, which gives 32 frames with librosa's default centred framing.

        specgram = librosa.feature.melspectrogram(
            y=y_part, sr=sr, n_fft=nfft, hop_length=hop_length,
            n_mels=32, fmin=600, fmax=1200)
        print(specgram.shape)

        mfccs = librosa.power_to_db(specgram, ref=np.max)

        # Have been having memory issues - so will save spectrogram as 0-255 integer values
        # https://stackoverflow.com/questions/1735025/how-to-normalize-a-numpy-array-to-within-a-certain-range
        mfccs = (255 * (mfccs - np.min(mfccs)) / np.ptp(mfccs)).astype(int)
        mfccs = np.uint8(mfccs)

        if mfccs.shape[1] < 32:  # all must be the same size
            print("mfccs.shape is less than 32", mfccs.shape)
            return None, None, None  # just throw it away (was a single None, which breaks unpacking)

        if mfccs.shape[1] > 32:  # all must be the same size
            print("mfccs.shape > 32 - will resize", mfccs.shape)
            mfccs = mfccs[:, :32]
            print("mfccs.shape 32?", mfccs.shape)

        # Add a trailing channel axis so each spectrogram has shape (32, 32, 1).
        mfccs = np.expand_dims(mfccs, axis=2)

        return mfccs, sr, y_full_recording
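The readme's instructions to load the two files with numpy.load and to split the data into training and validation sets could be sketched as follows. This is a minimal illustration, not the author's code: the function names load_dataset and split_dataset, the 20% validation fraction and the fixed seed are assumptions made for the example. The array shapes and the label encoding are not stated in this record, so the sketch does not rely on them beyond both arrays having one entry per spectrogram.

```python
from pathlib import Path

import numpy as np


def load_dataset(folder):
    """Load the spectrograms and labels from the two .npy files named in the readme."""
    folder = Path(folder)
    spectrograms = np.load(folder / "array_of_all_mfccs.npy")
    # If the labels were saved as an object array, np.load would need allow_pickle=True.
    labels = np.load(folder / "array_of_all_labels.npy")
    if len(spectrograms) != len(labels):
        raise ValueError("spectrograms and labels differ in length")
    return spectrograms, labels


def split_dataset(spectrograms, labels, validation_fraction=0.2, seed=0):
    """Shuffle once, then split into (training, validation) pairs.

    validation_fraction and seed are illustrative defaults, not values from the readme.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(labels))
    n_val = int(len(labels) * validation_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    # The same indices are applied to both arrays so each label stays with its spectrogram.
    training = (spectrograms[train_idx], labels[train_idx])
    validation = (spectrograms[val_idx], labels[val_idx])
    return training, validation
```

A call such as load_dataset("path/to/extracted/archive") followed by split_dataset(x, y) would then give the two subsets. Because the spectrograms are stored as 0-255 integers, converting them to floating point and dividing by 255 before training is one common option, although the readme does not prescribe it.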

Item Type: Dataset
Keywords that describe the item: dataset, Hammond Park Morepork
Subjects: T Technology > T Technology (General)
Divisions: Schools > Centre for Business, Information Technology and Enterprise > School of Information Technology
ID Code: 7522
Deposited By:
Deposited On: 03 Nov 2020 02:47
Last Modified: 09 Dec 2020 21:31
