With advances in artificial intelligence image recognition technology, Qrious data scientists decided to transform the unstructured audio files provided by DOC into visual spectrograms.
They could then use image classification technology and machine learning to train a model to automatically identify kiwi calls in those spectrograms.
DOC provided Qrious with 8,000 15-minute audio files as a training data set, each manually tagged as ‘kiwi’ or ‘non-kiwi’.
Using AWS technology, these recordings were transformed into 900-second spectrograms. Where a kiwi call was identified in a spectrogram, that section was cropped into a seven-second segment, and these segments became the training data.
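The spectrogram-and-crop step above can be sketched in Python. This is a minimal illustration, not Qrious’s actual pipeline: the sample rate, FFT settings, and the short synthetic clip standing in for a 900-second recording are all assumptions.

```python
import numpy as np
from scipy.signal import spectrogram

# Hypothetical parameters -- the real sample rate and FFT settings
# used by Qrious are not stated in the article.
SR = 8000          # samples per second (assumption)
CLIP_SECONDS = 30  # short synthetic stand-in for a 900 s recording

# Synthetic audio: quiet noise with a tone burst at t = 10 s
# standing in for a kiwi call.
t = np.arange(CLIP_SECONDS * SR) / SR
audio = 0.01 * np.random.randn(t.size)
call = (t > 10) & (t < 13)
audio[call] += np.sin(2 * np.pi * 2000 * t[call])

# Spectrogram of the whole clip (the "900 second spectrogram" step)
freqs, times, sxx = spectrogram(audio, fs=SR, nperseg=512)

def crop_segment(sxx, times, start_s, length_s=7.0):
    """Crop a fixed-length window of spectrogram columns,
    mirroring the seven-second training segments."""
    mask = (times >= start_s) & (times < start_s + length_s)
    return sxx[:, mask]

# Crop the window around the tagged call
segment = crop_segment(sxx, times, start_s=9.0)
print(segment.shape)
```

Each cropped `segment` is a small 2-D image (frequency bins by time steps) that can be fed to an image classifier exactly like a photograph.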
Building the model
A neural network was then trained, using an image classification algorithm, to automatically classify a spectrogram as ‘kiwi’, ‘other bird’ or ‘background noise’.
To test the model, additional spectrograms were cropped into seven-second frames, which the model then sorted into ‘kiwi’ or ‘non-kiwi’ files based on how closely they matched the training data.
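The match-against-training-data idea can be illustrated without a full neural network. The sketch below substitutes a nearest-centroid classifier over flattened spectrogram frames (a deliberate simplification of the neural network the article describes); the three labels come from the article, while the frame shapes and synthetic data are assumptions.

```python
import numpy as np

# The three classes named in the article
LABELS = ["kiwi", "other bird", "background noise"]

def make_frames(label_seed, n, shape=(64, 125)):
    """Synthetic spectrogram frames; in the real pipeline these
    would be the cropped seven-second spectrogram images."""
    rng = np.random.default_rng(label_seed)
    base = rng.random(shape)  # a distinct "texture" per class
    return [base + 0.05 * rng.standard_normal(shape) for _ in range(n)]

# Labelled training frames, one set per class
train = {lab: make_frames(i, n=20) for i, lab in enumerate(LABELS)}

# "Model": the mean (centroid) frame of each class
centroids = {lab: np.mean(frames, axis=0) for lab, frames in train.items()}

def classify(frame):
    """Assign the label whose class centroid the frame is closest to."""
    dists = {lab: np.linalg.norm(frame - c) for lab, c in centroids.items()}
    return min(dists, key=dists.get)

# A new frame drawn from the "kiwi" distribution
test_frame = make_frames(0, n=1)[0]
print(classify(test_frame))
```

A trained convolutional network replaces the centroid comparison with learned features, but the testing workflow is the same: crop a frame, score it against what the model learned, and file it as ‘kiwi’ or ‘non-kiwi’.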
Automating the results
The tool can now automatically convert each recording into 128 cropped image frames and search them for kiwi sounds. If the model believes there is a kiwi in one of the frames, it records when the call occurred and automatically converts that frame back into a sound snippet for DOC’s team to verify.
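The frame arithmetic behind this step is simple: 128 frames across a 900-second recording gives roughly seven seconds per frame (900 / 128 ≈ 7.03 s). A minimal sketch, with an assumed sample rate and a silent placeholder recording:

```python
import numpy as np

SR = 8000                 # hypothetical sample rate (not stated in the article)
RECORDING_SECONDS = 900   # one 15-minute recording
N_FRAMES = 128            # 128 frames of ~7 s each (900 / 128 ≈ 7.03 s)

samples_per_frame = RECORDING_SECONDS * SR // N_FRAMES

def frame_bounds(i):
    """Start and end sample of frame i within the recording."""
    start = i * samples_per_frame
    return start, start + samples_per_frame

def extract_snippet(audio, i):
    """Convert a flagged frame back into its audio snippet,
    mirroring the sound clips sent to DOC's team."""
    start, end = frame_bounds(i)
    return audio[start:end]

# Placeholder recording; in practice this is the original audio file
audio = np.zeros(RECORDING_SECONDS * SR)
snippet = extract_snippet(audio, 42)
start_s = frame_bounds(42)[0] / SR
print(len(snippet) / SR, start_s)  # ~7 s snippet from ~295 s into the recording
```

Because the frame index maps directly back to a time offset, the tool can report both *that* a kiwi called and *when*, and hand DOC a short clip rather than a 15-minute file.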