Abstract— Field biologists use animal sounds to discover the presence of individuals and to study their behavior. Collecting bio-acoustic data has traditionally been a difficult and time-consuming process in which individual researchers use portable microphones to record sounds while taking notes of their own detailed observations. The recent development of new deployable acoustic sensor platforms presents opportunities to develop automated tools for bio-acoustic field research. In this work, we implement an approximate maximum likelihood (AML) source localization algorithm and use it to localize marmot alarm calls. We assess the performance of these techniques based on results from two field experiments: (1) a controlled test of direction-of-arrival (DOA) accuracy using a pre-recorded source signal, and (2) an experiment to detect and localize actual animals in their habitat, with a comparison to ground truth gathered from human observations. Although small arrays yield ambiguities from spatial aliasing of high-frequency signals, we show that these ambiguities are readily eliminated by proper bearing crossings of the DOAs from several arrays. These results show that the AML source localization algorithm can be used to localize actual animals in their natural habitat, using a platform that is practical to deploy.
Ali, A. M., S. Asgari, T. C. Collier, M. Allen, L. Girod, R. E. Hudson, K. Yao, C. E. Taylor and D. T. Blumstein. J. Sign. Process. Syst. 57:415-436, 2009.
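As a rough illustration of the bearing-crossing step described above, two DOA estimates from arrays at known positions can be intersected to localize a source. This is a simplified 2-D sketch with hypothetical function names; the paper's AML algorithm additionally estimates the DOAs themselves from the raw array data, which this sketch takes as given:

```python
import math

def locate_by_bearing_crossing(p1, theta1, p2, theta2):
    """Intersect two DOA bearings (radians, measured from the +x axis)
    emanating from two sensor arrays; returns the estimated source
    position, or None if the bearings are parallel."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve p1 + t1*d1 = p2 + t2*d2 as a 2x2 linear system in (t1, t2).
    det = d1[0] * (-d2[1]) + d2[0] * d1[1]
    if abs(det) < 1e-12:
        return None  # parallel bearings: no unique crossing
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (bx * (-d2[1]) + d2[0] * by) / det
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])
```

With more than two arrays, one would intersect all pairs of bearings and cluster the crossings; spurious bearings caused by spatial aliasing fail to produce a consistent cluster, which is how the ambiguities mentioned in the abstract are voted out.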
Abstract- The ability to monitor interactions between individuals over time can provide us with information on life histories, mating systems, behavioural interactions between individuals and ecological interactions with the environment. Tracking individuals over time has traditionally been a time- and often cost-intensive exercise, and certain types of animals are particularly hard to monitor. Here we use canonical discriminant analysis (CDA) to identify individual Mexican Ant-thrushes using data extracted with a semi-automated procedure from song recordings. We test the ability of CDA to identify individuals over time, using recordings obtained over a 4-year period. CDA correctly identified songs of 12 individual birds 93.3% of the time from recordings in one year (2009), while including songs of 18 individuals as training data. Predicting singers in one year using recordings from other years indicated some instances of variation, with correct classification in the range of 67–88%; one individual was responsible for the great majority (66%) of classification errors. We produce temporal maps of the study plot showing that considerably more information was provided by identifying individuals from their songs than by ringing and re-sighting colour-ringed individuals. The spatial data show site fidelity in males, but medium-term pair bonds and an apparently large number of female floaters. Recordings can be used to monitor intra- and intersexual interactions of animals, their movements over time, their interactions with the environment and their population dynamics.
Alexander N. G. Kirschel, Martin L. Cody, Zachary Harlow, Vasilis J. Promponas, Edgar E. Vallejo and Charles E. Taylor. Ibis, 153:255-268.
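CDA is closely related to Fisher's linear discriminant; a minimal two-class, two-feature sketch in plain Python (hypothetical helper names, not the paper's semi-automated feature-extraction pipeline) shows the core computation of projecting song features onto a discriminant axis and assigning the nearer class mean:

```python
def fisher_direction(class_a, class_b):
    """Two-class Fisher linear discriminant for 2-D feature vectors:
    w = Sw^{-1} (mu_a - mu_b), with Sw the pooled within-class scatter."""
    def mean(xs):
        n = len(xs)
        return (sum(p[0] for p in xs) / n, sum(p[1] for p in xs) / n)
    def scatter(xs, mu):
        sxx = sum((p[0] - mu[0]) ** 2 for p in xs)
        syy = sum((p[1] - mu[1]) ** 2 for p in xs)
        sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in xs)
        return [[sxx, sxy], [sxy, syy]]
    ma, mb = mean(class_a), mean(class_b)
    Sa, Sb = scatter(class_a, ma), scatter(class_b, mb)
    Sw = [[Sa[i][j] + Sb[i][j] for j in range(2)] for i in range(2)]
    det = Sw[0][0] * Sw[1][1] - Sw[0][1] * Sw[1][0]
    dm = (ma[0] - mb[0], ma[1] - mb[1])
    # Closed-form 2x2 inverse applied to the mean difference.
    w = ((Sw[1][1] * dm[0] - Sw[0][1] * dm[1]) / det,
         (-Sw[1][0] * dm[0] + Sw[0][0] * dm[1]) / det)
    return w, ma, mb

def classify(point, w, ma, mb):
    """Project onto w and assign the label of the nearer projected mean."""
    proj = lambda p: w[0] * p[0] + w[1] * p[1]
    return 'a' if abs(proj(point) - proj(ma)) < abs(proj(point) - proj(mb)) else 'b'
```

With more than two individuals (as in the paper), CDA finds several such discriminant axes at once; the nearest-projected-mean rule generalizes directly.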
Abstract- Birds do not always vocalize at random, but may rather divide up soundspace in such a manner that they avoid overlap with the songs of other bird species. In effect, a high degree of communication efficiency can be achieved by many simultaneously active vocalists that finely integrate songs with minimal overlap. We describe this phenomenon from several recordings at our principal study location, near Volcano, California. The most-studied models for conceptualizing and studying such de-synchronized systems come from scheduling algorithms in computer science, where internet protocols involve packets of information that are broadcast widely; any collisions between them will corrupt the colliding packets so that they need to be resent. We have simulated some of these methods that might be appropriate for the soundspace of bird communities. Some features of these de-synchronized systems depend on specifics of the algorithms used.
Reiji Suzuki, Charles E. Taylor and Martin L. Cody. Artificial Life and Robotics, 2012.
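The packet-collision analogy above can be made concrete with a slotted-ALOHA-style toy simulation, in which overlapping songs in a time slot "collide" (like corrupted packets) and are retried after a random backoff. All names and parameters here are illustrative, not the paper's actual simulations:

```python
import random

def simulate_aloha_singers(n_singers=5, n_slots=500, max_backoff=8, seed=1):
    """Slotted-ALOHA-style sketch of shared soundspace: each singer sings
    repeatedly; if two or more songs fall in the same slot they collide
    and each colliding singer retries after a random backoff delay."""
    rng = random.Random(seed)
    next_try = [0] * n_singers          # slot at which each singer sings next
    delivered = collisions = 0
    for slot in range(n_slots):
        active = [s for s in range(n_singers) if next_try[s] == slot]
        if len(active) == 1:
            delivered += 1              # song heard without overlap
            next_try[active[0]] = slot + 1 + rng.randrange(max_backoff)
        elif len(active) > 1:
            collisions += 1             # overlapping songs must be re-sent
            for s in active:
                next_try[s] = slot + 1 + rng.randrange(max_backoff)
    return delivered, collisions
```

Varying `max_backoff` and `n_singers` changes how quickly the singers de-synchronize, echoing the abstract's point that features of the resulting soundspace depend on the specific algorithm.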
Abstract- In this abstract, we present a beamforming method for estimating the directions and locations of multiple sources and separating each source’s spectrum from field data collected by a wireless acoustic sensor network. Each acoustic sensor is equipped with four microphones that receive acoustic signals in a time-synchronized manner. The difference in time-of-arrival of proximal signals depends on the source direction with respect to the geometry of the microphone array. We show that by using beamforming in the frequency domain, the locations and directions of arrival (DOAs) of multiple 3D sources may be estimated, and the source spectrum may be separated from the audio data spectra.
Juo-Yu Lee, Zac Harlow, Travis C. Collier, Charles E. Taylor and Kung Yao. The 11th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN 2012), April 16-19, 2012, Beijing, China.
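A reduced sketch of frequency-domain beamforming for DOA estimation, using a single frequency bin and a two-microphone array rather than the paper's four-microphone 3D setup (the function name and parameters are hypothetical): candidate steering angles are scanned, and the angle whose expected inter-microphone phase lag best matches the measured one maximizes the coherent sum.

```python
import cmath
import math

def doa_two_mic(phase_diff, freq, spacing, c=343.0):
    """Scan candidate arrival angles (degrees, 0-180) and return the one
    maximizing the delay-and-sum output power at one frequency bin.
    phase_diff: measured phase difference (radians) between the two mics.
    Keeping spacing below half a wavelength avoids spatial aliasing."""
    best_angle, best_power = None, -1.0
    for deg in range(181):
        theta = math.radians(deg)
        # Expected phase lag for a plane wave arriving from angle theta.
        expected = 2 * math.pi * freq * spacing * math.cos(theta) / c
        # Coherent sum of the two phase-aligned channels.
        power = abs(1 + cmath.exp(1j * (phase_diff - expected)))
        if power > best_power:
            best_power, best_angle = power, deg
    return best_angle
```

Summing this match score over many frequency bins, and over a grid of candidate source positions instead of bearings, is the step that turns DOA scanning into the location estimation described in the abstract.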
Abstract- This paper evaluates the performance of a sparse representation-based (SR) classifier for a limited-data bird phrase classification task. The evaluation database contains 32 unique phrases segmented from songs of the Cassin’s Vireo (Vireo cassinii). Spectrographic features were extracted from each phrase-segmented audio file, followed by dimension reduction using principal component analysis (PCA). A performance comparison to the nearest subspace (NS) and support vector machine (SVM) classifiers was conducted. The SR classifier outperforms the NS and SVM classifiers, with a maximum absolute improvement of 3.4% observed when there are only four tokens per phrase in the training set.
Lee Ngee Tan, Kantapon Kaewtip, Martin L. Cody, Charles E. Taylor, Abeer Alwan. Proceedings of Interspeech 2012, Portland, Oregon, September 9-13, 2012 (in press).
Abstract- Bird songs are acoustic communication signals primarily used in male-male aggression and in male-female attraction. These are often monotonous patterns composed of a few phrases, yet some birds have extremely complex songs with a large phrase repertoire, organized in non-random fashion with discernible patterns. Since structure is typically associated with function, the structures of complex bird songs provide important clues to the evolution of animal communication systems. Here we propose an efficient network-based approach to explore structural design principles of complex bird songs, in which song networks (transition relationships among different phrases) and the related structural measures are employed. We demonstrate how this approach works with an example using California Thrasher songs, which are sequences of highly varied phrases delivered in succession over several minutes. These songs display two distinct features: a large phrase repertoire with a ‘small-world’ architecture, in which subsets of phrases are highly grouped and linked with a short average path length; and a balanced transition diversity amongst phrases, in which deterministic and non-deterministic transition patterns are moderately mixed. We explore the robustness of this approach with variations in sample size and the amount of noise. Our approach enables a more quantitative study of global and local structural properties of complex bird songs than has been possible to date.
Sasahara K, Cody ML, Cohen D, Taylor CE. PLoS ONE 7(9): e44436. doi:10.1371/journal.pone.0044436.
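The song-network construction can be sketched as counting phrase-to-phrase transitions in a song sequence and then measuring per-phrase transition diversity; the Shannon entropy below is one simple diversity measure distinguishing deterministic from non-deterministic transitions (the paper's specific structural measures may differ):

```python
import math
from collections import defaultdict

def song_network(sequence):
    """Build a directed phrase-transition network (edge -> count)
    from a song given as a sequence of phrase labels."""
    edges = defaultdict(int)
    for a, b in zip(sequence, sequence[1:]):
        edges[(a, b)] += 1
    return edges

def transition_entropy(edges, phrase):
    """Shannon entropy (bits) of a phrase's outgoing transitions:
    0 means fully deterministic; larger values mean more diverse
    (non-deterministic) transition patterns."""
    outs = [n for (a, _), n in edges.items() if a == phrase]
    total = sum(outs)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in outs)
```

Averaging this entropy over all phrases gives a crude overall "transition diversity"; small-world properties would additionally require path-length and clustering statistics on the same network.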
Abstract- The performance of a sparse representation-based (SR) classifier for in-set bird phrase verification and classification is studied. The database contains phrases segmented from songs of the Cassin’s Vireo (Vireo cassinii). Each test phrase belongs to one of 33 phrase classes – 32 in-set categories, and 1 collective out-of-set category. Only in-set phrases are used for training. From each phrase segment, spectrographic features were extracted, followed by dimension reduction using PCA. A threshold is applied on the sparsity concentration index (SCI) computed by the SR classifier, for in-set bird phrase verification using a limited number of training tokens (3 – 7) per phrase class. When evaluated against the nearest subspace (NS) and support vector machine (SVM) classifiers using the same framework, the SR classifier has the highest classification accuracy, due to its good performance in both the verification and classification tasks.
Lee Ngee Tan, George Kossan, Martin L. Cody, Charles E. Taylor, Abeer Alwan. The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May 26-31, 2013.
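The sparsity concentration index used for in-set verification can be sketched as follows, given a sparse coefficient vector over training tokens and each token's class label. This is a simplified stand-in (the spectrographic feature extraction and the l1 solver that produces the coefficients are not shown): SCI is 1 when all coefficient weight sits on one class and 0 when it is spread evenly, so thresholding it rejects out-of-set phrases.

```python
def sparsity_concentration_index(coeffs, labels, num_classes):
    """SCI from the sparse-representation classification framework:
    (K * max_class_l1 / total_l1 - 1) / (K - 1), where K is the number
    of classes and the l1 mass is tallied per class of training token."""
    total = sum(abs(c) for c in coeffs)
    if total == 0:
        return 0.0
    per_class = [0.0] * num_classes
    for c, lab in zip(coeffs, labels):
        per_class[lab] += abs(c)
    k = num_classes
    return (k * max(per_class) / total - 1) / (k - 1)
```

In the verification step, a test phrase whose SCI falls below a tuned threshold is assigned to the collective out-of-set category; otherwise it is classified by minimum per-class residual.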
Abstract- A bird phrase segmentation method using entropy-based change point detection is proposed. Spectrograms of bird calls are usually sparse while the background noise is relatively white; therefore, the entropy of a sliding time-frequency block on the spectrogram dips when a signal begins and rises when the signal ends. Rather than applying a hard threshold on the entropy to determine the beginning and ending of a signal, Bayesian change point detection is used to detect the statistical changes in the entropy sequence. In tests on a database of Cassin’s Vireo (Vireo cassinii) songs, the proposed segmentation method, with spectral subtraction or a novel spectral whitening method as the front-end, generates more accurate time labels and a lower false alarm rate than the conventional time-domain energy detection method, and achieves a high phrase classification rate.
Ni-Chun Wang, Ralph Hudson, Lee Ngee Tan, Charles Taylor, Abeer Alwan, Kung Yao. The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, Canada, May 26-31, 2013.
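The entropy cue that the segmentation method relies on can be illustrated with a small spectral-entropy helper: a sparse (tonal) spectrum block yields low entropy, while a near-white noise block yields entropy close to log2 of the number of bins. This is an illustrative fragment, not the paper's full Bayesian change-point detector:

```python
import math

def spectral_entropy(mags):
    """Shannon entropy (bits) of a magnitude-spectrum block, treating the
    normalized magnitudes as a probability distribution over bins.
    A single dominant bin gives 0; a flat block gives log2(len(mags))."""
    total = sum(mags)
    probs = [m / total for m in mags if m > 0]
    return -sum(p * math.log2(p) for p in probs)
```

Sliding this measure along the spectrogram produces the entropy sequence whose dips and rises the change-point detector then segments into phrase boundaries.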
Abstract- In this paper, we present a novel approach to birdsong phrase classification using template-based techniques suitable even for limited training data and noisy environments. The algorithm utilizes dynamic time-warping and prominent (high-energy) time-frequency regions of training spectrograms to derive templates. The algorithm is evaluated on 32 classes of Cassin’s Vireo bird phrases. Using only three training examples per class, our algorithm yields a phrase accuracy of 96.23%, outperforming other classifiers (e.g. the 85.21% classification accuracy of an SVM). In the presence of additive noise (10 dB SNR degradation), the proposed classifier does not degrade significantly, compared to others.
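The dynamic time-warping core of the template matching above can be sketched in a few lines; classification would then pick the class whose template yields the smallest warped distance. This is a textbook DTW on 1-D feature sequences, without the paper's prominent-region weighting of the spectrogram:

```python
def dtw_distance(a, b):
    """Classic dynamic time-warping cost between two 1-D feature
    sequences, allowing each frame of one to align with one or more
    frames of the other so tempo differences are absorbed."""
    INF = float('inf')
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

With per-class templates, a test phrase would be assigned via `min(classes, key=lambda c: dtw_distance(test, template[c]))`; in practice the sequences are spectrogram frame features rather than scalars.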