Sound Search by Vocal Imitation

Dataset Collection

So far we have collected the following datasets for our sound search by vocal imitation projects.

VocalSketch Data Set

As a pilot, we collected vocal imitations of 240 audio concepts using Mechanical Turk: VocalSketch Data Set v1.0.4[2]. Ours is the only such data set we are aware of. Each audio concept is defined by both a sound label and a short sound recording of a specific instance of the concept. Our pilot data contains four kinds of audio concepts: everyday sounds, acoustic musical instruments, commercial music synthesizers, and sounds from a single music synthesizer. For each concept, we collected 10 crowd-sourced vocal imitations, for a total of 2400 imitations of the 240 concepts. Imitations were produced in response to one of two stimuli.

Vocal Imitation Set

Drawing on Google's AudioSet ontology, we greatly expanded the number of sound concepts. We recently released a much larger dataset called the VocalImitationSet. It is a collection of crowd-sourced vocal imitations of a large, diverse set of sounds collected from Freesound (https://freesound.org/) and curated based on Google's AudioSet ontology (https://research.google.com/audioset/). We expect this dataset to help research communities better understand human vocal imitation and build machines that understand imitations as humans do.

VimSketch Dataset

The VimSketch Dataset combines the datasets above to provide an even richer set of sound concepts and imitations. It can be downloaded here: https://zenodo.org/record/2596911#.XTOZ_jBKjIV.

FreeSoundIdeas Dataset

For subjective evaluation, we created a new dataset called FreeSoundIdeas. It has no audio overlap with the VimSketch Dataset, which is used to train the Vroom! search algorithm. The sounds in this dataset come from Freesound.org, while the FreeSoundIdeas ontology follows the sound descriptions and organizational structure of Sound Ideas. Specifically, the ontology has a multi-level tree structure and is derived from two Sound Ideas libraries: "General Series 6000 - Sound Effect Library" and "Series 8000 Science Fiction Sound Effects Library". The former has more than 7,500 sound effects covering a large scope, and the latter has 534 sound effects created by Hollywood's best science fiction sound designers. We copied the indexing keywords from 837 relatively distinct sounds in these two libraries and formed eight categories of sound concepts: Animal (ANI), Human (HUM), Music (MSC), Natural Sounds (NTR), Office and Home (OFF), Synthesizers (SYN), Tools and Miscellaneous (TOL), and Transportation (TRA).

We do not use the sounds from Sound Ideas because of copyright issues. Instead, we use the keywords of each sound track in the ontology as queries to search for similar sounds on Freesound.org. For each query, the first 5 to 30 returned sounds are downloaded and stored as elements of the FreeSoundIdeas dataset. Each sound is stored with its own Freesound.org keywords, rather than the query keywords used to find it, for a more accurate description.

In total, the FreeSoundIdeas dataset includes 3,602 sounds. There are 230, 300, 521, 86, 819, 762, and 660 sound concepts in the categories ANI, HUM, MSC, NTR, OFF, SYN, TOL, and TRA, respectively. It can be downloaded here: https://rochester.box.com/s/7999w8ha2shrji45h2x764vwo4gtedek.
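
The collection procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used to build the dataset: the names `Sound`, `collect_sounds`, `fake_search`, and `max_per_query` are hypothetical, the search step is passed in as a function so that no particular Freesound API call is assumed, and the de-duplication of sounds returned by more than one query is our own assumption. The sketch keeps the top-ranked results per query up to a cap and stores each sound with its own keywords rather than the query that found it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Sound:
    """A search result (hypothetical record type for this sketch)."""
    sound_id: int
    keywords: Tuple[str, ...]  # keywords attached to the sound itself, not the query


def collect_sounds(
    queries_by_category: Dict[str, List[str]],
    search: Callable[[str], List[Sound]],
    max_per_query: int = 30,
) -> Dict[str, List[Sound]]:
    """Keep the top-ranked results of every keyword query, grouped by category.

    `search` is assumed to return results in ranked order. Skipping sounds
    that an earlier query already returned is an assumption of this sketch.
    """
    dataset: Dict[str, List[Sound]] = {}
    seen_ids: Set[int] = set()
    for category, queries in queries_by_category.items():
        kept = dataset.setdefault(category, [])
        for query in queries:
            for sound in search(query)[:max_per_query]:
                if sound.sound_id in seen_ids:
                    continue
                seen_ids.add(sound.sound_id)
                kept.append(sound)
    return dataset


def fake_search(query: str) -> List[Sound]:
    """Stand-in for a real search backend, with made-up results."""
    catalog = {
        "dog bark": [Sound(1, ("dog", "bark", "animal")), Sound(2, ("puppy", "yelp"))],
        "cat meow": [Sound(3, ("cat", "meow")), Sound(1, ("dog", "bark", "animal"))],
        "door slam": [Sound(4, ("door", "slam", "wood"))],
    }
    return catalog.get(query, [])


demo = collect_sounds(
    {"ANI": ["dog bark", "cat meow"], "OFF": ["door slam"]},
    fake_search,
)
```

Passing the search step in as a function keeps the sketch independent of any particular search service; a real backend would only need to return ranked results carrying each sound's own keywords.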


