URSing Dataset

We introduce a dataset for facilitating audio-visual analysis of singing performances. The dataset comprises a number of songs in which the singers' solo voices are recorded in isolation. For each song, we provide high-quality audio recordings of the solo singing voice and of its mix with the accompaniment, as well as a video recording of the vocal soloist's upper body, which captures facial expressions and lip movements. We anticipate that the dataset will be useful for developing audio-visual source separation systems. Note that some of the accompaniment tracks contain backing vocals, which makes audio-only singing voice separation more challenging and encourages researchers to integrate the soloists' visual information into the separation process. We also anticipate that the dataset will be useful for other multi-modal information retrieval tasks such as audio-visual expression analysis, audio-visual correspondence, and audio-visual lyrics transcription. A more detailed description and the download link are here.

URMP Dataset

We created a dataset for facilitating audio-visual analysis of musical performances. The dataset comprises a number of simple multi-instrument musical pieces assembled from coordinated but separately recorded performances of the individual tracks. We anticipate that the dataset will be useful as "ground truth" for evaluating audio-visual techniques for music source separation, transcription, and performance analysis. A more detailed description and sample data are here.


Bach10 Dataset

The Bach10 dataset is a polyphonic music dataset that can be used for a variety of research problems, such as multi-pitch estimation and tracking, audio-score alignment, and source separation. It consists of audio recordings of each part and of the ensemble for ten pieces of four-part J.S. Bach chorales, together with their MIDI scores, the ground-truth alignment between the audio and the scores, the ground-truth pitch values of each part, and the ground-truth notes of each piece. The four parts (soprano, alto, tenor, and bass) of each piece are performed on violin, clarinet, saxophone, and bassoon, respectively. A more detailed description is here. Dataset Download

Ground-truth pitches for the PTDB-TUG speech dataset:

The Pitch-Tracking Database from Graz University of Technology (PTDB-TUG) is a speech database for pitch tracking. It contains microphone and laryngograph signals of 20 English native speakers reading the TIMIT corpus. The database also provides reference pitch trajectories which were calculated from the laryngograph signals using the RAPT pitch tracking algorithm [1]. Here, we provide another version of the reference pitch trajectories, calculated using the Praat pitch tracking algorithm [2] on the microphone signals. We found that about 85% of the Praat-generated ground-truth pitches agree with the RAPT-generated ground-truth pitches. Praat-generated Reference Pitch Trajectories Download

[1] D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis (W.B. Kleijn and K.K. Paliwal, eds.), pp. 495–518, Elsevier Science B.V., 1995.
[2] P. Boersma, “Praat, a system for doing phonetics by computer,” Glot International, vol. 5, no. 9/10, pp. 341–345, 2001.
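The agreement figure above can be illustrated with a minimal sketch of how two frame-aligned reference pitch trajectories might be compared. This is a hypothetical illustration, not the criterion actually used: it assumes both trajectories are sampled on the same frame grid, uses 0 Hz to mark unvoiced frames, and counts two pitch values as agreeing when they differ by less than a 20% relative tolerance.

```python
def pitch_agreement(ref_a, ref_b, tol=0.2):
    """Fraction of frames voiced in both tracks whose pitch values
    agree within a relative tolerance (default 20%)."""
    voiced = [(a, b) for a, b in zip(ref_a, ref_b) if a > 0 and b > 0]
    if not voiced:
        return 0.0
    agree = sum(1 for a, b in voiced if abs(a - b) / b <= tol)
    return agree / len(voiced)

# Toy frame-aligned pitch tracks in Hz (0 = unvoiced frame)
rapt  = [0.0, 120.0, 121.5, 123.0, 0.0, 180.0]
praat = [0.0, 119.0, 122.0, 260.0, 0.0, 181.0]
print(pitch_agreement(rapt, praat))  # 3 of 4 voiced frames agree -> 0.75
```

The third voiced frame disagrees because 260 Hz is more than 20% away from 123 Hz, a typical octave-type error that such a tolerance is meant to catch.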

Non-stationary Noise:

For research on speech enhancement, we collected recordings of ten kinds of non-stationary noise: birds, casino, cicadas, computer keyboard, eating chips, frogs, jungle, machine guns, motorcycles, and ocean. Each noise recording is between one and three minutes long. Dataset Download.


Code for recent projects can be accessed from the Publications page.

Piano Music Transcription:

The code can be accessed here.

Sound Search by Vocal Imitation:

This code performs sound search by vocal imitation using the semi-Siamese convolutional network (SCN) described in: Yichi Zhang and Zhiyao Duan, "IMINET: convolutional semi-siamese networks for sound search by vocal imitation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017, pp. 304-308. Given the spectrogram of an incoming vocal imitation, it compares it with the spectrogram of each sound candidate in the dataset; the candidates most similar to the imitation are returned to the user. SYMM-IMINET_WASPAA2017_Code.rar
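The retrieval step described above, comparing an imitation against every candidate and returning the best matches, can be sketched as a similarity ranking. This toy version substitutes plain cosine similarity over hypothetical fixed-length feature vectors for the learned SCN similarity; the names and vectors below are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_candidates(imitation, library):
    """Return candidate names sorted from most to least similar."""
    scores = {name: cosine(imitation, feat) for name, feat in library.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy feature vectors standing in for spectrogram embeddings
library = {"dog_bark": [0.9, 0.1, 0.0],
           "siren":    [0.1, 0.8, 0.3],
           "whistle":  [0.0, 0.2, 0.9]}
imitation = [0.1, 0.7, 0.4]
print(rank_candidates(imitation, library))  # 'siren' ranked first
```

In the actual system the similarity is produced by the trained network rather than a fixed metric, but the ranking-and-return logic is the same.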

Multi-pitch Estimation & Streaming:

This code performs Multi-pitch Estimation (MPE) and Multi-pitch Streaming (MPS) on polyphonic music or multi-talker speech. For a piece of polyphonic audio composed of monophonic harmonic sound sources, this program first estimates pitches in each time frame, then it streams these pitch estimates across time into pitch trajectories (streams), each of which corresponds to a sound source.
The MPE and MPS code is also available separately.
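The two-stage pipeline described above, per-frame pitch estimation followed by streaming across time, can be illustrated with a toy sketch. The greedy nearest-frequency continuation below is a stand-in for illustration only; it is not the actual streaming algorithm in the code.

```python
def stream_pitches(frames, max_jump=50.0):
    """Greedily link per-frame pitch estimates (Hz) into trajectories.
    Each estimate extends the trajectory from the previous frame whose
    last pitch is closest, provided the frequency jump stays below
    max_jump; otherwise it starts a new trajectory."""
    streams = []  # each stream is a list of (frame_index, pitch) pairs
    for t, pitches in enumerate(frames):
        for p in pitches:
            best, best_d = None, max_jump
            for s in streams:
                last_t, last_p = s[-1]
                d = abs(p - last_p)
                # only continue streams that reached the previous frame
                if last_t == t - 1 and d < best_d:
                    best, best_d = s, d
            if best is not None:
                best.append((t, p))
            else:
                streams.append([(t, p)])
    return streams

# Two interleaved sources near 220 Hz and 330 Hz
frames = [[220.0, 330.0], [222.0, 329.0], [221.0, 331.0]]
for s in stream_pitches(frames):
    print([p for _, p in s])  # one trajectory per source
```

The guard on the previous frame's index also prevents one trajectory from absorbing two pitches in the same frame, mirroring the constraint that a monophonic source produces at most one pitch per frame.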

Multi-pitch Estimation & Streaming Evaluation:

This toolbox evaluates multi-pitch analysis results. It compares the estimated pitch content with the ground-truth pitch content and outputs several error measures. See the help text of each file for the details of its measures.
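As a rough illustration of this kind of comparison, frame-level precision and recall can be computed by matching estimated pitches to ground-truth pitches within a tolerance. The half-semitone tolerance and the greedy matching below are illustrative assumptions, not necessarily the measures the toolbox implements.

```python
import math

def frame_scores(est, ref, tol_semitones=0.5):
    """Greedily match estimated to ground-truth pitches (Hz) in one
    frame within a semitone tolerance; return (tp, n_est, n_ref)."""
    unmatched = list(ref)
    tp = 0
    for p in est:
        for q in unmatched:
            if abs(12 * math.log2(p / q)) <= tol_semitones:
                unmatched.remove(q)  # each ground-truth pitch used once
                tp += 1
                break
    return tp, len(est), len(ref)

def precision_recall(est_frames, ref_frames):
    """Aggregate frame-level precision and recall over all frames."""
    tp = ne = nr = 0
    for est, ref in zip(est_frames, ref_frames):
        t, e, r = frame_scores(est, ref)
        tp, ne, nr = tp + t, ne + e, nr + r
    return tp / ne, tp / nr

# Toy example: two frames of estimated vs. ground-truth pitches (Hz)
est = [[220.0, 440.0], [222.0]]
ref = [[220.0, 330.0], [221.0, 330.0]]
print(precision_recall(est, ref))  # precision 2/3, recall 1/2
```

Here the spurious 440 Hz estimate hurts precision, while the missed 330 Hz source in both frames hurts recall.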


Soundprism:

This code implements the Soundprism online score-informed source separation system.



We organized the 2017 North East Music Information Special Interest Group (NEMISIG) workshop. Check the website out!

Media Coverage

11/30/2023 - News10NBC Investigates: Here’s what happened when we did a deep fake on Berkeley Brean’s voice

11/9/2023 - Audio deepfake detective developing new sleuthing techniques

3/22/2023 - Sound Effects: UR Researcher's Pivot Sparks Duet with Voice Biometrics Company

1/20/2023 - How artificial intelligence may impact the music industry: a WXXI Connections Interview with host Mona Seghatoleslami

6/8/2022 - Play a Bach duet with an AI counterpoint

2/18/2020 - Audio’s Role in Immersive Virtual Reality Entertainment

10/29/2019 - The art and science of sound

4/5/2019 - CAREER awards spur junior researchers along varied paths

11/9/2018 - Wells Award winners excel in engineering and humanities

2/13/2018 - Giving virtual reality a ‘visceral’ sound

2/6/2018 - Building the right mobile app for caregivers of children with FASD

8/3/2017 - With automatic transcription, musicians can save themselves the treble

6/11/2017 - Researchers, engineers team up on app for caregivers facing FASD

5/16/2017 - New system displays song lyrics in real time, multiple languages

3/31/2017 - Unlocking the secrets of blue notes

3/10/2017 - Visiting students apply computational tools to music, mind

3/6/2017 - The mysteries of music—and the key of data

8/25/2016 - Three health analytics projects receive pilot funding