Even Older Research

Music Similarity Measure and Recommendation

I worked on this project with Lie Lu in the Speech Group at Microsoft Research Asia (MSRA) from July 2007 to April 2008 when I was a research intern there.

Current music recommendation systems rely mainly on music reviews and listeners' feedback; however, numerous songs have never been reviewed or even listened to. If we can compute the similarity between songs directly from the raw audio, fully automated recommendation becomes possible for all songs, including new and unpopular ones. One fundamental question is how to define the similarity measure. Music can be similar (or dissimilar) in different aspects, e.g., melody, instrumentation, and genre, and how to model each aspect and combine them is a challenging problem. In this project, we define the similarity between songs along several aspects, including genre, instrument, vocal, tempo, emotion, rhythm, and tonality. Each aspect represents an important factor in people's judgment of similarity between songs. We model each aspect individually and then combine the models to compute the overall similarity matrix. More importantly, we take into account the relations between the different aspects.
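
The combination step can be pictured as a weighted sum of per-aspect similarity matrices. The sketch below is a minimal illustration of that idea only; the aspect names, weights, and simple linear combination are assumptions for the example, and the actual project additionally modeled the relations between aspects.

```python
import numpy as np

def combine_similarities(aspect_sims, weights):
    """Weighted linear combination of per-aspect similarity matrices.

    aspect_sims: dict mapping aspect name -> (n_songs, n_songs) array
                 of pairwise similarities.
    weights:     dict mapping aspect name -> non-negative weight.
    """
    total = sum(weights[a] for a in aspect_sims)
    n = next(iter(aspect_sims.values())).shape[0]
    combined = np.zeros((n, n))
    for aspect, sim in aspect_sims.items():
        combined += (weights[aspect] / total) * sim
    return combined

# Toy example: three hypothetical aspects over four songs.
rng = np.random.default_rng(0)
aspects = {name: (lambda m: (m + m.T) / 2)(rng.random((4, 4)))
           for name in ("genre", "tempo", "tonality")}
weights = {"genre": 0.5, "tempo": 0.3, "tonality": 0.2}
overall = combine_similarities(aspects, weights)  # 4x4 similarity matrix
```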

Tonality Classification

I worked on this project with Lie Lu in the Speech Group at Microsoft Research Asia (MSRA) from November 2007 to January 2008 when I was a research intern there. This was a part of the "Music Similarity Measure and Recommendation" project mentioned above.

Traditional tonality mode (major or minor) classification and audio key-finding algorithms often rely on detailed key-name annotations of the training songs. However, unlike classical music, whose keys are usually stated explicitly in the titles, key annotation for the vast body of popular music requires considerable expert knowledge and labor. In contrast, the mode of each song is much easier to label. With only modes labeled, though, traditional approaches to mode modeling cannot be applied directly, because there is no reference point for transposing chroma features that are in different keys. This work proposed an approach to tonality classification of popular music that requires no tonic annotations on the training data: an alignment procedure transposes the chroma features within each mode to a common reference (but unknown) tonic, after which several methods, including Single Profile Correlation (SPC), Multiple Profile Correlation (MPC), and Support Vector Machines (SVM), are applied to mode learning and classification.
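
As a rough illustration of the profile-correlation idea, the sketch below rotates a 12-bin chroma vector through all twelve candidate tonics and labels the song with whichever mode profile correlates best. This is a minimal stand-in, not the paper's method: it uses the well-known Krumhansl-Kessler key profiles, whereas the actual work learned its reference alignment per mode from the mode-labeled training data.

```python
import numpy as np

# Krumhansl-Kessler major and minor key profiles (standard in the
# key-finding literature); index 0 is the tonic pitch class.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def best_alignment(chroma, profile):
    """Try all 12 rotations of the chroma vector (one per candidate tonic)
    and return the highest correlation with the given mode profile."""
    return max(np.corrcoef(np.roll(chroma, -k), profile)[0, 1]
               for k in range(12))

def classify_mode(chroma):
    """SPC-style decision rule: pick the mode whose profile fits best
    after tonic alignment."""
    if best_alignment(chroma, MAJOR) >= best_alignment(chroma, MINOR):
        return "major"
    return "minor"

# Toy chroma with energy on C, E, G (a C-major-ish song).
chroma = np.array([1.0, 0, 0, 0, 0.8, 0, 0, 0.9, 0, 0, 0, 0])
print(classify_mode(chroma))  # -> "major"
```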

Excitation Signal Extraction of Guitar Tones

I worked on this project with Nelson Lee and Prof. Julius Smith in the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University from April 2007 to June 2007 when I was a visiting researcher there.

This work concerned extracting excitation signals from recorded plucked-string sounds of an acoustic guitar, for use in tone synthesis. The proposed method was based on removal of spectral peaks, followed by statistical interpolation to reconstruct the excitation spectrum in the frequency intervals occluded by the partial overtones. Experimental results on synthesized and real tones showed that it outperformed previous methods at removing tonal components from the resulting excitation signal while maintaining a noise-burst-like quality.
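
The sketch below conveys the flavor of the peak-removal step, assuming a mono recording x of a single plucked tone; the peak width and prominence threshold are illustrative assumptions, and plain linear interpolation stands in for the paper's statistical interpolation. In practice, the reconstructed magnitude would then be recombined with phase and inverse-transformed to yield the time-domain excitation.

```python
import numpy as np
from scipy.signal import find_peaks

def excitation_magnitude(x, width=5):
    """Blank out the harmonic peaks of a tone's magnitude spectrum and
    refill the occluded frequency intervals from the surrounding bins."""
    mag = np.abs(np.fft.rfft(x))
    # locate the partials as prominent spectral peaks
    peaks, _ = find_peaks(mag, prominence=0.05 * mag.max())
    keep = np.ones(len(mag), dtype=bool)
    for p in peaks:
        keep[max(0, p - width):p + width + 1] = False  # remove each partial
    bins = np.arange(len(mag))
    # linear interpolation across the removed intervals
    return np.interp(bins, bins[keep], mag[keep])
```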

Updated on January 12, 2017