This project is in collaboration with the ByteDance AI Lab.
This project is partially supported by the National Science Foundation under grant No. 1741472, titled "BIGDATA: F: Audio-Visual Scene Understanding".
Bochen Li, Yuxuan Wang, and Zhiyao Duan, "Audiovisual Singing Voice Separation," Transactions of the International Society for Music Information Retrieval, 4(1), pp. 195–209, 2021. DOI: http://doi.org/10.5334/tismir.108
Vocal separation results on the URSing dataset, which was recorded in a sound booth, a different recording scenario from the training/validation data. A sketch of how such estimates can be scored against the ground truth follows the examples below.
Original mixture
Ground-truth solo vocal
Result from audio-based method
Result from proposed audiovisual method
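For reference, pairings like the ones above (a ground-truth solo vocal versus an audio-only or audiovisual estimate) are commonly scored with BSS Eval metrics such as SDR. The sketch below is not the paper's evaluation code; it assumes librosa and mir_eval are installed, and the file names are hypothetical placeholders.

```python
# Minimal sketch (assumptions, not the paper's codebase): score a separated
# vocal estimate against the ground-truth solo vocal with BSS Eval SDR.
import librosa
import mir_eval
import numpy as np

SR = 16000  # assumed sample rate; the dataset's actual rate may differ

# Load ground truth and estimate as mono signals (hypothetical file names).
reference, _ = librosa.load("ground_truth_vocal.wav", sr=SR, mono=True)
estimate, _ = librosa.load("separated_vocal.wav", sr=SR, mono=True)

# Trim to a common length so the arrays align sample-by-sample.
n = min(len(reference), len(estimate))
reference, estimate = reference[:n], estimate[:n]

# bss_eval_sources expects shape (n_sources, n_samples); here there is one source,
# so only SDR is meaningful (SIR is undefined without interfering sources).
sdr, _, _, _ = mir_eval.separation.bss_eval_sources(
    reference[np.newaxis, :], estimate[np.newaxis, :]
)
print(f"SDR: {sdr[0]:.2f} dB")
```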
Evaluations on a cappella songs downloaded from YouTube.
Original mixture
Separated vocal from audio-based method
Separated solo vocal from proposed audiovisual method
Evaluations on randomly mixed samples (the same scenario as the training/validation data). A sketch of how such mixtures can be constructed follows the examples below.
Original mixture
Ground-truth solo vocal
Result from audio-based method
Result from proposed audiovisual method
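The following is a minimal sketch of how a "randomly mixed" test sample could be built from a solo-vocal stem and an accompaniment stem. It does not reproduce the paper's actual data pipeline; the file names, gain range, and sample rate are hypothetical assumptions, and librosa plus soundfile are assumed to be installed.

```python
# Minimal sketch (assumptions only): create a random mixture from two stems
# and keep the solo vocal as the ground truth for evaluation.
import numpy as np
import librosa
import soundfile as sf

SR = 16000  # assumed sample rate

vocal, _ = librosa.load("solo_vocal.wav", sr=SR, mono=True)
accomp, _ = librosa.load("accompaniment.wav", sr=SR, mono=True)

# Align lengths, then mix the accompaniment in at a random linear gain.
n = min(len(vocal), len(accomp))
vocal, accomp = vocal[:n], accomp[:n]

rng = np.random.default_rng(0)
gain = rng.uniform(0.5, 1.0)  # hypothetical gain range
mixture = vocal + gain * accomp

# Normalize to avoid clipping, then save the mixture and the ground-truth vocal.
peak = max(float(np.max(np.abs(mixture))), 1e-9)
sf.write("mixture.wav", mixture / peak, SR)
sf.write("ground_truth_vocal.wav", vocal / peak, SR)
```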