This project is supported by the National Science Foundation under grant No. 1846184, titled "CAREER: Human-Computer Collaborative Music Making".
Excited to share our latest demo! Try it here 🎨🎶
We propose a system for interactive musical inpainting that lets users draw a pitch curve and an optional note-density curve to guide the pitch contour and rhythm of the generated melodies.
The core of our system is a new melody disentanglement scheme that decomposes a melody into the following factors:
\begin{equation}
\begin{aligned}
\text{Melody} \;=\; &\text{relative pitch} \\
{}\oplus\; &\text{pitch offset} \\
{}\oplus\; &\text{relative rhythm} \\
{}\oplus\; &\text{rhythm offset} \\
{}\oplus\; &\text{context}.
\end{aligned}
\end{equation}
The user's input curves control the relative pitch (green curve) and the relative rhythm (blue curve) factors. We also provide two sliders for the user to control the pitch and rhythm offsets. The remaining context factor is estimated from the surrounding measures.
When the first four factors of the equation are 'precise', the context factor is not needed to infer the melody. The more 'vagueness' we add to these four factors, the more important the context factor becomes in reconstructing the melody. Thus, by controlling this tradeoff we can determine how much influence the user's curves have over the final result. Please see our paper for more details.
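As an informal illustration of this precision/vagueness tradeoff, here is a minimal sketch (not the paper's actual mechanism; the quantization function and bin counts are assumptions) of how coarsening the user's drawn curve reduces the information it carries, leaving more of the reconstruction to the context factor:

```python
import numpy as np

def quantize_curve(curve: np.ndarray, n_levels: int) -> np.ndarray:
    """Quantize a drawn curve (values in [0, 1]) into n_levels bins.

    Fewer levels make the user factor 'vaguer': the curve constrains
    the melody less, so the context factor matters more at decode time.
    """
    bins = np.linspace(0.0, 1.0, n_levels)
    idx = np.clip(np.digitize(curve, bins) - 1, 0, n_levels - 1)
    return bins[idx]

# A precise curve (many levels) pins down the contour tightly;
# a coarse one (few levels) leaves room for context-driven inference.
drawn = np.random.rand(32)           # 32 time steps of a mouse-drawn curve
precise = quantize_curve(drawn, 64)  # strong user influence
vague = quantize_curve(drawn, 4)     # weak user influence, context dominates
```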
Previous work on music inpainting either does not support user interaction or assumes that users know some music theory in order to achieve musically meaningful results. Our goal is to design a system that engages users in the music inpainting process without assuming a musical background.
A system like this:
The model we propose is a variational auto-encoder (VAE) based architecture. As shown on the left side of the figure, we use two encoders, \(Q_{curve}\) and \(Q_{context}\), to process the user's input and the context measures, respectively. The encoders' outputs form the VAE's latent vector \(Z = [Z_{c} ; Z_{rp} ; Z_{rr}]\), i.e., a concatenation of three vectors, where \(Z_{c}\) stores the context information, \(Z_{rp}\) the relative pitch information, and \(Z_{rr}\) the relative rhythm information. On the right side, we use three decoders, \(P_{rp}\), \(P_r\), and \(P_{midi}\), with different intermediate losses to achieve the desired disentanglement of the three latent variables.
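To make this layout concrete, here is a minimal PyTorch-style sketch of the two-encoder / three-decoder structure (all layer sizes, the MLP bodies, and the output dimensions are placeholder assumptions; see the paper and the GitHub repository for the actual architecture):

```python
import torch
import torch.nn as nn

class DrawAndListenVAE(nn.Module):
    """Minimal sketch of the two-encoder / three-decoder layout."""

    def __init__(self, curve_dim=64, ctx_dim=128, z_dim=32):
        super().__init__()
        # Q_curve encodes the user's drawn curves into the relative-pitch
        # and relative-rhythm latents (a mu and log-variance for each).
        self.q_curve = nn.Sequential(
            nn.Linear(curve_dim, 128), nn.ReLU(),
            nn.Linear(128, 4 * z_dim))           # (mu, logvar) x (z_rp, z_rr)
        # Q_context encodes the surrounding measures into the context latent.
        self.q_context = nn.Sequential(
            nn.Linear(ctx_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * z_dim))           # (mu, logvar) for z_c
        # Three decoders with intermediate losses: P_rp reconstructs the
        # relative-pitch curve, P_r the rhythm, and P_midi the final notes.
        self.p_rp = nn.Linear(z_dim, curve_dim)
        self.p_r = nn.Linear(z_dim, curve_dim)
        self.p_midi = nn.Linear(3 * z_dim, 128)  # e.g., per-step pitch logits

    @staticmethod
    def reparameterize(mu, logvar):
        # Standard VAE reparameterization trick.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, curve, context):
        mu_curve, logvar_curve = self.q_curve(curve).chunk(2, dim=-1)
        mu_c, logvar_c = self.q_context(context).chunk(2, dim=-1)
        z_rp, z_rr = self.reparameterize(mu_curve, logvar_curve).chunk(2, dim=-1)
        z_c = self.reparameterize(mu_c, logvar_c)
        z = torch.cat([z_c, z_rp, z_rr], dim=-1)  # Z = [Z_c ; Z_rp ; Z_rr]
        # Each decoder gets its own reconstruction target, which encourages
        # the intended disentanglement of the three latent sub-vectors.
        return self.p_rp(z_rp), self.p_r(z_rr), self.p_midi(z)
```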
We conduct both objective and subjective evaluations to assess the feasibility and usability of the proposed system for drawing-based melody inpainting. We compare it with two closely related methods as baselines under the same user interface and show that our model achieves better performance. For the objective evaluations and details on the baseline algorithms, please see our paper.
For this experiment, we developed a simple web-based interface consisting of a canvas area where a user draws the curves with their mouse, and three identical display areas showing the same context measures together with the generated results of each method (ours and the two baselines). The assignment of methods to display areas is randomized for every trial to reduce bias toward any specific method.
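The per-trial randomization can be as simple as the following sketch (the method names and function are hypothetical, not taken from the interface's code):

```python
import random

METHODS = ["ours", "baseline_1", "baseline_2"]  # placeholder names

def assign_display_slots():
    """Return a fresh method-to-display-area order for one trial."""
    order = METHODS[:]
    random.shuffle(order)  # new random assignment every trial
    return order
```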
We recruited 23 participants with various musical backgrounds and collected 112 trials in total (each trial consists of drawing a curve and generating melodies with each of the three methods). We asked the participants to answer the following questions:
An example of four different input curves using the same context measures.
An example of one hand-drawn pitch curve combined with three different context measures to generate different but context-matching melodies.
Two examples obtained during the user studies. The first line is the output of the proposed VAE model, and the next two contain the baselines' outputs.
Many more examples from all methods can be found here.
The code for this project is available on GitHub.
Christodoulos Benetatos and Zhiyao Duan, Draw and Listen! A Sketch-based System for Music Inpainting, in Transactions of the International Society for Music Information Retrieval (TISMIR), 2022.