This is the demo page for the ISMIR 2022 paper: Melody Infilling with User-Provided Structural Context.

Authors: Chih-Pin Tan, Wen-Yu Su and Yi-Hsuan Yang

Demo 1

Demo 1 contains three sets of infilling results generated by our model, VLI, and Hsu's work. In addition, we provide the original music under the "Real" tag, and the result of copying the provided structural context under the "Copy" tag. In the pianoroll player, the orange notes belong to the past and future context, and the purple notes in the middle belong to the infilling result.

As mentioned in the paper, we notice that our model generates results highly similar to the provided structural context (over-imitation). However, based on our subjective listening, our model still performs better than "Copy" at connecting the target to the contexts, particularly to the past context.

Song 1

Song 2

Song 3

Demo 2

While preparing the demo, we found a good strategy to remedy the over-imitation problem. Instead of computing the training loss only over the infilling target, as described in our paper, we trained an "improved" model whose loss covers not only the infilling target but also the past context. We found that doing so greatly reduces the imitation problem. A minimal sketch of this loss change is shown below.
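For concreteness, here is a minimal PyTorch-style sketch of the modified objective. The function name, tensor shapes, and the segment labeling scheme are our assumptions for illustration only; they are not taken from the released code or the paper.

```python
import torch
import torch.nn.functional as F

def infilling_loss(logits, labels, segment_ids, include_past=True):
    """Cross-entropy over selected token positions.

    logits:      (batch, seq_len, vocab) model predictions
    labels:      (batch, seq_len) ground-truth token ids
    segment_ids: (batch, seq_len) with 0 = past context,
                 1 = infilling target, 2 = future context
                 (a hypothetical labeling scheme)
    """
    # Original objective: loss on the infilling target only.
    mask = (segment_ids == 1)
    # "Improved" objective: also predict the past-context tokens,
    # which discourages the model from merely imitating the
    # provided structural context.
    if include_past:
        mask = mask | (segment_ids == 0)

    # Boolean masking flattens the selected positions:
    # logits[mask] -> (n_selected, vocab), labels[mask] -> (n_selected,)
    return F.cross_entropy(logits[mask], labels[mask])
```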

In Demo 2, we provide the results of the improved model under the "Improved" tag. Please note that this is an unpublished result not mentioned in the paper at all. (The model evaluated in the paper, both objectively and subjectively, remains the one demonstrated in Demo 1.)

Song 1

Song 2

Song 3

Note: If you are browsing this page on an iPhone, please remember to turn off silent mode using the switch on the side of the device.