Dataset: MUSDB18
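PoCM (Point-wise Convolutional Modulation) is an extension of FiLM (Feature-wise Linear Modulation). FiLM modulates each channel independently with its own scale and shift, whereas PoCM computes each output channel as a linear combination of all input channels plus a bias.
Since this channel-wise linear combination can also be viewed as a point-wise (1x1) convolution, we name it PoCM. With these inter-channel operations, PoCM can modulate features more flexibly and expressively than FiLM.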
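To make the FiLM/PoCM relationship concrete, here is a minimal sketch in plain Python. The feature map is a nested list indexed as [channel][position], and the names `film`, `pocm`, `gamma`, `omega`, and `beta` are illustrative assumptions, not the authors' code. In a real model, `omega` and `beta` would be generated from the condition vector, and the operation would run as a 1x1 convolution in a deep learning framework.

```python
from typing import List

Feature = List[List[float]]  # toy feature map indexed as [channel][position]


def film(x: Feature, gamma: List[float], beta: List[float]) -> Feature:
    """FiLM: scale and shift each channel independently (no inter-channel mixing)."""
    return [[gamma[c] * v + beta[c] for v in row] for c, row in enumerate(x)]


def pocm(x: Feature, omega: List[List[float]], beta: List[float]) -> Feature:
    """PoCM sketch: output channel c = beta[c] + sum_j omega[c][j] * x[j].

    This is a point-wise (1x1) convolution whose weights omega and bias beta
    would be generated from the condition instead of being fixed parameters.
    """
    n_in, n_pos = len(x), len(x[0])
    return [
        [beta[c] + sum(omega[c][j] * x[j][t] for j in range(n_in)) for t in range(n_pos)]
        for c in range(len(omega))
    ]


x = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels, 2 positions

# With a diagonal omega, PoCM reduces to FiLM (gamma sits on the diagonal).
print(pocm(x, [[2.0, 0.0], [0.0, 0.5]], [0.0, 1.0]))  # → [[2.0, 4.0], [2.5, 3.0]]
print(film(x, [2.0, 0.5], [0.0, 1.0]))                # → [[2.0, 4.0], [2.5, 3.0]]

# Off-diagonal weights mix channels (here: swap them), which FiLM cannot express.
print(pocm(x, [[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0]))  # → [[3.0, 4.0], [1.0, 2.0]]
```

The diagonal case shows that FiLM is a special case of PoCM; the channel swap shows the extra expressiveness that comes from inter-channel weights.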
Instead of plain PoCM, we use Gated PoCM (GPoCM), since GPoCM is robust for the source separation task. A gated approach is natural for source separation: applying GPoCMs yields a sparse latent vector (one containing many near-zero elements), which naturally produces a separated result (i.e., one that is more silent than the original mixture).
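The gating idea can be sketched as follows, assuming the gate is a sigmoid of a PoCM output multiplied element-wise with the input, i.e. GPoCM(X) = σ(PoCM(X)) ⊙ X. This is a minimal illustration under that assumption; the helper names and the toy nested-list feature map are hypothetical, and the authors' implementation may differ in detail.

```python
import math
from typing import List

Feature = List[List[float]]  # toy feature map indexed as [channel][position]


def pocm(x: Feature, omega: List[List[float]], beta: List[float]) -> Feature:
    """Point-wise convolutional modulation: beta[c] + sum_j omega[c][j] * x[j]."""
    n_in, n_pos = len(x), len(x[0])
    return [
        [beta[c] + sum(omega[c][j] * x[j][t] for j in range(n_in)) for t in range(n_pos)]
        for c in range(len(omega))
    ]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def gpocm(x: Feature, omega: List[List[float]], beta: List[float]) -> Feature:
    """Gated PoCM sketch: x multiplied element-wise by sigmoid(PoCM(x)).

    Every gate value lies in (0, 1), so each element of x can only be
    attenuated, never amplified; gates near 0 give near-zero (sparse) features.
    """
    assert len(omega) == len(x), "omega must be C x C so the gate matches x"
    gate = pocm(x, omega, beta)
    return [
        [sigmoid(g) * v for g, v in zip(g_row, x_row)]
        for g_row, x_row in zip(gate, x)
    ]


x = [[1.0, 2.0], [3.0, 4.0]]
no_mixing = [[0.0, 0.0], [0.0, 0.0]]
# A large positive bias opens the gate for channel 0; a large negative bias
# closes it for channel 1, pushing that channel toward silence.
out = gpocm(x, no_mixing, [20.0, -20.0])
print(out)
```

Because the gate can only shrink activations, a condition that "closes" most gates produces the sparse, quieter latent described above.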
The authors of Conditioned-U-Net (cunet) tried to manipulate the latent space in the encoder.
However, we found that this approach is not practical, since it makes the latent space (i.e., the decoder's input feature space) more discontinuous.
In preliminary experiments, we observed that applying FiLMs in the decoder was consistently better than applying FiLMs in the encoder.
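A toy sketch can make the encoder-versus-decoder placement concrete. Everything here is hypothetical: the blocks are stand-in ReLUs, skip connections are omitted, and this is neither the cunet architecture nor the authors' model. It only illustrates one structural difference: with decoder-side FiLM, the decoder's input (the latent) is identical for every condition, whereas encoder-side FiLM yields a different latent per condition. It does not reproduce the discontinuity analysis or the experimental comparison.

```python
from typing import Callable, List, Sequence, Tuple

Feature = List[List[float]]  # toy feature map indexed as [channel][position]
Block = Callable[[Feature], Feature]
FiLMParams = Tuple[List[float], List[float]]  # (gamma, beta), one value per channel


def film(x: Feature, gamma: List[float], beta: List[float]) -> Feature:
    return [[gamma[c] * v + beta[c] for v in row] for c, row in enumerate(x)]


def forward(
    x: Feature,
    encoders: Sequence[Block],
    decoders: Sequence[Block],
    params: Sequence[FiLMParams],
    where: str = "decoder",
) -> Tuple[Feature, Feature]:
    """Toy encoder-decoder (no skip connections) with FiLM in one half only.

    params[i] is the (gamma, beta) pair for the i-th conditioned block; in a
    real model it would be generated from the condition (e.g. the target source).
    Returns (latent, output), where latent is the decoder's input feature.
    """
    h = x
    for i, enc in enumerate(encoders):
        h = enc(h)
        if where == "encoder":
            h = film(h, *params[i])  # encoder-side conditioning
    latent = h  # independent of the condition when where == "decoder"
    for i, dec in enumerate(decoders):
        h = dec(h)
        if where == "decoder":
            h = film(h, *params[i])  # decoder-side conditioning
    return latent, h


def relu(f: Feature) -> Feature:
    return [[max(0.0, v) for v in row] for row in f]


x = [[1.0, -2.0], [0.5, 3.0]]
cond_a = [([2.0, 1.0], [0.0, 0.0])]  # hypothetical FiLM params for source A
cond_b = [([0.0, 1.0], [0.0, 0.0])]  # hypothetical FiLM params for source B

for where in ("encoder", "decoder"):
    lat_a, _ = forward(x, [relu], [relu], cond_a, where)
    lat_b, _ = forward(x, [relu], [relu], cond_b, where)
    print(where, "latent differs across conditions:", lat_a != lat_b)
# → encoder latent differs across conditions: True
# → decoder latent differs across conditions: False
```

In the decoder-side variant the encoder stays condition-agnostic, so every condition is decoded from the same latent space.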
I am currently interested in the following areas:
I am currently working on a personal project called Machine Learning-based Audio Editing for a user-friendly interface. Since I have already started writing a paper on it, I cannot share detailed information yet. The goal of this research is to create an audio manipulation model equipped with a convenient user interface. I believe the results of this project will be widely used in audio signal processing software such as DAWs and DAW plugins, just as many users have embraced iZotope's ML-based applications. Lowering the difficulty of audio editing will let more users create, edit, manipulate, and share their audio files.
Although I am not writing songs these days, I used to write and sing my own songs.
These experiences give me inspiration for future research topics from the perspective of a DAW 'user'.