I studied for my MS in condensed matter physics at the University of Trieste. Working on my thesis (mainly focused on the intercalation of insulators between graphene and metal substrates), I learned many surface science experimental techniques, like XPS and STM. Alongside these, programming has always been part of my curriculum (starting with FORTRAN), mainly applied to numerical simulations of physical systems and to the analysis of large quantities of data. The other activity that takes up my time is music, and so the focus of my programming has slowly shifted in that direction: I ended up exploring different possibilities for sound and music generation and analysis. Code has become for me a new instrument to make music, and to gain a deeper understanding of the structures and patterns hidden in it.
Project: Automated Pitch Transcription
My project consisted of the development of an algorithm to detect the notes played in an audio recording in an unsupervised way. The algorithm tries to factorize the spectrogram as a product of a dictionary matrix (consisting of the typical short-time spectra of the notes played) and an intensity matrix.
Even an untrained human can solve (at least partially) the problem of transcribing polyphonic music, given the right tools: the correlations one notices when looking at a spectrogram while listening to the music are already a very good cue for the transcription. These correlations are not so obvious for a computer, though. One of the most widely used techniques in the literature for automatic transcription of polyphonic music is non-negative matrix factorization: the spectrogram is a matrix V that can be approximated by a matrix product WH, where W is a dictionary of basic spectra (we could add the constraint that these spectra be harmonic, since we are only interested in melodic information) and H holds the intensity of each pitch as a function of time. The whole point of this factorization is to shift from a basis of sinusoids to a basis of harmonically rich functions, whose corresponding intensities are linked to the pitches of sounds rather than to individual frequencies. The entries of the two matrices will be computed with a gradient descent algorithm, to which constraints (like smoothness in time and harmonicity of the basis spectra) could be added.
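As a minimal sketch of this idea, the factorization V ≈ WH can be computed with the classic multiplicative updates of Lee and Seung, which are a rescaled form of gradient descent on the Euclidean cost (the toy "spectrogram" below, built from two hypothetical harmonic templates, is only for illustration):

```python
import numpy as np

def nmf(V, n_components, n_iter=500, eps=1e-9):
    """Factorize V ~ W @ H with non-negative W, H using
    Lee-Seung multiplicative updates (Euclidean cost)."""
    n_bins, n_frames = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n_bins, n_components))    # dictionary of basis spectra
    H = rng.random((n_components, n_frames))  # pitch intensities over time
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy spectrogram: two harmonic-like templates with time-varying gains
t1 = np.array([1.0, 0.5, 0.25, 0.0])
t2 = np.array([0.0, 1.0, 0.0, 0.5])
gains = np.array([[1, 1, 0, 0, 1],
                  [0, 1, 1, 1, 0]], dtype=float)
V = np.outer(t1, gains[0]) + np.outer(t2, gains[1])
W, H = nmf(V, n_components=2)
print(np.abs(V - W @ H).max())  # reconstruction error should be small
```

Constraints such as harmonicity or temporal smoothness would enter as extra penalty terms in the cost, modifying these update rules.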
The output of this algorithm (the matrix H, representing the intensity of each pitch as a function of time) will then be processed into a sequence of note-on and note-off events as a function of pitch and time.
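One simple way to do this step (a sketch, not the final design: the threshold value and the pitch labels here are arbitrary placeholders) is to threshold each row of H and emit an event at every rising and falling edge:

```python
import numpy as np

def note_events(H, pitches, threshold=0.1, hop_time=0.01):
    """Convert an intensity matrix H (pitches x frames) into
    (event, pitch, time) tuples by thresholding each row."""
    events = []
    for row, pitch in zip(H, pitches):
        active = row > threshold
        # pad with "inactive" so every note-on gets a matching note-off
        edges = np.diff(np.concatenate(([False], active, [False])).astype(int))
        for frame in np.flatnonzero(edges == 1):
            events.append(("note_on", pitch, frame * hop_time))
        for frame in np.flatnonzero(edges == -1):
            events.append(("note_off", pitch, frame * hop_time))
    events.sort(key=lambda e: e[2])
    return events

# two pitch rows (e.g. MIDI notes 60 and 64) over five frames
H = np.array([[0.0, 0.8, 0.9, 0.0, 0.0],
              [0.0, 0.0, 0.7, 0.7, 0.0]])
print(note_events(H, pitches=[60, 64]))
```

A more robust version would smooth H in time first, so that brief dips below the threshold do not split one note into several.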
An additional layer that could be added is a Markov model storing the transition probabilities between notes. I expect, though, that the transition matrix of such a model will look very different when trained on different styles of music (and, I think, on different composers as well).
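Estimating such a transition matrix is straightforward once a corpus of note sequences is available; the toy melody below stands in for real training data:

```python
import numpy as np

def transition_matrix(pitch_sequence, n_pitches=128):
    """Estimate note-to-note transition probabilities from a
    sequence of MIDI pitches (each nonzero row sums to 1)."""
    counts = np.zeros((n_pitches, n_pitches))
    for a, b in zip(pitch_sequence, pitch_sequence[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# toy melody: C-D-E-D-C
P = transition_matrix([60, 62, 64, 62, 60])
print(P[62, 64], P[62, 60])  # from D, moves to E or back to C equally often
```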
The final result (a sequence of note-on and note-off events at specific times) will be saved as a MIDI file without a specific time signature.
In this project, the main goal is to extract the pitch and timing of note events. One potential development would be applying beat and key detection to the result, with probabilistic models trained on a large corpus of data (it is not possible to write a proper score without this information). Using that, it would be possible to conceive an application based on supervised machine learning that abstracts away mistakes and interpretive choices in the actual audio recording and produces the simplest score that approximates the music played. From there, developing a minisite that automatically performs music transcription is a feasible objective.