Bayesian audio-to-score alignment based on joint inference of timbre, volume, tempo, and note onset timings

Authors: Akira Maezawa, Hiroshi G. Okuno

DOI: 10.1162/COMJ_A_00286

Keywords: Joint (audio engineering), Reverberation, Bayesian probability, Musical instrument, Computer science, Inference, Speech recognition, Timbre, Audio signal, Bayesian inference

Abstract: This article presents an offline method for aligning an audio signal to the individual instrumental parts constituting a musical score. The proposed method is based on fitting multiple hidden semi-Markov models (HSMMs) to the observed audio signal. The emission probability of each state of the HSMM is described using latent harmonic allocation (LHA), a Bayesian model of a harmonic sound mixture. Each HSMM corresponds to one instrument's part, and the state duration is conditioned on a linear dynamics system (LDS) tempo model. Variational inference is used to jointly infer the LHA, HSMM, and LDS. We evaluate the capability of the method to align audio to its score under reverberation, structural variations, and fluctuations in onset timing among different parts.
