Computer Vision for Music Identification

Submitted by: Metaphu

Category: Science and Technology

Date Submitted: 10/09/2012 12:40 AM

Yan Ke (1), Derek Hoiem (1), Rahul Sukthankar (1,2)
(1) School of Computer Science, Carnegie Mellon; (2) Intel Research Pittsburgh
{yke,dhoiem,rahuls}@cs.cmu.edu
http://www.cs.cmu.edu/~yke/musicretrieval/

Abstract

We describe how certain tasks in the audio domain can be effectively addressed using computer vision approaches. This paper focuses on the problem of music identification, where the goal is to reliably identify a song given a few seconds of noisy audio. Our approach treats the spectrogram of each music clip as a 2-D image and transforms music identification into a corrupted sub-image retrieval problem. By employing pairwise boosting on a large set of Viola-Jones features, our system learns compact, discriminative, local descriptors that are amenable to efficient indexing. During the query phase, we retrieve the set of song snippets that locally match the noisy sample and employ geometric verification in conjunction with an EM-based “occlusion” model to identify the song that is most consistent with the observed signal. We have implemented our algorithm in a practical system that can quickly and accurately recognize music from short audio samples in the presence of distortions such as poor recording quality and significant ambient noise. Our experiments demonstrate that this approach significantly outperforms the current state-of-the-art in content-based music identification.
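To make the "spectrogram as a 2-D image" idea concrete, here is a minimal sketch in Python (NumPy only). It computes a log-magnitude spectrogram and evaluates one Viola-Jones style two-rectangle box filter on it using an integral image. The frame length, hop size, filter position, and threshold are all illustrative assumptions. In the paper's method the filters and thresholds are selected by pairwise boosting, which this sketch does not attempt. The function names are hypothetical, not taken from the authors' system.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT; rows are frequency bins, columns are time frames.
    frame_len and hop are assumed values, not the paper's settings."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in constant time."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def two_rect_feature_bit(ii, r0, c0, h, w, threshold=0.0):
    """Compare the energy in the upper and lower halves of a box
    (a difference across frequency) and threshold it to one bit.
    The threshold here is a placeholder, not a learned value."""
    half = h // 2
    top = box_sum(ii, r0, c0, r0 + half, c0 + w)
    bottom = box_sum(ii, r0 + half, c0, r0 + 2 * half, c0 + w)
    return int(top - bottom > threshold)

# Usage: one second of a 440 Hz tone sampled at 8 kHz (synthetic test input).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
ii = integral_image(spec)
bit = two_rect_feature_bit(ii, r0=10, c0=0, h=8, w=10)
```

Evaluating several such filters at one time offset and concatenating the resulting bits would give a compact binary descriptor of the kind the abstract mentions, which is what makes hash-style indexing plausible.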

1. Introduction

At first glance, problems in the audio domain may appear to have little relevance to computer vision. The former deals with processing 1-D signals over time, while computer vision tends to focus on the interpretation of one or more 2-D images (typically captured from a 3-D scene). However, we believe that certain problems in the audio domain transform very naturally into a form that can be effectively tackled by computer vision techniques. This belief is motivated by the observation that audio researchers commonly employ 2-D time-frequency...
