A Markov Random Field Model for Term Dependencies
Donald Metzler
metzler@cs.umass.edu
W. Bruce Croft
croft@cs.umass.edu
Center for Intelligent Information Retrieval
Department of Computer Science
University of Massachusetts
Amherst, MA 01003
ABSTRACT
This paper develops a general, formal framework for modeling term
dependencies via Markov random fields. The model allows for arbitrary
text features to be incorporated as evidence. In particular, we make
use of features based on occurrences of single terms, ordered phrases,
and unordered phrases. We explore full independence, sequential
dependence, and full dependence variants of the model. A novel approach
is developed to train the model that directly maximizes the mean average
precision rather than maximizing the likelihood of the training data.
Ad hoc retrieval experiments are presented on several newswire and web
collections, including the GOV2 collection used at the TREC 2004
Terabyte Track. The results show that significant improvements are
possible by modeling dependencies, especially on the larger web
collections.
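For reference, the mean average precision metric that the training approach targets can be computed as in the following minimal Python sketch. The function names and the toy rankings are illustrative assumptions and are not taken from the paper; the sketch also assumes that each document id appears at most once in a ranking.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant doc ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, accumulated only at relevant positions.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of per-query average precision over (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two queries with hand-made rankings and judgments.
toy_runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
toy_map = mean_average_precision(toy_runs)
```

Because the metric depends on the model parameters only through the induced ranking, it is piecewise constant in those parameters, which is one reason optimizing it directly differs from maximizing likelihood.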
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information
Search and Retrieval
General Terms
Algorithms, Experimentation, Theory
Keywords
Information retrieval, term dependence, phrases, Markov
random fields
1. INTRODUCTION
There is a rich history of statistical models for information
retrieval, including the binary independence model (BIM),
language modeling [16], inference network model [23], and
the divergence from randomness model [1], amongst others [4]. It is well known that dependencies exist between
terms in a collection of text. For example, within a SIGIR