A Markov Random Field Model for Term Dependencies

Donald Metzler

metzler@cs.umass.edu

W. Bruce Croft

croft@cs.umass.edu

Center for Intelligent Information Retrieval

Department of Computer Science

University of Massachusetts

Amherst, MA 01003

ABSTRACT

This paper develops a general, formal framework for modeling term dependencies via Markov random fields. The model allows for arbitrary text features to be incorporated as evidence. In particular, we make use of features based on occurrences of single terms, ordered phrases, and unordered phrases. We explore full independence, sequential dependence, and full dependence variants of the model. A novel approach is developed to train the model that directly maximizes the mean average precision rather than maximizing the likelihood of the training data. Ad hoc retrieval experiments are presented on several newswire and web collections, including the GOV2 collection used at the TREC 2004 Terabyte Track. The results show significant improvements are possible by modeling dependencies, especially on the larger web collections.
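The training objective named above, mean average precision (MAP), can be illustrated with a minimal sketch. It assumes the standard uninterpolated definition of average precision; the function names and the toy document identifiers are hypothetical and are not taken from the paper, and the sketch shows only how the metric is computed, not the training procedure itself.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision for one ranked list.

    Sums precision at each rank where a relevant document appears,
    then divides by the total number of relevant documents.
    """
    if not relevant:
        return 0.0  # assumption: queries with no relevant documents score 0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], judgments: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all ranked queries."""
    if not rankings:
        return 0.0
    total = sum(
        average_precision(ranking, judgments.get(query_id, set()))
        for query_id, ranking in rankings.items()
    )
    return total / len(rankings)


# Toy data (hypothetical): two queries with ranked lists and relevance judgments.
rankings = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["a", "b"]}
judgments = {"q1": {"d1", "d3"}, "q2": {"b"}}
map_score = mean_average_precision(rankings, judgments)
```

Because the metric depends on the model parameters only through the ranking they induce, it is a step function of those parameters, which is one reason optimizing it directly differs from maximizing likelihood.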

Categories and Subject Descriptors

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms

Algorithms, Experimentation, Theory

Keywords

Information retrieval, term dependence, phrases, Markov random fields

1. INTRODUCTION

There is a rich history of statistical models for information retrieval, including the binary independence model (BIM), language modeling [16], inference network model [23], and the divergence from randomness model [1], amongst others [4]. It is well known that dependencies exist between terms in a collection of text. For example, within a SIGIR

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on...