Language Independent Extractive Summarization

Category: Science and Technology

Date Submitted: 12/01/2014 12:05 PM

Rada Mihalcea, Department of Computer Science and Engineering, University of North Texas, rada@cs.unt.edu

Abstract

We demonstrate TextRank – a system for unsupervised extractive summarization that relies on the application of iterative graph-based ranking algorithms to graphs encoding the cohesive structure of a text. An important characteristic of the system is that it does not rely on any language-specific knowledge resources or any manually constructed training data, and thus it is highly portable to new languages or domains.

[...] specifically designed to address this problem, by using an extractive summarization technique that does not require any training data or any language-specific knowledge sources. TextRank can be effectively applied to the summarization of documents in different languages without any modification of the algorithm and without any requirement for additional data. Moreover, results from experiments performed on standard data sets have demonstrated that the performance of TextRank is competitive with that of some of the best summarization systems available today.

2 Extractive Summarization

Ranking algorithms, such as Kleinberg’s HITS algorithm (Kleinberg, 1999) or Google’s PageRank (Brin and Page, 1998), have traditionally been used with success in Web-link analysis and social networks, and more recently in text processing applications. In short, a graph-based ranking algorithm is a way of deciding on the importance of a vertex within a graph by taking into account global information recursively computed from the entire graph, rather than relying only on local vertex-specific information. The basic idea implemented by the ranking model is that of voting or recommendation. When one vertex links to another, it is in effect casting a vote for that other vertex. The higher the number of votes that are cast for a vertex, the higher the importance of the vertex. These graph ranking algorithms are...
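
The voting idea can be illustrated with a small sketch. This is not the TextRank implementation; it is a minimal PageRank-style iteration over a toy directed graph, written under a few assumptions: the function name rank_vertices, the example graph, the damping factor of 0.85 (the value commonly quoted for PageRank), and the fixed number of iterations are all choices made here for illustration. Each vertex splits its current score evenly among the vertices it links to, so a vote from a highly ranked vertex counts for more than a vote from a poorly ranked one.

```python
def rank_vertices(out_links, damping=0.85, iterations=50):
    """Score vertices of a directed graph with a PageRank-style iteration.

    out_links maps each vertex to the list of vertices it links to.
    damping and iterations are illustrative assumptions, not values
    taken from the text above.
    """
    # Collect every vertex, including those that only receive links.
    vertices = set(out_links)
    for targets in out_links.values():
        vertices.update(targets)

    # Invert the graph: for each vertex, who votes for it?
    in_links = {v: [] for v in vertices}
    for source, targets in out_links.items():
        for target in targets:
            in_links[target].append(source)

    scores = {v: 1.0 for v in vertices}
    for _ in range(iterations):
        new_scores = {}
        for v in vertices:
            # Each voter u shares its score among all of its out-links.
            votes = sum(scores[u] / len(out_links[u]) for u in in_links[v])
            new_scores[v] = (1 - damping) + damping * votes
        scores = new_scores
    return scores


# Toy graph: A, B and D all vote for C; C votes for A.
graph = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = rank_vertices(graph)
for vertex in sorted(scores, key=scores.get, reverse=True):
    print(vertex, round(scores[vertex], 3))
```

In this toy graph, C collects the most votes and ranks highest. A receives only one vote, but that vote comes from C, so A ends up well above B and D, which receive none. This is the sense in which the score depends on global information computed recursively from the whole graph rather than on a local count of incoming links.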