Streaming Multiple Aggregations Using Phantoms

Rui Zhang · Nick Koudas · Beng Chin Ooi · Divesh Srivastava · Pu Zhou

Abstract Data streams characterize the high speed and large volume input of a new class of applications such as network monitoring, web content analysis and sensor networks. Among these applications, network monitoring may be the most compelling one – the backbone of a large Internet service provider can generate 1 petabyte of data per day. For many network monitoring tasks such as traffic analysis and statistics collection, aggregation is a primitive operation. Various analytical and statistical needs naturally lead to related aggregate queries. In this article, we address the problem of efficiently computing multiple aggregations over high speed data streams based on the two-level query processing architecture of GS, a real data stream management system deployed in AT&T. We discern that additionally computing and maintaining fine-granularity aggregations (called phantoms) has the benefit of supporting shared computation. Based on a thorough analysis, we propose algorithms to identify the best set of phantoms to maintain and determine allocation of resources (particularly, space) to compute the aggregations. Experiments show that our algorithm achieves near-optimal computation costs, which outperforms the best adapted algorithm by more than an order of magnitude.

Keywords Data stream · Aggregation · Multiple-query optimization · Phantom · GS

1 Introduction

The phenomenon of data streams is real. In data stream applications, data arrives very fast and the volume is so high that one may not wish to (or be able to) store all the data; yet, the need exists to query and analyze this data. The quintessential application seems to be the processing of IP traffic data in the network (see, e.g., [4,25]). Routers forward IP packets at high...