NoDB: Efficient Query Execution on Raw Data Files

Ioannis Alagiannis

Renata Borovica

Miguel Branco

Stratos Idreos‡

Anastasia Ailamaki

EPFL, Switzerland

{ioannis.alagiannis, renata.borovica, miguel.branco, anastasia.ailamaki}@epfl.ch

‡ CWI, Amsterdam

stratos.idreos@cwi.nl

ABSTRACT

As data collections grow ever larger, data loading becomes a major bottleneck. Many applications, e.g., scientific data analysis and social networks, already avoid database systems because of their complexity and the increased data-to-query time. For such applications, data collections keep growing fast, even on a daily basis, and we are already in the era of data deluge, where we have much more data than we can move or store, let alone analyze.

Our contribution in this paper is the design and roadmap of a new paradigm in database systems, called NoDB, which does not require data loading while still maintaining the whole feature set of a modern database system. In particular, we show how to make raw data files first-class citizens, fully integrated with the query engine. Through our design and the lessons learned by implementing the NoDB philosophy over a modern DBMS, we discuss the fundamental limitations as well as the strong opportunities that such a research path brings. We identify performance bottlenecks specific to in situ processing, namely the repeated parsing and tokenizing overhead and the expensive data type conversion costs. To address these problems, we introduce an adaptive indexing mechanism that maintains positional information to provide efficient access to raw data files, together with a flexible caching structure.
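To make the two ideas concrete, here is a minimal sketch of a positional map combined with a value cache over an in-memory CSV buffer. It illustrates the general technique only and is not PostgresRaw's implementation: the class `RawCsvTable`, its attributes, the integer-only columns, and the single-byte delimiter are all assumptions made for this example.

```python
class RawCsvTable:
    """Hypothetical in situ reader: no load step; state is built as queries run."""

    def __init__(self, raw: bytes, delimiter: bytes = b","):
        self.raw = raw
        self.delim = delimiter       # assumed to be a single byte
        self.row_bounds = None       # (start, end) byte offsets per row, built lazily
        self.field_pos = {}          # positional map: (row, col) -> (start, end)
        self.value_cache = {}        # cache: (row, col) -> converted value
        self.fields_tokenized = 0    # fields located by scanning the raw bytes
        self.conversions = 0         # text-to-int conversions performed

    def _ensure_rows(self):
        # Record row boundaries on first access only.
        if self.row_bounds is not None:
            return
        self.row_bounds, start = [], 0
        for line in self.raw.splitlines(keepends=True):
            end = start + len(line.rstrip(b"\r\n"))
            if end > start:
                self.row_bounds.append((start, end))
            start += len(line)

    def _locate(self, row, col):
        # Return byte offsets of a field, scanning only what is not yet mapped.
        if (row, col) in self.field_pos:
            return self.field_pos[(row, col)]
        row_start, row_end = self.row_bounds[row]
        c, pos = 0, row_start
        # Resume from the nearest already-mapped field to the left, if any.
        for k in range(col - 1, -1, -1):
            if (row, k) in self.field_pos:
                c, pos = k + 1, self.field_pos[(row, k)][1] + len(self.delim)
                break
        while True:
            if pos > row_end:
                raise IndexError(f"row {row} has no column {col}")
            nxt = self.raw.find(self.delim, pos, row_end)
            end = row_end if nxt == -1 else nxt
            self.field_pos[(row, c)] = (pos, end)
            self.fields_tokenized += 1
            if c == col:
                return pos, end
            c, pos = c + 1, end + len(self.delim)

    def column(self, col):
        # Return one column as ints, reusing cached conversions when present.
        self._ensure_rows()
        out = []
        for row in range(len(self.row_bounds)):
            key = (row, col)
            if key not in self.value_cache:
                start, end = self._locate(row, col)
                self.value_cache[key] = int(self.raw[start:end].decode())
                self.conversions += 1
            out.append(self.value_cache[key])
        return out


table = RawCsvTable(b"1,10,100\n2,20,200\n3,30,300\n")
first = table.column(1)    # tokenizes columns 0 and 1 of every row
again = table.column(1)    # served from the cache, no new parsing
third = table.column(2)    # resumes scanning after column 1 via the map
```

In this sketch, repeating a query costs no further tokenizing or conversion, and a query on a neighboring column starts from the recorded offsets instead of the beginning of each row. A real system would also need to bound the memory used by both structures, which this example does not attempt.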

Our implementation over PostgreSQL, called PostgresRaw, is able to avoid the loading cost completely, while matching the query performance of plain PostgreSQL and even outperforming it in many cases. We conclude that NoDB systems are feasible to design and implement over modern database architectures, bringing...