Performance Evaluation for Large Scale Distributed Storage Systems

Words: 7123

Pages: 29

Category: Science and Technology

Date Submitted: 10/28/2013 11:49 PM

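A Comparative Study of the Application Performance of Distributed Databases

Abstract

Databases are an indispensable component of modern IT and CT applications. With the development of Web 2.0, however, traditional RDBMSs (relational database management systems) based on SQL (Structured Query Language) face the challenges of large-volume data transfer and frequent writes. The ACID (atomicity, consistency, isolation, durability) model on which RDBMSs mainly rely is increasingly being questioned. This paper discusses the strengths and weaknesses of SQL and NoSQL (non-relational) databases and analyzes the performance of the NoSQL databases that currently attract the most attention. The experiments use the YCSB platform on an Ubuntu system to test NoSQL database performance.

In the experiments, several tests were run on the HBase distributed database, including an insert-time test, an actual-size test, workload tests, and a hit-ratio test. These tests show that HBase performs well in most of the experiments. They also show that different databases, being built on different theories, have different strengths and weaknesses. In practice, therefore, a suitable database should be chosen according to the specific business situation.

Keywords: non-relational database, distributed database, performance testing, HBase, NoSQL, YCSB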

Performance Evaluation for Large Scale Distributed Storage Systems

Abstract

Databases are an essential part of modern applications and of everyday life. With the growth of Web 2.0, traditional SQL-based relational database management systems (RDBMS) face the challenge of transferring large volumes of data and handling frequent writes. This paper compares SQL and NoSQL systems and uses the YCSB platform and its workloads on an Ubuntu system to run performance experiments on popular NoSQL systems.

In addition, the insert time, actual size, and hit ratio of HBase were analyzed. The results show that HBase performs well in most cases. However, each system has its own advantages and disadvantages, because the systems differ in their underlying theory, storage model, and strategy. It is therefore necessary to analyze the candidate systems against the actual use case before choosing one.

Key Words: HBase; NoSQL; YCSB

Contents

Chapter 1 Introduction

1.1 Background

1.2 Previous Research

1.3 Research Purpose

1.4 Methodology

1.4.1 Method

1.4.2 Research Design

1.5 Chapter

Chapter 2 Overview

2.1 ACID, BASE and CAP Theorem

2.1.1 ACID Theorem

2.1.2 BASE Theorem

2.1.3 CAP Theorem

2.1.4 Comparison of SQL and NoSQL

2.2 NoSQL Database

2.3 HBase: an important NoSQL database

2.3.1 The Features of HBase Database

2.3.2 HBase’s Architecture Overview

2.3.3 HBase’s Data Model: Column-Oriented

2.4 YCSB Benchmark...