Download - Benchmarking Datacenter and Big Data Systemsprof.ict.ac.cn/BigDataBench/old/1.0/BigDataBenchmarking...Big Data Benchmarking Workshop Acknowledgements This work is supported by the


INSTITUTE OF COMPUTING TECHNOLOGY

Benchmarking Datacenter and Big Data Systems

Wanling Gao, Zhen Jia, Lei Wang, Yuqing Zhu, Chunjie Luo, Yingjie Shi, Yongqiang He, Shiming Gong, Xiaona Li, Shujie Zhang, Bizhu Qiu, Lixin Zhang, Jianfeng Zhan

http://prof.ict.ac.cn/ICTBench


Big Data Benchmarking Workshop

Acknowledgements

This work is supported by the Chinese 973 project (Grant No. 2011CB302502), the Hi-Tech Research and Development (863) Program of China (Grant No. 2011AA01A203, No. 2013AA01A213), the NSFC project (Grant No. 60933003, No. 61202075), the BNSF project (Grant No. 4133081), and Huawei funding.


Publications

• BigDataBench: a Big Data Benchmark Suite from Web Search Engines. Wanling Gao et al. The Third Workshop on Architectures and Systems for Big Data (ASBD 2013), in conjunction with ISCA 2013.

• Characterizing Data Analysis Workloads in Data Centers. Zhen Jia et al. 2013 IEEE International Symposium on Workload Characterization (IISWC-2013).

• Characterizing OS Behavior of Scale-out Data Center Workloads. Chen Zheng et al. Seventh Annual Workshop on the Interaction amongst Virtualization, Operating Systems and Computer Architecture (WIVOSCA 2013), in conjunction with ISCA 2013.

• Characterization of Real Workloads of Web Search Engines. Huafeng Xi et al. 2011 IEEE International Symposium on Workload Characterization (IISWC-2011).

• The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems. Zhen Jia et al. Second Workshop on Big Data Benchmarking (WBDB 2012 India) & Lecture Notes in Computer Science (LNCS).

• CloudRank-D: Benchmarking and Ranking Cloud Computing Systems for Data Processing Applications. Chunjie Luo et al. Front. Comput. Sci. (FCS) 2012, 6(4): 347–362.


Content

Background and Motivation

Our ICTBench

Case studies


Question One

Gap between industry and academia
• The distance is growing longer and longer
  • Code
  • Data sets


Question Two

Different benchmark requirements

Architecture communities
• Simulation is very slow
• Small data and code sets

System communities
• Large-scale deployment is valuable.

Users
• There are three kinds of lies: lies, damn lies, and benchmarks
• Real-world applications


Data Centers in the World

Emerson December 2011 http://www.emersonnetworkpower.com/en-US/About/NewsRoom/Pages/2011DataCenterState.aspx


State-of-Practice Benchmark Suites

SPEC CPU, SPEC Web, HPCC, PARSEC, TPC-C, YCSB, GridMix


Current Benchmarks

Field        Benchmark Name
CPU          SPEC CPU
Web server   SPEC Web
CMP          PARSEC
OLTP         TPC-C
OLAP         TPC-DS
HPC          HPCC, Linpack
NoSQL        YCSB
Network      httperf
…            …


Why a New Benchmark Suite for Datacenter Computing

No benchmark suite covers the diversity of data center workloads

State of the art: CloudSuite
• Includes only 6 applications, chosen according to their popularity


Why a New Benchmark Suite (Cont’)

Memory Level Parallelism (MLP): simultaneously outstanding cache misses

[Figure: MLP of CloudSuite compared with our benchmark suite, DCBench]
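The MLP definition above can be made concrete with a small sketch. It assumes a hypothetical trace of cache-miss intervals (start cycle, end cycle) and averages the number of outstanding misses over the cycles in which at least one miss is outstanding. This averaging convention is one common choice and is an assumption here, not necessarily how the slide's measurements were taken.

```python
def average_mlp(miss_intervals):
    """Average number of simultaneously outstanding misses.

    miss_intervals: list of (start_cycle, end_cycle) half-open intervals,
    a hypothetical trace format chosen for this illustration.
    Only cycles with at least one outstanding miss are counted.
    """
    events = []
    for start, end in miss_intervals:
        events.append((start, 1))   # a miss becomes outstanding
        events.append((end, -1))    # a miss is resolved
    events.sort()  # at the same cycle, resolutions (-1) sort before new misses (+1)

    outstanding = 0
    busy_cycles = 0   # cycles with at least one outstanding miss
    miss_cycles = 0   # sum over cycles of the number of outstanding misses
    previous_cycle = None
    for cycle, delta in events:
        if previous_cycle is not None and outstanding > 0:
            span = cycle - previous_cycle
            busy_cycles += span
            miss_cycles += span * outstanding
        outstanding += delta
        previous_cycle = cycle
    return miss_cycles / busy_cycles if busy_cycles else 0.0


# Two fully overlapping misses give an MLP of 2; back-to-back misses give 1.
overlapping = average_mlp([(0, 10), (0, 10)])
serialized = average_mlp([(0, 10), (10, 20)])
```

A higher value means the workload keeps more misses in flight at once, so memory latency is better hidden.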


Why a New Benchmark Suite (Cont’)

Scale-out performance

[Figure: speedup (y-axis, 1 to 6) versus number of working nodes (x-axis: 1, 4, 8) for the CloudSuite data analysis benchmark and the DCBench workloads sort, grep, wordcount, svm, kmeans, fkmeans, all-pairs, Bayes, and HMM]
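The speedup plotted on this slide is conventionally the run time on the baseline node count divided by the run time on n nodes. A minimal sketch follows; the run times are hypothetical placeholders, not values from the figure.

```python
def speedup(runtime_by_nodes, baseline_nodes=1):
    """Speedup relative to the baseline node count: T(baseline) / T(n)."""
    baseline = runtime_by_nodes[baseline_nodes]
    return {n: baseline / t for n, t in sorted(runtime_by_nodes.items())}


def parallel_efficiency(runtime_by_nodes, baseline_nodes=1):
    """Speedup normalized by the increase in node count (1.0 is ideal scaling)."""
    return {
        n: s * baseline_nodes / n
        for n, s in speedup(runtime_by_nodes, baseline_nodes).items()
    }


# Hypothetical run times in seconds on 1, 4, and 8 working nodes.
hypothetical_runtimes = {1: 800.0, 4: 250.0, 8: 160.0}
speedups = speedup(hypothetical_runtimes)
efficiencies = parallel_efficiency(hypothetical_runtimes)
```

Workloads whose efficiency stays close to 1.0 scale out well; a falling efficiency points to a shared bottleneck such as I/O or the network.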


Content

Background and Motivation

Our ICTBench

Case studies


ICTBench Project

Benchmarking
• Foundation of research
• Bridge

ICTBench: three benchmark suites
• DCBench: architecture (application, OS, and VM execution)
• BigDataBench: system (large-scale big data applications)
• CloudRank: cloud benchmarks (distributed management)

Project homepage
• http://prof.ict.ac.cn/ICTBench


DCBench

DCBench: typical data center workloads
• Different from scientific computing: FLOPS
• Covers applications in important domains
  • Search engine, electronic commerce, etc.
• Each benchmark = a single application

Purposes
• Architecture and (small-to-medium) system research


BigDataBench

Characterizing big data applications
• Not including data-intensive supercomputing
• Synthetic data sets varying from 10 GB to PB scale
• Each benchmark = a single big application

Purposes
• Large-scale system and architecture research

An incremental approach
• Release a start-up benchmark suite
  • Workloads in the search engine system
• Other important domains


CloudRank

Cloud computing
• Elastic resource management
• Consolidating different workloads

Cloud benchmarks
• Each benchmark = a group of consolidated data center workloads
• Three benchmarks: services, data processing, desktop

Purposes
• Capacity planning, system evaluation, and research
• Users can customize their benchmarks.


Benchmarking Methodology

• Decide on and rank the main application domains according to a publicly available metric, e.g. page views and daily visitors

• Single out the main applications from the main application domains
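The first step of the methodology, ranking domains by a public metric, can be sketched as follows. The domain names and page-view counts are hypothetical placeholders; the actual metric source and values are not given in the text above.

```python
def rank_domains(page_views_by_domain, top_k=None):
    """Rank application domains by their share of a public metric.

    Returns (domain, share) pairs in descending order of share.
    """
    total = sum(page_views_by_domain.values())
    ranked = sorted(
        page_views_by_domain.items(), key=lambda item: item[1], reverse=True
    )
    shares = [(domain, views / total) for domain, views in ranked]
    return shares if top_k is None else shares[:top_k]


# Hypothetical page-view counts per application domain.
hypothetical_page_views = {"domain A": 500, "domain B": 300, "domain C": 200}
main_domains = rank_domains(hypothetical_page_views, top_k=2)
```

The second step, singling out the main applications, would then be applied only to the domains that survive the `top_k` cut.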


Top Sites on the Web

More details at http://www.alexa.com/topsites/global;0

[Pie chart: Top Sites on the Web. Slices: 40%, 25%, 15%, 5%, 15%. Legend: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others]




Algorithms in Top Sites: Search Engine

[Pie chart: Top Sites on the Web. Slices: 40%, 25%, 15%, 5%, 15%. Legend: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others]

Algorithms used in search:
• PageRank
• Graph mining
• Segmentation
• Feature reduction
• Grep
• Statistical counting
• Vector calculation
• Sort
• Recommendation
• ……


Our practice

Building a semantic search engine (Chinese): ProfSearch
• Searches for scientists or professionals
• 267,083 researchers across 260 universities and institutes
• http://prof.ict.ac.cn/


ProfSearch

Crawler Workloads
• Scrapy

Analysis Workloads
• SVM, Naïve Bayes, K-means, HMM, CRFs, LSA, LDA

Store and Management Workloads
• HDFS: storing unstructured web pages
• Hive: storing semi-structured intermediate data
• MySQL: storing structured data extracted from the web

Web Service Workloads
• Sphinx


Algorithms in Top Sites: Social Network

[Pie chart: Top Sites on the Web. Slices: 40%, 25%, 15%, 5%, 15%. Legend: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others]

Algorithms used in social networks:
• Recommendation
• Clustering
• Classification
• Graph mining
• Grep
• Feature reduction
• Statistical counting
• Vector calculation
• Sort
• ……


Algorithms in Top Sites: Electronic Commerce

[Pie chart: Top Sites on the Web. Slices: 40%, 25%, 15%, 5%, 15%. Legend: Search Engine, Social Network, Electronic Commerce, Media Streaming, Others]

Algorithms used in electronic commerce:
• Recommendation
• Association rule mining
• Warehouse operation
• Clustering
• Classification
• Statistical counting
• Vector calculation
• ……


Main Algorithms in Data Centers

Data center algorithms
• Basic operation
• Association rule mining
• Classification
• Cluster
• Recommendation
• Warehouse operation
• Feature reduction
• Graph mining
• Vector calculate
• Segmentation


Where Exactly Are Those Algorithms Used in Data Centers?

Here, let’s investigate the most used applications in data centers
• The ubiquitous search engine
• Frequently used recommendation sub-systems


Main Algorithms in Common Search Engines (Nutch)

[Diagram labels: Word Grep, Word Count, Segmentation, Sort, Classification, Decision Tree, BFS, Segmentation, Scoring & Sort, Merge Sort, Vector calculate, PageRank]


Algorithms in Search Engine

• Graph mining
• Grep & segmentation
• PageRank
• Word count
• Sort
• Vector calculation


Representative Algorithms in Search Engine

Algorithm            Role in the search engine
Graph mining         Crawl web pages
Grep                 Abstracting content from HTML
Segmentation         Word segmentation
PageRank             Compute the page rank value
Word counting        Word frequency count
Vector calculation   Document matching
Sort                 Document sorting
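To illustrate the "compute the page rank value" role, here is a minimal single-machine sketch of PageRank by power iteration. It is a textbook formulation, not the benchmark's own implementation; the damping factor, the iteration count, and the handling of dangling pages are assumptions of this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Rank held by dangling pages (no out-links) is spread evenly over all pages.
    """
    nodes = set(links) | {target for targets in links.values() for target in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        # Every page receives the teleport term plus its share of dangling rank.
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A three-page cycle: by symmetry every page ends up with the same rank.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
```

Each iteration only needs the previous ranks and the link lists, which is why the computation maps naturally onto iterative MapReduce jobs.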


Algorithms in Recommendation Sub-systems


Representative Algorithms in Recommendation Sub-systems

Algorithm                        Role in the recommendation sub-systems
Classification                   Classify web pages/user behavior
Frequent pattern growth          User log mining
Hidden Markov model              Information extraction
Clustering/similarity analysis   Clustering web pages/user behavior
Collaborative filtering          Recommendation
Feature reduction                Text representation/user behavior representation
Graph mining                     Web link analysis


Overview of DCBench

Category                  Workload                             Programming model   Language   Source
Basic operation           Sort                                 MapReduce           Java       Hadoop
                          Wordcount                            MapReduce           Java       Hadoop
                          Grep                                 MapReduce           Java       Hadoop
Classification            Naïve Bayes                          MapReduce           Java       Mahout
                          Support Vector Machine               MapReduce           Java       Implemented by ourselves
Cluster                   K-means                              MapReduce           Java       Mahout
                                                               MPI                 C++        IBM PML
                          Fuzzy k-means                        MapReduce           Java       Mahout
                                                               MPI                 C++        IBM PML
Recommendation            Item based Collaborative Filtering   MapReduce           Java       Mahout
Association rule mining   Frequent pattern growth              MapReduce           Java       Mahout
Segmentation              Hidden Markov model                  MapReduce           Java       Implemented by ourselves


Overview of DCBench (Cont’)

Category              Workload                              Programming model   Language   Source
Warehouse operation   Database operations                   MapReduce           Java       Hive-bench
Feature reduction     Principal Component Analysis          MPI                 C++        IBM PML
                      Kernel Principal Component Analysis   MPI                 C++        IBM PML
Vector calculate      Paper similarity analysis             All-Pairs           C & C++    Implemented by ourselves
Graph mining          Breadth-first search                  MPI                 C++        Graph500
                      PageRank                              MapReduce           Java       Mahout
Service               Search engine                         C/S                 Java       Nutch
                      Auction                               C/S                 Java       Rubis
Service               Media streaming                       C/S                 Java       CloudSuite


Workloads in BigDataBench 1.0 Beta

Analysis Workloads
• Simple but representative operations
  • Sort, Grep, Wordcount
• Highly recognized algorithms
  • Naïve Bayes, SVM

Search Engine Service Workloads
• Widely deployed services
  • Nutch Server
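As an illustration of the Wordcount operation listed above, the sketch below imitates the map, shuffle, and reduce phases in a single Python process. It only shows the shape of the computation; the function names are hypothetical, and it is not the benchmark's Hadoop implementation.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(documents):
    pairs = (pair for document in documents for pair in map_phase(document))
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data", "Big benchmark"])
```

In a real cluster the map calls run in parallel on separate input splits and the shuffle moves data across the network, which is where resource usage starts to differ between workloads.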


Features of Workloads

Workload       Resource characteristic   Computing complexity   Instructions
Sort           I/O bound                 O(n*lgn)               Integer comparison domination
Wordcount      CPU bound                 O(n)                   Integer comparison and calculation domination
Grep           Hybrid                    O(n)                   Integer comparison domination
Naïve Bayes    /                         O(m*n)                 Floating-point computation domination
SVM            /                         O(M*n)                 Floating-point computation domination
Nutch Server   I/O & CPU bound                                  Integer comparison domination

m: the length of the dictionary
M: the number of support vectors * dimension


A Variety of Workloads Are Included

Workloads
• Off-line
  • Base Operations
    • I/O bound: Sort
    • CPU bound: Wordcount
    • Hybrid: Grep
  • Machine Learning
    • Naïve Bayes
    • SVM
• On-line
  • Nutch Server


Methodology of Generating Big Data

To preserve the characteristics of real-world data

[Diagram: small-scale data → characteristic analysis → expand → big data. Characteristics shown: semantic; locality (temporal and spatial); word frequency; word reuse distance; word distribution in document]
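Two of the characteristics named in the diagram, word frequency and word reuse distance, can be measured on a small seed corpus as sketched below. The reuse-distance definition used here (number of distinct words between two consecutive occurrences of the same word) is an assumption borrowed from locality analysis; the text above does not spell out the exact definition.

```python
from collections import Counter


def word_frequency(words):
    """Relative frequency of each word in the token sequence."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def reuse_distances(words):
    """Reuse distance for every repeated occurrence of a word.

    Assumed definition: the number of distinct words seen between two
    consecutive occurrences of the same word. This simple version is
    quadratic in the worst case, which is fine for a small seed corpus.
    """
    last_seen = {}
    distances = []
    for index, word in enumerate(words):
        if word in last_seen:
            between = words[last_seen[word] + 1:index]
            distances.append(len(set(between)))
        last_seen[word] = index
    return distances


tokens = "a b c a b b".split()
frequencies = word_frequency(tokens)
distances = reuse_distances(tokens)
```

A generator that expands the small data set would then be checked against these statistics, so that the synthetic big data keeps the same frequency and locality profile.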


Content

Background and Motivation

Our ICTBench

Case studies


Use Case 1: Microarchitecture Characterization

Using DCBench

Five-node cluster
• One master and four slaves (working nodes)

Each node:


Instructions Execution Level

DCBench:
• Data analysis workloads have more application-level instructions
• Service workloads have higher percentages of kernel-level instructions

[Figure: share of kernel-level and application-level instructions (0% to 100%) per workload. Workloads on the x-axis: Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM, avg, Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPECFP, SPECINT, SPECWeb, HPCC-COMM, HPCC-DGEMM, HPCC-FFT, HPCC-HPL, HPCC-PTRANS, HPCC-RandomAccess, HPCC-STREAM. Group labels: data analysis, service]


Architecture Block Diagram


Pipeline Stall

• DC workloads have severe front-end stalls (i.e. instruction fetch stalls)
• Services: more RAT (Register Allocation Table) stalls
• Data analysis: more RS (Reservation Station) full and ROB (ReOrder Buffer) full stalls

[Figure: breakdown of pipeline stalls (0% to 100%) into instruction fetch stall, RAT stall, load stall, RS full stall, store stall, and ROB full stall]


Front End Stall Reasons

• For DC workloads, high instruction cache miss and instruction TLB miss rates make the front end inefficient

[Figures: L1 I-cache misses per K-instruction (axis 0 to 100); ITLB page walks per K-instruction (axis 0 to 0.35)]
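The per-K-instruction metrics used on this and the following slides normalize an event count by the number of retired instructions, so that workloads of different lengths can be compared. A minimal sketch, with hypothetical counter values:

```python
def per_kilo_instruction(event_count, instruction_count):
    """Events per 1000 retired instructions (for example L1 I-cache misses)."""
    if instruction_count <= 0:
        raise ValueError("instruction_count must be positive")
    return 1000.0 * event_count / instruction_count


# Hypothetical counter readings: 25,000 L1 I-cache misses
# over 1,000,000 retired instructions.
l1i_mpki = per_kilo_instruction(25_000, 1_000_000)
```

The same helper applies to ITLB page walks, L2 cache misses, and DTLB page walks; only the event counter changes.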


MLC Behaviors

• DC workloads have more MLC misses than HPC workloads
• Data analysis workloads have better locality (fewer L2 cache misses)

[Figure: L2 cache misses per K-instruction (axis 0 to 100) for data analysis, service, and HPCC workloads]


LLC Behaviors

• The LLC is good enough for DC workloads
• Most L2 cache misses can be satisfied by the LLC

[Figure: ratio of L2 cache misses satisfied by the L3 cache (0% to 100%) per workload. Workloads on the x-axis: Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM, avg, Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPECFP, SPECINT, SPECWeb, HPCC-COMM, HPCC-DGEMM, HPCC-FFT, HPCC-HPL, HPCC-PTRANS, HPCC-RandomAccess, HPCC-STREAM]
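The ratio plotted here can be derived from two counters, under the simplifying assumption that every L2 miss is looked up in the L3 cache and either hits or misses there (prefetch traffic is ignored). The counter values below are hypothetical.

```python
def llc_satisfied_ratio(l2_misses, l3_misses):
    """Fraction of L2 cache misses that hit in the L3 (last-level) cache.

    Assumes each L2 miss results in exactly one L3 lookup.
    """
    if l2_misses <= 0:
        raise ValueError("l2_misses must be positive")
    if not 0 <= l3_misses <= l2_misses:
        raise ValueError("l3_misses must be between 0 and l2_misses")
    return (l2_misses - l3_misses) / l2_misses


# Hypothetical counters: 1000 L2 misses, of which 100 also miss in L3.
ratio = llc_satisfied_ratio(1000, 100)
```

A ratio close to 1 means few requests go all the way to main memory, which is the sense in which the LLC is "good enough".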


DTLB Behaviors

• DC workloads have more DTLB misses than HPC workloads
• Most data analysis workloads have fewer DTLB misses

[Figure: page walks per K-instruction (axis 0 to 2.5) per workload, grouped into data analysis, service, and HPCC. Workloads on the x-axis: Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM, avg, Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPECFP, SPECINT, SPECWeb, HPCC-COMM, HPCC-DGEMM, HPCC-FFT, HPCC-HPL, HPCC-PTRANS, HPCC-RandomAccess, HPCC-STREAM]


Branch Prediction

DC:
• Data analysis workloads have pretty good branch behaviors
• Service workloads’ branches are hard to predict

[Figure: branch misprediction ratio (axis 0.00% to 8.00%) per workload, grouped into data analysis, service, and HPCC. Workloads on the x-axis: Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM, avg, Software Testing, Media Streaming, Data Serving, Web Search, Web Serving, SPECFP, SPECINT, SPECWeb, HPCC-COMM, HPCC-DGEMM, HPCC-FFT, HPCC-HPL, HPCC-PTRANS, HPCC-RandomAccess, HPCC-STREAM]


DC Workloads Characteristics

Data analysis workloads behave differently from service workloads
• Instruction execution level: service workloads have more kernel-level instructions
• Cache behaviors: data analysis workloads have better locality
• Branch prediction: service workloads are hard to predict

Front-end inefficiency
• ITLB misses
• L1 I-cache misses

Diverse workloads are needed
• Different workloads have different characteristics
• No one-size-fits-all solution


Use Case 2: System Evaluation

Using BigDataBench 1.0 Beta

Data scale
• 10 GB – 2 TB

Hadoop configuration
• 1 master, 14 slave nodes


System Evaluation

A threshold for each workload
• 100 MB ~ 1 TB
• The system is fully loaded when the data volume exceeds the threshold

Sort is an exception
• An inflection point (10 GB ~ 1 TB)
• The data processing rate decreases after this point
• Global data access requirements
  • I/O and network bottleneck

System performance depends on applications and data volumes.
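The data processing rate discussed above can be taken as data volume divided by run time, and the inflection point as the volume at which that rate peaks. The sketch below uses hypothetical measurements; none of the numbers come from the experiments in this talk.

```python
def processing_rates(measurements):
    """Data processing rate (GB/s) for each measured data volume.

    measurements: list of (data_volume_gb, runtime_seconds) pairs.
    Returns (data_volume_gb, rate) pairs sorted by volume.
    """
    return [(volume, volume / runtime) for volume, runtime in sorted(measurements)]


def peak_rate_volume(measurements):
    """Data volume at which the processing rate is highest."""
    rates = processing_rates(measurements)
    return max(rates, key=lambda pair: pair[1])[0]


# Hypothetical (volume in GB, run time in seconds) measurements for one workload.
hypothetical_runs = [(10, 100.0), (100, 500.0), (1000, 10000.0)]
inflection_volume = peak_rate_volume(hypothetical_runs)
```

For a workload that saturates rather than degrades, the rate would flatten instead of falling, and the threshold would be the volume where it stops rising.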


Use Case 3: Architecture Research

Using BigDataBench 1.0 Beta

Data scale
• 10 GB – 2 TB

Hadoop configuration
• 1 master, 14 slave nodes


Use Case 3: Architecture Research

• Some micro-architectural events tend towards stability once the data volume increases to a certain extent

• Cache and TLB behaviors show different trends with increasing data volumes for different workloads
  • L1I_miss/1000ins: increases for Sort, decreases for Grep
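One way to make "tending towards stability" testable is to look for the smallest data volume after which a metric changes by no more than a chosen relative tolerance between consecutive measurements. The 5% tolerance and the sample values below are assumptions for illustration only.

```python
def stabilization_point(series, tolerance=0.05):
    """Smallest data volume after which the metric stays stable.

    series: list of (data_volume, metric_value) pairs.
    A metric counts as stable from a point onward when every later
    consecutive change is within `tolerance` relative to the earlier value.
    Returns None if the metric never stabilizes within the series.
    """
    series = sorted(series)
    for i in range(len(series) - 1):
        tail = series[i:]
        if all(
            abs(later[1] - earlier[1]) <= tolerance * abs(earlier[1])
            for earlier, later in zip(tail, tail[1:])
        ):
            return series[i][0]
    return None


# Hypothetical L1I misses per 1000 instructions at increasing data volumes (GB).
hypothetical_series = [(10, 20.0), (100, 12.0), (500, 10.2), (1000, 10.0), (2000, 10.1)]
stable_from = stabilization_point(hypothetical_series)
```

If such a point exists, architecture experiments can use a data set of about that size instead of the full volume.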


Search Engine Service Experiments

The same phenomenon is observed
• Micro-architectural events tend towards stability once the index size increases to a certain extent

Big data imposes challenges on architecture research, since large-scale simulation is time-consuming

Index size: 2 GB ~ 8 GB
Segment size: 4.4 GB ~ 17.6 GB


Conclusion

ICTBench
• DCBench
• BigDataBench
• CloudRank

An open-source project on datacenter and big data benchmarking
• http://prof.ict.ac.cn/ICTBench

Downloads are welcome


Thank you! Any questions?
