

  • Institute of Computing Technology

    Benchmarking Datacenter and Big Data Systems

    Wanling Gao, Zhen Jia, Lei Wang, Yuqing Zhu, Chunjie Luo, Yingjie Shi, Yongqiang He, Shiming Gong, Xiaona Li, Shujie Zhang, Bizhu Qiu, Lixin Zhang, Jianfeng Zhan

    http://prof.ict.ac.cn/ICTBench


  • Big Data Benchmarking Workshop

    Acknowledgements

    This work is supported by the Chinese 973 project (Grant No. 2011CB302502), the Hi-Tech Research and Development (863) Program of China (Grant No. 2011AA01A203, No. 2013AA01A213), the NSFC project (Grant No. 60933003, No. 61202075), the BNSF project (Grant No. 4133081), and Huawei funding.

  • Publications

    • BigDataBench: a Big Data Benchmark Suite from Web Search Engines. Wanling Gao et al. The Third Workshop on Architectures and Systems for Big Data (ASBD 2013), in conjunction with ISCA 2013.

    • Characterizing Data Analysis Workloads in Data Centers. Zhen Jia et al. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013).

    • Characterizing OS Behavior of Scale-out Data Center Workloads. Chen Zheng et al. Seventh Annual Workshop on the Interaction amongst Virtualization, Operating Systems and Computer Architecture (WIVOSCA 2013), in conjunction with ISCA 2013.

    • Characterization of Real Workloads of Web Search Engines. Huafeng Xi et al. 2011 IEEE International Symposium on Workload Characterization (IISWC 2011).

    • The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems. Zhen Jia et al. Second Workshop on Big Data Benchmarking (WBDB 2012 India) and Lecture Notes in Computer Science (LNCS).

    • CloudRank-D: Benchmarking and Ranking Cloud Computing Systems for Data Processing Applications. Chunjie Luo et al. Front. Comput. Sci. (FCS) 2012, 6(4): 347–362.

  • Content

    • Background and Motivation
    • Our ICTBench
    • Case studies

  • Question One

    • Gap between industry and academia
      • The distance grows longer and longer
        • Code
        • Data sets

  • Question Two

    • Different benchmark requirements
      • Architecture communities
        • Simulation is very slow
        • Small data and code sets
      • System communities
        • Large-scale deployment is valuable
      • Users
        • "There are three kinds of lies: lies, damn lies, and benchmarks"
        • Real-world applications

  • Data Centers in the World

    Source: Emerson, December 2011, http://www.emersonnetworkpower.com/en-US/About/NewsRoom/Pages/2011DataCenterState.aspx

  • State-of-Practice Benchmark Suites

    SPEC CPU, SPEC Web, HPCC, PARSEC, TPC-C, YCSB, Gridmix

  • Current Benchmarks

    Field        Benchmark Name
    CPU          SPEC CPU
    Web server   SPEC Web
    CMP          PARSEC
    OLTP         TPC-C
    OLAP         TPC-DS
    HPC          HPCC, Linpack
    NoSQL        YCSB
    Network      httperf
    …            …

  • Why a New Benchmark Suite for Datacenter Computing

    • No benchmark suite covers the diversity of data center workloads
    • State of the art: CloudSuite
      • Includes only 6 applications, chosen according to their popularity

  • Why a New Benchmark Suite (Cont'd)

    • Memory Level Parallelism (MLP): the number of simultaneously outstanding cache misses

    [Figure: MLP of CloudSuite and of our benchmark suite, DCBench]
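As a rough illustration of the MLP definition above, the metric can be estimated as the average number of cache misses outstanding over the cycles in which at least one miss is outstanding. The sketch below assumes a hypothetical trace of (issue_cycle, complete_cycle) intervals; the function name and trace format are illustrative only and are not taken from DCBench or CloudSuite.

```python
from typing import List, Tuple


def average_mlp(misses: List[Tuple[int, int]]) -> float:
    """Average number of outstanding misses over cycles with at least one miss.

    Each miss is a half-open interval [issue_cycle, complete_cycle).
    The trace format is a hypothetical one chosen for this sketch.
    """
    events = []
    for start, end in misses:
        if end <= start:
            raise ValueError("complete_cycle must be after issue_cycle")
        events.append((start, 1))   # a miss becomes outstanding
        events.append((end, -1))    # a miss completes
    events.sort()  # at equal cycles, completions (-1) sort before issues (+1)

    outstanding = 0   # misses in flight right now
    busy_cycles = 0   # cycles with at least one miss in flight
    miss_cycles = 0   # sum over busy cycles of the in-flight count
    prev = None
    for cycle, delta in events:
        if prev is not None and outstanding > 0:
            span = cycle - prev
            busy_cycles += span
            miss_cycles += span * outstanding
        outstanding += delta
        prev = cycle
    return miss_cycles / busy_cycles if busy_cycles else 0.0
```

Two fully overlapping misses give an MLP of 2, two back-to-back misses give 1, and two misses that overlap for half of their latency, such as `average_mlp([(0, 100), (50, 150)])`, give 200/150, about 1.33.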

  • Why a New Benchmark Suite (Cont'd)

    • Scale-out performance

    [Figure: speedup (y-axis, 1 to 6) versus number of working nodes (1, 4, 8) for sort, grep, wordcount, svm, kmeans, fkmeans, all-pairs, Bayes, and HMM; labels: CloudSuite data analysis benchmark, DCBench]
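The slide does not state how speedup is computed. A minimal sketch, assuming the usual definition (single-node run time divided by n-node run time), is given below. The run times are hypothetical placeholders, not measurements from the talk, and the helper names are illustrative.

```python
from typing import Dict


def speedup(t_single: float, t_nodes: float) -> float:
    """Speedup of an n-node run relative to the single-node run."""
    if t_single <= 0 or t_nodes <= 0:
        raise ValueError("run times must be positive")
    return t_single / t_nodes


def scaling_table(run_times: Dict[int, float]) -> Dict[int, Dict[str, float]]:
    """Map node count to speedup and parallel efficiency (speedup / nodes)."""
    if 1 not in run_times:
        raise ValueError("a single-node run time is required as the baseline")
    base = run_times[1]
    table = {}
    for nodes, t in sorted(run_times.items()):
        s = speedup(base, t)
        table[nodes] = {"speedup": s, "efficiency": s / nodes}
    return table


# Hypothetical run times in seconds for one workload on 1, 4, and 8 working nodes.
example_run_times = {1: 800.0, 4: 250.0, 8: 160.0}

if __name__ == "__main__":
    for nodes, row in scaling_table(example_run_times).items():
        print(f"{nodes} nodes: speedup {row['speedup']:.2f}, "
              f"efficiency {row['efficiency']:.2f}")
```

Reporting efficiency next to speedup makes it easier to see which workloads stop scaling as working nodes are added.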

  • Content

    • Background and Motivation
    • Our ICTBench
    • Case studies

  • ICTBench Project

    • Benchmarking
      • Foundation of research
      • Bridge
    • ICTBench: three benchmark suites
      • DCBench: architecture (application, OS, and VM execution)
      • BigDataBench: system (large-scale big data applications)
      • CloudRank: cloud benchmarks (distributed management)
    • Project homepage
      • http://prof.ict.ac.cn/ICTBench

  • DCBench

    • DCBench: typical data center workloads
      • Different from scientific computing: FLOPS
      • Covers applications in important domains
        • Search engine, electronic commerce, etc.
      • Each benchmark = a single application
    • Purposes
      • Architecture research
      • System (small-to-medium scale) research

  • BigDataBench

    • Characterizing big data applications
      • Not including data-intensive supercomputing
      • Synthetic data sets varying from 10 GB to PB scale
      • Each benchmark = a single big application
    • Purposes
      • Large-scale system and architecture research
    • An incremental approach
      • Release a start-up benchmark suite
        • Workloads in the search engine system
      • Other important domains

  • CloudRank

    • Cloud computing
      • Elastic resource management
      • Consolidating different workloads
    • Cloud benchmarks
      • Each benchmark = a group of consolidated data center workloads
      • Three benchmarks: services, data processing, desktop
    • Purposes
      • Capacity planning, system evaluation, and research
      • Users can customize their benchmarks

  • Benchmarking Methodology

    • Decide on and rank the main application domains according to a publicly available metric
      • e.g., page views and daily visitors
    • Single out the main applications from the main application domains

  • Top Sites on the Web

    More details at http://www.alexa.com/topsites/global;0

    [Pie chart: Top Sites on the Web]

    Search Engine          40%
    Social Network         25%
    Electronic Commerce    15%
    Media Streaming         5%
    Others                 15%
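The first methodology step, ranking application domains by a public metric, can be sketched with the shares from the pie chart above. Pairing each percentage with a legend entry in the order they appear is an assumption about the extracted slide text, and the names `DOMAIN_SHARE`, `rank_domains`, and `main_domains` are hypothetical helpers for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Shares (percent) read from the pie chart; percentages are paired with
# legend entries in order of appearance, which is an assumption.
DOMAIN_SHARE: Dict[str, int] = {
    "Search Engine": 40,
    "Social Network": 25,
    "Electronic Commerce": 15,
    "Media Streaming": 5,
    "Others": 15,
}


def rank_domains(shares: Dict[str, int],
                 exclude: Sequence[str] = ("Others",)) -> List[Tuple[str, int]]:
    """Rank named domains by share, largest first; catch-all buckets are skipped."""
    named = [(name, share) for name, share in shares.items() if name not in exclude]
    return sorted(named, key=lambda item: item[1], reverse=True)


def main_domains(shares: Dict[str, int], k: int = 3) -> List[str]:
    """Return the k highest-ranked named domains."""
    return [name for name, _ in rank_domains(shares)[:k]]


if __name__ == "__main__":
    for name, share in rank_domains(DOMAIN_SHARE):
        print(f"{name}: {share}%")
```

With these shares, the top three named domains are search engine, social network, and electronic commerce. The second methodology step would then single out representative applications within each of them.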

  • Benchmarking Methodology

    • Decide on and rank the main application domains according to a publicly available metric
      • e.g., page views and daily visitors
    • Single out the main applications from the main application domains

  • Algorithms in Top Sites: Search Engine

    [Pie chart: Top Sites on the Web. Search Engine 40%, Social Network 25%, Electronic Commerce 15%, Media Streaming 5%, Others 15%]

    Algorithms used in search:
      • PageRank
      • Graph mining
      • Segmentation
      • Feature reduction
      • Grep
      • Statistical counting
      • Vector calculation
      • Sort
      • Recommendation
      • ……

  • Our practice

    • Building a semantic search engine (Chinese)
      • ProfSearch
        • Searches for scientists and professionals
        • 267083 researchers across 260 universities and institutes
        • http://prof.ict.ac.cn/

  • ProfSearch

    • Scrapy

    Crawler Workloads

    • SVM, Naïve Bayes, K-means, HMM, CRFs, LSA, LDA

    Analysis Workloads

    • HDFS – Storing unstructured web pages • HIVE – Storing semi-structured intermediate d