Download - HTAPBench: Hybrid Transactional and Analytical …...HTAPBench: Hybrid Transactional and Analytical Processing Benchmark Fábio Coelho INESC TEC & U.Minho Braga, Portugal [email protected]

HTAPBench: Hybrid Transactional and AnalyticalProcessing Benchmark

Fábio Coelho∗

INESC TEC & U.MinhoBraga, Portugal

[email protected]

João Paulo†


[email protected]

Ricardo VilaçaLeanXcale SLMadrid, Spain

[email protected]é Pereira


[email protected]

Rui OliveiraINESC TEC & U.Minho

Braga, [email protected]

ABSTRACTThe increasing demand for real-time analytics requires thefusion of Transactional (OLTP) and Analytical (OLAP) sys-tems, eschewing ETL processes and introducing a plethoraof proposals for the so-called Hybrid Analytical and Trans-actional Processing (HTAP) systems.

Unfortunately, current benchmarking approaches are notable to comprehensively produce a unified metric from theassessment of an HTAP system. The evaluation of bothengine types is done separately, leading to the use of disjointsets of benchmarks such as TPC-C or TPC-H.

In this paper we propose a new benchmark, HTAPBench,providing a unified metric for HTAP systems geared towardthe execution of constantly increasing OLAP requests lim-ited by an admissible impact on OLTP performance. Toachieve this, a load balancer within HTAPBench regulatesthe coexistence of OLTP and OLAP workloads, proposinga method for the generation of both new data and requests,so that OLAP requests over freshly modified data are com-parable across runs.

We demonstrate the merit of our approach by validatingit with different types of systems: OLTP, OLAP and HTAP;showing that the benchmark is able to highlight the differ-ences between them, while producing queries with compara-ble complexity across experiments with negligible variability.

CCS Concepts•Information systems→Database performance eval-uation; Relational parallel and distributed DBMSs;

∗Funded by source 1 (Acknowledgments).

†Funded by source 2 (Acknowledgments).

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full cita-tion on the first page. Copyrights for components of this work owned by others thanACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-publish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected].

ICPE’17, April 22-26, 2017, L’Aquila, Italyc© 2017 ACM. ISBN 978-1-4503-4404-3/17/04. . . $15.00

DOI: http://dx.doi.org/10.1145/3030207.3030228

KeywordsBenchmarking; OLTP; OLAP; HTAP

1. INTRODUCTIONWe are undergoing a change in database system design.

Up until now, there has clearly been two main categoriesinto which systems could be classified: operational or trans-actional systems (OLTP) and data warehousing or analyti-cal systems (OLAP). For a long time, each system type livedapart from each other, as each had very distinct goals. Onthe one hand, transactional systems focus their operationon providing ways to enable a high throughput of rathersmall-sized transactions. Typically, during execution, eachtransaction operates under a very limited space of tuples.OLTP systems use a data schema based on reduced levelsof redundancy, resulting from normalization techniques [26]across entities. On the other hand, analytical systems fo-cus on performing aggregations over large sets of data. Thequery centric nature of an OLAP data schema introduceshigh levels of redundancy in the shape of materialized viewsor precomputed aggregates [19].

As pointed out by Gartner [28], a new class of engines, ca-pable of handling mixed workloads with high levels of trans-actional activity and, at the same time, providing scalablebusiness analytics directly over production data is arising.Such systems bypass the need to use the ETL [18] processand are commonly referred to as HTAP - Hybrid Analyt-ical and Transactional Processing systems. Traditionally,both workloads are handled by separate engines, periodi-cally feeding the OLAP with data from the OLTP enginethrough an ETL process. The approach seeks to ensure thebest performance of each individual engine at the expense ofdata freshness for analytics. By eschewing the ETL process,HTAP systems are poised to reduce implementation, man-agement and storage costs and, most importantly, enablereal-time analytics over production data. Several vendorssuch as Oracle, SAP or PostgreSQL started to fill in thisgap and proposed solutions that coined variant terminolo-gies such as OLTAP [21](Online Transactional AnalyticalProcessing) or OLxP [16, 22] (Online Transctional or Ana-lytical Processing) that also aim to mix both workload types.Other companies have started to offer Hybrid solutions suchas NuoDB [24], VoltDB [27] or Splice Machine [20].

293

Given the traditional taxonomy of OLTP and OLAP sys-tems, the industry along with independent organizations de-fined benchmarking approaches specially tailored for eithertransactional or analytical workloads, such as TPC-C [8] andTPC-E [11] for OLTP workloads and TPC-H [9] or TPC-DS [23] for analytical workloads. Each benchmark focuseson the optimization challenges associated with each systemtype, defining evaluation suites with very different and con-tradicting goals. This is so as optimizing an OLTP targetedoperation would intrinsically degrade OLAP performanceand vice-versa [14]. For instance, OLTP and OLAP work-loads generate specific sets of queries that require distinctstorage layouts in order to be efficient. They can also ac-commodate different-sized datasets, employing the conceptof warehouse as scaling factor. Optimizing a storage layoutto support both access patterns efficiently is not a trivialtask but it must be accomplished to have efficient hybrid sys-tems. Moreover, as further discussed in the paper, storageaccesses generated by OLTP workloads are mostly randomwhile OLAP workloads are mainly sequential. Likewise, ahybrid workload will assess the ability of the system undertest (SUT) to simultaneously schedule random and sequen-tial access patterns to storage mediums and manage bothlight and intense operations regarding memory allocationand processor time; these are some of the reasons why theseworkloads have until now been evaluated independently.

Most importantly, it is not easy to combine and directlytranslate results from distinct benchmarks to the effective-ness of a system to handle an HTAP workload. Gartnerstates that an HTAP system should prioritize a sustainedtransactional throughput, delivering at the same time scal-able analytical processing without disrupting the operationalactivity [28]. Consequently, even if both workloads can berun on the same engine, it is not straightforward to mean-ingfully and consistently reconcile the results of both work-loads in a single HTAP metric. This is so as each workloadis usually oblivious to the presence of the other, trying toindependently reach the maximum qualified throughput ineach separate workload, and therefore producing uncorre-lated metrics.

In this paper, we present HTAPBench, a new benchmarksuite designed to evaluate hybrid systems with mixed OLTPand OLAP workloads. HTAPBench introduces a new met-ric that provides a reading of the analytical capability as thesystem scales. The proposed benchmark introduces a hybridworkload that simultaneously exercises a workload with op-erational and analytical activity over the same system. Itintroduces a Client Balancer that controls how analyticalclients are launched, ensuring that the OLTP activity stayswithin a configured threshold and the results are kept com-parable across runs by addressing data uniformity of theworkload. Throughout the paper we use database when re-ferring to the stored data, engine when referring to the soft-ware and SUT or system when referring to the compositionof software and underlying hardware.

Contributions: First, we propose a new unified metricaimed at engines with mixed workloads. Second, we presentthe design and implementation of HTAPBench supportingsuch a metric. We introduce a technique for generating re-alistic OLAP queries over a dynamically changing datasetresulting from concurrent OLTP, with predictable complex-ity. In addition, we propose a load balancer that relies on afeedback control mechanism that regulates the coexistenceof the OLTP and OLAP workload execution.

Roadmap: The remainder of this paper is organized asfollows: Section 2 presents an overview and the unified met-ric we propose. Section 3 evaluates our proposal againstdifferent systems, and Section 4 validates the results as wellas discusses the properties of the system. Section 5 goesthrough related work and Section 6 concludes this work.

2. HTAPBenchThe Hybrid Transactional and Analytical Processing Bench-

mark is targeted at assessing engines capable of deliver-ing mixed workloads composed of OLTP transactions andOLAP business queries without resorting to ETL. Typically,in environments with mixed workloads, the relative weightgiven to OLTP and OLAP is governed by delivering a highOLTP throughput while still being able to simultaneouslyperform analytical business queries [28]. This goal should bemet in such a way that the OLTP throughput is kept withinexpected intervals. Likewise, HTAPBench focuses its oper-ation on ensuring a stable OLTP throughput and assessingthe capability of the SUT to cope with an increasing demandon the OLAP counterpart. To fulfill OLTP requirements,HTAPBench requires the testing engine to grant full ACIDcapabilities during all stages of the benchmark. The designis composed of several modules as depicted in Figure 1. TheDensity Consultant, the Client Balancer and the DynamicQuery-H Generator modules provide the foundation of ourapproach and are discussed in this Section.

Figure 1: HTAPBench architecture.

In a nutshell, HTAPBench decomposes the execution intothree main stages: (i) the populate stage, (ii) the warmupstage and (iii) the execution stage. Two of the modules,which we define as agents, will regulate the OLTP and OLAPactivity. During system start, HTAPBench will be config-ured with a target OLTP throughput, triggering an OLTPworkload configured with the required number of clients tomeet the required throughput. Periodically, HTAPBench

Δt

time

OLAP worker

Execution Warmup

OLT

PO

LAP

OLAP worker

OLTP client

OLAP worker

OLTP client

OLTP client

Populate

OLTP Client

OLTP Client

OLTP Client

Figure 2: HTAPBench execution.

294

will assess the ability of the SUT to handle an increasingOLAP activity, as depicted in Figure 2, while ensuring thatthe transactional throughput does not decrease below a con-figured threshold.

The unified metric is central to our design, and its gene-sis is directly translated from the need of HTAP systems toscale without disturbing the OLTP activity. It mirrors theability of a given analytical worker to complete queries, ina scenario composed of an increasing number of analyticalworkers and a stable transactional activity. Analyzing thisbehavior will enable us to identify situations where addingan additional OLAP worker degrades the OLTP/OLAP en-gine performance.

2.1 WorkloadThe mixed workload used in this benchmark is composed

of a transactional agent and an analytical agent that simul-taneously instruct the system to perform operations over thesame dataset. We selected TPC-C and TPC-H as agents, aseach is able to stress the inherent characteristics of eachworkload type. TPC-C was chosen as the transactionalclient due to its high rate of read-write operations, being oneof the most used workloads for OLTP evaluation. TPC-Cspecification models a real-world scenario where a company,comprised of several warehouses and districts, processes or-ders placed by clients. The workload scales according to thenumber of configured warehouses.

TPC-H also specifies a real-world scenario, modeling awholesale supplier and employing a schema that is veryclose in structure to TPC-C. Moreover, we selected TPC-H over TPC-DS [10] since the workload in TPC-DS is datawarehouse-driven, not only relying on a star schema but alsorequiring the use of ETL processes to keep data updated andin conformity with such a schema. On the basis that analyt-ical queries in a hybrid workload should exercise a datasetcommon to the operational workload, TPC-H better fits therequirement as it does not use a star schema, placing it closerto the workload schema in TPC-C.

The mixed workload in HTAPBench uses all the entitiesin TPC-C and TPC-H’s Nation, Region and Supplier, asproposed in [5]. The remaining TPC-H entities were mergedinto a non-intrusive way in TPC-C’s workload. The resultis a workload that matches Gartner’s recommendations forhybrid workloads, where data should not be moved from op-erational to data warehouses in order to support analytics,but live under the same schema allowing drill-down analyt-ical operations to point toward the freshest data producedby the operational activity.

The OLTP execution in HTAPBench runs according toa target number of transactions per second (tps). It isthus necessary to ensure the optimal configuration regardingsome TPC-C specific parameters such as the total number ofwarehouses and clients defining the number of transactionsper second (tpmC).

target(tpmC) = target(tps)× 60× %NewOrder

100(1)

#clients =target(tmpC)

1.286(2)

#warehouses =#clients

10(3)

To compute these parameters, we refer to the TPC-C specifi-

cation [8] and use the characterization for the TPC-C’s idealclient, considering the minimum think time for each trans-action type, and provided that transactions do not fail andthus no rollback operation is required. According to TPC-C,a single client should not be able to execute more than 1.286tpmC. Under these conditions, it is possible to extract thetarget tpmC from the target tps (expression 1), as well asthe total number of clients (expression 2) and warehouses(expression 3). The required target tps is one of the config-urable criteria in HTAPBench and directly relates with theexpected scalability of the system and respective databasesize (further details are provided in Section 2.4).

The business queries in TPC-H are built from filtering,aggregation and grouping operations over a given result set.Filtering operations use SQL operators such as where, hav-ing or between. Their main goal is to limit the number ofconsidered rows. Since the transactional activity will feedthe analytical queries in HTAPBench, the number of rowsfiltered by the latter will grow over time. If not addressed,the results of these analytical queries are poised to becomeincomparable across runs. Data distributions regulate howthe parameters for filtering operators are selected, enablingthe queries to dynamically exercise several regions of thedataset while exhibiting comparable complexity. On theother hand, if the queries are not dynamically generated,the use of fixed bounds on the filters would end up travers-ing the full domains, preventing the query planner of theSUT to be exercised.

To verify the impact of using fixed or dynamic parameters,we conducted an experiment where we considered the exe-cution of the 22 TPC-H queries over 2 setups. The first useda set of fixed parameters that would resemble full domainsearches; the second used dynamically generated parameterswithout being bound to a data distribution. The configu-ration of dynamically generated parameters created a newset for each run, while the fixed configuration reuses thesame set across runs. Each setup considers the average of5 independent executions. Queries are computed against acolumn-oriented engine. In each run, the database is popu-lated with one warehouse and the queries are executed with-out any of the filtering operators in their composition, estab-lishing a baseline comparison that represents the universe ofrows in each query.

The experiment observes the average difference of resultset row count in consecutive executions of each run for agiven setup, as a percentage of the baseline result. Whenusing fixed parametrization, the result set cardinality did notchange across consecutive runs, and in most cases, queriesended up selecting a considerably broader space of tuples.When we used dynamically generated filters in the TPC-Hqueries, a variation of up to 77% in result set cardinality wasobserved in comparison with the previous approach. This isdue to not using a distribution to feed the date fields duringthe population stage of the benchmark. By not using adistribution to regulate how these fields are generated, itbecomes likely that the items inserted during the populatestage of the benchmark present uneven time distributionswhen compared with the ones created during the executionstage. The next Section introduces a way to generate aworkload distribution that ensures analytical queries withcomparable complexity across runs.

295

2.2 Result Set HomogeneityThe analytical queries composing a hybrid workload are

fed with data created or manipulated by the transactionalagent, either during the initial populate stage, or during thetransactional execution part of the hybrid workload. Theengine qualifying as HTAP must operate under an isolationcriterion that enables the analytical queries to observe datacommitted by the transactional agent at the time the an-alytical queries started. Likewise, a given analytical queryshould freely access the entire dataset, spanning from thefirst to the last committed transaction.

Figure 3: Timestamp density difference.

As discussed in the previous Section, the absence of aregulating mechanism would result in the use of randomizedquery boundaries, producing incomparable results across runs.The same may happen when analytical queries observe datagenerated in the populate and execution stages of the hy-brid workload. Figure 3 depicts an example of the patternsthat data is created or changed by the transactional agent.On the one hand, the OLTP populate stage (Figure 3a) pro-motes bursts of transactions inserting data, causing a highconcentration of timestamps in a short time period. On theother hand, during the OTLP execution stage, the OTLPtransaction rate is regulated by TPC-C.

What is desirable is that the pattern generated by theOLTP execution within TPC-C is also observed by analyt-ical queries whenever they traverse the data loaded duringthe populate stage. To mitigate this issue, we introduce adensity extraction mechanism that ensures the same datapattern across stages. Briefly stated, our approach observesthe amount of generated date fields during the executionstage of TPC-C, allowing the system to apply the extracteddensity during population.

Density FunctionThe populate and execution stages of TPC-C generate dif-ferent date densities across the whole dataset, varying ac-cording to the configured transaction mix within TPC-C.Moreover, as not all transaction types generate the samenumber of new timestamps, we configured our density func-tion to reflect that behavior.

txnMix =%NewOrder + %Payment + 10×%Delivery

100(4)

d(TS/s) = tps× txnMix (5)

Both the New Order and Payment transactions generate onetimestamp each, while each Delivery transaction generatesten. The Order Status and the Stock Level transactionsdo not generate any timestamps. It is then possible to ex-press density as a function between the target number oftransactions per second and the ratio of New Order, Pay-

ment and Delivery transactions, as defined by expression 5.

In the following, we set up an experiment that allowed us toobserve the expected density.

This experiment was conducted on a server with an IntelXeon x3220 2.4 GHz QuadCore, 8GB of memory and 128GBSolid State Drive. For the purpose of this test, we relied ona Hybrid system, which we set up according to Table 1 toreflect workloads with more than 70GB in total size. Ineach experiment, the database was dropped and populated.Afterwards, we ran TPC-C under the standard transactionmix in runs that lasted 60 minutes.

tpmC clients warehouses635 495 49741 576 58886 689 69

Table 1: Workload Configuration - Ideal TPC-C client.

The results depicted in Table 2 are the average of fiveindependent runs regarding each target. The results depictan increasing amount of newly issued timestamps (T ) as thedefined target increases, thus reflecting a density functionthat also presents an increasing trend. The results also show

tpmCTotal Expected Experimental

Observed (Ts) d(Ts/s) d(Ts/s)635 108,051 30.24 30.01741 125,500 35.14 34.86886 150,114 42.02 41.69

Table 2: Density Observation Results.

that the density function provides results that are only 3%apart when comparing with the experimental observation.

The timestamp density will introduce a change in the stan-dard TPC-C specification. It is worth noting that this mod-ification does not introduce any change in TPC-C businesslogic. The individual TPC-C results are kept comparablewith a same-sized TPC-C installation.

2.3 Component Design

Unified MetricThe disparity in workload complexity and structure betweenOLTP and OLAP workloads is a major hurdle when tryingto define a unified metric for an HTAP system. So far, one ofthe main disadvantages of previous approaches was the factthat they would enable both OLTP and OLAP executionsto grow in order to achieve the maximum throughput foreach. The non-regulated growth induced by OLTP execu-tion would inherently degrade OLAP performance since an-alytical queries would have to scan more data. HTAPBenchremoves one axis of variability by regulating the OLTP work-load. The assurance of a constant transactional executionalso leads to a sustained and known database growth; thatis, the rate at which OLAP queries observe new data is fixedand predictable, bypassing the need to normalize the OLAPresults in terms of the observed growth.

QpHpW =QphH

#OLAPworkers@tpmC (6)

Expression 6 defines the metric we propose, QpHpW or”Queries of type H per Hour per Worker”. It reads as thenumber of analytical queries executed per OLAP worker re-garding a system that is able to sustain the configured tps.

296

It is defined by the ratio between TPC-H’s metric and thetotal number of registered OLAP workers induced by theClient Balancer. A higher QpHpW maps a system whereeach OLAP worker is able to compute more queries per an-alytical worker, thus representing a higher overall through-put. Achieving the best configuration for a given installationbecomes a multi-run optimization problem regarding a giventarget tps.

Client BalancerThe Client Balancer module is responsible for monitoringand deciding whether or not to launch additional OLAPworkers. When the OLTP agent ensures that the targettps is stable, the Client Balancer will periodically assesswhether or not the SUT is capable of handling an extraOLAP worker. This assessment relies on a proportional-integral feedback controller.

output = KP ∆tps + KI

∫∆t (7)

The feedback control mechanism (expression 7) is charac-terized by a proportional (KP ) and an integral (KI) gainadjustment. The gain parameters are used over the foundsystem deviation (∆tps = target tps − measured tps) tocompute the output value. The correct adjustment of eithergain factor is vital to ensure that the feedback controllerdoes not exceed too quickly the SUT’s capabilities, whichin turn would launch a higher number of OLAP workers,causing disruption on the throughput of the OLTP execu-tion. We tuned the proportional-integral feedback controllerby experimentation to use the gain adjustment characterizedby KP = 0.4 and kI = 0.03. The individual study on how toreach these parameters is left out of the scope of this paper.Algorithm 1 presents the Client Balancer decision process.

Algorithm 1 Client Balancer

1: procedure2: wait(∆t)3: error ← target tps−measured tps4: integral← integral + error × 1

∆t5: output← KP × error + KI × integral6: previous error ← error7:8: if output > (target tps×margin) and ¬saturated

then9: start OLAP worker

10: else11: saturated← true

The Client Balancer will periodically (∆t) poll the engine re-garding the current number of transactions being delivered.This information is then fed into the feedback controller. Tolaunch another OLAP worker, the Client Balancer ensuresthat the output value produced by the feedback mechanismis within a given error margin of the configured target tpsand that the system is not saturated. The point of satu-ration is reached when the current number of OLTP trans-actions per second being delivered drops below the chosenthreshold.

Density Consultant & LoaderHTAPBench follows the standard TPC-C transaction mix.The Density Consultant computes the correct density ac-

cording to the chosen target tps and transaction mix. Dur-ing the populate stage, in order to generate timestamps thatfollow the required density, the Loader is equipped with aclock that initiates with the system time at which the pop-ulate stage is initiated. The clock then computes how muchtime should elapse between clock ticks (∆TS) in order tofulfill the required density, as defined by expression 8.

∆TS(ms) =1

d(TS/s)× 1000 (8)

The HTAPBench Loader will proceed to create and loadall the table entities represented in the hybrid data schema,built from merging TPC-H’s schema into TPC-C’s. The fi-nal installation will scale in size according to the computednumber of warehouses. When loading tables with referencesto date items, the Load worker makes use of the clock, in-creasing one tick for each new date field to be loaded. Aftercompletion, applying the density function ensures that thetemporal density in the date fields matches the observeddensity during execution of the transactional workload.

Dynamic Query-H GeneratorThe analytical queries within HTAPBench are constructedaccording to the TPC-H specification, which requires themto be built with randomized parameters within given bound-aries. The Dynamic Query-H Generator module is respon-sible for building the SQL statements for the queries, ensur-ing that the random values comply with the TPC-H spec-ification. This module integrates with the Client Balancermodule that will launch the analytical workers, with its out-put afterwards fed into the TPC-H worker. The DynamicQuery-H Generator computes the time window frames thatshould be considered for query execution, and introduces afeature that enables the window frame generation to reflect asliding window behavior. In the specification for each query,TPC-H requires static time frames. Take as an examplequery Q6 that computes the total revenue for orders placedwithin a given period.

Listing 1: TPC-H Query 6

select sum( ol amount ) as revenue fromo r d e r l i n e where o l d e l i v e r y d between

[Date ] and [Date + 1 year ] ando l quan t i t y between [ Amount a ] and [ Amount b ]

This particular query restricts the result set to orders placedwithin a one-year time frame, starting on January the first ofa randomly selected year between 1993 and 1997, and end-ing in the following year. In TPC-H, this is attainable sinceit does not consider database growth. However, in HTAP-Bench, the database grows at the pace dictated by the OLTPexecution of the benchmark. Thus, if window frames were tobe kept static, the new regions on the dataset would neverbe queried. To produce a homogeneous result set that isrepresentative of the whole dataset, the Dynamic Query-HGenerator ensures that queries comply with the time rangeimposed by the specification while simultaneously leveragingthe Density Consultant module to shift the starting date ofthe range to meet the speed at which the OLTP executionis making the database grow. Hence, the sliding window be-havior not only ensures that the entire dataset is consideredbut also that consecutive executions of the same query arekept comparable.

297

Results MonitorThe Results Monitor collects the execution results producedby each worker. The final measurements are only collectedafter the configured execution time elapsed and all the work-ers finalized all procedures. The transactional activity ischaracterized according to TPC-C’s metric (tpmC ) while,the analytical activity is characterized by TPC-H’s metric(QphH ). Together with both metrics, this module also out-puts data from the Client Balancer, characterizing each runin terms of how many OLAP workers were launched andwhen as well as the result set volume and latency for eachanalytical query executed.

ImplementationHTAPBench is available as an open source project 1. Itwas implemented in Java for improved portability and in-cludes all the aforementioned components. The current pro-totype was implemented as an extension of OLTPBench [13],a framework that enables the execution of several bench-marking approaches. Moreover, OLTPBench’s implementa-tion of TPC-C does not consider the think time used duringtransaction execution. The TPC-C specification requires atime in which each transaction simulates (by entering a sleepstage) the time required by the terminal user to insert data,as well as the terminal’s processing time (TPC-C’s simula-tion of a real-world scenario). Consequently, as part of ourextension to OLTPBench to implement HTAPBench, we in-troduced the think time processing stage for each transac-tion of TPC-C.

HTAPBench relies on JDBC to establish a connectionwith the engine. Since JDBC defines a standard interfaceto connect with several engines, it is possible to use HTAP-Bench with a wide range of engines, provided that they sup-port JDBC.

2.4 Benchmark ConfigurationHTAPBench has several system requirements. The ma-

chine running HTAPBench should be provisioned with aJava distribution and the appropriate database engine driver.

HTAPBench requires the user to provide the engine JDBCconnection URL, the fully qualified name of engine driverclass and the target number of transactions per second (tps).The required target tps for a given configuration derivesfrom an expectation regarding the performance compliancefor the SUT. The user should start with a small target tpsvalue and progressively increase the target tps until the SUTis saturated, as it happens with the warehouse scalabilityin TPC-C. Other configurations are allowed, enabling cus-tomization of the workload, namely: the Client Balancer ∆tand error margin, the TPC-C transaction mix, the TPC-H

query mix and the Execution time. Hereafter, we brieflydescribe the impact of the mandatory parameters on thesystem.

• Client Balancer ∆t: This parameter configures the pe-riod in seconds used by the Client Balancer module toassess if further OLAP workers should be launched. As-signing the appropriate evaluation period has a direct im-pact on the convergence time and precision of the bench-mark. Choosing a small value may cause the system toconverge to quickly, but overestimate the number of ad-missible OLAP clients. On the other hand, choosing a

1https://github.com/faclc4/HTAPBench.git

large value improves the client balancer decision by ex-posing it to a larger number of samples, but delays theoverall process. This parameter defaults to 60 seconds,configuring a trade-off between scalability and assured-ness.

• Client Balancer error margin: This parameter config-ures how sensitive to change the Client Balancer shouldbe, setting up the range of allowed tps computed in per-centage of the target tps. It has a default value of 20%,which by experimentation we consider to be a reason-able trade-off of loss OLTP throughput in favor of havingOLAP capability.

• Execution time: This parameter configures the durationof time that each run of the execution stage of the bench-mark should last. This parameter defaults to 60 minutes,configuring a reasonable execution time after the warmupstage for the Client Balancer to exercise the SUT.

The execution of HTAPBench is divided into three mainstages, as previously introduced. After system configura-tion, the user should run the populate script that gener-ates the appropriate Comma-Separated-Values (CSV) filesfor the configured tps. Afterwards, the user should consultthe documentation of the engine to be tested in order touse the correct procedure to load the CSV files (optionally,it is possible to directly populate the SUT, but this oper-ation usually takes longer, since it does not load data inbatch mode). To start the execution stage, which includesthe initial warmup, the user should use the execution script,automatically deploying the required number of clients. Theexecution will run for the configured time and will afterwardsproduce result files, characterizing the OLTP and OLAP ex-ecutions and computing the unified metric.

3. BENCHMARKING CAMPAIGNThe benchmarking examples presented in this Section re-

sult from an extensive study intended to evaluate the keyproperties of HTAPBench. As such, the main purpose ofthe presented scenarios is not to quantitatively compare dif-ferent SUTs, but rather to demonstrate the expressivenessof the benchmark suite and its metrics.

Three different SUT were selected, namely: (i) an OLTPsystem, (ii) an OLAP system and (iii) a Hybrid system.Besides the target tps and the Client Balancer error mar-gin, the same HTAPBench configuration was used acrossexperiments. The Client Balancer was set to use an evalu-ation period (∆t) of 60 seconds and the OLTP activity wasregulated by the standard transaction mix within TPC-C.For the OLAP activity, we set up HTAPBench so that eachbusiness query would be chosen according to a uniform dis-tribution. Across experiments, we configured HTAPBenchto inject 100 transactions per second, which according toour density extraction mechanism amounts to 2,099 activeOLTP clients and 210 warehouses, a total of 117GB of data.The selected target tps was chosen as the number of config-ured warehouses generate a dataset with over 100GB, whichis above the third recommended scale factor for the OLAPagent (e.g., 1GB, 10GB, 100GB). All the following experi-ments reflect the average of 5 independent 60 minute runs.

298

0

20

40

60

80

100

120

0 10 20 30 40 50 600

10

30

50

80

thro

ughput −

txn/s

ec

#O

LA

P W

ork

ers

time (min)

OLTP OLAP target

(a) OLTP SUT.

0

20

40

60

80

100

120

0 10 20 30 40 50 600

2

4

10

thro

ughput −

txn/s

ec

#O

LA

P W

ork

ers

time (min)

OLTP OLAP target

(b) OLAP SUT.

0

20

40

60

80

100

120

0 10 20 30 40 50 600

5

12

20

thro

ughput −

txn

/sec

#O

LA

P W

ork

ers

time (min)

OLTP OLAP target

(c) Hybrid SUT.

# OLAP workers QpH QpHpWOLTP 50 7 0.14 @ 756OLAP 4 123 30.75 @ 217Hybrid 12 169 14.14 @ 530

(d) Overview of results for different SUT.

Figure 4: Registered progress between OLTP throughput and induced OLAP scalability. The optimal trade-off betweenOLTP and OLAP performance is achieved when the OLTP throughput goes below the gray area, at which point the OLAP

activity is not further increased. The table in (d) summarizes the OLAP results, showing that the Hybrid SUT is morebalanced for an HTAP workload.

3.1 OLTP SystemThe current experiment used a server with an Intel Xeon

E5-2670 v3 CPU with 24 physical cores (48 virtual) at 2.3GHz and 72GB of memory, running Ubuntu 12.04 LTS asthe operating system. We deployed an OLTP row-orientedengine and configured the level of memory that would allowthe required number of clients. The Client Balancer withinHTAPBench was configured to consider the default errormargin of 20%. The admissible loss of OLTP throughputinduced by the configuration of the error margin is depictedas a gray area in Figures 4(a), 4(b) and 4(c). HTAPBenchwas launched from the same machine.

The results in Figure 4(a) depict that HTAPBench wasable to launch and sustain the required target tps through-out the entire execution time. This is read by looking atthe OLTP line that is the linear interpolation of the plot-ted points. The required target tps was reached on the firstminute of execution and, from that moment on, the ClientBalancer started its configuration by launching an OLAPworker at each minute. The evolution regarding when andhow many OLAP workers exist is read by looking at theplotted line resembling a staircase.

With a configured error margin of 20%, the bottom tpsbarrier was only surpassed after 50 minutes, saturating thesystem at that point in time, and therefore not launchingfurther OLAP workers. Please consider that by defaultHTAPBench’s Client Balancer introduces a single OLAPworker in each evaluation period (∆t). Even though we donot present this evaluation for lack of space, it is possibleto configure HTAPBench’s Client Balancer to deploy morethan one OLAP worker at a time.

Throughout the test, a declining trend in the OLTP through-put becomes evident. In what strictly concerns the OLAPactivity, the engine under test was able to hold up to 50OLAP clients. As previously stated, the main premise inHTAPBench’s evaluation scheme is to discover how manyOLAP workers are required to stress out a given SUT, whileensuring that the transactional activity does not degradebeyond a configurable threshold. In addition to providinga temporal evaluation of both OLTP and OLAP workerswithin the system, HTAPBench outputs the unified metricwe propose, but also the standard TPC-C tpmC and TPC-H QphH metrics. Within this setup, this SUT was able tosustain 756 tpmC and 7 QphH, resulting in 0.14 QpH ineach OLAP worker (7 QpH / 50 OLAP workers). As such,the unified metric, QpHpW , amounts to 0.14 @ 756 tpmC.

3.2 OLAP SystemThe current experiment reused the previous configuration.

We deployed an OLAP column-oriented engine, only settingup the required level of clients allowed in the engine.

The results in Figure 4(b) depict that the SUT sustainedthe OLTP throughput for a shorter period. This behav-ior is not unreasonable since the focus of this engine is noton OLTP activity. From the moment the threshold was bro-ken (6th minute), the Client Balancer stopped releasing newOLAP clients. From this time on, albeit at a lower through-put, the OLTP activity was kept stable until the end of therun time. The SUT was able to register 217 tpmC while sus-taining 4 individual OLAP clients. Cumulatively, the OLAPactivity reached a peak of 123 QphH. As such, the unifiedmetric we propose, QpHpW , reaches 30.75@217 tpmC.

299

Compared with the previous system, we observe that thelatter system can enable a larger number of analytic queries,despite the fact that this particular system was only able tolaunch 4 OLAP streams, compared to 50 OLAP streamswith the first SUT. This may seem counter-intuitive, con-sidering that we are working atop a column-based enginespecifically designed for an analytical workload. However,the 4 OLAP streams in the current SUT are much moreefficient when compared with the 50 OLAP streams in theOLTP SUT.

3.3 Hybrid SystemNext, we use HTAPBench to characterize a Hybrid engine.

As such systems are designed to scale out, they are usuallymade available over a distributed architecture.

We deployed the Hybrid system over 10 nodes, 9 of whichare responsible for handling and storing data, and the re-maining node provides coordination and other global ser-vices. Each node has an Intel i3-2100-3.1GHz 64 bit pro-cessor with 2 physical cores (4 virtual), 8GB of RAM mem-ory and 1 SATA II (3.0Gbit/s) hard drive, running Ubuntu12.04 LTS as the operating system and interconnected by aswitched Gigabit Ethernet network.

The results depicted in Figure 4(c) reveal that the OLTPthroughput reached the assigned target in the first minuteof execution. From the first minute onward, the ClientBalancer started to deploy OLAP streams until the OLTPthroughput degraded beyond the considered error margin,which happened after 20 minutes, registering a grand totalof 12 OLAP streams. From that moment on, the OLTPthroughput slowly degraded over the remainder of the testduration. Cumulatively, the SUT was able to sustain 530tpmC and 169 QphH. Therefore, our unified metric QpHpW ,presents as 14.14@530 tpmC.

3.4 DiscussionThe results of these three tests are summarized in the

table in Figure 4(d). Although the OLTP SUT achievedthe highest number of OLAP workers, the results in Fig-ure 4(d) also show that it achieved the lowest OLAP per-formance (0.14). The OLAP workers in the OLTP enginespend most of their time waiting for the OLAP queries tobe processed and therefore complete relatively few OLAPqueries. On the other hand, the OLAP engine processessignificantly more analytical queries, but fails to cope withthe required OLTP throughput that is used by the ClientBalancer to launch additional OLAP workers. Overall theOLTP engine completed 7 QpH and the OLAP engine com-pleted 123 QpH. This results show that the OLAP SUT wasbetter at handling the OLAP workload, but its inability tocope with the OLTP workload in the hybrid configurationharmed the overall scalability. The favored hardware con-figuration for the Hybrid SUT is significantly different fromthe OLTP system or the OLAP system, which undermines adirect comparison among them. Nevertheless, HTAPBenchwas able to evaluate the Hybrid SUT, enabling 12 OLAPstreams and achieving a total of 169 OLAP queries. The re-sults reveal that the Hybrid SUT can sustain a considerableOLTP throughput with a moderate OLAP scalability.

4. VALIDATIONIn order to validate the expressiveness of our metrics, we

studied the system’s representativeness, workload accuracy,

homogeneity, error margin variability and cost. Moreover,we discuss the benchmark portability, reproducibility andrepeatability. For that matter, we used all the previous SUTconfigurations.

First, we show that the proposed metric can either identifysystems with similar or very different goals. Second, we an-alyze HTAPBench’s accuracy by verifying the storage tracesproduced in three distinct scenarios. Third, we verify thatthe produced query result sets are homogeneous as specified.Fourth, we conduct a variability analysis which presents theconsequences of varying HTAPBench’s error rate. Finallywe discuss the cost of using the suite.

4.1 HTAP Unified MetricThe unified metric we propose allows us to establish a

comparison across two dimensions. Figure 5 uses a quadrantfield plot to visually compare the relationship between ourproposed metric and the target OLTP throughput.

0

100

200

300

400

500

600

700

800

900

0.14 14.14 30.75

OLTP 1

OLAP

HYBRID

OLTP 2

tpm

C

Unified Metric (QpHpW)

Figure 5: Quadrant field plot for the unified metric.

While an increase in the vertical axis translates into ahigher OLTP throughput, the horizontal axis evaluates thecapability to perform more analytical queries per OLAPworker. The best result would be a reading on the up-per right quadrant. The analysis depicts that while theSUT labeled OLTP 1 registered a QpHpW of 0.14@756tpmC, the OLAP observed 30.75@217 tmpC and the [email protected] tpmC. Overall, we can state that the analyzedsystems follow the trend in which the increase of OLTP ac-tivity diminishes the OLAP capability. The hybrid systemreached a position close to the middle of the plot, indicatingbetter support for the mixed workload with simultaneousOLTP and OLAP activity.

So far, the presented results allow us to conclude that thebenchmarking suite is able to compare systems with verydistinct work plans. In order to verify if the suite is able todistinguish systems designed for similar workloads, we repro-duced the same experimental scenario as in Section 3.1 butchanged the SUT to a different engine designed for OLTPworkloads. We evaluated its performance and plotted it inFigure 5 with the label ”OLTP 2”. The results place OLTPsystem 2 very close to system OLTP 1 with enough precisionso as not to overlap both results.

All of the previous experiments were integrated in a statis-tical study from where we were able to compute the variationcoefficient of the computed metrics, pointing to a variationof 1.2%. As such, if the evaluation of 2 different systems,or system configurations produce variabilities with less than1.2%, the systems should be considered indistinguishable.

4.2 Error Margin VariabilityThe Client Balancer in HTAPBench controls whether fur-

300

1.935×108

1.94×108

1.945×108

1.95×108

1.955×108

0 500 1000 1500 2000 2500 3000 3500 4000

Sto

rage

offset

Time (sec)

write read

(a) OLTP access pattern.

1.895×108

1.9×108

1.905×108

1.91×108

1.915×108

1.92×108

1.925×108

0 500 1000 1500 2000 2500 3000 3500 4000

Sto

rage

offset

Time (sec)

write read

(b) OLAP access pattern.

1.9×108

1.92×108

1.94×108

1.96×108

1.98×108

2×108

0 500 1000 1500 2000 2500 3000 3500 4000

Sto

rage o

ffse

t

Time (sec)

write read

(c) HTAP access pattern.

Figure 6: Traces collected through the blktrace tool during a HTAPBench execution over an (a) OLTP, (b) OLAP and (c)HTAP SUT. The trace represents read (black marks) and write (red marks) patterns induced on the storage medium by the

workload. The vertical axis represents the storage medium offset. The horizontal axis represents time in seconds.

ther OLAP streams are deployed or not, evaluating at eachpoint in time if the current OLTP throughput does not gobelow a threshold defined by the error margin configura-tion. Intuition leads us to believe that by increasing theerror margin, the Client Balancer will naturally allow moreOLAP clients to be deployed, thus increasing the overallamount of analytical queries performed.

Error Rate QpH tpmC OLAP Clients10 % 70.99 130.64 120 % 168.9 131.36 540 % 265.99 138.30 11

Table 3: Client Balancer error rate variance.

The current experiment assesses if in fact such behaviortranslated into the actual behavior of the benchmark. Forthis purpose, we set up the OLAP SUT as in Section 3.2 butvaried the error margin across runs.

First, by analyzing Table 3, the reader should be awarethat the consecutive executions registered similar OLTP through-puts as expected. Moreover, as we increase the allowed errorrate, we verify that more analytical queries were performed,which is a consequence of having more OLAP clients on thesystem, thus increasing the registered QpH and consequentlythe QpHpW.

4.3 Workload RepresentativenessDatabase systems must be evaluated with representative

workloads that test realistically the storage back-end ca-pabilities. To better understand the representativeness ofHTAPBench we analyzed the storage traces produced. The3 previously tested SUT were evaluated with these work-loads and the resulted storage traces were collected and an-alyzed with the blktrace2 tool. This tool allows us to collectstorage traces for a given block device. With these tracesit is possible to extract useful information, such as the ac-cess patterns of storage requests, the ratio of storage readsand writes, the throughput and latency of each request, etc.Each SUT was deployed in a single machine as in Section 3.1.

Figure 6 depicts the access patterns registered by the stor-age medium for an HTAP SUT. The figures plot the offsetof the storage medium (vertical axis) being accessed dur-ing execution time of the benchmark (horizontal axis). Theanalysis of the traces collected allowed us to confirm thatOLTP workloads are dominated by random storage accesses(Figure 6(a)) while OLAP workloads do mostly sequentialstorage accesses (Figure 6(b)) dominated by read-only re-quests. On the other hand, mixed workloads generated byHTAPBench present a mix of these access patterns, as ex-pected (Figure 6(c)). Moreover, the ratio of storage readsand writes is according to the specification of each workload.In summary, these results show that HTAPBench is able tosimulate a realistic storage load for a hybrid SUT that pro-

2blktrace manual: http://linux.die.net/man/8/blktrace.

301

0.001

0.01

0.1

1

10

100

1000

Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22

(%)

TPC−H queries

HTAPBenchNo Density

Figure 7: Result set execution cost as a percentage of the baseline across independent runs. The vertical axis is presented inlogarithmic scale. The absence of columns represents null variability.

cess both OLAP and OLTP workloads simultaneously. Con-sequently, HTAPBench’s representativeness of the analyticalcapability on top of a sustained transactional operation is en-sured by the experimental data, but also by the individualrepresentativeness of the TPC workloads used. Moreover,the results show that current proposals for hybrid SUT mustbe aware that processing simultaneously OLAP and OLTPworkloads generates a mix of random and sequential storageaccesses. This challenge can drive novel research proposalsfor both hybrid engines and the back-end storage systemssupporting them.

4.4 Homogeneity and ReproducibilityThe present experiment validates the effectiveness of our

density mechanism. Figure 7 depicts 2 different scenarios,where we compare a distribution using random field gener-ation with the density approach we propose. The depictedresults relate to a baseline comparison where all queries wereexecuted without any of their filtering operators, thus allow-ing us to extrapolate the total universe of rows consideredin each analytical query. The value presented for each queryis the absolute difference in produced result set rows in twoconsecutive executions: first, with random parameters thatdo not follow any distribution and, second, with the densitymechanism proposed in HTAPBench.

In the first scenario, the analytical queries ran with the in-troduction of randomized parameters; the results show thatconsecutive executions of such queries produced variable re-sult sets. The measured variability of result set lines acrossall query executions for this setup amounted to 12%, witha registered maximum of 20%. For the second scenario, weused HTAPBench’s density mechanism to generate the ran-dom parameters to be used in all analytical queries. QueriesQ13, Q21 and Q22 produce variabilities of 0% as the consec-utive runs always produced the same number of result setrows.

The results show that by using our approach, we were ableto produce query result sets with comparable costs. Consec-utive executions of the same query registered a variability ofjust 0.17% on average across the different SUT and differ-ent configuration settings. These results confirm a very highlevel of reproducibility of results for a given configuration.

4.5 Benchmark Adoption EffortThe benchmarking suite proposed in this paper derives

from two industry standard benchmarks with added func-tionality. The available suites offer implementations that areindividually deployed to collect separate metrics, and theytypically require the implementation of connector mecha-nisms between the benchmarking suite and the engine driver.

HTAPBench follows the trend of a few other suites, rely-ing in the native JDBC driver of the SUT to communicate

all the operations of the workload. Concerning the requiredtime to run the benchmark, the setups considered in thispaper are the result of 5 independent runs. In each one ofthem, the SUT had to be populated with data and thentested with the hybrid workload. The time spent duringthe populate stage greatly depends on the configured tar-get of transactions per second and the SUT used. In thesetups considered, we observed an average of 4 hours topopulate the files for the engine. Afterwards, before sub-sequent executions, the engine would dump and restore theinitial populate data, taking an average of 15 minutes. Thistechnique is also valid for setups with terabytes of data. Re-garding the execution stage, each setup ran for 60 minutes.Consequently, the overall cost for each SUT was 10 hours.Considering other suites, the total cost per SUT is relativelythe same.

5. RELATED WORKThere are several organizations that propose benchmark-

ing standards to assess either transactional or analytical sys-tems, namely: the Transaction Performance Council (TPC) [12],the Standard Performance Evaluation Council (SPEC) [7]and the Storage Performance Council (SPC) [6]. We cate-gorize several systems into OLTP, OLAP or Hybrid and webriefly present their main characteristics and shortcomings.

5.1 OLTPOnline Transactional Benchmarking systems, such as TPC-

C [8], focus on assessing a system’s ability to cope with alarge amount of small-sized transactions. Usually, OLTPsystems rely on row-oriented data stores, in which the trans-actions operate over a restricted space of tuples, ensuring atthe same time properties such as consistency and isolationin what is commonly referred to as ACID [17].

The TPC-C specification models a real-world scenario wherea company, comprised of several warehouses and districts,processes orders placed by clients. The workload is definedover 9 tables operated by a transaction mix comprised of fivedifferent transactions, namely: New Order, Payment, Order-Status, Delivery and Stock-Level. Each transaction is com-posed of several read and update operations, where 92% areupdate operations, which characterizes this as a write heavyworkload. The benchmark is divided into a load and an ex-ecution stage. During the first stage, the database tablesare populated and, during the second stage, the transactionmix is executed over that dataset. TPC-C defines how thesetables are populated and also defines their size and scalingrequirements, which is indexed to the number of configuredwarehouses in the system. The outcome of this benchmarkis a metric defined as the maximum qualified throughput ofthe system, tpmC, or the number of New Order transactionsper minute.

302

The TPC-E [11] benchmark builds on TPC-C, introducinga variable number of client terminals. It models a scenariowhere a brokerage firm receives stock purchase requests fromcustomers, trying to acquire the correspondent stock bondsfrom a Stock pool. The purchase orders placed by clientsare based on asynchronous transactions while stock requestsbetween the Brokerage firm and the Stock Exchange arebased on synchronous transactions. Compared with TPC-C, this benchmark builds a much wider and more complexsystem as it is composed of 33 tables and 10 transactions,6 of which are read-only and the remainder are read-writetransactions; the latter accounting for 23% of all requests.

Both specifications build benchmarking suites strictly in-tended to evaluate OLTP engines. Despite their represen-tativity of OLTP workloads, the characteristic short oper-ational transactions prevent the workload from exercisingOLAP requirements. Engines designed to provide a highOLTP throughput are typically row oriented as the natureof a single transaction induces I/O bound operations in sev-eral attributes per transaction. This behavior translates intorandom access patterns to the storage medium and does notproduce CPU and memory bound operations as do OLAPworkloads.

5.2 OLAPOnline Analytical Systems such as TPC-H [9, 25] or TPC-

DS [10] focus on assessing a system’s ability to performmulti-dimensional operations, usually on top of column-orienteddata stores.

TPC-H builds a workload that does not require ETL,modeling a real world-scenario where a wholesale suppliermust perform deliveries worldwide. The business queriesperform complex data operations (e.g., aggregations) acrosslarge sets of data. The workload defines a query mix com-prised of 22 business queries that access 8 different tables.The execution is divided into three stages. The first stageloads the database. During the second stage, a single userexecutes all 22 business queries, while during the third stage,a multi-user setup is used in order to evaluate the system’sability to schedule concurrent operations. TPC-H does notconsider any growth factor during runtime, which meansthat the dataset does not change in terms of its total size.The outcome of this benchmark is computed through a met-ric that accounts for the total number of completed queriesper hour (QphH ).

TPC-DS builds a workload that requires ETL, namelyto ensure data freshness. It models a scenario where ordersmust be processed from physical and online stores of a whole-sale supplier, mapping it into a star schema composed of 7fact tables and 18 dimensions. The workload holds 4 differ-ent query types, namely Reporting, Iterative, Data Miningand Ad-hoc queries. The database populated for TPC-DS,as in TPC-H, does not consider any growth factor; still, theinitial population is regulated in terms of a scale factor thathas direct influence over the data size. The output metricis defined as the number of queries per hour at a given scalefactor, QphDS@ScaleFactor.

TPC-DS is seen as an evolution of TPC-H, addressingoversimplifications that prevent the proper evaluation of OLAPsystems. However, the need to use ETL to promote updateson the star schema prevents us from using it as an OLAPagent in our hybrid workload.

The high prevalence of read operations in these workloads

mostly generate sequential accesses to storage mediums, andtherefore are not able to simulate the short and cross at-tribute nature of OLTP workloads that also live in a hybridworkload.

5.3 HybridHybrid workloads should capture both access patterns ob-

served in the previous individual specifications [15]. Thereare a few benchmarking suites that use both access patterns,namely CH-benCHmark [5] and CBTR [4, 2, 1, 3].

CH-benCHmark creates a mixed workload also based onTPC standard benchmarks, enabling two types of clients.A transactional client provides a TPC-C agent, while ananalytical client provides a TPC-H agent. To allow theanalytical workload across the transactional relations, CH-benCHmark merged both schema into a single one, com-prising relations from TPC-C and TPC-H. The relations ac-cessed by the OLTP execution scale according to TPC-C’sspecification. Relations accessed by the OLAP executionare kept unchanged. However, CH-benCHmark neglects as-pects within TPC-H’s specification. Namely, the analyticalqueries should hold random parameters in order not to con-stantly transverse the same regions of the dataset. It alsodisregards the required distributions for date fields, impact-ing the produced analytical results.

CBTR defines a benchmarking system aimed at mixedworkloads, which does not account for any previous stan-dardized specifications, considering them too predictable. Itintroduces a workload built from real enterprise data thatmodels an order-to-cash workflow. For that matter, a newschema and the respective transactional activity is presented.By using real data, CBTR bypasses the need to use nu-merical distributions to populate or to generate data duringbenchmark execution.

The major differentiator of HTAPBench when comparedwith the previous approaches lies specifically in its ClientBalancer module that governs how both workloads coexist.Both CH-Benchmark and CBTR use naive approaches tofind the maximum qualified throughput for both the trans-actional and analytical workloads. The main concern thatarises is that neither workload agent in each benchmark-ing suite is aware of the other agent, creating a disputefor resources as each agent tries to saturate the SUT. TheClient Balancer in HTAPBench follows Gartner’s recom-mendation to specify how the transactional and analyticalagents coexist; instructing the transactional agent to sus-tain a configured throughput and allowing the analyticalagent to saturate the SUT up to the point when the trans-actional throughput is affected. Moreover, HTAPBench alsoaddresses the issues found in CH-Benchmark relating to non-uniform result sets by using the density mechanism to reg-ulate that behavior.

6. CONCLUSIONThis paper proposed HTAPBench, a benchmarking suite

for engines that support hybrid workloads composed of highlevels of transactional activity and, at the same time, providebusiness analytics directly over production data.

As our main contributions, we proposed a unified met-ric that enables the quantitative comparison of very similarsystems, as well as very different systems. We also proposesolutions to the key challenges of a hybrid benchmark, suchas the Client Balancer.

303

Moreover we provide homogeneous and comparable re-sults across executions. We validated our prototype on topof OLTP, OLAP and Hybrid systems, demonstrating theinherent expressiveness of HTAPBench’s metrics to char-acterize such systems. We show that HTAPBench is ableto distinguish different classes of systems (i.e., OLTP fromOLAP), as well as distinguish systems within the same classwith high precision. We ensured that HTAPBench exercisedthe storage layout as expected for each workload type. Theresults allowed us to conclude that by using the proposedapproach we were able to introduce the required workloadrandomness while keeping the results comparable by ensur-ing equal query execution costs across the whole dataset.

7. ACKNOWLEDGMENTSThe authors would like to thank Marco Vieira for all the

constructive reviews and comments on the final stage of thiswork, but also to Martin Arlitt and the anonymous reviewersfor their helpful comments. This work is financed by: (1) theERDF – European Regional Development Fund through theOperational Programme for Competitiveness and Interna-tionalization - COMPETE 2020 Programme within projectPOCI-01-0145-FEDER-006961, and by National Funds throughthe Portuguese funding agency, FCT - Fundacao para aCiencia e a Tecnologia as part of project UID/EEA/50014/2013;(2) the European Union’s Horizon 2020 - The EU FrameworkProgramme for Research and Innovation 2014-2020, undergrant agreement No. 653884.

8. REFERENCES[1] A. Bog, J. Kruger, and J. Schaffner. A composite

benchmark for online transaction processing andoperational reporting. In Advanced Management ofInformation for Globalized Enterprises, 2008. AMIGE2008. IEEE Symposium on, pages 1–5. IEEE, 2008.

[2] A. Bog, H. Plattner, and A. Zeier. A mixedtransaction processing and operational reportingbenchmark. Information Systems Frontiers,13(3):321–335, July 2011.

[3] A. Bog, K. Sachs, and H. Plattner. Interactiveperformance monitoring of a composite oltp and olapworkload. In Proceedings of the 2012 ACM SIGMODInternational Conference on Management of Data,SIGMOD ’12, pages 645–648, New York, NY, USA,2012. ACM.

[4] A. Bog, K. Sachs, and A. Zeier. Benchmarkingdatabase design for mixed oltp and olap workloads. InProceedings of the 2Nd ACM/SPEC InternationalConference on Performance Engineering, ICPE ’11,pages 417–418, New York, NY, USA, 2011. ACM.

[5] R. Cole, F. Funke, L. Giakoumakis, W. Guy,A. Kemper, S. Krompass, H. Kuno, R. Nambiar,T. Neumann, M. Poess, et al. The mixed workloadch-benchmark. In Proceedings of the FourthInternational Workshop on Testing Database Systems,page 8. ACM, 2011.

[6] S. P. Council. Storage Performance Council. 2015.

[7] S. P. E. Council. Standard Performance EvaluationCouncil. 2015.

[8] T. P. P. Council. TPC Benchmark C. 2010.

[9] T. P. P. Council. TPC Benchmark H. 2010.

[10] T. P. P. Council. TPC Benchmark DS. 2012.

[11] T. P. P. Council. TPC Benchmark E. 2015.

[12] T. P. P. Council. The Transaction ProcessingPerformance Council. 2015.

[13] D. E. Difallah, A. Pavlo, C. Curino, andP. Cudre-Mauroux. Oltp-bench: An extensible testbedfor benchmarking relational databases. PVLDB,7(4):277–288, 2013.

[14] C. D. French. ”one size fits all” database architecturesdo not work for dss. In Proceedings of the 1995 ACMSIGMOD International Conference on Management ofData, SIGMOD ’95, pages 449–450, New York, NY,USA, 1995. ACM.

[15] F. Funke, A. Kemper, and T. Neumann.Benchmarking hybrid oltp&olap database systems. InBTW, pages 390–409, 2011.

[16] A. K. Goel, J. Pound, N. Auch, P. Bumbulis,S. MacLean, F. Farber, F. Gropengiesser, C. Mathis,T. Bodner, and W. Lehner. Towards scalable real-timeanalytics: An architecture for scale-out of olxpworkloads. Proc. VLDB Endow., 8(12):1716–1727,Aug. 2015.

[17] T. Haerder and A. Reuter. Principles oftransaction-oriented database recovery. ACM Comput.Surv., 15(4):287–317, Dec. 1983.

[18] R. Kimball, M. Ross, et al. The data warehousetoolkit: the complete guide to dimensional modelling.US: John Wiley & Sons, 2002.

[19] S. S. Lightstone, T. J. Teorey, and T. Nadeau.Physical Database Design: The DatabaseProfessional’s Guide to Exploiting Indexes, Views,Storage, and More. Morgan Kaufmann PublishersInc., San Francisco, CA, USA, 2007.

[20] S. Machine. Splice machine. Technical report, SpliceMacine, 2015.

[21] N. Mukherjee, S. Chavan, M. Colgan, D. Das,M. Gleeson, S. Hase, A. Holloway, H. Jin, J. Kamp,K. Kulkarni, T. Lahiri, J. Loaiza, N. Macnaughton,V. Marwah, A. Mullick, A. Witkowski, J. Yan, andM. Zait. Distributed architecture of oracle databasein-memory. Proc. VLDB Endow., 8(12):1630–1641,Aug. 2015.

[22] M. Nakamura, T. Tabaru, Y. Ujibashi, T. Hashida,M. Kawaba, and L. Harada. Extending postgresql tohandle olxp workloads. In Innovative ComputingTechnology (INTECH), 2015 Fifth InternationalConference on, pages 40–44. IEEE, 2015.

[23] R. O. Nambiar and M. Poess. The making of tpc-ds.In Proceedings of the 32Nd International Conferenceon Very Large Data Bases, VLDB ’06, pages1049–1058. VLDB Endowment, 2006.

[24] NuoDB. Hybrid transaction and analytical processingwith nuodb. Technical report, NuoDB, 2014.

[25] M. Poess and C. Floyd. New tpc benchmarks fordecision support and web commerce. SIGMOD Rec.,29(4):64–71, Dec. 2000.

[26] J. Rossberg and R. Redler. Pro Scalable .NET 2.0Application Designs. Expert’s Voice in .Net. Apress,2005.

[27] VoltDB. Voltdb - whitepaper. Technical report,VoltDB, 2014.

[28] M. Waclawiczek. HTAP - What it Is and Why itMatters. 2015.

304