
Unlocking New Big Data Insights with MySQL

A MySQL Whitepaper

Copyright © 2015, Oracle and/or its affiliates. All rights reserved.


Table of Contents

Introduction

1. Defining Big Data

2. The Internet of Things

3. The Lifecycle of Big Data

Step 1: Acquire Data

Step 2: Organize Data

Step 3: Analyze Data

Step 4: Decide

4. MySQL Big Data Best Practices

Conclusion

Additional Resources


Introduction

Today the terms “Big Data” and “Internet of Things” draw a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on traditional “enterprise data”. Beyond that critical data, however, is a potential treasure trove of additional data: weblogs, social media, email, sensors, photographs and much more that can be mined for useful information. Decreases in the cost of both storage and compute power have made it feasible to collect this data - which would have been thrown away only a few years ago. As a result, more and more organizations are looking to include non-traditional yet potentially very valuable data with their traditional enterprise data in their business intelligence analysis. As the world’s most popular open source database, and the leading open source database for Web-based and Cloud-based applications, MySQL is a key component of numerous big data platforms. This whitepaper explores how you can unlock extremely valuable insights using MySQL with the Hadoop platform.

1. Defining Big Data

Big data typically refers to the following types of data:

Traditional enterprise data – includes customer information from CRM systems, transactional ERP data, web store transactions, and general ledger data.

Machine-generated /sensor data – includes Call Detail Records (“CDR”), weblogs, smart meters, manufacturing sensors, equipment logs (often referred to as digital exhaust) and trading systems data.

Social data – includes customer feedback streams, micro-blogging sites like Twitter, social media platforms like Facebook.

The McKinsey Global Institute estimates that data volume is growing 40% per year¹. But while it's often the most visible parameter, volume of data is not the only characteristic that matters. We often refer to the "Vs" defining big data:

Volume. Machine-generated data is produced in much larger quantities than non-traditional data. For instance, a single jet engine can generate 10TB of data in 30 minutes. With more than 25,000 airline flights per day, the daily volume of just this single data source runs into the Petabytes. Smart meters and heavy industrial equipment like oil refineries and drilling rigs generate similar data volumes, compounding the problem.

Velocity. Social media data streams – while not as massive as machine-generated data – produce a large influx of opinions and relationships valuable to customer relationship management. Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes.

1 Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011

Variety. Traditional data formats tend to be relatively well defined by a data schema and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of change. As new services are added, new sensors deployed, or new marketing campaigns executed, new data types are needed to capture the resultant information.

The Importance of Big Data

When big data is distilled and analyzed in combination with traditional enterprise data, organizations can develop a more thorough and insightful understanding of their business, which can lead to enhanced productivity, a stronger competitive position and greater innovation – all of which can have a significant impact on the bottom line. For example, retailers usually know who buys their products. Use of social media and web log files from their ecommerce sites can help them understand who didn't buy and why they chose not to, information not formerly available to them. This can enable much more effective micro customer segmentation and targeted marketing campaigns, as well as improve supply chain efficiencies through more accurate demand planning. Other common use cases include:

Sentiment analysis

Marketing campaign analysis

Customer churn modeling

Fraud detection

Research and Development

Risk Modeling

And more…

2. The Internet of Things

The Big Data imperative is compounded by the Internet of Things, generating an enormous amount of additional data. The devices we use are getting smaller and smarter. They're connecting more easily, and they're showing up in every aspect of our lives. This new reality in technology—called the Internet of Things—is about collecting and managing the massive amounts of data from a rapidly growing network of devices and sensors, processing that data, and then sharing it with other connected things. It's the technology of the future, but you probably have it now—in the smart meter from your utility company, in the environmental controls and security systems in your home, in your activity wristband or in your car's self-monitoring capabilities.


Gartner estimates the total economic value-add from the Internet of Things across industries will reach US$1.9 trillion worldwide in 2020².

For example, just a few years from now, your morning routine might be a little different thanks to Internet of Things technology. Your alarm goes off earlier than usual because your home smart hub has detected traffic conditions suggesting an unusually slow commute. The weather sensor warns of a continued high pollen count, so because of your allergies, you decide to wear your suit with the sensors that track air quality and alert you to allergens that could trigger an attack.

You have time to check your messages at the kitchen e-screen. The test results from your recent medical checkup are in, and there’s a message from your doctor that reiterates his recommendations for a healthier diet. You send this information on to your home smart hub. It automatically displays a chart comparing your results with those of the general population in your age range, and asks you to confirm the change to healthier options on your online grocery order. The e-screen on the refrigerator door suggests yogurt and fresh fruit for breakfast.

Major Advances in Machine-to-Machine Interactions Mean Incredible Changes

The general understanding of how things work on the internet is a familiar pattern: humans connect through a browser to get the information they need or to perform the action they want on the internet.

The Internet of Things changes that model. In the Internet of Things, things talk to things, and processes have two-way interconnectivity so they can interoperate both locally and globally. Decisions can be made according to predetermined rules, and the resulting actions happen automatically, without the need for human intervention. These new interactions are driving tremendous opportunities for new services.

2 Peter Middleton, Peter Kjeldsen, and Jim Tully, "Forecast: The Internet of Things, Worldwide, 2013" (G00259115), Gartner, Inc., November 18, 2013.


The Value of Data

Transforming data into valuable information is no small task. The variables and the risks are real and often uncharted; flexibility and time to market can mean the difference between failure and success. But, with the considerable potential of this developing market, some businesses are aggressively undertaking the challenges. These businesses—the ones planning now for this new technology—will be the ones to succeed and thrive. Oracle delivers an integrated, secure, comprehensive platform for the entire IoT architecture across all vertical markets. For more information on Oracle’s Internet of Things platform, visit: http://www.oracle.com/us/solutions/internetofthings/overview/index.html

We shall now consider the lifecycle of Big Data, and how to leverage the Hadoop platform to derive added value from data acquired in MySQL solutions.

3. The Lifecycle of Big Data

With the exponential growth in data volumes and data types, it is important to consider the complete lifecycle of data, enabling the right technology to be aligned with the right stage of the lifecycle.


Figure 1: The Data Lifecycle

As Figure 1 illustrates, the lifecycle can be distilled into four stages:

Acquire: Data is captured at source, typically as part of ongoing operational processes. Examples include log files from a web server or user profiles and orders stored within a relational database supporting a web service.

Organize: Data is transferred from the various operational systems and consolidated into a big data platform, i.e. Hadoop / HDFS (Hadoop Distributed File System).

Analyze: Data stored in Hadoop is processed, either in batches by Map/Reduce jobs or interactively with technologies such as the Apache Drill or Cloudera Impala initiatives. Data can also be processed in Apache Spark. Hadoop may also perform pre-processing of data before being loaded into data warehouse systems, such as the Oracle Exadata Database Machine.

Decide: The results of the Analyze stage above are presented to users, enabling actions to be taken. For example, the data may be loaded back into the operational MySQL database supporting a web site, enabling recommendations to be made to buyers; into reporting MySQL databases used to populate the dashboards of BI (Business Intelligence) tools; or into the Oracle Exalytics In-Memory Machine.

MySQL in the Big Data Lifecycle

In the following sections, we will consider MySQL in the Big Data Lifecycle as well as the technologies and tools at your disposal at each stage of the lifecycle.


Figure 2: MySQL in the Big Data Lifecycle

Acquire: Through NoSQL APIs, MySQL is able to ingest high volume, high velocity data, without sacrificing ACID guarantees, thereby ensuring data quality. Real-time analytics can also be run against newly acquired data, enabling immediate business insight, before data is loaded into Hadoop. In addition, sensitive data can be pre-processed, for example healthcare or financial services records can be anonymized, before transfer to Hadoop.

Organize: Data can be transferred in batches from MySQL tables to Hadoop using Apache Sqoop or the MySQL Hadoop Applier. With the Applier, users can also invoke real-time change data capture processes to stream new data from MySQL to HDFS as it is committed by the client.

Analyze: Multi-structured data ingested from multiple sources is consolidated and processed within the Hadoop platform.

Decide: The results of the analysis are loaded back to MySQL via Apache Sqoop, where they power real-time operational processes or provide analytics for BI tools.

Each of these stages and the associated technologies are discussed below.

Step 1: Acquire Data

With data volume and velocity exploding, it is vital to be able to ingest data at high speed. For this reason, Oracle has implemented a NoSQL interface directly to the InnoDB storage engine, and additional NoSQL interfaces to MySQL Cluster, which bypass the SQL layer completely. Without SQL parsing and optimization, Key-Value data can be written directly to MySQL tables up to 9x faster, while maintaining ACID guarantees.

In addition, users can continue to run complex queries with SQL across the same data set, providing real-time analytics to the organization.


Native Memcached API access is available for MySQL 5.6 and MySQL Cluster. By using its ubiquitous API for writing and reading data, developers can preserve their investments in Memcached infrastructure by re-using existing Memcached clients, while also eliminating the need for application changes. As discussed later, MySQL Cluster also offers additional NoSQL APIs including Node.js, Java, JPA, HTTP/REST and C++.
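As a minimal illustration of this dual access, the sketch below writes a key/value pair through the Memcached protocol and immediately reads the same row back over SQL. It assumes the innodb_memcache plugin is enabled with its default demo_test mapping on a local server; the client library (pymemcache), credentials and key are illustrative.

```python
# Sketch: NoSQL write via the Memcached plug-in, SQL read of the same row.
# Assumes the innodb_memcache plugin is enabled with its default demo_test
# container (table test.demo_test, key column c1, value column c2).
from pymemcache.client.base import Client
import mysql.connector

# NoSQL write: bypasses the SQL layer and goes straight to InnoDB.
mc = Client(("127.0.0.1", 11211))
mc.set("sensor:42", "23.7C")

# SQL read: the same data remains visible to ordinary queries.
cnx = mysql.connector.connect(user="app", password="secret", database="test")
cur = cnx.cursor()
cur.execute("SELECT c1, c2 FROM demo_test WHERE c1 = %s", ("sensor:42",))
print(cur.fetchone())
cnx.close()
```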

On-Line Schema Changes

Speed, when combined with flexibility, is essential in the world of big data. Complementing NoSQL access, support for on-line DDL (Data Definition Language) operations in both MySQL 5.6 and MySQL Cluster enables DevOps teams to dynamically evolve and update their database schema to accommodate rapidly changing requirements, such as the need to capture additional data generated by their applications. These changes can be made without database downtime. Using the Memcached interface, developers do not need to define a schema at all when using MySQL Cluster.
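As an example, an online schema change is expressed as an ordinary ALTER TABLE carrying the ALGORITHM and LOCK clauses introduced in MySQL 5.6. In the hedged sketch below, the connection details, table and column names are illustrative.

```python
# Sketch: add a column to a live table without blocking reads or writes,
# using MySQL 5.6 online DDL. Table and connection details are illustrative.
import mysql.connector

cnx = mysql.connector.connect(user="app", password="secret", database="iot")
cur = cnx.cursor()

# ALGORITHM=INPLACE avoids copying the table; LOCK=NONE keeps DML flowing
# while the change runs. MySQL raises an error if the change cannot be
# performed online, rather than silently falling back to a blocking copy.
cur.execute(
    "ALTER TABLE sensor_readings "
    "ADD COLUMN humidity DECIMAL(5,2) NULL, "
    "ALGORITHM=INPLACE, LOCK=NONE"
)
cnx.close()
```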

NoSQL for the MySQL Database

As illustrated in the following figure, NoSQL for the MySQL database is implemented via a Memcached daemon plug-in to the mysqld process, with the Memcached protocol mapped to the native InnoDB API.

Figure 3: Memcached API Implementation for InnoDB – clients and applications speak either SQL or the Memcached protocol to the mysqld process; the Memcached plug-in (with an optional local cache) maps requests onto the native InnoDB API of the InnoDB storage engine, alongside the SQL handler API.


With the Memcached code running in the same process space, users can insert and query data at high speed. With simultaneous SQL access, users can maintain all the advanced functionality offered by InnoDB including support for crash-safe transactional storage, Foreign Keys, complex JOIN operations, etc. Benchmarks demonstrate that the NoSQL Memcached API for InnoDB delivers up to 9x higher performance than the SQL interface when inserting new key/value pairs, with a single low-end commodity server³ supporting nearly 70,000 Transactions per Second.

3 The benchmark was run on an 8-core Intel server configured with 16GB of memory and the Oracle Linux operating system.

Figure 4: Over 9x Faster INSERT Operations – MySQL 5.6 NoSQL benchmark plotting Transactions per Second (TPS) against the number of client connections (8 to 512) for the Memcached API versus SQL.

The delivered performance demonstrates MySQL with the native Memcached NoSQL interface is well suited for high-speed inserts with the added assurance of transactional guarantees.

MySQL as an Embedded Database

MySQL is embedded by over 3,000 ISVs and OEMs. It is, for instance, a popular choice in Point of Sale (POS) applications, security appliances, network monitoring equipment and more. In the age of the Internet of Things, those systems are increasingly connected with each other and generate vast amounts of potentially valuable data. More information about MySQL as an embedded database is available at: http://www.mysql.com/oem/

MySQL Cluster

MySQL Cluster has many attributes that make it ideal for new generations of high volume, high velocity applications that acquire data at high speed, including:

In-memory, real-time performance

Auto-sharding across distributed clusters of commodity nodes

Cross-data center geographic replication

Online scaling and schema upgrades

Shared-nothing, fault-tolerant architecture for 99.999% uptime

SQL and NoSQL interfaces

As MySQL Cluster stores tables in network-distributed data nodes, rather than in the MySQL Server, there are multiple interfaces available to access the database. The chart below shows all of the access methods available to the developer. The native API for MySQL Cluster is the C++ based NDB API; all other interfaces access the data through the NDB API. At the extreme left-hand side of the chart, an application has embedded the NDB API library, enabling it to make native C++ calls to the database and therefore delivering the lowest possible latency. At the extreme right-hand side of the chart, MySQL presents a standard SQL interface to the data nodes, providing connectivity to all of the standard MySQL drivers.

Figure 5: Ultimate Developer Flexibility – MySQL Cluster APIs. Clients and applications reach the MySQL Cluster data nodes either through NoSQL interfaces (the native NDB API, Memcached, JavaScript) or through SQL (JDBC / ODBC, PHP / Perl, Python / Ruby and the other standard MySQL drivers).

Whichever API is used to insert or query data, it is important to emphasize that all of these SQL and NoSQL access methods can be used simultaneously, across the same data set, to provide the ultimate in developer flexibility. Benchmarks executed by Intel and Oracle demonstrate the performance advantages that can be realized by combining NoSQL APIs with the distributed, multi-master design of MySQL Cluster⁴.

4 http://mysql.com/why-mysql/benchmarks/mysql-cluster/


1.2 Billion write operations per minute (19.5 million per second) were scaled linearly across a cluster of 30 commodity dual socket (2.6GHz), 8-core Intel servers, each equipped with 64GB of RAM, running Linux and connected via Infiniband. Synchronous replication within node groups was configured, enabling both high performance and high availability – without compromise. In this configuration, each node delivered 650,000 ACID-compliant write operations per second.

Figure 6: MySQL Cluster performance scaling out on commodity nodes – 1.2 billion UPDATEs per minute, plotted as millions of UPDATEs per second against the number of MySQL Cluster data nodes (2 to 30).

These results demonstrate how users can acquire transactional data at high volume and high velocity on commodity hardware using MySQL Cluster. To learn more about the NoSQL APIs for MySQL, and the architecture powering MySQL Cluster, download the Guide to MySQL and NoSQL: http://www.mysql.com/why-mysql/white-papers/mysql-wp-guide-to-nosql.php

MySQL Fabric

MySQL is powering some of the most demanding web applications, collecting enormous amounts of data that can add tremendous value to the businesses capable of harnessing it. MySQL Fabric makes it easier and safer to scale out MySQL databases in order to acquire large amounts of information. While MySQL Replication provides the mechanism to scale out reads (having one master MySQL server handle all writes and then load balancing reads across as many slave MySQL servers as you need), a single server must still handle all of the writes. As modern applications become more and more interactive, the proportion of writes will continue to increase; the ubiquity of social media means that the age of the publish-once, read-a-billion-times web site is over. Add to this the promise offered by Cloud platforms - massive, elastic scaling out of the underlying infrastructure - and you get a huge demand for scaling out to dozens, hundreds or even thousands of servers.

The most common way to scale out is by sharding the data between multiple MySQL Servers; this can be done vertically (each server holding a discrete subset of the tables, say those for a specific set of features) or horizontally (each server holding a subset of the rows for a given table). While effective, sharding has required developers and DBAs to invest a lot of effort in building and maintaining complex logic at the application and management layers, detracting from higher value activities.

The introduction of MySQL Fabric makes all of this far simpler. MySQL Fabric is designed to manage pools of MySQL Servers - whether just a pair for High Availability or many thousands to cope with scaling out huge web applications. MySQL Fabric provides a simple and effective option for High Availability as well as the option of massive, incremental scale-out. It does this without sacrificing the robustness of MySQL and InnoDB, without requiring major application changes, and without needing your DevOps teams to move to unfamiliar technologies or abandon their favorite tools. For more information about MySQL Fabric, get "MySQL Fabric - A Guide to Managing MySQL High Availability & Scaling Out"⁵.

Figure 7: MySQL Fabric: High Availability + Sharding based scale out

5 http://www.mysql.com/why-mysql/white-papers/mysql-fabric-product-guide/
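To give a concrete feel for how an application reaches the right shard, the hedged sketch below uses the Fabric-aware support in MySQL Connector/Python to route a query by shard key. The Fabric node address, credentials, schema, table and key are illustrative assumptions.

```python
# Sketch: connect through MySQL Fabric and route work to the shard that
# holds a given customer. Hosts, ports, credentials and names are illustrative.
import mysql.connector
from mysql.connector import fabric

conn = mysql.connector.connect(
    fabric={"host": "fabric.example.com", "port": 32274,
            "username": "admin", "password": "secret"},
    user="app", password="secret", autocommit=True,
)

# Ask Fabric for a server in the shard holding key 1234 that accepts writes.
conn.set_property(tables=["shop.customers"], key=1234,
                  mode=fabric.MODE_READWRITE)

cur = conn.cursor()
cur.execute("USE shop")
cur.execute("SELECT name FROM customers WHERE id = %s", (1234,))
print(cur.fetchone())
```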


Step 2: Organize Data

Once data has been acquired into MySQL, many users will run real-time analytics across it to yield immediate insight into their operations. They will then want to load or stream the data into their Hadoop platform, where it can be consolidated with data from other sources for processing. In this "Organize" stage, there are two approaches to exporting data from MySQL to Hadoop:

Apache Sqoop (Batch, Bi-Directional)

MySQL Hadoop Applier (Real-Time, Uni-Directional)

Apache Sqoop

Originally developed by Cloudera, Sqoop is now an Apache Top-Level Project⁶. Apache Sqoop is a tool designed for efficiently transferring bulk data between Hadoop and structured datastores such as relational databases. Sqoop can be used to:

1. Import data from MySQL into the Hadoop Distributed File System (HDFS), or related systems such as Hive and HBase.

2. Extract data from Hadoop – typically the results from processing jobs – and export it back to MySQL tables. This will be discussed more in the "Decide" stage of the big data lifecycle.

3. Integrate with Oozie⁷ to allow users to schedule and automate import / export tasks.

Sqoop uses a connector-based architecture that supports plugins providing connectivity between HDFS and external databases. By default Sqoop includes connectors for most leading databases including MySQL and Oracle Database, in addition to a generic JDBC connector that can be used to connect to any database that is accessible via JDBC. Sqoop also includes a specialized fast-path connector for MySQL that uses MySQL-specific batch tools to transfer data with high throughput.

When using Sqoop, the dataset being transferred is sliced up into different partitions and a map-only job is launched with individual mappers responsible for transferring a slice of this dataset. Each record of the data is handled in a type-safe manner since Sqoop uses the database metadata to infer the data types.

6 http://sqoop.apache.org/

7 http://oozie.apache.org/


Figure 8: Importing Data from MySQL to Hadoop using Sqoop

When initiating the Sqoop import, the user provides a connect string for the database and the name of the table to be imported. As shown in the figure above, the import process is executed in two steps:

1. Sqoop analyzes the database to gather the necessary metadata for the data being imported.

2. Sqoop submits a map-only Hadoop job to the cluster. It is this job that performs the actual data transfer using the metadata captured in the previous step.

The imported data is saved in a directory on HDFS based on the table being imported, though the user can specify an alternative directory if they wish. By default the data is formatted as CSV (Comma Separated Values), with new lines separating different records. Users can override the format by explicitly specifying the field separator and record terminator characters. You can see practical examples of importing and exporting data with Sqoop on the Apache blog. Credit goes to the ASF for content and diagrams: https://blogs.apache.org/sqoop/entry/apache_sqoop_overview
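As a concrete illustration, the sketch below launches such an import from Python. The JDBC URL, credentials, table name and target directory are illustrative; the flags shown are standard Sqoop 1 options, with --direct selecting the MySQL fast-path connector mentioned above.

```python
# Sketch: import the MySQL table 'orders' into HDFS with Sqoop.
# All connection details, paths and the table name are illustrative.
import subprocess

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/webshop",
    "--username", "etl",
    "--password-file", "/user/etl/.mysql.pw",   # keep credentials off the command line
    "--table", "orders",
    "--target-dir", "/data/raw/orders",
    "--num-mappers", "4",                       # four parallel map tasks
    "--direct",                                 # MySQL fast-path (mysqldump-based) connector
], check=True)
```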

MySQL Hadoop Applier

Apache Sqoop is a well-proven approach for bulk data loading. However, there are a growing number of use-cases for streaming real-time updates from MySQL into Hadoop for immediate analysis. In addition, the process of bulk loading can place additional demands on production database infrastructure, impacting performance. The MySQL Hadoop Applier is designed to address these issues by performing real-time replication of events between MySQL and Hadoop.


Replication via the Hadoop Applier is implemented by connecting to the MySQL master, reading binary log⁸ events as soon as they are committed on the MySQL master, and writing them into a file in HDFS. "Events" describe database changes such as table creation operations or changes to table data. The Hadoop Applier uses an API provided by libhdfs, a C library to manipulate files in HDFS; the library comes precompiled with Hadoop distributions. The Applier connects to the MySQL master to read the binary log and then:

Fetches the row insert events occurring on the master

Decodes these events, extracts data inserted into each field of the row, and uses content handlers to get it in the format required

Appends it to a text file in HDFS. This is demonstrated in the figure below:

Figure 9: MySQL to Hadoop Real-Time Replication

Databases are mapped as separate directories, with their tables mapped as sub-directories within a Hive data warehouse directory. Data inserted into each table is written into text files (named datafile1.txt) in Hive / HDFS. Data can be written in comma-separated or other formats, configurable by command line arguments.
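To show what this output looks like to a downstream consumer, the sketch below reads one of these files over WebHDFS using the third-party hdfs Python client. The client package, NameNode address, warehouse path and file name follow the layout described above but are illustrative assumptions.

```python
# Sketch: read rows the Hadoop Applier wrote for database 'webshop', table
# 'orders'. Assumes a Hive-style layout (<warehouse>/<db>/<table>/datafile1.txt)
# and a WebHDFS endpoint; host, user and paths are illustrative.
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.com:50070", user="etl")
path = "/user/hive/warehouse/webshop/orders/datafile1.txt"
with client.read(path, encoding="utf-8", delimiter="\n") as reader:
    for line in reader:
        if line:
            print(line.split(","))   # default output is comma separated
```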

8 http://dev.mysql.com/doc/refman/5.6/en/binary-log.html


Figure 10: Mapping between MySQL and HDFS Schema

The installation, configuration and implementation are discussed in detail in the Hadoop Applier blog⁹. Integration with Hive is documented as well.

You can download and evaluate the Hadoop Applier code from MySQL Labs¹⁰ (select the "Hadoop Applier" build from the drop-down menu). Note that this code is currently a technology preview and not certified or supported for production deployment.

Step 3: Analyze Data

Following data acquisition and organization, the Analyze phase is where the raw data is processed in order to extract insight. With our MySQL data in HDFS, it is accessible to the whole ecosystem of Hadoop-related projects, including tools such as Hive, Pig and Mahout. This data could be processed by Map/Reduce jobs in Hadoop to provide a result set that is then loaded directly into other tools to enable the "Decide" stage, or the Map/Reduce outputs can serve as pre-processing before dedicated appliances further analyze the data.

The data could also be processed via Apache Spark¹¹. Apache Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
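As an illustration of the kind of processing this enables on the MySQL-sourced files, the sketch below is a minimal PySpark job (written against the later SparkSession API) that counts orders per customer in the CSV files a Sqoop import might have produced. The HDFS path and column positions are illustrative.

```python
# Sketch: aggregate MySQL-sourced CSV files in HDFS with Spark.
# Path and column positions are illustrative; Sqoop output has no header row,
# so Spark names the columns _c0, _c1, ...
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-by-customer").getOrCreate()

orders = spark.read.csv("hdfs:///data/raw/orders", inferSchema=True)
per_customer = (orders.groupBy("_c1")          # assume _c1 holds the customer id
                      .count()
                      .orderBy("count", ascending=False))
per_customer.show(10)

spark.stop()
```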

9 http://innovating-technology.blogspot.fi/2013/04/mysql-hadoop-applier-part-2.html

10 http://labs.mysql.com

11 http://spark.apache.org/


As we have already seen, Sqoop and the Hadoop Applier are key technologies for connecting MySQL with Hadoop, and they are available for use with multiple Hadoop distributions, e.g. Cloudera, Hortonworks and MapR.

Step 4: Decide

Result sets from Hadoop processing jobs are loaded back into MySQL tables using Apache Sqoop, where they become actionable for the organization. As with the import process, export is performed in two steps, as shown in the figure below:

1. Sqoop analyzes MySQL to gather the necessary metadata for the data being exported.

2. Sqoop divides the dataset into splits and then uses individual map tasks to push the splits to MySQL. Each map task performs this transfer over many transactions in order to ensure optimal throughput and minimal resource utilization.

Figure 11: Exporting Data from Hadoop to MySQL using Sqoop

The user would provide connection parameters for the database when executing the Sqoop export process, along with the HDFS directory from which data will be exported and the name of the MySQL table to be populated. Once the data is in MySQL, it can be consumed by BI tools such as Oracle Business Intelligence solutions, Pentaho, JasperSoft, Talend, etc. to populate dashboards and reporting software.
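A corresponding export invocation might look like the hedged sketch below; the JDBC URL, credentials, HDFS directory and table name are again illustrative, and the flags are standard Sqoop 1 options.

```python
# Sketch: push a Hadoop result set (e.g., per-customer recommendations)
# back into a MySQL table with Sqoop. All names and paths are illustrative.
import subprocess

subprocess.run([
    "sqoop", "export",
    "--connect", "jdbc:mysql://db.example.com/webshop",
    "--username", "etl",
    "--password-file", "/user/etl/.mysql.pw",
    "--table", "recommendations",               # target MySQL table
    "--export-dir", "/data/output/recommendations",
    "--num-mappers", "4",
], check=True)
```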

In many cases, the results can be used to control a real-time operational process that uses MySQL as its database. Continuing with the on-line retail example cited earlier, a Hadoop analysis would have been able to identify specific user preferences. Sqoop can be used to load this data back into MySQL, and so when the user accesses the site in the future, they will receive offers and recommendations based on their preferences and behavior during previous visits.

The following diagram shows the total workflow within a web architecture.


Figure 12: MySQL & Hadoop Integration – Driving a Personalized Web Experience

Integrated Oracle Solution

Oracle also offers an integrated portfolio of big data products. For instance, for Web data acquired in MySQL, the picture could be the following:

Acquire: Web data acquired in MySQL

Organize: Organized with the Oracle Big Data Appliance

Analyze: Analyzed with Oracle Exadata

Decide: Decide using Oracle Exalytics

You can learn more about Oracle Big Data solutions here: http://www.oracle.com/us/technologies/big-data/index.html

MySQL Enterprise Edition is integrated and certified with the following products:

Oracle Enterprise Manager

Oracle GoldenGate

Oracle Secure Backup

Oracle Audit Vault & Database Firewall

Oracle Fusion Middleware


My Oracle Support

Oracle Linux

Oracle VM

Oracle Clusterware

4. MySQL Big Data Best Practices

MySQL 5.6: Enhanced Data Analytics

MySQL 5.6 includes a host of new capabilities that enhance MySQL when deployed as part of a Big Data pipeline:

MySQL Optimizer: Significant performance improvements for complex analytical queries. A combination of Batched Key Access, Multi-Range Read, Index Condition Pushdown, Subquery and File Sort optimizations has been proven to increase performance by over 250 times.

Improved diagnostics including Optimizer Traces, Performance Schema instrumentation and enhanced EXPLAIN functions enable developers to further tune their queries for highest throughput and lowest latency.

Full Text Search support for the InnoDB storage engine increases the range of queries and workloads that MySQL can serve.

Improved Security with major enhancements to how passwords are implemented, managed and encrypted further protects access to your most sensitive data.

For more details on those capabilities and MySQL 5.6, get the following Guide: http://www.mysql.com/why-mysql/white-papers/whats-new-mysql-5-6/
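As a small illustration of the Optimizer Trace diagnostics listed above, the sketch below captures the optimizer's decision log for a query on a MySQL 5.6 server. The connection details and the query itself are illustrative.

```python
# Sketch: enable the optimizer trace, run the query being analyzed, then read
# the JSON trace from INFORMATION_SCHEMA. Connection details are illustrative.
import mysql.connector

cnx = mysql.connector.connect(user="app", password="secret", database="webshop")
cur = cnx.cursor()

cur.execute("SET optimizer_trace='enabled=on'")
cur.execute("SELECT c.name, SUM(o.total) FROM customers c "
            "JOIN orders o ON o.customer_id = c.id GROUP BY c.name")
cur.fetchall()                              # execute the query under analysis

cur.execute("SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE")
print(cur.fetchone()[0])                    # JSON document describing plan choices
cur.execute("SET optimizer_trace='enabled=off'")
cnx.close()
```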

MySQL Enterprise Edition

For MySQL applications that are part of a Big Data infrastructure, the technical support, advanced features and management tools included in MySQL Enterprise Edition will help you achieve the highest levels of MySQL performance, scalability, security and uptime. In addition to the MySQL Database, MySQL Enterprise Edition includes:

The MySQL Enterprise Monitor:

The MySQL Enterprise Monitor provides at-a-glance views of the health of your MySQL databases. It continuously monitors your MySQL servers and alerts you to potential problems before they impact your system. It’s like having a “virtual DBA” assistant at your side to recommend best practices and eliminate security vulnerabilities, improve replication, and optimize performance. As a result, DBAs and system administrators can manage more servers in less time.


The MySQL Enterprise Monitor is a web-based application that can manage MySQL within the safety of a corporate firewall or remotely in a public cloud. MySQL Enterprise Monitor provides:

Performance & Availability Monitoring - Continuously monitor MySQL queries and performance related server metrics

Visual Query Analysis – Monitor query performance and pinpoint SQL code that is causing a slow-down

InnoDB Monitoring - Monitor key InnoDB metrics that impact MySQL performance

MySQL Cluster Monitoring - Monitor key MySQL Cluster metrics that impact performance and availability

Replication Monitoring – Gain visibility into the performance, and health of all MySQL Masters and Slaves

Backup Monitoring – Ensure your online, hot backups are running as expected

Disk Monitoring – Forecast future capacity requirements using trend analysis and projections.

Security Monitoring - Identify and resolve security vulnerabilities across all MySQL servers

Operating System Monitoring - Monitor operating system level performance metrics such as load average, CPU usage, RAM usage and swap usage

As noted earlier, it is also possible to monitor MySQL via Oracle Enterprise Manager.

The MySQL Query Analyzer

The MySQL Query Analyzer helps developers and DBAs improve application performance by monitoring queries and accurately pinpointing SQL code that is causing a slowdown. Using the Performance Schema with MySQL Server 5.6, data is gathered directly from the MySQL server without the need for any additional software or configuration. Queries are presented in an aggregated view across all MySQL servers so DBAs and developers can filter for specific query problems and identify the code that consumes the most resources. With the MySQL Query Analyzer, DBAs can improve the SQL code during active development and continuously monitor and tune the queries in production.
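The statement digest tables that this analysis builds on can also be queried directly from the Performance Schema; the hedged sketch below lists the most expensive digests on a MySQL 5.6 server, with illustrative connection details.

```python
# Sketch: list the five statement digests with the highest total latency.
# SUM_TIMER_WAIT is reported in picoseconds, hence the division by 1e12.
import mysql.connector

cnx = mysql.connector.connect(user="dba", password="secret")
cur = cnx.cursor()
cur.execute("""
    SELECT DIGEST_TEXT,
           COUNT_STAR,
           ROUND(SUM_TIMER_WAIT / 1e12, 3) AS total_latency_s
    FROM performance_schema.events_statements_summary_by_digest
    ORDER BY SUM_TIMER_WAIT DESC
    LIMIT 5
""")
for digest, executions, latency in cur.fetchall():
    print(f"{latency:>10}s  {executions:>8}x  {(digest or '')[:80]}")
cnx.close()
```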


MySQL Workbench Enterprise Edition

MySQL Workbench is a unified visual tool that enables developers, DBAs, and data architects to design, develop and administer MySQL databases. MySQL Workbench provides advanced data modeling, a flexible SQL editor, and comprehensive administrative tools.

MySQL Workbench allows you to:

Design: MySQL Workbench includes everything a data modeler needs for creating complex ER models, forward and reverse engineering, and also delivers key features for performing difficult change management and documentation tasks that normally require much time and effort.

Develop: MySQL Workbench delivers visual tools for creating, executing, and optimizing SQL queries. The SQL Editor provides color syntax highlighting, reuse of SQL snippets, and execution history of SQL. The Database Connections Panel enables developers to easily manage database connections. The Object Browser provides instant access to database schema and objects.

Administer: MySQL Workbench provides a visual console to easily administer MySQL environments and gain better visibility into databases. Developers and DBAs can use the visual tools for configuring servers, administering users, and viewing database health.

Migrate: MySQL Workbench now provides a complete, easy to use solution for migrating Microsoft SQL Server, Microsoft Access, Sybase ASE, PostgreSQL, and other RDBMS tables, objects and data to MySQL. Developers and DBAs can quickly and easily convert existing applications to run on MySQL. Migration also supports migrating from earlier versions of MySQL to the latest releases.

MySQL Enterprise Backup

MySQL Enterprise Backup performs online, non-blocking "hot" backups of your MySQL databases. You get a consistent backup copy of your database to recover your data to a precise point in time. In addition, MySQL Enterprise Backup supports creating compressed backup files, and performing backups of subsets of InnoDB tables. Compression typically reduces backup size by up to 90% when compared with the size of actual database files, helping to reduce storage costs. In conjunction with the MySQL binlog, users can perform point-in-time recovery.

MySQL Enterprise Scalability

MySQL Enterprise Scalability enables you to meet the sustained performance and scalability requirements of ever increasing user, query and data loads. The MySQL Thread Pool provides an efficient, thread-handling model designed to reduce overhead in managing client connections, and statement execution threads.

MySQL Enterprise Authentication

MySQL Enterprise Authentication provides ready-to-use external authentication modules to easily integrate MySQL with existing security infrastructures, including PAM and Windows Active Directory. MySQL users can be authenticated using Pluggable Authentication Modules ("PAM") or native Windows OS services.

MySQL Enterprise Encryption

To protect sensitive data throughout its lifecycle, MySQL Enterprise Encryption provides industry standard functionality for asymmetric encryption (Public Key Cryptography). MySQL Enterprise Encryption provides encryption, key generation, digital signatures and other cryptographic features to help organizations protect confidential data and comply with regulatory requirements including HIPAA, Sarbanes-Oxley, and the PCI Data Security Standard.

MySQL Enterprise Firewall

MySQL Enterprise Firewall guards against cyber security threats by providing real-time protection against database-specific attacks, such as SQL injection. MySQL Enterprise Firewall monitors for database threats, automatically creates a whitelist of approved SQL statements and blocks unauthorized database activity.

MySQL Enterprise Audit

MySQL Enterprise Audit enables you to quickly and seamlessly add policy-based auditing compliance to new and existing applications. You can dynamically enable user-level activity logging, implement activity-based policies, manage audit log files and integrate MySQL auditing with Oracle and third-party solutions.

MySQL Enterprise High Availability

MySQL Enterprise High Availability enables you to make your database infrastructure highly available. MySQL provides you with certified and supported solutions.

Oracle Premier Support for MySQL

MySQL Enterprise Edition provides 24x7x365 access to Oracle’s MySQL Support team, staffed by database experts ready to help with the most complex technical issues, and backed by the MySQL developers. Oracle’s Premier support for MySQL provides you with:

24x7x365 phone and online support

Rapid diagnosis and solution to complex issues

Unlimited incidents

Emergency hot fix builds forward compatible with future MySQL releases

Access to Oracle’s MySQL Knowledge Base

Consultative support services

The ability to get MySQL support in 29 languages

In addition to MySQL Enterprise Edition, the following services may also be of interest to Big Data professionals:

Oracle University

Oracle University offers an extensive range of MySQL training, from introductory courses (e.g. MySQL Essentials, MySQL DBA) through to advanced certifications such as MySQL Performance Tuning and MySQL Cluster Administration. It is also possible to define custom training plans for delivery on-site. You can learn more about MySQL training from Oracle University here: http://www.mysql.com/training/

MySQL Consulting

To ensure best practices are leveraged from the initial design phase of a project through to implementation and ongoing operations, users can engage Professional Services consultants. Delivered remotely or onsite, these engagements help in optimizing the architecture for scalability, high availability and performance. You can learn more at http://www.mysql.com/consulting/


Conclusion

Big Data and the Internet of Things are generating significant transformations in the way organizations capture and analyze new and diverse data streams. As this paper has discussed, MySQL can be seamlessly integrated within a Big Data lifecycle. Using MySQL solutions with the Hadoop platform and following the best practices outlined in this document can enable you to yield more insight than was ever previously imaginable.

Additional Resources

MySQL Whitepapers: http://www.mysql.com/why-mysql/white-papers/

MySQL Webinars – Live: http://www.mysql.com/news-and-events/web-seminars/index.html

MySQL Webinars – On Demand: http://www.mysql.com/news-and-events/on-demand-webinars/

MySQL Enterprise Edition Demo: http://www.youtube.com/watch?v=guFOVCOaaF0

MySQL Cluster Demo: https://www.youtube.com/watch?v=A7dBB8_yNJI

MySQL Enterprise Edition Trial: http://www.mysql.com/trials/

MySQL Case Studies: http://www.mysql.com/why-mysql/case-studies/

MySQL TCO Savings Calculator: http://mysql.com/tco

To contact an Oracle MySQL Representative: http://www.mysql.com/about/contact/