B-4: Gravty (LINE Corporation)
Transcript of B-4: Gravty
1 What Is Gravty?
2 The Internals of Gravty
3 Fine-Tuning Gravty
4 Future Plans
A Graph Database Is
“A graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.” (Wikipedia)
Stores objects (vertices) and relationships (edges)
Provides graph search capabilities
Vertices and Edges in a Graph Database
[Figure: example graph with vertices connected by “Friends” and “Likes” edges]
Use Cases of a Graph Database
Facebook Social Graph
Social networks
Google PageRank
Ranking websites
Walmart and eBay
Product recommendation
Need for a Large Graph Database System
Social Graph LINE Timeline
LINE Talk Ranking
Recommendation
LINE Friends Shop
LINE News
Gravty
• 7 billion vertices
• 100 billion edges
• 200 billion indexes
• 5 billion writes a day (create / update / delete)
Gravty Is
A scalable graph database for efficiently retrieving relational information from a large pool of data using graph search.
Requirements for Gravty
Easy to scale out
• To support ever-increasing data
Easy to develop
• Add, modify, and remove features as necessary
• Tailored to the LINE development environment
• Not dependent on LINE-specific components
Full control over everything!
Easy to use
• Graph query language
• REST API
1 What Is Gravty? 2 The Internals of Gravty 3 Fine-Tuning Gravty 4 Future Plans
Technology Stack and Architecture
Data Model
Technology Stack and Architecture
[Figure: layered architecture. Applications talk to Gravty through the TinkerPop3 Gremlin-Console and the TinkerPop 3.2.0 Graph API. Gravty itself consists of a Graph Processing Layer (OLTP only) on top of a Storage Layer backed by HBase 1.1.x, local memory, Kafka 0.10.0.0, and Phoenix 4.8.0, with MySQL holding config and metadata.]
[Figure: beneath the Graph Processing Layer, the Gravty Storage Layer exposes an abstract interface with two implementations: the Phoenix Repository (default) and the Memory Repository (standalone).]
Data Model: Flat-Wide Table
• Row key: vertex-id
• Edges are stored in columns
• Disadvantages: column scan is slow; columns cannot be split

Row vertex-id1: property, property, edge, edge, edge, edge, edge, edge
Row vertex-id2: …
Row vertex-id3: …
Data Model: Tall-Narrow Table (Gravty)
• Row key: edge-id (SrcVertexId-Label-TgtVertexId)
• Edges are stored in rows
• Advantages: more effective edge scan; parallel execution

Row svtxid1-label-tvtxid2: edge property, edge property
Row svtxid1-label-tvtxid3: …
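The tall-narrow layout can be sketched in a few lines: one row per edge keyed `SrcVertexId-Label-TgtVertexId`, so an out() traversal becomes a single prefix scan over sorted keys. This is a minimal illustration with a plain dict standing in for HBase; the function and variable names are ours, not Gravty’s API.

```python
# Sketch of the tall-narrow edge layout: one row per edge, keyed
# "srcVertexId-label-tgtVertexId". A dict stands in for the HBase table.

def edge_row_key(src: str, label: str, tgt: str) -> str:
    """Compose the tall-narrow row key for a single edge."""
    return f"{src}-{label}-{tgt}"

def scan_out_edges(table: dict, src: str, label: str, limit: int):
    """Prefix scan: all out-edges of `src` with `label` share a key prefix,
    so one sorted row scan answers out() without any column scan."""
    prefix = f"{src}-{label}-"
    keys = sorted(k for k in table if k.startswith(prefix))[:limit]
    return [k[len(prefix):] for k in keys]  # target vertex ids

# Toy table, keyed like the slide's svtxid-label-tvtxid examples.
edges = {edge_row_key("brown", "friends", t): {} for t in ("cony", "moon", "sally")}
edges[edge_row_key("brown", "likes", "cony")] = {}

print(scan_out_edges(edges, "brown", "friends", 3))  # ['cony', 'moon', 'sally']
```

Because the keys are sorted, the “likes” edge never enters the scan; that is exactly the advantage over the flat-wide column scan described below.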
Flat-Wide vs Tall-Narrow
g.V("brown").out("friends").id().limit(3)
[Figure: Brown has “friends” edges to Cony, Moon, and Sally; the query returns [cony, moon, sally]]
Flat-Wide vs Tall-Narrow: Flat-Wide Model
(1) Row scan to find Brown’s row, then (2) column scan to pick out the ‘friends’ edges from the ‘likes’ edges: 2 operations → [cony, moon, sally]
Flat-Wide vs Tall-Narrow: Tall-Narrow Model (Gravty)
(1) A single row scan over brown-friends-cony, brown-friends-moon, brown-friends-sally: 1 operation → [cony, moon, sally]
• Can split by rows (region)
• Can isolate hotspot rows
• Can scan in parallel
Flat-Wide vs Tall-Narrow
g.V("brown").out("friends").out("friends").id().limit(10)
4 searches in total
• Flat-Wide = 8 operations
• Tall-Narrow (Gravty) = 4 operations
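The operation counts follow directly from the two layouts: each vertex search costs a row scan plus a column scan in flat-wide, but only one prefix row scan in tall-narrow. The arithmetic for the two-hop query above (1 search for brown plus 3 for his friends):

```python
# Arithmetic behind the two-hop example
# g.V("brown").out("friends").out("friends").id().limit(10):
# 4 vertex searches in total (1 for brown + 3 for his friends).
searches = 4
flat_wide_ops = searches * 2    # per search: row scan + column scan
tall_narrow_ops = searches * 1  # per search: one prefix row scan
print(flat_wide_ops, tall_narrow_ops)  # 8 4
```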
1 What Is Gravty? 2 The Internals of Gravty 3 Fine-Tuning Gravty 4 Future Plans
Faster, Compact Querying
Avoiding Hot-Spotting
Efficient Secondary Indexing
Faster, Compact Querying
g.V(brown).hasLabel("user").out("friends").order().by("name", Order.incr).limit(5)
Reducing graph traversal steps
• TinkerPop: GraphStep → VertexStep → FilterStep → RangeStep → FilterStep
• Gravty: GGraphStep → GVertexStep
Faster, Compact Querying
g.V(brown).outE("friends").limit(5).inV().order().by("name", Order.incr).properties("name")
inV(): pipelined iterator from outE()
• TinkerPop: sequential consuming
• Gravty: parallel querying + pre-loading vertex properties
Querying in parallel and pre-loading vertex properties
[Figure: outE("friends").limit(5) streams edges whose inV() target vertices (“Boss”, “Edward”, “Moon”, “James”, “Jessica”, “Cony”, “Sally”) have their properties loaded in parallel]
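The parallel inV() idea can be sketched as follows: take at most `limit` edges from the pipelined outE() iterator, then resolve all target vertices concurrently instead of one by one. This is a toy model, assuming a hypothetical `fetch_vertex` lookup in place of a real HBase read:

```python
# Sketch of parallel inV(): rather than resolving target vertices
# sequentially (as a naive pipelined iterator would), fetch the first
# `limit` edge targets, then pre-load their properties concurrently.
from concurrent.futures import ThreadPoolExecutor

VERTICES = {  # hypothetical vertex store
    "cony": {"name": "Cony"},
    "moon": {"name": "Moon"},
    "sally": {"name": "Sally"},
}

def fetch_vertex(vid: str) -> dict:
    """Stand-in for a per-vertex HBase property lookup."""
    return VERTICES[vid]

def in_v_parallel(edge_targets, limit: int):
    """Consume at most `limit` pipelined edges, then load the target
    vertices' properties in parallel; map() preserves input order."""
    targets = list(edge_targets)[:limit]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fetch_vertex, targets))

print(in_v_parallel(iter(["cony", "moon", "sally"]), 5))
```

The ordering step afterwards (order().by("name")) then runs over vertices whose properties are already in memory.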
Hot-Spotting Problem with HBase RegionServers
Row keys that have sequential order may cause RegionServers to suffer:
• Heavy loads of writes or reads
• Inefficient region splitting

EDGE TABLE
SrcVertexId  Label  TgtVertexId
u000001      1      u000002
u000001      1      u000003
u000002      1      u000001
u000003      1      u000001
u000004      2      u000009
Avoiding Hot-Spotting
Solutions to the hot-spotting problem:
• Pre-splitting regions
• Salting row keys with a hashed prefix (salted tables in Apache Phoenix)

But there is a scan performance issue with the LIMIT clause:
SELECT * FROM index … LIMIT 100;
Avoiding Hot-Spotting: Phoenix Salted Table
[Figure: with 4 salt buckets, a LIMIT 100 query makes the Phoenix client scan 100 rows from each bucket (up to 400 rows in total) and merge-sort the results client-side]
Avoiding Hot-Spotting: Custom Salting + Pre-Splitting
Row key prefix = hash(source-vertex-id)
[Figure: all rows for one source vertex share a single hashed prefix, so the Phoenix client scans 100 rows sequentially from one region]
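The custom salting scheme above can be sketched as follows: prefix each edge row key with a hash of the source vertex id, so writes spread across pre-split regions while all edges of one vertex still land in the same bucket and can be scanned sequentially (unlike Phoenix salted tables, which scan every bucket and merge-sort client-side). The bucket count and key format here are illustrative:

```python
# Sketch of custom row-key salting: bucket = hash(source-vertex-id),
# so one vertex's edges stay contiguous while vertices spread out.
import hashlib

BUCKETS = 16  # illustrative; real bucket counts match region pre-splits

def salt(source_vertex_id: str) -> int:
    """Deterministic bucket for a source vertex (first byte of MD5)."""
    return hashlib.md5(source_vertex_id.encode()).digest()[0] % BUCKETS

def salted_row_key(src: str, label: str, tgt: str) -> str:
    """Tall-narrow key with a hashed prefix: bucket-src-label-tgt."""
    return f"{salt(src):02x}-{src}-{label}-{tgt}"

# Every out-edge of "u000001" shares one bucket prefix -> one region scan.
keys = [salted_row_key("u000001", "1", t) for t in ("u000002", "u000003")]
assert len({k.split("-")[0] for k in keys}) == 1
print(keys)
```

Because the prefix depends only on the source vertex, a LIMIT query for one vertex touches a single bucket instead of all of them.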
Efficient Secondary Indexing
• Indexed graph view for faster graph search
• Asynchronous index processing using Kafka
• Tools for failure recovery
Default Phoenix IndexCommitter
[Figure: the Phoenix Driver sends Put/Delete mutations; the Indexer Coprocessor on each HRegion issues index updates directly to every other RegionServer]
• Synchronous processing of index update requests
• numConnections = regionServers * regionServers * needConnections
• Too many connections on each RegionServer (the network is heavily congested)
Gravty IndexCommitter
[Figure: the Indexer Coprocessor publishes mutations to Kafka; a small pool of Indexers consumes them and applies the index updates to the RegionServers]
• Asynchronous processing using Kafka
• numConnections = indexers * regionServers * needConnections
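The two connection formulas make the difference concrete. With the default committer every RegionServer talks to every other RegionServer, so connections grow quadratically; routing mutations through a small indexer pool makes growth linear in the cluster size. The cluster numbers below are illustrative, not from the talk:

```python
# Connection-count comparison between the two IndexCommitters,
# using the formulas from the slides. Cluster sizes are illustrative.

def default_connections(region_servers: int, need: int) -> int:
    """Default Phoenix: every RegionServer connects to every RegionServer."""
    return region_servers * region_servers * need

def gravty_connections(indexers: int, region_servers: int, need: int) -> int:
    """Gravty: only the indexer pool connects to the RegionServers."""
    return indexers * region_servers * need

rs, need, indexers = 40, 4, 5
print(default_connections(rs, need))           # 6400
print(gravty_connections(indexers, rs, need))  # 800
```

With these (assumed) numbers the indexer path needs 1/8 of the connections, matching the order of reduction reported in the metrics slide.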
Default Phoenix IndexCommitter
1. Phoenix client UPSERT: PUT into the primary table through its RegionServer’s Phoenix Coprocessor
2. The coprocessor requests HBase mutations (PUT / DELETE) for INDEX 1 and INDEX 2 in parallel
3. The Phoenix client RETURNs only after the index mutations complete
Gravty IndexCommitter
1. Phoenix client PUT into the primary table
2. The Phoenix Coprocessor writes the HBase mutations for INDEX 1 and INDEX 2 to Kafka
3. The client RETURNs immediately
4. The Kafka Index Consumer consumes the mutations
5. The consumer applies the PUT / DELETE operations to INDEX 1 and INDEX 2
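The five steps above can be sketched as a tiny simulation, with a deque standing in for the Kafka topic and a dict for the HBase index table; all names here are ours, not Gravty’s:

```python
# Minimal sketch of the asynchronous index path: the coprocessor
# publishes index mutations to a log (Kafka in Gravty; a deque here)
# and returns at once; a separate consumer applies them later.
from collections import deque

log = deque()     # stands in for a Kafka topic
index_table = {}  # stands in for an HBase index table

def commit(row_key: str, mutation: dict) -> str:
    """Steps 2-3: enqueue the index mutation and return without waiting."""
    log.append((row_key, mutation))
    return "RETURN"  # the client sees success before the index is updated

def index_consumer() -> None:
    """Steps 4-5: drain the log and apply PUT / DELETE to the index."""
    while log:
        key, mutation = log.popleft()
        if mutation.get("op") == "DELETE":
            index_table.pop(key, None)
        else:
            index_table[key] = mutation["value"]

commit("brown-friends-cony", {"op": "PUT", "value": 1})
index_consumer()
print(index_table)  # {'brown-friends-cony': 1}
```

The trade-off is the usual one for asynchronous indexing: the primary write path stays fast, while the index is only eventually consistent, which is why the failure-recovery tooling below matters.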
Secondary Indexing Metrics
• Server TPS: 3x
• Number of connections per RegionServer: 1/8
• Reentrant event processing: every row is versioned in HBase (timestamp)
• Logging failures and replaying failed requests
• Time machine to resume at a certain runtime: resetting the runtime offset of Kafka consumers
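Reentrant processing is what makes replaying from Kafka safe: each mutation carries the HBase row timestamp, and a consumer only applies it if it is not older than what the index already holds, so replayed or duplicated events cannot roll the index backwards. A minimal sketch of that idea (names are illustrative):

```python
# Sketch of reentrant (idempotent) index replay using row timestamps:
# stale events are skipped, so replaying a Kafka backlog is safe.

index = {}  # row_key -> (timestamp, value)

def apply(row_key: str, ts: int, value) -> bool:
    """Apply a mutation only if it is at least as new as the stored one."""
    current = index.get(row_key)
    if current and current[0] > ts:
        return False  # stale replay: skip without touching the index
    index[row_key] = (ts, value)
    return True

apply("brown", ts=2, value="v2")
assert apply("brown", ts=1, value="v1") is False  # old event replayed: ignored
assert index["brown"] == (2, "v2")
```

Replaying the same event twice simply rewrites the same value, which is exactly the property the “time machine” offset reset relies on.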
Best-Effort Failover: fail fast, fix later
Monitoring Tools for Failure Recovery
Setting alerts and displaying metrics:
• Prometheus
• Dropwizard metrics
• jvm_exporter
• Grafana
• Ambari
1 What Is Gravty? 2 The Internals of Gravty 3 Fine-Tuning Gravty 4 Future Plans
Multiple Graph Clusters
Before: Client → Graph API → Gravty → a single HBase cluster
After: Client → Graph API → Gravty → multiple HBase clusters
HBase Repository
[Figure: the Storage Layer’s abstract interface gains an HBase Repository alongside the Memory Repository (standalone) and the Phoenix Repository (default), built on HBase with a Phoenix Region Coprocessor and local memory]
OLAP Functionality
• Graph analytics system for graph computation
• TinkerPop Graph Computing API
We will open source Gravty