AlluxioSparkHadoopHDFS
Alluxio PMC, Maintainer
2017/03/25 @ China Hadoop Summit 2017
Alluxio1.4
AlluxioSpark DataFrame/RDD
AlluxioHDFSSLA
BIG DATA ECOSYSTEM Yesterday
BIG DATA ECOSYSTEM Today
BIG DATA ECOSYSTEM Issue
BIG DATA ECOSYSTEM With Alluxio
Alluxio
Alluxio (memory-centric)
Alluxio
Alluxio
December 2012: Alluxio (Tachyon) 0.1.0
January 2017: Alluxio 1.4
Alluxio
Since April 2013: more than 400 contributors from over 100 organizations, including Alluxio, IBM, Intel, Red Hat, UC Berkeley, and Yahoo
Popular Open Source Projects Growth
PASALab
INDUSTRY ADOPTION
Alluxio
Master-Worker Master
Worker
Worker
MEM, SSD, HDD
Client
MasterWorker
Under File System
[Figure: Alluxio deployment across node 1, node 2, and node 3, with a Master, a Client, and Worker1, Worker2, Worker3; each Worker manages MEM, SSD, and HDD storage]
Alluxio
Inode Tree
Inode
Inodeid/
/
Dir0/ Dir1/
Dir2/ File1File0
name : File1
List : Block0, Block1, ...
checkPointPath : hdfs://xxx:yyy/zzz
...
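The inode tree above can be illustrated with a small in-memory model. This is only a sketch under assumptions: the class and field names (`InodeTreeSketch`, `Inode`, `blockIds`, `checkpointPath`) and the example path are hypothetical and do not reflect Alluxio's internal classes; it only mirrors the slide's idea that each file inode carries a name, a block list, and a checkpoint path in the under file system.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy inode tree; every name here is illustrative, not Alluxio's implementation. */
final class InodeTreeSketch {
    /** One node: a directory (uses children) or a file (uses blockIds). */
    static final class Inode {
        final long id;
        final String name;
        final boolean directory;
        final Map<String, Inode> children = new LinkedHashMap<>();
        final List<Long> blockIds = new ArrayList<>();
        String checkpointPath; // location in the under file system, may be null

        Inode(long id, String name, boolean directory) {
            this.id = id;
            this.name = name;
            this.directory = directory;
        }
    }

    private long nextId = 0;
    final Inode root = new Inode(nextId++, "", true);

    /** Creates a file at an absolute path, creating missing parent directories. */
    Inode createFile(String path, String checkpointPath) {
        String[] parts = split(path);
        if (parts.length == 0) {
            throw new IllegalArgumentException("path must name a file: " + path);
        }
        Inode cur = root;
        for (int i = 0; i < parts.length - 1; i++) {
            cur = cur.children.computeIfAbsent(parts[i], n -> new Inode(nextId++, n, true));
            if (!cur.directory) {
                throw new IllegalArgumentException(parts[i] + " is not a directory");
            }
        }
        String name = parts[parts.length - 1];
        Inode file = new Inode(nextId++, name, false);
        file.checkpointPath = checkpointPath;
        cur.children.put(name, file);
        return file;
    }

    /** Resolves a path to its inode, or returns null if a component is missing. */
    Inode lookup(String path) {
        Inode cur = root;
        for (String part : split(path)) {
            cur = cur.children.get(part);
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }

    private static String[] split(String path) {
        String trimmed = path.replaceAll("^/+|/+$", "");
        return trimmed.isEmpty() ? new String[0] : trimmed.split("/+");
    }

    public static void main(String[] args) {
        InodeTreeSketch tree = new InodeTreeSketch();
        Inode file = tree.createFile("/Dir0/Dir2/File1", "hdfs://xxx:yyy/zzz");
        file.blockIds.add(0L);
        file.blockIds.add(1L);
        Inode found = tree.lookup("/Dir0/Dir2/File1");
        System.out.println(found.name + " blocks=" + found.blockIds
            + " checkpoint=" + found.checkpointPath);
    }
}
```

Path lookup walks one map per directory level, so the cost grows with path depth rather than with the total number of files.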
Alluxio
ReadType ---
WriteType ---
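ReadType
CACHE_PROMOTE: if the data is already on a Worker, move it to that Worker's top storage tier; if it is not on the local Worker, cache a copy on the local Worker
CACHE: if the data is not on the local Worker, cache a copy on the local Worker
NO_CACHE: read the data without creating a cached copy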
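WriteType
CACHE_THROUGH: write synchronously to a Worker and to the under file system
MUST_CACHE: write only to a Worker, not to the under file system
THROUGH: write only to the under file system, not to a Worker
ASYNC_THROUGH: write synchronously to a Worker, then asynchronously to the under file system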
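To make the difference between the four write types concrete, here is a toy model of where the bytes land for each one. It is a sketch under assumptions: the enum is a local stand-in that only reuses the names from the slide, and the two maps (`worker`, `underFs`) and the `flushAsync` method are hypothetical simplifications of Worker storage and the under file system, not Alluxio's client code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of write destinations; not Alluxio's actual client implementation. */
final class WriteTypeSketch {
    /** Local stand-in that reuses the names shown on the slide. */
    enum WriteType { CACHE_THROUGH, MUST_CACHE, THROUGH, ASYNC_THROUGH }

    final Map<String, byte[]> worker = new HashMap<>();        // Alluxio Worker storage
    final Map<String, byte[]> underFs = new HashMap<>();       // under file system
    final Map<String, byte[]> pendingAsync = new HashMap<>();  // queued for later persistence

    void write(String path, byte[] data, WriteType type) {
        switch (type) {
            case CACHE_THROUGH:   // both copies exist when the write returns
                worker.put(path, data);
                underFs.put(path, data);
                break;
            case MUST_CACHE:      // Worker only; lost if the Worker's copy is lost
                worker.put(path, data);
                break;
            case THROUGH:         // under file system only; nothing is cached
                underFs.put(path, data);
                break;
            case ASYNC_THROUGH:   // Worker now, under file system later
                worker.put(path, data);
                pendingAsync.put(path, data);
                break;
            default:
                throw new IllegalArgumentException("unknown write type: " + type);
        }
    }

    /** Simulates the background persistence step used by ASYNC_THROUGH. */
    void flushAsync() {
        underFs.putAll(pendingAsync);
        pendingAsync.clear();
    }

    public static void main(String[] args) {
        WriteTypeSketch fs = new WriteTypeSketch();
        fs.write("/a", new byte[] {1}, WriteType.ASYNC_THROUGH);
        System.out.println("persisted before flush: " + fs.underFs.containsKey("/a"));
        fs.flushAsync();
        System.out.println("persisted after flush: " + fs.underFs.containsKey("/a"));
    }
}
```

The trade-off the model shows is durability versus write latency: only CACHE_THROUGH and THROUGH have a persisted copy at the moment the write returns.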
Alluxio
Master: ZooKeeper-based leader election among multiple Masters
Journal : EditLog + Image
Worker Master
Checkpoint & Lineage
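The "Journal : EditLog + Image" idea can be sketched as follows: a recovering Master loads the last image (a snapshot of the metadata) and then replays the edit log entries written after it. This is a generic illustration under assumptions; the `Entry` type, the `CREATE`/`DELETE` operation names, and the use of a set of paths as "metadata" are hypothetical and are not Alluxio's journal format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy image-plus-edit-log recovery; the entry format is hypothetical. */
final class JournalSketch {
    /** One logged metadata change. */
    static final class Entry {
        final String op;   // "CREATE" or "DELETE" in this sketch
        final String path;

        Entry(String op, String path) {
            this.op = op;
            this.path = path;
        }
    }

    /** Rebuilds the metadata state: start from the image, then replay the log in order. */
    static Set<String> recover(Set<String> image, List<Entry> editLog) {
        Set<String> state = new TreeSet<>(image);
        for (Entry e : editLog) {
            if ("CREATE".equals(e.op)) {
                state.add(e.path);
            } else if ("DELETE".equals(e.op)) {
                state.remove(e.path);
            } else {
                throw new IllegalArgumentException("unknown op: " + e.op);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        Set<String> image = new TreeSet<>();
        image.add("/Dir0/File0");
        List<Entry> log = new ArrayList<>();
        log.add(new Entry("CREATE", "/Dir1/File1"));
        log.add(new Entry("DELETE", "/Dir0/File0"));
        System.out.println(recover(image, log));
    }
}
```

Writing the recovered state out as a new image lets the old edit log be discarded, which keeps replay time bounded.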
bin/alluxio fs [command]
cat
chmod
copyFromLocal
copyToLocal
fileInfo
ls
Path format: alluxio://<host>:<port>/<path>
Wildcard * is supported, e.g. `bin/alluxio fs rm /data/2014*`
mkdir
mv
rm
touch
mount
unmount
API
Java APIAlluxio
// Write a file
FileSystem fs = FileSystem.Factory.get();
AlluxioURI path = new AlluxioURI("/myFile");
FileOutStream out = fs.createFile(path);
out.write(...);
out.close();

// Read a file
FileSystem fs = FileSystem.Factory.get();
AlluxioURI path = new AlluxioURI("/myFile");
FileInStream in = fs.openFile(path);
in.read(...);
in.close();
Java API Doc: http://alluxio.org/documentation/master/api/java/
Hadoop FileSystem API: MapReduce and Spark jobs can use alluxio:// paths in place of hdfs://
Alluxio-FUSE
Mounts Alluxio as a local Linux file system, built on libfuse
Linux
$ alluxio-fuse.sh mount
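[Figure: FUSE call path. In userspace, `cat /tmp/alluxio-file` issues open, read, lseek, and write through glibc; in the kernel, VFS dispatches to FUSE (alongside NFS, Ext4, ...); FUSE hands the request back to userspace through glibc and libfuse to Alluxio]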
API
Alluxio key-value API
// Keys and values are shown as strings for brevity; the API takes byte[] or ByteBuffer
KeyValueSystem kvs = KeyValueSystem.Factory.create();
// Write key-value pairs into a new store
KeyValueStoreWriter writer = kvs.createStore(
    new AlluxioURI("alluxio://path/my-kvstore"));
writer.put("100", "foo");
writer.put("200", "bar");
writer.close();
// Read from the completed store
KeyValueStoreReader reader = kvs.openStore(
    new AlluxioURI("alluxio://path/my-kvstore"));
reader.get("100");
reader.get("300"); // null
reader.close();
[Figure: Alluxio KV Store, showing a batch put followed by a get of key K1 that returns value V1 (foo)]
StorageTier
SSD
StorageDir
Alluxio Worker: StorageTier, StorageDir
Allocator ---- StorageDir
GreedyAllocator, MaxFreeAllocator, RoundRobinAllocator
Evictor ---- StorageDir
GreedyEvictor, LRUEvictor, LRFUEvictor, PartialLRUEvictor
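As an illustration of what an evictor does, the sketch below shows least-recently-used eviction for a single storage directory with a fixed byte capacity. It is a minimal sketch under assumptions: `LruEvictorSketch` and its methods are hypothetical names, and it demonstrates the general LRU technique rather than the code of Alluxio's LRUEvictor.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy LRU eviction for one storage directory; not Alluxio's LRUEvictor. */
final class LruEvictorSketch {
    private final long capacityBytes;
    private long usedBytes = 0;
    // An access-ordered LinkedHashMap iterates from least to most recently used.
    private final LinkedHashMap<Long, Long> blocks = new LinkedHashMap<>(16, 0.75f, true);

    LruEvictorSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Marks a block as recently used; returns false if it is not stored. */
    boolean access(long blockId) {
        return blocks.get(blockId) != null;
    }

    /** Stores a block, evicting least recently used blocks first; returns evicted ids. */
    List<Long> put(long blockId, long sizeBytes) {
        if (sizeBytes > capacityBytes) {
            throw new IllegalArgumentException("block larger than the directory");
        }
        Long previous = blocks.remove(blockId);
        if (previous != null) {
            usedBytes -= previous;
        }
        List<Long> evicted = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = blocks.entrySet().iterator();
        while (usedBytes + sizeBytes > capacityBytes && it.hasNext()) {
            Map.Entry<Long, Long> oldest = it.next();
            usedBytes -= oldest.getValue();
            evicted.add(oldest.getKey());
            it.remove();
        }
        blocks.put(blockId, sizeBytes);
        usedBytes += sizeBytes;
        return evicted;
    }

    long usedBytes() {
        return usedBytes;
    }

    public static void main(String[] args) {
        LruEvictorSketch dir = new LruEvictorSketch(100);
        dir.put(1L, 40);
        dir.put(2L, 40);
        dir.access(1L);                      // block 1 becomes the most recently used
        System.out.println(dir.put(3L, 40)); // block 2 is the eviction candidate
    }
}
```

In a tiered setup, an evicted block could be moved to the next tier down (for example MEM to SSD) instead of being dropped; this sketch simply reports which blocks were chosen.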
Alluxio
Alluxio Alluxio
Alluxio
Alluxio
Alluxio with Spark, Hadoop MapReduce, Flink,
H2O, Impala
hdfs://ip:port/xxx -> alluxio://ip:port/xxx
Zeppelin
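The `hdfs://ip:port/xxx -> alluxio://ip:port/xxx` mapping shown above amounts to swapping the scheme and authority while keeping the path. A minimal sketch, assuming hypothetical host names (`namenode`, `alluxio-master`) and the ports 9000 and 19998 purely as example values; the helper `toAlluxio` is illustrative and not part of any Alluxio API:

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Rewrites an hdfs:// URI to an alluxio:// URI with the same path (illustrative helper). */
final class AlluxioUriSketch {
    static String toAlluxio(String hdfsUri, String alluxioHost, int alluxioPort) {
        URI in = URI.create(hdfsUri);
        if (!"hdfs".equals(in.getScheme())) {
            throw new IllegalArgumentException("not an hdfs:// URI: " + hdfsUri);
        }
        try {
            // Keep the path, replace the scheme and the host:port authority.
            return new URI("alluxio", null, alluxioHost, alluxioPort,
                in.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Host names and ports below are example values only.
        System.out.println(toAlluxio("hdfs://namenode:9000/data/input.txt",
            "alluxio-master", 19998));
    }
}
```

The rewritten path only resolves if the corresponding HDFS location is mounted in Alluxio (for example as the root under file system), which is why the path component can stay unchanged.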
AlluxioAlluxio
Alluxio
Alluxio
POSIX
rwx
Web
Master WebUI
Worker WebUI
Co-located compute and data with memory-speed access to data
Virtualized different storage systems under a unified namespace
Scale-out architecture
File system API, software only
Unification: new workflows across any data in any storage system
Performance: orders-of-magnitude improvement in run time
Flexibility: choice in compute and storage; grow each independently, buy only what is needed
Alluxio 1.4
AlluxioAPI
Alluxio 1.4.0UFS API
400
Alluxio 1.4
REST API: exposes the functionality of the Alluxio native Java API over REST, so that clients written in languages other than Java can access Alluxio
Alluxio 1.4
Packet Streaming (new in Alluxio 1.4.0)
2IO
-
Alluxio 1.4
Apache Hive (Contributed By PASALab): running Apache Hive on Alluxio
(http://www.alluxio.org/docs/master/en/Running-Hive-with-Alluxio.html)
YARN /YARNAlluxio
Alluxio 1.4
Alluxio Master MapReduce
1
Alluxio
AlluxioSpark DataFrame
Spark 2.0.0 + Alluxio 1.2.0
Single worker: Amazon r3.2xlarge, 61 GB MEM, 8-core CPU
Comparisons:
Alluxio
Spark Storage Level: MEMORY_ONLY
Spark Storage Level: MEMORY_ONLY_SER
Spark Storage Level: DISK_ONLY
[Figure: READING CACHED DATAFRAME (PARQUET). x-axis: DataFrame Size [GB] (0 to 50); y-axis: Time [seconds] (0 to 250); series: Alluxio (textFile), DISK_ONLY, MEMORY_ONLY_SER, MEMORY_ONLY]
[Figure: READ 50 GB DATAFRAME (SSD). Bars: Alluxio vs. No Alluxio; axis: Time [seconds] (0 to 250)]
[Figure: READ 50 GB DATAFRAME (S3). Bars: Alluxio vs. No Alluxio; axis: Time [seconds] (0 to 1750)]
10x average speedup, 17x peak speedup
AlluxioHDFSSLA
AlluxioHDFS 10
SLA: service-level agreement
1002
1
1. Alluxio2. AlluxioAlluxio10
IO
2
1. Alluxio2. AlluxioI/OIOCPU3.Alluxio1
I/O CPU
3
1. Alluxio
2. CPU
Alluxio
CPU I/O
4
1. Alluxio
I/O2. Alluxio
CPU
AlluxioHadoop/Spark AlluxioSpark
Alluxio
Alluxio
Alluxio
http://alluxio.org/documentation/master/cn/index.html
The End & Thank you!
http://alluxio.org/