Improving Spark and Hadoop HDFS System Performance and … with Alluxio
-
Improving Spark and Hadoop HDFS System Performance and … with Alluxio
Alluxio PMC, Maintainer
2017/03/25 @ China Hadoop Summit 2017
-
Alluxio 1.4
Alluxio with Spark DataFrame/RDD
Alluxio with HDFS: SLA
-
BIG DATA ECOSYSTEM Yesterday
-
BIG DATA ECOSYSTEM Today
-
BIG DATA ECOSYSTEM Issue
-
BIG DATA ECOSYSTEM With Alluxio
-
Alluxio
Alluxio is a memory-centric virtual distributed storage system
-
Alluxio
December 2012: Alluxio (then named Tachyon) releases version 0.1.0
January 2017: Alluxio 1.4 released
-
Alluxio
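Since being open-sourced in April 2013, Alluxio has drawn more than 400 contributors from over 100 organizations, including Alluxio, IBM, Intel, Red Hat, UC Berkeley, and Yahoo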
Popular Open Source Projects Growth
-
PASALab
-
INDUSTRY ADOPTION
-
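Alluxio architecture
Master-Worker architecture
Master: manages file system metadata and tracks the Workers
Worker: manages data blocks in local storage (MEM, SSD, HDD)
Client: talks to the Master for metadata and to Workers for data
Under File System: the underlying storage system that backs Alluxio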
[Diagram: a Master and a Client alongside node 1, node 2, and node 3; Worker1, Worker2, and Worker3 each manage MEM, SSD, and HDD storage.]
-
Alluxio file metadata
Inode Tree
Each file or directory is an Inode, identified by an id
Example tree:
/
Dir0/  Dir1/
Dir2/  File1  File0
Example file Inode:
name: File1
block list: Block0, Block1, ...
checkPointPath: hdfs://xxx:yyy/zzz
...
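The inode tree on this slide can be sketched in a few lines of Java. This is a minimal illustration only: the class `InodeTreeSketch`, its fields, and its methods are hypothetical names, not Alluxio's own metadata classes, and placing Dir2 under Dir0 is an assumption, since the slide's parent links did not survive. The `checkPointPath` value is the placeholder from the slide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified inode tree; not Alluxio's actual implementation.
final class InodeTreeSketch {
    /** One node of the tree: a directory (children != null) or a file. */
    static final class Inode {
        final long id;
        final String name;
        final Map<String, Inode> children;              // null for files
        final List<String> blocks = new ArrayList<>();  // block names, used by files
        String checkPointPath;                          // location in the under file system

        Inode(long id, String name, boolean directory) {
            this.id = id;
            this.name = name;
            this.children = directory ? new LinkedHashMap<String, Inode>() : null;
        }

        boolean isDirectory() {
            return children != null;
        }
    }

    private long nextId = 0;
    private final Inode root = new Inode(nextId++, "", true);

    /** Creates a file at an absolute path, creating missing parent directories. */
    Inode createFile(String path) {
        String[] parts = path.split("/");
        Inode cur = root;
        for (int i = 1; i < parts.length; i++) {
            boolean last = (i == parts.length - 1);
            Inode child = cur.children.get(parts[i]);
            if (child == null) {
                child = new Inode(nextId++, parts[i], !last);
                cur.children.put(parts[i], child);
            } else if (child.isDirectory() == last) {
                // An intermediate component is a file, or the final one is a directory.
                throw new IllegalArgumentException("conflicting entry at " + parts[i]);
            }
            cur = child;
        }
        return cur;
    }

    /** Returns the inode at an absolute path, or null if it does not exist. */
    Inode lookup(String path) {
        Inode cur = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;
            }
            if (cur == null || !cur.isDirectory()) {
                return null;
            }
            cur = cur.children.get(part);
        }
        return cur;
    }

    public static void main(String[] args) {
        InodeTreeSketch tree = new InodeTreeSketch();
        Inode file1 = tree.createFile("/Dir0/Dir2/File1");
        file1.blocks.add("Block0");
        file1.blocks.add("Block1");
        file1.checkPointPath = "hdfs://xxx:yyy/zzz"; // placeholder from the slide
        System.out.println(tree.lookup("/Dir0/Dir2/File1").blocks);
    }
}
```

Keeping children in a map keyed by name makes a path lookup one map access per path component, which is the usual reason to model a namespace as a tree of inodes.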
-
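Alluxio read and write behavior
ReadType: controls how data is read and cached
WriteType: controls where data is written
ReadType
CACHE_PROMOTE: if the data is already in a Worker, move it to that Worker's top tier; if it is not in the local Worker, cache a copy in the local Worker
CACHE: if the data is not in the local Worker, cache a copy in the local Worker
NO_CACHE: read the data without caching a copy
WriteType
CACHE_THROUGH: write synchronously to both a Worker and the under file system
MUST_CACHE: write only to a Worker
THROUGH: write synchronously to the under file system only, bypassing the Worker
ASYNC_THROUGH: write to a Worker, then persist asynchronously to the under file system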
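The four write types differ along two axes: whether the data is cached in a Worker, and whether and when it reaches the under file system. The sketch below models that as a small Java enum. Everything except the four constant names is hypothetical (`WriteTypeSketch`, `Persist`, `durableOnReturn`); it is a way to reason about the trade-off, not Alluxio's client API.

```java
// Hypothetical model of the four write types; only the constant names
// come from the slide, the rest is illustrative.
enum WriteTypeSketch {
    MUST_CACHE(true, Persist.NEVER),
    CACHE_THROUGH(true, Persist.SYNC),
    THROUGH(false, Persist.SYNC),
    ASYNC_THROUGH(true, Persist.ASYNC);

    /** When, if ever, the data is written to the under file system. */
    enum Persist { NEVER, SYNC, ASYNC }

    final boolean cachesInWorker;
    final Persist persist;

    WriteTypeSketch(boolean cachesInWorker, Persist persist) {
        this.cachesInWorker = cachesInWorker;
        this.persist = persist;
    }

    /** True if the data is already in the under file system when the write returns. */
    boolean durableOnReturn() {
        return persist == Persist.SYNC;
    }

    public static void main(String[] args) {
        for (WriteTypeSketch t : values()) {
            System.out.println(t + ": cached=" + t.cachesInWorker
                + ", persist=" + t.persist
                + ", durableOnReturn=" + t.durableOnReturn());
        }
    }
}
```

Read this way, MUST_CACHE favors write speed, THROUGH favors durability without using Worker storage, and CACHE_THROUGH pays the under-file-system latency to get both.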
-
Alluxio fault tolerance
Master: ZooKeeper-based failover to a standby Master
Journal: EditLog + Image
Worker: monitored by the Master
Data: Checkpoint & Lineage
-
bin/alluxio fs [command]
cat
chmod
copyFromLocal
copyToLocal
fileInfo
ls
Paths can be given in the form alluxio://<host>:<port>/<path>
The wildcard * is supported, e.g. `bin/alluxio fs rm /data/2014*`
mkdir
mv
rm
touch
mount
unmount
-
API
Native Java API: Alluxio's file system client
// Write a file
FileSystem fs = FileSystem.Factory.get();
AlluxioURI path = new AlluxioURI("/myFile");
FileOutStream out = fs.createFile(path);
out.write(...);
out.close();
// Read a file
FileSystem fs = FileSystem.Factory.get();
AlluxioURI path = new AlluxioURI("/myFile");
FileInStream in = fs.openFile(path);
in.read(...);
in.close();
Java API Doc: http://alluxio.org/documentation/master/api/java/
Hadoop-compatible FileSystem API: MapReduce and Spark jobs can use alluxio:// paths in place of hdfs://
-
Alluxio-FUSE
Mounts Alluxio as a local Linux file system, built on libfuse
Mount command on Linux:
$ alluxio-fuse.sh mount
Supports standard file operations: open, read, lseek, write
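[Diagram: `cat /tmp/alluxio-file` in userspace calls glibc, enters the kernel VFS (which also serves NFS, Ext4, ...), and is routed by FUSE back to userspace, where glibc and libfuse hand the request to Alluxio.]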
-
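Key-Value API
Alluxio provides a key-value store API
// Keys and values are shown as strings for brevity
KeyValueSystem kvs = KeyValueSystem.Factory.create();
// Write: create a store and insert key-value pairs
KeyValueStoreWriter writer = kvs.createStore(
    new AlluxioURI("alluxio://path/my-kvstore"));
writer.put("100", "foo");
writer.put("200", "bar");
writer.close();
// Read: open the same store and look up keys
KeyValueStoreReader reader = kvs.openStore(
    new AlluxioURI("alluxio://path/my-kvstore"));
reader.get("100"); // "foo"
reader.get("300"); // null, key not present
reader.close();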
[Diagram: Alluxio KV Store; step 1 is a batch put, step 2 is a get of key K1 that returns value V1 (foo).]
-
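Tiered storage
StorageTier: one level of storage, e.g. MEM, SSD, HDD
StorageDir: a storage directory within a tier
An Alluxio Worker manages several StorageTiers, each made up of StorageDirs
Allocator: chooses the StorageDir for a new block
GreedyAllocator, MaxFreeAllocator, RoundRobinAllocator
Evictor: chooses which blocks to evict from a StorageDir
GreedyEvictor, LRUEvictor, LRFUEvictor, PartialLRUEvictor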
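To make the evictor idea concrete, here is a minimal sketch of least-recently-used eviction in Java. It is not Alluxio's LRUEvictor: the class `LruEvictorSketch` and its methods are hypothetical, and it tracks a single storage directory with block sizes in bytes. It relies on `java.util.LinkedHashMap` in access order, which iterates from least to most recently used.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical LRU eviction sketch for one storage directory.
final class LruEvictorSketch {
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<Long, Long> blockSizes =
        new LinkedHashMap<>(16, 0.75f, true);

    /** Records a newly stored block and its size in bytes. */
    void onBlockAdded(long blockId, long sizeBytes) {
        blockSizes.put(blockId, sizeBytes);
    }

    /** Marks a block as recently used; get() refreshes its position. */
    void onBlockAccessed(long blockId) {
        blockSizes.get(blockId);
    }

    /**
     * Evicts least-recently-used blocks until at least bytesNeeded bytes
     * are freed or no blocks remain. Returns evicted ids in eviction order.
     */
    List<Long> freeSpace(long bytesNeeded) {
        List<Long> evicted = new ArrayList<>();
        long freed = 0;
        Iterator<Map.Entry<Long, Long>> it = blockSizes.entrySet().iterator();
        while (freed < bytesNeeded && it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            freed += entry.getValue();
            evicted.add(entry.getKey());
            it.remove();
        }
        return evicted;
    }

    public static void main(String[] args) {
        LruEvictorSketch evictor = new LruEvictorSketch();
        evictor.onBlockAdded(1L, 10L);
        evictor.onBlockAdded(2L, 10L);
        evictor.onBlockAdded(3L, 10L);
        evictor.onBlockAccessed(1L);                // block 1 becomes most recent
        System.out.println(evictor.freeSpace(15L)); // evicts blocks 2 and 3
    }
}
```

An allocator is the mirror image: it picks the directory that receives a new block, for example the one with the most free space, and calls on the evictor only when no directory has room.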
-
Alluxio works with compute frameworks: Spark, Hadoop MapReduce, Flink, H2O, Impala, Zeppelin
Switching an application usually means changing only the path scheme:
hdfs://ip:port/xxx -> alluxio://ip:port/xxx
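The path change from hdfs:// to alluxio:// can be applied mechanically. The helper below is a hypothetical sketch using only `java.net.URI`. The class and method names are illustrative, and so are the host names and the HDFS port in the example. The sketch assumes the Alluxio master address is passed in explicitly, since it normally differs from the HDFS NameNode address. Port 19998 is the default Alluxio master port.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper that rewrites an hdfs:// path to an alluxio:// path.
final class SchemeRewriteSketch {
    /**
     * Keeps the file path of an hdfs:// URI and replaces the scheme and
     * authority with alluxio://masterHost:masterPort.
     */
    static String toAlluxio(String hdfsPath, String masterHost, int masterPort) {
        URI in = URI.create(hdfsPath);
        if (!"hdfs".equals(in.getScheme())) {
            throw new IllegalArgumentException("not an hdfs:// path: " + hdfsPath);
        }
        try {
            return new URI("alluxio", null, masterHost, masterPort,
                in.getPath(), null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Host names and the HDFS port are placeholders for illustration.
        System.out.println(
            toAlluxio("hdfs://namenode:9000/data/input.parquet", "alluxio-master", 19998));
    }
}
```

The rewritten path only resolves if the HDFS directory is mounted in Alluxio (for example as its under file system) and the Alluxio client jar is on the framework's classpath.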
-
Alluxio security
POSIX-style permission model
rwx permissions on files and directories
-
Web UI
Master WebUI
Worker WebUI
-
Co-located compute and data with memory-speed access to data
Virtualized different storage systems under a unified namespace
Scale-out architecture
File system API, software only
-
Unification: new workflows across any data in any storage system
Performance: orders of magnitude improvement in run time
Flexibility: choice in compute and storage; grow each independently, buy only what is needed
-
Alluxio 1.4
Improved Alluxio UFS API
Alluxio 1.4.0 improves the UFS (under file system) API
400
-
Alluxio 1.4
REST API: a REST interface equivalent to the Alluxio native Java API, so that non-Java applications can access Alluxio
-
Alluxio 1.4
Packet Streaming: new in Alluxio 1.4.0, aimed at I/O performance
-
Alluxio 1.4
Apache Hive (contributed by PASALab): Apache Hive can run on Alluxio
(http://www.alluxio.org/docs/master/en/Running-Hive-with-Alluxio.html)
YARN: Alluxio integration with YARN
-
Alluxio 1.4
Alluxio Master MapReduce
1
Alluxio
-
Alluxio 1.4
Alluxio with Spark DataFrame
Alluxio with HDFS: SLA
-
Spark 2.0.0 + Alluxio 1.2.0
Single worker: Amazon r3.2xlarge, 61 GB MEM, 8-core CPU
Comparisons:
Alluxio
Spark Storage Level: MEMORY_ONLY
Spark Storage Level: MEMORY_ONLY_SER
Spark Storage Level: DISK_ONLY
-
[Figure: READING CACHED DATAFRAME (PARQUET). Time [seconds] versus DataFrame Size [GB]; series: Alluxio (textFile), DISK_ONLY, MEMORY_ONLY_SER, MEMORY_ONLY.]
-
[Figure: READ 50 GB DATAFRAME (SSD). Time [seconds], Alluxio versus No Alluxio.]
-
[Figure: READ 50 GB DATAFRAME (S3). Time [seconds], Alluxio versus No Alluxio.]
10x average speedup, 17x peak speedup
-
Alluxio 1.4
Alluxio with Spark DataFrame/RDD
Alluxio with HDFS: SLA
-
Alluxio with HDFS: SLA
AlluxioHDFS 10
SLA: service-level agreement
1002
-
1
1. Alluxio2. AlluxioAlluxio10
IO
-
2
1. Alluxio2. AlluxioI/OIOCPU3.Alluxio1
I/O CPU
-
3
1. Alluxio
2. CPU
Alluxio
CPU I/O
-
4
1. Alluxio
I/O2. Alluxio
CPU
-
Alluxio 1.4
Alluxio with Spark DataFrame/RDD
Alluxio with HDFS: SLA
-
AlluxioHadoop/Spark AlluxioSpark
Alluxio
Alluxio
-
Alluxio
http://alluxio.org/documentation/master/cn/index.html
-
The End & Thank you!
http://alluxio.org/