
Clustering with Spark-MapReduce

Vichy’14
Tugdual Sarazin, Mustapha Lebbah, Hanene Azzag

tugdual.sarazin@altic.org
@TugdualSarazin

● Business Intelligence
● Data integration
● Open Source
● Big Data and machine learning

● Computer science laboratory of Paris-Nord University
● A3 team: machine learning and applications
● Supervisors: M. Lebbah, H. Azzag

CIFRE (French industrial PhD agreement)

MapReduce

Machine learning with Hadoop MapReduce

[Diagram: every iteration reads its input from HDFS and writes its result back to HDFS (Iter. 1: HDFS read → HDFS write, Iter. 2: HDFS read → HDFS write, ...)]

Machine learning with Spark

[Diagram: only the first iteration reads from HDFS; intermediate results stay in memory, so later iterations read and write RAM (Iter. 1: HDFS read → RAM write, Iter. 2: RAM read → RAM write, ...)]
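
In code, the difference amounts to persisting the parsed dataset once so later passes reuse it from memory. A minimal sketch (the cache() call and the points name are illustrative, not from the slides):

// Parse the input once and keep the resulting RDD in executor memory,
// so every subsequent iteration reads it from RAM instead of HDFS.
val points = spark.textFile("hdfs://...").map(parsePoint)
points.cache()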

Clustering (machine learning)

Types of clustering

● Hierarchical clustering

● Density clustering (e.g. DBSCAN)

● Centroid clustering (e.g. k-means)

K-means

val data = spark.textFile("hdfs://...")
               .map(parsePoint)
val centroids = Array(
  Point(randX(), randY()),
  Point(randX(), randY())
)
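
The slides never show Point, parsePoint, randX or randY; a minimal sketch of what they could look like for 2-D data (all of it assumed, not from the original deck):

import scala.util.Random

// 2-D point with the arithmetic the k-means updates below rely on.
case class Point(x: Double, y: Double) {
  def +(o: Point): Point = Point(x + o.x, y + o.y)
  def /(n: Int): Point   = Point(x / n, y / n)
  def distanceTo(o: Point): Double = {
    val dx = x - o.x
    val dy = y - o.y
    math.sqrt(dx * dx + dy * dy)
  }
}

// One "x y" pair per line in the HDFS file (assumed input format).
def parsePoint(line: String): Point = {
  val Array(a, b) = line.trim.split("\\s+").map(_.toDouble)
  Point(a, b)
}

def randX(): Double = Random.nextDouble()
def randY(): Double = Random.nextDouble()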


K-means - compute the distances to each prototype

closestCentroid(p, centroids)
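
closestCentroid is not defined on the slides; since the update step later indexes centroids(id), it presumably returns the index of the nearest prototype. A possible sketch, reusing the hypothetical distanceTo above:

// Index of the centroid closest to p; the index doubles as the
// reduceByKey key and as a position in the centroids array.
def closestCentroid(p: Point, centroids: Array[Point]): Int =
  centroids.indices.minBy(i => p.distanceTo(centroids(i)))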

K-means - assignments

closestCentroid(p, centroids)


K-means - Map (assignments)

val closest = data.map(p =>
  (closestCentroid(p, centroids), (p, 1))
)

K-means - Reduce (update prototypes)

val pointStats = closest.reduceByKey {
  case ((p1, sum1), (p2, sum2)) => (p1 + p2, sum1 + sum2)
}

K-means - Iteration 1

val pointStats = closest.reduceByKey {
  case ((p1, sum1), (p2, sum2)) => (p1 + p2, sum1 + sum2)
}
// bring the per-cluster (sum, count) pairs back to the driver
// before updating the centroids array
pointStats.collect().foreach { case (id, (sum, count)) =>
  centroids(id) = sum / count
}

K-means - Iteration 2

for (i <- 1 until 10) {
  val closest = data.map(p =>
    (closestCentroid(p, centroids), (p, 1))
  )
  val pointStats = closest.reduceByKey {
    case ((p1, sum1), (p2, sum2)) => (p1 + p2, sum1 + sum2)
  }
  pointStats.collect().foreach { case (id, (sum, count)) =>
    centroids(id) = sum / count
  }
}
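
The hand-written loop is mainly didactic; Spark's MLlib already shipped a k-means implementation at the time. A hedged equivalent of the loop above (k = 2 clusters, 10 iterations; the parsing of the input into MLlib vectors is assumed):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Same data, parsed into MLlib vectors and cached for the iterations.
val vectors = spark.textFile("hdfs://...")
  .map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble)))
  .cache()

val model = KMeans.train(vectors, 2, 10)   // k, maxIterations
val learnedCentroids = model.clusterCenters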

Self-Organizing Map

SOM benefits
● Better clustering performance than k-means
● Linear complexity
● Topological visualization

SOM MapReduce - iteration
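
The slide's diagram is not reproduced here. The general shape of one batch-SOM iteration in the same Map/Reduce style as the k-means code could look like the sketch below; the grid layout, the Gaussian neighbourhood, the radius and all helper names are assumptions, not the authors' implementation:

// One batch-SOM iteration: the Map/Reduce part has the same shape as the
// k-means step; the topology only enters in the driver-side update.
val mapWidth  = 10
val mapHeight = 10
val radius    = 1.5                                  // neighbourhood radius
def gridPos(i: Int): (Int, Int) = (i % mapWidth, i / mapWidth)

val prototypes = Array.fill(mapWidth * mapHeight)(Point(randX(), randY()))

// Map: each point contributes (sum, count) to its best matching unit.
// Reduce: aggregate per unit, then collect the small result to the driver.
val bmuStats = data.map { p =>
    (closestCentroid(p, prototypes), (p, 1))
  }.reduceByKey { case ((p1, n1), (p2, n2)) => (p1 + p2, n1 + n2) }
   .collect()

// Gaussian neighbourhood on the map grid.
def neighbourhood(b: Int, j: Int): Double = {
  val (bx, by) = gridPos(b)
  val (jx, jy) = gridPos(j)
  val d2 = (bx - jx) * (bx - jx) + (by - jy) * (by - jy)
  math.exp(-d2 / (2 * radius * radius))
}

// Update: every prototype becomes a neighbourhood-weighted mean of the
// per-unit sums computed above.
for (j <- prototypes.indices) {
  var sumX = 0.0; var sumY = 0.0; var weight = 0.0
  for ((b, (s, n)) <- bmuStats) {
    val h = neighbourhood(b, j)
    sumX += h * s.x
    sumY += h * s.y
    weight += h * n
  }
  if (weight > 0) prototypes(j) = Point(sumX / weight, sumY / weight)
}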

Speedup

● 100 million observations
● SOM map: 10 x 10

BiTM: Biclustering Topological Map

[Diagram: raw data matrix → biclustering]

BiTM: speedup

● 2 million observations
● SOM map: 5 x 5

Future work

● Improve performance
● Feature selection
● Increase the number of cores
  ○ Grid'5000
  ○ Google Compute Engine

Thank you! Questions?