Ml masterclass


Transcript of Ml masterclass

Page 1: Ml masterclass

Machine Learning Masterclass


Page 2: Ml masterclass

ML Overview

● “AI”, “ML”, lots of hype - but what does it actually mean?

● Systems that learn from their experience over time, without explicit programming

● We aren’t in the business of building brains… (99% of us aren’t, at least) and you shouldn’t be either

Page 3: Ml masterclass

ML Fun: Where’s Wally (Waldo)

Page 4: Ml masterclass

ML Fun: Where’s Wally (Waldo)

Page 5: Ml masterclass

Classification

Page 6: Ml masterclass

The Problem

● Given a set of observations we want to be able to predict what class a new point belongs to

● e.g. does this patient have disease X, given a set of measurements?

Page 7: Ml masterclass

Example Algorithm: Random Forests

● We split the data in two with an axis-aligned line (a threshold on a single feature)

Page 8: Ml masterclass

Example Algorithm: Random Forests

● We continue subdividing the regions that still contain a mix of classes

Page 9: Ml masterclass

Example Algorithm: Random Forests

● We build many of these decision trees

● Each performs poorly individually

● Their combined vote is powerful
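
The three Random Forest slides above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names, the toy data, the Gini impurity measure, the depth limit, and the choice of one random feature per split are all assumptions made for the example. Each tree is trained on a bootstrap sample, and the forest predicts by majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when a region contains a single class."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(points, labels):
    """Best axis-aligned threshold on one randomly chosen feature, or None."""
    feature = random.randrange(len(points[0]))
    values = sorted({p[feature] for p in points})
    best, best_score = None, gini(labels)
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2  # midpoint between neighbouring values
        left = [l for p, l in zip(points, labels) if p[feature] <= threshold]
        right = [l for p, l in zip(points, labels) if p[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best, best_score = (feature, threshold), score
    return best

def build_tree(points, labels, depth=3):
    """Keep subdividing regions that still mix classes, up to a depth limit."""
    split = best_split(points, labels) if depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    feature, threshold = split
    left = [(p, l) for p, l in zip(points, labels) if p[feature] <= threshold]
    right = [(p, l) for p, l in zip(points, labels) if p[feature] > threshold]
    left_points, left_labels = zip(*left)
    right_points, right_labels = zip(*right)
    return (feature, threshold,
            build_tree(left_points, left_labels, depth - 1),
            build_tree(right_points, right_labels, depth - 1))

def tree_predict(tree, point):
    while isinstance(tree, tuple):  # internal nodes are tuples, leaves are labels
        feature, threshold, left, right = tree
        tree = left if point[feature] <= threshold else right
    return tree

def build_forest(points, labels, n_trees=25):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(len(points)) for _ in points]
        forest.append(build_tree([points[i] for i in idx], [labels[i] for i in idx]))
    return forest

def forest_predict(forest, point):
    """Combine the individual trees by majority vote."""
    votes = Counter(tree_predict(tree, point) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters, labelled 0 and 1.
points = [(0.0, 0.2), (0.3, 0.9), (0.5, 0.1), (0.8, 0.6), (1.0, 0.4), (0.2, 0.7),
          (4.0, 4.3), (4.2, 5.0), (4.6, 4.1), (4.9, 4.8), (5.0, 4.5), (4.4, 4.7)]
labels = [0] * 6 + [1] * 6

random.seed(0)
forest = build_forest(points, labels)
print(forest_predict(forest, (0.5, 0.5)), forest_predict(forest, (4.5, 4.5)))
```

On data this cleanly separated every tree does well on its own; the benefit of voting shows up on noisy, overlapping classes, where individual trees overfit their bootstrap samples in different ways.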

Page 10: Ml masterclass

Many Algorithms

Page 11: Ml masterclass

Regression

Page 12: Ml masterclass

The Problem

● Given a set of observations, we want to be able to predict the value at a new point

● e.g. how profitable will our website be next month? What’s the value of my house?

Page 13: Ml masterclass

Example Algorithm: Gaussian Processes

● We pick a method for joining the dots

● In the simplest case, we fit a line to the data

● Infinitely many functions can join the dots - the simpler the better (Occam’s Razor)
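
The "fit a line" case can be sketched with ordinary least squares. The slides do not say how the line is fitted, so least squares is an assumption here, as are the function name and the toy data.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```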

Page 14: Ml masterclass

Example Algorithm: Gaussian Processes

● The ‘kernel’ describes what type of trends we expect and how to interpolate

https://github.com/jkfitzsimons/IPyNotebook_MachineLearning/blob/master/Just%20Another%20Kernel%20Cookbook....ipynb

Page 15: Ml masterclass

Example Algorithm: Gaussian Processes

● The ‘kernel’ describes what type of trends we expect and how to interpolate
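
To make the role of the kernel concrete, here is a minimal sketch of a Gaussian Process posterior mean in plain Python. The squared-exponential (RBF) kernel, the `length_scale` and `noise` values, the zero prior mean, and the toy data are all assumptions chosen for illustration; the linked notebook may use different kernels.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel: nearby inputs get similar outputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def gp_posterior_mean(train_x, train_y, test_x, kernel, noise=1e-6):
    """Posterior mean k(x*, X) (K + noise * I)^-1 y, with a zero prior mean."""
    K = [[kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    alpha = solve(K, train_y)
    return [sum(kernel(t, a) * w for a, w in zip(train_x, alpha)) for t in test_x]

# Hypothetical observations to interpolate.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 0.8, 0.9, 0.1]

at_training_points = gp_posterior_mean(train_x, train_y, train_x, rbf_kernel)
between_points = gp_posterior_mean(train_x, train_y, [0.5, 1.5, 2.5], rbf_kernel)
far_away = gp_posterior_mean(train_x, train_y, [100.0], rbf_kernel)
print(between_points)
```

Swapping `rbf_kernel` for a different kernel function changes how the model interpolates between the observations and what it predicts far from them; with this kernel the prediction falls back to the prior mean of zero away from the data.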

Page 16: Ml masterclass

Feature Learning

Page 17: Ml masterclass
Page 18: Ml masterclass

Example Algorithm: Autoencoders

● The observations have an extremely complex relationship to the output

● We have a lot of data

● Most of the data is redundant

● We wish to learn the useful latent features
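
As a small illustration of "redundant data" and "latent features", the sketch below uses PCA (a linear method, and the subject of the next slides) rather than an autoencoder, which would need a neural network library. It finds the first principal component by power iteration. The function names and the toy data are assumptions made for the example.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (feature means, unit direction of greatest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):  # power iteration on the covariance matrix
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def encode(row, means, v):
    """Compress a row to a single latent number."""
    return sum((x - m) * c for x, m, c in zip(row, means, v))

def decode(z, means, v):
    """Reconstruct a row from its latent number."""
    return [m + z * c for m, c in zip(means, v)]

# Hypothetical data where the second feature is always twice the first,
# so one latent feature is enough to describe both.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
means, component = first_principal_component(rows)
latent = encode((3.0, 6.0), means, component)
reconstructed = decode(latent, means, component)
print(component, latent, reconstructed)
```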

Page 19: Ml masterclass

Example Algorithm: PCA (EigenFaces)

Page 20: Ml masterclass

Example Algorithm: PCA (EigenFaces)

Page 24: Ml masterclass

Yes, but how?

● How does one actually go about using it in any practical setting?

● Many applications are invisible, so it is hard to see the actual process

● There are principles and general concerns

● Four main issues: data, pipelining, error risk, institutionalization

Page 25: Ml masterclass

#1: All comes down to data

● Quantity is important, but it’s far from being the only thing

● Hygiene is key - structured is better than unstructured, complete is better than partial

● Bottleneck is often knowing what data is important, matched to goals

● Data scientists spend 80%+ of their time cleaning and preprocessing data, before any analysis is done

● Side note: Data science != machine learning; some highly competent data scientists are skilled in ML methods, but they may not necessarily be able to create new algorithms
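
A minimal sketch of the kind of hygiene check meant by "complete is better than partial": measure how complete each field is, then keep only the complete records. The field names and records are hypothetical.

```python
def completeness(records, fields):
    """Fraction of records with a non-missing value, per field."""
    return {f: sum(r.get(f) is not None for r in records) / len(records)
            for f in fields}

def complete_only(records, fields):
    """Keep only the records where every field is present."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]

# Hypothetical patient records; None marks a missing measurement.
fields = ["age", "blood_pressure", "diagnosis"]
records = [
    {"age": 34, "blood_pressure": 120, "diagnosis": "negative"},
    {"age": None, "blood_pressure": 135, "diagnosis": "positive"},
    {"age": 51, "blood_pressure": None, "diagnosis": None},
]

report = completeness(records, fields)
clean = complete_only(records, fields)
print(report, len(clean))
```

Dropping incomplete records is only one option; whether to drop, impute, or go back and collect the missing values depends on which data matters for the goal.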

Page 26: Ml masterclass

#2: Data pipelining

● Having the data is no good if you can’t get it to where it needs to be

● Operating in-place is the ultimate, but extremely difficult

● The data lake problem: lake grows exponentially, replication

● Define streaming vs batch (examples of streaming vs batch)
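
The slides leave streaming vs batch to be defined in the talk. Assuming the usual meaning (batch processes a complete dataset at once, streaming updates a result as each record arrives), the difference can be sketched with a mean. The names and readings are hypothetical.

```python
def batch_mean(values):
    """Batch: needs the whole dataset in hand before it can answer."""
    return sum(values) / len(values)

class StreamingMean:
    """Streaming: keeps a running answer and never stores past values."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean

# Hypothetical sensor readings arriving one at a time.
readings = [10.0, 12.0, 11.0, 15.0]

stream = StreamingMean()
running = [stream.update(r) for r in readings]
print(running, batch_mean(readings))
```

Both approaches end at the same value; the streaming version has an answer available after every reading and uses constant memory.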

Page 27: Ml masterclass

#3: Error risk

● Machine learning models are never 100% accurate

● What happens when the model is wrong?

● Play out consequences, their magnitude, and scope

● The best applications have low risk and high gain
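
One way to "play out consequences" is to weigh the gain from correct predictions against the cost of wrong ones. The sketch below is an assumption about how that might be done, and every number in it is hypothetical.

```python
def expected_value(accuracy, gain_when_right, cost_when_wrong):
    """Average payoff per prediction, given how often the model is right."""
    return accuracy * gain_when_right - (1 - accuracy) * cost_when_wrong

# Hypothetical low-risk use: a wrong product recommendation costs little.
low_risk = expected_value(0.90, gain_when_right=10.0, cost_when_wrong=1.0)

# Hypothetical high-risk use: a rare error is very expensive.
high_risk = expected_value(0.99, gain_when_right=10.0, cost_when_wrong=5000.0)

print(low_risk, high_risk)
```

The less accurate model comes out ahead here because its mistakes are cheap, which is the sense in which the best applications are low risk and high gain.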

Page 28: Ml masterclass

#4: Institutionalization

● Every project must consider how the results will be used

● Who will use the results? Will the results be factored into decision-making, or will action be taken automatically?

● It’s not just about “doing machine learning”, it’s about creating a culture that uses ML as a core tool

● Data-driven decision making, only more evolved

● Leaders in the space make it so that every person in their organization can answer the “why” question

Page 29: Ml masterclass

A lot of work!

Page 30: Ml masterclass

The Upshot

● Google cut the energy used for cooling its data centers by up to 40%, which translates to $100M USD / year

● Self-driving cars are a reality now (Uber, Tesla, countless others)

● IBM Watson is being used for developing cancer treatments and providing supporting diagnoses

● Better security: access control at Amazon

● Genome sequencing (makes heavy use of various ML methods)

● CERN, LHC: Collision data (Higgs Boson, anyone?)

● George Washington University: automatically learning optimal climate models

<shameless plug> Dubai Holding: increase profit margins by 25% in real estate businesses, $12B AED</shameless plug>
