ﻢِ ﺴﺑِ ا ﻦِﻤﺣﺮﻟا ﻪﻠﻟا ِﻢِ · personality and a great machine...
Transcript of ﻢِ ﺴﺑِ ا ﻦِﻤﺣﺮﻟا ﻪﻠﻟا ِﻢِ · personality and a great machine...
ن ام
ح حيم بسم الله الر لر
Machine LearningLecture 01 – Basics of Machine Learning
Dr. Rao Muhammad Adeel Nawab
Dr. Rao Muhammad Adeel Nawab 3
How to Work?
ا .�م ��ن
ا . �� �� �م ��نا اهللا � �� � ��� � .� �م ��ن
﮵تت﮴ :آ ك � ���ك نعبد وا �� �
تعني ا س�ر�
تا اهللا � �ى � �دت �� �: � . �ى
اور � � � � د �� �
Dr. Rao Muhammad Adeel Nawab 4
. � � � �ر�. � �� �د�� �ں � �� � �
�ط : د� تقمي رص �ط ٱٱلمس� �ن ٱ�نعمت �لهي ٱٱهد� ٱٱلرص م ٱٱ��ر�
ت. � ا�م �� �� راه د� ان ��ں � راه � � �: �
Power of Effort & Dua
Dr. Rao Muhammad Adeel Nawab 5
�هم� خر يل وا�رت يل الل�متنا ال� ما �ل
�ب�انك ال �مل لنا ا س�
نت العلمي الحكمي �ك ٱ� ن�ا
ح يل صدري رب ارش يل امري و�رس
وا�لل عقدة من لساين یفقهوا قويل
Dua – Take Help from Allah Before Starting Any Task
Dr. Rao Muhammad Adeel Nawab 6
Five Types of Training
PoliceEliteRangersArmyCommando
Dr. Rao Muhammad Adeel Nawab 7
Commando Training
Commando Commando is a Man of Character and he must Safeguard his Character
Dr. Rao Muhammad Adeel Nawab 8
ادب
��با ادب �با �، � ادب �، وہ �� � � �� �� � � �
ا� � آ � � ا� � �م � �وہ،� �ٹ ��ٹ
Dr. Rao Muhammad Adeel Nawab 9
ا� � �
ا ��ا� � �� را� � ��ت
Dr. Rao Muhammad Adeel Nawab 10
ا� � �
� �وم � وہ ادب� � �وم ��ا� � اهللا عننه�ل ا�� رو� ( ى
ن)ر�
Dr. Rao Muhammad Adeel Nawab 11
Dr. Rao Muhammad Adeel Nawab – About Me
University of Sheffield, UKPhD
COMSATS University Islamabad, Lahore CampusAssistant Professor
NLP Group, CUI, Lahore CampusGroup Lead
Dr. Rao Muhammad Adeel Nawab 12
Publications
Research Supervisions 10 PhD Studentsunder supervision (in different countries)
35+ MPhil StudentsSupervised
34 International Publications13 Impact Factor Journal Papers
21 International Conference/Workshop Papers
Citations & H-Index220+ Citations
H-Index = 9
Dr. Rao Muhammad Adeel Nawab 13
About ME: Research Profile on Google Scholar
https://scholar.google.com/citations?user=fUn6DXYAAAAJ&hl=en
Dr. Rao Muhammad Adeel Nawab 14
Course Details
Instructor: Dr. Rao Muhammad Adeel Nawab
Email: [email protected]
Office: Room no. 04, Faculty Block
CUI Profile: https://lahore.comsats.edu.pk/Employees/74
Dr. Rao Muhammad Adeel Nawab 15
Course Details – Classes For MS/PhD Course
Lecture 01 & 02
D-110 (D Block)
Tuesday (17:30 – 20:30)
Dr. Rao Muhammad Adeel Nawab 16
Course Details – Classes For BS Course
Lec 01Lec 02
Lec 01Lec 02
N-4 (N Block)C-6 (C Block)
N-10 (N Block)N-9 (N Block)
Monday (11:30 – 13:00)Monday (14:30 – 16:00)
Wednesday (11:30 – 13:00)Wednesday (14:30 – 16:30)
Dr. Rao Muhammad Adeel Nawab 17
Course Details – For MS/PhD Course (Cont…)
25%
25%
50%CourseworkMidterm ExaminationFinal Examination
Google Classroom Code: iw1bdyNote: Join using CUI-Lahore email ID
Assessment
Office Hours: Email requests for appointment
Dr. Rao Muhammad Adeel Nawab 18
Course Details – For BS Course (Cont…)
10%15%
25%
50%S-I
S-II
Coursework
Final Examination
Google Classroom Code: gjftt1zNote: Join using CUI-Lahore email ID
Assessment
Office Hours: Email requests for appointment
Dr. Rao Muhammad Adeel Nawab 19
Mainly get Excellence in two things
Course Focus
Become a great Human Being
Become a great Machine Learning Engineer
Dr. Rao Muhammad Adeel Nawab 20
Course Aims
1
2
To introduce with the main concepts that are essential to become a great personality and a great machine learning engineer
To develop skills to systematically learn any concept
To introduce some of the main topics in machine learning, especially learning from examples (automated concept learning or classification)
Course Aims
3
Dr. Rao Muhammad Adeel Nawab 21
Course Aims (Cont…)
5
4
To develop skills in designing and building serious artificial intelligence programs.
To develop skills to understand, analyze and pre-process a dataset / corpus to train and test machine learning algorithmsCourse
Aims
Dr. Rao Muhammad Adeel Nawab 22
Course Learning Outcomes (CLOs)
By the end of this course, the students should be able toUnderstand what daily tasks are important to have ahealthy (physically, mentally and socially) andcharacterful personality
Understand some of the main approaches/algorithmsthat are used for representing concepts and learningthem automatically
Analyze datasets and transform them intorepresentations that will permit machine learningalgorithms to learn concepts/discover patterns in them
Evaluate competing machine learning algorithms overthe same dataset(s)
Dr. Rao Muhammad Adeel Nawab 23
Pre-Requisites
Good Programming Skills
Dr. Rao Muhammad Adeel Nawab 24
Course Content – Key TopicsBasics of Machine LearningConcept Learning and General-to-Specific OrderingDecision Tree LearningArtificial Neural NetworkDeep Learning (MS / PhD)Evaluating HypothesisBayesian LearningGenetic Algorithms
Multi-Label Classification (MS / PhD)Instance Based Learning
Course Project Presentations
Dr. Rao Muhammad Adeel Nawab 25
References
Book 1T. Mitchell.
Machine Learning.WCB/McGraw-Hill,Boston, 1997.http://www.cs.cmu.edu/~tom/mlbook.html
Book 2I. H. Witten and E.Frank. Data Mining:Practical MachineLearning Tools andTechniques withJavaImplementations,3rd Edition, MorganKaufmann, SanFrancisco, 2011.
Book 3S. Russell and P.Norvig. ArtificialIntelligence: AModern Approach(2nd ed), PearsonEducation, 2003.
Book 4N. J. Nilsson.Introduction toMachine Learning,Draft of a proposedtextbook, 1997.Available at:http://robotics.stanford.edu/people/nilsson/mlbook.html.
Dr. Rao Muhammad Adeel Nawab 26
Journals and Conferences
JournalMachine Learning
JournalJournal of Machine Learning Research
ConferenceNeural Information Processing Systems (NIPS)
ConferenceInternational Conference on Machine Learning
JournalJournal of Intelligent Systems
JournalNeural Computation
Dr. Rao Muhammad Adeel Nawab 27
Machine Learning Toolkits
Scikit-learn (a.k.a. sk-learn)http://scikit-learn.org/stable/
Documentationhttp://scikit-learn.org/stable/documentation.html
WEKAhttps://www.cs.waikato.ac.nz/ml/weka/
Documentationhttps://www.cs.waikato.ac.nz/ml/weka/documentation.html
Python Java
Dr. Rao Muhammad Adeel Nawab 28
Lecture OutlineWhat is Machine Learning Learning Input-Output Functions – General SettingsTypes of Machine LearningPhases of Machine LearningTraining RegimesTreating a Problem as a Machine Learning Problem – A Step by Step Example
Reading Chapter 1 of MitchellChapter 1 of Witten & Frank
What is Machine Learning
Dr. Rao Muhammad Adeel Nawab 30
Ultimate Goal
Ultimate Goal
To develop such a machine which behaves like human
Dr. Rao Muhammad Adeel Nawab 31
Machine Learning
The study of how to design computerprograms whose performance at some taskimproves through experience
Dr. Rao Muhammad Adeel Nawab 32
Machine Learning
A computer program is said to be learnFrom experience EWith respect to some class of tasks TPerformance measure P
If its performance at tasks in T as measured by P improves with experience E
Dr. Rao Muhammad Adeel Nawab 33
Machine Learning - ExampleLearn to Drive a Car
Class of Tasks (T)Starting Car, Changing Gear, Control on Accelerator, Control on Breaks, Parking Car
Performance Measure (P)Efficiency
Experience (E)No. of hours spent in driving car
A person is said to be learn to drive a car if:His / her performance P on class of tasks T is improving with experience E
Dr. Rao Muhammad Adeel Nawab 34
Machine Learning vs Traditional Programming
Machine Learning
Traditional Programming
Data
ProgramComputer Output
Data
OutputComputer Program
Dr. Rao Muhammad Adeel Nawab 35
Types of LearningInductive LearningDeductive Learning
Information
Dr. Rao Muhammad Adeel Nawab 36
Deductive Learning
Deductive learning Works on existing facts and knowledge Does not generate "new" knowledge at allMakes the reasoning system more efficient
Dr. Rao Muhammad Adeel Nawab 37
Deductive Learning – Example
Example
Concept to be Learned - Throwing a ball in airHow Deductive Learning works?
I know Newton's Laws of GravitationSo, I conclude that if I let a ball go, it will certainly fall downwards
Dr. Rao Muhammad Adeel Nawab 38
Inductive Learning
Inductive learning Takes examples of a concept and generalizes rather than starting withexisting knowledgeGenerates “new” knowledgeHas “scope of error”
Several successful systems developed usinginductive learning approach
Dr. Rao Muhammad Adeel Nawab 39
Inductive Learning – Example
Take examples of the concept to be learned1 example – 1 time I throw ball in the air50 examples – 50 times I throw ball in the air100 examples – 100 times I throw ball in the air
Learn from ExamplesI throw a ball 100 times in the air and learned that every time
(100 times) I throw the ball in the air, it falls downwardGeneralize the Concept Learned from Examples
I conclude that if I let a ball go, it will certainly fall downwards
Concept to be Learned - Throwing a ball in airHow Inductive Learning works?
Dr. Rao Muhammad Adeel Nawab 40
Machine Learning – Disciplines Contributing
Machine Learning -Disciplines
Contributing
Statistics
Artificial Intelligence
Biology
Cognitive Science
Information Theory
Philosophy
Control Theory
Computational Complexity
Dr. Rao Muhammad Adeel Nawab 41
Machine Learning – Why to Study
Why to Study
Technological or Engineering MotivationTo build computer systems that can improve their performance at tasks with experience (data)Massive growth in on-line data
Cognitive Science MotivationTo understand better how humans learn by modeling the learning process
Dr. Rao Muhammad Adeel Nawab 42
Machine Learning – Why to Study (Cont…)
Why to Study
To understand better properties of various algorithms for function (or concept to be learned) approximation
How the data must be represented?How much data they require?How accurate they can be?How to choose optimal data for training?
Dr. Rao Muhammad Adeel Nawab 43
Machine Learning - Areas of Application
Machine Learning – Areas
of Application
Data miningMedicineBusinessAgricultureComputer gamesSoftware applicationsPersonalized / SelfCustomizing program
Dr. Rao Muhammad Adeel Nawab 44
Machine Learning – Why it is Hard
You See Your ML Algorithm Sees
Dr. Rao Muhammad Adeel Nawab 45
Machine Learning – Why it is Hard (Cont…)
What is a “2”?
Dr. Rao Muhammad Adeel Nawab 46
Machine Learning – When to Use
When patterns exist in our data.even if we don’t know what they are?
When we have lots of data
Dr. Rao Muhammad Adeel Nawab 47
Machine Learning - Summary
Data = Model + Error
Learn from Data
Learning Input – Output Functions - General Settings
Dr. Rao Muhammad Adeel Nawab 49
What is to be Learned
Function
Program
Finite state machine
Grammar
Problem solving system
Most of ML revolves around Learning Input-Output Functions or Function Approximation or Function Learning or Concept Learning
Dr. Rao Muhammad Adeel Nawab 50
Concept Learning
How Concept Learning Works?Takes examples of a concept and Tries to build a general description of the concept
Concept Learning A major subclass of inductive learning
Very often, the examples are described using attribute-value pairs
Dr. Rao Muhammad Adeel Nawab 51
Concept Learning
Learn a generalized target function f (or target concept c) from a set of examples
Target function (f) cannot be completely learned, however it can be approximated
Hypothesis (h) – is an approximation of the target function f
Goal
Scope of Error in Inductive Learning
Dr. Rao Muhammad Adeel Nawab 52
Learning Input Output Functions – General Settings
Goal - We are trying to learn a function ff is often referred to as the target function
Input f Outputf takes a vector-valued input a n-tuple x = (x1, x2, . . .xn)f itself may be vector-valued yielding a k-tuple asoutput
Often f produces a single output value E.g. k = 1 (in such cases the output is not thought of
as a vector)
Dr. Rao Muhammad Adeel Nawab 53
Job of the learner is to output a hypothesis hwhich is its guess or approximation of the
target function f
h is assumed a priori to be drawn from a class of functions H
Note that f may or may not be in H and this may / may not be known
Learning Input Output Functions – General Settings
Dr. Rao Muhammad Adeel Nawab 54
Learning Input Output Functions – General Settings
Target function f
x1x2...
xn
X =
y1y2...
yn
Y or
h1 h2
hi hk
Hypothesis Space 𝓗𝓗
Training Example = {x1, …. xm} +f(xi) for each xi ϵ TE
Learner
Dr. Rao Muhammad Adeel Nawab 55
Concept Learning - Representation
HypothesisWill be discussed in next lecture
Examples or DataWill be discussed in this lecture
12
Machine is dump, need to represent two things
Dr. Rao Muhammad Adeel Nawab 56
Example
Example = Input + Output
Example (a.k.a. instance, data point, observation)
Dr. Rao Muhammad Adeel Nawab 57
Example - RepresentationRepresentation of Input Representation of Output
Attribute-value pair Attribute-value pairInput can be
Single valued or vector valued (mostly vector valued)
Output can beSingle valued or vector valued (mostly single valued)
“Values of attributes” can beCategorical /Ordinal
e.g. Male, Female, Yes, NoNumeric
Discrete – e.g.10, 25, 10000Continuous – e.g. 3.5, 5.9
“Values of attributes” can beCategorical / Ordinal
e.g. Male, Female, Yes, NoNumeric
Discrete – e.g. 10, 25, 10000Continuous – e.g. 3.5, 5.9
Dr. Rao Muhammad Adeel Nawab 58
Input
Gender of a HumanOutput
Human
Instance Instance = Input + Output
Example – Gender Identification
Concept Learning - Representation of Examples/Instances
Dr. Rao Muhammad Adeel Nawab 59
Representation of InputSet of attributes with possible values
HINT: Try to identify the most discriminating attributes for a learning problem
Representation of OutputSingle Attribute with possible valuesGender – Male, Female
Attribute Possible Values
Height Short Medium Tall
Weight Small Medium HeavyBeard Yes No -
Concept Learning - Representation of Examples/Instances
Dr. Rao Muhammad Adeel Nawab 60
Note the difference between
Attribute
Attribute Value / Value of Attributee.g. Height is an “attribute” and Tall is “value of attribute”
To SummarizeInstance is a “vector of attribute values”
Concept Learning - Representation of Examples/Instances
Dr. Rao Muhammad Adeel Nawab 61
Below are 3 possible instances for the GenderIdentification learning problem
Instance no.
Height Weight Beard Gender
1 Short Medium No Female
2 Tall Heavy Yes Male
3 Medium Medium No Female
Concept Learning - Representation of Examples/Instances
Dr. Rao Muhammad Adeel Nawab 62
Example 01 - Learning Input-Output Functions
1234
14916
Input Output
Here the concept to learn is x2
Instance # Input Output1 1 12 2 43 3 94 4 16
Input
SingleOutput
Single
Dr. Rao Muhammad Adeel Nawab 63
Example 02 - Learning Input-Output Functions
Input
SingleOutput
Vector ValuedVector of “input attribute values”
[1, 2, 3][2, 3, 4][3, 4, 5][4, 5, 6]
14916
Input Output
Here the concept to learn is: [a,b,c] -> a*c – b
Instance # Input Output1 [1,2,3] 12 [2,3,4] 53 [3,4,5] 114 [4,5,6] 19
Dr. Rao Muhammad Adeel Nawab 64
Example 03 - Learning Input-Output FunctionsInput
SingleOutput
Vector ValuedVector of “input attribute values” Output
Set of Input Vectors
Dr. Rao Muhammad Adeel Nawab 65
Learning Input-Output Functions - Summary
Summary Learn from input to predict output
Dr. Rao Muhammad Adeel Nawab 66
Machine Learning - Real World Examples
Gender of the person (Male, Female) Output
Human (represented as a fixed set of attributes)
Goal Learn from Input (human - fixed set of attributes) to predict Output (Male, Female)
Input
Task 01 – Gender Identification
Dr. Rao Muhammad Adeel Nawab 67
Machine Learning - Real World Examples
Examples
Summary – Gender Identification TaskInput – Structured Data (fixed set of attributes)Output – Fixed Set
Instance no. Height Weight Beard Gender 1 Short Medium No Female2 Tall Heavy Yes Male3 Medium Medium No Female
Task 01 – Gender Identification
Dr. Rao Muhammad Adeel Nawab 68
Machine Learning - Real World Examples
Gender of the person who wrote the comment / review (Male, Female)
Output
Comment / Review written by a human (Text Data)
Goal Learn from Input (review / comment) to predict Output (Male, Female)
Input
Task 02 – Gender Identification
Dr. Rao Muhammad Adeel Nawab 69
Machine Learning - Real World Examples
Instance no. Comment / Review Gender 1 iPhone7 is a good mobile Male2 Battery of this phone is bad Female3 I am using iphone7 Male
Task 02 – Gender Identification
Examples
Summary – Gender Identification TaskInput – Unstructured Data (of Variable Length)Output – Fixed Set
Dr. Rao Muhammad Adeel Nawab 70
Machine Learning - Real World Examples
Gender of the person in the image (Male, Female) Output
Image (Image Data)
Goal Learn from Input (image) to predict Output (Male, Female)
Input
Task 03 – Gender Identification
Dr. Rao Muhammad Adeel Nawab 71
Machine Learning - Real World Examples
Examples
Male MaleFemale
Task 03 – Gender Identification
Summary – Gender Identification TaskInput – Unstructured Data (of Variable Length) in the
form of pixels (extracted from input image)Output – Fixed Set
Dr. Rao Muhammad Adeel Nawab 72
Machine Learning - Real World Examples
Polarity of the comment / review (Positive, Negative, Neutral)OutputComment / Review written by a human (text data)
Goal Learn from Input (comment / review) to predict Output (Positive, Negative, Neutral)
Input
Instance no. Comment / Review Gender 1 iPhone7 is a good mobile Positive 2 Battery of this phone is bad Negative3 I am using iphone7 Neutral
Task 04 – Sentiment AnalysisEx
ampl
es
Dr. Rao Muhammad Adeel Nawab 73
Machine Learning - Real World Examples
Task 04 – Sentiment Analysis
Summary – Sentiment Analysis TaskInput – Unstructured Data (of Variable Length)Output – Fixed Set
Dr. Rao Muhammad Adeel Nawab 74
Machine Learning - Real World Examples
Text in Urdu (Target Language) Output
Text in English (Source Language)
Goal Learn from Input (source language) to predict Output (Translation in target language)
Input
Task 05 – Machine Translation
Dr. Rao Muhammad Adeel Nawab 75
Machine Learning - Real World Examples
Instance no. Source Language Target Language 1 Allah is one هللا ایک ہے۔2 Understanding is deeper
than Love.تفہیم محبت سے زیاده گہری ہے۔
3 The only thing that is constant in this world is
change.
اس دنیا میں واحد چیز جو مستقل ہے وه ہے تبدیلی۔
Task 05 – Machine Translation
Examples
Dr. Rao Muhammad Adeel Nawab 76
Machine Learning - Real World Examples
Task 05 – Machine Translation
Summary – Machine Translation TaskInput – Unstructured Data (of Variable Length)Output – Unstructured Data (of Variable Length)
Dr. Rao Muhammad Adeel Nawab 77
Your Turn
To-Do Write input, output and goal forfollowing tasks
SummarizationObject Detection in ImageVehicles CategorizationAuthor IdentificationPlagiarism DetectionSpeaker Identification
Types of Machine Learning
Dr. Rao Muhammad Adeel Nawab 79
Data vs Information
Data
Raw facts and figures
Dr. Rao Muhammad Adeel Nawab 80
Data vs InformationVarieties of Data
Structured Data Unstructured Data Semi-Structured Data
Data is stored, processed, and manipulated in a traditional Relational Database Management System (RDBMS)
Data that is commonly generated from human activities and doesn’t fit into a structured database format
Data doesn’t fit into a structured database system, but is none-the-less structured by tags that are useful for creating a form of order and hierarchy in the data
Dr. Rao Muhammad Adeel Nawab 81
Data vs Information
DataMain Forms of Data
InformationProcessed form of data
Text
AudioVideoImage
Dr. Rao Muhammad Adeel Nawab 82
Data Annotation
Data annotationa.k.a. data labeling / data taggingis the process of labeling data to make it usablefor machine learningis performed by domain experts (humans –a.k.a. annotators / taggers / raters)requires a lot of effort, time and cost
Dr. Rao Muhammad Adeel Nawab 83
Example 01 - Data AnnotationRaw Data
iPhone7 is a good mobileBattery of this phone is bad
I am using iphone7
Data Annotation – Sentiment AnalysisComment / Review Sentiment
iPhone7 is a good mobile Positive Battery of this phone is
badNegative
I am using iphone7 Neutral
Data Annotation – Gender IdentificationComment / Review Gender
iPhone7 is a good mobile Male Battery of this phone is
badMale
I am using iphone7 Female
Dr. Rao Muhammad Adeel Nawab 84
Example 02 - Data AnnotationRaw Data Data Annotation – Gender Identification
Male
Male
Female
Dr. Rao Muhammad Adeel Nawab 85
Example 02 - Data AnnotationRaw Data Data Annotation – Emotion Analysis
Angry
Surprise
Happy
Dr. Rao Muhammad Adeel Nawab 86
Example 02 - Data AnnotationRaw Data Data Annotation – Age Group Identification
31-60
10-18
19-30
Dr. Rao Muhammad Adeel Nawab 87
Data and Annotation
For machine learning, data is mainly available as
Un-annotated DataAnnotated Data
Semi-annotated Data
Dr. Rao Muhammad Adeel Nawab 88
Data and Annotation
Un-annotated Data Output is not associated with the inputs
ExamplesInput Output
iPhone 7 is a good mobile -Battery of this phone is bad -
I am using iPhone 7 -
Dr. Rao Muhammad Adeel Nawab 89
Data and Annotation
Annotated Data Output is associated with all the inputs
ExamplesInput Output
iPhone 7 is a good mobile MaleBattery of this phone is bad Male
I am using iPhone 7 Female
Dr. Rao Muhammad Adeel Nawab 90
Data and Annotation
Semi-annotated Data Output is associated with some of the inputs
ExamplesInput Output
iPhone 7 is a good mobile MaleBattery of this phone is bad -
I am using iPhone 7 Female
Dr. Rao Muhammad Adeel Nawab 91
Balanced Data vs Unbalanced Data
For more accurate learning, it is important to have balanced data
For each class, the dataset must contain the same number of instances
Balanced Data
Dr. Rao Muhammad Adeel Nawab 92
Balanced Data vs Unbalanced Data (Cont…)
Example – Gender Identification
Two Classes = Male and FemaleDataset = 300 examplesUnbalanced datasets
Male = 100, Female = 200Male = 200, Female = 100
Balanced datasetMale = 150, Female = 150
Dr. Rao Muhammad Adeel Nawab 93
Types of Learning
01 Supervised Learning (or Classification)
02 Unsupervised Learning (or Clustering)
03 Semi Supervised Learning
Dr. Rao Muhammad Adeel Nawab 94
Supervised Learning (or Classification)
If the training examples have associated output values, then the learning setting is called supervised learning
Dr. Rao Muhammad Adeel Nawab 95
Supervised Learning (or Classification)
For Supervised LearningTrain data – is annotatedTest data – must be annotated
Dr. Rao Muhammad Adeel Nawab 96
Types of Supervised Learning
ClOutput is categorical (or discrete)Example – Gender Prediction
Classification
RegressionOutput is numeric (or continuous)Example – House Price Prediction
Dr. Rao Muhammad Adeel Nawab 97
Supervised Learning – Example
Set of Input Vectors
Output
Dr. Rao Muhammad Adeel Nawab 98
Unsupervised Learning (or Clustering)
If the training examples do not have associated output values, then the learning setting is called unsupervised learning
Dr. Rao Muhammad Adeel Nawab 99
Unsupervised Learning (or Clustering)(Cont…)
Useful for splitting datasets into partitions,which is known as clusteringFor Unsupervised Learning
Train data – is not annotatedTest data – must be annotated
Dr. Rao Muhammad Adeel Nawab 100
GivenA set of text documents (or examples)A distance measure (Euclidean Distance)
Find
12
Clusters (sports, showbiz, religion, crime, politics)such that
Text documents in one cluster are more similar toone anotherText documents in separate clusters are lesssimilar to one another
Example of Unsupervised Learning – Document Clustering
Dr. Rao Muhammad Adeel Nawab 101
Semi Supervised Learning
If some training examples have associated output values, then the learning setting is called semi supervised learning
Dr. Rao Muhammad Adeel Nawab 102
Semi Supervised Learning (Cont…)
For Semi Supervised LearningTrain data – is semi-annotatedTest data – must be annotated
Dr. Rao Muhammad Adeel Nawab 103
For accurate learning, we need
01
0302
For any learning setting, test data must be annotated to evaluate the performance of a model
High quality data
Large amount of data
Balanced data
Summary - Points to Consider in any Learning Setting
Phases in Machine Learning
Dr. Rao Muhammad Adeel Nawab 105
Phases in Machine Learning
Machine learning has three main phases
01
0302 Testing
Training
Application
Dr. Rao Muhammad Adeel Nawab 106
Data SplitFor both training and testing phases we need data, therefore, we split the available data into
0102 Test Data (or Test set)
Train Data (or Train set)
Standard Approach for data split
Note – Train set and test set must be disjoint i.e. example accruing in the train set should not occur in the test set and vice versa
Use 2 / 3 data as train setUse 1 / 3 data as test set
Dr. Rao Muhammad Adeel Nawab 107
Data Split (Cont…)
Two main approaches to data split
Class Balanced SplitRandom Split
Dr. Rao Muhammad Adeel Nawab 108
Data Split (Cont…)
Example 01 – Gender Identification (Balanced Data)
Dataset = 600 instances (Male = 300, Female = 300)Train-Test Split Ratio is 67%-33%
Random Split • Train set = 400 instances
(Male = 250, Female = 150 )• Test set = 200 instances
(Male = 50, Female = 150 )
Class Balanced Split• Train set = 400 instances
(Male = 200, Female = 200 )• Test set = 200 instances
(Male = 100, Female = 100 )
Dr. Rao Muhammad Adeel Nawab 109
Data Split (Cont…)
Example 02 – Gender Identification (Unbalanced Data)Dataset = 900 instances (Male = 600, Female = 300)Train-Test Split Ratio is 67%-33%
Random Split • Train set = 600 instances
(Male = 500, Female = 100 )• Test set = 300 instances
(Male = 100, Female = 200 )
Class Balanced Split• Train set = 600 instances
(Male = 400, Female = 200 )• Test set = 300 instances
(Male = 200, Female = 100 )
Dr. Rao Muhammad Adeel Nawab 110
Data Split (Cont…)
It is good to split data in a train-test ratio of 67%-33% using the class balanced split approach
Dr. Rao Muhammad Adeel Nawab 111
Phases in Machine Learning
Training PhaseMachine Learning Algorithm (or learning algorithm) learns from the training data (or training examples)The output of the training phase is a Machine Learning Model (or model) which you can then use to make predictions
Training Phase
Dr. Rao Muhammad Adeel Nawab 112
Phases in Machine Learning (Cont…)
Testing PhaseThe performance of the model (created in the training phase) is evaluated on the test data using evaluation measure(s)
Standard evaluation measures include Accuracy, Precision, Recall, F-measure, Area Under the Curve (AUC), Mean Absolute Error (MAE), Mean Square Error (MSE), Root Mean Square Error (RMSE) etc.
Testing Phase
Dr. Rao Muhammad Adeel Nawab 113
Training phase creates the “model” using the “data” i.e. training data
Training and Testing Phases
Recall the Equation
Testing phase checks the “error” in the “model” using the test data
Data = Model + Error
Dr. Rao Muhammad Adeel Nawab 114
Application Phase
Application Phase
The trained model (or model) is deployed in the real world to make predictions on new (unseen) data
Note – If a model produces an Accuracy of 80% on a large test set then we would say that in the real world it will correctly classify 80% of unseen data (or instances)
Dr. Rao Muhammad Adeel Nawab 115
Machine Learning – Assumption
AssumptionIf a trained model performs well on large test data, it will perform well on real-time data
Dr. Rao Muhammad Adeel Nawab 116
Summary – Phases in Machine Learning
Machine learning models learn from existing data (or training data) to make predictions on
new (unseen) data
Performance of a trained model is evaluated on test data using evaluation measure(s)
If a trained model performs well on large test data, it is likely to perform well in the application
phase
Machine Learning -Training Regimes
Dr. Rao Muhammad Adeel Nawab 118
Training Regimes
Considerable variation possible in how training examples are presented/used
01
0302 Incremental Method
Batch Method
On-line Method
Main Types of Training Regimes
Dr. Rao Muhammad Adeel Nawab 119
Types of Training Regimes
Batch Method
All training examples are available and used all at once to compute h
Dr. Rao Muhammad Adeel Nawab 120
Types of Training Regimes (Cont…)
Incremental Method
All training examples are used iteratively to refine acurrent hypothesis until some stopping conditionIn the incremental method one member of the trainingset is selected at a time and used to modify the currenthypothesisMembers of the training set can be selected at random(with replacement) or the set can be cycled throughiteratively
Dr. Rao Muhammad Adeel Nawab 121
Types of Training Regimes (Cont…)
On-line Method
If training instances become available one at a time and are used as they become available, the method is called an on-line method
e.g. a robot which is learning a hypothesis from sensory inputs which controls its actions (and hence determines its future sensory inputs)
Dr. Rao Muhammad Adeel Nawab 122
Direct Training vs Indirect Training
Each training instance has associated target value
Direct Training
Dr. Rao Muhammad Adeel Nawab 123
Direct Training vs Indirect Training (Cont…)
Only sequences of training instances have associated outcome e.g. in chess want to learn function from board position to next move
May not know whether individual moves are correct Only whether sequence of moves leads to win or lossLearner must decide what moves in sequence are good/bad(the credit assignment problem)
Indirect Training
Treating a Problem as a Machine Learning Problem – A Step by Step Example
Dr. Rao Muhammad Adeel Nawab 125
Problem – Gender Identification
Dr. Rao Muhammad Adeel Nawab 126
Problem Identify the gender of a person
Treating a Problem as a ML Problem - Example
Dr. Rao Muhammad Adeel Nawab 127
0102 Automatic Approach - Involves Machines
Manual Approach - Involves Humans
Two Main Approaches to Solve the Problem
Treating a Problem as a ML Problem - Example
Dr. Rao Muhammad Adeel Nawab 128
Limitations of Manual Approach
It is not practical when we have huge amount of data
Solution – Build automatic intelligent systems to identify the
gender of a person
Dr. Rao Muhammad Adeel Nawab 129
Types of Automatic Approach
Two main types of Automatic Approach
01
02 Machine Learning based Approach (or Statistical / Probabilistic Approach)
Rule Based ApproachHumans (domain experts) manually extract rules from data
Machines automatically extract rules (or learn) from data
Dr. Rao Muhammad Adeel Nawab 130
Types of Automatic Approach (Cont…)
Our Focus
Treat the problem of gender identification as a supervised classification task
Dr. Rao Muhammad Adeel Nawab 131
Supervised Classification Task
To treat the problem of gender identification as asupervised classification task, we need
Large amount of labeled data
High quality labeled data
Balanced data
Dr. Rao Muhammad Adeel Nawab 132
01 Data Collection and Preparation Identification of Input and OutputRepresentation of Input and Output
Selection of Attributes / Features
Selection of Attribute Values
Gender Identification - Supervised Classification Task
A Four Step Process
Dr. Rao Muhammad Adeel Nawab 133
01 Data Collection and Preparation (Cont…) Data SourceFeature Extraction from DataData Split into Train / Test setsSelection of Machine Learning Algorithm(s) and Evaluation Measures
Gender Identification - Supervised Classification Task
A Four Step Process (Cont…)
Dr. Rao Muhammad Adeel Nawab 134
A Four Step Process (Cont…)
02 Training Phase
03 Testing Phase
04 Application Phase
Gender Identification - Supervised Classification Task
Dr. Rao Muhammad Adeel Nawab 135
Identification of Input and Output
Problem – Gender Identification
Input = HumanOutput = Gender
Data Collection and Preparation
Dr. Rao Muhammad Adeel Nawab 136
Human – can be represented as a set of attributes / features
Data Collection and Preparation (Cont…)
Representation of Input and Output
Representation of Input
Representation of Output
Attribute-Value Pair
Vector – Valued
Gender – can be represented as a single attributeSingle – Valued
Dr. Rao Muhammad Adeel Nawab 137
Problem Dependent
Selection of Attributes / Features
Data Dependent
Two Step Process
01 List down all possible attributes / features
02 Select the most discriminating ones
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 138
Example – Input Attributes for Gender Identification Problem
01 A possible set of attributes / features
02 Most discriminating ones may include
Weight, Height, Hair Length, Scarf, Beard, No. of eyes, No. of Nose, No. of hands, No. of foot
Weight, Height, Hair Length, Scarf, Beard
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 139
Selection of Attribute Values
Depends upon what type of information an attribute / feature carryDecide two things
Data TypePossible Values
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 140
Example – Gender Identification
Object / Instance/ ExampleAttribute Data Type Possible Values
Height (cm) Numeric Range: 160- 185Weight (lb) Numeric Range: 110 – 220Hair Length Categorical Bald/ Short/ Medium/ LongBeard Categorical True / FalseScarf Categorical True / FalseGender Categorical Male / Female
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 141
We may take 10 volunteers from Machine Learning class (5 Male and 5 Female) as source of data
What will be your main source(s) of data
Data Source
Ensure reliability and quality of data source
Example – Gender Identification
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 142
Feature Extraction from Data
Feature Extraction
Is the process of extracting “values of attributes” from data
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 143
Measurement – Height
Manual Measurement
Feature Extraction from Data
“Attribute Values” can be extracted through
Observation – e.g. No. of hands
Automatic Measurement
Extract “values of attributes” for both “input attribute(s)” and “output attribute(s)”
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 144
Feature Extraction from Data
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 145
Data Split into Train / Test Sets
Use a Train-Test split ratio of 67%-33% to split data using the “class balanced split” approach
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 146
Data Split into Train / Test Sets
Trainset
Testset
Whole Data
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 147
Selection of Machine Learning Algorithms and Evaluation MeasuresSelection of Machine Learning Algorithms
Depends on type of dataExample
For textual data - Naïve Bayes, Random Forest etc. are reported to be more suitable For image data - Neural Networks are reported to be more suitable
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 148
Selection of Evaluation Measures
Depends on class distribution
For balanced data – Accuracy is more suitableFor unbalanced data – F1 and Area Under the Curve are more suitable
Selection of Machine Learning Algorithms and Evaluation Measures
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 149
Selection of Evaluation Measures (Cont…)Depends on Learning Setting
For Classification tasks – Accuracy, F1 etc. are more suitableFor Regression tasks – Mean Absolute Error, Root Mean Square Error etc. are more suitable
Selection of Machine Learning Algorithms and Evaluation Measures
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 150
Selection of Evaluation Measures (Cont…)Depends on Learning Problem
For Gender Identification – Accuracy is more suitable (assuming balanced data)For Plagiarism Detection – Precision, Recall and F1 are more suitable
Selection of Machine Learning Algorithms and Evaluation Measures
Data Collection and Preparation (Cont…)
Dr. Rao Muhammad Adeel Nawab 151
h1 h2
hi hk
Training Example = {x1, …. xm} + f(xi) for each xi ϵ TE
h, where h ≈ fh best fits the training data
Learner
HypothesisSpace H
Training Phase
Dr. Rao Muhammad Adeel Nawab 152
Prediction
Actual Predicted Accuracy179.1 185 Long Yes No Male Male 1160.5 130 Short No No Female Male 0177.8 160 Bald No No Male Female 0161.1 100 Medium No No Female Female 1
Accuracy = Correctly Classified / Total instances = 2 / 4 = 0.50 or 50%
TrainedModel
Testing Phase
Dr. Rao Muhammad Adeel Nawab 153
Learned Model
Trainset
Unseen Example
Extract Features as in Train set Prediction
Application Phase
Dr. Rao Muhammad Adeel Nawab 154
The goal of Machine Learning algorithms is to learn from existing data to make predictions on
new (unseen) data
Majority of Machine Learning involves learning input-output functions i.e. learn from input to
predict output
For any learning setting, we need large amount of data, high quality data and preferably balanced
data
Lecture Summary
Dr. Rao Muhammad Adeel Nawab 155
The three main phases of machine learning are: (1) Training – learn from training data and output a model, (2) Testing – evaluate the performance of
the model on test data and (3) Application –deploy the trained model in the real world
Machine learning can also be summarized in the following equation:
Data = Model + Error
Lecture Summary