

BITS F464: Machine Learning
05 Towards Bayesian Machine Learning

Dr. Kamlesh Tiwari
Assistant Professor, Department of CSIS,
BITS Pilani, Pilani Campus, Rajasthan-333031 INDIA

Jan 27, 2021
ONLINE (Campus @ BITS-Pilani Jan-May 2021)
http://ktiwari.in/ml

Let’s Predict

Consider the table below. Can you determine the value of ???

To do this, one has to determine how y depends upon the attributes x1, x2, x3, x4, x5:

    y = f(x1, x2, x3, x4, x5)

What about f = mod(Σ_i wi × xi, 2) with (w1, w2, w3, w4, w5) = (0, 1, 0, 1, 0)?
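This target is just the parity of the selected attributes. A minimal sketch in Python; the sample rows are hypothetical, since the slide's actual table did not survive in the transcript:

    # Parity target from the slide: f(x) = mod(sum_i w_i * x_i, 2)
    # with weights (0, 1, 0, 1, 0), i.e. y is the parity of x2 + x4.
    def f(x, w=(0, 1, 0, 1, 0)):
        return sum(wi * xi for wi, xi in zip(w, x)) % 2

    # Hypothetical rows (not the slide's table):
    print(f((1, 0, 1, 1, 0)))  # x2 + x4 = 0 + 1 = 1 -> y = 1
    print(f((0, 1, 0, 1, 1)))  # x2 + x4 = 1 + 1 = 2 -> y = 0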

Let’s change our focus a bit

Input: x
Output: y
Training data: (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m))
Target function f : x → y
Hypothesis h : x → y

Instead of reporting y, let’s report P(y). For a given input x, the output is not True/False; instead, e.g.,

    P(True) = 0.3
    P(False) = 0.7
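To make the shift from hard labels to probabilities concrete, a minimal sketch; the hypothesis h and its constant output are illustrative placeholders, not a trained classifier:

    # A hypothesis that reports P(y) for each class instead of one label.
    def h(x):
        # Placeholder: a real model would compute these from x.
        return {True: 0.3, False: 0.7}

    probs = h(x=[1, 0, 1])
    prediction = max(probs, key=probs.get)  # most probable class
    print(probs, "->", prediction)          # {True: 0.3, False: 0.7} -> False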

Bias: How to come up with these values

Observe the data. Assume the data for 10 coin tosses is H-H-T-H-H-T-T-H-H-H. Now the coin is to be flipped again; what is the expected output?

    P(H) = 0.7 and P(T) = 0.3

You can incorporate your bias. Assume you know the coin is unbiased (and you have high confidence in this). You hypothetically consider 100 more flips [1]. Being a fair coin, it would give 50 H and 50 T, so your estimate for the probability of head becomes

    P(H) = (7 + 50)/110 = 0.518 and P(T) = 0.482

Why 100? Why not 10000? OK; if you have more confidence in your bias, take 10000.

[1] Also called Laplacian smoothing.
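A sketch of the smoothed estimate above; the function name and the pseudo-count argument are mine, but the arithmetic matches the slide:

    # Laplacian smoothing: mix observed counts with hypothetical fair flips.
    def smoothed_p_head(heads, tails, pseudo_flips=100):
        # A fair coin contributes pseudo_flips/2 heads, pseudo_flips/2 tails.
        return (heads + pseudo_flips / 2) / (heads + tails + pseudo_flips)

    # 10 real tosses: H H T H H T T H H H  ->  7 heads, 3 tails
    print(smoothed_p_head(7, 3, 100))    # (7 + 50) / 110 = 0.518...
    print(smoothed_p_head(7, 3, 10000))  # ~0.500: the stronger prior dominates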

Bayes Theorem

Example: Three companies A, B and C make 35%, 35% and 30% of all the lamps in the market. The probability of their lamps being defective is 1.5%, 1% and 2% respectively. What is the probability that a randomly selected defective lamp was manufactured in factory C?

Bayes theorem:

    P(A|B) = P(B|A) P(A) / P(B)

Prior, posterior, likelihood and evidence: P(A), P(A|B), P(B|A), P(B).

What we want is P(C|D), where D is the event that the lamp is defective:

    P(D) = P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C)
         = 0.35 × 0.015 + 0.35 × 0.01 + 0.3 × 0.02 = 0.01475

    P(C|D) = P(D|C) P(C) / P(D) = (0.02 × 0.3) / 0.01475 ≈ 0.407
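The lamp example reduces to a few lines of Python; the variable names are mine, the numbers are the slide's:

    # P(C|D) = P(D|C)P(C) / P(D), with P(D) from the law of total probability.
    prior = {'A': 0.35, 'B': 0.35, 'C': 0.30}      # market shares
    p_defect = {'A': 0.015, 'B': 0.01, 'C': 0.02}  # P(D | company)

    p_d = sum(prior[c] * p_defect[c] for c in prior)  # P(D) = 0.01475
    posterior_c = p_defect['C'] * prior['C'] / p_d    # P(C|D)
    print(p_d, posterior_c)                           # 0.01475  ~0.407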

Probability of observing a dataset

Assume you are flipping a biased coin where p(H) = 0.4. What is the probability that you see this dataset D = <H, H, T, T, H, H>?

    p(H) = 0.4
    p(T) = 1 − p(H) = 1 − 0.4 = 0.6

If all the trials are independent, then

    p(D|θ) = p(H) × p(H) × p(T) × p(T) × p(H) × p(H)
           = 0.4^4 × 0.6^2 = 0.009216

Note: The order of elements in the dataset does not matter. So p(<H, H, H, H, T, T>) is the same (in fact, so is any other permutation).

What is θ? It is the parameter; for our case it represents p(H) = 0.4.
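A sketch of this likelihood computation; the helper name is mine:

    # Likelihood of an i.i.d. coin dataset given parameter theta = p(H).
    def likelihood(data, theta):
        # Only the counts matter, not the order of the flips.
        n_heads = data.count('H')
        n_tails = data.count('T')
        return theta ** n_heads * (1 - theta) ** n_tails

    D = ['H', 'H', 'T', 'T', 'H', 'H']
    print(likelihood(D, 0.4))  # 0.4**4 * 0.6**2 = 0.009216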


Thank You!

Thank you very much for your attention!
Queries?