

Source: issr.cu.edu.eg/media/1315/da.pdf

Prof. Elabassi 11/17/2016

Quantitative and Qualitative Data Analysis and Data Mining for Decision Making

Prepared by Prof. Dr. Abdelhamid Elabassi, Dean of the Institute

November 2016

Cairo University, Institute of Statistical Studies and Research

The Decision Making Process

Begin Here: Identify the Problem

Data → Information → Knowledge → Decision

Data to Information: Descriptive Statistics, Probability, Computers

Information to Knowledge: Experience, Theory, Literature, Inferential Statistics, Computers


Inferential Statistics

• Making statements about a population by examining sample results

Sample statistics (known) → Inference → Population parameters (unknown, but can be estimated from sample evidence)
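The step from known sample statistics to an unknown population parameter can be sketched in Python. This is a minimal illustration, not part of the slides: the sample values are hypothetical, and the interval uses a normal approximation (for a sample this small a t-based interval would usually be preferred).

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Return (sample mean, lower bound, upper bound) using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)                      # sample statistic (known)
    std_err = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)      # critical value, about 1.96 for 95%
    return mean, mean - z * std_err, mean + z * std_err

# Hypothetical sample of eight measurements
sample = [52, 48, 50, 55, 47, 51, 49, 53]
mean, low, high = mean_confidence_interval(sample)
print(f"sample mean = {mean:.2f}, 95% interval for the population mean = ({low:.2f}, {high:.2f})")
```

The interval is the "inference" arrow on the slide: the population mean stays unknown, but the sample evidence bounds where it plausibly lies.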


Sampling Techniques

• Nonstatistical Sampling
– Convenience
– Judgment

• Statistical Sampling
– Simple Random
– Systematic
– Stratified
– Cluster
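Three of the statistical sampling techniques listed above can be sketched in a few lines of Python. The population, the strata names and the function names are all hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th item, where k = N // n."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample separately inside each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

rng = random.Random(42)                  # fixed seed so the sketch is repeatable
population = list(range(1, 101))         # hypothetical population of 100 numbered units
srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}
strat = stratified_sample(strata, 5, rng)
print(srs, sys_sample, strat, sep="\n")
```

Cluster sampling would instead select whole groups at random and observe every unit inside the chosen groups.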

Data Types

• Qualitative (Categorical): defined categories
– Examples: Marital Status, Political Party, Eye Color

• Quantitative (Numerical)
– Discrete (counted items). Examples: Number of Children, Defects per hour
– Continuous (measured characteristics). Examples: Weight, Voltage

Levels of Measurement and Measurement Scales

• Ratio Data: differences between measurements, true zero exists
– Examples: Height, Age, Weekly Food Spending

• Interval Data: differences between measurements but no true zero
– Examples: Temperature in Fahrenheit, Standardized exam score

• Ordinal Data: ordered categories (rankings, order, or scaling)
– Examples: Service quality rating, Standard & Poor’s bond rating, Student letter grades

• Nominal Data: categories (no ordering or direction)
– Examples: Marital status, Type of car owned

Pie Chart Example

Current Investment Portfolio

Investment Type    Amount (in thousands $)    Percentage
Stocks             46.5                       42.27
Bonds              32.0                       29.09
CD                 15.5                       14.09
Savings            16.0                       14.55
Total              110                        100

Pie chart labels (percentages are rounded to the nearest percent): Stocks 42%, Bonds 29%, Savings 15%, CD 14%

(Variables are Qualitative)
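The percentages in the table follow directly from the amounts. A short Python check, using only the figures given on the slide (the variable names are ours):

```python
# Amounts in thousands of dollars, as given in the table
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())

# Share of each investment type, to two decimals (table) and to the nearest percent (chart labels)
percentages = {name: round(100 * value / total, 2) for name, value in amounts.items()}
chart_labels = {name: round(100 * value / total) for name, value in amounts.items()}

for name in amounts:
    print(f"{name:8s} {amounts[name]:6.1f} {percentages[name]:6.2f} {chart_labels[name]:3d}%")
print(f"{'Total':8s} {total:6.1f}")
```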

Bar Chart Example 2

Newspaper readership per week

[Bar chart: horizontal axis, number of days newspaper is read per week (0 to 7); vertical axis, frequency (0 to 50)]

Side-by-Side Chart Example

• Sales by quarter for three sales territories:

         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      59        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9

[Side-by-side bar chart: quarters on the horizontal axis, sales (0 to 60) on the vertical axis, one bar per territory (East, West, North)]

Describing Data Numerically

Summary Measures

• Center and Location: Mean, Median, Mode, Weighted Mean
• Other Measures of Location: Percentiles, Quartiles
• Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
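The measures of center and location can be computed with Python's standard `statistics` module. The data, values and weights below are hypothetical; note that `statistics.quantiles` uses one of several common quartile conventions, so other packages may report slightly different quartiles.

```python
import statistics

# Hypothetical data set, sorted for readability
data = [2, 4, 4, 5, 7, 9, 10, 12]

mean = statistics.mean(data)        # arithmetic mean
median = statistics.median(data)    # middle value (average of the two middle values here)
mode = statistics.mode(data)        # most frequent value

# Weighted mean: each value counts in proportion to its weight
values = [80, 90, 70]
weights = [2, 3, 5]
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Quartiles are the 25th, 50th and 75th percentiles
q1, q2, q3 = statistics.quantiles(data, n=4)
print(mean, median, mode, weighted_mean, (q1, q2, q3))
```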

Measures of Variation

• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation

Measures of variation give information on the spread or variability of the data values.

[Figure: two distributions with the same center but different variation]
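A small Python sketch of "same center, different variation": two hypothetical data sets share a mean of 10 but differ in every measure of spread. The function name and the data are illustrative only, and the variance shown is the sample variance (dividing by n - 1).

```python
import statistics

def variation_summary(data):
    """Return the measures of variation named on the slide for one data set."""
    mean = statistics.mean(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    std_dev = statistics.stdev(data)            # sample standard deviation
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,                         # interquartile range
        "variance": statistics.variance(data),  # sample variance (divides by n - 1)
        "std_dev": std_dev,
        "cv_percent": 100 * std_dev / mean,     # coefficient of variation
    }

# Two hypothetical data sets with the same mean (10) but different spread
narrow = [8, 9, 10, 11, 12]
wide = [2, 6, 10, 14, 18]
narrow_summary = variation_summary(narrow)
wide_summary = variation_summary(wide)
print(narrow_summary)
print(wide_summary)
```

Because the coefficient of variation divides the standard deviation by the mean, it allows spread to be compared across data measured in different units.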

[Figures: Frequency Polygon; Frequency Curve; Histogram; Normal Distribution]

A Classification of Univariate Techniques

Univariate Techniques

• Metric Data
– One Sample: t test, Z test
– Two or More Samples, Independent: Two-Group test, Z test, One-Way ANOVA
– Two or More Samples, Related: Paired t test

• Non-numeric Data
– One Sample: Frequency, Chi-Square, K-S, Runs, Binomial
– Two or More Samples, Independent: Chi-Square, Mann-Whitney, Median, K-S, K-W ANOVA
– Two or More Samples, Related: Sign, Wilcoxon, McNemar, Chi-Square

Correlation Coefficients by the Measurement Level of the Variables

Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +.3, r = +1, and a second example of r = 0]
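The correlation coefficient r behind such plots can be computed from its definition. The sketch below uses hypothetical data; the third data set is a symmetric curve, included on the assumption that one of the r = 0 panels shows a non-linear pattern, to illustrate that r = 0 rules out a linear relationship but not every relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariation of x and y scaled by their individual variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])    # points on a rising line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])    # points on a falling line
r_zero = pearson_r(x, [4, 1, 0, 1, 4])    # y = (x - 3)^2: related, but not linearly
print(r_pos, r_neg, r_zero)
```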

Simple Linear Regression Model (continued)

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

[Figure: regression line of Y against X with intercept $\beta_0$ and slope $\beta_1$. At a given $X_i$ the plot marks the observed value of Y, the predicted value of Y, and the random error $\varepsilon_i$ between them.]
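In practice the intercept and slope are estimated from data, most commonly by least squares. The following Python sketch assumes that method and uses hypothetical data; the residuals it prints are the sample counterparts of the random errors in the model.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) of the intercept and slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical observations scattered around a straight line
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.2, 6.8, 9.1, 11.0]

b0, b1 = fit_line(xs, ys)
predicted = [b0 + b1 * x for x in xs]               # predicted value of Y for each Xi
residuals = [y - p for y, p in zip(ys, predicted)]  # observed minus predicted
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print("residuals:", [round(e, 3) for e in residuals])
```

With an intercept in the model, least-squares residuals always sum to zero, which is a quick sanity check on any fitted line.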

The Multiple Regression Model

Idea: Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (Xi)

Multiple Regression Model with k Independent Variables:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

where $\beta_0$ is the Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.

$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients.

Logistic Regression

• Used when the dependent variable Y is binary (i.e., Y takes on only two values)

• Examples

– Customer prefers Brand A or Brand B

– Employee chooses to work full‐time or part‐time

– Loan is delinquent or is not delinquent

– Person voted in last election or did not

• Logistic regression allows you to predict the probability of a particular categorical response

Logistic Regression (continued)

• Logistic regression is based on the odds ratio, which represents the probability of a success compared with the probability of failure

• The logistic regression model is based on the natural log of this odds ratio

$\text{Odds ratio} = \dfrac{\text{probability of success}}{1 - \text{probability of success}}$

Logistic Regression (continued)

Logistic Regression Model:

$\ln(\text{odds ratio}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

Where k = number of independent variables in the model

εi = random error in observation i

Logistic Regression Equation:

$\ln(\text{estimated odds ratio}) = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

Estimated Odds Ratio and Probability of Success

• Once you have the logistic regression equation, compute the estimated odds ratio:

$\text{Estimated odds ratio} = e^{\ln(\text{estimated odds ratio})}$

• The estimated probability of success is

$\text{Estimated probability of success} = \dfrac{\text{estimated odds ratio}}{1 + \text{estimated odds ratio}}$
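The two steps above (exponentiate the fitted log odds, then convert odds to a probability) can be sketched in Python. The coefficients and predictor values are hypothetical and are not estimates from any data set in these slides.

```python
import math

def estimated_probability(coefficients, xs):
    """coefficients = [b0, b1, ..., bk]; xs = [X1, ..., Xk] for one observation."""
    b0, *slopes = coefficients
    log_odds = b0 + sum(b * x for b, x in zip(slopes, xs))  # ln(estimated odds ratio)
    odds = math.exp(log_odds)                               # estimated odds ratio
    return odds / (1 + odds)                                # estimated probability of success

# Hypothetical fitted equation: ln(odds) = -1.5 + 0.8*X1 + 0.3*X2
coefficients = [-1.5, 0.8, 0.3]
p = estimated_probability(coefficients, [2.0, 1.0])   # log odds = 0.4
p_even = estimated_probability([0.0, 1.0], [0.0])     # log odds = 0, so odds = 1
print(round(p, 4), p_even)
```

A log odds of zero corresponds to odds of 1 and therefore a probability of exactly one half, which is why the sign of the fitted log odds shows which outcome is more likely.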

Nonlinear Relationships

• The relationship between the dependent variable and an independent variable may not be linear

• Can review the scatter diagram to check for non-linear relationships

• Example: Quadratic model

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$

– The second independent variable is the square of the first variable

Quadratic Regression Model

Model form:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$

• where:

β0 = Y intercept

β1 = regression coefficient for linear effect of X on Y

β2 = regression coefficient for quadratic effect on Y

εi = random error in Y for observation i

Quadratic Regression Model (continued)

Quadratic models may be considered when the scatter diagram takes on one of the following shapes:

[Figure: four plots of Y against X1, for (β1 < 0, β2 > 0), (β1 > 0, β2 > 0), (β1 < 0, β2 < 0) and (β1 > 0, β2 < 0)]

β1 = the coefficient of the linear term

β2 = the coefficient of the squared term

Time-Series Forecasting

The Importance of Forecasting

• Governments forecast unemployment, interest rates, and expected revenues from income taxes for policy purposes

• Marketing executives forecast demand, sales, and consumer preferences for strategic planning

• College administrators forecast enrollments to plan for facilities and for faculty recruitment

• Retail stores forecast demand to control inventory levels, hire employees and provide training

Common Approaches to Forecasting

• Qualitative forecasting methods
– Used when historical data are unavailable
– Considered highly subjective and judgmental

• Quantitative forecasting methods
– Use past data to predict future values
– Time Series
– Causal

Time-Series Components

• Trend Component: overall, persistent, long-term movement

• Seasonal Component: regular periodic fluctuations, usually within a 12-month period

• Cyclical Component: repeating swings or movements over more than one year

• Irregular Component: erratic or residual fluctuations

Multiplicative Time-Series Model with a Seasonal Component

• Used primarily for forecasting

• Allows consideration of seasonal variation

$Y_i = T_i \times S_i \times C_i \times I_i$

where Ti = Trend value at time i

Si = Seasonal value at time i

Ci = Cyclical value at time i

Ii = Irregular (random) value at time i

Sales vs. Smoothed Sales

• Fluctuations have been smoothed

• NOTE: the smoothed value in this case is generally a little low, since the trend is upward sloping and the weighting factor is only .2

[Line chart: Sales and Smoothed series; horizontal axis, time period (1 to 10); vertical axis, sales (0 to 60)]
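The smoothing with a weighting factor of .2 is consistent with simple exponential smoothing, where each smoothed value is a weighted average of the current observation and the previous smoothed value. The sketch below assumes that method; the sales figures are illustrative and are not read from the chart.

```python
def exponential_smoothing(series, w):
    """E_1 = Y_1; E_i = w * Y_i + (1 - w) * E_(i-1) for later periods."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

# Illustrative upward-trending sales for ten periods (hypothetical values)
sales = [23, 40, 25, 27, 32, 48, 33, 37, 37, 50]
smoothed = exponential_smoothing(sales, w=0.2)

for period, (y, e) in enumerate(zip(sales, smoothed), start=1):
    print(f"{period:2d} {y:5.1f} {e:6.2f}")
```

With a small weight the smoothed series reacts slowly, so on an upward trend it tends to sit below the recent observations, which matches the note on the slide.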

Trend-Based Forecasting (continued)

Forecast for time period 6:

$\hat{Y} = 21.905 + 9.5714(6) = 79.33$

[Line chart "Sales trend": sales (0 to 80) against year (0 to 6)]
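The forecast is a direct substitution into the fitted trend line. A short Python check, using the intercept and slope shown on the slide (the function name is ours, and the data used to fit the line are not shown in the slides):

```python
def trend_forecast(intercept, slope, period):
    """Linear trend forecast: Y-hat = intercept + slope * period."""
    return intercept + slope * period

# Coefficients as given on the slide
forecast = trend_forecast(21.905, 9.5714, 6)
print(round(forecast, 2))
```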

What is the level of measurement of the data?

[Flowchart for choosing an inference procedure, read across four columns: Level of Measurement, Number of Populations, Claim or Parameter, Inference]

• Interval or Ratio (such as heights, weights)
– One Population; Two Populations; More than Two Populations
– Claims or parameters: Mean, Variance, Means (Independent, Matched Pairs), Variances, Correlation and Regression
– Inference: Estimating with Confidence Interval; Hypothesis Testing (with Large Sample)

• Ordinal (such as data consisting of ranks)
– One Population; Two Populations; More than Two Populations

• Nominal (data consisting of proportions or frequency counts for different categories)
– Frequency Counts for Categories: Multinomial (one row); Contingency Table (multiple rows, columns)
– Proportions: One Population; Two Populations
– Inference: Estimating Proportion with Confidence Interval; Hypothesis Testing

• The data within a single model (equation) must be at the same level

• Use multilevel analysis (Multilevel)

• Observe quality criteria in the model

Introduction: What is SPSS?

• The name was originally an acronym for Statistical Package for the Social Sciences; it was later said to stand for Statistical Product and Service Solutions.

• From SPSS 17 the product was rebranded PASW (Predictive Analytics SoftWare), with SPSS itself named PASW Statistics. (From version 19 the name became IBM SPSS Statistics.)

• It is one of the most popular statistical packages and can perform highly complex data manipulation and analysis with simple instructions.

The Basic Analysis
1- Frequencies
2- Descriptives
3- Explore
4- Crosstabs

Test Hypothesis
1- One sample
2- Two Samples (R-I)
3- Anova (F test)
4- Ancova

Modelling
1- Cor. & Reg.
2- Log. & Dis.

Regression Analysis

• Click ‘Analyze,’ ‘Regression,’ then click ‘Linear’ from the main menu.

• Clicking OK gives the result

Data Mining

Data mining may be defined as follows: data mining is a collection of techniques for the efficient, automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. The patterns must be actionable so that they may be used in an enterprise’s decision making.

Why Data Mining Now?

• Data are being produced

• Data are being stored in data warehouses

• Computing power is more affordable

• Competitive pressures are enormous

• Easy-to-use data mining software is available

Cross-Industry Standard Process for Data Mining (CRISP-DM)

• The iterative CRISP-DM process is shown in the outer circle

• The most significant dependencies between phases are shown

• The next phase depends on results from the preceding phase

• Returning to an earlier phase is possible before moving forward

Data Mining Tasks

• Description
• Estimation
• Classification
• Prediction
• Clustering
• Affinity Analysis

Estimation, classification and prediction are supervised (directed) tasks; clustering and affinity analysis are unsupervised (undirected).

Estimation and classification differ in the target variable: numeric for estimation, categorical for classification.

Prediction differs from classification and estimation in that it concerns future values.

Matching Data Mining Tasks to Data Mining Algorithms

• Estimation: Multiple Linear Regression, Neural Networks
• Classification: Decision Trees, Logistic Regression, Neural Networks, k-NN
• Prediction: Estimation and Classification for future values
• Clustering: k-means, Kohonen Self-Organizing Maps
• Affinity Analysis: Association Analysis, sometimes referred to as Market Basket Analysis
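As one concrete instance of a classification algorithm from the list above, here is a minimal k-nearest-neighbours (k-NN) sketch in Python. The training points, labels and function name are hypothetical; ties and feature scaling, which matter in real use, are not handled.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train is a list of (features, label) pairs; return the majority label of the k nearest."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled cases with two numeric features each
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
    ((4.0, 4.2), "B"), ((4.5, 3.9), "B"), ((3.8, 4.4), "B"),
]

label_near_a = knn_classify(train, (1.1, 1.0))
label_near_b = knn_classify(train, (4.1, 4.0))
print(label_near_a, label_near_b)
```

The target here is categorical, so this is classification; replacing the majority vote with the average of the neighbours' numeric targets would turn the same idea into estimation.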

Data Mining: Confluence of Multiple Disciplines

[Diagram: Data Mining at the center, drawing on the following disciplines]

• Database Technology
• Statistics
• Machine Learning
• Pattern Recognition
• Algorithm
• Distributed Computing
• Visualization

Knowledge Discovery Process

[Figure: stages shown are Data Warehouse, Raw Data, Target Data, Transformed Data, Patterns and Rules, and Knowledge; the steps are labelled Integration, Understanding, and Interpretation & Evaluation]

Knowledge Discovery Process

– Data mining: the core of the knowledge discovery process.

[Figure: Databases, then Data Cleaning and Data Integration, giving Preprocessed Data; then Selection of Task-relevant Data, Data transformations, Data Mining, and Knowledge Interpretation]

Data Mining Models and Tasks

KDD Process Ex: Shuttle Data

• Selection:
– Select data (which missions, etc.) to use

• Preprocessing:
– Remove spikes

• Transformation:
– DFT, DWT, PAA, etc.

• Data Mining:
– Look for rules…

• Interpretation/Evaluation:
– Show rules to domain experts

• Potential User Applications:
– Prediction of failures

[Figure: two time-series plots, horizontal axis 0 to 1000, vertical axis 0 to 1]

Data Mining Development

• Similarity Measures
• Hierarchical Clustering
• IR Systems
• Imprecise Queries
• Textual Data
• Web Search Engines

• Bayes Theorem
• Regression Analysis
• EM Algorithm
• K-Means Clustering
• Time Series Analysis

• Neural Networks
• Decision Tree Algorithms

• Algorithm Design Techniques
• Algorithm Analysis
• Data Structures

• Relational Data Model
• SQL
• Association Rule Algorithms
• Data Warehousing
• Scalability Techniques

A Numeric Example

• Feed forward restricts network flow to a single direction
• Fully connected
• Flow does not loop or cycle
• Network composed of two or more layers

[Figure: feed-forward network with three layers.
Input Layer: Nodes 1, 2 and 3 (inputs x1, x2, x3), plus the constant input x0.
Hidden Layer: Nodes A and B, with weights W0A, W1A, W2A, W3A into Node A and W0B, W1B, W2B, W3B into Node B.
Output Layer: Node Z, with weights W0Z, WAZ and WBZ.]

Linear neurons

• These are simple but computationally limited
– If we can make them learn we may get insight into more complicated neurons.

$y = b + \sum_i x_i w_i$

where y is the output, b is the bias, i is the index over input connections, $x_i$ is the i-th input, and $w_i$ is the weight on the i-th input.
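The linear neuron is simply a weighted sum of its inputs plus a bias. A minimal Python sketch, with hypothetical inputs, weights and bias chosen only to make the arithmetic easy to follow:

```python
def linear_neuron(inputs, weights, bias):
    """Output y = bias + sum over input connections of (input * weight)."""
    return bias + sum(x * w for x, w in zip(inputs, weights))

# Hypothetical values: 0.1 + (1.0 * 0.5) + (2.0 * -1.0) + (3.0 * 0.25)
y = linear_neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], bias=0.1)
print(round(y, 2))
```

Because the output is linear in the inputs, stacking such neurons without a non-linear activation still yields a linear function overall, which is the sense in which they are computationally limited.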

Given the attached data in the file MV.SAV, review its basic information (n, #VAR, …).

1. Randomly generate an age variable, N~(30,10), for the data.
2. Replace the cases whose age is below 20 with the mean.
3. Compute the basic measures for all variables.
4. Weight the data, treating each case as representing 4 observations (replicate the data 4 times).
5. Compute the basic measures for all variables after weighting and compare them with the unweighted results.
6. Assuming the data measure opinions and attitudes, compute the appropriate reliability and validity coefficients.
7. Test whether the mean of Y (GPA) equals 60, then 75.
8. It is claimed that the mean of Y differs by, and is associated with, work (WOR) and grade (GRD).
9. There is a correlation between work (WOR) and grade (GRD).
10. It is claimed that there is a statistically significant relationship between the means and values of X1 and X5.
11. Which quantitative variables have the most important effect on Y (GPA), and how do they rank in importance?
12. Which quantitative and qualitative variables have the most important effect on Y (GPA), and how do they rank in importance?
13. Which variables have the most important effect on WOR, and how do they rank in importance?
14. Which variables have the most important effect on GRD, and how do they rank in importance?
15. Carry out items 12 and 14 using neural networks and compare the results.
16. Place the cases and the variables into 4 clusters (groups), and merge (reduce) the variables into a smaller number.
17. Estimate the best-fitting curve for X3 and Y; sort Y in ascending order, treat it as a time series, and estimate its trend.