MELJUN CORTES DATA TYPES Rm104tr-13


  • 7/29/2019 MELJUN CORTES DATA TYPES Rm104tr-13

    1/39

    Lesson 13 - 1

    Year 1

    CS113/0401/v1

    LESSON 13: TYPES OF DATA

    Qualitative
      Not usually numeric
      No particular order
      Examples: Colour, Types of Materials

    Quantitative
      Numeric
      Ordered
      Measurable
      Continuous, e.g. Length, Age, Weight
      Discrete, e.g. Shoe size, Number of people


    DATA TABULATION (1)

    First stage in making raw data understandable

    RAW DATA

    Number of sheets of listing paper used by each of 120 jobs

    17 24 11 14 18 17  7  5 21  6 11 18
    22 14  6 17 14  8 12 13 27 12 18  9
    14 18 14 13 21  8 27  9 11 16 27 21
    14 11 19  7 10 29 17 12 14 19 12  9
    23 17 24  7 13 14 17 21  8 17 19 24
    26  2  5 18 14 16  7 16 28 13 14  8
    19 27  9 18  8 24 19  7 13 14 16 19
    11 17 23 12 25 16 15 10 21 18 14 11
     9 14 28 20 12 16 10  8  9 11 22 10
    17  9 18 12 24  8  7 16  5 20  7 10

    Not easily digested!


    DATA TABULATION (2)

    Tabulate in (discrete) categories

    Frequency distribution table

    Category
    (No. of sheets used)   Tally                                        Frequency
     0 -  4                |                                                    1
     5 -  9                ||||| ||||| ||||| ||||| ||||| |                     26
    10 - 14                ||||| ||||| ||||| ||||| ||||| ||||| ||||| ||        37
    15 - 19                ||||| ||||| ||||| ||||| ||||| ||||| |               31
    20 - 24                ||||| ||||| ||||| |                                 16
    25 - 29                ||||| ||||                                           9
    Total                                                                     120
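The tabulation step can be sketched in Python. This is only an illustration, assuming the 120 raw values read as listed below and that the classes are five wide (0 - 4, 5 - 9, ..., 25 - 29); the names `sheets`, `CLASS_WIDTH` and `tabulate` are made up for this sketch.

```python
from collections import Counter

# The 120 raw values (sheets of listing paper used by each job).
sheets = [
    17, 24, 11, 14, 18, 17, 7, 5, 21, 6, 11, 18,
    22, 14, 6, 17, 14, 8, 12, 13, 27, 12, 18, 9,
    14, 18, 14, 13, 21, 8, 27, 9, 11, 16, 27, 21,
    14, 11, 19, 7, 10, 29, 17, 12, 14, 19, 12, 9,
    23, 17, 24, 7, 13, 14, 17, 21, 8, 17, 19, 24,
    26, 2, 5, 18, 14, 16, 7, 16, 28, 13, 14, 8,
    19, 27, 9, 18, 8, 24, 19, 7, 13, 14, 16, 19,
    11, 17, 23, 12, 25, 16, 15, 10, 21, 18, 14, 11,
    9, 14, 28, 20, 12, 16, 10, 8, 9, 11, 22, 10,
    17, 9, 18, 12, 24, 8, 7, 16, 5, 20, 7, 10,
]

CLASS_WIDTH = 5  # assumed class width: 0-4, 5-9, ..., 25-29


def tabulate(values, width=CLASS_WIDTH):
    """Return {lower class limit: frequency}, ordered by class."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


table = tabulate(sheets)
for lower, freq in table.items():
    print(f"{lower:2d} - {lower + CLASS_WIDTH - 1:2d}   {freq:3d}")
print("Total    ", sum(table.values()))
```

Integer division by the class width maps every value to the lower limit of its class, so the counting itself needs no explicit list of classes.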


    FREQUENCY DISTRIBUTION (1)

    Raw data
    Raw data are collected data which have not been organized numerically.

    Array
    An array is an arrangement of raw numerical data in ascending or descending order of magnitude. The difference between the largest and smallest number is called the range of the data.
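As a small illustration of an array and its range, the sketch below sorts a handful of raw values. The eight numbers are borrowed from the arithmetic mean example later in this lesson; the variable names are made up for the sketch.

```python
# Raw data: collected but not yet ordered.
raw = [7, 21, 13, 17, 23, 18, 9, 20]

# Array: the same data in ascending order of magnitude.
array = sorted(raw)

# Range: difference between the largest and smallest number.
data_range = array[-1] - array[0]

print(array)       # [7, 9, 13, 17, 18, 20, 21, 23]
print(data_range)  # 16
```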


    FREQUENCY DISTRIBUTION (2)

    Frequency distribution
    When summarizing a large number of raw data it is often useful to distribute the data into classes or categories and to determine the number of individuals belonging to each class, called the class frequency.


    EXAMPLE

    A set of 100 students is obtained from an alphabetical listing of a university record.
    Their weights, ranging from 60 kg to 74 kg, are tabulated.


    EXAMPLE

    Mass (kilograms)   Number of Students
    60 - 62                     5
    63 - 65                    18
    66 - 68                    42
    69 - 71                    27
    72 - 74                     8
    Total                     100

    The first class or category, for example, consists of masses from 60 to 62 kg and is indicated by the symbol 60 - 62. Since 5 students have masses belonging to this class, the corresponding class frequency is 5.

    Data organized and summarized as in the above frequency distribution are often called grouped data.


    CLASS INTERVAL

    A symbol defining a class, such as 60 - 62, is called a class interval.
    The end numbers, 60 and 62, are called the class limits.
    The smaller number 60 is the lower class limit and the larger number 62 is the upper class limit.


    CLASS MARK

    A class mark is the midpoint of the class interval and is obtained by adding the lower and upper class limits and dividing by two.
    In the previous example, the class mark of the interval 60 - 62 is (60 + 62) / 2 = 61
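The class mark rule can be applied to every interval of the mass table. A minimal sketch, with `class_limits` and `class_mark` as illustrative names:

```python
# Class limits (kg) of the five intervals in the mass table.
class_limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]


def class_mark(lower, upper):
    """Midpoint of a class interval: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


marks = [class_mark(lower, upper) for lower, upper in class_limits]
print(marks)  # [61.0, 64.0, 67.0, 70.0, 73.0]
```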


    MEDIAN (1)

    The median of a set of numbers arranged in order of magnitude is the middle value or the arithmetic mean of the two middle values.

    Example 1
    The set of numbers
    3, 4, 4, 5, 6, 8, 8, 8, 10
    For an odd number of data the median occurs at position
    (N + 1) / 2 = 10 / 2 = 5th position
    Therefore the median = 6


    MEDIAN (2)

    Example 2
    The set of numbers
    5, 5, 7, 9, 11, 12, 15, 18
    For an even number of data the median is the average of the two middle values
    The median = (Pos 4 + Pos 5) / 2
               = (9 + 11) / 2
               = 10
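Both median examples can be checked with a short function. This is a sketch of the rule as stated on the slides (middle value for an odd count, mean of the two middle values for an even count); Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: position (N + 1) / 2, which is index mid when counting from 0.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 4, 4, 5, 6, 8, 8, 8, 10]))    # 6
print(median([5, 5, 7, 9, 11, 12, 15, 18]))    # 10.0
```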


    MEDIAN (1)

    For grouped data the median, obtained by interpolation, is given by

    $$\text{Median} = L_1 + \left( \frac{\frac{N}{2} - (\sum f)_1}{f_{\text{median}}} \right) c$$

    where
    $L_1$ = lower class boundary of the median class (i.e. the class containing the median)
    $N$ = number of items in the data (i.e. total frequency)


    MEDIAN (2)

    $(\sum f)_1$ = sum of frequencies of all classes lower than the median class
    $f_{\text{median}}$ = frequency of median class
    $c$ = size of median class interval
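The interpolation formula can be tried on the mass table from the earlier example. This is a sketch under one stated assumption: masses are taken as recorded to the nearest kilogram, so the class 66 - 68 has lower class boundary 65.5 and size 3. The function and parameter names are illustrative.

```python
def grouped_median(lower_boundary, n, cum_freq_below, freq_median, class_size):
    """Median = L1 + ((N/2 - (sum f)_1) / f_median) * c."""
    return lower_boundary + (n / 2 - cum_freq_below) / freq_median * class_size


# Mass table frequencies: 5, 18, 42, 27, 8 (N = 100).
# N/2 = 50 falls in the class 66 - 68, since 5 + 18 = 23 items lie below it.
# Assumption: boundary 65.5 and class size 3 (masses recorded to the nearest kg).
median_mass = grouped_median(
    lower_boundary=65.5,
    n=100,
    cum_freq_below=5 + 18,
    freq_median=42,
    class_size=3,
)
print(round(median_mass, 2))  # 67.43
```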


    MEDIAN OF A GROUPED FREQUENCY DISTRIBUTION

    Draw a Cumulative Frequency Diagram
    Search for the middle value on the cumulative frequency (cf) axis and read off the corresponding value on the x axis
    This is the median


    MEDIAN FROM A CUMULATIVE FREQUENCY DIAGRAM


    MODE (1)

    The mode of a set of numbers is that value which occurs with the greatest frequency, i.e. it is the most common value. The mode may not exist, and even if it does exist it may not be unique.

    Example
    The set 2, 2, 5, 7, 9, 9, 9, 10, 11, 12, 18 has mode 9

    Example
    The set 3, 5, 8, 10, 12, 15, 16 has no mode


    MODE (2)

    Example
    The set 2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9 has two modes, 4 and 7, and is called bimodal

    A distribution having only one mode is called unimodal
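The three mode examples (one mode, no mode, two modes) can be reproduced with a small helper. This is a sketch that follows the slides' convention that a set in which no value repeats has no mode; `modes` is an illustrative name and the input is assumed to be non-empty.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 2, 5, 7, 9, 9, 9, 10, 11, 12, 18]))  # [9]
print(modes([3, 5, 8, 10, 12, 15, 16]))              # []
print(modes([2, 3, 4, 4, 4, 5, 5, 7, 7, 7, 9]))      # [4, 7]
```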


    MODE OF A FREQUENCY DISTRIBUTION

    Ungrouped data
    Mode is the x value which has the highest value of f

    Grouped data
    Can't find the mode, only the modal class


    MODAL CLASS

    x          f
    51 - 55    12
    56 - 60    16
    61 - 65    10

    56 - 60 is the modal class
    We don't know the x values before grouping, so we can't find the mode exactly

    N.B.
    The actual mode might not even be in this class


    MODE (1)

    In the case of grouped data where a frequency curve has been constructed to fit the data, the mode will be the value (or values) of x corresponding to the maximum point (or points) on the curve. From a frequency distribution or histogram the mode can be obtained from the following formula:

    $$\text{Mode} = L_1 + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c$$


    MODE (2)

    where
    $L_1$ = lower class boundary of modal class (i.e. class containing the mode)
    $\Delta_1$ = excess of modal frequency over frequency of next lower class
    $\Delta_2$ = excess of modal frequency over frequency of the next higher class
    $c$ = size of modal class interval


    GROUPED MODE FROM HISTOGRAM (1)

    Can only ESTIMATE

    Assume mode is in Modal Class


    GROUPED MODE FROM HISTOGRAM (2)

    Calculation
    Mode Estimate = 25 + 5 x 40 / (40 + 64)
                  = 25 + 5 x 40 / 104
                  = 25 + 1.9
                  = 26.9
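The calculation above is an instance of the grouped mode formula with lower class boundary 25, class size 5, and excesses 40 and 64. A minimal sketch, with illustrative function and parameter names:

```python
def grouped_mode(lower_boundary, delta_lower, delta_higher, class_size):
    """Mode = L1 + (d1 / (d1 + d2)) * c, an estimate within the modal class."""
    return lower_boundary + delta_lower / (delta_lower + delta_higher) * class_size


# Values read from the worked calculation: L1 = 25, c = 5,
# excess over the next lower class = 40, over the next higher class = 64.
estimate = grouped_mode(lower_boundary=25, delta_lower=40, delta_higher=64, class_size=5)
print(round(estimate, 1))  # 26.9
```

The smaller excess on the lower side pulls the estimate towards the lower class boundary, which is why the result sits below the midpoint of the modal class.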


    ARITHMETIC MEAN (1)

    The arithmetic mean, or the mean, of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is denoted by $\bar{X}$ and is defined as

    $$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$


    ARITHMETIC MEAN (2)

    Eight numbers:
    7, 21, 13, 17, 23, 18, 9, 20
    Add them = 128
    Divide by 8 = 16
    This is the arithmetic mean
    It is the most common definition of average
    It only works with quantitative data
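The same two steps (add, then divide by the count) in Python, as a minimal sketch; the standard library's `statistics.mean` gives the same result.

```python
numbers = [7, 21, 13, 17, 23, 18, 9, 20]

total = sum(numbers)                    # add them
arithmetic_mean = total / len(numbers)  # divide by how many there are

print(total)            # 128
print(arithmetic_mean)  # 16.0
```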


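    ARITHMETIC MEAN (3)

    If the numbers $X_1, X_2, X_3, \ldots, X_n$ occur $f_1, f_2, f_3, \ldots, f_n$ times respectively, the arithmetic mean is

    $$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_n X_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$$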


    MEAN OF A FREQUENCY DISTRIBUTION

    Age (x)   Frequency (f)     fx
    17               3          51
    18               8         144
    19              14         266
    20              21         420
    21              24         504
    22              13         286
    23               7         161
    24               6         144
    25               3          75
    26               1          26
    Total     Σf = 100   Σfx = 2077

    Mean age = 2077 / 100 = 20.77
    (rounded to nearest integer, 21)
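The fx column and the two totals can be rebuilt from the ages and frequencies. A short sketch, with illustrative variable names:

```python
ages = [17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [3, 8, 14, 21, 24, 13, 7, 6, 3, 1]

# One fx entry per row of the table.
fx = [f * x for x, f in zip(ages, freqs)]

sum_f = sum(freqs)   # total frequency
sum_fx = sum(fx)     # total of the fx column
mean_age = sum_fx / sum_f

print(fx)        # [51, 144, 266, 420, 504, 286, 161, 144, 75, 26]
print(sum_f)     # 100
print(sum_fx)    # 2077
print(mean_age)  # 20.77
```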


    HISTOGRAMS (1)

    Only used for quantitative data
    A histogram is like a bar chart, but with no gaps between bars and a calibrated horizontal axis
    Order of bars depends on value and on horizontal scale


    HISTOGRAMS (2)


    HISTOGRAMS (3)


    AREA IN HISTOGRAMS


    CUMULATIVE FREQUENCY DIAGRAMS (1)

    Table 1:

    Lines of Code   No. of Programs
    100 -                  3
    125 -                 12
    150 -                 24
    175 -                 42
    200 -                 51
    225 -                 39
    250 -                 30
    275 -                 21
    300 -                 12
    325 - 349              6


    CUMULATIVE FREQUENCY DIAGRAMS (2)

    Table 2:

    Lines of Code (less than)   Cumulative Frequency
    100                                  0
    125                                  3
    150                                 15
    175                                 39
    200                                 81
    225                                132
    250                                171
    275                                201
    300                                222
    325                                234
    350                                240
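Table 2 is the running total of Table 1. The sketch below rebuilds it with `itertools.accumulate`; the class frequencies are those of Table 1, and the variable names are illustrative.

```python
from itertools import accumulate

# Table 1: lower class limits and number of programs per class.
lower_limits = [100, 125, 150, 175, 200, 225, 250, 275, 300, 325]
programs = [3, 12, 24, 42, 51, 39, 30, 21, 12, 6]

# "Less than" boundaries: every lower limit plus the end of the last class.
boundaries = lower_limits + [350]

# No program has fewer than 100 lines, so the running total starts at 0.
cumulative = [0] + list(accumulate(programs))

for boundary, cum in zip(boundaries, cumulative):
    print(f"less than {boundary}: {cum}")
```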


    CUMULATIVE FREQUENCY DIAGRAMS (3)

    [Figure: Cumulative Frequency Curve. Horizontal axis: Lines of code (less than), 0 to 350. Vertical axis: Cumulative Frequency, 0 to 240.]
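Reading the median off a cumulative frequency diagram, as described earlier in the lesson, can be approximated numerically. The sketch below assumes straight-line segments between the plotted points (the slide draws a smooth curve, so a reading from the diagram may differ slightly); `value_at_cumulative` is an illustrative helper, not part of the lesson.

```python
def value_at_cumulative(boundaries, cumulative, target):
    """Linearly interpolate the x value at which the cumulative frequency reaches target."""
    points = list(zip(boundaries, cumulative))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the cumulative frequency range")


# Points of the "less than" cumulative frequency table for lines of code.
boundaries = [100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350]
cumulative = [0, 3, 15, 39, 81, 132, 171, 201, 222, 234, 240]

# The middle value on the cumulative frequency axis is N / 2 = 120.
median_loc = value_at_cumulative(boundaries, cumulative, cumulative[-1] / 2)
print(round(median_loc, 1))  # 219.1
```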


    STANDARD DEVIATION (1)

    The Standard Deviation of a set of N numbers $X_1, X_2, \ldots, X_N$ is denoted by S.D. and is defined by

    $$\text{S.D.} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}}$$

    where
    $\bar{X}$ = Arithmetic Mean
    $N$ = Total Number of elements in the set
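As an illustration of the definition, the sketch below computes the standard deviation of the eight numbers used in the arithmetic mean example (mean 16). It divides by N, as in the definition above, which matches `statistics.pstdev` in Python's standard library; `std_dev` is an illustrative name.

```python
import math


def std_dev(values):
    """Square root of the mean squared deviation from the arithmetic mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


numbers = [7, 21, 13, 17, 23, 18, 9, 20]
# Squared deviations from 16 sum to 234, so the S.D. is sqrt(234 / 8).
print(round(std_dev(numbers), 2))  # 5.41
```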


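    STANDARD DEVIATION (2) (GROUPED DATA)

    If $X_1, X_2, \ldots, X_n$ occur with frequencies $f_1, f_2, \ldots, f_n$ respectively, the standard deviation can be written as

    $$\text{S.D.} = \sqrt{\frac{\sum_{j=1}^{n} f_j (X_j - \bar{X})^2}{\sum_{j=1}^{n} f_j}}$$

    or

    $$\text{S.D.} = \sqrt{\frac{\sum f_j X_j^2}{\sum f_j} - \left( \frac{\sum f_j X_j}{\sum f_j} \right)^2}$$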


    Question 6 c) NCC 1/93

    On test the actual access times for 50 hard disc drives were distributed as follows:

    Time (ms)       22.6   22.7   22.8   22.9   23.0   23.1   23.2   23.3
    No. of Drives      1      3      6     10     14      9      5      2

    Calculate the mean access time and the standard deviation.


    Alternative Question 6c

    x        f      fx        fx^2
    22.6      1     22.6      510.76
    22.7      3     68.1     1545.87
    22.8      6    136.8     3119.04
    22.9     10    229.0     5244.10
    23.0     14    322.0     7406.00
    23.1      9    207.9     4802.49
    23.2      5    116.0     2691.20
    23.3      2     46.6     1085.78
    Total         1149.0    26405.24   (1 mark for each total)

    Mean = 1149 / 50 = 22.98   [1]

    $$\text{S.D.} = \sqrt{\frac{\sum f x^2}{\sum f} - (\bar{X})^2}$$   [1]

    $$= \sqrt{\frac{26405.24}{50} - (22.98)^2}$$   [1]

    = 0.156   [1]
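The worked answer can be checked by rebuilding the fx and fx^2 totals from the access-time table. A sketch using the second (computational) form of the grouped standard deviation; variable names are illustrative.

```python
import math

times = [22.6, 22.7, 22.8, 22.9, 23.0, 23.1, 23.2, 23.3]  # access time (ms)
drives = [1, 3, 6, 10, 14, 9, 5, 2]                        # number of drives

sum_f = sum(drives)
sum_fx = sum(f * x for x, f in zip(times, drives))
sum_fx2 = sum(f * x * x for x, f in zip(times, drives))

mean = sum_fx / sum_f
# S.D. = sqrt(sum(f x^2) / sum(f) - mean^2)
sd = math.sqrt(sum_fx2 / sum_f - mean ** 2)

print(sum_f)               # 50
print(round(sum_fx, 1))    # 1149.0
print(round(sum_fx2, 2))   # 26405.24
print(round(mean, 2))      # 22.98
print(round(sd, 3))        # 0.156
```

Because the two terms under the square root are nearly equal (about 528.10 and 528.08), this form is sensitive to rounding; keeping full precision until the final step avoids a distorted result.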


    VARIANCE

    The variance of a set of data is defined as the square of the standard deviation and is thus given by $(\text{S.D.})^2$, i.e.

    $$\text{Variance} = (\text{S.D.})^2 = \frac{\sum_{j=1}^{n} f_j (X_j - \bar{X})^2}{\sum_{j=1}^{n} f_j}$$
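A short sketch of the variance formula, applied to the hard disc access times from Question 6 c); `grouped_variance` is an illustrative name. The result is the square of the standard deviation found for the same data (0.156 squared is about 0.0244).

```python
def grouped_variance(xs, fs):
    """Variance = sum(f * (x - mean)^2) / sum(f) for values xs with frequencies fs."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n


times = [22.6, 22.7, 22.8, 22.9, 23.0, 23.1, 23.2, 23.3]
drives = [1, 3, 6, 10, 14, 9, 5, 2]

variance = grouped_variance(times, drives)
print(round(variance, 4))  # 0.0244
```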