STATISTICS 1 SCHEME OF WORK 2005-2006

S1 Unit: 14 weeks, 35 lessons.  The course has been allocated 30 hours, leaving 5 hours for assessment, revision and lessons missed.

The textbook used is Heinemann S1 New Edition; the equivalent reference from the Heinemann T1 Old Edition is given in brackets.  Other sources are Crawshaw and Chambers (assume this is the source where none is stated) and, for the starred topics, worksheets of past examination questions (see MC for a copy of these materials).

Homework of approximately 1 hour should be set after each lesson, including revision, review exercises and past paper questions wherever possible.

The external assessment will be by examination (1½ hours, 75 marks) with approximately 7 questions.  There is very likely to be a question on each topic area.  The formulae that candidates are expected to know are indicated in the appropriate sections of this scheme.  Other formulae and tables can be found in the booklet Mathematical Formulae including Statistical Formulae and Tables; please give pupils the chance to familiarise themselves with this document as early in the course as possible.

To assist pupils’ learning, give short, timed tests as lesson starters, using examination-style questions from as early in the course as possible.

Time

Specification

Notes

Page and Exercise

Other Sources

½ hour

Mathematical models in probability and statistics:

Introduce the course content and ideas

Discuss the basic ideas of mathematical modelling as applied in probability and statistics

See insert “What is Statistics all about?”

Learn the definition of a model: “A statistical procedure devised to describe or make predictions about the expected behaviour of a real world problem”.

P1-3 (P1-3)

 

5½ hours

Probability*:

Elementary Probability, including terminology, sample space diagrams and complementary events.

The addition rule & Venn diagrams.

The multiplication rule and conditional probability.  Making use of tree diagrams.

Independent events and mutually exclusive events.

P(A’) = 1 – P(A) is required

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

P(A ∩ B) = P(A) × P(B | A)

Include examples that sample both with and without replacement.  Also look at situations where different orders of events give the same outcome (e.g. P(2 people pass and 1 fails), where the results can occur in 3 different orders), briefly introducing factorial notation as an alternative to listing every order.

For independent events: P(B | A) = P(B), P(A | B) = P(A) and P(A ∩ B) = P(A) × P(B).  For mutually exclusive events: P(A ∩ B) = 0.

[Overall Revision: Ex5E P92-4]

Ex5A P73 (Ex5A P99)

Ex5B P78 (Ex5B P105)

Ex5D P90-91

Ex5C P90 (Ex5C P112)

Ex3A P141-2

Ex3B P147

Ex3D P157

Ex3G P168-9

Ex3H P173; Ex3I P177
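For teachers who want to check worked answers, the probability rules in the notes for this topic can be sketched in Python.  This is an illustration only, not part of the specification: the Venn probabilities, the bag of 3 red and 2 blue counters and the pass probability of 7/10 are hypothetical numbers, the function names are our own, and the fractions module is used so that answers stay exact, as pupils would write them.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def p_union(p_a, p_b, p_a_and_b):
    """Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_a_and_b


def p_b_given_a(p_a_and_b, p_a):
    """Multiplication rule rearranged: P(B | A) = P(A ∩ B) / P(A)."""
    return p_a_and_b / p_a


def independent(p_a, p_b, p_a_and_b):
    """A and B are independent exactly when P(A ∩ B) = P(A)P(B)."""
    return p_a_and_b == p_a * p_b


# Venn diagram example (hypothetical values): P(A) = 1/2, P(B) = 2/5, P(A ∩ B) = 1/5
p_a, p_b, p_ab = Fraction(1, 2), Fraction(2, 5), Fraction(1, 5)
print(p_union(p_a, p_b, p_ab))          # 7/10
print(p_b_given_a(p_ab, p_a) == p_b)    # True, so P(B | A) = P(B)
print(independent(p_a, p_b, p_ab))      # True

# Tree diagram example (hypothetical bag): 3 red and 2 blue counters, two drawn
red, blue = 3, 2
total = red + blue
p_two_red_with = Fraction(red, total) * Fraction(red, total)             # 9/25
p_two_red_without = Fraction(red, total) * Fraction(red - 1, total - 1)  # 3/10
print(p_two_red_with, p_two_red_without)

# P(2 people pass and 1 fails), each passing independently with probability 7/10.
# PPF, PFP and FPP give the same outcome: 3!/(2!1!) = 3 orders.
orders = factorial(3) // (factorial(2) * factorial(1))
assert orders == len(set(permutations("PPF")))  # listing the orders agrees
p_pass = Fraction(7, 10)
print(orders * p_pass**2 * (1 - p_pass))  # 441/1000
```

The final line shows why factorial notation helps: the probability of one order (0.7 × 0.7 × 0.3) is multiplied by the number of orders rather than listing each branch of the tree.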

4 hours

Measures of location and dispersion*:

Types of data – vocabulary

Measures of location:

-Mean

-Median and quantiles: see Waldo site “Linear Interpolation to get the median”.

-Mode

Measures of dispersion:

-Variance and standard deviation

-Range and interpercentile ranges

Other:

Interpretation of measures of location and dispersion, and the relative advantages and disadvantages of each

Skewness and the concept of outliers

Discrete, continuous, (qualitative), (quantitative), grouped, ungrouped (define class width, boundaries etc.)

“One number to suitably represent a set of data”

-Include examples using coding and combining the means of two data sets.  These formulae are not in the booklet.

-Suppose f is the fraction associated with the quantile (e.g. f = ¼ for Q₁) and n is the total frequency.  If fn is an integer r, the quantile is the mean of the rth and (r+1)th observations.  If fn is not an integer but lies between r and (r+1), the quantile is the (r+1)th observation.

“One number to signify the spread of a data set”

Standard deviation = √variance

Include examples with coding

The use of “n” as a divisor is adequate for S1.

Demonstrate both formulae, Σ(x – μ)²/n and Σx²/n – μ², as the first form is useful in regression later on (it leads to Sxx = Σ(x – x̄)²).

IQR = Q₃ – Q₁ is required and could be found from a cumulative frequency polygon or by simple interpolation from a table.  For grouped data, interpolation at the value of fn is adequate.

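e.g. for roughly symmetrical, bell-shaped data, about two-thirds of the values lie within one standard deviation of the mean.  If the data contain outliers, the median and IQR may be better than the mean and standard deviation, as they are less affected by extreme values.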

Any rule for identifying outliers will be specified in the question, e.g. any value more than 1.5 × the standard deviation away from the mean.

Read P5-7, 16-18

Ex3A P48-52 (Ex4A P62)

Coding: P52 Q15; P111 Q50

Ex4A P64-8 (Ex4B P78)

Ex1o P90

Coding: Ex1i P49

Ex2d P131

See Ch2 P98 & Ex2a P106
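The quantile rule, the two variance formulae (divisor n), coding and the combined mean from the notes for this topic can be checked with the short Python sketch below.  The data sets, the coding y = (x – 150)/2, the group sizes and all function names are hypothetical.  The outlier function uses the example rule quoted in the notes (more than 1.5 standard deviations from the mean); an examination question will state its own rule.

```python
from math import sqrt


def quantile(data, num, den):
    """Quantile for the fraction f = num/den of ungrouped data, using the rule in the notes.

    With n observations: if f*n is an integer r, return the mean of the rth and
    (r+1)th ordered values; otherwise return the (r+1)th value, where r < f*n < r+1.
    Integer arithmetic avoids floating-point error when testing whether f*n is whole.
    """
    xs = sorted(data)
    r, remainder = divmod(num * len(xs), den)
    if remainder == 0:
        return (xs[r - 1] + xs[r]) / 2  # Python lists are 0-indexed
    return xs[r]


def mean_and_variance(data):
    """Mean and variance with divisor n, computed both ways to show they agree."""
    n = len(data)
    mean = sum(data) / n
    var_definition = sum((x - mean) ** 2 for x in data) / n  # Σ(x - x̄)²/n
    var_computing = sum(x * x for x in data) / n - mean ** 2  # Σx²/n - x̄²
    return mean, var_definition, var_computing


def decode(coded_mean, coded_sd, a, b):
    """Undo the coding y = (x - a)/b with b > 0: x̄ = a + b·ȳ and s.d.(x) = b·s.d.(y)."""
    return a + b * coded_mean, b * coded_sd


def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups together: (n1·x̄1 + n2·x̄2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def outliers(data, k=1.5):
    """Values more than k standard deviations from the mean (example rule only)."""
    mean, var, _ = mean_and_variance(data)
    sd = sqrt(var)
    return [x for x in data if abs(x - mean) > k * sd]


data = [2, 4, 4, 5, 7, 8, 9, 12]  # hypothetical data, n = 8
q1, q2, q3 = quantile(data, 1, 4), quantile(data, 1, 2), quantile(data, 3, 4)
print(q1, q2, q3, q3 - q1)      # 4.0 6.0 8.5 4.5
print(mean_and_variance(data))  # both variance forms give 9.234375
print(outliers(data))           # [12]

# Coding: x = 152, 154, 158, 160 coded as y = (x - 150)/2 gives y = 1, 2, 4, 5
y_mean, y_var, _ = mean_and_variance([1, 2, 4, 5])
print(decode(y_mean, sqrt(y_var), 150, 2))  # (156.0, 3.16...)

# Combined mean: 10 pupils with mean 60 and 15 pupils with mean 70
print(combined_mean(10, 60, 15, 70))  # 66.0
```

With n = 8 every quartile position fn is a whole number, so each quartile is an average of two values; trying a data set of 7 values shows the other branch of the rule.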

3 hours

Graphical Representation of Sample data*:

-Stem & Leaf diagrams

-Boxplots

-Histograms: see Waldo site “Histograms”

Back-to-back stem and leaf diagrams will be required to compare distributions.

Include labels and a scale and, when comparing two boxplots, draw them on the same scale.  Candidates may be asked to mark outliers with crosses beyond the ends of the whiskers.

Include grouped data with unequal class widths (frequency density = frequency ÷ class width).

Include a brief look at relative frequency histograms.

Ex2A P13 (Ex3A P21 Q6-12)

Ex4A P64 Q12, 15

Ex2B P26 Q7-11 (Ex3B P35 Q6-12)

Ex2d P131

Ex1b P20: also do an inverse question to find a frequency
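The frequency density calculation for unequal classes, and the inverse question of recovering a frequency from a bar, can be sketched as follows.  The classes and frequencies are hypothetical, the function names are our own, and the sketch assumes the values given are already the true class boundaries and that bar area equals frequency (some questions only make area proportional to frequency).

```python
def frequency_densities(classes):
    """classes: list of (lower boundary, upper boundary, frequency).
    Returns (lower, upper, frequency density), where density = frequency / class width."""
    return [(lower, upper, freq / (upper - lower)) for lower, upper, freq in classes]


def frequency_from_bar(lower, upper, density):
    """Inverse question: frequency = frequency density × class width (area = frequency)."""
    return density * (upper - lower)


# Hypothetical times (minutes) grouped into classes of unequal width
classes = [(0, 10, 5), (10, 15, 8), (15, 20, 12), (20, 40, 10)]
for lower, upper, density in frequency_densities(classes):
    print(f"{lower}-{upper}: frequency density {density}")

print(frequency_from_bar(20, 40, 0.5))  # 10.0
```

The first and last classes have different frequencies (5 and 10) but the same bar height, because the last class is twice as wide.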

6 hours

Discrete Random Variables*:

Concept and definition

The probability function p(x) and the cumulative distribution function F(x)

Mean and variance of a d.r.v.

Mean and variance of a linear function aX + b of a random variable

The discrete uniform distribution

Use p(x) = P(X = x) and F(x₀) = P(X ≤ x₀) = Σ p(x), summed over all x ≤ x₀

Use of E(X) and E(X²) for calculating the variance: Var(X) = E(X²) – [E(X)]²

E(aX + b) = aE(X) + b

Var(aX + b) = a²Var(X) are both needed

The mean and variance of this distribution

Ex8A P150 (Ex6A P126)

Ex8B P157 (Ex6B P133)

Ex8C P160 (Ex6D P149, selected questions)

Ex8D P185 (Ex7A P164)

Ex4a P223 (worksheet)

Ex4b, 4c, 4d P229
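The results in the notes for discrete random variables (p(x), F(x₀), E(X), Var(X) = E(X²) – [E(X)]², the rules for aX + b and the discrete uniform distribution) can be verified with exact fractions.  This is a sketch with hypothetical function names, using the score on a fair six-sided die as the example; the probability function is stored as a dictionary {x: p(x)}.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """E[g(X)] = Σ g(x)p(x) for a probability function given as {x: p(x)}."""
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E(X²) - [E(X)]²."""
    return expectation(pmf, lambda x: x * x) - expectation(pmf) ** 2


def cdf(pmf, x0):
    """F(x0) = P(X ≤ x0): sum p(x) over all x ≤ x0."""
    return sum(p for x, p in pmf.items() if x <= x0)


def linear_transform(pmf, a, b):
    """Probability function of Y = aX + b (a ≠ 0, so no two x values collide)."""
    return {a * x + b: p for x, p in pmf.items()}


# Score on a fair six-sided die: discrete uniform on 1, 2, ..., 6
n = 6
die = {x: Fraction(1, n) for x in range(1, n + 1)}
assert sum(die.values()) == 1  # a valid probability function sums to 1

print(expectation(die), variance(die), cdf(die, 4))  # 7/2 35/12 2/3
# Agrees with the discrete uniform results E(X) = (n+1)/2, Var(X) = (n²-1)/12
print(Fraction(n + 1, 2), Fraction(n * n - 1, 12))   # 7/2 35/12

# Y = 2X + 3: direct calculation against E(aX+b) = aE(X)+b and Var(aX+b) = a²Var(X)
y = linear_transform(die, 2, 3)
print(expectation(y), 2 * expectation(die) + 3)  # 10 10
print(variance(y), 4 * variance(die))            # 35/3 35/3
```

Note that the "+ 3" shifts the mean but has no effect on the variance, which is the point pupils most often miss.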

6 hours

Continuous Random Variables*:

Brief introduction (more in S2)

Normal Distribution

Knowledge of shape, properties and symmetry

Mean and Variance

Cumulative distribution function and the use of tables

(see Waldo site “Normal Distribution”)

Focus on the reason we use areas to represent probabilities (link to P1 Integration)

Probability density function is not required

Derivation of these is not required

Derivation of this is not required.  The standardisation formula, Z = (X – μ)/σ, must be learned.  Pupils may interpolate in the tables, but it is not required.

Questions may involve solving simultaneous equations to find an unknown mean and standard deviation; this requires using the normal table in reverse (finding z from a probability).

Ex9A P177 (Ex8B P209)

Ex7a, 7b, 7c, 7d, 7g P373
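Standardisation and the simultaneous-equations question type from the notes can be sketched in Python, with the standard library's statistics.NormalDist (Python 3.8 and later) standing in for the printed tables: its cdf method plays the role of Φ and inv_cdf the role of the reverse table.  The distributions and probabilities below are hypothetical, as are the function names.  Answers read from the tables, which are given to 4 d.p., may differ slightly in the last decimal place.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # Z ~ N(0, 1)


def standardise(x, mu, sigma):
    """Z = (X - μ)/σ."""
    return (x - mu) / sigma


def prob_less_than(x, mu, sigma):
    """P(X < x) for X ~ N(μ, σ²), found as Φ(z)."""
    return STANDARD.cdf(standardise(x, mu, sigma))


def solve_mu_sigma(a, p_a, b, p_b):
    """Find μ and σ given P(X < a) = p_a and P(X < b) = p_b.

    The reverse lookup gives z-values with a = μ + z_a·σ and b = μ + z_b·σ;
    subtracting the two equations gives σ, then back-substitution gives μ.
    """
    z_a, z_b = STANDARD.inv_cdf(p_a), STANDARD.inv_cdf(p_b)
    sigma = (b - a) / (z_b - z_a)
    mu = a - z_a * sigma
    return mu, sigma


# X ~ N(50, 4²): P(X < 56) = Φ(1.5), which the tables give as 0.9332
print(round(prob_less_than(56, 50, 4), 4))  # 0.9332

# Hypothetical: P(X < 35) = 0.2 and P(X > 62) = 0.1, i.e. P(X < 62) = 0.9
mu, sigma = solve_mu_sigma(35, 0.2, 62, 0.9)
print(round(mu, 2), round(sigma, 2))
```

Converting "P(X > 62) = 0.1" into "P(X < 62) = 0.9" before the reverse lookup mirrors the step pupils must make with the table.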

5 hours

Correlation & Regression:

Scatter diagrams and correlation

Product Moment Correlation Coefficient – its use, interpretation and limits

Line Graphs – “explanatory” (independent-x) and “response” (dependent-y)

Linear Regression: “Least Squares Regression Line” and the interpretation of the regression line

Line of best fit through the mean point (x̄, ȳ)

Derivation and tests of significance will not be required

Use the line to make predictions within the range of the explanatory variable, and discuss the dangers of extrapolation (does the line still have meaning when extended backwards or forwards?).

Variables other than x, y may be used and a linear change of variable may be required

Derivations will not be required

Final answers can be given to, say, 3 s.f., but full (unrounded) values must be used in later working, e.g. when finding “c”.

When interpreting, more than a comment such as “negative correlation” is required.  E.g. in a graph of weight against time, the y-axis intercept signifies birth weight and the gradient growth rate.

Ex6A P126

As above

Read pages 131-9

Ex7A P139

Ex12a P635

Ex12d P669

Ex12b P643; Ex12c P656
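The summary statistics, the product moment correlation coefficient and the least squares regression line from this topic can be checked with the following sketch.  The data are hypothetical and the function names are our own; the line is written as y = c + gradient × x to match the “c” in the notes, with c found from the unrounded gradient.  The extrapolation warning is simply one way of flagging predictions outside the observed range of the explanatory variable.

```python
from math import sqrt
import warnings


def summary_stats(xs, ys):
    """Sxx, Syy, Sxy using the computational forms, e.g. Sxy = Σxy - (Σx)(Σy)/n."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxx, syy, sxy


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / √(Sxx·Syy); always -1 ≤ r ≤ 1."""
    sxx, syy, sxy = summary_stats(xs, ys)
    return sxy / sqrt(sxx * syy)


def regression_line(xs, ys):
    """Least squares line of y on x: gradient = Sxy/Sxx, c = ȳ - gradient·x̄.
    c is calculated from the unrounded gradient."""
    sxx, _, sxy = summary_stats(xs, ys)
    gradient = sxy / sxx
    c = sum(ys) / len(ys) - gradient * sum(xs) / len(xs)
    return gradient, c


def predict(xs, gradient, c, x):
    """Predict y at x, warning when x lies outside the observed range (extrapolation)."""
    if not min(xs) <= x <= max(xs):
        warnings.warn("extrapolating beyond the data: prediction may be meaningless")
    return gradient * x + c


xs = [1, 2, 3, 4, 5]  # hypothetical explanatory variable
ys = [2, 4, 5, 4, 5]  # hypothetical response variable
print(summary_stats(xs, ys))            # (10.0, 6.0, 6.0)
print(round(pmcc(xs, ys), 3))           # 0.775
gradient, c = regression_line(xs, ys)
print(round(gradient, 3), round(c, 3))  # 0.6 2.2
print(predict(xs, gradient, c, 3.5))    # within the data range, so no warning
```

Because c = ȳ – gradient × x̄, the line always passes through the mean point (x̄, ȳ), here (3, 4).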

What is Statistics all about?

Consider these 3 classrooms, each containing 30 pupils:

If a child is picked at random and found to have an IQ of 95, which classroom was the child taken from? Discuss.

(The answer could have been A with some chance that we were wrong and should have chosen B.  It also could have been B with some chance that we were wrong and should have chosen A).

Hence, statistics is all about making decisions when we don’t really know what the answer should be.

Errors:

    It is possible, then, that we can make errors in statistics.  Ideally, we want the chance that we have made an error to be as small as possible.

    There are also different types of errors that we could make.  Consider this table that shows what might happen when I try to cross a road:

 

(X marks an error)

           I CROSS    I STAY
QUIET      okay       X
BUSY       X          okay

If the road is quiet and I decide to cross then I will be okay.  Similarly, if the road is busy and I decide to stay then I will not be run over.  

The problems occur when I decide to cross when it is busy, or to stay when it is quiet.  Both of these are errors, but their consequences are clearly very different.  In the former case, I risk being run over, which is a serious error.  In the latter, I will not suffer such drastic consequences, but passers-by may think it very strange that I have decided not to cross when the road is clearly safe.

Hence, statistics is all about having the confidence to make appropriate decisions after consulting the available evidence, despite the fact that there could be risk involved.