STATISTICS 1 SCHEME OF WORK 2005-2006

S1 UNIT: 14 weeks, 35 lessons. The course has been allocated 30 hours, leaving 5 hours for assessment, revision and lessons missed.

The textbook used is Heinemann S1 New Edition; the equivalent reference from the Heinemann T1 Old Edition is given in brackets. Other sources include Crawshaw and Chambers (if no source is stated, assume it is this one) and worksheets of past examination questions for the starred topics (see MC for a copy of these materials).

Homework of approximately 1 hour should be set after each lesson, including revision, review exercises and past paper questions whenever possible.

The external assessment will be a 1½-hour examination worth 75 marks, with approximately 7 questions. It is very likely that there will be a question on each topic area. The formulae that candidates are expected to know are indicated in the appropriate sections of this scheme. Other formulae and tables can be found in the booklet Mathematical Formulae including Statistical Formulae and Tables; please let pupils familiarise themselves with this document as early as possible in the course.

To support pupils’ learning, give short, timed tests of examination-style questions as lesson starters, beginning as early in the course as possible.

Time	Specification	Notes	Page and Exercise	Other Sources
½ hour	• Mathematical models in probability and statistics: introduce the course content and ideas; discuss the basic ideas of mathematical modelling as applied in probability and statistics.	See the insert “What is Statistics all about?”. Learn the definition of a model: “A statistical procedure devised to describe or make predictions about the expected behaviour of a real world problem”.	P1-3 (P1-3)
5½ hours	• Probability*: elementary probability, including terminology, sample space diagrams and complementary events; the addition rule and Venn diagrams; the multiplication rule and conditional probability; making use of tree diagrams; independent events and mutually exclusive events.	P(A′) = 1 – P(A) is required. P(A ∪ B) = P(A) + P(B) – P(A ∩ B). P(A ∩ B) = P(A)P(B | A). Include examples that sample both with and without replacement. Also look at situations where different orders of events give the same outcome (e.g. P(2 people pass and 1 fails)), briefly introducing factorial notation as an alternative. For independent events, P(B | A) = P(B), P(A | B) = P(A) and P(A ∩ B) = P(A)P(B); for mutually exclusive events, P(A ∩ B) = 0.	[Overall revision: Ex5E P92-4] Ex5A P73 (Ex5A P99); Ex5B P78 (Ex5B P105); Ex5D P90-91; Ex5C P90 (Ex5C P112)	Ex3A P141-2; Ex3B P147; Ex3D P157; Ex3G P168-9; Ex3H P173; Ex3I P177
4 hours	• Measures of location and dispersion*: types of data and vocabulary. Measures of location: mean; median and quantiles (see Waldo site “Linear Interpolation to get the median”); mode. Measures of dispersion: variance and standard deviation; range and interpercentile ranges. Other: interpretation of measures of location and dispersion and the relative advantages and disadvantages of each; skewness and the concept of outliers.	Types of data: discrete, continuous, (qualitative), (quantitative), grouped, ungrouped (define class width, boundaries etc.). Location: “one number to suitably represent a set of data”. Mean: include coding and combined mean examples (these formulae are not in the booklet). Median and quantiles: suppose f is the fraction associated with the quantile and n is the total frequency. If fn is an integer r, the quantile is the mean of the rth and (r+1)th observations; if fn is not an integer but lies between r and r + 1, the quantile is the (r+1)th observation. Dispersion: “one number to signify the spread of a data set”; standard deviation = √variance; include examples with coding. The use of n as a divisor is adequate for S1. Demonstrate both formulae, Σ(x – x̄)²/n and Σx²/n – x̄², as the first form is useful in regression later on. IQR = Q₃ – Q₁ is required and could be found from a cumulative frequency polygon or by simple interpolation from a table; for grouped data, interpolation at the value of fn is adequate. Interpretation: e.g. for roughly symmetrical, bell-shaped data, about two-thirds of the values lie within one standard deviation of the mean. If the data contain outliers, it may be better to use the median and IQR rather than the mean and standard deviation. Any rule for identifying an outlier will be specified in the question, e.g. any value more than 1.5 × the standard deviation away from the mean.	Read P5-7, 16-18; Ex3A P48-52 (Ex4A P62); coding: P52 Q15, P111 Q50; Ex4A P64-8 (Ex4B P78)	Ex1o P90; coding: Ex1i P49; Ex2d P131; see Ch2 P98 and Ex2a P106
3 hours	• Graphical representation of sample data*: stem and leaf diagrams; boxplots; histograms (see Waldo site “Histograms”).	Stem and leaf: back-to-back diagrams will be required to compare distributions. Boxplots: include labels and a scale and, when comparing two boxplots, draw them on the same scale; candidates may be asked to mark outliers with crosses beyond the end of the whisker. Histograms: include grouped data with unequal classes (frequency density = frequency ÷ class width) and a brief look at relative frequency histograms.	Ex2A P13 (Ex3A P21 Q6-12); Ex4A P64 Q12, 15; Ex2B P26 Q7-11 (Ex3B P35 Q6-12)	Ex2d P131; Ex1b P20 (do an inverse question to find frequency)
6 hours	• Discrete random variables*: concept and definition; the probability function p(x) and the cumulative distribution function F(x); mean and variance of a d.r.v.; mean and variance of a linear function aX + b of a random variable; the discrete uniform distribution.	Use p(x) = P(X = x) and F(x₀) = P(X ≤ x₀) = Σ p(x), summing over all x ≤ x₀. Use E(X) and E(X²) to calculate the variance: Var(X) = E(X²) – [E(X)]². E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X) are both needed. Discrete uniform: the mean and variance of this distribution (for X uniform on 1, 2, …, n, E(X) = (n + 1)/2 and Var(X) = (n² – 1)/12).	Ex8A P150 (Ex6A P126); Ex8B P157 (Ex6B P133); Ex8C P160 (Ex6D P149, select); Ex8D P185 (Ex7A P164)	Ex4a P223 (worksheet); Ex4b, 4c, 4d P229
6 hours	• Continuous random variables*: brief introduction (more in S2). Normal distribution: knowledge of shape, properties and symmetry; mean and variance; cumulative distribution function and the use of tables (see Waldo site “Normal Distribution”).	Focus on the reason we use areas to represent probabilities (link to P1 integration). The probability density function is not required. Derivation of the mean and variance is not required, nor is derivation of the cumulative distribution function. The standardisation formula, z = (x – μ)/σ, must be learned. Pupils may interpolate in the tables, but it is not required. Questions may involve the solution of simultaneous equations, which may in turn involve the “reverse” normal table.	Ex9A P177 (Ex8B P209)	Ex7a, 7b, 7c, 7d, 7g P373 onwards
5 hours	• Correlation and regression: scatter diagrams and correlation; the product moment correlation coefficient (its use, interpretation and limitations); line graphs: “explanatory” (independent, x) and “response” (dependent, y) variables; linear regression: the least squares regression line and its interpretation.	Line of best fit through the mean point (x̄, ȳ). Derivation and tests of significance will not be required. Use the line to make predictions within the range of the explanatory variable and discuss the dangers of extrapolation (does the line extend backwards or forwards with meaning?). Variables other than x and y may be used, and a linear change of variable may be required. Derivations will not be required. Answers can be given to, say, 3 s.f., but full values must be substituted back when, say, finding c. When interpreting, more than a comment such as “negative correlation” is required; e.g. in a graph of weight against time, the y-axis intercept signifies birth weight and the gradient the growth rate.	Ex6A P126; as above; read pages 131-9; Ex7A P139	Ex12a P635; Ex12d P669; Ex12b P643; Ex12c P656
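The probability rules in the Probability row (complement, addition and multiplication rules, independence, sampling without replacement and counting the orders that give the same outcome) can be checked by exact enumeration. The sketch below is one way to do that in Python with exact fractions. The two-dice events A and B, the bag of 5 red and 3 blue counters and all function names are illustrative assumptions, not taken from the listed exercises.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Illustrative sample space: two fair dice, 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) for equally likely outcomes; `event` is a predicate on one outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


def A(o):
    return o[0] == 6           # hypothetical event: first die shows a 6


def B(o):
    return o[0] + o[1] >= 10   # hypothetical event: total score is at least 10


p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))

# Complement rule: P(A') = 1 - P(A)
p_not_a = prob(lambda o: not A(o))
# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
addition_rule_holds = p_a_or_b == p_a + p_b - p_a_and_b
# Multiplication rule rearranged: P(B | A) = P(A ∩ B) / P(A)
p_b_given_a = p_a_and_b / p_a
# Independence check: A and B are independent iff P(A ∩ B) = P(A)P(B)
independent = p_a_and_b == p_a * p_b

# Without replacement (tree diagram): a bag holds 5 red and 3 blue counters and
# two are drawn. P(one of each colour) = P(R then B) + P(B then R).
p_one_each = Fraction(5, 8) * Fraction(3, 7) + Fraction(3, 8) * Fraction(5, 7)


def p_exactly_k_pass(n, k, p):
    """P(exactly k of n people pass), each passing independently with probability p.

    The factor n!/(k!(n-k)!) counts the different orders giving the same outcome.
    """
    orders = factorial(n) // (factorial(k) * factorial(n - k))
    return orders * p**k * (1 - p) ** (n - k)


print(p_a, p_b, p_a_and_b, p_a_or_b, p_b_given_a, independent)
print(p_one_each, p_exactly_k_pass(3, 2, Fraction(4, 5)))
```

Using `Fraction` rather than floats keeps every answer exact, which matches the way pupils are expected to leave probabilities as fractions.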
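The quantile rule in the Measures of location and dispersion row (the mean of the rth and (r+1)th observations when fn is an integer r, otherwise the (r+1)th observation) is easy to misapply by one place. A minimal Python sketch of that rule for ungrouped data follows. The function names and the sample data are hypothetical, and f is assumed to be passed as an exact `Fraction` so that the integer test on fn is reliable.

```python
from fractions import Fraction


def quantile(data, f):
    """Quantile of ungrouped data by the rule in the scheme (n = total frequency).

    f is the fraction for the quantile (e.g. Fraction(1, 4) for Q1), 0 < f < 1.
    If f*n is an integer r, return the mean of the r-th and (r+1)-th observations;
    otherwise f*n lies between r and r+1 and we return the (r+1)-th observation.
    Observation numbers are 1-based, as in the text; Python lists are 0-based.
    """
    if not 0 < f < 1:
        raise ValueError("f must lie strictly between 0 and 1")
    xs = sorted(data)
    fn = Fraction(f) * len(xs)
    r = int(fn)  # whole part of f*n (f*n is positive)
    if fn == r:
        return (xs[r - 1] + xs[r]) / 2  # r-th and (r+1)-th observations
    return xs[r]  # (r+1)-th observation


def iqr(data):
    """Interquartile range Q3 - Q1 using the same rule."""
    return quantile(data, Fraction(3, 4)) - quantile(data, Fraction(1, 4))


odd = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # illustrative data, n = 9
even = odd + [25]                       # n = 10
print(quantile(odd, Fraction(1, 2)), quantile(even, Fraction(1, 2)), iqr(odd))
```

With n = 9 the median position fn = 4.5 is not an integer, so the 5th observation is taken; with n = 10, fn = 5 is an integer, so the 5th and 6th observations are averaged.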
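For the coding, combined mean and variance notes in the same row, the Python sketch below computes the variance by both formulae, Σ(x – x̄)²/n and Σx²/n – x̄², and undoes a coding of the form y = (x – a)/b. The coding form, the constants a = 1000 and b = 10, the data and the function names are illustrative assumptions; questions may use other codings.

```python
from math import sqrt


def mean_and_variance(data):
    """Mean and variance (divisor n), with the variance computed both ways."""
    n = len(data)
    mean = sum(data) / n
    var_definition = sum((x - mean) ** 2 for x in data) / n      # Σ(x − x̄)²/n
    var_computational = sum(x * x for x in data) / n - mean**2   # Σx²/n − x̄²
    return mean, var_definition, var_computational


def decode(coded_mean, coded_sd, a, b):
    """Undo the coding y = (x − a)/b: x̄ = a + b·ȳ and sd(x) = |b|·sd(y)."""
    return a + b * coded_mean, abs(b) * coded_sd


def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups combined: (n1·x̄1 + n2·x̄2)/(n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


x = [1020, 1030, 1050, 1040, 1060]   # illustrative raw data
y = [(xi - 1000) / 10 for xi in x]   # coded with a = 1000, b = 10
y_mean, y_var, _ = mean_and_variance(y)
x_mean, x_sd = decode(y_mean, sqrt(y_var), 1000, 10)
print(x_mean, x_sd, combined_mean(10, 60, 15, 70))
```

Adding or subtracting a shifts the mean but not the spread, while dividing by b scales both, which is why only b appears when decoding the standard deviation.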
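The histogram note in the Graphical representation row (frequency density = frequency ÷ class width) and the suggested inverse question can both be sketched in a few lines of Python. The class boundaries, frequencies and function names below are illustrative assumptions.

```python
# Illustrative grouped data with unequal class widths:
# (lower class boundary, upper class boundary, frequency)
classes = [(0, 10, 5), (10, 15, 12), (15, 20, 18), (20, 30, 10), (30, 50, 8)]


def frequency_densities(classes):
    """Bar heights for a histogram: frequency density = frequency / class width."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


def frequency_from_bar(density, lower, upper):
    """Inverse question: frequency = frequency density × class width (the bar's area)."""
    return density * (upper - lower)


densities = frequency_densities(classes)
print(densities)
# Reading a bar back recovers its frequency, e.g. the 15-20 class:
print(frequency_from_bar(densities[2], 15, 20))
```

Because area, not height, represents frequency, the wide 30-50 class with 8 values gets a shorter bar than the narrow 0-10 class with only 5.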
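The Discrete random variables row relies on p(x), F(x₀), E(X), Var(X) = E(X²) – [E(X)]², the results for aX + b and the discrete uniform mean and variance. The Python sketch below represents a distribution as a dictionary from values to probabilities and checks each of these results exactly. The example distribution, the choice a = 3, b = 2 and the function names are hypothetical.

```python
from fractions import Fraction


def expectation(dist, g=lambda x: x):
    """E(g(X)) = Σ g(x)p(x); `dist` maps each value x to p(x) = P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X²) − [E(X)]²."""
    return expectation(dist, lambda x: x * x) - expectation(dist) ** 2


def cdf(dist, x0):
    """F(x0) = P(X ≤ x0): sum p(x) over all x ≤ x0."""
    return sum(p for x, p in dist.items() if x <= x0)


def linear_transform(dist, a, b):
    """Distribution of Y = aX + b (assumes a != 0 so the values stay distinct)."""
    return {a * x + b: p for x, p in dist.items()}


def discrete_uniform(n):
    """X uniform on 1, 2, ..., n: p(x) = 1/n."""
    return {x: Fraction(1, n) for x in range(1, n + 1)}


# Illustrative probability function; the probabilities must sum to 1.
X = {0: Fraction(1, 10), 1: Fraction(1, 5), 2: Fraction(3, 10), 3: Fraction(2, 5)}
assert sum(X.values()) == 1

Y = linear_transform(X, 3, 2)                    # Y = 3X + 2
print(expectation(X), variance(X), cdf(X, 1))    # E(X), Var(X), F(1)
print(expectation(Y), variance(Y))               # compare with aE(X) + b and a²Var(X)
U = discrete_uniform(6)
print(expectation(U), variance(U))               # compare with (n + 1)/2 and (n² − 1)/12
```

Computing E(Y) and Var(Y) directly from the transformed distribution, rather than from the formulae, gives pupils an independent check that E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X).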
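The Continuous random variables row asks pupils to learn z = (x – μ)/σ and to solve simultaneous equations with the “reverse” normal table. The Python sketch below stands in for the printed tables by using the standard library’s `statistics.NormalDist` (Python 3.8 or later), whose `cdf` and `inv_cdf` play the roles of Φ and Φ⁻¹. The example values and function names are illustrative assumptions. Note that `NormalDist` takes the standard deviation, whereas N(μ, σ²) is written with the variance.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def standardise(x, mu, sigma):
    """z = (x − μ)/σ. Note: sigma is the standard deviation, not the variance σ²."""
    return (x - mu) / sigma


def prob_less_than(x, mu, sigma):
    """P(X < x) for X ~ N(μ, σ²), i.e. Φ(z), the value read from the normal table."""
    return Z.cdf(standardise(x, mu, sigma))


def solve_mu_sigma(x1, p1, x2, p2):
    """Find μ and σ given P(X < x1) = p1 and P(X < x2) = p2.

    Each condition gives x = μ + zσ with z = Φ⁻¹(p) (the "reverse" table step),
    so subtracting the two equations gives σ, and back-substitution gives μ.
    """
    z1, z2 = Z.inv_cdf(p1), Z.inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma


print(round(prob_less_than(56, 50, 4), 4))   # X ~ N(50, 4²), so z = 1.5
mu, sigma = solve_mu_sigma(40, 0.10, 70, 0.95)
print(round(mu, 2), round(sigma, 2))
```

The printed tables give z to a limited number of decimal places, so answers found by hand may differ slightly from these values in the later significant figures.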
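For the Correlation and regression row, the Python sketch below computes Sxx, Syy and Sxy, the product moment correlation coefficient r = Sxy/√(Sxx·Syy) and the least squares line of y on x. It writes the line as y = mx + c to match the note about finding c; that notation, the hypothetical weight-against-age data and the function names are assumptions for illustration.

```python
from math import sqrt


def s_values(xs, ys):
    """Sxx, Syy and Sxy using the computational forms, e.g. Sxx = Σx² − (Σx)²/n."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxx, syy, sxy


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / √(Sxx·Syy)."""
    sxx, syy, sxy = s_values(xs, ys)
    return sxy / sqrt(sxx * syy)


def regression_line(xs, ys):
    """Least squares regression line of y on x, written y = mx + c.

    m = Sxy/Sxx; c = ȳ − m·x̄ using the full (unrounded) m, so the line
    passes through the mean point (x̄, ȳ).
    """
    sxx, _, sxy = s_values(xs, ys)
    m = sxy / sxx
    c = sum(ys) / len(ys) - m * sum(xs) / len(xs)
    return m, c


# Hypothetical data: age in weeks (explanatory) and weight in kg (response).
age = [0, 2, 4, 6, 8, 10]
weight = [3.4, 4.1, 4.9, 5.4, 6.1, 6.8]
m, c = regression_line(age, weight)
print(round(pmcc(age, weight), 3), round(m, 3), round(c, 3))
```

In this setting c estimates the birth weight and m the growth rate in kg per week. Rounding m to 3 s.f. before calculating c would move the line slightly off (x̄, ȳ), which is why full values should be substituted back.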

What is Statistics all about?

Consider these 3 classrooms, each containing 30 pupils:

If a child is picked at random and found to have an IQ of 95, which classroom was the child taken from? Discuss.

(We could answer A, accepting some chance that we are wrong and should have chosen B. Equally, we could answer B, accepting some chance that we are wrong and should have chosen A.)

Hence, statistics is all about making decisions when we cannot be sure what the answer should be.

Errors:

It is possible, then, to make errors in statistics. Ideally, we want the chance of making an error to be as small as possible.

There are also different types of error that we could make. Consider this table, which shows what might happen when I try to cross a road:

	I CROSS	I STAY
QUIET	√	X
BUSY	X	√

If the road is quiet and I decide to cross then I will be okay. Similarly, if the road is busy and I decide to stay then I will not be run over.

The problems occur when I decide to cross when it is busy, or stay when it is quiet. Both of these are errors. However, the consequences of these two types of error are clearly very different. In the former case, I risk being run over, which is a serious error. In the latter case, I will not suffer such drastic consequences, but passers-by may think it very strange that I have decided not to cross when the road is clearly safe.

Hence, statistics is all about having the confidence to make appropriate decisions after consulting the available evidence, despite the fact that there could be risk involved.