Assignment II for Biostats Course VHM 801 at AVC - Winter semester 2018
The assignment is worth 15% of the final course mark. Please be aware that by handing
in the home assignment you implicitly acknowledge that you have read and accepted
the instructions for home assignments as described
on the VHM 801 homepage.
The assignment uses information from studies conducted in the early 1990s about the impact of chess
training on reading skills. The researchers hypothesized that chess play
would develop skills of importance for reading, and hence also lead to
improvements in reading ability. Students who had taken a standardized
reading (DRP) test and had shown interest in chess (according to certain criteria) were offered chess
instruction after school; participation was voluntary. After the chess instruction period, the
students took the DRP test again; the testing was standardized to a one-year
period between the pre- and post-tests. The scale of DRP tests is adjusted
to the grade level (so that the 50th percentile corresponds to the grade median for all students),
and the scores obtained in the study were converted to grade-level percentiles.
Therefore, a test score of, e.g., 28 means
that the student's result corresponded to the 28th percentile for
her/his grade. A total of 53 students participated in the part of the
study presented here.
A dataset for the study is available
in Minitab format and as a comma-separated file,
for import into Stata and other statistical software. The data contain three variables: subject id, pre-score (percentile)
of the DRP test, and post-score (percentile) of the DRP test. The home assignment has six
questions, (a)-(f), all of which should be answered.
(a)
A question of interest was whether the participants, prior to chess instruction, were
representative of their grade(s) in terms of reading skills. Use
statistical inference to assess whether participants' reading skills were systematically lower or
higher than the 50th percentile. State your results and conclusions carefully. Moreover, list and critically
discuss the assumptions the inference is based on; here, as well as in the following questions,
include as much information from a descriptive data analysis as
necessary to support your arguments.
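One common way to carry out the inference in (a) is a one-sample t-test of the pre-scores against the reference value 50. The sketch below assumes the pre-scores have been read into a plain Python list; the function name `one_sample_t` is hypothetical, and the example numbers are illustrative only, not the study data. It computes the test statistic with the standard library only; the P-value would still have to be obtained from a t distribution with n - 1 degrees of freedom, e.g. via `scipy.stats.ttest_1samp` or the corresponding Minitab/Stata procedure.

```python
import math
import statistics


def one_sample_t(values, mu0=50.0):
    """Return (mean, standard error, t statistic, df) for H0: population mean == mu0.

    Hypothetical helper; `values` would be the pre-scores (percentiles).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with divisor n - 1
    se = sd / math.sqrt(n)         # standard error of the mean
    t_stat = (mean - mu0) / se
    return mean, se, t_stat, n - 1


# Illustrative numbers only, NOT the study data.
print(one_sample_t([40, 45, 55, 60, 35]))
```

The statistic only makes sense under the assumptions the question asks you to discuss (independent observations, approximately normal scores or a large enough sample).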
(b)
The main focus of interest was the comparison of reading
skills prior to and after chess instruction. Should the test scores before and after chess instruction
be considered paired or
independent samples? Explain your reasoning. Additionally, describe, as well as you can from the
information provided, the population about which we can make inferences based on
the data and the results (still to be obtained).
(c)
Estimate the mean improvement (possibly negative) in reading score
percentiles after the chess instruction. Calculate 90% and 95% confidence intervals for the mean improvement, and
interpret these intervals carefully. Make sure to explain clearly how
the two intervals differ in their interpretation.
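For reference, the usual t-based interval for a mean improvement, assuming the differences d_i (post-score minus pre-score) are analysed as a single sample of size n, has the form below. The 90% and 95% intervals share the same centre and standard error and use different t quantiles; with all n = 53 subjects the quantiles come from a t distribution with 52 degrees of freedom.

```latex
d_i = \text{post}_i - \text{pre}_i, \qquad
\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar d)^2}

\bar d \pm t_{n-1,\,1-\alpha/2}\,\frac{s_d}{\sqrt{n}}, \qquad
\alpha = 0.10 \text{ (90\% CI)}, \quad \alpha = 0.05 \text{ (95\% CI)}
```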
(d)
Use a statistical test to assess whether there was
an improvement in the DRP scores after chess instruction. State your
hypotheses, report the value of your test statistic, and give the
P-value for the test. Provide your conclusions, both statistical and
subject matter. Also explain here which assumptions the inference
for this question and the previous question (c) relies on, and briefly discuss their
validity.
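If the scores are treated as paired (one pre/post pair per subject), the estimate in (c) and the test in (d) can both be built from the per-subject differences. The sketch below shows one way to do this with the Python standard library. The function name `paired_analysis` is hypothetical, the critical value `t_crit` must be supplied by the user (from a t table or software, with n - 1 df), and the value 2.0 in the example is purely illustrative. The P-value for the chosen one- or two-sided alternative would still come from the t distribution, e.g. via `scipy.stats.ttest_rel` or the paired t-test procedure in Minitab/Stata.

```python
import math
import statistics


def paired_analysis(pre, post, t_crit):
    """Summarise post - pre differences for a paired t analysis.

    Hypothetical helper. `t_crit` is the t quantile for the desired
    confidence level with n - 1 df; it is NOT computed here.
    Returns (mean difference, SE, t statistic for H0: mean difference == 0,
    (lower, upper) confidence limits).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have one value per subject")
    diffs = [b - a for a, b in zip(pre, post)]  # improvement = post - pre
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    t_stat = mean_d / se
    ci = (mean_d - t_crit * se, mean_d + t_crit * se)
    return mean_d, se, t_stat, ci


# Illustrative numbers and critical value only, NOT the study data.
print(paired_analysis([10, 20, 30], [12, 22, 35], t_crit=2.0))
```

For the reduced dataset in (e), the same function would simply be applied to the lists with the pair for subject no. 7 removed.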
(e)
According to one source discussing these data, the data for subject no. 7 could be considered
a "mild outlier". Inspect the data and try to deduce how this conclusion
may have been reached. Do you agree with the assessment?
(Hint: No statistical analysis is required for the discussion.)
Here we will tentatively also perform an analysis without the data for subject no. 7,
in order to explore how they affect the results.
Specifically, repeat the analyses of parts (c) (95% CI only)
and (d) for the reduced dataset. Did removing the (possible) outlier change your
results? Comment on your findings.
(f)
One way to quantify whether the improvement of 42 percentage points for subject no. 7
should be considered an
outlier is to compute the probability that such an extreme observation happened
by chance alone. This question will take you through the steps to approximately compute that
probability. The first step is to set up a statistical model on which the
calculation will be based. Here we will assume a normal distribution with mean and standard deviation equal to
the corresponding sample values in the sample of improvements (post-score minus pre-score) from which the value 42 has been removed.
Determine these estimates. Next, compute the probability that one observation
from this (normal) distribution differs at least as much from the mean as the value 42 does
(either on the positive or negative side). Finally, compute the
probability that in a sample of 53 independent observations from this distribution
at least one of them
differs at least as much from the mean as the value 42 does. Interpret the
computed probability: does it seem reasonable that the value 42
happened by chance alone? As an optional bonus question (which may offset
loss of points in other questions), critically discuss the assumptions involved in
the calculation and whether they might have affected (biased) the results, and if so, also indicate
the direction of the bias.
(Note: Other methods for statistical assessment of outliers exist, but it
is not expected or recommended that you include those in your discussion. If you nevertheless
decide to include methods not covered by Sessions 1-7 of VHM 801, you have to explain them
in enough detail to demonstrate that you understand how they work and which assumptions they
are based on.)
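The three computational steps in (f) can be sketched as follows, assuming the improvements are available as a Python list and using the standard library's `statistics.NormalDist` (Python 3.8+). The function names `fit_without` and `outlier_probabilities` are hypothetical, and the example numbers are illustrative only, not the study data. The sketch simply encodes the normal model and the independence of the 53 observations stated in the question; it does not assess whether those assumptions hold.

```python
import statistics


def fit_without(improvements, value):
    """Mean and sample SD of the improvements with one occurrence of `value` removed."""
    rest = list(improvements)
    rest.remove(value)  # raises ValueError if `value` is not present
    return statistics.mean(rest), statistics.stdev(rest)


def outlier_probabilities(value, mean, sd, n):
    """Two-sided tail probability for one draw, and for at least one of n draws.

    Assumes a normal model N(mean, sd**2) and n independent observations.
    """
    dist = statistics.NormalDist(mean, sd)
    distance = abs(value - mean)
    # P(|X - mean| >= distance) = 2 * P(X <= mean - distance), by symmetry
    p_single = 2 * dist.cdf(mean - distance)
    # P(at least one of n is that extreme) = 1 - P(none of the n is)
    p_any = 1 - (1 - p_single) ** n
    return p_single, p_any


# Illustrative numbers only, NOT the study data.
m, s = fit_without([1, 2, 3, 42], 42)
print(m, s)
print(outlier_probabilities(1.96, 0.0, 1.0, 53))
```

Computing the lower tail and doubling it, rather than 2 * (1 - cdf(...)) on the upper side, avoids losing precision when the tail probability is very small.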
Henrik Stryhn
(hstryhn@upei.ca) 2018-02-17