Assignment II for Biostats Course VHM 801 at AVC - Fall semester 2022
The assignment is worth either 10% or 15% of the final course mark. Questions 1-3 constitute an
assignment for 10%, whereas Questions 1-5 constitute an assignment for 15%. Please be aware that by handing
in the home assignment you implicitly acknowledge that you have read and accepted
the instructions for home assignments as described
on the VHM 801 homepage.
We consider data from an experimental study done in a laboratory to investigate the
implications of performing duplicate measurements on the same sample. The
measurements of interest were concentrations of a particular chemical
component in samples submitted to the lab. A total of 20 such
samples (expected to contain slightly different concentrations) were selected
and arbitrarily divided into two groups, say A and B, with 10 samples in
each group.
A lab technician was asked to measure the concentrations of the
10 samples in group A, taking duplicate measurements of each sample. In
the table below, the duplicate measurements for each of the 10 samples
are called A1 and A2, where A1 is the first measurement and A2 is the
second measurement taken immediately following the first one.
| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 5.25 | 6.10 | 5.70 | 5.76 | 5.00 | 6.04 | 5.49 | 5.33 | 4.84 | 5.89 |
| A2 | 5.17 | 6.19 | 5.61 | 5.67 | 4.94 | 6.02 | 5.55 | 5.42 | 4.94 | 5.81 |
Next, each of the 10 samples in group B was split into two equal parts,
and the resulting 20 (sub)samples were given to the technician for
measurement of concentrations. The subsamples were presented to the
technician in random order, and the technician was not told (and could not otherwise identify)
which subsamples were derived from the same sample. The resulting
duplicate measurements for the 10 samples, labelled as B1 and B2 corresponding to the order
in which the technician carried out the measurements, are shown in the next table.
| Sample | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|
| B1 | 4.79 | 5.61 | 5.28 | 5.89 | 5.42 | 4.91 | 5.83 | 5.85 | 5.25 | 5.01 |
| B2 | 4.80 | 5.55 | 5.03 | 6.11 | 5.32 | 5.04 | 6.02 | 5.77 | 5.39 | 5.22 |
The data, consisting of these measurements in four separate columns, are
available in Minitab format and as a comma-separated file for
import into Stata and other statistical software.
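For software other than Minitab or Stata, the four-column layout can be sketched as follows. This is only an illustration: the values are transcribed from the two tables above, while the column names `A1`, `A2`, `B1`, `B2`, the one-row-per-position layout, and the helper name `load_columns` are assumptions, not a description of the actual course file.

```python
import csv
import io

# Values transcribed from the two tables above. The header names and the
# row layout are assumptions about the comma-separated file. Note that a row
# pairs A1 with A2 (same sample) and B1 with B2 (same sample), but the A and B
# entries in one row come from unrelated samples (e.g. sample 1 and sample 11).
CSV_TEXT = """A1,A2,B1,B2
5.25,5.17,4.79,4.80
6.10,6.19,5.61,5.55
5.70,5.61,5.28,5.03
5.76,5.67,5.89,6.11
5.00,4.94,5.42,5.32
6.04,6.02,4.91,5.04
5.49,5.55,5.83,6.02
5.33,5.42,5.85,5.77
4.84,4.94,5.25,5.39
5.89,5.81,5.01,5.22
"""


def load_columns(text):
    """Parse comma-separated text into a dict: column name -> list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


data = load_columns(CSV_TEXT)
for name, values in data.items():
    # Quick sanity check: each column should hold 10 measurements.
    print(name, len(values), values)
```

When reading the real file, the same function could be given the file's contents instead of `CSV_TEXT`, provided the header names match.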
As mentioned above, you may either answer Questions 1-3 for the 10% version of the assignment
or all Questions 1-5 for the 15% version of the assignment. You need to indicate
clearly which version of the assignment you answer. Make sure to include
explanations and justifications for all procedures and calculations
used, including a statement of the assumptions made and, as far as the data at hand allow, their validation.
- In order to assess whether the 20 samples were reasonably representative of samples
typically measured at the laboratory, it was of interest to describe the distribution
of measurements obtained from the samples. Use descriptive statistics to describe the distribution,
and compute a 90% confidence interval for the mean concentration. (Hint: For this question,
you need to decide which parts of the data to use - different valid options exist, and you should explain
and justify your choice.)
- Carry out a statistical analysis to determine whether any systematic
differences seem to be present between the mean concentrations obtained in the
initial measurement and in the subsequent duplicate measurement when carried out as
described for group A. State relevant assumptions and hypotheses for
your analysis, and draw conclusions.
- The main purpose of the study was to compare the two different
scenarios for duplicate measurements in groups A and B. Carry out an
analysis for group B similar to the one above, and compare with the results from
group A. Irrespective of whether your analyses give similar or different
results in the two groups, continue the analysis by directly comparing
the differences between the first and second measurements in the two scenarios, either
by a confidence interval or by a statistical test. Summarize all your analyses into conclusions
about whether systematic mean differences seem to exist between first and
second measurements, both within and between the two groups/scenarios.
- Characterize the differences between the scenarios in terms of blinding (of measurements).
It is sometimes claimed that duplicate measurements conducted by the
same technician without blinding will tend to produce values that are
unrealistically close, possibly because knowledge about the first
measurement affects the second measurement. For the scenarios in
groups A and B, calculate a statistic (estimate) to quantify the
variability (or spread) between two duplicate measurements. Compare
these estimates between the two scenarios and draw conclusions. (Hint: We have not
discussed how to statistically compare such estimates between groups, so you are not
expected to base your discussion on a statistical comparison; if you nevertheless decide
to do so, you must describe the methods you use.)
- If the above claim that non-blinded duplicate measurements lead to
unrealistically close values were true, what would this imply for the
numerical (or absolute) deviations (or differences) between the first and second
measurements, compared to blinded duplicate measurements? Construct
suitable variable(s) to statistically assess whether the data show any
evidence of such differences between the results obtained in groups A
and B, conduct an analysis and draw conclusions. Make sure to pay
attention to the assumptions behind the analysis, and if any concerns
are noted, try to corroborate your conclusion by supplementary analysis.
Henrik Stryhn
(hstryhn@upei.ca) 2022-10-26