Assignment II for Biostats Course VHM 801 at AVC - Fall semester 2022

The assignment is worth either 10% or 15% of the final course mark. Questions 1-3 constitute an assignment for 10%, whereas Questions 1-5 constitute an assignment for 15%. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

We consider data from an experimental study done in a laboratory to investigate the implications of performing duplicate measurements on the same sample. The measurements of interest were concentrations of a particular chemical component in samples submitted to the lab. A total of 20 such samples (expected to contain slightly different concentrations) were selected and arbitrarily divided into two groups, say A and B, with 10 samples in each group.

A lab technician was asked to measure the concentrations of the 10 samples in group A, using for each sample duplicate measurements. In the table below, the duplicate measurements for each of the 10 samples are called A1 and A2, where A1 is the first measurement and A2 is the second measurement taken immediately following the first one.

Sample     1     2     3     4     5     6     7     8     9    10
A1      5.25  6.10  5.70  5.76  5.00  6.04  5.49  5.33  4.84  5.89
A2      5.17  6.19  5.61  5.67  4.94  6.02  5.55  5.42  4.94  5.81

Next, each of the 10 samples in group B was split into two equal parts, and the resulting 20 (sub)samples were given to the technician for measurement of concentrations. The subsamples were presented to the technician in random order, and the technician was not told (and could not otherwise identify) which subsamples were derived from the same sample. The resulting duplicate measurements for the 10 samples, labelled as B1 and B2 corresponding to the order in which the technician carried out the measurements, are shown in the next table.

Sample    11    12    13    14    15    16    17    18    19    20
B1      4.79  5.61  5.28  5.89  5.42  4.91  5.83  5.85  5.25  5.01
B2      4.80  5.55  5.03  6.11  5.32  5.04  6.02  5.77  5.39  5.22

The data, consisting of these measurements in four separate columns, are available in Minitab format as well as a comma-separated file, for import into Stata and other statistical software.
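For readers working outside Minitab or Stata, the four-column layout can be sketched in Python as follows. This is only an illustration: the names and locations of the course data files are not given here, so the values are typed in directly from the two tables above, and the column names A1, A2, B1 and B2 are assumed to match the table labels.

```python
import csv
import io

# Values transcribed from the two tables above (samples 1-10 and 11-20).
data = {
    "A1": [5.25, 6.10, 5.70, 5.76, 5.00, 6.04, 5.49, 5.33, 4.84, 5.89],
    "A2": [5.17, 6.19, 5.61, 5.67, 4.94, 6.02, 5.55, 5.42, 4.94, 5.81],
    "B1": [4.79, 5.61, 5.28, 5.89, 5.42, 4.91, 5.83, 5.85, 5.25, 5.01],
    "B2": [4.80, 5.55, 5.03, 6.11, 5.32, 5.04, 6.02, 5.77, 5.39, 5.22],
}

# Each column should hold exactly one value per sample in its group.
assert all(len(column) == 10 for column in data.values())

# Write the four columns in comma-separated form. A string buffer is used
# here because the actual file name is not known (assumption).
# Note: a row only pairs A1 with A2 and B1 with B2; the A and B values on
# the same row come from different, unrelated samples.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(data.keys())
writer.writerows(zip(*data.values()))
csv_text = buffer.getvalue()

print(csv_text)
```

Keeping the measurements in four separate columns, rather than one long column, makes the pairing within each group explicit, which matters for the analyses of duplicate measurements.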

As mentioned above, you may either answer Questions 1-3 for the 10% version of the assignment or all Questions 1-5 for the 15% version of the assignment. You need to indicate clearly which version of the assignment you answer. Make sure to include explanations and justifications for all procedures and calculations used, including statement and validation (as well as possible from the data at hand) of the assumptions made.

  1. In order to assess whether the 20 samples were reasonably representative of samples typically measured at the laboratory, it was of interest to describe the distribution of measurements obtained from the samples. Use descriptive statistics to describe the distribution, and compute a 90% confidence interval for the mean concentration. (Hint: For this question, you need to decide which parts of the data to use - different valid options exist, and you should explain and justify your choice.)

  2. Carry out a statistical analysis to determine whether any systematic differences seem to be present in the mean concentrations obtained in the initial measurement and the subsequent duplicate measurement when carried out as described for group A. State relevant assumptions and hypotheses for your analysis, and draw conclusions.

  3. The main purpose of the study was to compare the two different scenarios for duplicate measurements in groups A and B. Carry out an analysis similar to the one above for group B, and compare with the results from group A. Irrespective of whether your analyses give similar or different results between the two groups, continue the analysis by directly comparing the differences between the first and second measurements in the two scenarios, either by a confidence interval or by a statistical test. Summarize all your analyses into conclusions about whether systematic mean differences seem to exist between first and second measurements, both within and between the two groups/scenarios.

  4. Characterize the differences between the scenarios in terms of blinding (of measurements). It is sometimes claimed that duplicate measurements conducted by the same technician without blinding will tend to produce values that are unrealistically close, possibly because knowledge about the first measurement affects the second measurement. For the scenarios in groups A and B, calculate a statistic (estimate) to quantify the variability (or spread) between two duplicate measurements. Compare these estimates between the two scenarios and draw conclusions. (Hint: We have not discussed how to statistically compare such estimates between groups, so you are not expected to base your discussion on a statistical comparison; if you nevertheless decide to do so, you must describe the methods you use.)

  5. If the above claim that non-blinded duplicate measurements lead to unrealistically close values were true, what would this imply for the numerical (or absolute) deviations (or differences) between the first and second measurements, compared to blinded duplicate measurements? Construct suitable variable(s) to statistically assess whether the data show any evidence of such differences between the results obtained in groups A and B, conduct an analysis and draw conclusions. Make sure to pay attention to the assumptions behind the analysis, and if any concerns are noted, try to corroborate your conclusion by supplementary analysis.
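The variables referred to in the questions on duplicate measurements can be constructed as within-sample differences (first minus second) and their absolute values. The Python sketch below shows one possible construction, using only the standard library. It is not a model answer: the function name `paired_summary` is hypothetical, the paired t-type summary is just one of several valid approaches, and the critical value 2.262 (the 0.975 quantile of the t distribution with 9 degrees of freedom) applies only because each group has 10 pairs. The assumptions behind any such analysis still need to be stated and checked.

```python
from math import sqrt
from statistics import mean, stdev

# Values transcribed from the data tables in the assignment text.
a1 = [5.25, 6.10, 5.70, 5.76, 5.00, 6.04, 5.49, 5.33, 4.84, 5.89]
a2 = [5.17, 6.19, 5.61, 5.67, 4.94, 6.02, 5.55, 5.42, 4.94, 5.81]
b1 = [4.79, 5.61, 5.28, 5.89, 5.42, 4.91, 5.83, 5.85, 5.25, 5.01]
b2 = [4.80, 5.55, 5.03, 6.11, 5.32, 5.04, 6.02, 5.77, 5.39, 5.22]


def paired_summary(first, second, t_crit=2.262):
    """Summarize paired differences (first minus second).

    t_crit defaults to the 0.975 quantile of t with 9 degrees of freedom,
    which is only appropriate for 10 pairs (assumption of this sketch).
    """
    diffs = [x - y for x, y in zip(first, second)]
    abs_diffs = [abs(d) for d in diffs]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)          # sample standard deviation of differences
    se = s_d / sqrt(n)          # standard error of the mean difference
    return {
        "diffs": diffs,
        "abs_diffs": abs_diffs,
        "mean_diff": d_bar,
        "sd_diff": s_d,
        "t_stat": d_bar / se,
        "ci95": (d_bar - t_crit * se, d_bar + t_crit * se),
        "mean_abs_diff": mean(abs_diffs),
    }


summary_a = paired_summary(a1, a2)
summary_b = paired_summary(b1, b2)

for label, s in (("A", summary_a), ("B", summary_b)):
    print(f"Group {label}: mean diff = {s['mean_diff']:.3f}, "
          f"SD of diffs = {s['sd_diff']:.3f}, "
          f"mean |diff| = {s['mean_abs_diff']:.3f}")
```

The signed differences address systematic shifts between first and second measurements, whereas their standard deviation and the absolute differences describe how close the duplicates are, which is the quantity at issue in the blinding claim.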

Henrik Stryhn (hstryhn@upei.ca) 2022-10-26