Assignment II for Biostats Course VHM 801 at AVC - Fall semester 2023

The assignment is worth 10% of the final course mark. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

The data for this assignment (Minitab format, comma-separated file) originate from a University Cancer Center. In a trial assessing the impact of drugs used in combination with chemotherapy for leukemia, one of the treatment groups (say, treatment A) included 21 patients who had the following remission times (in weeks, obtained from weekly visits to the clinic):

4 8 11 1 15 23 11 2 2 4 3 2 5 22 17 8 12 8 12 8 5

The remission time is the period during which the signs and symptoms of cancer have decreased or disappeared (NCI Dictionary of Cancer Terms). In other words, the remission time ends when the cancer starts to grow again. A specific interest in these data could be to compare remission times in this group with those of other treatment groups, either in this trial or in previous trials. One other treatment (say, B) has previously been shown in large trials to have a mean remission time of approximately 13 weeks. We also know that in previous trials, remission times of treatment A patients have had a standard deviation of approximately 6.2 weeks.
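As a possible starting point for the descriptive analysis, here is a minimal sketch that enters the 21 remission times and computes common summary statistics. It assumes Python 3 with only the standard `statistics` module (not the Minitab workflow implied by the data file); the helper name `describe` is hypothetical, and the quartiles use the default "exclusive" method of `statistics.quantiles`, so other software may report slightly different quartiles.

```python
import statistics

# Remission times (weeks) for the 21 treatment A patients, as listed above.
remission_a = [4, 8, 11, 1, 15, 23, 11, 2, 2, 4, 3, 2, 5, 22, 17, 8, 12, 8, 12, 8, 5]


def describe(data):
    """Return basic descriptive statistics for a list of remission times."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),  # sample SD (n - 1 denominator)
        "median": statistics.median(data),
        "q1": q1,
        "q3": q3,
        "min": min(data),
        "max": max(data),
    }


summary = describe(remission_a)
for key, value in summary.items():
    print(f"{key:>6}: {value:.3f}")
```

Here the mean (about 8.7 weeks) lies above the median (8 weeks) and the maximum (23 weeks) is far above the upper quartile, a hint of right skew that a histogram or boxplot would show more clearly.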

The home assignment has five questions, all of which should be answered. Generally speaking, for statistical inference you should always explicitly state and motivate the statistical model on which your analysis is based. If there is a choice between different models with different assumptions, you should choose the assumptions you find most reasonable in view of the information you have obtained about and from your data.

  1. Carry out a brief descriptive analysis to determine the main characteristics of the distribution of remission times for patients in this treatment group (A).

  2. If the standard deviation from previous trials is assumed to be valid also for the present trial, compute a 99% confidence interval for treatment A's mean remission time, and carry out a statistical test to compare its mean remission time with that of treatment B.

  3. Same as Question 2, but now the standard deviation from previous trials is not considered valid for this trial. (Hint: If you find it useful to carry out the analysis on a transformed scale (e.g., after a square-root or logarithmic transformation), your confidence interval (once back-transformed to the original scale) will be for the median, and your comparison with treatment B will effectively be based on the median instead of the mean; that is completely acceptable. In this situation, assume for simplicity that the median remission time for treatment B also equals approximately 13 weeks.)

  4. Same as Question 3, but for a comparison with another treatment group (say, C) in the same trial, with 25 patients, a sample mean (and median) of 11.5 weeks and (for simplicity) the same sample standard deviation as treatment group A (also on any transformed scale).

  5. Within the population of patients treated with chemotherapy and drugs, there could easily (for some treatments) be patients who were still in remission when the data collection was completed, or whose remission was observed only up to a certain point in time, after which they could no longer be followed for other reasons (e.g., because they moved away from the Cancer Center or decided to stop participating in the treatment program). Discuss whether such patients should be (i) included in the data (with the remission time computed up to the time of their last visit to the Cancer Center), or (ii) excluded from the data. Describe the effect on the distribution of remission times and on the statistical inference for Question 3 if either action (i) or action (ii) had been chosen. Note that a valid description of one of the actions (i) or (ii) is sufficient for a full score on this question, and that you are not asked to describe more advanced statistical methods (beyond the course) that account for such incompletely observed data.
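For Questions 2 to 4, a hedged sketch of one way to set up the calculations, again using only the Python standard library. Several choices here are assumptions rather than part of the assignment: `REF_B` is set to 13 weeks (the figure quoted for treatment B in the hint to Question 3; change it if a different reference mean applies); the logarithmic scale is used for Questions 3 and 4 (the hint equally allows a square-root scale); group C's log-scale mean is taken as ln(11.5), the log of its median; and the t critical values are hard-coded from standard tables because the standard library has no t quantile function. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Remission times (weeks) for the 21 treatment A patients, as listed in the assignment.
remission_a = [4, 8, 11, 1, 15, 23, 11, 2, 2, 4, 3, 2, 5, 22, 17, 8, 12, 8, 12, 8, 5]
n_a = len(remission_a)

# Assumed reference value for treatment B (weeks); 13 is the figure in the
# hint to Question 3. Change it if a different reference mean applies.
REF_B = 13.0
SIGMA_A = 6.2  # SD of treatment A remission times from previous trials
LEVEL = 0.99

# Upper 0.5% points of Student's t, hard-coded from standard tables
# (scipy.stats.t.ppf(0.995, df) agrees to three decimals).
T_CRIT = {20: 2.845, 44: 2.692}


def z_interval_and_test(data, sigma, mu0, level=LEVEL):
    """Question 2: z-based CI and two-sided z-test for the mean, sigma known."""
    n, xbar = len(data), mean(data)
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    z = (xbar - mu0) / se
    p = 2 * NormalDist().cdf(-abs(z))
    return (xbar - z_crit * se, xbar + z_crit * se), z, p


def log_t_interval_and_test(data, ref_median, t_crit):
    """Question 3: one-sample t analysis on the log scale.

    Back-transformed, the estimate and CI refer to the median remission time
    (assuming the log-transformed times are roughly symmetric).
    """
    logs = [math.log(x) for x in data]
    m, se = mean(logs), stdev(logs) / math.sqrt(len(logs))
    ci = (math.exp(m - t_crit * se), math.exp(m + t_crit * se))
    t = (m - math.log(ref_median)) / se
    return math.exp(m), ci, t


def log_two_sample_t(data, n_other, median_other, t_crit):
    """Question 4: pooled two-sample t on the log scale.

    The other group is summarised by its size and median; as in the assignment,
    its log-scale SD is taken to equal that of `data`, so the pooled SD is s.
    """
    logs = [math.log(x) for x in data]
    m, s = mean(logs), stdev(logs)
    diff = m - math.log(median_other)
    se = s * math.sqrt(1 / len(logs) + 1 / n_other)
    t = diff / se
    # Back-transformed CI for the ratio of medians (this group relative to the other).
    ratio_ci = (math.exp(diff - t_crit * se), math.exp(diff + t_crit * se))
    return t, ratio_ci


ci_z, z, p_z = z_interval_and_test(remission_a, SIGMA_A, REF_B)
gm, ci_log, t_log = log_t_interval_and_test(remission_a, REF_B, T_CRIT[n_a - 1])
t_c, ratio_ci = log_two_sample_t(remission_a, 25, 11.5, T_CRIT[n_a + 25 - 2])

print(f"Q2: 99% CI for mean ({ci_z[0]:.2f}, {ci_z[1]:.2f}); z = {z:.2f}, p = {p_z:.4f}")
print(f"Q3: median estimate {gm:.2f}; 99% CI ({ci_log[0]:.2f}, {ci_log[1]:.2f}); "
      f"t = {t_log:.2f} vs critical {T_CRIT[n_a - 1]}")
print(f"Q4: t = {t_c:.2f} vs critical {T_CRIT[n_a + 25 - 2]}; "
      f"99% CI for median ratio A/C ({ratio_ci[0]:.2f}, {ratio_ci[1]:.2f})")
```

Because groups A and C are assumed to share the same standard deviation, the pooled statistic uses s directly with 21 + 25 - 2 = 44 degrees of freedom; with clearly unequal standard deviations a Welch test would be the usual alternative. Exact p-values for the t statistics could be obtained with, for example, `scipy.stats.t.sf`.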


Henrik Stryhn (hstryhn@upei.ca) 2023-10-12