Assignment II for Biostats Course VHM 801 at AVC - Fall semester 2023
The assignment is worth 10% of the final course mark. Please be aware that by handing
in the home assignment you implicitly acknowledge that you have read and accepted
the instructions for home assignments as described
on the VHM 801 homepage.
The data for this assignment (Minitab format, comma-separated file) originate from a University Cancer Center.
In a trial assessing the impact of drugs used in combination with chemotherapy for leukemia,
one of the treatment groups (say, for treatment A) included 21 patients who had the following
remission times (in weeks, obtained from weekly visits to the clinic):
4 8 11 1 15 23 11 2 2 4 3 2 5 22 17 8 12 8 12 8 5
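For readers working outside Minitab, the 21 values above could be entered and summarised along the following lines. This is only a minimal sketch in Python using the standard `statistics` module; the variable names are illustrative and not part of the assignment material.

```python
import statistics

# Remission times (weeks) for treatment group A, copied from the line above.
remission_a = [4, 8, 11, 1, 15, 23, 11, 2, 2, 4, 3, 2, 5, 22, 17, 8, 12, 8, 12, 8, 5]

n = len(remission_a)
mean_a = statistics.mean(remission_a)
median_a = statistics.median(remission_a)
sd_a = statistics.stdev(remission_a)  # sample SD (divisor n - 1)
q1, q2, q3 = statistics.quantiles(remission_a, n=4)  # quartiles, default 'exclusive' method

print(f"n = {n}, mean = {mean_a:.2f}, median = {median_a}, SD = {sd_a:.2f}")
print(f"min = {min(remission_a)}, Q1 = {q1}, Q3 = {q3}, max = {max(remission_a)}")
```

Numerical summaries of this kind would normally be accompanied by a plot (for example a histogram or dot plot), since the values are recorded in whole weeks and several are tied.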
The remission time is the period during which there is
a decrease in or disappearance of the signs and symptoms of cancer
(NCI Dictionary of Cancer Terms).
In other words, the remission time ends when the cancer starts to grow again.
The specific interest in these data could be to compare remission times
in this group with other treatment groups, either in this trial or in previous trials.
One particular other treatment (say B) has previously been shown in large trials to have a mean remission time of
approximately 11.5 weeks.
We also know that in previous trials, treatment A patients
have had a standard deviation of remission times of approximately 6.2 weeks.
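When a standard deviation is taken as known from earlier trials, inference for a mean is usually based on the normal distribution, with standard error sigma / sqrt(n). The sketch below shows that calculation in general form. The function names `z_interval` and `z_test` and their arguments are illustrative assumptions, and whether this model is appropriate for the present data remains for the analysis to justify.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for a mean when the SD (sigma) is assumed known."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return sample_mean - z_crit * se, sample_mean + z_crit * se


def z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: mean = mu0, returning (z statistic, p-value)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Standard error implied by the values quoted above (sigma = 6.2 weeks, n = 21).
print(round(6.2 / math.sqrt(21), 3))
```

Both functions take the sample mean as an input, so they can be reused for any group once its mean has been computed.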
The home assignment has five questions which should all be answered.
Generally speaking, for statistical inference you should always explicitly state and motivate the
statistical model your analysis is based on. If there is a choice between different models,
with different assumptions, you should choose the assumptions you
find most reasonable in view of the information you have obtained about and from
your data.
- Carry out a brief descriptive analysis to determine the main
characteristics of the distribution of remission times for patients in this treatment group (A).
- If the standard deviation from previous trials is assumed to be
valid also for the present trial, compute a 99% confidence interval for treatment A's mean remission
time, and carry out a statistical test to compare its mean remission time with
that of treatment B.
- Same question as above (Question 2), if the standard deviation from previous trials is not considered
valid for this trial. (Hint: If you find it useful to carry out the analysis on a transformed scale
(such as by square-root or logarithmic transformation), your confidence interval (once backtransformed to
original scale) will be for the median, and your comparison with treatment B
will effectively be based on the median instead of the mean, and that is completely acceptable.
In this situation, assume for simplicity that the
median remission time for treatment B equals its mean remission time stated above.)
- Same question as above (Question 3), for a comparison with another treatment group (say C) in the same
trial with 25 patients, a sample mean (and median) of 11.5 weeks and (for simplicity) the same
sample standard deviation as in the data for treatment group A (also on any transformed scales).
- Within the population of patients treated with chemotherapy and drugs, there could
easily (for some treatments) be patients who were still in remission when the data collection was completed, or whose
cancer remission was only observed up to a certain point in time,
after which they could for other reasons no longer be followed (e.g., because they moved away from the Cancer
Center or decided to no longer participate in the treatment program).
Discuss whether such patients should be (i) included in the data (with the
remission time computed from the time of their last visit to the Cancer
Center), or (ii) excluded from the data. Describe the effect on the distribution of
remission times and on the statistical inference for Question 3, if either action (i)
or action (ii) had been chosen. Note that a valid description for one of the actions
(i) or (ii) is sufficient to achieve a full score for this question, and that you are not asked
to describe more advanced statistical methods (beyond the course) to account for such incompletely observed data.
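The hint in Question 3 on back-transformation can be illustrated with a small sketch: a t-based interval is computed on the log scale and then exponentiated, so the resulting interval refers to the median (geometric mean) on the original scale rather than to the mean. The function name and the example values are hypothetical, and the critical value `t_crit` is assumed to be looked up in a t-table or obtained from statistical software for the relevant degrees of freedom.

```python
import math
import statistics


def backtransformed_log_interval(values, t_crit):
    """t-interval on the log scale, exponentiated back to the original scale.

    t_crit is the t critical value for n - 1 degrees of freedom (supplied by
    the caller). Returns (lower, centre, upper); the centre is the geometric
    mean, which estimates the median when the log-transformed data are
    roughly symmetric.
    """
    logs = [math.log(v) for v in values]  # requires strictly positive values
    n = len(logs)
    mean_log = statistics.mean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(n)
    lower = math.exp(mean_log - t_crit * se_log)
    upper = math.exp(mean_log + t_crit * se_log)
    return lower, math.exp(mean_log), upper


# Hypothetical illustrative values (not the assignment data); 4.303 is the
# tabulated 97.5% t quantile for 2 degrees of freedom (95% interval, n = 3).
print(backtransformed_log_interval([2.0, 4.0, 8.0], t_crit=4.303))
```

A comparison with a reference median would be carried out on the same transformed scale, that is, against the logarithm of the reference value.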
Henrik Stryhn
(hstryhn@upei.ca) 2023-10-12