Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2023
The assignment is worth 10% of the final course mark. Please be aware that by handing
in the home assignment you implicitly acknowledge that you have read and accepted
the instructions for home assignments as described
on the VHM 801 homepage.
This assignment is based on a subset of data that were collected in an oyster
field trial in the Richibouctou River (Eastern New Brunswick, Canada) from May to October 2009.
The trial was conducted by Denise Méthé from DFO in Moncton, and
the research formed part of her PhD project at AVC.
The objective of the research was to evaluate the effects of the location (site)
and source of juvenile oysters on their growth during one growth season. The trial
involved two sites and two sources of oysters, in both cases characterized
by either a low-salinity or a high-salinity environment. It is well known that the salinity
of the water is an important health and growth parameter for oysters, and the specific hypothesis
investigated was whether movement of oysters from a low-salinity to a high-salinity environment, or vice versa,
had any beneficial effect on growth. At each of the two study sites,
oysters from two sources were deposited in May into several cages in the river
after physical measurements were obtained. In October,
the cages were retrieved from the water, and new physical measurements
were obtained. The following variables are included in the
data:
- cage: cage id (with no intrinsic meaning),
- site: oyster growth site (1=high salinity, 2=low salinity),
- source: source of oyster (1=high salinity, 2=low salinity),
- length1: length in May (mm),
- width1: width in May (mm),
- height1: height in May (mm),
- weight1: weight in May (g),
- length2: length in October (mm),
- width2: width in October (mm),
- height2: height in October (mm),
- weight2: weight in October (g).
Each row in the data set gives values obtained on one oyster.
For reasons of data confidentiality, only 240 oysters were included in
the data file.
The dataset is available in Minitab format and as a comma-separated file, for
import into Stata and other statistical software.
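As an illustration of how the variables listed above fit together, the following Python sketch reads comma-separated rows, computes a growth value for each oyster as the October value minus the May value, and groups the results by the site and source codes. It assumes that the header of the comma-separated file uses the variable names given above; the rows shown are made up for illustration and are not values from the trial. The 1.5 x IQR rule used to flag suspected outliers is one common screening convention and is an assumption here, not a prescribed method.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up rows for illustration only; these are NOT values from the trial data.
SAMPLE_CSV = """cage,site,source,length1,length2
1,1,1,30.0,52.0
1,1,2,28.0,47.0
2,2,1,31.0,45.0
2,2,2,29.0,50.0
3,1,1,33.0,58.0
3,2,2,27.0,46.0
"""

# Coding of site and source as given in the variable list.
SALINITY = {"1": "high", "2": "low"}


def growth_by_treatment(csv_text, first="length1", second="length2"):
    """Group per-oyster growth (second - first) by (site, source)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        treatment = (SALINITY[row["site"]], SALINITY[row["source"]])
        groups[treatment].append(float(row[second]) - float(row[first]))
    return dict(groups)


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def suspected_outliers(values):
    """Values more than 1.5 * IQR outside the quartiles (a screening rule only)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    groups = growth_by_treatment(SAMPLE_CSV)
    for treatment, values in sorted(groups.items()):
        print(treatment, values)
    pooled = [v for values in groups.values() for v in values]
    print(five_number_summary(pooled))
    print(suspected_outliers(pooled))
```

Different software packages compute quartiles in slightly different ways, so summaries from Minitab or Stata may differ a little from those produced by this sketch. A flagged value is only a candidate for closer inspection, not automatically an error.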
The home assignment has four questions (a-d), all of which should be answered.
- (a) Select three variables in the dataset: two continuous variables containing values of the same parameter
at the beginning and end of the study (e.g., length1 and length2) and one categorical variable (possibly dichotomous,
i.e., with only two categories). Apart from these restrictions
you are free to select the variables as you want. Carry out a
descriptive analysis of each of
your three selected variables, including both a graphical representation and
descriptive statistics. Choose the graphical representation and the statistics you find most
useful to show each distribution,
in consideration of the variable's type and range of values. Where appropriate,
comment specifically on the distribution's center, spread and shape.
If your descriptive analysis identifies any "suspected outliers", discuss
whether these should be considered as truly outlying observations, in the sense that they
don't really belong to the distribution, or whether they should be considered as part
of the distribution. (Note: for simplicity, and not as a realistic approach in practice, you are
expected to ignore the values of all other variables in each descriptive analysis, except that
your discussion of suspected outliers may involve both the first and
second measurement of your continuous variable.)
- (b) The main objective of the study was to compare growth between
oysters of different origins and grown at different sites.
Compute the growth of each oyster over the study period, in terms of the continuous variable you focused on for (a).
Carry out a second descriptive analysis, this time with the purpose of
comparing the treatments (Hint: you should identify 4 treatments, disregarding the cages) with respect to your growth variable. Select
descriptive statistics and graphical display(s) that you find useful to
illustrate differences between treatments in the distributions of your
growth variable. Describe your findings and try to draw conclusions. Note that you are not expected to
carry out any statistical tests to compare the distributions.
- (c) A common preliminary step in a formal statistical analysis comparing the distributions of a variable
between treatments consists of assessing whether it
seems reasonable to assume that the values of this variable are normally
distributed. As the analysis intends to distinguish between the treatments, the assessment needs
to be done separately for each treatment group. Carry out such an assessment of normality for your growth
variable from (b) in each of the 4 treatment groups. If you conclude that your variable, for one or several of the treatments,
is not normally distributed, describe how its distribution seems to differ
from a normal distribution.
- (d) This last part of the assignment discusses the randomization procedure for the trial.
For simplicity, we will focus on certain aspects of the randomization instead of illustrating a full
randomization.
- (i) Consider first randomization at one of the two sites. Several cages are available to hold the oysters,
which will be from two different sources.
Do you think it is better to fill each cage with oysters from one source only or to mix the two sources
in each cage? Explain your answer.
- (ii) Irrespective of your answer to (i), assume now that oysters from the two sources
are to be mixed within a cage. The cage holds 32 oysters, laid out in a rectangular format with
4 rows and 8 columns. Demonstrate a randomization of the 32 oysters to the cage. (You may use either
Minitab or a table of random numbers.) Think about, and discuss, whether special consideration should be given to the
layout in rows and columns, and if so how that might be done. (Note: several sensible options
exist, so the discussion is more important than a single "correct" answer.)
- (iii) Finally, describe in general terms how oysters can be randomly allocated to the two study sites.
You don't need to illustrate a specific randomization scheme, but your
description and discussion should cover the main principles and potential difficulties involved.
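The mechanics of randomizing 32 oysters to a cage with 4 rows and 8 columns can be sketched in Python, as an alternative to Minitab or a table of random numbers. The sketch below shows two of several sensible options: a complete randomization that ignores the layout, and a randomization restricted so that every row holds the same number of oysters from each source. It assumes 16 oysters per source, which the assignment does not state; the labels "H" and "L" (high and low salinity source), the function names, and the seed value are all chosen here for illustration only.

```python
import random

ROWS, COLS = 4, 8  # cage layout: 4 rows and 8 columns


def completely_randomized(n_per_source=16, seed=None):
    """Shuffle the source labels over all 32 positions, ignoring rows."""
    rng = random.Random(seed)
    labels = ["H"] * n_per_source + ["L"] * n_per_source
    rng.shuffle(labels)
    # Cut the shuffled list into rows of 8 positions each.
    return [labels[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


def row_blocked(seed=None):
    """Shuffle within each row, so every row holds 4 oysters per source."""
    rng = random.Random(seed)
    grid = []
    for _ in range(ROWS):
        row = ["H"] * (COLS // 2) + ["L"] * (COLS // 2)
        rng.shuffle(row)
        grid.append(row)
    return grid


if __name__ == "__main__":
    # A fixed seed makes the allocation reproducible; 801 is arbitrary.
    for name, layout in [("complete", completely_randomized(seed=801)),
                         ("row-blocked", row_blocked(seed=801))]:
        print(name)
        for row in layout:
            print(" ".join(row))
```

The complete randomization may by chance place most oysters from one source in the same row, whereas the row-blocked version balances the sources across rows but leaves the columns unrestricted. Which restriction, if any, is worthwhile depends on whether position within the cage is expected to influence growth.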
Henrik Stryhn
(hstryhn@upei.ca) 2023-09-27