Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2020
The assignment is worth 15% of the final course mark. Please be aware that by handing
in the home assignment you implicitly acknowledge that you have read and accepted
the instructions for home assignments as described
on the VHM 801 homepage.
This assignment is based on a small subset of the data collected for a study in
shellfish aquaculture in Prince Edward Island some years back. The
description of the context of the study refers to the time of the study, not the situation as
of today.
Background (the PEImussels website has more details)
On PEI, mussels are grown in socks placed vertically in the water
column along mussel longlines placed horizontally at the water surface in one of the many mussel-growing
areas (estuaries). Mussel seed (of size about one inch) is put into the socks (termed the 'socking' procedure) and
grows there during one or several growing seasons (April-November). Key
production parameters for the mussels are their number (because many of
the initially socked mussels will vanish), weight, length and condition
index (CI; a measure of the ratio between meat and shell weights).
In recent years, the mussel industry has been challenged by the emergence
of a new group of invasive species, tunicates. The tunicates, of which 4 distinct species have been
observed on PEI, have become important fouling organisms
for the production. They attach to the socks
and to the other equipment involved, with two effects. The first is that they may
compete for food with the mussels and may in the worst case cause
loss of all mussels in a sock. The second is that because of their large numbers
and weight they increase the production costs substantially. The
mussel industry is actively involved in studies to find ways of dealing
with the tunicates. Key indicators of the fouling impact of the
tunicates are their number, length and weight.
Present study
A study was carried out to assess the impact of two socking times
(fall, spring) and three socking conditions (A, B, C) imposed at the time of socking. A total of 90 socks
were sampled (partially harvested; a 30 cm section of each sock) in October,
and for each sock the variables listed
below were recorded. For each combination of socking time and socking condition, 15 mussel socks were used.
The objective of the study was to determine whether socking times
and conditions had any impact on the mussel production and the severity of the fouling
by tunicates. The data were recorded at the sock level, meaning that the
focus is on average or total values of mussel and tunicate parameters per
sock.
- Sock: sock number (without any intrinsic meaning)
- Time: socking time (fall of the preceding year, spring of the sampling year)
- Condition: socking condition imposed at the time of socking (A, B, C)
- M Abundance: mussel abundance (number of mussels per sock section)
- M Weight: total mussel weight for the sock section (g)
- M Length: average mussel length (mm)
- M CI: average mussel condition index (%)
- T Abundance: tunicate abundance (number of tunicates > 5 mm per sock section)
- T Weight: total tunicate weight for the sock section (g)
- T Length: average tunicate length (mm) among tunicates > 5 mm
The dataset is available in Minitab format and as a comma-separated file, for
import into Stata and other statistical software.
The home assignment has six questions which should all be answered.
- Characterize the study type (e.g., experimental or another type), and describe the variable
type for all the variables. (Hint: you may use some of the following descriptors for variables:
categorical, nominal, ordinal, quantitative, discrete, continuous.)
Irrespective of whether you consider the study as an experiment or not, briefly
characterize the study (as described above; we recommend leaving out the layout details
given in the last question) using the terminology of
experiments, such as factor, treatment, block and replication.
- Select two of the outcomes in the dataset;
you may choose freely among the outcome variables. Carry out a descriptive analysis of
your two selected variables. For this question, you are required to disregard the information about
socking times and socking conditions, and consider all the values (for each variable) as a single sample. Your descriptive
analysis should include both a graphical representation and
descriptive statistics. Comment specifically on each distribution's center, spread and shape, as
well as any potential outliers. We will revisit the question of
potential outliers below, so you are not required to come up with a
definitive assessment of outliers here.
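As a rough illustration of the descriptive statistics asked for in this question, the following Python sketch summarizes a single pooled sample and flags outlier candidates with the common 1.5 x IQR rule. The function name `describe` and the example values are assumptions made for illustration; the values are made up and are not taken from the study dataset. Minitab, Stata or other software may compute quartiles slightly differently.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics and 1.5*IQR outlier candidates."""
    xs = sorted(values)
    # statistics.quantiles uses the 'exclusive' method by default;
    # other packages may use a different quartile definition.
    q1, median, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(xs),
        "mean": statistics.mean(xs),
        "median": median,
        "sd": statistics.stdev(xs),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "min": xs[0],
        "max": xs[-1],
        # Candidates only: flagged values still need a subject-matter judgement.
        "outlier_candidates": [x for x in xs if x < low or x > high],
    }


# Made-up values for illustration only; NOT taken from the study dataset.
example_lengths = [48.2, 51.0, 49.5, 50.3, 52.1, 47.8, 50.9, 49.9, 51.4, 63.0]
summary = describe(example_lengths)
for name, value in summary.items():
    print(name, value)
```

A mean clearly above the median, as in this made-up sample, is one numeric hint of right skewness; a histogram or boxplot is still needed to judge the shape.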
- The main objective of the study was to compare the mussel production and tunicate infestation outcomes between
different socking times and socking conditions. Carry out a second descriptive analysis,
this time with the purpose of comparing the relevant groups of mussels/tunicates
with respect to the same two outcomes you analyzed previously. (Hint: you should identify 6 groups.)
Select descriptive statistics and graphical display(s) that you find useful to compare the groups.
Describe your findings and try to draw conclusions. Note that you are not expected
to carry out any statistical tests to compare the distributions.
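One possible way to organize the groupwise comparison is sketched below in Python: the rows are grouped by the combination of Time and Condition, and a few descriptive statistics are computed per group. The function name `groupwise_summary` and all values are assumptions made for illustration; the rows are made up, with only 3 values per group to keep the example short, and are not the study data.

```python
import statistics
from collections import defaultdict

# Made-up rows for illustration only; NOT the study data.
# Each row: (Time, Condition, value of one outcome).
rows = [
    ("fall", "A", 410.0), ("fall", "A", 395.0), ("fall", "A", 428.0),
    ("fall", "B", 380.0), ("fall", "B", 402.0), ("fall", "B", 391.0),
    ("fall", "C", 450.0), ("fall", "C", 437.0), ("fall", "C", 461.0),
    ("spring", "A", 300.0), ("spring", "A", 322.0), ("spring", "A", 311.0),
    ("spring", "B", 290.0), ("spring", "B", 305.0), ("spring", "B", 284.0),
    ("spring", "C", 340.0), ("spring", "C", 333.0), ("spring", "C", 356.0),
]


def groupwise_summary(rows):
    """Summarize one outcome for each (Time, Condition) group."""
    groups = defaultdict(list)
    for time, condition, value in rows:
        groups[(time, condition)].append(value)
    return {
        key: {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "sd": statistics.stdev(values),
        }
        for key, values in sorted(groups.items())
    }


group_stats = groupwise_summary(rows)
for (time, condition), stats in group_stats.items():
    print(time, condition, stats)
```

Side-by-side boxplots or dotplots of the same six groups would be a natural graphical companion to such a table.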
- Continue your previous descriptive analysis by describing the groupwise distributions of your two selected
outcomes. As a precursor to formal statistical inference it is common to assess
whether it would seem reasonable to assume the data to be normally
distributed. Because the analysis of interest here intends to distinguish between the groups, the assessment needs
to be done separately for each group. Include such assessments of
normality in your descriptive analysis. Summarize the results from this and the previous question
into a short description of the distributions.
If you note
potential outliers, comment on whether these should be considered as truly outlying observations, in the sense that they
don't really belong to the distribution, or whether the values should be considered as part
of the distribution. If you conclude that a variable is not normally distributed, describe how its distribution seems to differ
from a normal distribution. Finally, comment briefly on whether some of your findings from the second question
seem to have been artifacts caused by pooling data together across the groups.
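A normal probability plot per group is one common way to make the assessment described above. The Python sketch below computes the numeric counterpart of such a plot: the correlation between the sorted values of a group and the corresponding normal scores. It is not a formal test and it is not prescribed by the assignment; the function names, the choice of Blom plotting positions and the two example groups are assumptions made for illustration, and the values are made up.

```python
import math
from statistics import NormalDist, mean


def normal_scores(n):
    """Approximate expected standard normal quantiles (Blom plotting positions)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def qq_correlation(values):
    """Correlation between sorted data and normal scores.

    Values close to 1 mean the normal probability plot is close to a straight line.
    """
    xs = sorted(values)
    zs = normal_scores(len(xs))
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / math.sqrt(sxx * szz)


# Made-up groups for illustration only; NOT the study data.
roughly_symmetric = [46.1, 47.9, 48.8, 49.5, 50.0, 50.4, 51.1, 52.2, 53.8]
right_skewed = [1.0, 1.1, 1.2, 1.4, 1.7, 2.3, 3.6, 7.5, 19.0]

r_symmetric = qq_correlation(roughly_symmetric)
r_skewed = qq_correlation(right_skewed)
print("roughly symmetric group:", round(r_symmetric, 3))
print("right-skewed group:", round(r_skewed, 3))
```

With only 15 socks per group, such a number should be read together with the plot itself, since small samples from a normal distribution can also look irregular.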
- Two possible sources of bias in the comparisons between groups could be
(i) differences in the mussel seed used for the different groups and (ii) differences in the
environmental conditions for socks of the different groups. For this
question, give recommendations for how the mussel seed should be
selected and distributed among the groups. Be aware that each sock contains hundreds
of seeds, so complex individual handling of the seeds will not
be feasible. Your aim for this question should be to describe general
principles for what to do (or maybe not to do), without describing complex procedures.
- In this part, we will review the layout of the socks that were included in the study.
The study actually included two additional sampling times (June and August) for a
total of 270 socks. The task for you is to discuss how to carry out a randomization of the
relevant steps in the study layout, as described below.
- First, to facilitate handling, the 270 socks were positioned in groups containing 5 identical socks
(i.e. identical in every respect).
Therefore, for the purpose of outlining the layout we focus on the
270/5=54 different sock groups. Your answers should also focus on the 54
sock groups.
- Second, these 54 sock groups were held on 3 different longlines (18
sock groups per longline).
- Third, each longline was divided into two halves (say east and west, for a longline situated in
the east-west direction), and one half line was used for the fall socking and
the other half line for the spring socking.
- Fourth, on each half line 9 sock
locations were identified for the sock groups, and these 9 locations
were divided into 3 blocks of size 3. Specifically, on every
half longline the first 3 sock locations (when traversing the
longline from one end to the other) formed one block, the next 3
sock locations formed a second block, and the last 3 sock locations
formed the third block. The 3 socking conditions should be
distributed within each of the blocks.
Discuss how randomization can be built into the last two steps above, and describe how you would carry out the
actual randomization. You may either use Minitab (or other statistical software) or a table of random digits. Make sure
to explain how your randomization could be reproduced. (Hint: It
is strongly recommended that you draw a sketch of the study layout, corresponding to the description above, although you are not required to
include such a sketch with your answers.)
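For readers using software other than Minitab, the Python sketch below shows one way a reproducible randomization of the last two steps could look: on each longline the two halves are randomly assigned to fall and spring socking, and within each block of 3 locations the 3 socking conditions are placed in random order. The function name `randomize_layout`, the numbering of longlines, blocks and positions, and the seed value are assumptions made for illustration; this is not the procedure used in the actual study, and it deliberately leaves aside how the three sampling times relate to the layout.

```python
import random

LONGLINES = [1, 2, 3]
HALVES = ["east", "west"]  # labels follow the example given in the text
TIMES = ["fall", "spring"]
CONDITIONS = ["A", "B", "C"]
BLOCKS = [1, 2, 3]


def randomize_layout(seed):
    """Return one row per sock group location: 3 longlines x 2 halves x 3 blocks x 3 positions."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the result
    layout = []
    for line in LONGLINES:
        # Third step: randomly decide which half line gets which socking time.
        times = TIMES[:]
        rng.shuffle(times)
        for half, time in zip(HALVES, times):
            for block in BLOCKS:
                # Fourth step: random order of the 3 conditions within each block.
                conditions = CONDITIONS[:]
                rng.shuffle(conditions)
                for position, condition in enumerate(conditions, start=1):
                    layout.append((line, half, time, block, position, condition))
    return layout


# The seed value is arbitrary; reporting it is what makes the result reproducible.
layout = randomize_layout(seed=801)
for row in layout:
    print(row)
```

Reporting the seed together with the software and its version allows someone else to regenerate the same layout; the shuffling results of a given seed are not guaranteed to be identical across software versions.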
Henrik Stryhn
(hstryhn@upei.ca) 2020-10-01