Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2020

The assignment is worth 15% of the final course mark. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

This assignment is based on a small subset of the data collected for a study of shellfish aquaculture in Prince Edward Island some years back. The description of the context of the study refers to the time of the study, not to the situation today.

Background (the PEImussels website has more details)
On PEI, mussels are grown in socks placed vertically in the water column along mussel longlines placed horizontally at the water surface in one of the many mussel-growing areas (estuaries). Mussel seed (about one inch in size) is put into the socks (termed the 'socking' procedure) and grows there during one or several growing seasons (April-November). Key production parameters for the mussels are their number (because many of the initially socked mussels will vanish), weight, length and condition index (CI; a measure of the ratio between meat and shell weights).

In recent years, the mussel industry has been challenged by the emergence of a new group of invasive species, tunicates. The tunicates, of which 4 distinct species have been observed on PEI, have become important fouling organisms for the production. They attach to the socks and to the other equipment involved, with two effects. The first is that they may compete with the mussels for food and may, in the worst case, cause the loss of all mussels in a sock. The second is that, because of their large numbers and weight, they increase the production costs substantially. The mussel industry is actively involved in studies to find ways of dealing with the tunicates. Key indicators of the fouling impact of the tunicates are their number, length and weight.

Present study
A study was carried out to assess the impact of socking time (fall, spring) and socking condition (A, B, C; imposed at the time of socking). A total of 90 socks were sampled (partially harvested; a 30 cm section of each sock) in October, and for each sock the variables listed in the table below were recorded. For each combination of socking time and socking condition, 15 mussel socks were used. The objective of the study was to determine whether socking time and socking condition had any impact on mussel production and on the severity of the fouling by tunicates. The data were recorded at the sock level, meaning that the focus is on average or total values of mussel and tunicate parameters per sock.

The dataset is available in Minitab format and as a comma-separated file, for import into Stata and other statistical software.

The home assignment has six questions, all of which should be answered.

  1. Characterize the study type (e.g., experimental or another type), and describe the variable type for all the variables. (Hint: you may use some of the following descriptors for variables: categorical, nominal, ordinal, quantitative, discrete, continuous.) Irrespective of whether you consider the study an experiment or not, briefly characterize the study using the terminology of experiments, such as factor, treatment, block and replication. Base this on the study as described above; it is not recommended to bring in the details of the layout from the last question.

  2. Select two of the outcomes in the dataset; you may choose freely among the outcome variables. Carry out a descriptive analysis of your two selected variables. For this question, you are required to disregard the information about socking times and socking conditions, and consider all the values (for each variable) as a single sample. Your descriptive analysis should include both a graphical representation and descriptive statistics. Comment specifically on the distribution's center, spread and shape, as well as any potential outliers. We will revisit the question of potential outliers below, so you are not required to come up with a definitive assessment of outliers here.

  3. The main objective of the study was to compare the mussel production and tunicate infestation outcomes between different socking times and socking conditions. Carry out a second descriptive analysis, this time with the purpose of comparing the relevant groups of mussels/tunicates with respect to the same two outcomes you analyzed previously. (Hint: you should identify 6 groups.) Select descriptive statistics and graphical display(s) that you find useful to compare the groups. Describe your findings and try to draw conclusions. Note that you are not expected to carry out any statistical tests to compare the distributions.

  4. Continue your previous descriptive analysis by describing the groupwise distributions of your two selected outcomes. As a precursor to formal statistical inference it is common to assess whether it would seem reasonable to assume the data to be normally distributed. Because the analysis of interest here intends to distinguish between the groups, the assessment needs to be done separately for each group. Include such assessments for normality in your descriptive analysis. Summarize the results from this and the previous question into a short description of the distributions.
    If you note potential outliers, comment on whether these should be considered as truly outlying observations, in the sense that they don't really belong to the distribution, or whether the values should be considered as part of the distribution. If you conclude that a variable is not normally distributed, describe how its distribution seems to differ from a normal distribution. Finally, comment briefly on whether some of your findings from the second question seem to have been artifacts caused by pooling data together across the groups.

  5. Two possible sources of bias in the comparisons between groups could be (i) differences in the mussel seed used for the different groups and (ii) differences in the environmental conditions for socks of the different groups. For this question, give recommendations for how the mussel seed should be selected and distributed among the groups. Be aware that each sock contains hundreds of seeds, so complex individual handling of the seeds will not be feasible. Your aim for this question should be to describe general principles for what to do (or maybe not to do), without describing complex procedures.

  6. In this part, we will review the layout of the socks that were included in the study. The study actually included two additional sampling times (June and August) for a total of 270 socks. The task for you is to discuss how to carry out a randomization of the relevant steps in the study layout, as described below.

    Discuss how randomization can be built into the last two steps above, and describe how you would carry out the actual randomization. You may either use Minitab (or other statistical software) or a table of random digits. Make sure to explain how your randomization could be reproduced. (Hint: It is strongly recommended that you draw a sketch of the study layout, corresponding to the description above, although you are not required to include such a sketch with your answers.)
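
    As a general illustration of what "reproducible" means for a software-based randomization, the Python sketch below allocates 90 sock labels in equal numbers to the six combinations of socking time and socking condition, using a fixed seed. It is not a solution for the layout in question 6, whose steps are not modelled here. The function name randomize_allocation, the sock labels 1 to 90 and the seed value 801 are assumptions made for illustration only; the same idea applies to a fixed seed (base) in Minitab or Stata.

```python
import random


def randomize_allocation(unit_ids, groups, seed):
    """Allocate units to groups in equal numbers, reproducibly.

    Hypothetical helper for illustration: the same seed and the same
    inputs always give the same allocation.
    """
    if len(unit_ids) % len(groups) != 0:
        raise ValueError("number of units must be a multiple of the number of groups")
    rng = random.Random(seed)      # private generator with a fixed seed
    shuffled = list(unit_ids)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)          # random permutation of the unit labels
    per_group = len(shuffled) // len(groups)
    # Consecutive blocks of the permutation go to consecutive groups.
    return {
        group: sorted(shuffled[i * per_group:(i + 1) * per_group])
        for i, group in enumerate(groups)
    }


# Assumed labels: 2 socking times x 3 socking conditions = 6 groups.
socking_times = ["fall", "spring"]
socking_conditions = ["A", "B", "C"]
groups = [(t, c) for t in socking_times for c in socking_conditions]

sock_ids = list(range(1, 91))      # assumed sock labels 1..90
allocation = randomize_allocation(sock_ids, groups, seed=801)

for group, socks in allocation.items():
    print(group, socks)
```

    Reporting the software, the seed and the order of the unit and group labels is what allows someone else to regenerate exactly the same allocation.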


Henrik Stryhn (hstryhn@upei.ca) 2020-10-01