Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2023

The assignment is worth 10% of the final course mark. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments as described on the VHM 801 homepage.

This assignment is based on a subset of data that were collected in an oyster field trial in the Richibouctou River (Eastern New Brunswick, Canada) from May to October 2009. The trial was conducted by Denise Méthé from DFO in Moncton, and the research formed part of her PhD project at AVC. The objective of the research was to evaluate the effects of the location (site) and source of juvenile oysters on their growth during one growing season. The trial involved two sites and two sources of oysters, in both cases characterized by either a low-salinity or a high-salinity environment. It is well known that the salinity of the water is an important health and growth parameter for oysters, and the specific hypothesis investigated was whether movement of oysters from a low-salinity to a high-salinity environment, or vice versa, had any beneficial effect on growth. At each of the two study sites, oysters from the two sources were deposited in May into several cages in the river after physical measurements had been obtained. In October, the cages were retrieved from the water and new physical measurements were obtained. The following variables are included in the data:

Each row in the dataset gives the values obtained for one oyster. For reasons of data confidentiality, only 240 oysters were included in the data file. The dataset is available in Minitab format and as a comma-separated file, for import into Stata and other statistical software.
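For those working outside Minitab or Stata, the comma-separated file can also be read with a general-purpose language. The following is only a minimal sketch in Python (standard library only), not a prescribed approach. The column names `site` and `source` are assumptions made for illustration, while `length1` and `length2` follow the example names used in this assignment; the actual names must be taken from the data file.

```python
import csv
import statistics
from collections import defaultdict


def read_rows(path):
    """Read the comma-separated file into a list of dicts (one per oyster)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def summarize(values):
    """Return basic descriptive statistics for a list of at least two numbers."""
    # Quartile definitions differ between software packages; this uses
    # the default method of statistics.quantiles.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def growth_by_group(rows, start="length1", end="length2",
                    keys=("site", "source")):
    """Group each oyster's change (end minus start) by the key columns.

    The default column names are assumptions, not the confirmed
    names in the data file.
    """
    groups = defaultdict(list)
    for row in rows:
        label = tuple(row[key] for key in keys)
        groups[label].append(float(row[end]) - float(row[start]))
    return dict(groups)
```

With the file saved under a hypothetical name such as `oysters.csv`, calling `growth_by_group(read_rows("oysters.csv"))` would return one list of differences per combination of the key columns, and `summarize` can then be applied to each list.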

The home assignment has four questions (1-4), all of which should be answered.

  1. Select three variables in the dataset: two continuous variables containing values of the same parameter at the beginning and end of the study (e.g., length1 and length2) and one categorical variable (possibly dichotomous, i.e., with only two categories). Apart from these restrictions you are free to select the variables as you want. Carry out a descriptive analysis of each of your three selected variables, including both a graphical representation and descriptive statistics. Choose the graphical representation and the statistics you find most useful to show each distribution, in consideration of the variable's type and range of values. Where appropriate, comment specifically on the distribution's center, spread and shape. If your descriptive analysis identifies any "suspected outliers", discuss whether these should be considered as truly outlying observations, in the sense that they don't really belong to the distribution, or whether they should be considered as part of the distribution. (Note: for simplicity (and not as a realistic approach in practice!), for each descriptive analysis you are here expected to ignore the values of all other variables, except that your discussion of suspected outliers may involve both the first and second measurement of your continuous variable.)

  2. The main objective of the study was to compare growth between oysters of different origins and grown at different sites. Compute for each oyster its growth through the study period, in terms of the continuous variable you focused on in question 1. Carry out a second descriptive analysis, this time with the purpose of comparing the treatments (Hint: you should identify 4 treatments, disregarding the cages) with respect to your growth variable. Select descriptive statistics and graphical display(s) that you find useful to illustrate differences between treatments in the distributions of your growth variable. Describe your findings and try to draw conclusions. Note that you are not expected to compute any statistical tests to compare the distributions.

  3. A common preliminary step in a formal statistical analysis that compares the distributions of a variable between treatments is to assess whether it seems reasonable to assume that the values of the variable are normally distributed. As the analysis intends to distinguish between the treatments, the assessment needs to be done separately for each treatment group. Carry out such an assessment of normality for your growth variable from question 2 in each of the 4 treatment groups. If you conclude that your variable, for one or several of the treatments, is not normally distributed, describe how its distribution seems to differ from a normal distribution.

  4. This last part of the assignment discusses the randomization procedure for the trial. For simplicity, we will focus on certain aspects of the randomization instead of illustrating a full randomization.
    1. Consider first randomization at one of the two sites. Several cages are available to hold the oysters, which will be from two different sources. Do you think it is better to fill each cage with oysters from one source only or to mix the two sources in each cage? Explain your answer.
    2. Irrespective of your answer to part 1, assume now that oysters from the two sources are to be mixed within a cage. The cage holds 32 oysters, laid out in a rectangular format with 4 rows and 8 columns. Demonstrate a randomization of the 32 oysters to the cage. (You may use either Minitab or a table of random numbers.) Think about, and discuss, whether special consideration should be given to the layout in rows and columns, and if so how that might be done. (Note: several sensible options exist, so the discussion is more important than a single "correct" answer.)
    3. Finally, describe in general terms how oysters can be randomly allocated to the two study sites. You don't need to illustrate a specific randomization scheme, but your description and discussion should cover the main principles and potential difficulties involved.
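
To make the idea of randomizing oysters to positions in a 4 x 8 cage concrete, the sketch below shows two of the several sensible options in Python (standard library only). It is an illustration under stated assumptions, not a model answer: it assumes that each cage receives equal numbers of oysters from the two sources (16 each), which the assignment does not specify, and the labels `L` and `H` (for the low-salinity and high-salinity source) and both function names are hypothetical.

```python
import random


def randomize_cage(n_rows=4, n_cols=8, sources=("L", "H"), seed=None):
    """Complete randomization: shuffle the labels over all positions.

    Assumes equal numbers of oysters per source (an assumption made
    here for illustration only).
    """
    rng = random.Random(seed)
    per_source = (n_rows * n_cols) // len(sources)
    labels = [s for s in sources for _ in range(per_source)]
    rng.shuffle(labels)
    # Cut the shuffled list into rows of n_cols positions each.
    return [labels[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def randomize_cage_by_row(n_rows=4, n_cols=8, sources=("L", "H"), seed=None):
    """Row-blocked randomization: every row holds equal numbers of each
    source, and the order is shuffled independently within each row."""
    rng = random.Random(seed)
    per_source = n_cols // len(sources)
    layout = []
    for _ in range(n_rows):
        row = [s for s in sources for _ in range(per_source)]
        rng.shuffle(row)
        layout.append(row)
    return layout


# Print one layout of each kind; the seed only makes the run repeatable.
for row in randomize_cage(seed=2023):
    print(" ".join(row))
print()
for row in randomize_cage_by_row(seed=2023):
    print(" ".join(row))
```

The first function treats all 32 positions alike, so a row may by chance contain mostly one source. The second restricts the randomization so that each row is balanced. Whether such a restriction is worthwhile, and whether columns deserve the same treatment, is exactly the kind of point the discussion should address.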

Henrik Stryhn (hstryhn@upei.ca) 2023-09-27