Assignment I for Biostats Course VHM 801 at AVC - Winter semester 2018

The assignment is worth 10% of the final course mark. Please be aware that by handing in the home assignment you implicitly acknowledge that you have read and accepted the instructions for home assignments described on the VHM 801 homepage.

This assignment uses data on growth performance and abattoir findings of pigs from a selection of farms in Prince Edward Island, Canada. The data were collected to study the inter-relationships among respiratory diseases (atrophic rhinitis and enzootic pneumonia), ascarid (a parasitic nematode) levels and daily weight gain. Details about the study can be found in a set of papers from 1990 by Theresa Bernardo in the Canadian Journal of Veterinary Research. Here we provide a brief overview of the variables.

Atrophic rhinitis score was determined by splitting the snout and measuring the space ventral to the turbinates. An adjustment to the score was made if the nasal septum was deviated. Lung scores were recorded on a scale of 0 to 3 (negative to severe pneumonia). Parasite burdens were evaluated using fecal egg counts, counts of adult worms in the intestine and visual assessment of the liver for ascarid tracks. Production data were recorded by monitoring the pigs on the farms of origin from birth through to slaughter. For this assignment we consider the following variables:

The dataset is available in Minitab format and as a comma-separated file for import into Stata and other statistical software. It is also accessible in Stata format from the VER website as the pig_adg data (including a few extra variables).

The home assignment has five questions (1-5), all of which should be answered.

  1. Characterize the study type (e.g., experimental or another type), and describe the variable type for all the variables. (Hint: you may use some of the following descriptors for variables: categorical, nominal, ordinal, quantitative, discrete, continuous.)

  2. Select two variables in the dataset: one quantitative variable and one categorical variable; apart from these restrictions you are free to select the variables as you want. Carry out a descriptive analysis for these variables, including both a graphical representation and descriptive statistics. Choose the graphical representation and the statistics you find most useful to show each distribution, in consideration of the variable's type and range of values. Where appropriate, comment specifically on the distribution's center, spread and shape. If your descriptive analysis identifies any "suspected outliers", discuss whether these should be considered as truly outlying observations, in the sense that they do not really belong to the distribution, or whether they should be considered as part of the distribution. (Hint: for this question and the next, you should explore the distribution of values across all the sampled pigs without accounting for them having been produced in specific farms.)

  3. The average daily gain was computed by dividing the total weight gain from birth to slaughter by the lifetime of the pig (days to market). Compute the total weight gain, describe its distribution (as outlined above) and further examine whether it would seem reasonable to assume its values to be (approximately) normally distributed. Describe carefully how you assess the agreement of the variable with a normal distribution. Additionally, if you conclude that the variable is not approximated well by a normal distribution, describe how its distribution seems to differ from a normal distribution.

  4. Internet searches suggest that current standards for growth rates of pigs correspond to average daily gains of at least 1.5 lbs per day. Compare the distribution of adg values in the data with this reference value, and comment on any differences you note. (Hint: you are (probably) not an expert in pig production, so phrase your comments in general terms about how these data can be compared.)

  5. In this question you will explore whether distributions exhibit the same characteristics when viewed for the entire sample of pigs and when viewed within each farm. Again select one quantitative and one categorical variable (possibly but not necessarily the same as above), and use descriptive tools, ideally both graphical and numerical, to describe the distributions within farms (as well as across farms, if you did not already do this in previous questions). For the description within farms, your focus should be on the similarities and differences in the distributions across the 15 farms. You may use statistical tools to assess the "significance" of differences across the 15 farms, but since we have not yet covered such methods in the course, this is not required and is not recommended; any methods that have not been covered in the course during Sessions 1-5 need to be properly explained. In summary, discuss for both variables whether the within-farm distributions are similar across farms and hence represented well by the overall distribution for all pigs sampled.
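
The calculations behind questions 3 and 5 can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a model answer: the records are simulated with arbitrary values and units, the names `farm` and `dtm` (days to market) are hypothetical, and only `adg` is a name used in this assignment. The sketch computes total weight gain as average daily gain times days to market, gives a five-number summary, correlates the sorted gains with standard normal quantiles (the relationship a normal quantile plot displays), and summarises adg within each farm.

```python
import random
import statistics
from collections import defaultdict

# Simulated records, NOT the course data. Values and units are arbitrary.
# `farm` and `dtm` (days to market) are assumed names; `adg` is the name
# used in question 4 of the assignment.
random.seed(801)
pigs = [
    {"farm": farm, "adg": random.gauss(500, 60), "dtm": random.gauss(190, 15)}
    for farm in range(1, 16)   # 15 farms, as in the assignment
    for _ in range(20)         # 20 simulated pigs per farm (an assumption)
]


def total_gain(pig):
    """Total weight gain = average daily gain x days to market."""
    return pig["adg"] * pig["dtm"]


def five_number_summary(values):
    """Return minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def normal_quantile_correlation(values):
    """Correlation between sorted values and standard normal quantiles.

    Values close to 1 mean the points of a normal quantile plot lie close
    to a straight line. This supplements the plot; it does not replace it.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = statistics.NormalDist()
    # Plotting positions (i - 0.5) / n: one common convention among several.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_x = statistics.fmean(theoretical)
    mean_y = statistics.fmean(ordered)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(theoretical, ordered))
    sxx = sum((x - mean_x) ** 2 for x in theoretical)
    syy = sum((y - mean_y) ** 2 for y in ordered)
    return sxy / (sxx * syy) ** 0.5


gains = [total_gain(p) for p in pigs]
print("five-number summary of total gain:", five_number_summary(gains))
print("normal quantile correlation:", round(normal_quantile_correlation(gains), 3))

# Within-farm mean and standard deviation of adg, to set beside the
# overall distribution across all pigs.
adg_by_farm = defaultdict(list)
for p in pigs:
    adg_by_farm[p["farm"]].append(p["adg"])
for farm, values in sorted(adg_by_farm.items()):
    print(farm, round(statistics.fmean(values), 1), round(statistics.stdev(values), 1))
```

A correlation near 1 is only a numerical companion to the graphical assessment; skewness or heavy tails are usually easier to recognise in the plot itself. The same steps are available through menus or commands in Minitab and Stata.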

Henrik Stryhn (hstryhn@upei.ca) 2018-01-30