Assignment I for Biostats Course VHM 801 at AVC - Winter semester 2018
The assignment is worth 10% of the final course mark. Please be aware that by handing
in the home assignment you implicitly acknowledge having read and accepted
the instructions for home assignments as described
on the VHM 801 homepage.
This assignment uses data on growth performance and abattoir findings of pigs
from a selection of farms in Prince Edward Island, Canada. The data were
collected to study the inter-relationships among respiratory diseases
(atrophic rhinitis and enzootic pneumonia), ascarid (a parasitic nematode)
levels and daily
weight gain. Details about the study can be found in a set of papers from 1990
by Theresa Bernardo in the Canadian Journal of Veterinary Research. Here we provide
a brief overview of the variables.
The atrophic rhinitis score was determined by splitting the
snout and measuring the space ventral to the turbinates. An adjustment
to the score was made if the nasal septum was deviated. Lung scores were
recorded on a scale of 0 to 3 (negative to severe pneumonia). Parasite
burdens were evaluated using fecal egg counts, counts of adult worms in
the intestine and visual assessment of the liver for ascarid tracks.
Production data were recorded by monitoring the pigs on the farms of
origin from birth through to slaughter. For this assignment we consider the following variables:
- farm: farm identification number,
- pig: pig identification number,
- sex: sex of the pig (0=female; 1=castrate),
- dtm: days to market (i.e. from birth to slaughter),
- adg: average daily weight gain (g),
- mm: measurement of snout space (mm),
- ar: atrophic rhinitis score (0-5),
- lu: lung score for enzootic pneumonia (0=negative; 1=mild;
2=moderate; 3=severe),
- epg5: fecal gastrointestinal nematode egg count at time of
slaughter (eggs per 5 g),
- worms: count of nematodes in small intestine at time of slaughter,
- li: liver score (based on number of parasite-induced "white spots": 0=negative; 1=mild;
2=severe).
The dataset is available in Minitab format
and as a comma-separated file, for import into Stata and other statistical software.
Alternatively, the dataset is accessible in Stata format from the VER website
as the pig_adg data (which includes a few extra variables).
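For those working outside Minitab and Stata, the following is a minimal sketch of how the comma-separated file might be read using only the Python standard library. The three inline rows are invented placeholders (not records from the study), the helper name `read_pigs` is hypothetical, and the column names are assumed to match the variable list above; the actual file header may differ. In practice you would pass an opened copy of the downloaded file instead of the in-memory sample.

```python
import csv
import io

# Invented placeholder rows using the variable names listed above;
# these are NOT records from the actual study.
SAMPLE_CSV = """farm,pig,sex,dtm,adg,mm,ar,lu,epg5,worms,li
1,101,0,180,550.0,8.0,1.5,0,0,0,0
1,102,1,175,580.5,10.0,2.0,1,120,3,1
2,201,0,190,510.2,,1.0,2,0,0,2
"""

# Identification numbers are kept as text: they label farms and pigs
# and are not meant to be used in arithmetic.
ID_COLUMNS = {"farm", "pig"}


def read_pigs(handle):
    """Read pig records from an open CSV handle into a list of dicts.

    Non-identifier columns are parsed as numbers; an empty field is
    stored as None so that missing values remain visible.
    """
    records = []
    for row in csv.DictReader(handle):
        record = {}
        for name, value in row.items():
            if name in ID_COLUMNS:
                record[name] = value
            elif value == "":
                record[name] = None
            else:
                record[name] = float(value)
        records.append(record)
    return records


pigs = read_pigs(io.StringIO(SAMPLE_CSV))
print(len(pigs), "records; columns:", ", ".join(pigs[0]))
```

Parsing every non-identifier column as a plain number is only a convenience for reading the file; it says nothing about the variable type, which remains for you to judge.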
The home assignment has five questions, (a)-(e), all of which should be answered.
- Characterize the study type (e.g., experimental or another type),
and describe the variable type for all the variables.
(Hint: you may use some of the following descriptors for
variables: categorical, nominal, ordinal, quantitative, discrete, continuous.)
- Select two variables in the dataset: one quantitative variable and one categorical variable;
apart from this restriction
you are free to choose whichever variables you like. Carry out a
descriptive analysis for these variables, including both a graphical representation and
descriptive statistics. Choose the graphical representation and the statistics you find most
useful to show each distribution,
in consideration of the variable's type and range of values. Where appropriate,
comment specifically on the distribution's center, spread and shape.
If your descriptive analysis identifies any "suspected outliers", discuss
whether these should be considered as truly outlying observations, in the sense that they
do not really belong to the distribution, or whether they should be considered as part
of the distribution. (Hint: for this question and the next, you should explore
the distribution of values across all the sampled pigs, without taking into account
the farms on which they were raised.)
- The average daily gain was computed by dividing the total weight gain from birth to slaughter
by the lifetime of the pig (days to market). Compute the total weight gain, describe its distribution
(as outlined above) and further examine whether
it would seem reasonable to assume its values to be (approximately) normally distributed.
Describe carefully how you assess
the agreement of the variable with a normal distribution. Additionally, if you conclude that
the variable is not approximated well by a normal distribution, describe how its distribution
seems to differ from a normal distribution.
- Internet searches suggest that current standards for growth rates of pigs correspond to average daily
gains of at least 1.5 lbs per day. Compare the distribution of adg values in the data with this
reference value, and comment on any differences you note. (Hint: you are (probably) not an expert
in pig production, so phrase your comments in general terms about how
these data can be compared.)
- In this question you will explore whether distributions exhibit the same characteristics
when viewed for the entire sample of pigs or viewed within each farm. Select again one
quantitative and one categorical variable (possibly but not necessarily the same as above),
and use descriptive tools, ideally both graphical
and numerical, to describe the distributions within farms (as well as across farms, if you
did not do this already in previous questions).
For the description within farms, your focus should be on the
similarities and differences in the distributions across the 15 farms.
You may use statistical tools to assess the "significance" of
differences across the 15 farms, but as we have not yet covered such methods
in the course, this is not required and generally not recommended; any methods that have not
been covered in the course during Sessions 1-5 must be properly explained.
In summary, discuss for both variables whether the within-farm distributions
are similar across farms and hence represented well by the overall
distribution for all pigs sampled.
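The arithmetic behind three of the questions above can be sketched in Python as follows: recovering the total weight gain from adg and dtm, expressing the 1.5 lbs per day reference on the scale of adg (grams per day), and summarizing a quantitative variable within each farm. This is only an illustrative sketch under stated assumptions: the four records are invented placeholders rather than study data, and the function names are hypothetical. The same steps can be carried out in Minitab or Stata.

```python
from collections import defaultdict
from statistics import mean, stdev

GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound

# Invented placeholder records, NOT values from the study:
# farm identification number, dtm in days, adg in g/day.
pigs = [
    {"farm": 1, "dtm": 180, "adg": 550.0},
    {"farm": 1, "dtm": 170, "adg": 600.0},
    {"farm": 2, "dtm": 200, "adg": 500.0},
    {"farm": 2, "dtm": 190, "adg": 520.0},
]


def total_gain_kg(adg_g_per_day, dtm_days):
    """Invert adg = total gain / dtm to recover the total gain, in kg."""
    return adg_g_per_day * dtm_days / 1000.0


def lb_per_day_to_g_per_day(rate_lb_per_day):
    """Convert a growth rate from pounds per day to grams per day."""
    return rate_lb_per_day * GRAMS_PER_POUND


def summarize_by_farm(records, variable):
    """Return {farm: (n, mean, sd)} for one quantitative variable."""
    groups = defaultdict(list)
    for record in records:
        groups[record["farm"]].append(record[variable])
    summary = {}
    for farm in sorted(groups):
        values = groups[farm]
        # The sample standard deviation needs at least two observations.
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[farm] = (len(values), mean(values), sd)
    return summary


gains = [total_gain_kg(p["adg"], p["dtm"]) for p in pigs]
reference = lb_per_day_to_g_per_day(1.5)
share_below = sum(p["adg"] < reference for p in pigs) / len(pigs)
farm_summary = summarize_by_farm(pigs, "adg")

print("total gains (kg):", gains)
print(f"reference: {reference:.1f} g/day; share of pigs below: {share_below:.0%}")
for farm, (n, avg, sd) in farm_summary.items():
    print(f"farm {farm}: n={n}, mean adg={avg:.1f}, sd={sd:.1f}")
```

The sketch reports only a sample size, mean and standard deviation per farm. Which statistics and graphs best describe each distribution is for you to decide and justify.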
Henrik Stryhn
(hstryhn@upei.ca) 2018-01-30