Assignment I for Biostats Course VHM 801 at AVC - Fall semester 2022
The assignment is worth 10% of the final course mark. Please be aware that by handing
in the home assignment you implicitly acknowledge that you have read and accepted
the instructions for home assignments as described on the VHM 801 homepage.
This assignment is based on data collected on 443 cattle at the time that they entered a feedlot for
'fattening' prior to slaughter. The data consist of demographic
information plus readings obtained from an ultrasonic evaluation of the
animal. Ultrasound measurements of backfat thickness, ribeye area and
the percentage of intramuscular fat were obtained. The
objective of the study was to determine whether ultrasound examination of the
animal at the time of entry into the feedlot could predict final
carcass grade (A, AA, or AAA, where AAA is the highest grade and A the lowest grade in terms of price).
Carcass grade depends primarily on the
amount of 'marbling' (intramuscular fat in the loin region) in the carcass at the time of slaughter. The data were
compiled from the 'beef_ultra' dataset of the textbook Dohoo
et al. (2009), Veterinary Epidemiologic Research, 2nd ed., by omitting some breed categories and farms that were only
sparsely represented in the original data, while retaining the same selection of variables:
- id: animal id (without any intrinsic meaning),
- farm: farm number (1-8),
- grade: carcass grade (A; AA; AAA),
- breed: breed (known or estimated; multiple values, whose meaning is unimportant here),
- sex: sex (0=female (heifer); 1=male (steer)),
- bckgrnd: animal backgrounded (0=no (weaned); 1=yes (backgrounded)),
- implant: hormone implant used (0=no; 1=yes),
- backfat: backfat thickness (in mm),
- ribeye: area of ribeye muscle (in square cm),
- imfat: intramuscular fat score (percent of area),
- days: period spent in feedlot before slaughter (days),
- carcwt: carcass weight (in kg).
Our use of these data in the home assignment is unrelated to their use both in the textbook
and in the paper: Keefe et al. (2004), Ultrasonic imaging of marbling at feedlot entry as a predictor of carcass quality grade,
Canadian Journal of Animal Science, 84, 165-170. However, the paper gives additional details
and background information about the data, although the present description should suffice. The dataset is available
in Minitab format and as a comma-separated file for
import into Stata and other statistical software.
The home assignment has three questions, all of which should be answered.
- First, briefly describe the type of each variable in the dataset,
e.g., using one or more of the descriptors nominal, ordinal,
discrete and continuous. Next, select four variables in the dataset: two quantitative variables, one categorical variable (with more than
two categories), and one dichotomous (or binary) variable. Apart from this restriction on variable types,
you are free to choose the variables. Carry out a descriptive analysis of
your four selected variables, including both a graphical representation and
descriptive statistics. Choose the graphical representation and the statistics you find most
useful for showing each distribution,
taking into account the variable's type and range of values. Where appropriate,
comment specifically on the distribution's center, spread and shape, as
well as on potential outliers. If you note potential outliers, also assess
whether they should be considered truly outlying observations, in the sense that they
do not really belong to the distribution, or whether the values should be considered part
of the distribution.
- For each of your two selected quantitative variables, examine further whether
it seems reasonable to assume that the data are normally distributed.
Describe carefully the tools you use for this, and how you arrive at
your conclusions.
If you conclude that a variable is not normally distributed, describe how its distribution seems to differ
from a normal distribution. Also explore (briefly) whether the square-root- or log-transformed (natural or base 10)
values are better approximated by a normal distribution.
- For this last question, we consider the variable implant and
its possible association with carcass grade and weight (some information about the
use of hormone growth implants can be found here).
Other studies have discussed how hormonal implants affect growth and carcass
grade. Suppose that we wanted to use the present data to discuss or evaluate
this question.
First, should the present study be considered an
observational study or an experiment? If you think the study is an experiment,
describe the experimental design and discuss how the use of hormonal implants
might have been (or should have been) randomized. Make sure to consider the role of
farms in the study design.
If, on the other hand, you think the study is
observational, use a diagram similar to those used in the course (Session 2) to
discuss whether any association between implant and carcass
characteristic(s) in these data is most likely a causal effect or
may have been caused or influenced by one or more lurking
variables. Make any suggestions for lurking variables as
specific as possible. As you do not have the information or tools to
draw definitive conclusions about causal effects, your answer
should focus on discussing and explaining plausible scenarios.
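For the descriptive analysis in the first question, the numerical part can be sketched in Python using only the standard library. This is a minimal illustration, not a required tool (the assignment assumes Minitab, Stata or similar software), and the helper names summarize_quantitative, tukey_outliers and frequency_table are hypothetical. The outlier screen uses the common 1.5 x IQR ("Tukey fence") rule familiar from boxplots; this is an assumed convention, and flagged values are only potential outliers whose membership in the distribution still has to be judged from context.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def summarize_quantitative(values):
    """Return common summary statistics for a list of numeric values.

    Assumes missing values have already been removed.
    """
    # quantiles(..., n=4) returns the quartile cut points Q1, Q2, Q3.
    # Its default ("exclusive") method is only one of several quartile
    # conventions, so results may differ slightly from Minitab or Stata.
    q1, _, q3 = quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (divisor n - 1)
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


def tukey_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR.

    These are potential outliers only; subject-matter judgement decides
    whether they truly do not belong to the distribution.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


def frequency_table(categories):
    """Counts and percentages for a categorical or binary variable."""
    counts = Counter(categories)
    total = sum(counts.values())
    # Sorted by category label, e.g. "A" < "AA" < "AAA" or 0 < 1.
    return {cat: (n, 100 * n / total) for cat, n in sorted(counts.items())}
```

A categorical variable such as grade or a binary one such as implant would go through frequency_table (typically shown as a bar chart), while the two quantitative variables would go through summarize_quantitative and tukey_outliers alongside a histogram or boxplot.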
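For the normality assessment in the second question, a normal Q-Q plot together with the sample skewness and excess kurtosis are common tools. The sketch below is one possible illustration in standard-library Python, not the course's prescribed method; the function names are hypothetical, the (i - 0.5)/n plotting positions are one of several conventions, and the logarithms assume strictly positive values (the log of zero is undefined).

```python
import math
from statistics import NormalDist, fmean


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    m, n = fmean(values), len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal)."""
    m, n = fmean(values), len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


def normal_qq_points(values):
    """Pairs (theoretical standard normal quantile, observed value).

    Points lying close to a straight line suggest approximate normality;
    curvature indicates skewness, S-shapes indicate heavy or light tails.
    """
    ordered = sorted(values)
    n = len(ordered)
    z = NormalDist()  # standard normal, mean 0 and sd 1
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


def compare_transformations(values):
    """Skewness and excess kurtosis on raw, square-root and log scales.

    Assumes all values are strictly positive.
    """
    scales = {
        "raw": values,
        "sqrt": [math.sqrt(v) for v in values],
        "ln": [math.log(v) for v in values],
        "log10": [math.log10(v) for v in values],
    }
    return {
        name: (sample_skewness(xs), excess_kurtosis(xs))
        for name, xs in scales.items()
    }
```

Because log10(x) = ln(x) / ln(10), the natural and base-10 logs differ only by a constant factor, so they give identical skewness, kurtosis and Q-Q plot shape; only the units on the axis change.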
Henrik Stryhn
(hstryhn@upei.ca) 2022-09-29