The assignment is worth 10% of the final course mark.
The home assignment has six questions which should all be answered. Recall that home assignments must be answered by each student individually; collaboration is not allowed (limited discussion is probably inevitable, and therefore acceptable). For other practical details about the assignment, consult the guidelines for home assignments for VHM 801. Note also that your solution should include text identifying and explaining the procedures used even if the calculations were done using computer software. Generally, you should limit your analysis to the methods described in the chapters of the textbook covered by the course. Any analysis using additional (possibly more advanced) methods should be justified by explaining why the answer obtained from such an analysis could not be obtained by methods covered in the course.
The data for the assignment give the number of diseased and non-diseased persons in a population subdivided by three variables: gender, age (in five age groups) and work type (dichotomized). Work type is understood to indicate whether the person was engaged in a particular type of work considered to be a potential risk or protective factor. The dataset includes 30,000 persons classified according to these criteria and by whether they suffered from a particular disease. Background information about the data is sparse; although the type of disease is indicated, it is omitted here to avoid unnecessary speculation about biological relationships. It is not known what population the dataset represents or how the dataset was obtained; the data may be artificial.
Number of persons by work type, gender, age group and disease status (yes = diseased, no = not diseased):

| Work type | Gender | Age<40: yes | Age<40: no | Age 40-49: yes | Age 40-49: no | Age 50-59: yes | Age 50-59: no | Age 60-69: yes | Age 60-69: no | Age>=70: yes | Age>=70: no |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | male | 1 | 1589 | 12 | 2333 | 24 | 2763 | 53 | 2436 | 95 | 2286 |
| 0 | female | 1 | 1925 | 7 | 2670 | 15 | 2887 | 38 | 3107 | 63 | 2855 |
| 1 | male | 2 | 1525 | 3 | 851 | 5 | 670 | 3 | 181 | 2 | 73 |
| 1 | female | 0 | 712 | 0 | 401 | 4 | 308 | 1 | 79 | 0 | 20 |

A data file containing the data in the table is available in Stata format (version 9 and version 10). The variables are coded as follows: Gender (0=male; 1=female), Work type (0/1), Age (1=<40; 2=40-49; 3=50-59; 4=60-69; 5=70+), Disease (0=no; 1=yes).
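If you are working outside Stata, you can transcribe the table by hand. The Python sketch below shows one hypothetical way to do this, using the coding described above. The name `COUNTS` and the helper `collapse` are illustrative and are not part of the distributed data file. The sketch also checks that the cells add up to the 30,000 persons stated in the description, and it shows how to collapse over variables to obtain crude tables.

```python
# Hypothetical hand transcription of the table above (not the official data file).
# Keys: (work_type, gender, age_group); values: (diseased, not_diseased).
# Coding follows the data file: gender 0=male, 1=female; age 1=<40, ..., 5=70+.
COUNTS = {}

_ROWS = {
    (0, 0): [(1, 1589), (12, 2333), (24, 2763), (53, 2436), (95, 2286)],
    (0, 1): [(1, 1925), (7, 2670), (15, 2887), (38, 3107), (63, 2855)],
    (1, 0): [(2, 1525), (3, 851), (5, 670), (3, 181), (2, 73)],
    (1, 1): [(0, 712), (0, 401), (4, 308), (1, 79), (0, 20)],
}
for (work, gender), cells in _ROWS.items():
    for age, (diseased, healthy) in enumerate(cells, start=1):
        COUNTS[(work, gender, age)] = (diseased, healthy)


def collapse(keep):
    """Sum the counts over every variable not named in `keep`.

    `keep` is a tuple of names from ("work", "gender", "age"); the result maps
    each combination of the kept variables to (diseased, not_diseased).
    """
    names = ("work", "gender", "age")
    out = {}
    for key, (d, h) in COUNTS.items():
        sub = tuple(v for n, v in zip(names, key) if n in keep)
        prev = out.get(sub, (0, 0))
        out[sub] = (prev[0] + d, prev[1] + h)
    return out


total = sum(d + h for d, h in COUNTS.values())
print(total)  # 30000
print(collapse(("work",)))  # {(0,): (309, 24851), (1,): (20, 4820)}
```

For example, `collapse(("gender", "age"))` returns the disease counts in the ten gender-by-age cells with work type ignored.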
Identify as well as possible the study type from the description given above. If you think further clarification is needed to ensure that the study is actually of the type under consideration, explain why and make appropriate assumptions. For the rest of your work, assume that the study is of this type. Draw a diagram of the causal structure you would hypothesize for how the three factors may be related to disease. Note that the exposure of interest is work type.
Use the data to determine relevant measure(s) of association (risk difference, relative risk, or odds ratio) between each of the three factors and disease. If there is a choice between relevant measures of association, select one of them as the primary focus of your work and motivate your choice. The focus here is on "crude" measures of association, so the other factors should be ignored in each of these calculations. Supplement these with (crude) assessments of the statistical significance of the associations. (Hint: For this and some of the following questions, you will have to deal sensibly with the fact that age is categorized into five categories. In some cases it may be appropriate to reduce the number of categories; in other cases this may be unnecessary or may lead to a substantial loss of information. Whenever you decide to change the age categories, describe and motivate your decision.)
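The crude measures named above follow from the usual 2x2-table formulas. The Python function below is an illustrative sketch, not a prescribed solution method. For work type versus disease, with age and gender ignored, it computes:

- the risk difference, risk ratio and odds ratio, each with an approximate 95% Wald confidence interval;
- a Pearson chi-square test without continuity correction.

The function name `two_by_two` and the choice of Wald intervals and an uncorrected test are assumptions. Whether the risk difference and risk ratio can be interpreted depends on the study type you identify. The Wald formulas also fail when a cell is zero, which does not occur in this crude table.

```python
import math

# Crude 2x2 table for work type vs disease (age and gender ignored),
# obtained by summing the table above over gender and age:
#                 diseased  not diseased
# work type 1        a=20       b=4820
# work type 0       c=309      d=24851


def two_by_two(a, b, c, d, z=1.96):
    """Crude measures of association for a 2x2 table.

    Rows: exposed (a diseased, b not) and unexposed (c diseased, d not).
    Returns risk difference, risk ratio and odds ratio, each with an
    approximate 95% Wald confidence interval (ratios on the log scale),
    plus the Pearson chi-square statistic (1 df, no continuity correction)
    and its p-value.
    """
    n1, n0 = a + b, c + d            # exposed / unexposed totals
    m1, m0 = a + c, b + d            # diseased / non-diseased totals
    n = n1 + n0
    r1, r0 = a / n1, c / n0          # risk in each exposure group

    rd = r1 - r0
    se_rd = math.sqrt(r1 * (1 - r1) / n1 + r0 * (1 - r0) / n0)

    rr = r1 / r0
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)

    odds = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    chi2 = n * (a * d - b * c) ** 2 / (n1 * n0 * m1 * m0)
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df

    return {
        "RD": (rd, rd - z * se_rd, rd + z * se_rd),
        "RR": (rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)),
        "OR": (odds, odds * math.exp(-z * se_log_or), odds * math.exp(z * se_log_or)),
        "chi2": chi2,
        "p": p,
    }


result = two_by_two(20, 4820, 309, 24851)
for key, value in result.items():
    print(key, value)
```

The same function applies to gender, or to a dichotomized version of age, once the corresponding crude 2x2 table has been formed.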
Carry out an epidemiological analysis to determine whether gender acts as a confounding or effect-modifying factor for the relation between work type and disease. For this part, ignore any role of age in the relations of interest.
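One common way to make this comparison is to put three estimates side by side:

- the gender-specific odds ratios;
- the crude odds ratio;
- a Mantel-Haenszel summary odds ratio.

Clearly different stratum-specific estimates point towards effect modification. Similar stratum estimates, combined with a summary estimate that differs from the crude one, point towards confounding. The Python sketch below assumes the odds ratio is the chosen measure, so adapt it if you chose another. The function names are illustrative. How large a difference counts as "substantial" remains a judgement call that the formulas do not settle.

```python
# Work type vs disease stratified by gender (age ignored), from the table above.
# Each stratum: (a, b, c, d) = exposed diseased, exposed not diseased,
#                              unexposed diseased, unexposed not diseased.
STRATA = {
    "male": (15, 3300, 185, 11407),
    "female": (5, 1520, 124, 13444),
}


def stratum_or(a, b, c, d):
    """Odds ratio within a single stratum."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Crude table: add the strata cell by cell.
crude = tuple(sum(cells) for cells in zip(*STRATA.values()))

print("crude OR:", round(stratum_or(*crude), 3))
for name, cells in STRATA.items():
    print(name, "OR:", round(stratum_or(*cells), 3))
print("MH OR:", round(mantel_haenszel_or(STRATA.values()), 3))
```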
Carry out an epidemiological analysis to determine whether age acts as a confounding or effect-modifying factor for the relation between work type and disease. For this part, ignore any role of gender in the relations of interest.
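Besides an adjusted summary estimate, one common check for effect modification is a chi-square test for homogeneity of the stratum-specific odds ratios. The Python sketch below uses Woolf's test across the five age strata, which is one option among several (the Breslow-Day test is another). The function names are illustrative, and the chi-square tail probability uses a closed form that is valid only for even degrees of freedom. With cells as small as 2, and only five strata, the test is approximate and has limited power, so read it alongside the stratum-specific estimates.

```python
import math

# Work type vs disease stratified by age group (gender ignored), from the table above.
# Each stratum: (a, b, c, d) = work type 1 diseased, work type 1 not diseased,
#                              work type 0 diseased, work type 0 not diseased.
AGE_STRATA = {
    "<40": (2, 2237, 2, 3514),
    "40-49": (3, 1252, 19, 5003),
    "50-59": (9, 978, 39, 5650),
    "60-69": (4, 260, 91, 5543),
    ">=70": (2, 93, 158, 5141),
}


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form valid for even df only."""
    if df % 2 != 0 or df <= 0:
        raise ValueError("this shortcut only handles positive even degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def woolf_homogeneity(strata):
    """Woolf chi-square test for homogeneity of stratum odds ratios.

    Requires every cell to be non-zero. Returns (Q, df, p-value).
    """
    log_ors, weights = [], []
    for a, b, c, d in strata:
        log_ors.append(math.log((a * d) / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))
    pooled = sum(w * lo for w, lo in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (lo - pooled) ** 2 for w, lo in zip(weights, log_ors))
    df = len(log_ors) - 1
    return q, df, chi2_sf_even_df(q, df)


for name, (a, b, c, d) in AGE_STRATA.items():
    print(name, "OR:", round((a * d) / (b * c), 3))
print("Woolf Q, df, p:", woolf_homogeneity(AGE_STRATA.values()))
```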
Continue your epidemiological analysis from Questions 3 and 4 by investigating the combined effect of gender and age on the relation between work type and disease. In particular, assess informally (without carrying out any statistical tests) whether the combined effect of age and gender seems to involve an interaction between the two factors; that is, whether the combined effect seems to be substantially different from the "sum" of the two separate effects. Give a measure of association to describe the relation between work type and disease, and assess its statistical significance.
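When the analysis is stratified jointly by gender and age, a Mantel-Haenszel summary estimate is a common choice for the measure of association. The usual accompanying significance test is the Cochran-Mantel-Haenszel chi-square. The Python sketch below covers only that test, over the ten gender-by-age strata. It is an illustrative version without continuity correction, and the function name is hypothetical. Unlike stratum-specific log odds ratios, the test tolerates the zero cells found among the females with work type 1.

```python
import math

# Work type vs disease in the ten gender-by-age strata of the table above.
# Each stratum: (a, b, c, d) = work type 1 diseased, work type 1 not diseased,
#                              work type 0 diseased, work type 0 not diseased.
STRATA = [
    # males, age groups <40 ... >=70
    (2, 1525, 1, 1589), (3, 851, 12, 2333), (5, 670, 24, 2763),
    (3, 181, 53, 2436), (2, 73, 95, 2286),
    # females, age groups <40 ... >=70
    (0, 712, 1, 1925), (0, 401, 7, 2670), (4, 308, 15, 2887),
    (1, 79, 38, 3107), (0, 20, 63, 2855),
]


def cmh_test(strata):
    """Cochran-Mantel-Haenszel chi-square test (1 df, no continuity correction).

    Tests whether the common odds ratio across strata equals 1 by comparing
    the observed number of exposed cases with its expectation under H0.
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n1, n0 = a + b, c + d        # exposed / unexposed totals
        m1, m0 = a + c, b + d        # diseased / non-diseased totals
        n = n1 + n0
        observed += a
        expected += n1 * m1 / n
        variance += n1 * n0 * m1 * m0 / (n * n * (n - 1))
    chi2 = (observed - expected) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # p-value, chi-square with 1 df


print("CMH chi2, p:", cmh_test(STRATA))
```

With a single stratum, the statistic reduces to the Pearson chi-square multiplied by (n - 1)/n.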
As a summary of your analyses in Questions 3-5, try to describe the causal structure for the relation between work type and disease in terms of the classification system introduced in Section 13.11 of VER.
As the last step of the epidemiological analysis of the data, explore the relationships of age and gender with disease, and characterize the epidemiological roles of these two variables relative to each other. Give appropriate measures of association and assess their statistical significance.
(Note: Your choice of analysis should reflect your conclusions about the relation between work type and disease. Because of the lack of information about the study population and the selection of subjects, it may be difficult to determine the appropriate causal relation between age and gender. The relation may therefore either be left undetermined or be argued to be of a particular nature, together with the consequences this has for the analysis.)
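Age is an ordered variable. A chi-square test for linear trend in disease risk across the age groups (the Cochran-Armitage test) therefore assesses its crude association with disease without collapsing the five categories. The Python sketch below implements one common form of this test. Using the data file's codes 1-5 as equally spaced scores is an assumption, and the function name is illustrative. The age-group totals come from summing the table over work type and gender. Assessing age and gender while accounting for each other would instead require stratified methods.

```python
import math

# Disease by age group (work type and gender ignored), from the table above.
# Age is scored 1..5 following the data file coding (1=<40, ..., 5=70+).
AGE_SCORES = [1, 2, 3, 4, 5]
DISEASED = [4, 22, 48, 95, 160]
TOTALS = [5755, 6277, 6676, 5898, 5394]


def trend_test(scores, cases, totals):
    """Cochran-Armitage chi-square test for linear trend in proportions (1 df)."""
    n = sum(totals)
    p = sum(cases) / n                                   # overall risk
    mean_score = sum(t * x for t, x in zip(totals, scores)) / n
    t_stat = sum(r * (x - mean_score) for r, x in zip(cases, scores))
    var = p * (1 - p) * sum(t * (x - mean_score) ** 2 for t, x in zip(totals, scores))
    chi2 = t_stat ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))         # p-value, chi-square with 1 df


for x, r, t in zip(AGE_SCORES, DISEASED, TOTALS):
    print("age group", x, "risk:", round(r / t, 4))
print("trend chi2, p:", trend_test(AGE_SCORES, DISEASED, TOTALS))
```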