Follow this link to the datasets, available in Minitab (.mtw) and comma-separated (.csv) file formats.
Follow this link to extra exercises (labeled as x:number) and this link to
AI discussion exercises.
This page contains links to solutions for selected exercises of VHM 801. The solutions are provided either as text files, which can be opened directly in a web browser or in a text editor such as Notepad, or as .pdf files, which can be opened in a suitable reader such as Adobe Acrobat. The solutions have been compiled by the Biostats 801 course instructor, Henrik Stryhn.
For the CRP and retinol data, compute the Spearman rank correlation coefficient and its statistical significance, both directly in the software and indirectly via the ranks. Compare your findings with those for the Pearson correlation coefficient. Additionally, explore how strongly the Spearman rank correlation is affected by the outlier in the golf scores data (Supplementary Exercises 2.2 and 10.7), and again compare with your findings for the Pearson correlation.
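
The "indirect" route in this exercise can be sketched in a few lines of Python: rank each variable (tied values share the average of their ranks) and then compute the ordinary Pearson correlation of the ranks. This is a minimal sketch rather than the course's own solution. The function names are hypothetical, and the numbers are made up purely for illustration; they are not the CRP, retinol or golf scores data. The t statistic uses the common approximation with n - 2 degrees of freedom, and its p-value would still have to be looked up in software or a t table.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def t_statistic(r, n):
    """Approximate t statistic (n - 2 degrees of freedom) for testing r = 0."""
    return r * sqrt((n - 2) / (1 - r * r))


# Made-up illustrative data (NOT the course datasets).
x_clean = [70, 72, 74, 76, 78, 80, 82, 84]
y_clean = [71, 73, 72, 77, 79, 78, 83, 85]
# The same data with one hypothetical outlying pair appended.
x_out = x_clean + [102]
y_out = y_clean + [69]

for label, x, y in [("without outlier", x_clean, y_clean),
                    ("with outlier", x_out, y_out)]:
    r_p, r_s = pearson(x, y), spearman(x, y)
    print(f"{label}: Pearson = {r_p:.3f}, Spearman = {r_s:.3f}, "
          f"t for Spearman = {t_statistic(r_s, len(x)):.2f}")
```

Running the loop prints both coefficients with and without the added pair, which is the comparison the exercise asks for. With the real datasets, the value from `spearman` should agree with the coefficient the statistical software reports directly.
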
| # | Topic | Suggested/Template questions |
|---|---|---|
| 1 | Choice of descriptive statistic | when to use a mean and when to use a median? (follow-up: can computing the median for ordinal categorical data produce a nonsensical value?); what is a good descriptive statistic for the spread of a distribution?; is the standard deviation appropriate for skewed distributions? |
| 2 | Outliers | what is an outlier?; are observations indicated by asterisks in a boxplot outliers? (follow-up: what do you mean, are they potential or actual outliers?); should outliers be removed from the data? |
| 3 | Association vs causation | can I conclude a causal effect from an observed effect?; can an experiment prove causation?; does association in a randomized controlled trial imply causation? |
| 4 | Probability (conditional) | A father of two children tells you one of his children is a boy. What is the probability the other child is a girl?; Suppose he then tells you the oldest child is a boy. What is the probability the younger child is a girl? (should these probabilities not be the same?) |
| 5 | Assessing normality | how to decide whether my data are normally distributed?; why not just use a normality test and go by p<0.05?; how to determine that a normal plot is straight enough?; which is the best normality test? |
| 6 | Versions of two-sample t-tests | is it better to use the t-test for paired or independent samples?; is it better to use the two-sample t-test with pooled or separate variances?; does the t-test with separate variances assume the variances to be unequal? |
| 7 | CIs for a proportion | how to compute a confidence interval for a proportion?; does this method always work, or are there conditions for its use?; in which sense is an exact binomial confidence interval exact?; is it not true that an exact binomial confidence interval is conservative, i.e. has theoretically too large coverage? |
| 8 | Chi-square tests | in statistics, what is a chi-square test?; is there not also a chi-square test for homogeneity?; does the chi-square test have assumptions?; must all expected frequencies be larger than 5? |
| 9 | ANOVA (one-way) | in statistics, what is ANOVA?; what assumptions are needed for ANOVA?; what robustness to assumptions does ANOVA have?; is it necessary to test for equal variances? |
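
For topic 4, the two conditional probabilities can be checked by enumerating the sample space. The sketch below assumes the usual textbook reading of the puzzle: each child is independently a boy or a girl with equal probability, and the father's first statement is taken to mean only "at least one of the two children is a boy". Other readings of how the information was obtained can lead to different answers, which is part of what makes the question worth discussing. The helper name `conditional` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Each family is a pair (older child, younger child).
# Assumption: the four combinations BB, BG, GB, GG are equally likely.
families = list(product("BG", repeat=2))


def conditional(event, given):
    """P(event | given), computed by counting equally likely families."""
    kept = [f for f in families if given(f)]
    return Fraction(sum(1 for f in kept if event(f)), len(kept))


# "One of the children is a boy", read as "at least one is a boy":
# the other child is a girl exactly when the family also contains a girl.
p_other_girl = conditional(lambda f: "G" in f, lambda f: "B" in f)

# "The oldest child is a boy": condition on the first position only.
p_younger_girl = conditional(lambda f: f[1] == "G", lambda f: f[0] == "B")

print(p_other_girl)    # 2/3
print(p_younger_girl)  # 1/2
```

Under these assumptions the two probabilities differ because the first statement leaves three equally likely families (BB, BG, GB), while the second leaves only two (BB, BG).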