. * do-file for lecture 13 of VHM 802, Winter 2021 . version 16 /* works also with versions 14-15 */ . set more off . cd "r:\" r:\ . . * classification by logistic regression . import delimited "r:\sparrow", clear varnames(1) (6 vars, 49 obs) . encode survivorship, gen(surv) . replace surv=surv-1 (49 real changes made) . label values surv /* drop value label */ . logit surv total_length-l_keel_sternum Iteration 0: log likelihood = -33.462497 Iteration 1: log likelihood = -32.041086 Iteration 2: log likelihood = -32.035688 Iteration 3: log likelihood = -32.035688 Logistic regression Number of obs = 49 LR chi2(5) = 2.85 Prob > chi2 = 0.7225 Log likelihood = -32.035688 Pseudo R2 = 0.0426 -------------------------------------------------------------------------------- surv | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------------+---------------------------------------------------------------- total_length | -.1625675 .1396369 -1.16 0.244 -.4362508 .1111159 alar_extent | -.0276413 .1060235 -0.26 0.794 -.2354436 .1801609 l_beak_head | -.0837496 .628623 -0.13 0.894 -1.315828 1.148329 l_humerous | 1.061744 1.023129 1.04 0.299 -.9435529 3.067041 l_keel_sternum | .0715755 .4166297 0.17 0.864 -.7450037 .8881547 _cons | 13.58231 15.86496 0.86 0.392 -17.51244 44.67706 -------------------------------------------------------------------------------- . estat classification Logistic model for surv -------- True -------- Classified | D ~D | Total -----------+--------------------------+----------- + | 7 4 | 11 - | 14 24 | 38 -----------+--------------------------+----------- Total | 21 28 | 49 Classified + if predicted Pr(D) >= .5 True D defined as surv != 0 -------------------------------------------------- Sensitivity Pr( +| D) 33.33% Specificity Pr( -|~D) 85.71% Positive predictive value Pr( D| +) 63.64% Negative predictive value Pr(~D| -) 63.16% -------------------------------------------------- False + rate for true ~D Pr( +|~D) 14.29% False - rate for true D Pr( -| D) 66.67% False + rate for classified + Pr(~D| +) 36.36% False - rate for classified - Pr( D| -) 36.84% -------------------------------------------------- Correctly classified 63.27% -------------------------------------------------- . lsens . lroc Logistic model for surv number of observations = 49 area under ROC curve = 0.6633 . predict prob1, p . generate highp01=prob1>21/49 . tab surv highp01 /* classification at the data proportion cut-off */ Survivorsh | highp01 ip | 0 1 | Total -----------+----------------------+---------- 0 | 18 10 | 28 1 | 8 13 | 21 -----------+----------------------+---------- Total | 26 23 | 49 . * now using discrim command: same results . discrim logistic total_length-l_keel_sternum, group(surv) prior(proportional) Iteration 0: log likelihood = -33.462497 Iteration 1: log likelihood = -32.041086 Iteration 2: log likelihood = -32.035688 Iteration 3: log likelihood = -32.035688 Logistic discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | 0 1 | Total -------------+----------------+------- 0 | 24 4 | 28 | 85.71 14.29 | 100.00 | | 1 | 14 7 | 21 | 66.67 33.33 | 100.00 -------------+----------------+------- Total | 38 11 | 49 | 77.55 22.45 | 100.00 | | Priors | 0.5714 0.4286 | . predict prob0, pr . predict gr01, classification . * new: classification from equal priors instead of data priors . estat classtable, priors(equal) Resubstitution classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | 0 1 | Total -------------+----------------+------- 0 | 18 10 | 28 | 64.29 35.71 | 100.00 | | 1 | 8 13 | 21 | 38.10 61.90 | 100.00 -------------+----------------+------- Total | 26 23 | 49 | 53.06 46.94 | 100.00 | | Priors | 0.5000 0.5000 | . predict prob0eq, pr priors(equal) . * leave-one-out cross-validation . * this does not work, although it should??: estat classtable, loo . . * manual coding of leave-one-out cross-validation . capture drop id . gen id=_n . sort id . capture drop rowno . capture drop cvp1 . generate rowno=_n . generate cvp1=. (49 missing values generated) . quietly { . generate highcvp1=cvp1>0.5 . tab surv highcvp1 /* crossvalidation classification table */ Survivorsh | highcvp1 ip | 0 1 | Total -----------+----------------------+---------- 0 | 21 7 | 28 1 | 16 5 | 21 -----------+----------------------+---------- Total | 37 12 | 49 . . * linear discriminant analysis (LDA) . * for illustration with two categories and predictors: total_length l_humerous . logit surv total_length l_humerous Iteration 0: log likelihood = -33.462497 Iteration 1: log likelihood = -32.10122 Iteration 2: log likelihood = -32.095449 Iteration 3: log likelihood = -32.095448 Logistic regression Number of obs = 49 LR chi2(2) = 2.73 Prob > chi2 = 0.2549 Log likelihood = -32.095448 Pseudo R2 = 0.0409 ------------------------------------------------------------------------------ surv | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- total_length | -.1770048 .1137912 -1.56 0.120 -.4000315 .0460219 l_humerous | .922332 .7219004 1.28 0.201 -.4925668 2.337231 _cons | 10.62317 13.16035 0.81 0.420 -15.17064 36.41698 ------------------------------------------------------------------------------ . discrim lda total_length l_humerous, group(surv) Linear discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | 0 1 | Total -------------+----------------+------- 0 | 19 9 | 28 | 67.86 32.14 | 100.00 | | 1 | 7 14 | 21 | 33.33 66.67 | 100.00 -------------+----------------+------- Total | 26 23 | 49 | 53.06 46.94 | 100.00 | | Priors | 0.5000 0.5000 | . estat list +-----------------------------------------+ | | Classification | Probabilities | | | | | | Obs.| True Class. | 0 1 | |-----+------------------+----------------| | 1 | 0 1 * | 0.2580 0.7420 | | 2 | 0 1 * | 0.2809 0.7191 | | 3 | 1 1 | 0.3178 0.6822 | | 4 | 1 1 | 0.3742 0.6258 | | 5 | 1 1 | 0.3571 0.6429 | |-----+------------------+----------------| | 6 | 1 1 | 0.3309 0.6691 | | 7 | 0 1 * | 0.3742 0.6258 | | 8 | 1 1 | 0.3539 0.6461 | | 9 | 1 1 | 0.4325 0.5675 | | 10 | 0 1 * | 0.3742 0.6258 | |-----+------------------+----------------| | 11 | 1 1 | 0.3539 0.6461 | | 12 | 1 1 | 0.4761 0.5239 | | 13 | 1 1 | 0.4490 0.5510 | | 14 | 1 1 | 0.4145 0.5855 | | 15 | 1 1 | 0.4779 0.5221 | |-----+------------------+----------------| | 16 | 0 0 | 0.5077 0.4923 | | 17 | 1 1 | 0.4631 0.5369 | | 18 | 0 1 * | 0.4797 0.5203 | | 19 | 1 1 | 0.4613 0.5387 | | 20 | 0 1 * | 0.4797 0.5203 | |-----+------------------+----------------| | 21 | 0 1 * | 0.4893 0.5107 | | 22 | 0 1 * | 0.4928 0.5072 | | 23 | 0 1 * | 0.4815 0.5185 | | 24 | 1 1 | 0.4981 0.5019 | | 25 | 0 0 | 0.5088 0.4912 | |-----+------------------+----------------| | 26 | 0 0 | 0.5166 0.4834 | | 27 | 1 0 * | 0.5237 0.4763 | | 28 | 0 0 | 0.5307 0.4693 | | 29 | 0 0 | 0.5508 0.4492 | | 30 | 0 0 | 0.5034 0.4966 | |-----+------------------+----------------| | 31 | 0 0 | 0.5166 0.4834 | | 32 | 1 0 * | 0.5438 0.4562 | | 33 | 0 0 | 0.5473 0.4527 | | 34 | 0 0 | 0.5620 0.4380 | | 35 | 0 0 | 0.5403 0.4597 | |-----+------------------+----------------| | 36 | 0 0 | 0.5350 0.4650 | | 37 | 0 0 | 0.5350 0.4650 | | 38 | 1 0 * | 0.5603 0.4397 | | 39 | 0 0 | 0.5997 0.4003 | | 40 | 0 0 | 0.6048 0.3952 | |-----+------------------+----------------| | 41 | 1 0 * | 0.5980 0.4020 | | 42 | 1 0 * | 0.6460 0.3540 | | 43 | 1 0 * | 0.6173 0.3827 | | 44 | 0 0 | 0.6611 0.3389 | | 45 | 0 0 | 0.6256 0.3744 | |-----+------------------+----------------| | 46 | 1 0 * | 0.6790 0.3210 | | 47 | 0 0 | 0.7518 0.2482 | | 48 | 0 0 | 0.6774 0.3226 | | 49 | 0 0 | 0.7666 0.2334 | +-----------------------------------------+ * indicates misclassified observations . estat loadings, unstandardized Canonical discriminant function coefficients | function1 -------------+----------- total_length | -.3567887 l_humerous | 1.858881 _cons | 22.03294 . generate fitlda=(22.03294+1.858881*l_humerous)/0.3567887 . twoway (scatter total_length l_humerous if surv==0, msymbol(smx) jitter(2)) (scatter total_length l_humerous if surv==1, msymbol(smci > rcle_hollow) jitter(2)) (line fitlda l_humerous, sort), xtitle (l_humerous) ytitle(total_length) title(LDA separation for uniform pri > or (x=NS, o=S (surv=1)), size(medium)) legend(off) . estat classtable, prior(proportion) Resubstitution classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | 0 1 | Total -------------+----------------+------- 0 | 24 4 | 28 | 85.71 14.29 | 100.00 | | 1 | 14 7 | 21 | 66.67 33.33 | 100.00 -------------+----------------+------- Total | 38 11 | 49 | 77.55 22.45 | 100.00 | | Priors | 0.5714 0.4286 | . estat classtable, prior(proportion) loo Leave-one-out classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | LOO Classified True surv | 0 1 | Total -------------+----------------+------- 0 | 24 4 | 28 | 85.71 14.29 | 100.00 | | 1 | 14 7 | 21 | 66.67 33.33 | 100.00 -------------+----------------+------- Total | 38 11 | 49 | 77.55 22.45 | 100.00 | | Priors | 0.5714 0.4286 | . * code for logistic cross-validation not shown (again) . . * multinomial logistic regression for 3 categories . use "r:\beef_ultra", clear . mlogit grade sex-carc_wt, base(1) Iteration 0: log likelihood = -443.33618 Iteration 1: log likelihood = -374.26182 Iteration 2: log likelihood = -364.20181 Iteration 3: log likelihood = -363.9063 Iteration 4: log likelihood = -363.90546 Iteration 5: log likelihood = -363.90546 Multinomial logistic regression Number of obs = 487 LR chi2(16) = 158.86 Prob > chi2 = 0.0000 Log likelihood = -363.90546 Pseudo R2 = 0.1792 ------------------------------------------------------------------------------ grade | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- AAA | (base outcome) -------------+---------------------------------------------------------------- AA | sex | .9521876 .2845745 3.35 0.001 .3944318 1.509943 bckgrnd | -1.379689 .3476694 -3.97 0.000 -2.061109 -.69827 implant | -.4543614 .2854138 -1.59 0.111 -1.013762 .1050395 backfat | -.2656307 .1180485 -2.25 0.024 -.4970015 -.0342599 ribeye | .3311029 .0861079 3.85 0.000 .1623346 .4998712 imfat | -.4826611 .1261669 -3.83 0.000 -.7299437 -.2353786 days | -.0015295 .0015713 -0.97 0.330 -.0046093 .0015502 carc_wt | -.017968 .0036668 -4.90 0.000 -.0251548 -.0107812 _cons | 7.464408 1.37777 5.42 0.000 4.764029 10.16479 -------------+---------------------------------------------------------------- A | sex | 1.929613 .5309941 3.63 0.000 .8888833 2.970342 bckgrnd | -2.42452 .5154775 -4.70 0.000 -3.434837 -1.414202 implant | -1.379545 .5415622 -2.55 0.011 -2.440987 -.3181021 backfat | -.7707347 .2664633 -2.89 0.004 -1.292993 -.2484762 ribeye | .3248934 .1642367 1.98 0.048 .0029953 .6467915 imfat | -1.09812 .2387782 -4.60 0.000 -1.566117 -.6301236 days | -.0069193 .0032564 -2.12 0.034 -.0133018 -.0005369 carc_wt | -.039161 .006837 -5.73 0.000 -.0525612 -.0257608 _cons | 17.49636 2.525537 6.93 0.000 12.5464 22.44632 ------------------------------------------------------------------------------ . predict pm*, p . egen pmax=rowmax(pm1-pm3) . generate highp123=(pm1==pmax)+2*(pm2==pmax)+3*(pm3==pmax) . tab grade highp123 /* confusion matrix */ Carcass | grade | 1=AAA 2=AA | highp123 3=A | 1 2 3 | Total -----------+---------------------------------+---------- AAA | 78 86 0 | 164 AA | 41 231 5 | 277 A | 3 37 6 | 46 -----------+---------------------------------+---------- Total | 122 354 11 | 487 . discrim logistic sex-carc_wt, group(grade) prior(prop) Iteration 0: log likelihood = -443.33618 Iteration 1: log likelihood = -374.26182 Iteration 2: log likelihood = -364.41906 Iteration 3: log likelihood = -363.91187 Iteration 4: log likelihood = -363.90546 Iteration 5: log likelihood = -363.90546 Logistic discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 78 86 0 | 164 | 47.56 52.44 0.00 | 100.00 | | AA | 41 231 5 | 277 | 14.80 83.39 1.81 | 100.00 | | A | 3 37 6 | 46 | 6.52 80.43 13.04 | 100.00 -------------+------------------------+------- Total | 122 354 11 | 487 | 25.05 72.69 2.26 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . * leave-one-out crossclassification (for full multinomial model) . sort id . capture drop rowno . capture drop cvpm* . generate rowno=_n . forvalues k=1/3 { 2. generate cvpm`k'=. 3. } (487 missing values generated) (487 missing values generated) (487 missing values generated) . quietly { . egen cvpmax=rowmax(cvpm1-cvpm3) . generate highcvp123=(cvpm1==cvpmax)+2*(cvpm2==cvpmax)+3*(cvpm3==cvpmax) . tab grade highcvp123 /* crossvalidation classification table */ Carcass | grade | 1=AAA 2=AA | highcvp123 3=A | 1 2 3 | Total -----------+---------------------------------+---------- AAA | 74 90 0 | 164 AA | 43 229 5 | 277 A | 3 40 3 | 46 -----------+---------------------------------+---------- Total | 120 359 8 | 487 . drop pm1-highcvp123 /* clean-up */ . . * LDA: three categories and full model . candisc sex-carc_wt, group(grade) prior(prop) lootable Canonical linear discriminant analysis | | Like- | Canon. Eigen- Variance | lihood Fcn | Corr. value Prop. Cumul. | Ratio F df1 df2 Prob>F ----+---------------------------------+------------------------------------ 1 | 0.5148 .360563 0.9517 0.9517 | 0.7218 10.557 16 954 0.0000 e 2 | 0.1341 .018314 0.0483 1.0000 | 0.9820 1.2506 7 478 0.2734 e --------------------------------------------------------------------------- Ho: this and smaller canon. corr. are zero; e = exact F Standardized canonical discriminant function coefficients | function1 function2 -------------+---------------------- sex | .4370393 .4008008 bckgrnd | -.5406284 .0744169 implant | -.2749724 .1062096 backfat | -.3677969 -.0014345 ribeye | .4268066 1.089007 imfat | -.4979424 .0526061 days | -.2114979 .352345 carc_wt | -.7996381 -.1815683 Canonical structure | function1 function2 -------------+---------------------- sex | .0591467 .327256 bckgrnd | -.5496828 .3288711 implant | .016047 -.0649989 backfat | -.1348509 .3978866 ribeye | .002019 .8897859 imfat | -.423941 .0144226 days | -.1496699 -.175766 carc_wt | -.48544 .3082858 Group means on canonical variables grade | function1 function2 -------------+---------------------- AAA | -.7407635 -.0893078 AA | .2348255 .1048708 A | 1.226925 -.313103 Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 80 84 0 | 164 | 48.78 51.22 0.00 | 100.00 | | AA | 42 232 3 | 277 | 15.16 83.75 1.08 | 100.00 | | A | 3 37 6 | 46 | 6.52 80.43 13.04 | 100.00 -------------+------------------------+------- Total | 125 353 9 | 487 | 25.67 72.48 1.85 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 76 88 0 | 164 | 46.34 53.66 0.00 | 100.00 | | AA | 43 230 4 | 277 | 15.52 83.03 1.44 | 100.00 | | A | 3 39 4 | 46 | 6.52 84.78 8.70 | 100.00 -------------+------------------------+------- Total | 122 357 8 | 487 | 25.05 73.31 1.64 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . discrim lda sex-carc_wt, group(grade) prior(prop) lootable Linear discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 80 84 0 | 164 | 48.78 51.22 0.00 | 100.00 | | AA | 42 232 3 | 277 | 15.16 83.75 1.08 | 100.00 | | A | 3 37 6 | 46 | 6.52 80.43 13.04 | 100.00 -------------+------------------------+------- Total | 125 353 9 | 487 | 25.67 72.48 1.85 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 76 88 0 | 164 | 46.34 53.66 0.00 | 100.00 | | AA | 43 230 4 | 277 | 15.52 83.03 1.44 | 100.00 | | A | 3 39 4 | 46 | 6.52 84.78 8.70 | 100.00 -------------+------------------------+------- Total | 122 357 8 | 487 | 25.05 73.31 1.64 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . loadingplot . scoreplot, msymbol(i) mlabsize(vsmall) . * quadratic distriminant analysis (QDA) . discrim qda sex-carc_wt, group(grade) prior(prop) lootable Quadratic discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 106 57 1 | 164 | 64.63 34.76 0.61 | 100.00 | | AA | 70 189 18 | 277 | 25.27 68.23 6.50 | 100.00 | | A | 4 25 17 | 46 | 8.70 54.35 36.96 | 100.00 -------------+------------------------+------- Total | 180 271 36 | 487 | 36.96 55.65 7.39 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 93 70 1 | 164 | 56.71 42.68 0.61 | 100.00 | | AA | 77 179 21 | 277 | 27.80 64.62 7.58 | 100.00 | | A | 4 30 12 | 46 | 8.70 65.22 26.09 | 100.00 -------------+------------------------+------- Total | 174 279 34 | 487 | 35.73 57.29 6.98 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . . * kth nearest neighbour (KNN) . * three categories and full model for beef_ultra data (standardized variables) . foreach var of varlist sex-carc_wt { 2. egen s`var'=std(`var') 3. } . discrim knn ssex-scarc_wt, group(grade) prior(prop) lootable measure (L2) k(7) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A Unclassified | Total -------------+--------------------------------------+------- AAA | 87 67 1 9 | 164 | 53.05 40.85 0.61 5.49 | 100.00 | | AA | 37 234 1 5 | 277 | 13.36 84.48 0.36 1.81 | 100.00 | | A | 2 32 9 3 | 46 | 4.35 69.57 19.57 6.52 | 100.00 -------------+--------------------------------------+------- Total | 126 333 11 17 | 487 | 25.87 68.38 2.26 3.49 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A Unclassified | Total -------------+--------------------------------------+------- AAA | 67 95 1 1 | 164 | 40.85 57.93 0.61 0.61 | 100.00 | | AA | 61 215 1 0 | 277 | 22.02 77.62 0.36 0.00 | 100.00 | | A | 3 35 8 0 | 46 | 6.52 76.09 17.39 0.00 | 100.00 -------------+--------------------------------------+------- Total | 131 345 10 1 | 487 | 26.90 70.84 2.05 0.21 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . discrim knn ssex-scarc_wt, group(grade) prior(prop) lootable measure (L1) k(7) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A Unclassified | Total -------------+--------------------------------------+------- AAA | 88 72 1 3 | 164 | 53.66 43.90 0.61 1.83 | 100.00 | | AA | 34 228 2 13 | 277 | 12.27 82.31 0.72 4.69 | 100.00 | | A | 0 31 9 6 | 46 | 0.00 67.39 19.57 13.04 | 100.00 -------------+--------------------------------------+------- Total | 122 331 12 22 | 487 | 25.05 67.97 2.46 4.52 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 69 94 1 | 164 | 42.07 57.32 0.61 | 100.00 | | AA | 56 219 2 | 277 | 20.22 79.06 0.72 | 100.00 | | A | 1 37 8 | 46 | 2.17 80.43 17.39 | 100.00 -------------+------------------------+------- Total | 126 350 11 | 487 | 25.87 71.87 2.26 | 100.00 | | Priors | 0.3368 0.5688 0.0945 | . * KNN with equal prior probabilities . discrim knn ssex-scarc_wt, group(grade) prior(equal) lootable measure (L1) k(11) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 111 35 18 | 164 | 67.68 21.34 10.98 | 100.00 | | AA | 84 119 74 | 277 | 30.32 42.96 26.71 | 100.00 | | A | 3 5 38 | 46 | 6.52 10.87 82.61 | 100.00 -------------+------------------------+------- Total | 198 159 130 | 487 | 40.66 32.65 26.69 | 100.00 | | Priors | 0.3333 0.3333 0.3333 | Leave-one-out classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True grade | AAA AA A | Total -------------+------------------------+------- AAA | 93 50 21 | 164 | 56.71 30.49 12.80 | 100.00 | | AA | 98 94 85 | 277 | 35.38 33.94 30.69 | 100.00 | | A | 6 16 24 | 46 | 13.04 34.78 52.17 | 100.00 -------------+------------------------+------- Total | 197 160 130 | 487 | 40.45 32.85 26.69 | 100.00 | | Priors | 0.3333 0.3333 0.3333 | . . * kth nearest neighbour for sparrow data . di "suggested range for k (binary): " 49^(2/8) " - " 49^(3/8) suggested range for k (binary): 2.6457513 - 4.3035171 . import delimited "r:\sparrow", clear varnames(1) (6 vars, 49 obs) . encode survivorship, gen(surv) . discrim knn total_length l_humerous, group(surv) prior(prop) measure(L2) k(3) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | NS S | Total -------------+----------------+------- NS | 22 6 | 28 | 78.57 21.43 | 100.00 | | S | 7 14 | 21 | 33.33 66.67 | 100.00 -------------+----------------+------- Total | 29 20 | 49 | 59.18 40.82 | 100.00 | | Priors | 0.5714 0.4286 | . estat classtable, loo Leave-one-out classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | LOO Classified True surv | NS S | Total -------------+----------------+------- NS | 18 10 | 28 | 64.29 35.71 | 100.00 | | S | 9 12 | 21 | 42.86 57.14 | 100.00 -------------+----------------+------- Total | 27 22 | 49 | 55.10 44.90 | 100.00 | | Priors | 0.5714 0.4286 | . foreach var of varlist total_length-l_humerous { 2. egen s`var'=std(`var') 3. } . discrim knn stotal_length sl_humerous, group(surv) prior(prop) measure(L2) k(3) Kth-nearest-neighbor discriminant analysis Resubstitution classification summary +---------+ | Key | |---------| | Number | | Percent | +---------+ | Classified True surv | NS S Unclassified | Total -------------+------------------------------+------- NS | 24 4 0 | 28 | 85.71 14.29 0.00 | 100.00 | | S | 8 11 2 | 21 | 38.10 52.38 9.52 | 100.00 -------------+------------------------------+------- Total | 32 15 2 | 49 | 65.31 30.61 4.08 | 100.00 | | Priors | 0.5714 0.4286 | . estat classtable, loo Leave-one-out classification table +---------+ | Key | |---------| | Number | | Percent | +---------+ | LOO Classified True surv | NS S | Total -------------+----------------+------- NS | 13 15 | 28 | 46.43 53.57 | 100.00 | | S | 13 8 | 21 | 61.90 38.10 | 100.00 -------------+----------------+------- Total | 26 23 | 49 | 53.06 46.94 | 100.00 | | Priors | 0.5714 0.4286 | . . * canonical correlations analysis . import delimited r:\butterfly.csv, clear (11 vars, 16 obs) . gen pfpgi046=pfpgi040+pfpgi060 . canon (altitude annprec maxtemp mintemp) (pfpgi046 pfpgi080 pfpgi100 pfpgi116) Canonical correlation analysis Number of obs = 16 Raw coefficients for the first variable set | 1 2 3 4 -------------+---------------------------------------- altitude | 0.0000 -0.0008 -0.0010 0.0005 annprec | 0.0210 0.0483 -0.0971 0.0173 maxtemp | -0.0734 -0.0750 -0.0905 0.2666 mintemp | -0.0237 -0.1280 -0.3222 -0.0080 ------------------------------------------------------ Raw coefficients for the second variable set | 1 2 3 4 -------------+---------------------------------------- pfpgi046 | -0.0488 0.1571 0.3099 0.0590 pfpgi080 | -0.0391 0.2094 0.1202 -0.1307 pfpgi100 | 0.0044 0.1919 0.1868 -0.0251 pfpgi116 | -0.0766 0.2643 0.2547 0.0590 ------------------------------------------------------ ---------------------------------------------------------------------------- Canonical correlations: 0.8619 0.4503 0.3859 0.0885 ---------------------------------------------------------------------------- Tests of significance of all canonical correlations Statistic df1 df2 F Prob>F Wilks' lambda .173128 16 25.078 1.2154 0.3220 a Pillai's trace 1.10235 16 44 1.0462 0.4312 a Lawley-Hotelling trace 3.32563 16 26 1.3510 0.2407 a Roy's largest root 2.88842 4 11 7.9431 0.0029 u ---------------------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F . canon (altitude annprec maxtemp mintemp) (pfpgi046 pfpgi080 pfpgi100 pfpgi116), stdcoef Canonical correlation analysis Number of obs = 16 Standardized coefficients for the first variable set | 1 2 3 4 -------------+---------------------------------------- altitude | 0.1243 -2.4316 2.9501 1.3685 annprec | 0.2931 0.6752 1.3581 0.2416 maxtemp | -0.4683 -0.4785 0.5773 1.7014 mintemp | -0.2597 -1.4034 3.5314 -0.0882 ------------------------------------------------------ Standardized coefficients for the second variable set | 1 2 3 4 -------------+---------------------------------------- pfpgi046 | -0.5480 1.7660 3.4831 0.6632 pfpgi080 | -0.4218 2.2563 1.2959 -1.4088 pfpgi100 | 0.0885 3.8509 3.7474 -0.5045 pfpgi116 | -0.8256 2.8482 2.7455 0.6359 ------------------------------------------------------ ---------------------------------------------------------------------------- Canonical correlations: 0.8619 0.4503 0.3859 0.0885 ---------------------------------------------------------------------------- Tests of significance of all canonical correlations Statistic df1 df2 F Prob>F Wilks' lambda .173128 16 25.078 1.2154 0.3220 a Pillai's trace 1.10235 16 44 1.0462 0.4312 a Lawley-Hotelling trace 3.32563 16 26 1.3510 0.2407 a Roy's largest root 2.88842 4 11 7.9431 0.0029 u ---------------------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F . canon, test(1 2 3 4) stdcoef Canonical correlation analysis Number of obs = 16 Standardized coefficients for the first variable set | 1 2 3 4 -------------+---------------------------------------- altitude | 0.1243 -2.4316 2.9501 1.3685 annprec | 0.2931 0.6752 1.3581 0.2416 maxtemp | -0.4683 -0.4785 0.5773 1.7014 mintemp | -0.2597 -1.4034 3.5314 -0.0882 ------------------------------------------------------ Standardized coefficients for the second variable set | 1 2 3 4 -------------+---------------------------------------- pfpgi046 | -0.5480 1.7660 3.4831 0.6632 pfpgi080 | -0.4218 2.2563 1.2959 -1.4088 pfpgi100 | 0.0885 3.8509 3.7474 -0.5045 pfpgi116 | -0.8256 2.8482 2.7455 0.6359 ------------------------------------------------------ ---------------------------------------------------------------------------- Canonical correlations: 0.8619 0.4503 0.3859 0.0885 ---------------------------------------------------------------------------- Tests of significance of all canonical correlations Statistic df1 df2 F Prob>F Wilks' lambda .173128 16 25.078 1.2154 0.3220 a Pillai's trace 1.10235 16 44 1.0462 0.4312 a Lawley-Hotelling trace 3.32563 16 26 1.3510 0.2407 a Roy's largest root 2.88842 4 11 7.9431 0.0029 u ---------------------------------------------------------------------------- Test of significance of canonical correlations 1-4 Statistic df1 df2 F Prob>F Wilks' lambda .173128 16 25.078 1.2154 0.3220 a ---------------------------------------------------------------------------- Test of significance of canonical correlations 2-4 Statistic df1 df2 F Prob>F Wilks' lambda .673194 9 22.0542 0.4327 0.9029 a ---------------------------------------------------------------------------- Test of significance of canonical correlations 3-4 Statistic df1 df2 F Prob>F Wilks' lambda .844383 4 20 0.4413 0.7773 e ---------------------------------------------------------------------------- Test of significance of canonical correlation 4 Statistic df1 df2 F Prob>F Wilks' lambda .992173 1 11 0.0868 0.7738 e ---------------------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F . canon, stderr Linear combinations for canonical correlations Number of obs = 16 ------------------------------------------------------------------------------ | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- u1 | altitude | .0000432 .0002503 0.17 0.865 -.0004904 .0005767 annprec | .0209501 .0198237 1.06 0.307 -.0213032 .0632033 maxtemp | -.0733716 .0532979 -1.38 0.189 -.1869734 .0402302 mintemp | -.0236983 .0616723 -0.38 0.706 -.1551497 .1077531 -------------+---------------------------------------------------------------- v1 | pfpgi046 | -.0487537 .0631188 -0.77 0.452 -.1832882 .0857809 pfpgi080 | -.0391405 .0492035 -0.80 0.439 -.1440153 .0657342 pfpgi100 | .0044129 .0477266 0.09 0.928 -.0973139 .1061398 pfpgi116 | -.076604 .0673398 -1.14 0.273 -.2201353 .0669274 -------------+---------------------------------------------------------------- u2 | altitude | -.0008445 .0008437 -1.00 0.333 -.0026428 .0009537 annprec | .0482542 .0668109 0.72 0.481 -.0941498 .1906582 maxtemp | -.0749766 .1796274 -0.42 0.682 -.4578433 .3078901 mintemp | -.128046 .2078512 -0.62 0.547 -.5710703 .3149783 -------------+---------------------------------------------------------------- v2 | pfpgi046 | .1571241 .2127263 0.74 0.472 -.2962913 .6105395 pfpgi080 | .2093762 .1658281 1.26 0.226 -.144078 .5628304 pfpgi100 | .1919306 .1608506 1.19 0.251 -.1509145 .5347756 pfpgi116 | .2642672 .226952 1.16 0.262 -.2194697 .748004 -------------+---------------------------------------------------------------- u3 | altitude | -.0010247 .0010169 -1.01 0.330 -.0031922 .0011428 annprec | -.0970585 .0805307 -1.21 0.247 -.2687056 .0745886 maxtemp | -.0904504 .2165145 -0.42 0.682 -.5519401 .3710392 mintemp | -.3222173 .2505341 -1.29 0.218 -.856218 .2117835 -------------+---------------------------------------------------------------- v3 | pfpgi046 | .3098943 .2564104 1.21 0.246 -.2366315 .85642 pfpgi080 | .12025 .1998814 0.60 0.556 -.3057872 .5462872 pfpgi100 | .1867713 .1938818 0.96 0.351 -.2264781 .6000206 pfpgi116 | .2547378 .2735574 0.93 0.366 -.328336 .8378115 -------------+---------------------------------------------------------------- u4 | altitude | .0004753 .0047901 0.10 0.922 -.0097344 .0106851 annprec | .0172695 .3793301 0.05 0.964 -.7912534 .8257924 maxtemp | .266588 1.019865 0.26 0.797 -1.907203 2.440379 mintemp | -.0080481 1.180111 -0.01 0.995 -2.523394 2.507298 -------------+---------------------------------------------------------------- v4 | pfpgi046 | .0590097 1.20779 0.05 0.962 -2.515334 2.633353 pfpgi080 | -.1307303 .9415173 -0.14 0.891 -2.137527 1.876066 pfpgi100 | -.0251423 .913257 -0.03 0.978 -1.971703 1.921419 pfpgi116 | .0589981 1.288559 0.05 0.964 -2.6875 2.805497 ------------------------------------------------------------------------------ (Standard errors estimated conditionally) Canonical correlations: 0.8619 0.4503 0.3859 0.0885 ---------------------------------------------------------------------------- Tests of significance of all canonical correlations Statistic df1 df2 F Prob>F Wilks' lambda .173128 16 25.078 1.2154 0.3220 a Pillai's trace 1.10235 16 44 1.0462 0.4312 a Lawley-Hotelling trace 3.32563 16 26 1.3510 0.2407 a Roy's largest root 2.88842 4 11 7.9431 0.0029 u ---------------------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F . estat loadings Canonical loadings for variable list 1 | 1 2 3 4 -------------+---------------------------------------- altitude | 0.9215 -0.3388 0.0624 0.1795 annprec | 0.7709 0.5134 -0.2674 0.2658 maxtemp | -0.8983 0.2023 -0.0241 0.3892 mintemp | -0.9194 0.0525 -0.2285 -0.3158 ------------------------------------------------------ Canonical loadings for variable list 2 | 1 2 3 4 -------------+---------------------------------------- pfpgi046 | -0.3842 -0.6200 0.6027 -0.3237 pfpgi080 | -0.7396 -0.1494 0.0834 -0.6510 pfpgi100 | 0.9611 0.2508 0.0020 0.1158 pfpgi116 | -0.4753 0.5147 -0.4424 0.5598 ------------------------------------------------------ Correlation between variable list 1 and canonical variates from list 2 | 1 2 3 4 -------------+---------------------------------------- altitude | 0.7942 -0.1526 0.0241 0.0159 annprec | 0.6644 0.2312 -0.1032 0.0235 maxtemp | -0.7742 0.0911 -0.0093 0.0344 mintemp | -0.7924 0.0236 -0.0882 -0.0279 ------------------------------------------------------ Correlation between variable list 2 and canonical variates from list 1 | 1 2 3 4 -------------+---------------------------------------- pfpgi046 | -0.3311 -0.2792 0.2326 -0.0286 pfpgi080 | -0.6374 -0.0673 0.0322 -0.0576 pfpgi100 | 0.8283 0.1129 0.0008 0.0102 pfpgi116 | -0.4097 0.2318 -0.1707 0.0495 ------------------------------------------------------ . estat correlations Correlations for variable list 1 | altitude annprec maxtemp mintemp -------------+---------------------------------------- altitude | 1.0000 annprec | 0.5674 1.0000 maxtemp | -0.8280 -0.4787 1.0000 mintemp | -0.9359 -0.7046 0.7191 1.0000 ------------------------------------------------------ Correlations for variable list 2 | pfpgi046 pfpgi080 pfpgi100 pfpgi116 -------------+---------------------------------------- pfpgi046 | 1.0000 pfpgi080 | 0.6377 1.0000 pfpgi100 | -0.5610 -0.8235 1.0000 pfpgi116 | -0.5843 -0.1267 -0.2638 1.0000 ------------------------------------------------------ Correlations between variable lists 1 and 2 | altitude annprec maxtemp mintemp -------------+---------------------------------------- pfpgi046 | -0.2011 -0.4684 0.2242 0.2456 pfpgi080 | -0.5729 -0.5498 0.5358 0.5933 pfpgi100 | 0.7269 0.6990 -0.7173 -0.7590 pfpgi116 | -0.4578 -0.1380 0.4383 0.4122 ------------------------------------------------------ . predict lcx, u . predict lcy, v . scatter lcy lcx, mlabel(colony) . * demonstration 1 . regress pfpgi080 altitude annprec maxtemp mintemp Source | SS df MS Number of obs = 16 -------------+---------------------------------- F(4, 11) = 1.95 Model | 723.17338 4 180.793345 Prob > F = 0.1718 Residual | 1018.76412 11 92.61492 R-squared = 0.4152 -------------+---------------------------------- Adj R-squared = 0.2025 Total | 1741.9375 15 116.129167 Root MSE = 9.6237 ------------------------------------------------------------------------------ pfpgi080 | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- altitude | -.0003347 .0035061 -0.10 0.926 -.0080516 .0073823 annprec | -.2232496 .2776552 -0.80 0.438 -.8343645 .3878654 maxtemp | .3615108 .7465025 0.48 0.638 -1.28153 2.004552 mintemp | .14888 .863796 0.17 0.866 -1.752322 2.050082 _cons | -12.73373 90.12097 -0.14 0.890 -211.0886 185.6212 ------------------------------------------------------------------------------ . canon (altitude annprec maxtemp mintemp) (pfpgi080) Canonical correlation analysis Number of obs = 16 Raw coefficients for the first variable set | 1 -------------+---------- altitude | -0.0000 annprec | -0.0322 maxtemp | 0.0521 mintemp | 0.0214 ------------------------ Raw coefficients for the second variable set | 1 -------------+---------- pfpgi080 | 0.0928 ------------------------ ---------------------------------------------------------------------------- Canonical correlations: 0.6443 ---------------------------------------------------------------------------- Tests of significance of all canonical correlations Statistic df1 df2 F Prob>F Wilks' lambda .584845 4 11 1.9521 0.1718 e Pillai's trace .415155 4 11 1.9521 0.1718 e Lawley-Hotelling trace .709854 4 11 1.9521 0.1718 e Roy's largest root .709854 4 11 1.9521 0.1718 e ---------------------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F . * demonstration 2 . canon (annprec maxtemp) (pfpgi08 pfpgi116) Canonical correlation analysis Number of obs = 16 Raw coefficients for the first variable set | 1 2 -------------+-------------------- annprec | -0.0253 0.0774 maxtemp | 0.1224 0.1298 ---------------------------------- Raw coefficients for the second variable set | 1 2 -------------+-------------------- pfpgi080 | 0.0807 -0.0473 pfpgi116 | 0.0571 0.0741 ---------------------------------- ---------------------------------------------------------------------------- Canonical correlations: 0.7741 0.2478 ---------------------------------------------------------------------------- Tests of significance of all canonical correlations Statistic df1 df2 F Prob>F Wilks' lambda .37613 4 24 3.7832 0.0160 e Pillai's trace .660658 4 26 3.2063 0.0288 a Lawley-Hotelling trace 1.56085 4 22 4.2923 0.0102 a Roy's largest root 1.49545 2 13 9.7204 0.0026 u ---------------------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F . canon, test(1 2) Canonical correlation analysis Number of obs = 16 Raw coefficients for the first variable set | 1 2 -------------+-------------------- annprec | -0.0253 0.0774 maxtemp | 0.1224 0.1298 ---------------------------------- Raw coefficients for the second variable set | 1 2 -------------+-------------------- pfpgi080 | 0.0807 -0.0473 pfpgi116 | 0.0571 0.0741 ---------------------------------- ---------------------------------------------------------------------------- Canonical correlations: 0.7741 0.2478 ---------------------------------------------------------------------------- Tests of significance of all canonical correlations Statistic df1 df2 F Prob>F Wilks' lambda .37613 4 24 3.7832 0.0160 e Pillai's trace .660658 4 26 3.2063 0.0288 a Lawley-Hotelling trace 1.56085 4 22 4.2923 0.0102 a Roy's largest root 1.49545 2 13 9.7204 0.0026 u ---------------------------------------------------------------------------- Test of significance of canonical correlations 1-2 Statistic df1 df2 F Prob>F Wilks' lambda .37613 4 24 3.7832 0.0160 e ---------------------------------------------------------------------------- Test of significance of canonical correlation 2 Statistic df1 df2 F Prob>F Wilks' lambda .938613 1 13 0.8502 0.3733 e ---------------------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F . estat loadings Canonical loadings for variable list 1 | 1 2 -------------+-------------------- annprec | -0.7275 0.6861 maxtemp | 0.9506 0.3104 ---------------------------------- Canonical loadings for variable list 2 | 1 2 -------------+-------------------- pfpgi080 | 0.7919 -0.6107 pfpgi116 | 0.5054 0.8629 ---------------------------------- Correlation between variable list 1 and canonical variates from list 2 | 1 2 -------------+-------------------- annprec | -0.5632 0.1700 maxtemp | 0.7359 0.0769 ---------------------------------- Correlation between variable list 2 and canonical variates from list 1 | 1 2 -------------+-------------------- pfpgi080 | 0.6130 -0.1513 pfpgi116 | 0.3913 0.2138 ---------------------------------- .