. * do-file for lecture 3b of VHM 812/802, Winter 2025 . version 18 /* works also with versions 14-17 */ . set more off . set scheme stcolor_alt . cd "r:\" r:\ . . use daisy2, clear . * illustration of linearity assessment, using daisy2 data . * models have terrible residuals but we will ignore that... . gen milk120k=milk120/1000 (1,288 missing values generated) . lowess cf milk120k . * extra code to add 95% range for x ~ Figure 15.3 . centile milk120k, centile(2.5 97.5) Binom. interp. Variable | Obs Percentile Centile [95% conf. interval] -------------+------------------------------------------------------------- milk120k | 8,095 2.5 1.67574 1.643845 1.709898 | 97.5 4.48878 4.440558 4.542869 . twoway (scatter cf milk120k, xline(1.67574) xline(4.48878)) (lowess cf milk120k) . * graphics based on residuals . regress cf milk120k Source | SS df MS Number of obs = 7,720 -------------+---------------------------------- F(1, 7718) = 0.95 Model | 764.55464 1 764.55464 Prob > F = 0.3293 Residual | 6199583.97 7,718 803.263018 R-squared = 0.0001 -------------+---------------------------------- Adj R-squared = -0.0000 Total | 6200348.53 7,719 803.258003 Root MSE = 28.342 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k | -.4434752 .4545628 -0.98 0.329 -1.334542 .4475914 _cons | 79.60298 1.379894 57.69 0.000 76.89801 82.30795 ------------------------------------------------------------------------------ . predict stdres, rstandard (1,663 missing values generated) . lowess stdres milk120k . * quadratic regression, VER Example 15.3 . regress cf c.milk120k##c.milk120k Source | SS df MS Number of obs = 7,720 -------------+---------------------------------- F(2, 7717) = 17.93 Model | 28681.5187 2 14340.7594 Prob > F = 0.0000 Residual | 6171667.01 7,717 799.749515 R-squared = 0.0046 -------------+---------------------------------- Adj R-squared = 0.0044 Total | 6200348.53 7,719 803.258003 Root MSE = 28.28 --------------------------------------------------------------------------------------- cf | Coefficient Std. err. t P>|t| [95% conf. interval] ----------------------+---------------------------------------------------------------- milk120k | -16.14046 2.69524 -5.99 0.000 -21.42386 -10.85706 | c.milk120k#c.milk120k | 2.53519 .429095 5.91 0.000 1.694047 3.376333 | _cons | 102.5711 4.124112 24.87 0.000 94.48672 110.6555 --------------------------------------------------------------------------------------- . * categorisation with lintrend . * lintrend is an add-on command, install using: findit lintrend (pick sg50 option) . lintrend cf milk120k, groups(5) plot(mean) // does not support linear relation The mean of cf by categories of milk120k (Note: 5 milk120k categories of equal sample size; Uses mean milk120k value for each category) +--------------------------------------------+ | milk120k min max total cf | |--------------------------------------------| | 2.0 .989 2.35 1620 76.70 | | 2.6 2.3507 2.7474 1618 74.69 | | 2.9 2.7476 3.0963 1620 73.93 | | 3.3 3.0964 3.5314 1619 74.06 | | 4.0 3.5317 6.7128 1618 73.96 | +--------------------------------------------+ . * box-cox analysis for transforming x . boxcox cf milk120k, model(rhs) Fitting full model Iteration 0: Log likelihood = -36771.519 (not concave) Iteration 1: Log likelihood = -36770.154 (not concave) Iteration 2: Log likelihood = -36768.274 (not concave) Iteration 3: Log likelihood = -36758.817 Iteration 4: Log likelihood = -36758.585 Iteration 5: Log likelihood = -36757.787 Iteration 6: Log likelihood = -36757.787 (1,288 missing values generated) (1,288 missing values generated) (1,288 missing values generated) Number of obs = 7,720 LR chi2(2) = 28.42 Log likelihood = -36757.787 Prob > chi2 = 0.000 ------------------------------------------------------------------------------ cf | Coefficient Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- /lambda | -4.119009 .6687904 -6.16 0.000 -5.429814 -2.808204 ------------------------------------------------------------------------------ Estimates of scale-variant parameters ---------------------------- | Coefficient -------------+-------------- Notrans | _cons | 116.5945 -------------+-------------- Trans | milk120k | -161.589 -------------+-------------- /sigma | 28.28786 ---------------------------- --------------------------------------------------------- Test Restricted LR statistic H0: log likelihood chi2 Prob > chi2 --------------------------------------------------------- lambda = -1 -36766.658 17.74 0.000 lambda = 0 -36769.777 23.98 0.000 lambda = 1 -36771.519 27.46 0.000 --------------------------------------------------------- . generate milk120k_pow=-1/milk120k^4 // minus to keep order for x (1,288 missing values generated) . regress cf milk120k_pow Source | SS df MS Number of obs = 7,720 -------------+---------------------------------- F(1, 7718) = 28.43 Model | 22754.594 1 22754.594 Prob > F = 0.0000 Residual | 6177593.93 7,718 800.413829 R-squared = 0.0037 -------------+---------------------------------- Adj R-squared = 0.0035 Total | 6200348.53 7,719 803.258003 Root MSE = 28.292 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_pow | -38.06757 7.139665 -5.33 0.000 -52.06325 -24.07188 _cons | 77.30789 .3713342 208.19 0.000 76.57997 78.0358 ------------------------------------------------------------------------------ . predict stdres2, rstandard (1,663 missing values generated) . lowess stdres2 milk120k_pow . * improvement over linear term alone, but not over quadratic model . . * fractional polynomials, VER Example 15.4 . fp : regress cf (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------ | Test Residual Deviance milk120k | df Deviance std. dev. diff. P Powers -------------+---------------------------------------------------------------- omitted | 4 73543.99 28.342 45.690 0.000 linear | 3 73543.04 28.342 44.738 0.000 1 m = 1 | 2 73525.42 28.310 27.118 0.000 -2 m = 2 | 0 73498.30 28.262 0.000 -- -.5 0 ------------------------------------------------------------------------------ Note: Test df is degrees of freedom, and P = P > F is sig. level for tests comparing models vs. model with m = 2 based on deviance difference, F(df, 7715). Source | SS df MS Number of obs = 7,720 -------------+---------------------------------- F(2, 7717) = 22.90 Model | 36587.3332 2 18293.6666 Prob > F = 0.0000 Residual | 6163761.19 7,717 798.725048 R-squared = 0.0059 -------------+---------------------------------- Adj R-squared = 0.0056 Total | 6200348.53 7,719 803.258003 Root MSE = 28.262 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_1 | 295.8967 46.01638 6.43 0.000 205.6922 386.1013 milk120k_2 | 87.36896 14.07522 6.21 0.000 59.77771 114.9602 _cons | -189.8411 42.17274 -4.50 0.000 -272.5111 -107.1711 ------------------------------------------------------------------------------ . fp plot, r(none) // estimated curve with 95% CI ~ Figure 15.7 . fp plot, r(residuals) // estimated curve with 95% CI and observed values . fp plot, r(rstandard) // adding the standardised residuals does not make much sense . br cf milk120k milk120k_1 milk120k_2 . di "-0.5 term: " 1/sqrt(3.5058) " and 0 term: "ln(3.5058) -0.5 term: .53408014 and 0 term: 1.2544187 . * try with larger class of models . capture drop milk120k_1 milk120k_2 // fp will not override these variables, unless replace option is us > ed . fp , powers(-4 -3 -2 -1 -.5 0 .5 1 2 3) dimension(3) : regress cf (fitting 285 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------ | Test Residual Deviance milk120k | df Deviance std. dev. diff. P Powers -------------+---------------------------------------------------------------- omitted | 6 73543.99 28.342 45.741 0.000 linear | 5 73543.04 28.342 44.789 0.000 1 m = 1 | 4 73515.61 28.292 17.358 0.002 -4 m = 2 | 2 73498.30 28.262 0.052 0.974 -.5 0 m = 3 | 0 73498.25 28.263 0.000 -- .5 .5 .5 ------------------------------------------------------------------------------ Note: Test df is degrees of freedom, and P = P > F is sig. level for tests comparing models vs. model with m = 3 based on deviance difference, F(df, 7713). Source | SS df MS Number of obs = 7,720 -------------+---------------------------------- F(3, 7716) = 15.28 Model | 36628.6805 3 12209.5602 Prob > F = 0.0000 Residual | 6163719.85 7,716 798.823204 R-squared = 0.0059 -------------+---------------------------------- Adj R-squared = 0.0055 Total | 6200348.53 7,719 803.258003 Root MSE = 28.263 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_1 | -730.7391 394.4714 -1.85 0.064 -1504.01 42.53202 milk120k_2 | 313.1601 182.0551 1.72 0.085 -43.71734 670.0375 milk120k_3 | -42.3349 30.02417 -1.41 0.159 -101.1904 16.52062 _cons | 835.2254 399.5796 2.09 0.037 51.94089 1618.51 ------------------------------------------------------------------------------ . * syntax for multivariable models . capture drop milk120k_1 milk120k_2 milk120k_3 . fp : regress cf i.twin (fitting 44 models) (....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%) Fractional polynomial comparisons: ------------------------------------------------------------------------------ | Test Residual Deviance milk120k | df Deviance std. dev. diff. P Powers -------------+---------------------------------------------------------------- omitted | 4 73541.35 28.339 46.253 0.000 linear | 3 73540.28 28.339 45.192 0.000 1 m = 1 | 2 73522.22 28.306 27.124 0.000 -2 m = 2 | 0 73495.09 28.258 0.000 -- -.5 0 ------------------------------------------------------------------------------ Note: Test df is degrees of freedom, and P = P > F is sig. level for tests comparing models vs. model with m = 2 based on deviance difference, F(df, 7714). Source | SS df MS Number of obs = 7,720 -------------+---------------------------------- F(3, 7716) = 16.34 Model | 39147.274 3 13049.0913 Prob > F = 0.0000 Residual | 6161201.25 7,716 798.496793 R-squared = 0.0063 -------------+---------------------------------- Adj R-squared = 0.0059 Total | 6200348.53 7,719 803.258003 Root MSE = 28.258 ------------------------------------------------------------------------------ cf | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- milk120k_1 | 296.9481 46.01355 6.45 0.000 206.7491 387.1472 milk120k_2 | 87.6043 14.07382 6.22 0.000 60.01579 115.1928 | twin | yes | 4.259369 2.378849 1.79 0.073 -.403821 8.922559 _cons | -190.7942 42.17007 -4.52 0.000 -273.459 -108.1294 ------------------------------------------------------------------------------ . . * reliability, illustrated with Coleman data . use coleman, clear . regress y x1-x5 Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(5, 14) = 27.08 Model | 582.686401 5 116.53728 Prob > F = 0.0000 Residual | 60.2378924 14 4.3027066 R-squared = 0.9063 -------------+---------------------------------- Adj R-squared = 0.8728 Total | 642.924294 19 33.8381207 Root MSE = 2.0743 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- x1 | -1.793333 1.233397 -1.45 0.168 -4.438706 .8520392 x2 | .0436015 .0532589 0.82 0.427 -.0706275 .1578304 x3 | .5557601 .0929564 5.98 0.000 .3563884 .7551318 x4 | 1.110168 .4337681 2.56 0.023 .1798279 2.040508 x5 | -1.810919 2.02739 -0.89 0.387 -6.159239 2.5374 _cons | 19.94857 13.62755 1.46 0.165 -9.279627 49.17676 ------------------------------------------------------------------------------ . predict res, residual . predict lev, hat . generate temp=(res/(1-lev))^2 . total temp // PRESS statistic Total estimation Number of obs = 20 -------------------------------------------------------------- | Total Std. err. [95% conf. interval] -------------+------------------------------------------------ temp | 117.7791 56.41735 -.3037369 235.862 -------------------------------------------------------------- . display "Predictive R^2 = " 1- 117.7791/642.924294 Predictive R^2 = .8168072 . * drop in R^2 from 90.6% to 81.7% ~ substantial (9%) shrinkage . . * model fit statistics . regress y x1-x5 Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(5, 14) = 27.08 Model | 582.686401 5 116.53728 Prob > F = 0.0000 Residual | 60.2378924 14 4.3027066 R-squared = 0.9063 -------------+---------------------------------- Adj R-squared = 0.8728 Total | 642.924294 19 33.8381207 Root MSE = 2.0743 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- x1 | -1.793333 1.233397 -1.45 0.168 -4.438706 .8520392 x2 | .0436015 .0532589 0.82 0.427 -.0706275 .1578304 x3 | .5557601 .0929564 5.98 0.000 .3563884 .7551318 x4 | 1.110168 .4337681 2.56 0.023 .1798279 2.040508 x5 | -1.810919 2.02739 -0.89 0.387 -6.159239 2.5374 _cons | 19.94857 13.62755 1.46 0.165 -9.279627 49.17676 ------------------------------------------------------------------------------ . estat ic Akaike's information criterion and Bayesian information criterion ----------------------------------------------------------------------------- Model | N ll(null) ll(model) df AIC BIC -------------+--------------------------------------------------------------- . | 20 -63.08172 -39.40446 6 90.80893 96.78332 ----------------------------------------------------------------------------- Note: BIC uses N = number of observations. See [R] IC note. . regress y x3 x4 Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(2, 17) = 66.95 Model | 570.497872 2 285.248936 Prob > F = 0.0000 Residual | 72.4264222 17 4.26037777 R-squared = 0.8873 -------------+---------------------------------- Adj R-squared = 0.8741 Total | 642.924294 19 33.8381207 Root MSE = 2.0641 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- x3 | .5415607 .0500441 10.82 0.000 .4359768 .6471445 x4 | .7498915 .3666402 2.05 0.057 -.0236517 1.523435 _cons | 14.5827 9.175409 1.59 0.130 -4.775723 33.94112 ------------------------------------------------------------------------------ . estat ic Akaike's information criterion and Bayesian information criterion ----------------------------------------------------------------------------- Model | N ll(null) ll(model) df AIC BIC -------------+--------------------------------------------------------------- . | 20 -63.08172 -41.24716 3 88.49432 91.48152 ----------------------------------------------------------------------------- Note: BIC uses N = number of observations. See [R] IC note. . . * comparison of predictor effects . centile x1-x5, centile(25 75) Binom. interp. Variable | Obs Percentile Centile [95% conf. interval] -------------+------------------------------------------------------------- x1 | 20 25 2.4425 2.091335 2.608275 | 75 3.025 2.692345 3.535893 x2 | 20 25 17.55 11.60616 29.29372 | 75 67.7025 36.25481 77.36343 x3 | 20 25 -.7625 -12.84819 3.901728 | 75 11.895 6.221725 14.2645 x4 | 20 25 24.51 23.51092 25.01 | 75 25.7 25.08819 26.6 x5 | 20 25 5.66 5.342362 6.17354 | 75 6.92 6.24761 7.099384 . regress y x1-x5 Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(5, 14) = 27.08 Model | 582.686401 5 116.53728 Prob > F = 0.0000 Residual | 60.2378924 14 4.3027066 R-squared = 0.9063 -------------+---------------------------------- Adj R-squared = 0.8728 Total | 642.924294 19 33.8381207 Root MSE = 2.0743 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- x1 | -1.793333 1.233397 -1.45 0.168 -4.438706 .8520392 x2 | .0436015 .0532589 0.82 0.427 -.0706275 .1578304 x3 | .5557601 .0929564 5.98 0.000 .3563884 .7551318 x4 | 1.110168 .4337681 2.56 0.023 .1798279 2.040508 x5 | -1.810919 2.02739 -0.89 0.387 -6.159239 2.5374 _cons | 19.94857 13.62755 1.46 0.165 -9.279627 49.17676 ------------------------------------------------------------------------------ . lincom x1*(3.025-2.4425) ( 1) .5825*x1 = 0 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- (1) | -1.044617 .7184535 -1.45 0.168 -2.585546 .4963128 ------------------------------------------------------------------------------ . lincom x2*(67.7025-17.55) ( 1) 50.1525*x2 = 0 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- (1) | 2.186723 2.671067 0.82 0.427 -3.542145 7.915591 ------------------------------------------------------------------------------ . lincom x3*(11.895+0.7625) ( 1) 12.6575*x3 = 0 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- (1) | 7.034533 1.176596 5.98 0.000 4.510986 9.55808 ------------------------------------------------------------------------------ . lincom x4*(25.7-24.51) ( 1) 1.19*x4 = 0 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- (1) | 1.3211 .516184 2.56 0.023 .2139952 2.428204 ------------------------------------------------------------------------------ . lincom x5*(6.92-5.66) ( 1) 1.26*x5 = 0 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- (1) | -2.281758 2.554512 -0.89 0.387 -7.760641 3.197124 ------------------------------------------------------------------------------ . * x3 has by far the largest effect on the outcome (also most significant, but a different thing) . . * automated variable selection procedures . use coleman, clear . * forward selection with significance level 0.05 . stepwise, pe(0.05): regress y x1-x5 Wald test, begin with empty model: p = 0.0000 < 0.0500, adding x3 Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(1, 18) = 110.23 Model | 552.675528 1 552.675528 Prob > F = 0.0000 Residual | 90.2487661 18 5.01382034 R-squared = 0.8596 -------------+---------------------------------- Adj R-squared = 0.8518 Total | 642.924294 19 33.8381207 Root MSE = 2.2392 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- x3 | .5603254 .0533691 10.50 0.000 .4482012 .6724497 _cons | 33.3228 .5279987 63.11 0.000 32.21351 34.43208 ------------------------------------------------------------------------------ . * backward elimination with significance level 0.05 . stepwise, pr(0.05): regress y x1-x5 Wald test, begin with full model: p = 0.4267 >= 0.0500, removing x2 p = 0.6863 >= 0.0500, removing x5 p = 0.1616 >= 0.0500, removing x1 p = 0.0566 >= 0.0500, removing x4 Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(1, 18) = 110.23 Model | 552.675528 1 552.675528 Prob > F = 0.0000 Residual | 90.2487661 18 5.01382034 R-squared = 0.8596 -------------+---------------------------------- Adj R-squared = 0.8518 Total | 642.924294 19 33.8381207 Root MSE = 2.2392 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- x3 | .5603254 .0533691 10.50 0.000 .4482012 .6724497 _cons | 33.3228 .5279987 63.11 0.000 32.21351 34.43208 ------------------------------------------------------------------------------ . * stepwise approach with significance level 0.05, starting from maximal model . stepwise, pe(0.049999) pr(0.05): regress y x1-x5 // note: pe= 0.0500, removing x2 p = 0.6863 >= 0.0500, removing x5 p = 0.1616 >= 0.0500, removing x1 p = 0.0566 >= 0.0500, removing x4 Source | SS df MS Number of obs = 20 -------------+---------------------------------- F(1, 18) = 110.23 Model | 552.675528 1 552.675528 Prob > F = 0.0000 Residual | 90.2487661 18 5.01382034 R-squared = 0.8596 -------------+---------------------------------- Adj R-squared = 0.8518 Total | 642.924294 19 33.8381207 Root MSE = 2.2392 ------------------------------------------------------------------------------ y | Coefficient Std. err. t P>|t| [95% conf. interval] -------------+---------------------------------------------------------------- x3 | .5603254 .0533691 10.50 0.000 .4482012 .6724497 _cons | 33.3228 .5279987 63.11 0.000 32.21351 34.43208 ------------------------------------------------------------------------------ . * all algorithms lead to same model (not always the case!) . . use daisy2red, clear . gen calv_mth=month(calv_dt) . gen aut_calv=(calv_mth>=2 & calv_mth<=7) if !missing(calv_mth) . gen rootwpc=sqrt(wpc) . stepwise, pe(0.049999) pr(0.05): regress rootwpc parity c.herd_size##c.herd_size c.mwp i.twin##i.dyst i > .rp##vag_disch // resulting model not meaningful note: 0b.twin omitted because of estimability. note: 0b.dyst omitted because of estimability. note: 0o.twin#0o.dyst omitted because of estimability. note: 0o.twin#1o.dyst omitted because of estimability. note: 1o.twin#0o.dyst omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0o.rp#0o.vag_disch omitted because of estimability. note: 0o.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0o.vag_disch omitted because of estimability. Wald test, begin with full model: p = 0.9745 >= 0.0500, removing 1.vag_disch p = 0.3571 >= 0.0500, removing parity p = 0.2028 >= 0.0500, removing mwp p = 0.1144 >= 0.0500, removing 1.rp p = 0.1047 >= 0.0500, removing 1.twin#1.dyst p = 0.0759 >= 0.0500, removing 1.dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(4, 1569) = 31.17 Model | 979.60429 4 244.901072 Prob > F = 0.0000 Residual | 12326.6626 1,569 7.85638149 R-squared = 0.0736 -------------+---------------------------------- Adj R-squared = 0.0713 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.8029 ----------------------------------------------------------------------------------------- rootwpc | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- rp#vag_disch | yes#yes | 1.928071 .5207635 3.70 0.000 .9066051 2.949537 | herd_size | -.0207276 .0083839 -2.47 0.014 -.0371725 -.0042827 | c.herd_size#c.herd_size | .0000657 .0000173 3.80 0.000 .0000318 .0000997 | twin | yes | 1.413782 .5492526 2.57 0.010 .3364358 2.491129 _cons | 8.516599 .9733102 8.75 0.000 6.607473 10.42572 ----------------------------------------------------------------------------------------- . * avoid splitting terms from a combined effect by (), always include first term . stepwise, pe(0.049999) pr(0.05) lockterm1: regress rootwpc parity (c.herd_size##c.herd_size) c.mwp (i.t > win##i.dyst) (i.rp##vag_disch) // now twin#dyst stays in although non-significant note: 0b.twin omitted because of estimability. note: 0b.dyst omitted because of estimability. note: 0b.twin#0b.dyst omitted because of estimability. note: 0b.twin#1o.dyst omitted because of estimability. note: 1o.twin#0b.dyst omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0b.rp#0b.vag_disch omitted because of estimability. note: 0b.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0b.vag_disch omitted because of estimability. Wald test, begin with full model: p = 0.2194 >= 0.0500, removing mwp Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(9, 1564) = 14.92 Model | 1052.07609 9 116.897343 Prob > F = 0.0000 Residual | 12254.1908 1,564 7.83516033 R-squared = 0.0791 -------------+---------------------------------- Adj R-squared = 0.0738 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.7991 ----------------------------------------------------------------------------------------- rootwpc | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- parity | .0471693 .0481253 0.98 0.327 -.0472276 .1415661 herd_size | -.0224056 .0084408 -2.65 0.008 -.0389621 -.0058491 | c.herd_size#c.herd_size | .0000696 .0000175 3.99 0.000 .0000354 .0001039 | rp | yes | .4330348 .26975 1.61 0.109 -.096075 .9621446 | vag_disch | yes | .0135432 .4019532 0.03 0.973 -.7748808 .8019672 | rp#vag_disch | yes#yes | 1.46987 .7023592 2.09 0.037 .0922056 2.847535 | twin | yes | 1.603972 .5858987 2.74 0.006 .4547421 2.753202 | dyst | yes | .6309143 .3111981 2.03 0.043 .0205049 1.241324 | twin#dyst | yes#yes | -2.795651 1.745789 -1.60 0.109 -6.219985 .6286827 | _cons | 8.478238 .9899599 8.56 0.000 6.536449 10.42003 ----------------------------------------------------------------------------------------- . stepwise, pe(0.049999) pr(0.05) lockterm1: regress rootwpc parity (c.herd_size##c.herd_size) c.mwp i.tw > in##i.dyst (i.rp##vag_disch) // the model we already had note: 0b.twin omitted because of estimability. note: 0b.dyst omitted because of estimability. note: 0o.twin#0o.dyst omitted because of estimability. note: 0o.twin#1o.dyst omitted because of estimability. note: 1o.twin#0o.dyst omitted because of estimability. note: 0b.rp omitted because of estimability. note: 0b.vag_disch omitted because of estimability. note: 0b.rp#0b.vag_disch omitted because of estimability. note: 0b.rp#1o.vag_disch omitted because of estimability. note: 1o.rp#0b.vag_disch omitted because of estimability. Wald test, begin with full model: p = 0.2194 >= 0.0500, removing mwp p = 0.1095 >= 0.0500, removing 1.twin#1.dyst p = 0.0759 >= 0.0500, removing 1.dyst Source | SS df MS Number of obs = 1,574 -------------+---------------------------------- F(7, 1566) = 18.32 Model | 1007.23309 7 143.890441 Prob > F = 0.0000 Residual | 12299.0338 1,566 7.85378911 R-squared = 0.0757 -------------+---------------------------------- Adj R-squared = 0.0716 Total | 13306.2668 1,573 8.45916519 Root MSE = 2.8025 ----------------------------------------------------------------------------------------- rootwpc | Coefficient Std. err. t P>|t| [95% conf. interval] ------------------------+---------------------------------------------------------------- parity | .0384512 .047746 0.81 0.421 -.0552016 .1321039 herd_size | -.0213327 .0084356 -2.53 0.012 -.037879 -.0047863 | c.herd_size#c.herd_size | .000067 .0000174 3.84 0.000 .0000328 .0001011 | rp | yes | .4636612 .2694853 1.72 0.086 -.0649288 .9922512 | vag_disch | yes | .1276652 .399162 0.32 0.749 -.6552831 .9106136 | rp#vag_disch | yes#yes | 1.377465 .700519 1.97 0.049 .0034113 2.751519 | twin | yes | 1.309327 .5523056 2.37 0.018 .225991 2.392664 _cons | 8.442287 .9910022 8.52 0.000 6.498456 10.38612 ----------------------------------------------------------------------------------------- .