-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      name:  <unnamed>
       log:  F:\vhm812-data\l3a_model_build_II.log
  log type:  text
 opened on:  19 Jan 2015, 13:38:05

. set more off

. 
. * open the DAISY Red dataset
. use daisy2red.dta, clear

. gen month=month(calv_dt)

. gen aut_calv=(month>=2 & month<=7) 

. gen hs_ct=herd_size-251

. gen hs_sq=herd_size^2

. gen parity1=parity-1

. gen milk120k=milk120/1000
(38 missing values generated)

. gen wpc_sqrt=sqrt(wpc)

. 
. * specifying maximum model
. * no analyses
. 
. * causal model
. * no analyses
. 
. * Reducing the Number of Predictors
. * descriptive statistics
. codebook cf vag_disch

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
cf                                                                                                                                            Calving to first service interval
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (float)

                 range:  [21,240]                     units:  1
         unique values:  129                      missing .:  14/1574

                  mean:   71.3115
              std. dev:   21.9039

           percentiles:        10%       25%       50%       75%       90%
                                51        59        67        78        96

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vag_disch                                                                                                                                            Vaginal discharge observed
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (float)
                 label:  noyes

                 range:  [0,1]                        units:  1
         unique values:  2                        missing .:  0/1574

            tabulation:  Freq.   Numeric  Label
                          1492         0  no
                            82         1  yes

. sum  milk120 parity herd_size dyst rp vag_disch, detail

       Milk volume (l) in first 120 days of lactation
-------------------------------------------------------------
      Percentiles      Smallest
 1%       1698.5         1110.2
 5%       2077.2         1397.7
10%       2298.2         1461.5       Obs                1536
25%      2731.95           1467       Sum of Wgt.        1536

50%      3215.25                      Mean           3215.096
                        Largest       Std. Dev.      698.1316
75%       3682.1         5181.8
90%       4080.7           5278       Variance       487387.7
95%       4403.3         5399.9       Skewness       .1101838
99%       4904.4         5630.3       Kurtosis       2.845637

                      Lactation number
-------------------------------------------------------------
      Percentiles      Smallest
 1%            1              1
 5%            1              1
10%            1              1       Obs                1574
25%            1              1       Sum of Wgt.        1574

50%            2                      Mean           2.729987
                        Largest       Std. Dev.      1.493841
75%            4              7
90%            5              7       Variance        2.23156
95%            5              7       Skewness       .5450922
99%            6              7       Kurtosis       2.315593

                          Herd size
-------------------------------------------------------------
      Percentiles      Smallest
 1%          125            125
 5%          125            125
10%          185            125       Obs                1574
25%          201            125       Sum of Wgt.        1574

50%          263                      Mean           251.0076
                        Largest       Std. Dev.      62.01692
75%          294            333
90%          333            333       Variance       3846.098
95%          333            333       Skewness      -.3550929
99%          333            333       Kurtosis       2.256969

                     Dystocia at calving
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                1574
25%            0              0       Sum of Wgt.        1574

50%            0                      Mean           .0597205
                        Largest       Std. Dev.      .2370435
75%            0              1
90%            0              1       Variance       .0561896
95%            1              1       Skewness       3.715938
99%            1              1       Kurtosis       14.80819

                Retained placenta at calving
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                1574
25%            0              0       Sum of Wgt.        1574

50%            0                      Mean           .0946633
                        Largest       Std. Dev.      .2928423
75%            0              1
90%            0              1       Variance       .0857566
95%            1              1       Skewness       2.769173
99%            1              1       Kurtosis        8.66832

                 Vaginal discharge observed
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                1574
25%            0              0       Sum of Wgt.        1574

50%            0                      Mean           .0520966
                        Largest       Std. Dev.      .2222924
75%            0              1
90%            0              1       Variance       .0494139
95%            1              1       Skewness       4.031139
99%            1              1       Kurtosis       17.25008

. 
. * correlation
. corr  milk120 parity herd_size
(obs=1536)

             |  milk120   parity herd_s~e
-------------+---------------------------
     milk120 |   1.0000
      parity |   0.3821   1.0000
   herd_size |  -0.0433   0.0356   1.0000


. pwcorr milk120 parity herd_size, obs star(0.05)

             |  milk120   parity herd_s~e
-------------+---------------------------
     milk120 |   1.0000 
             |     1536
             |
      parity |   0.3821*  1.0000 
             |     1536     1574
             |
   herd_size |  -0.0433   0.0386   1.0000 
             |     1536     1574     1574
             |

. 
. * indices
. * no analyses
. 
. * unconditional associations
. reg cf parity 

      Source |       SS       df       MS              Number of obs =    1560
-------------+------------------------------           F(  1,  1558) =    1.46
       Model |  699.698494     1  699.698494           Prob > F      =  0.2273
    Residual |  747280.894  1558  479.641139           R-squared     =  0.0009
-------------+------------------------------           Adj R-squared =  0.0003
       Total |  747980.592  1559  479.782291           Root MSE      =  21.901

------------------------------------------------------------------------------
          cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      parity |  -.4480144   .3709323    -1.21   0.227    -1.175594    .2795649
       _cons |   72.53554   1.155186    62.79   0.000     70.26965    74.80142
------------------------------------------------------------------------------

. reg cf vag_disch

      Source |       SS       df       MS              Number of obs =    1560
-------------+------------------------------           F(  1,  1558) =    0.49
       Model |  236.166881     1  236.166881           Prob > F      =  0.4831
    Residual |  747744.425  1558  479.938656           R-squared     =  0.0003
-------------+------------------------------           Adj R-squared = -0.0003
       Total |  747980.592  1559  479.782291           Root MSE      =  21.908

------------------------------------------------------------------------------
          cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   vag_disch |   1.743523   2.485484     0.70   0.483    -3.131724     6.61877
       _cons |   71.21989   .5698436   124.98   0.000     70.10215    72.33763
------------------------------------------------------------------------------

. 
. * principal components / factor analysis / correspondence analysis
. * not covered
. 
. * Functional Form of predictors
. * residual plots
. reg cf milk120

      Source |       SS       df       MS              Number of obs =    1525
-------------+------------------------------           F(  1,  1523) =    0.72
       Model |  329.282002     1  329.282002           Prob > F      =  0.3977
    Residual |  700869.964  1523   460.19039           R-squared     =  0.0005
-------------+------------------------------           Adj R-squared = -0.0002
       Total |  701199.246  1524  460.104492           Root MSE      =  21.452

------------------------------------------------------------------------------
          cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     milk120 |   .0006667   .0007881     0.85   0.398    -.0008793    .0022126
       _cons |   68.95612   2.595207    26.57   0.000     63.86556    74.04667
------------------------------------------------------------------------------

. predict stdres, rstandard
(49 missing values generated)

. 
. * lowess smoother
. twoway (scatter stdres milk120) (lowess stdres milk120)

. 
. * Detecting and Correcting for non-linearity (transformation of X)
. * categorization of predictor
. reg cf parity

      Source |       SS       df       MS              Number of obs =    1560
-------------+------------------------------           F(  1,  1558) =    1.46
       Model |  699.698494     1  699.698494           Prob > F      =  0.2273
    Residual |  747280.894  1558  479.641139           R-squared     =  0.0009
-------------+------------------------------           Adj R-squared =  0.0003
       Total |  747980.592  1559  479.782291           Root MSE      =  21.901

------------------------------------------------------------------------------
          cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      parity |  -.4480144   .3709323    -1.21   0.227    -1.175594    .2795649
       _cons |   72.53554   1.155186    62.79   0.000     70.26965    74.80142
------------------------------------------------------------------------------

. egen parity_c6=cut(parity), at(0 1 2 3 4 5 6 15) icodes

. tab parity parity_c6

 Lactation |                             parity_c6
    number |         1          2          3          4          5          6 |     Total
-----------+------------------------------------------------------------------+----------
         1 |       417          0          0          0          0          0 |       417 
         2 |         0        374          0          0          0          0 |       374 
         3 |         0          0        319          0          0          0 |       319 
         4 |         0          0          0        222          0          0 |       222 
         5 |         0          0          0          0        169          0 |       169 
         6 |         0          0          0          0          0         69 |        69 
         7 |         0          0          0          0          0          4 |         4 
-----------+------------------------------------------------------------------+----------
     Total |       417        374        319        222        169         73 |     1,574 


. reg cf i.parity_c6

      Source |       SS       df       MS              Number of obs =    1560
-------------+------------------------------           F(  5,  1554) =    0.91
       Model |  2174.44893     5  434.889785           Prob > F      =  0.4761
    Residual |  745806.143  1554  479.926733           R-squared     =  0.0029
-------------+------------------------------           Adj R-squared = -0.0003
       Total |  747980.592  1559  479.782291           Root MSE      =  21.907

------------------------------------------------------------------------------
          cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   parity_c6 |
          2  |  -.6468817   1.568168    -0.41   0.680    -3.722829    2.429066
          3  |  -3.181369   1.635853    -1.94   0.052    -6.390081     .027343
          4  |  -1.528531   1.831255    -0.83   0.404    -5.120523    2.063462
          5  |  -2.322236   2.004684    -1.16   0.247    -6.254406    1.609935
          6  |  -.9349232   2.781437    -0.34   0.737    -6.390688    4.520841
             |
       _cons |   72.61985   1.077984    67.37   0.000      70.5054    74.73431
------------------------------------------------------------------------------

. 
. * quadratic function of X
. gen ln_cf=ln(cf)
(14 missing values generated)

. reg ln_cf c.milk120k##c.milk120k

      Source |       SS       df       MS              Number of obs =    1525
-------------+------------------------------           F(  2,  1522) =    4.78
       Model |  .709133443     2  .354566722           Prob > F      =  0.0085
    Residual |  112.944101  1522  .074207688           R-squared     =  0.0062
-------------+------------------------------           Adj R-squared =  0.0049
       Total |  113.653234  1524  .074575613           Root MSE      =  .27241

---------------------------------------------------------------------------------------
                ln_cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
             milk120k |   .2056932   .0697547     2.95   0.003     .0688677    .3425186
                      |
c.milk120k#c.milk120k |  -.0295122   .0105961    -2.79   0.005    -.0502966   -.0087277
                      |
                _cons |   3.883482   .1122207    34.61   0.000     3.663358    4.103605
---------------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF  
-------------+----------------------
    milk120k |     48.58    0.020585
  c.milk120k#|
  c.milk120k |     48.58    0.020585
-------------+----------------------
    Mean VIF |     48.58

. vce, corr

Correlation matrix of coefficients of regress model

             |           c.m~120k#          
        e(V) | milk120k  c.m~120k     _cons 
-------------+------------------------------
    milk120k |   1.0000                     
  c.milk120k#|                              
  c.milk120k |  -0.9897    1.0000           
       _cons |  -0.9872    0.9559    1.0000 
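
The VIF of 48.58 reflects how strongly milk120k is correlated with its own square. As a hedged sketch (arithmetic only, not part of the original session), the implied correlation can be recovered from the 1/VIF value printed by estat vif, assuming the two-predictor identity VIF = 1/(1 - r^2):

```stata
* Sketch only: with two predictors, VIF = 1/(1 - r^2), so r = sqrt(1 - 1/VIF)
* 0.020585 is the 1/VIF value reported by estat vif above
display sqrt(1 - 0.020585)
```

The result is about 0.99, which agrees in magnitude with the -0.9897 correlation between the two slope coefficients reported by vce, corr.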

. 
. * redoing the analysis with milk120k centred at its mean
. summ milk120k

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    milk120k |      1536    3.215096    .6981316     1.1102     5.6303

. gen m120k_ct=(milk120k-r(mean)) /* r(mean) is the stored result left by the preceding summ command; it holds the mean of milk120k */
(38 missing values generated)

. reg ln_cf c.m120k_ct##c.m120k_ct

      Source |       SS       df       MS              Number of obs =    1525
-------------+------------------------------           F(  2,  1522) =    4.78
       Model |  .709133424     2  .354566712           Prob > F      =  0.0085
    Residual |  112.944101  1522  .074207688           R-squared     =  0.0062
-------------+------------------------------           Adj R-squared =  0.0049
       Total |  113.653234  1524  .074575613           Root MSE      =  .27241

---------------------------------------------------------------------------------------
                ln_cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
             m120k_ct |   .0159242   .0100484     1.58   0.113    -.0037859    .0356344
                      |
c.m120k_ct#c.m120k_ct |  -.0295122   .0105961    -2.79   0.005    -.0502966   -.0087277
                      |
                _cons |   4.239742   .0086679   489.13   0.000      4.22274    4.256745
---------------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF  
-------------+----------------------
    m120k_ct |      1.01    0.992010
  c.m120k_ct#|
  c.m120k_ct |      1.01    0.992010
-------------+----------------------
    Mean VIF |      1.01

. vce, corr

Correlation matrix of coefficients of regress model

             |           c.m120~t#          
        e(V) | m120k_ct  c.m120~t     _cons 
-------------+------------------------------
    m120k_ct |   1.0000                     
  c.m120k_ct#|                              
  c.m120k_ct |  -0.0894    1.0000           
       _cons |   0.0494   -0.5936    1.0000 
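
Centring changes the linear coefficient and the constant but leaves the quadratic term and the fit unchanged. As a hedged check (arithmetic only, using coefficients copied from the uncentred fit and the mean 3.215096 from summ; not part of the original session), the centred estimates can be reproduced by hand:

```stata
* Sketch only: b1 = .2056932, b2 = -.0295122, a = 3.883482 (uncentred fit),
* c = 3.215096 (mean of milk120k reported by summ)

* linear coefficient after centring: b1 + 2*b2*c
display .2056932 + 2*(-.0295122)*3.215096

* constant after centring: a + b1*c + b2*c^2
display 3.883482 + .2056932*3.215096 + (-.0295122)*3.215096^2

* milk yield (thousand litres) at which the fitted quadratic peaks: -b1/(2*b2)
display -.2056932/(2*(-.0295122))
```

The first two values come out at roughly 0.0159 and 4.2397, in line with the centred regression table, and the third places the turning point of the fitted curve near 3.5 thousand litres.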

. 
. capture drop stdres

. predict stdres, rstandard
(49 missing values generated)

. twoway (scatter stdres m120k_ct) (lowess stdres m120k_ct)       

. 
. * Box-Cox transformation of the predictor only (model(rhs))
. boxcox ln_cf milk120k , model(rhs)

Fitting full model

Iteration 0:   log likelihood = -183.07949  (not concave)
Iteration 1:   log likelihood = -180.48708  
Iteration 2:   log likelihood = -180.32784  
Iteration 3:   log likelihood =  -180.1167  
Iteration 4:   log likelihood = -179.88238  
Iteration 5:   log likelihood = -179.86958  
Iteration 6:   log likelihood = -179.86958  
(38 missing values generated)
(38 missing values generated)
(38 missing values generated)

                                                  Number of obs   =       1525
                                                  LR chi2(2)      =       8.21
Log likelihood = -179.86958                       Prob > chi2     =      0.016
 
------------------------------------------------------------------------------
       ln_cf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     /lambda |  -3.647194   1.174882    -3.10   0.002     -5.94992   -1.344468
------------------------------------------------------------------------------
 
Estimates of scale-variant parameters
----------------------------
             |      Coef.
-------------+--------------
Notrans      |
       _cons |    3.60132
-------------+--------------
Trans        |
    milk120k |   2.329791
-------------+--------------
      /sigma |   .2722618
----------------------------

---------------------------------------------------------
   Test         Restricted     LR statistic      P-value
    H0:       log likelihood       chi2       Prob > chi2
---------------------------------------------------------
lambda = -1      -181.5923         3.45           0.063
lambda =  0     -182.42186         5.10           0.024
lambda =  1     -183.07949         6.42           0.011
---------------------------------------------------------
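
The likelihood-ratio tests in the table compare the full model (log likelihood -179.86958) with fits at fixed values of lambda. As a hedged sketch (arithmetic only, not part of the original session), the lambda = -1 row can be reproduced from the printed log likelihoods:

```stata
* LR statistic for H0: lambda = -1, from the log likelihoods printed above
display 2*(-179.86958 - (-181.5923))

* p-value from a chi-squared distribution with 1 degree of freedom
display chi2tail(1, 2*(-179.86958 - (-181.5923)))
```

These should come out close to the 3.45 and 0.063 shown in the table. Since lambda = -1 is not rejected at the 5% level, a reciprocal transformation of milk120k is a defensible simpler choice than the estimated lambda of about -3.6.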

. 
end of do-file

. do "C:\Users\javier\AppData\Local\Temp\STD01000000.tmp"

. gen inv_m120k=1/milk120k
(38 missing values generated)

. 
. reg ln_cf inv_m120k 

      Source |       SS       df       MS              Number of obs =    1525
-------------+------------------------------           F(  1,  1523) =    4.77
       Model |  .354674587     1  .354674587           Prob > F      =  0.0292
    Residual |   113.29856  1523    .0743917           R-squared     =  0.0031
-------------+------------------------------           Adj R-squared =  0.0025
       Total |  113.653234  1524  .074575613           Root MSE      =  .27275

------------------------------------------------------------------------------
       ln_cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   inv_m120k |   -.190606   .0872939    -2.18   0.029     -.361835   -.0193771
       _cons |   4.287814   .0294011   145.84   0.000     4.230143    4.345485
------------------------------------------------------------------------------

. capture drop stdres

. predict stdres, rstandard
(49 missing values generated)

. twoway (scatter stdres milk120k) (lowess stdres milk120k)       

. 
end of do-file

. do "C:\Users\javier\AppData\Local\Temp\STD01000000.tmp"

. fp <milk120k>,  scale center replace: reg ln_cf <milk120k> 
(fitting 44 models)
(....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%)

Fractional polynomial comparisons:
-------------------------------------------------------------------------------
    milk120k |   df    Deviance  Res. s.d.   Dev. dif.   P(*)   Powers
-------------+-----------------------------------------------------------------
     omitted |    0    367.951      0.273     10.533    0.033               
      linear |    1    366.159      0.273      8.741    0.033   1           
       m = 1 |    2    361.396      0.273      3.978    0.138   -2          
       m = 2 |    4    357.418      0.272      0.000       --   -2 3        
-------------------------------------------------------------------------------
(*) P = sig. level of model with m = 2 based on F with 1520 denominator dof.

      Source |       SS       df       MS              Number of obs =    1525
-------------+------------------------------           F(  2,  1522) =    5.27
       Model |  .782289941     2  .391144971           Prob > F      =  0.0052
    Residual |  112.870944  1522  .074159622           R-squared     =  0.0069
-------------+------------------------------           Adj R-squared =  0.0056
       Total |  113.653234  1524  .074575613           Root MSE      =  .27232

------------------------------------------------------------------------------
       ln_cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  milk120k_1 |  -.5301266   .1654544    -3.20   0.001    -.8546694   -.2055839
  milk120k_2 |  -.0008516   .0004271    -1.99   0.046    -.0016894   -.0000138
       _cons |   4.238434    .008295   510.96   0.000     4.222163    4.254704
------------------------------------------------------------------------------
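
The "Dev. dif." column in the fractional polynomial comparison is each model's deviance minus that of the m = 2 model. As a hedged check (arithmetic only, and assuming the deviance reported by fp is -2 times the log likelihood; not part of the original session):

```stata
* deviance differences relative to the m = 2 model (deviance 357.418)
display 367.951 - 357.418
display 366.159 - 357.418
display 361.396 - 357.418

* -2 * log likelihood of the lambda = 1 (linear) Box-Cox fit reported earlier
display -2*(-183.07949)
```

The first three values reproduce the 10.533, 8.741 and 3.978 in the table, and the last is about 366.159, which appears to match the deviance of the linear model.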

. fp plot, r(none) ytitle(Predicted Ln(cf))

. 
end of do-file

. do "C:\Users\javier\AppData\Local\Temp\STD01000000.tmp"

. reg ln_cf milk120k_1 milk120k_2  

      Source |       SS       df       MS              Number of obs =    1525
-------------+------------------------------           F(  2,  1522) =    5.27
       Model |  .782289941     2  .391144971           Prob > F      =  0.0052
    Residual |  112.870944  1522  .074159622           R-squared     =  0.0069
-------------+------------------------------           Adj R-squared =  0.0056
       Total |  113.653234  1524  .074575613           Root MSE      =  .27232

------------------------------------------------------------------------------
       ln_cf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  milk120k_1 |  -.5301266   .1654544    -3.20   0.001    -.8546694   -.2055839
  milk120k_2 |  -.0008516   .0004271    -1.99   0.046    -.0016894   -.0000138
       _cons |   4.238434    .008295   510.96   0.000     4.222163    4.254704
------------------------------------------------------------------------------

. capture drop stdres

. predict stdres, rstandard
(49 missing values generated)

. twoway (scatter stdres milk120k) (lowess stdres milk120k)       

. 
end of do-file

. exit, clear
