CAFRI Labs: Shrubland 1.0: The Gang's All Here

Mike Mahoney

Evaluation Results

Last iteration of these models: 2022-01-12

Change Summary

Added a neural net to the ensemble
Retrained the logistic ensemble model to include the neural net's predictions
Apart from the new model and the retrained ensemble, identical to version 0.0.1

Test Data

ROC Curves

Optimal Coordinates

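Probability thresholds were chosen on the validation set (below), either to balance specificity and sensitivity ("Optimize Both") or to reach a target specificity, and are applied unchanged to the test set here.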

Model	Probability Threshold	Specificity	Sensitivity
Optimize Both
Logistic Ensemble	0.493	0.786	0.828
Neural Net	0.441	0.748	0.847
LightGBM	0.484	0.755	0.842
Random Forest	0.517	0.746	0.752
90% Specificity
Logistic Ensemble	0.759	0.909	0.637
Neural Net	0.754	0.903	0.597
LightGBM	0.730	0.910	0.622
Random Forest	0.604	0.907	0.444
95% Specificity
Logistic Ensemble	0.854	0.960	0.441
Neural Net	0.875	0.949	0.382
LightGBM	0.832	0.961	0.434
Random Forest	0.637	0.950	0.331
97.5% Specificity
Logistic Ensemble	0.891	0.983	0.277
Neural Net	0.935	0.977	0.247
LightGBM	0.882	0.984	0.298
Random Forest	0.666	0.980	0.219
99% Specificity
Logistic Ensemble	0.910	0.992	0.138
Neural Net	0.973	0.993	0.097
LightGBM	0.922	0.992	0.173
Random Forest	0.691	0.993	0.110
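
As a rough illustration of how a specificity-targeted threshold can be chosen on one data set and then carried over to another, the Python sketch below scans candidate thresholds and keeps the smallest one that reaches the target. It is not the code used for this report: the function names and the toy scores and labels are made up for the example, and the report does not state how ties or the "Optimize Both" criterion were handled.

```python
def specificity(scores, labels, threshold):
    """Share of true negatives (label 0) scored strictly below the threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return sum(s < threshold for s in negatives) / len(negatives)


def sensitivity(scores, labels, threshold):
    """Share of true positives (label 1) scored at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def threshold_for_specificity(scores, labels, target):
    """Smallest observed score that, used as a threshold, reaches the target."""
    for t in sorted(set(scores)):
        if specificity(scores, labels, t) >= target:
            return t
    return float("inf")  # target unreachable: classify everything as negative


# Hypothetical validation and test data, invented for illustration only.
val_scores = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95]
val_labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
test_scores = [0.10, 0.30, 0.65, 0.45, 0.85, 0.50, 0.75, 0.15]
test_labels = [0, 0, 0, 1, 1, 1, 1, 0]

# Choose the threshold on validation data, then reuse it on the test data.
chosen = threshold_for_specificity(val_scores, val_labels, target=0.8)
test_spec = specificity(test_scores, test_labels, chosen)
test_sens = sensitivity(test_scores, test_labels, chosen)
print(chosen, test_spec, test_sens)
```

Because the threshold is fixed before the test data are scored, the test-set specificity can land slightly above or below the target, which is the pattern visible in the table above.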

Confusion Matrices

Optimize Both

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1315  286
         1  357 1376
                                          
               Accuracy : 0.8071          
                 95% CI : (0.7933, 0.8204)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.6143          
                                          
 Mcnemar's Test P-Value : 0.005771        
                                          
            Sensitivity : 0.8279          
            Specificity : 0.7865          
         Pos Pred Value : 0.7940          
         Neg Pred Value : 0.8214          
             Prevalence : 0.4985          
         Detection Rate : 0.4127          
   Detection Prevalence : 0.5198          
      Balanced Accuracy : 0.8072          
                                          
       'Positive' Class : 1
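
The statistics printed under each matrix follow directly from the four cell counts. The Python sketch below recomputes several of them for the Logistic Ensemble "Optimize Both" matrix above. The function name and dictionary keys are illustrative choices, and the formulas are the standard definitions rather than a copy of the R routine that produced the printout.

```python
def confusion_metrics(tn, fn, fp, tp):
    """Summary statistics for a 2x2 confusion matrix with positive class 1."""
    n = tn + fn + fp + tp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    prevalence = (tp + fn) / n     # share of reference positives
    # Agreement expected by chance, from the row and column totals.
    expected = ((tn + fn) * (tn + fp) + (tp + fp) * (tp + fn)) / n**2
    return {
        "accuracy": accuracy,
        "no_information_rate": max(prevalence, 1 - prevalence),
        "kappa": (accuracy - expected) / (1 - expected),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": prevalence,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Rows are predictions, columns are reference labels, as in the printout:
# prediction 0 -> (1315, 286), prediction 1 -> (357, 1376).
metrics = confusion_metrics(tn=1315, fn=286, fp=357, tp=1376)
for name, value in metrics.items():
    print(f"{name:>22}: {value:.4f}")
```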

Neural Net

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1251  254
         1  421 1408
                                          
               Accuracy : 0.7975          
                 95% CI : (0.7835, 0.8111)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.5952          
                                          
 Mcnemar's Test P-Value : 0.0000000001666 
                                          
            Sensitivity : 0.8472          
            Specificity : 0.7482          
         Pos Pred Value : 0.7698          
         Neg Pred Value : 0.8312          
             Prevalence : 0.4985          
         Detection Rate : 0.4223          
   Detection Prevalence : 0.5486          
      Balanced Accuracy : 0.7977          
                                          
       'Positive' Class : 1

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1262  262
         1  410 1400
                                          
               Accuracy : 0.7984          
                 95% CI : (0.7844, 0.8119)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.597           
                                          
 Mcnemar's Test P-Value : 0.00000001423   
                                          
            Sensitivity : 0.8424          
            Specificity : 0.7548          
         Pos Pred Value : 0.7735          
         Neg Pred Value : 0.8281          
             Prevalence : 0.4985          
         Detection Rate : 0.4199          
   Detection Prevalence : 0.5429          
      Balanced Accuracy : 0.7986          
                                          
       'Positive' Class : 1

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1248  412
         1  424 1250
                                          
               Accuracy : 0.7493          
                 95% CI : (0.7342, 0.7639)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.4985          
                                          
 Mcnemar's Test P-Value : 0.7036          
                                          
            Sensitivity : 0.7521          
            Specificity : 0.7464          
         Pos Pred Value : 0.7467          
         Neg Pred Value : 0.7518          
             Prevalence : 0.4985          
         Detection Rate : 0.3749          
   Detection Prevalence : 0.5021          
      Balanced Accuracy : 0.7493          
                                          
       'Positive' Class : 1

90% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1520  603
         1  152 1059
                                         
               Accuracy : 0.7735         
                 95% CI : (0.759, 0.7877)
    No Information Rate : 0.5015         
    P-Value [Acc > NIR] : < 2.2e-16      
                                         
                  Kappa : 0.5467         
                                         
 Mcnemar's Test P-Value : < 2.2e-16      
                                         
            Sensitivity : 0.6372         
            Specificity : 0.9091         
         Pos Pred Value : 0.8745         
         Neg Pred Value : 0.7160         
             Prevalence : 0.4985         
         Detection Rate : 0.3176         
   Detection Prevalence : 0.3632         
      Balanced Accuracy : 0.7731         
                                         
       'Positive' Class : 1

Neural Net

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1509  670
         1  163  992
                                          
               Accuracy : 0.7501          
                 95% CI : (0.7351, 0.7648)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.4998          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.5969          
            Specificity : 0.9025          
         Pos Pred Value : 0.8589          
         Neg Pred Value : 0.6925          
             Prevalence : 0.4985          
         Detection Rate : 0.2975          
   Detection Prevalence : 0.3464          
      Balanced Accuracy : 0.7497          
                                          
       'Positive' Class : 1

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1521  628
         1  151 1034
                                          
               Accuracy : 0.7663          
                 95% CI : (0.7516, 0.7806)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.5323          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.6221          
            Specificity : 0.9097          
         Pos Pred Value : 0.8726          
         Neg Pred Value : 0.7078          
             Prevalence : 0.4985          
         Detection Rate : 0.3101          
   Detection Prevalence : 0.3554          
      Balanced Accuracy : 0.7659          
                                          
       'Positive' Class : 1

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1516  924
         1  156  738
                                          
               Accuracy : 0.6761          
                 95% CI : (0.6599, 0.6919)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.3512          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.4440          
            Specificity : 0.9067          
         Pos Pred Value : 0.8255          
         Neg Pred Value : 0.6213          
             Prevalence : 0.4985          
         Detection Rate : 0.2214          
   Detection Prevalence : 0.2681          
      Balanced Accuracy : 0.6754          
                                          
       'Positive' Class : 1

95% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1605  929
         1   67  733
                                          
               Accuracy : 0.7013          
                 95% CI : (0.6854, 0.7168)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.4016          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.4410          
            Specificity : 0.9599          
         Pos Pred Value : 0.9162          
         Neg Pred Value : 0.6334          
             Prevalence : 0.4985          
         Detection Rate : 0.2199          
   Detection Prevalence : 0.2400          
      Balanced Accuracy : 0.7005          
                                          
       'Positive' Class : 1

Neural Net

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1587 1027
         1   85  635
                                          
               Accuracy : 0.6665          
                 95% CI : (0.6502, 0.6825)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.3318          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.3821          
            Specificity : 0.9492          
         Pos Pred Value : 0.8819          
         Neg Pred Value : 0.6071          
             Prevalence : 0.4985          
         Detection Rate : 0.1905          
   Detection Prevalence : 0.2160          
      Balanced Accuracy : 0.6656          
                                          
       'Positive' Class : 1

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1607  940
         1   65  722
                                          
               Accuracy : 0.6986          
                 95% CI : (0.6827, 0.7141)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.3962          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.4344          
            Specificity : 0.9611          
         Pos Pred Value : 0.9174          
         Neg Pred Value : 0.6309          
             Prevalence : 0.4985          
         Detection Rate : 0.2166          
   Detection Prevalence : 0.2361          
      Balanced Accuracy : 0.6978          
                                          
       'Positive' Class : 1

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1589 1112
         1   83  550
                                         
               Accuracy : 0.6416         
                 95% CI : (0.625, 0.6579)
    No Information Rate : 0.5015         
    P-Value [Acc > NIR] : < 2.2e-16      
                                         
                  Kappa : 0.2818         
                                         
 Mcnemar's Test P-Value : < 2.2e-16      
                                         
            Sensitivity : 0.3309         
            Specificity : 0.9504         
         Pos Pred Value : 0.8689         
         Neg Pred Value : 0.5883         
             Prevalence : 0.4985         
         Detection Rate : 0.1650         
   Detection Prevalence : 0.1899         
      Balanced Accuracy : 0.6406         
                                         
       'Positive' Class : 1

97.5% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1644 1202
         1   28  460
                                          
               Accuracy : 0.6311          
                 95% CI : (0.6144, 0.6475)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.2606          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.2768          
            Specificity : 0.9833          
         Pos Pred Value : 0.9426          
         Neg Pred Value : 0.5777          
             Prevalence : 0.4985          
         Detection Rate : 0.1380          
   Detection Prevalence : 0.1464          
      Balanced Accuracy : 0.6300          
                                          
       'Positive' Class : 1

Neural Net

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1633 1251
         1   39  411
                                          
               Accuracy : 0.6131          
                 95% CI : (0.5963, 0.6297)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.2245          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.2473          
            Specificity : 0.9767          
         Pos Pred Value : 0.9133          
         Neg Pred Value : 0.5662          
             Prevalence : 0.4985          
         Detection Rate : 0.1233          
   Detection Prevalence : 0.1350          
      Balanced Accuracy : 0.6120          
                                          
       'Positive' Class : 1

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1645 1166
         1   27  496
                                          
               Accuracy : 0.6422          
                 95% CI : (0.6256, 0.6585)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.2829          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.2984          
            Specificity : 0.9839          
         Pos Pred Value : 0.9484          
         Neg Pred Value : 0.5852          
             Prevalence : 0.4985          
         Detection Rate : 0.1488          
   Detection Prevalence : 0.1569          
      Balanced Accuracy : 0.6411          
                                          
       'Positive' Class : 1

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1638 1298
         1   34  364
                                          
               Accuracy : 0.6005          
                 95% CI : (0.5836, 0.6172)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.1991          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.2190          
            Specificity : 0.9797          
         Pos Pred Value : 0.9146          
         Neg Pred Value : 0.5579          
             Prevalence : 0.4985          
         Detection Rate : 0.1092          
   Detection Prevalence : 0.1194          
      Balanced Accuracy : 0.5993          
                                          
       'Positive' Class : 1

99% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1659 1433
         1   13  229
                                             
               Accuracy : 0.5663             
                 95% CI : (0.5493, 0.5832)   
    No Information Rate : 0.5015             
    P-Value [Acc > NIR] : 0.00000000000003841
                                             
                  Kappa : 0.1303             
                                             
 Mcnemar's Test P-Value : < 2.2e-16          
                                             
            Sensitivity : 0.13779            
            Specificity : 0.99222            
         Pos Pred Value : 0.94628            
         Neg Pred Value : 0.53655            
             Prevalence : 0.49850            
         Detection Rate : 0.06869            
   Detection Prevalence : 0.07259            
      Balanced Accuracy : 0.56501            
                                             
       'Positive' Class : 1

Neural Net

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1661 1500
         1   11  162
                                          
               Accuracy : 0.5468          
                 95% CI : (0.5297, 0.5638)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : 0.000000091     
                                          
                  Kappa : 0.0911          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.09747         
            Specificity : 0.99342         
         Pos Pred Value : 0.93642         
         Neg Pred Value : 0.52547         
             Prevalence : 0.49850         
         Detection Rate : 0.04859         
   Detection Prevalence : 0.05189         
      Balanced Accuracy : 0.54545         
                                          
       'Positive' Class : 1

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1658 1374
         1   14  288
                                          
               Accuracy : 0.5837          
                 95% CI : (0.5667, 0.6005)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.1653          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.17329         
            Specificity : 0.99163         
         Pos Pred Value : 0.95364         
         Neg Pred Value : 0.54683         
             Prevalence : 0.49850         
         Detection Rate : 0.08638         
   Detection Prevalence : 0.09058         
      Balanced Accuracy : 0.58246         
                                          
       'Positive' Class : 1

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1660 1479
         1   12  183
                                          
               Accuracy : 0.5528          
                 95% CI : (0.5357, 0.5698)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : 0.000000001697  
                                          
                  Kappa : 0.1032          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.11011         
            Specificity : 0.99282         
         Pos Pred Value : 0.93846         
         Neg Pred Value : 0.52883         
             Prevalence : 0.49850         
         Detection Rate : 0.05489         
   Detection Prevalence : 0.05849         
      Balanced Accuracy : 0.55147         
                                          
       'Positive' Class : 1

Validation Data

ROC Curves

Optimal Coordinates

Model	Probability Threshold	Specificity	Sensitivity
Optimize Both
Logistic Ensemble	0.493	0.782	0.824
Neural Net	0.441	0.732	0.837
LightGBM	0.484	0.746	0.836
Random Forest	0.517	0.723	0.753
90% Specificity
Logistic Ensemble	0.759	0.900	0.612
Neural Net	0.754	0.900	0.574
LightGBM	0.730	0.900	0.604
Random Forest	0.604	0.900	0.449
95% Specificity
Logistic Ensemble	0.854	0.950	0.417
Neural Net	0.875	0.950	0.384
LightGBM	0.832	0.950	0.417
Random Forest	0.637	0.950	0.333
97.5% Specificity
Logistic Ensemble	0.891	0.975	0.267
Neural Net	0.935	0.975	0.245
LightGBM	0.882	0.975	0.301
Random Forest	0.666	0.975	0.216
99% Specificity
Logistic Ensemble	0.910	0.990	0.155
Neural Net	0.973	0.990	0.095
LightGBM	0.922	0.990	0.186
Random Forest	0.691	0.990	0.112
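
One way to read the two Optimal Coordinates tables together is to ask how far the test-set specificity drifts from the validation target once the thresholds are reused. The Python sketch below does that arithmetic with the specificity values copied from the Test Data table. The dictionary layout and the short model keys (ensemble, nnet, lgb, rf) are choices made for this example only.

```python
# Test-set specificity at each validation target, copied from the Test Data table.
test_specificity = {
    0.900: {"ensemble": 0.909, "nnet": 0.903, "lgb": 0.910, "rf": 0.907},
    0.950: {"ensemble": 0.960, "nnet": 0.949, "lgb": 0.961, "rf": 0.950},
    0.975: {"ensemble": 0.983, "nnet": 0.977, "lgb": 0.984, "rf": 0.980},
    0.990: {"ensemble": 0.992, "nnet": 0.993, "lgb": 0.992, "rf": 0.993},
}

# Signed difference between observed test specificity and the target.
drift = {
    (target, model): observed - target
    for target, row in test_specificity.items()
    for model, observed in row.items()
}

# Largest absolute departure from the target across all models and targets.
worst_key = max(drift, key=lambda key: abs(drift[key]))
worst_drift = drift[worst_key]
print(worst_key, round(worst_drift, 3))
```

In these tables every departure is about one percentage point or less, so the validation-chosen thresholds carry over to the test set closely.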

Metadata

Data

16666 observations
- 9999 training
- 3333 validation
- 3334 testing
19 predictors
- tcb, tcw, tcg, nbr, mag, yod, nys_precip, nys_tmax, nys_tmin, nys_aspect, nys_dem, nys_slope, nys_twi, lcsec_X2, lcsec_X3, lcsec_X4, lcsec_X5, lcsec_X8, lcsec_X6
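
The partition sizes listed above amount to a roughly 60/20/20 split. The report does not say how the split was drawn (simple random, stratified, spatially blocked), so the Python sketch below assumes a plain random shuffle purely for illustration. The function name and the seed are hypothetical.

```python
import random


def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle row indices and cut them into train, validation and test sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seed is arbitrary, for repeatability
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # whatever remains goes to testing
    return train, val, test


train_idx, val_idx, test_idx = split_indices(16666, n_train=9999, n_val=3333)
print(len(train_idx), len(val_idx), len(test_idx))
```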

Models

Hyperparameters were tuned with 5-fold cross-validation.
Final models:

Logistic Regression


Call:  glm(formula = shrub ~ ., family = "binomial", data = validation)

Coefficients:
 (Intercept)           tcb           tcw           tcg           nbr  
-3.968665633   0.000235535   0.000145283   0.000065578  -0.000321203  
         mag           yod    nys_precip      nys_tmax      nys_tmin  
-0.000839042   0.000104765   0.000005128   0.082880178  -0.117791981  
  nys_aspect       nys_dem     nys_slope       nys_twi      lcsec_X2  
 0.000106479  -0.000528244  -0.001397095  -0.058290832  -0.093505410  
    lcsec_X3      lcsec_X4      lcsec_X5      lcsec_X8      lcsec_X6  
 0.217207780   0.140594098   0.195246572   0.531440060  11.187723895  
         lgb            rf          nnet  
 3.723719156  -0.317407053   2.203063234  

Degrees of Freedom: 3332 Total (i.e. Null);  3310 Residual
Null Deviance:      4621 
Residual Deviance: 2928     AIC: 2974
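
To make the role of these coefficients concrete, the Python sketch below applies them the way a binomial GLM with a logit link does: a weighted sum of the inputs plus the intercept, passed through the logistic function. It assumes that lgb, rf and nnet are the base models' predicted probabilities and that the other inputs are in the same units used at fitting time, neither of which is stated in the printout. The function name is hypothetical, and treating omitted inputs as zero is only a convenience for the example.

```python
import math

INTERCEPT = -3.968665633

# Coefficients copied from the glm printout above.
COEFFICIENTS = {
    "tcb": 0.000235535, "tcw": 0.000145283, "tcg": 0.000065578,
    "nbr": -0.000321203, "mag": -0.000839042, "yod": 0.000104765,
    "nys_precip": 0.000005128, "nys_tmax": 0.082880178,
    "nys_tmin": -0.117791981, "nys_aspect": 0.000106479,
    "nys_dem": -0.000528244, "nys_slope": -0.001397095,
    "nys_twi": -0.058290832, "lcsec_X2": -0.093505410,
    "lcsec_X3": 0.217207780, "lcsec_X4": 0.140594098,
    "lcsec_X5": 0.195246572, "lcsec_X8": 0.531440060,
    "lcsec_X6": 11.187723895,
    "lgb": 3.723719156, "rf": -0.317407053, "nnet": 2.203063234,
}


def ensemble_probability(row):
    """Logistic-regression prediction; inputs missing from `row` count as 0."""
    logit = INTERCEPT + sum(
        coef * row.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical base-model outputs with every other input left at zero,
# which is unrealistic and serves only to show the direction of each effect.
low = ensemble_probability({"lgb": 0.1, "rf": 0.1, "nnet": 0.1})
high = ensemble_probability({"lgb": 0.9, "rf": 0.9, "nnet": 0.9})
print(low < high)
```

With these coefficients, the LightGBM and neural net predictions push the ensemble probability up strongly, while the random forest term has a small negative weight once the other two are accounted for.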

Neural Net

Model
Model: "sequential"
______________________________________________________________________
 Layer (type)                  Output Shape                Param #    
======================================================================
 dense_features (DenseFeatures  multiple                   0          
 )                                                                    
                                                                      
 dense_5 (Dense)               multiple                    5120       
                                                                      
 dense_4 (Dense)               multiple                    32896      
                                                                      
 dense_3 (Dense)               multiple                    8256       
                                                                      
 dense_2 (Dense)               multiple                    2080       
                                                                      
 dense_1 (Dense)               multiple                    528        
                                                                      
 dropout (Dropout)             multiple                    0          
                                                                      
 dense (Dense)                 multiple                    17         
                                                                      
======================================================================
Total params: 48,897
Trainable params: 48,897
Non-trainable params: 0
______________________________________________________________________
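
The summary lists parameter counts but not layer widths (the output shapes print as "multiple"). Since a fully connected layer has inputs × units weights plus one bias per unit, the widths can be inferred from the counts. The Python sketch below checks one set of widths that reproduces every count in the table, assuming the network receives the 19 predictors as 19 numeric inputs. The widths are an inference from the arithmetic, not something the report states, and the activations and dropout rate remain unknown.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_units + n_units


# Inferred widths: 19 inputs, five hidden layers, one output unit.
widths = [19, 256, 128, 64, 32, 16, 1]

# Parameter count for each consecutive pair of widths,
# in the order dense_5, dense_4, dense_3, dense_2, dense_1, dense.
layer_params = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
total_params = sum(layer_params)

print(layer_params)
print(total_params)
```

The dropout and feature layers contribute no parameters, which is consistent with the zeros in the summary.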

Random Forest

$num.trees
[1] 3000

$mtry
[1] 1

$min.node.size
[1] 6

$replace
[1] TRUE

$sample.fraction
[1] 0.2

$formula
shrub ~ .

LightGBM

$params
$params$learning_rate
[1] 0.01

$params$nrounds
[1] 2500

$params$num_leaves
[1] 14

$params$max_depth
[1] -1

$params$extra_trees
[1] FALSE

$params$min_data_in_leaf
[1] 10

$params$bagging_fraction
[1] 0.5

$params$bagging_freq
[1] 1

$params$feature_fraction
[1] 0.9

$params$min_data_in_bin
[1] 3

$params$lambda_l1
[1] 0

$params$lambda_l2
[1] 0.5

$params$force_col_wise
[1] TRUE
