Shrubland 1.0: The Gang’s All Here

Including a neural net in the shrubland ensemble.

Mike Mahoney

2022-01-15

Evaluation Results

Last iteration of these models: 2022-01-12

Change Summary

Test Data

ROC Curves

Optimal Coordinates

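Probability thresholds were chosen on the validation set (below), either to optimize both sensitivity and specificity or to reach a target level of specificity. The specificity and sensitivity reported here are what those same thresholds achieve on the test data.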

Model              Probability Threshold   Specificity   Sensitivity

Optimize Both
  Linear Ensemble  0.493                   0.786         0.828
  Neural Net       0.441                   0.748         0.847
  LGB              0.484                   0.755         0.842
  RF               0.517                   0.746         0.752

90% Specificity
  Linear Ensemble  0.759                   0.909         0.637
  Neural Net       0.754                   0.903         0.597
  LGB              0.730                   0.910         0.622
  RF               0.604                   0.907         0.444

95% Specificity
  Linear Ensemble  0.854                   0.960         0.441
  Neural Net       0.875                   0.949         0.382
  LGB              0.832                   0.961         0.434
  RF               0.637                   0.950         0.331

97.5% Specificity
  Linear Ensemble  0.891                   0.983         0.277
  Neural Net       0.935                   0.977         0.247
  LGB              0.882                   0.984         0.298
  RF               0.666                   0.980         0.219

99% Specificity
  Linear Ensemble  0.910                   0.992         0.138
  Neural Net       0.973                   0.993         0.097
  LGB              0.922                   0.992         0.173
  RF               0.691                   0.993         0.110
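
Fixing a threshold on validation data and then scoring it on test data can be sketched as below. This is a minimal Python illustration, not the code behind this post: the function names and all scores are hypothetical, and it assumes a pixel is called shrubland when its predicted probability is strictly above the threshold.

```python
from bisect import bisect_right


def specificity(neg_scores, cutoff):
    """Share of true negatives scored at or below the cutoff."""
    ordered = sorted(neg_scores)
    return bisect_right(ordered, cutoff) / len(ordered)


def sensitivity(pos_scores, cutoff):
    """Share of true positives scored strictly above the cutoff."""
    return sum(s > cutoff for s in pos_scores) / len(pos_scores)


def threshold_for_specificity(neg_scores, target):
    """Lowest observed negative score whose specificity reaches the target."""
    for cutoff in sorted(neg_scores):
        if specificity(neg_scores, cutoff) >= target:
            return cutoff
    raise ValueError("target specificity must be at most 1")


# Hypothetical predicted probabilities, split by true class.
val_neg = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90]
val_pos = [0.30, 0.45, 0.65, 0.75, 0.80, 0.85, 0.88, 0.92, 0.95, 0.99]
test_neg = [0.02, 0.15, 0.25, 0.33, 0.45, 0.50, 0.58, 0.66, 0.72, 0.81]

# Choose the cutoff on validation data only, then reuse it unchanged.
cutoff = threshold_for_specificity(val_neg, 0.90)
print(cutoff, specificity(val_neg, cutoff), sensitivity(val_pos, cutoff))
print(specificity(test_neg, cutoff))
```

Because the cutoff is tuned on one data set and reused on another, the test specificity only approximates the target, which is the pattern visible in the table above.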

Confusion Matrices

Optimize Both

Logistic Ensemble

Confusion Matrix and Statistics

Prediction    0    1
         0 1315  286
         1  357 1376
               Accuracy : 0.8071          
                 95% CI : (0.7933, 0.8204)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.6143          
 Mcnemar's Test P-Value : 0.005771        
            Sensitivity : 0.8279          
            Specificity : 0.7865          
         Pos Pred Value : 0.7940          
         Neg Pred Value : 0.8214          
             Prevalence : 0.4985          
         Detection Rate : 0.4127          
   Detection Prevalence : 0.5198          
      Balanced Accuracy : 0.8072          
       'Positive' Class : 1               
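
Most of the statistics in these printouts follow directly from the four cell counts. As a check, here is a short Python sketch that recomputes them for the matrix above, reading rows as predictions and columns as reference classes, with 1 as the positive class. The function name is illustrative only.

```python
def confusion_stats(tn, fn, fp, tp):
    """Summary statistics for a 2x2 confusion matrix (positive class = 1)."""
    n = tn + fn + fp + tp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / n
    prevalence = (tp + fn) / n
    # Agreement expected by chance, from the row and column margins.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return {
        "accuracy": acc,
        "no_information_rate": max(prevalence, 1 - prevalence),
        "kappa": (acc - chance) / (1 - chance),
        "sensitivity": sens,
        "specificity": spec,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": prevalence,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sens + spec) / 2,
    }


# Logistic Ensemble, "Optimize Both" threshold, test data.
stats = confusion_stats(tn=1315, fn=286, fp=357, tp=1376)
for name, value in stats.items():
    print(f"{name:>22}: {value:.4f}")
```

The confidence interval and the two p-values need distributional calculations and are not reproduced here.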

Neural Net

Confusion Matrix and Statistics

Prediction    0    1
         0 1251  254
         1  421 1408
               Accuracy : 0.7975          
                 95% CI : (0.7835, 0.8111)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.5952          
 Mcnemar's Test P-Value : 0.0000000001666 
            Sensitivity : 0.8472          
            Specificity : 0.7482          
         Pos Pred Value : 0.7698          
         Neg Pred Value : 0.8312          
             Prevalence : 0.4985          
         Detection Rate : 0.4223          
   Detection Prevalence : 0.5486          
      Balanced Accuracy : 0.7977          
       'Positive' Class : 1

LGB

Confusion Matrix and Statistics

Prediction    0    1
         0 1262  262
         1  410 1400
               Accuracy : 0.7984          
                 95% CI : (0.7844, 0.8119)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.597           
 Mcnemar's Test P-Value : 0.00000001423   
            Sensitivity : 0.8424          
            Specificity : 0.7548          
         Pos Pred Value : 0.7735          
         Neg Pred Value : 0.8281          
             Prevalence : 0.4985          
         Detection Rate : 0.4199          
   Detection Prevalence : 0.5429          
      Balanced Accuracy : 0.7986          
       'Positive' Class : 1               

Random Forest

Confusion Matrix and Statistics

Prediction    0    1
         0 1248  412
         1  424 1250
               Accuracy : 0.7493          
                 95% CI : (0.7342, 0.7639)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : <2e-16          
                  Kappa : 0.4985          
 Mcnemar's Test P-Value : 0.7036          
            Sensitivity : 0.7521          
            Specificity : 0.7464          
         Pos Pred Value : 0.7467          
         Neg Pred Value : 0.7518          
             Prevalence : 0.4985          
         Detection Rate : 0.3749          
   Detection Prevalence : 0.5021          
      Balanced Accuracy : 0.7493          
       'Positive' Class : 1               
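
The McNemar p-value asks whether the two kinds of error (false positives and false negatives) are equally common. The Python sketch below assumes the printed value is the continuity-corrected chi-squared version of the test; under that assumption it reproduces the p-values reported for the Random Forest and the Logistic Ensemble at the "Optimize Both" threshold.

```python
import math


def mcnemar_p(false_neg, false_pos):
    """Continuity-corrected McNemar test on the off-diagonal counts."""
    stat = (abs(false_neg - false_pos) - 1) ** 2 / (false_neg + false_pos)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


p_rf = mcnemar_p(false_neg=412, false_pos=424)        # Random Forest
p_ensemble = mcnemar_p(false_neg=286, false_pos=357)  # Logistic Ensemble
print(round(p_rf, 4), round(p_ensemble, 6))
```

The Random Forest makes nearly as many false negatives as false positives at this threshold, so there is no evidence of imbalance. At the high-specificity thresholds the errors are lopsided by design, which is why that p-value collapses to < 2.2e-16 in the later matrices.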

90% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

Prediction    0    1
         0 1520  603
         1  152 1059
               Accuracy : 0.7735         
                 95% CI : (0.759, 0.7877)
    No Information Rate : 0.5015         
    P-Value [Acc > NIR] : < 2.2e-16      
                  Kappa : 0.5467         
 Mcnemar's Test P-Value : < 2.2e-16      
            Sensitivity : 0.6372         
            Specificity : 0.9091         
         Pos Pred Value : 0.8745         
         Neg Pred Value : 0.7160         
             Prevalence : 0.4985         
         Detection Rate : 0.3176         
   Detection Prevalence : 0.3632         
      Balanced Accuracy : 0.7731         
       'Positive' Class : 1              
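
Pos Pred Value depends on prevalence as well as on sensitivity and specificity, and the test data here are close to balanced (prevalence 0.4985). The Python sketch below recomputes the value above from those three quantities, then shows how it would change at a different prevalence. The 10% figure is purely hypothetical and is not an estimate of how common shrubland is.

```python
def pos_pred_value(sens, spec, prevalence):
    """Share of positive predictions that are correct, via Bayes' rule."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Logistic Ensemble at the 90% specificity threshold, test data.
sens = 1059 / 1662
spec = 1520 / 1672

ppv_test = pos_pred_value(sens, spec, prevalence=1662 / 3334)
ppv_rare = pos_pred_value(sens, spec, prevalence=0.10)  # hypothetical
print(round(ppv_test, 4), round(ppv_rare, 4))
```

With the same sensitivity and specificity, a rarer positive class gives a lower Pos Pred Value, so the values in these matrices should be read as specific to this roughly balanced sample.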

Neural Net

Confusion Matrix and Statistics

Prediction    0    1
         0 1509  670
         1  163  992
               Accuracy : 0.7501          
                 95% CI : (0.7351, 0.7648)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.4998          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.5969          
            Specificity : 0.9025          
         Pos Pred Value : 0.8589          
         Neg Pred Value : 0.6925          
             Prevalence : 0.4985          
         Detection Rate : 0.2975          
   Detection Prevalence : 0.3464          
      Balanced Accuracy : 0.7497          
       'Positive' Class : 1

LGB

Confusion Matrix and Statistics

Prediction    0    1
         0 1521  628
         1  151 1034
               Accuracy : 0.7663          
                 95% CI : (0.7516, 0.7806)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.5323          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.6221          
            Specificity : 0.9097          
         Pos Pred Value : 0.8726          
         Neg Pred Value : 0.7078          
             Prevalence : 0.4985          
         Detection Rate : 0.3101          
   Detection Prevalence : 0.3554          
      Balanced Accuracy : 0.7659          
       'Positive' Class : 1               

Random Forest

Confusion Matrix and Statistics

Prediction    0    1
         0 1516  924
         1  156  738
               Accuracy : 0.6761          
                 95% CI : (0.6599, 0.6919)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.3512          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.4440          
            Specificity : 0.9067          
         Pos Pred Value : 0.8255          
         Neg Pred Value : 0.6213          
             Prevalence : 0.4985          
         Detection Rate : 0.2214          
   Detection Prevalence : 0.2681          
      Balanced Accuracy : 0.6754          
       'Positive' Class : 1               

95% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

Prediction    0    1
         0 1605  929
         1   67  733
               Accuracy : 0.7013          
                 95% CI : (0.6854, 0.7168)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.4016          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.4410          
            Specificity : 0.9599          
         Pos Pred Value : 0.9162          
         Neg Pred Value : 0.6334          
             Prevalence : 0.4985          
         Detection Rate : 0.2199          
   Detection Prevalence : 0.2400          
      Balanced Accuracy : 0.7005          
       'Positive' Class : 1               

Neural Net

Confusion Matrix and Statistics

Prediction    0    1
         0 1587 1027
         1   85  635
               Accuracy : 0.6665          
                 95% CI : (0.6502, 0.6825)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.3318          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.3821          
            Specificity : 0.9492          
         Pos Pred Value : 0.8819          
         Neg Pred Value : 0.6071          
             Prevalence : 0.4985          
         Detection Rate : 0.1905          
   Detection Prevalence : 0.2160          
      Balanced Accuracy : 0.6656          
       'Positive' Class : 1

LGB

Confusion Matrix and Statistics

Prediction    0    1
         0 1607  940
         1   65  722
               Accuracy : 0.6986          
                 95% CI : (0.6827, 0.7141)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.3962          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.4344          
            Specificity : 0.9611          
         Pos Pred Value : 0.9174          
         Neg Pred Value : 0.6309          
             Prevalence : 0.4985          
         Detection Rate : 0.2166          
   Detection Prevalence : 0.2361          
      Balanced Accuracy : 0.6978          
       'Positive' Class : 1               

Random Forest

Confusion Matrix and Statistics

Prediction    0    1
         0 1589 1112
         1   83  550
               Accuracy : 0.6416         
                 95% CI : (0.625, 0.6579)
    No Information Rate : 0.5015         
    P-Value [Acc > NIR] : < 2.2e-16      
                  Kappa : 0.2818         
 Mcnemar's Test P-Value : < 2.2e-16      
            Sensitivity : 0.3309         
            Specificity : 0.9504         
         Pos Pred Value : 0.8689         
         Neg Pred Value : 0.5883         
             Prevalence : 0.4985         
         Detection Rate : 0.1650         
   Detection Prevalence : 0.1899         
      Balanced Accuracy : 0.6406         
       'Positive' Class : 1              

97.5% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

Prediction    0    1
         0 1644 1202
         1   28  460
               Accuracy : 0.6311          
                 95% CI : (0.6144, 0.6475)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.2606          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.2768          
            Specificity : 0.9833          
         Pos Pred Value : 0.9426          
         Neg Pred Value : 0.5777          
             Prevalence : 0.4985          
         Detection Rate : 0.1380          
   Detection Prevalence : 0.1464          
      Balanced Accuracy : 0.6300          
       'Positive' Class : 1               

Neural Net

Confusion Matrix and Statistics

Prediction    0    1
         0 1633 1251
         1   39  411
               Accuracy : 0.6131          
                 95% CI : (0.5963, 0.6297)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.2245          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.2473          
            Specificity : 0.9767          
         Pos Pred Value : 0.9133          
         Neg Pred Value : 0.5662          
             Prevalence : 0.4985          
         Detection Rate : 0.1233          
   Detection Prevalence : 0.1350          
      Balanced Accuracy : 0.6120          
       'Positive' Class : 1

LGB

Confusion Matrix and Statistics

Prediction    0    1
         0 1645 1166
         1   27  496
               Accuracy : 0.6422          
                 95% CI : (0.6256, 0.6585)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.2829          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.2984          
            Specificity : 0.9839          
         Pos Pred Value : 0.9484          
         Neg Pred Value : 0.5852          
             Prevalence : 0.4985          
         Detection Rate : 0.1488          
   Detection Prevalence : 0.1569          
      Balanced Accuracy : 0.6411          
       'Positive' Class : 1               

Random Forest

Confusion Matrix and Statistics

Prediction    0    1
         0 1638 1298
         1   34  364
               Accuracy : 0.6005          
                 95% CI : (0.5836, 0.6172)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.1991          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.2190          
            Specificity : 0.9797          
         Pos Pred Value : 0.9146          
         Neg Pred Value : 0.5579          
             Prevalence : 0.4985          
         Detection Rate : 0.1092          
   Detection Prevalence : 0.1194          
      Balanced Accuracy : 0.5993          
       'Positive' Class : 1               

99% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

Prediction    0    1
         0 1659 1433
         1   13  229
               Accuracy : 0.5663             
                 95% CI : (0.5493, 0.5832)   
    No Information Rate : 0.5015             
    P-Value [Acc > NIR] : 0.00000000000003841
                  Kappa : 0.1303             
 Mcnemar's Test P-Value : < 2.2e-16          
            Sensitivity : 0.13779            
            Specificity : 0.99222            
         Pos Pred Value : 0.94628            
         Neg Pred Value : 0.53655            
             Prevalence : 0.49850            
         Detection Rate : 0.06869            
   Detection Prevalence : 0.07259            
      Balanced Accuracy : 0.56501            
       'Positive' Class : 1                  

Neural Net

Confusion Matrix and Statistics

Prediction    0    1
         0 1661 1500
         1   11  162
               Accuracy : 0.5468          
                 95% CI : (0.5297, 0.5638)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : 0.000000091     
                  Kappa : 0.0911          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.09747         
            Specificity : 0.99342         
         Pos Pred Value : 0.93642         
         Neg Pred Value : 0.52547         
             Prevalence : 0.49850         
         Detection Rate : 0.04859         
   Detection Prevalence : 0.05189         
      Balanced Accuracy : 0.54545         
       'Positive' Class : 1

LGB

Confusion Matrix and Statistics

Prediction    0    1
         0 1658 1374
         1   14  288
               Accuracy : 0.5837          
                 95% CI : (0.5667, 0.6005)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                  Kappa : 0.1653          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.17329         
            Specificity : 0.99163         
         Pos Pred Value : 0.95364         
         Neg Pred Value : 0.54683         
             Prevalence : 0.49850         
         Detection Rate : 0.08638         
   Detection Prevalence : 0.09058         
      Balanced Accuracy : 0.58246         
       'Positive' Class : 1               

Random Forest

Confusion Matrix and Statistics

Prediction    0    1
         0 1660 1479
         1   12  183
               Accuracy : 0.5528          
                 95% CI : (0.5357, 0.5698)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : 0.000000001697  
                  Kappa : 0.1032          
 Mcnemar's Test P-Value : < 2.2e-16       
            Sensitivity : 0.11011         
            Specificity : 0.99282         
         Pos Pred Value : 0.93846         
         Neg Pred Value : 0.52883         
             Prevalence : 0.49850         
         Detection Rate : 0.05489         
   Detection Prevalence : 0.05849         
      Balanced Accuracy : 0.55147         
       'Positive' Class : 1               

Validation Data

ROC Curves

Optimal Coordinates

Model              Probability Threshold   Specificity   Sensitivity

Optimize Both
  Linear Ensemble  0.493                   0.782         0.824
  Neural Net       0.441                   0.732         0.837
  LGB              0.484                   0.746         0.836
  RF               0.517                   0.723         0.753

90% Specificity
  Linear Ensemble  0.759                   0.900         0.612
  Neural Net       0.754                   0.900         0.574
  LGB              0.730                   0.900         0.604
  RF               0.604                   0.900         0.449

95% Specificity
  Linear Ensemble  0.854                   0.950         0.417
  Neural Net       0.875                   0.950         0.384
  LGB              0.832                   0.950         0.417
  RF               0.637                   0.950         0.333

97.5% Specificity
  Linear Ensemble  0.891                   0.975         0.267
  Neural Net       0.935                   0.975         0.245
  LGB              0.882                   0.975         0.301
  RF               0.666                   0.975         0.216

99% Specificity
  Linear Ensemble  0.910                   0.990         0.155
  Neural Net       0.973                   0.990         0.095
  LGB              0.922                   0.990         0.186
  RF               0.691                   0.990         0.112
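
On the validation data the specificities match their targets exactly, since that is where the thresholds were set. The test-data table earlier in this post shows how far the same thresholds drift on new data. A small Python sketch of that comparison, using only the values reported in the two tables:

```python
targets = [0.900, 0.950, 0.975, 0.990]

# Test-set specificity at the 90%, 95%, 97.5% and 99% thresholds.
test_specificity = {
    "Linear Ensemble": [0.909, 0.960, 0.983, 0.992],
    "Neural Net": [0.903, 0.949, 0.977, 0.993],
    "LGB": [0.910, 0.961, 0.984, 0.992],
    "RF": [0.907, 0.950, 0.980, 0.993],
}

# Positive drift means the test set is more specific than the target.
drift = {
    model: [round(test - target, 3) for test, target in zip(values, targets)]
    for model, values in test_specificity.items()
}
largest = max(abs(d) for diffs in drift.values() for d in diffs)

for model, diffs in drift.items():
    print(f"{model:<16}", diffs)
print("largest absolute drift:", largest)
```

Every model stays within about one percentage point of its target, and only the Neural Net at the 95% threshold falls below it.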




Logistic Regression

Call:  glm(formula = shrub ~ ., family = "binomial", data = validation)

 (Intercept)           tcb           tcw           tcg           nbr  
-3.968665633   0.000235535   0.000145283   0.000065578  -0.000321203  
         mag           yod    nys_precip      nys_tmax      nys_tmin  
-0.000839042   0.000104765   0.000005128   0.082880178  -0.117791981  
  nys_aspect       nys_dem     nys_slope       nys_twi      lcsec_X2  
 0.000106479  -0.000528244  -0.001397095  -0.058290832  -0.093505410  
    lcsec_X3      lcsec_X4      lcsec_X5      lcsec_X8      lcsec_X6  
 0.217207780   0.140594098   0.195246572   0.531440060  11.187723895  
         lgb            rf          nnet  
 3.723719156  -0.317407053   2.203063234  

Degrees of Freedom: 3332 Total (i.e. Null);  3310 Residual
Null Deviance:      4621 
Residual Deviance: 2928     AIC: 2974
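
To make the printout concrete, the Python sketch below turns the fitted coefficients into an ensemble probability: a weighted sum of the predictors on the log-odds scale, passed through the logistic function. The coefficients are copied from the output above. The example inputs are hypothetical, and the sketch assumes `lgb`, `rf` and `nnet` are the component models' predicted probabilities.

```python
import math

# Coefficients copied from the glm printout above.
COEFS = {
    "(Intercept)": -3.968665633, "tcb": 0.000235535, "tcw": 0.000145283,
    "tcg": 0.000065578, "nbr": -0.000321203, "mag": -0.000839042,
    "yod": 0.000104765, "nys_precip": 0.000005128, "nys_tmax": 0.082880178,
    "nys_tmin": -0.117791981, "nys_aspect": 0.000106479,
    "nys_dem": -0.000528244, "nys_slope": -0.001397095,
    "nys_twi": -0.058290832, "lcsec_X2": -0.093505410,
    "lcsec_X3": 0.217207780, "lcsec_X4": 0.140594098,
    "lcsec_X5": 0.195246572, "lcsec_X8": 0.531440060,
    "lcsec_X6": 11.187723895, "lgb": 3.723719156, "rf": -0.317407053,
    "nnet": 2.203063234,
}


def ensemble_probability(features):
    """Logistic-regression prediction; predictors not supplied count as 0."""
    log_odds = COEFS["(Intercept)"] + sum(
        coef * features.get(name, 0.0)
        for name, coef in COEFS.items()
        if name != "(Intercept)"
    )
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical inputs: only the three component-model outputs are set,
# which is unrealistic but isolates their contribution.
p_baseline = ensemble_probability({})
p_confident = ensemble_probability({"lgb": 0.9, "rf": 0.6, "nnet": 0.8})
print(round(p_baseline, 4), round(p_confident, 4))

# For a 0/1 response, AIC = residual deviance + 2 * number of coefficients.
aic = 2928 + 2 * len(COEFS)
print(aic)
```

The LGB and neural net outputs carry the largest positive weights among the component models, while the random forest weight is small and negative once the other two are in the model.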

Neural Net

Model: "sequential"
 Layer (type)                  Output Shape                Param #    
 dense_features (DenseFeatures  multiple                   0          
 dense_5 (Dense)               multiple                    5120       
 dense_4 (Dense)               multiple                    32896      
 dense_3 (Dense)               multiple                    8256       
 dense_2 (Dense)               multiple                    2080       
 dense_1 (Dense)               multiple                    528        
 dropout (Dropout)             multiple                    0          
 dense (Dense)                 multiple                    17         
Total params: 48,897
Trainable params: 48,897
Non-trainable params: 0
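
The parameter counts pin down the layer widths, because a dense layer with n_in inputs and n_out units has (n_in + 1) * n_out parameters (weights plus one bias per unit). The widths in this Python sketch are inferred from the counts above rather than taken from the model code, so treat them as a reconstruction.

```python
def dense_params(n_in, n_out):
    """Weights plus biases for one fully connected layer."""
    return (n_in + 1) * n_out


# Inferred widths: 19 input features, five hidden layers, one output unit.
widths = [19, 256, 128, 64, 32, 16, 1]

counts = [dense_params(n_in, n_out) for n_in, n_out in zip(widths, widths[1:])]
print(counts)
print(sum(counts))
```

These widths reproduce each per-layer count and the 48,897 total. The dropout and feature layers add no parameters.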

Random Forest

[1] 3000

[1] 1

[1] 6

[1] TRUE

[1] 0.2

shrub ~ .

LGB

[1] 0.01

[1] 2500

[1] 14

[1] -1


[1] 10

[1] 0.5

[1] 1

[1] 0.9

[1] 3

[1] 0

[1] 0.5

[1] TRUE




For attribution, please cite this work as

Mahoney (2022, Jan. 15). CAFRI Labs: Shrubland 1.0: The Gang's All Here. Retrieved from

BibTeX citation

  author = {Mahoney, Mike},
  title = {CAFRI Labs: Shrubland 1.0: The Gang's All Here},
  url = {},
  year = {2022}