Shrubland 0.0.1: Making It Happen

The first iteration of shrubland model reporting.

Mike Mahoney
2022-01-12

Evaluation Results

This is the first iteration of shrubland modeling using Landsat-derived predictors. A 30 m pixel counts as shrubland when at least 50% of its area, measured at 1 m resolution on a LiDAR-derived canopy height model (CHM), is between 1 m and 5 m tall and the pixel falls within a vegetated LCPRI class.
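The labeling rule above can be sketched in a few lines. This is only an illustration in Python, not the code used to build the training labels. The function name, the inclusive 1 m and 5 m bounds, and the precomputed `is_vegetated_lcpri` flag are all assumptions made for the example.

```python
def is_shrubland_pixel(chm_heights_m, is_vegetated_lcpri,
                       min_height=1.0, max_height=5.0, min_fraction=0.5):
    """Apply the shrubland rule to one 30 m pixel.

    chm_heights_m: heights (metres) of the 1 m CHM cells inside the pixel,
        nominally 30 * 30 = 900 values.
    is_vegetated_lcpri: True when the pixel's LCPRI class is a vegetated one
        (assumed to be looked up elsewhere).
    The bounds are treated as inclusive here, which the report does not state.
    """
    heights = list(chm_heights_m)
    if not is_vegetated_lcpri or not heights:
        return False
    in_range = sum(1 for h in heights if min_height <= h <= max_height)
    return in_range / len(heights) >= min_fraction


# Hypothetical pixels: 500 of 900 cells at shrub height, then 400 of 900.
mostly_shrub = [2.5] * 500 + [12.0] * 400
mostly_open = [2.5] * 400 + [0.2] * 500

shrub_a = is_shrubland_pixel(mostly_shrub, is_vegetated_lcpri=True)
shrub_b = is_shrubland_pixel(mostly_open, is_vegetated_lcpri=True)
shrub_c = is_shrubland_pixel(mostly_shrub, is_vegetated_lcpri=False)
```

Here the first pixel qualifies (about 56% of cells in range), the second does not (about 44%), and the third is excluded by its land cover class regardless of height.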

Test Data

ROC Curves

Optimal Coordinates

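Probability thresholds were chosen on the validation set (reported below), either to optimize sensitivity and specificity jointly ("Optimize Both") or to reach a target specificity of 95%, 97.5%, or 99%. The same thresholds are then applied to the test set.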

Model              Probability Threshold   Specificity   Sensitivity

Optimize Both
Linear Ensemble    0.416                   0.742         0.863
LGB                0.484                   0.755         0.842
RF                 0.508                   0.721         0.783

95% Specificity
Linear Ensemble    0.848                   0.962         0.424
LGB                0.832                   0.961         0.434
RF                 0.639                   0.956         0.324

97.5% Specificity
Linear Ensemble    0.882                   0.984         0.263
LGB                0.882                   0.984         0.298
RF                 0.667                   0.980         0.214

99% Specificity
Linear Ensemble    0.902                   0.993         0.136
LGB                0.922                   0.992         0.173
RF                 0.689                   0.991         0.114
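To make the threshold selection concrete, here is a small Python sketch of picking a threshold on validation data and then reusing it on test data. It is not the code behind the table. The data are made up, the function names are hypothetical, and the report does not say which criterion "Optimize Both" uses, so Youden's J (specificity + sensitivity - 1) stands in as one common choice.

```python
def spec_sens(probs, labels, threshold):
    """Return (specificity, sensitivity), predicting 1 when prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tn / (tn + fp), tp / (tp + fn)


def candidate_thresholds(probs):
    # Every observed probability, plus infinity (nothing predicted positive).
    return sorted(set(probs)) + [float("inf")]


def threshold_for_specificity(probs, labels, target):
    """Lowest candidate threshold whose specificity reaches the target."""
    for t in candidate_thresholds(probs):
        if spec_sens(probs, labels, t)[0] >= target:
            return t


def threshold_youden(probs, labels):
    """Candidate threshold maximizing Youden's J (an assumed criterion)."""
    return max(candidate_thresholds(probs),
               key=lambda t: sum(spec_sens(probs, labels, t)) - 1)


# Made-up validation and test predictions, for illustration only.
val_probs = [0.05, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95]
val_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
test_probs = [0.10, 0.30, 0.65, 0.50, 0.75, 0.85]
test_labels = [0, 0, 0, 1, 1, 1]

t_spec = threshold_for_specificity(val_probs, val_labels, target=0.8)
t_both = threshold_youden(val_probs, val_labels)
val_result = spec_sens(val_probs, val_labels, t_spec)
test_result = spec_sens(test_probs, test_labels, t_spec)
```

Because the threshold is fixed on the validation data, the validation specificity meets the target by construction, while the test specificity can land on either side of it. The same pattern shows up when comparing the test and validation tables in this report.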

Confusion Matrices

Optimize Both

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1240  227
         1  432 1435
                                          
               Accuracy : 0.8023          
                 95% CI : (0.7884, 0.8157)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.6048          
                                          
 Mcnemar's Test P-Value : 1.915e-15       
                                          
            Sensitivity : 0.8634          
            Specificity : 0.7416          
         Pos Pred Value : 0.7686          
         Neg Pred Value : 0.8453          
             Prevalence : 0.4985          
         Detection Rate : 0.4304          
   Detection Prevalence : 0.5600          
      Balanced Accuracy : 0.8025          
                                          
       'Positive' Class : 1               
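As a reading aid, the statistics in these printouts can be recomputed from the four cell counts. The Python sketch below does so for the matrix above, with rows as predictions and columns as reference, so the cells are TN = 1240, FN = 227, FP = 432, TP = 1435. The function name is hypothetical and the confidence interval and p-values are left out.

```python
def confusion_metrics(tn, fn, fp, tp):
    """Summary statistics for a 2x2 confusion matrix with positive class 1."""
    n = tn + fn + fp + tp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column totals.
    expected = ((tn + fn) * (tn + fp) + (tp + fp) * (tp + fn)) / n ** 2
    return {
        "accuracy": accuracy,
        "no_information_rate": max(tp + fn, tn + fp) / n,
        "kappa": (accuracy - expected) / (1 - expected),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Test-set Logistic Ensemble, "Optimize Both" threshold.
metrics = confusion_metrics(tn=1240, fn=227, fp=432, tp=1435)
for name, value in metrics.items():
    print(f"{name:>22}: {value:.4f}")
```

The rounded values agree with the printed Accuracy, Kappa, Sensitivity, Specificity, and the other rates, which confirms how the matrix is oriented.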
                                          

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1262  262
         1  410 1400
                                          
               Accuracy : 0.7984          
                 95% CI : (0.7844, 0.8119)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.597           
                                          
 Mcnemar's Test P-Value : 0.00000001423   
                                          
            Sensitivity : 0.8424          
            Specificity : 0.7548          
         Pos Pred Value : 0.7735          
         Neg Pred Value : 0.8281          
             Prevalence : 0.4985          
         Detection Rate : 0.4199          
   Detection Prevalence : 0.5429          
      Balanced Accuracy : 0.7986          
                                          
       'Positive' Class : 1               
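The McNemar's test p-value compares the two kinds of error (false positives against false negatives). The Python sketch below assumes the continuity-corrected chi-square form with one degree of freedom, which appears consistent with the value printed above. The function name is hypothetical.

```python
import math


def mcnemar_p_value(b, c):
    """Continuity-corrected McNemar test on the two off-diagonal counts.

    Returns (chi-square statistic, p-value). With one degree of freedom the
    chi-square upper tail is P(X > s) = erfc(sqrt(s / 2)).
    """
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Test-set LightGBM, "Optimize Both": 410 false positives, 262 false negatives.
statistic, p_value = mcnemar_p_value(b=410, c=262)
print(f"chi-square = {statistic:.3f}, p = {p_value:.3e}")
```

A small p-value here says the model's errors are lopsided (more false positives than false negatives at this threshold), not that the model is accurate or inaccurate overall.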
                                          

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1206  360
         1  466 1302
                                          
               Accuracy : 0.7522          
                 95% CI : (0.7372, 0.7668)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.5046          
                                          
 Mcnemar's Test P-Value : 0.0002588       
                                          
            Sensitivity : 0.7834          
            Specificity : 0.7213          
         Pos Pred Value : 0.7364          
         Neg Pred Value : 0.7701          
             Prevalence : 0.4985          
         Detection Rate : 0.3905          
   Detection Prevalence : 0.5303          
      Balanced Accuracy : 0.7523          
                                          
       'Positive' Class : 1               
                                          

95% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1609  957
         1   63  705
                                          
               Accuracy : 0.6941          
                 95% CI : (0.6781, 0.7097)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.3871          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.4242          
            Specificity : 0.9623          
         Pos Pred Value : 0.9180          
         Neg Pred Value : 0.6270          
             Prevalence : 0.4985          
         Detection Rate : 0.2115          
   Detection Prevalence : 0.2304          
      Balanced Accuracy : 0.6933          
                                          
       'Positive' Class : 1               
                                          

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1607  940
         1   65  722
                                          
               Accuracy : 0.6986          
                 95% CI : (0.6827, 0.7141)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.3962          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.4344          
            Specificity : 0.9611          
         Pos Pred Value : 0.9174          
         Neg Pred Value : 0.6309          
             Prevalence : 0.4985          
         Detection Rate : 0.2166          
   Detection Prevalence : 0.2361          
      Balanced Accuracy : 0.6978          
                                          
       'Positive' Class : 1               
                                          

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1598 1123
         1   74  539
                                          
               Accuracy : 0.641           
                 95% CI : (0.6244, 0.6573)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.2806          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.3243          
            Specificity : 0.9557          
         Pos Pred Value : 0.8793          
         Neg Pred Value : 0.5873          
             Prevalence : 0.4985          
         Detection Rate : 0.1617          
   Detection Prevalence : 0.1839          
      Balanced Accuracy : 0.6400          
                                          
       'Positive' Class : 1               
                                          

97.5% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1645 1225
         1   27  437
                                          
               Accuracy : 0.6245          
                 95% CI : (0.6078, 0.6409)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.2473          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.2629          
            Specificity : 0.9839          
         Pos Pred Value : 0.9418          
         Neg Pred Value : 0.5732          
             Prevalence : 0.4985          
         Detection Rate : 0.1311          
   Detection Prevalence : 0.1392          
      Balanced Accuracy : 0.6234          
                                          
       'Positive' Class : 1               
                                          

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1645 1166
         1   27  496
                                          
               Accuracy : 0.6422          
                 95% CI : (0.6256, 0.6585)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.2829          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.2984          
            Specificity : 0.9839          
         Pos Pred Value : 0.9484          
         Neg Pred Value : 0.5852          
             Prevalence : 0.4985          
         Detection Rate : 0.1488          
   Detection Prevalence : 0.1569          
      Balanced Accuracy : 0.6411          
                                          
       'Positive' Class : 1               
                                          

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1638 1307
         1   34  355
                                          
               Accuracy : 0.5978          
                 95% CI : (0.5809, 0.6145)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.1937          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.2136          
            Specificity : 0.9797          
         Pos Pred Value : 0.9126          
         Neg Pred Value : 0.5562          
             Prevalence : 0.4985          
         Detection Rate : 0.1065          
   Detection Prevalence : 0.1167          
      Balanced Accuracy : 0.5966          
                                          
       'Positive' Class : 1               
                                          

99% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1661 1436
         1   11  226
                                             
               Accuracy : 0.566              
                 95% CI : (0.549, 0.5829)    
    No Information Rate : 0.5015             
    P-Value [Acc > NIR] : 0.00000000000005002
                                             
                  Kappa : 0.1297             
                                             
 Mcnemar's Test P-Value : < 2.2e-16          
                                             
            Sensitivity : 0.13598            
            Specificity : 0.99342            
         Pos Pred Value : 0.95359            
         Neg Pred Value : 0.53633            
             Prevalence : 0.49850            
         Detection Rate : 0.06779            
   Detection Prevalence : 0.07109            
      Balanced Accuracy : 0.56470            
                                             
       'Positive' Class : 1                  
                                             

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1658 1374
         1   14  288
                                          
               Accuracy : 0.5837          
                 95% CI : (0.5667, 0.6005)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.1653          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.17329         
            Specificity : 0.99163         
         Pos Pred Value : 0.95364         
         Neg Pred Value : 0.54683         
             Prevalence : 0.49850         
         Detection Rate : 0.08638         
   Detection Prevalence : 0.09058         
      Balanced Accuracy : 0.58246         
                                          
       'Positive' Class : 1               
                                          

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1657 1473
         1   15  189
                                          
               Accuracy : 0.5537          
                 95% CI : (0.5366, 0.5707)
    No Information Rate : 0.5015          
    P-Value [Acc > NIR] : 0.0000000008964 
                                          
                  Kappa : 0.105           
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.11372         
            Specificity : 0.99103         
         Pos Pred Value : 0.92647         
         Neg Pred Value : 0.52939         
             Prevalence : 0.49850         
         Detection Rate : 0.05669         
   Detection Prevalence : 0.06119         
      Balanced Accuracy : 0.55237         
                                          
       'Positive' Class : 1               
                                          

Validation Data

ROC Curves

Optimal Coordinates

Model              Probability Threshold   Specificity   Sensitivity

Optimize Both
Linear Ensemble    0.416                   0.725         0.858
LGB                0.484                   0.746         0.836
RF                 0.508                   0.701         0.773

95% Specificity
Linear Ensemble    0.848                   0.950         0.407
LGB                0.832                   0.950         0.417
RF                 0.639                   0.950         0.323

97.5% Specificity
Linear Ensemble    0.882                   0.975         0.263
LGB                0.882                   0.975         0.301
RF                 0.667                   0.975         0.208

99% Specificity
Linear Ensemble    0.902                   0.990         0.151
LGB                0.922                   0.990         0.186
RF                 0.689                   0.990         0.109

Confusion Matrices

Optimize Both

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1210  236
         1  458 1429
                                          
               Accuracy : 0.7918          
                 95% CI : (0.7776, 0.8055)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.5836          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.8583          
            Specificity : 0.7254          
         Pos Pred Value : 0.7573          
         Neg Pred Value : 0.8368          
             Prevalence : 0.4995          
         Detection Rate : 0.4287          
   Detection Prevalence : 0.5662          
      Balanced Accuracy : 0.7918          
                                          
       'Positive' Class : 1               
                                          

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1244  273
         1  424 1392
                                          
               Accuracy : 0.7909          
                 95% CI : (0.7767, 0.8046)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.5818          
                                          
 Mcnemar's Test P-Value : 0.00000001334   
                                          
            Sensitivity : 0.8360          
            Specificity : 0.7458          
         Pos Pred Value : 0.7665          
         Neg Pred Value : 0.8200          
             Prevalence : 0.4995          
         Detection Rate : 0.4176          
   Detection Prevalence : 0.5449          
      Balanced Accuracy : 0.7909          
                                          
       'Positive' Class : 1               
                                          

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1169  378
         1  499 1287
                                          
               Accuracy : 0.7369          
                 95% CI : (0.7216, 0.7518)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.4738          
                                          
 Mcnemar's Test P-Value : 0.00005076      
                                          
            Sensitivity : 0.7730          
            Specificity : 0.7008          
         Pos Pred Value : 0.7206          
         Neg Pred Value : 0.7557          
             Prevalence : 0.4995          
         Detection Rate : 0.3861          
   Detection Prevalence : 0.5359          
      Balanced Accuracy : 0.7369          
                                          
       'Positive' Class : 1               
                                          

95% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1585  988
         1   83  677
                                          
               Accuracy : 0.6787          
                 95% CI : (0.6625, 0.6945)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.357           
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.4066          
            Specificity : 0.9502          
         Pos Pred Value : 0.8908          
         Neg Pred Value : 0.6160          
             Prevalence : 0.4995          
         Detection Rate : 0.2031          
   Detection Prevalence : 0.2280          
      Balanced Accuracy : 0.6784          
                                          
       'Positive' Class : 1               
                                          

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1585  971
         1   83  694
                                          
               Accuracy : 0.6838          
                 95% CI : (0.6677, 0.6995)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.3672          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.4168          
            Specificity : 0.9502          
         Pos Pred Value : 0.8932          
         Neg Pred Value : 0.6201          
             Prevalence : 0.4995          
         Detection Rate : 0.2082          
   Detection Prevalence : 0.2331          
      Balanced Accuracy : 0.6835          
                                          
       'Positive' Class : 1               
                                          

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1585 1128
         1   83  537
                                         
               Accuracy : 0.6367         
                 95% CI : (0.6201, 0.653)
    No Information Rate : 0.5005         
    P-Value [Acc > NIR] : < 2.2e-16      
                                         
                  Kappa : 0.2729         
                                         
 Mcnemar's Test P-Value : < 2.2e-16      
                                         
            Sensitivity : 0.3225         
            Specificity : 0.9502         
         Pos Pred Value : 0.8661         
         Neg Pred Value : 0.5842         
             Prevalence : 0.4995         
         Detection Rate : 0.1611         
   Detection Prevalence : 0.1860         
      Balanced Accuracy : 0.6364         
                                         
       'Positive' Class : 1              
                                         

97.5% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1627 1227
         1   41  438
                                          
               Accuracy : 0.6196          
                 95% CI : (0.6028, 0.6361)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.2386          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.2631          
            Specificity : 0.9754          
         Pos Pred Value : 0.9144          
         Neg Pred Value : 0.5701          
             Prevalence : 0.4995          
         Detection Rate : 0.1314          
   Detection Prevalence : 0.1437          
      Balanced Accuracy : 0.6192          
                                          
       'Positive' Class : 1               
                                          

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1627 1164
         1   41  501
                                          
               Accuracy : 0.6385          
                 95% CI : (0.6219, 0.6548)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.2765          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.3009          
            Specificity : 0.9754          
         Pos Pred Value : 0.9244          
         Neg Pred Value : 0.5829          
             Prevalence : 0.4995          
         Detection Rate : 0.1503          
   Detection Prevalence : 0.1626          
      Balanced Accuracy : 0.6382          
                                          
       'Positive' Class : 1               
                                          

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1627 1319
         1   41  346
                                         
               Accuracy : 0.592          
                 95% CI : (0.575, 0.6087)
    No Information Rate : 0.5005         
    P-Value [Acc > NIR] : < 2.2e-16      
                                         
                  Kappa : 0.1834         
                                         
 Mcnemar's Test P-Value : < 2.2e-16      
                                         
            Sensitivity : 0.2078         
            Specificity : 0.9754         
         Pos Pred Value : 0.8941         
         Neg Pred Value : 0.5523         
             Prevalence : 0.4995         
         Detection Rate : 0.1038         
   Detection Prevalence : 0.1161         
      Balanced Accuracy : 0.5916         
                                         
       'Positive' Class : 1              
                                         

99% Specificity

Logistic Ensemble

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1652 1413
         1   16  252
                                          
               Accuracy : 0.5713          
                 95% CI : (0.5542, 0.5881)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.1419          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.15135         
            Specificity : 0.99041         
         Pos Pred Value : 0.94030         
         Neg Pred Value : 0.53899         
             Prevalence : 0.49955         
         Detection Rate : 0.07561         
   Detection Prevalence : 0.08041         
      Balanced Accuracy : 0.57088         
                                          
       'Positive' Class : 1               
                                          

LightGBM

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1652 1355
         1   16  310
                                          
               Accuracy : 0.5887          
                 95% CI : (0.5717, 0.6054)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.1767          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.18619         
            Specificity : 0.99041         
         Pos Pred Value : 0.95092         
         Neg Pred Value : 0.54938         
             Prevalence : 0.49955         
         Detection Rate : 0.09301         
   Detection Prevalence : 0.09781         
      Balanced Accuracy : 0.58830         
                                          
       'Positive' Class : 1               
                                          

Random Forest

Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1652 1483
         1   16  182
                                          
               Accuracy : 0.5503          
                 95% CI : (0.5332, 0.5672)
    No Information Rate : 0.5005          
    P-Value [Acc > NIR] : 0.000000004786  
                                          
                  Kappa : 0.0998          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.10931         
            Specificity : 0.99041         
         Pos Pred Value : 0.91919         
         Neg Pred Value : 0.52695         
             Prevalence : 0.49955         
         Detection Rate : 0.05461         
   Detection Prevalence : 0.05941         
      Balanced Accuracy : 0.54986         
                                          
       'Positive' Class : 1               
                                          

Metadata

Data

Models

Logistic Regression


Call:  glm(formula = shrub ~ ., family = "binomial", data = validation)

Coefficients:
(Intercept)          tcb          tcw          tcg          nbr  
-5.15613260   0.00038152   0.00027974  -0.00012225   0.00003898  
        mag          yod   nys_precip     nys_tmax     nys_tmin  
-0.00079959   0.00001847   0.00048267   0.09909979  -0.10876755  
 nys_aspect      nys_dem    nys_slope      nys_twi     lcsec_X2  
-0.00023984  -0.00019729  -0.00830177  -0.07000833  -0.04556227  
   lcsec_X3     lcsec_X4     lcsec_X5     lcsec_X8     lcsec_X6  
 0.24248714   0.22365026   0.28850568   0.45080926  12.90850836  
        lgb           rf  
 5.40048011   0.89261878  

Degrees of Freedom: 3332 Total (i.e. Null);  3311 Residual
Null Deviance:      4621 
Residual Deviance: 2986     AIC: 3030
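The fitted coefficients turn into a probability through the inverse logit. The Python sketch below copies the coefficients from the output above and shows that step. It assumes `lgb` and `rf` are the component models' predicted probabilities, which the report does not state, and the example inputs are artificial: every other predictor is held at zero purely to isolate the effect of the two component predictions.

```python
import math

# Coefficients copied from the glm output above.
INTERCEPT = -5.15613260
COEFS = {
    "tcb": 0.00038152, "tcw": 0.00027974, "tcg": -0.00012225,
    "nbr": 0.00003898, "mag": -0.00079959, "yod": 0.00001847,
    "nys_precip": 0.00048267, "nys_tmax": 0.09909979,
    "nys_tmin": -0.10876755, "nys_aspect": -0.00023984,
    "nys_dem": -0.00019729, "nys_slope": -0.00830177,
    "nys_twi": -0.07000833, "lcsec_X2": -0.04556227,
    "lcsec_X3": 0.24248714, "lcsec_X4": 0.22365026,
    "lcsec_X5": 0.28850568, "lcsec_X8": 0.45080926,
    "lcsec_X6": 12.90850836, "lgb": 5.40048011, "rf": 0.89261878,
}


def linear_predictor(features):
    """Intercept plus the weighted sum; missing predictors count as zero."""
    return INTERCEPT + sum(c * features.get(k, 0.0) for k, c in COEFS.items())


def ensemble_probability(features):
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(features)))


# Artificial inputs: only the two component predictions are non-zero.
p_low = ensemble_probability({"lgb": 0.2, "rf": 0.2})
p_high = ensemble_probability({"lgb": 0.9, "rf": 0.9})
```

In this setup the LightGBM prediction moves the ensemble far more than the random forest prediction does, since its coefficient is roughly six times larger.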

Random Forest

$num.trees
[1] 3000

$mtry
[1] 1

$min.node.size
[1] 6

$replace
[1] TRUE

$sample.fraction
[1] 0.2

$formula
shrub ~ .

LightGBM

$params
$params$learning_rate
[1] 0.01

$params$nrounds
[1] 2500

$params$num_leaves
[1] 14

$params$max_depth
[1] -1

$params$extra_trees
[1] FALSE

$params$min_data_in_leaf
[1] 10

$params$bagging_fraction
[1] 0.5

$params$bagging_freq
[1] 1

$params$feature_fraction
[1] 0.9

$params$min_data_in_bin
[1] 3

$params$lambda_l1
[1] 0

$params$lambda_l2
[1] 0.5

$params$force_col_wise
[1] TRUE

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Mahoney (2022, Jan. 12). CAFRI Labs: Shrubland 0.0.1: Making It Happen. Retrieved from https://cafri-labs.github.io/acceptable-growing-stock/posts/shrubland-001-making-it-happen/

BibTeX citation

@misc{mahoney2022shrubland,
  author = {Mahoney, Mike},
  title = {CAFRI Labs: Shrubland 0.0.1: Making It Happen},
  url = {https://cafri-labs.github.io/acceptable-growing-stock/posts/shrubland-001-making-it-happen/},
  year = {2022}
}