Re-doing the supersized models with a balanced sample. 2022-01-21
Last iteration of these models: 2022-01-16
Thresholds were chosen using the validation set (below), either to optimize sensitivity and specificity together ("Optimize Both") or to reach a target level of specificity.
Model | Probability Threshold | Specificity | Sensitivity |
---|---|---|---|
Optimize Both | |||
Linear Ensemble | 0.489 | 0.780 | 0.842 |
Neural Net | 0.494 | 0.774 | 0.830 |
LGB | 0.498 | 0.776 | 0.837 |
RF | 0.519 | 0.752 | 0.767 |
90% Specificity | |||
Linear Ensemble | 0.755 | 0.900 | 0.659 |
Neural Net | 0.704 | 0.901 | 0.618 |
LGB | 0.699 | 0.900 | 0.644 |
RF | 0.602 | 0.900 | 0.537 |
95% Specificity | |||
Linear Ensemble | 0.840 | 0.949 | 0.496 |
Neural Net | 0.791 | 0.950 | 0.441 |
LGB | 0.787 | 0.949 | 0.485 |
RF | 0.648 | 0.951 | 0.379 |
97.5% Specificity | |||
Linear Ensemble | 0.881 | 0.975 | 0.348 |
Neural Net | 0.843 | 0.975 | 0.303 |
LGB | 0.845 | 0.976 | 0.342 |
RF | 0.678 | 0.975 | 0.262 |
99% Specificity | |||
Linear Ensemble | 0.907 | 0.989 | 0.218 |
Neural Net | 0.886 | 0.989 | 0.185 |
LGB | 0.889 | 0.990 | 0.208 |
RF | 0.704 | 0.990 | 0.135 |
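One way to pick a cutoff for a target specificity is to scan the scores observed on the validation set and keep the smallest one whose specificity reaches the target. The Python sketch below illustrates that idea only. It is not the code behind the table above (the output in this post appears to come from R), and the function names and toy data are hypothetical.

```python
from bisect import bisect_left


def rates_at(neg_sorted, pos_sorted, threshold):
    """Return (specificity, sensitivity) when score >= threshold means class 1."""
    specificity = bisect_left(neg_sorted, threshold) / len(neg_sorted)
    sensitivity = (len(pos_sorted) - bisect_left(pos_sorted, threshold)) / len(pos_sorted)
    return specificity, sensitivity


def threshold_for_specificity(scores, labels, target):
    """Smallest observed score whose specificity is at least `target`."""
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    for t in sorted(set(scores)):
        spec, sens = rates_at(neg, pos, t)
        if spec >= target:
            return t, spec, sens
    raise ValueError("no observed score reaches the target specificity")


# Hypothetical validation scores (predicted probability of class 1) and labels.
scores = [0.10, 0.20, 0.30, 0.40, 0.85, 0.35, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

threshold, spec, sens = threshold_for_specificity(scores, labels, target=0.8)
print(threshold, spec, sens)  # prints 0.6 0.8 0.8
```

Raising the target pushes the threshold up and the sensitivity down, which is the trade-off visible across the rows of the table.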
Optimize Both
Logistic Ensemble
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 32680 6537
1 9211 34906
Accuracy : 0.811
95% CI : (0.8084, 0.8137)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6222
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.8423
Specificity : 0.7801
Pos Pred Value : 0.7912
Neg Pred Value : 0.8333
Prevalence : 0.4973
Detection Rate : 0.4189
Detection Prevalence : 0.5294
Balanced Accuracy : 0.8112
'Positive' Class : 1
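The statistics in this block can be recomputed from the four counts alone. The Python sketch below does that for the matrix above, reading rows as predictions and columns as reference with class 1 as positive. The function name is hypothetical and the original output appears to come from R, so treat this as a cross-check rather than the code that produced it.

```python
def binary_metrics(tn, fn, fp, tp):
    """Summary statistics for a 2x2 confusion matrix with class 1 as positive."""
    n = tn + fn + fp + tp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa: product of the marginals, per class.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1 - expected),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Counts read from the matrix above.
m = binary_metrics(tn=32680, fn=6537, fp=9211, tp=34906)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The same function applies to every other matrix in this post by swapping in its counts.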
Neural Net
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 32428 7031
1 9463 34412
Accuracy : 0.8021
95% CI : (0.7994, 0.8048)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6043
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.8303
Specificity : 0.7741
Pos Pred Value : 0.7843
Neg Pred Value : 0.8218
Prevalence : 0.4973
Detection Rate : 0.4129
Detection Prevalence : 0.5265
Balanced Accuracy : 0.8022
'Positive' Class : 1
LightGBM
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 32496 6761
1 9395 34682
Accuracy : 0.8061
95% CI : (0.8034, 0.8088)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6124
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.8369
Specificity : 0.7757
Pos Pred Value : 0.7869
Neg Pred Value : 0.8278
Prevalence : 0.4973
Detection Rate : 0.4162
Detection Prevalence : 0.5289
Balanced Accuracy : 0.8063
'Positive' Class : 1
Random Forest
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 31522 9646
1 10369 31797
Accuracy : 0.7598
95% CI : (0.7569, 0.7627)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.5197
Mcnemar's Test P-Value : 0.0000003336
Sensitivity : 0.7672
Specificity : 0.7525
Pos Pred Value : 0.7541
Neg Pred Value : 0.7657
Prevalence : 0.4973
Detection Rate : 0.3816
Detection Prevalence : 0.5060
Balanced Accuracy : 0.7599
'Positive' Class : 1
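The random forest is the only model in this group whose McNemar p-value is printed as a number rather than `< 2.2e-16`, because its two off-diagonal counts (9646 and 10369) are much closer to each other. The Python sketch below recomputes the test from those two counts, assuming the continuity-corrected chi-squared form with one degree of freedom. That assumption and the function name are not taken from the post.

```python
import math


def mcnemar_test(b, c):
    """McNemar's chi-squared test with continuity correction.

    b and c are the two off-diagonal (disagreement) counts of a 2x2 table.
    """
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Off-diagonal counts from the random forest matrix above.
statistic, p_value = mcnemar_test(b=9646, c=10369)
print(f"statistic = {statistic:.2f}, p-value = {p_value:.3e}")
```

A small p-value here says the two kinds of error (false positives and false negatives) occur at different rates; it says nothing about overall accuracy.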
90% Specificity
Logistic Ensemble
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 37699 14144
1 4192 27299
Accuracy : 0.78
95% CI : (0.7771, 0.7828)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.5594
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.6587
Specificity : 0.8999
Pos Pred Value : 0.8669
Neg Pred Value : 0.7272
Prevalence : 0.4973
Detection Rate : 0.3276
Detection Prevalence : 0.3779
Balanced Accuracy : 0.7793
'Positive' Class : 1
Neural Net
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 37760 15817
1 4131 25626
Accuracy : 0.7606
95% CI : (0.7577, 0.7635)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.5205
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.6183
Specificity : 0.9014
Pos Pred Value : 0.8612
Neg Pred Value : 0.7048
Prevalence : 0.4973
Detection Rate : 0.3075
Detection Prevalence : 0.3571
Balanced Accuracy : 0.7599
'Positive' Class : 1
LightGBM
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 37689 14739
1 4202 26704
Accuracy : 0.7727
95% CI : (0.7698, 0.7756)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.5448
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.6444
Specificity : 0.8997
Pos Pred Value : 0.8640
Neg Pred Value : 0.7189
Prevalence : 0.4973
Detection Rate : 0.3204
Detection Prevalence : 0.3709
Balanced Accuracy : 0.7720
'Positive' Class : 1
Random Forest
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 37698 19200
1 4193 22243
Accuracy : 0.7193
95% CI : (0.7162, 0.7223)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.4375
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.5367
Specificity : 0.8999
Pos Pred Value : 0.8414
Neg Pred Value : 0.6626
Prevalence : 0.4973
Detection Rate : 0.2669
Detection Prevalence : 0.3172
Balanced Accuracy : 0.7183
'Positive' Class : 1
95% Specificity
Logistic Ensemble
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 39768 20907
1 2123 20536
Accuracy : 0.7236
95% CI : (0.7206, 0.7267)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.4459
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.4955
Specificity : 0.9493
Pos Pred Value : 0.9063
Neg Pred Value : 0.6554
Prevalence : 0.4973
Detection Rate : 0.2464
Detection Prevalence : 0.2719
Balanced Accuracy : 0.7224
'Positive' Class : 1
Neural Net
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 39809 23150
1 2082 18293
Accuracy : 0.6972
95% CI : (0.6941, 0.7003)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.3928
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.4414
Specificity : 0.9503
Pos Pred Value : 0.8978
Neg Pred Value : 0.6323
Prevalence : 0.4973
Detection Rate : 0.2195
Detection Prevalence : 0.2445
Balanced Accuracy : 0.6959
'Positive' Class : 1
LightGBM
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 39770 21356
1 2121 20087
Accuracy : 0.7183
95% CI : (0.7152, 0.7213)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.4351
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.4847
Specificity : 0.9494
Pos Pred Value : 0.9045
Neg Pred Value : 0.6506
Prevalence : 0.4973
Detection Rate : 0.2410
Detection Prevalence : 0.2665
Balanced Accuracy : 0.7170
'Positive' Class : 1
Random Forest
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 39843 25733
1 2048 15710
Accuracy : 0.6666
95% CI : (0.6634, 0.6698)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.3312
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.3791
Specificity : 0.9511
Pos Pred Value : 0.8847
Neg Pred Value : 0.6076
Prevalence : 0.4973
Detection Rate : 0.1885
Detection Prevalence : 0.2131
Balanced Accuracy : 0.6651
'Positive' Class : 1
97.5% Specificity
Logistic Ensemble
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 40858 27010
1 1033 14433
Accuracy : 0.6635
95% CI : (0.6603, 0.6667)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.3247
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.3483
Specificity : 0.9753
Pos Pred Value : 0.9332
Neg Pred Value : 0.6020
Prevalence : 0.4973
Detection Rate : 0.1732
Detection Prevalence : 0.1856
Balanced Accuracy : 0.6618
'Positive' Class : 1
Neural Net
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 40839 28905
1 1052 12538
Accuracy : 0.6405
95% CI : (0.6373, 0.6438)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.2784
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.3025
Specificity : 0.9749
Pos Pred Value : 0.9226
Neg Pred Value : 0.5856
Prevalence : 0.4973
Detection Rate : 0.1505
Detection Prevalence : 0.1631
Balanced Accuracy : 0.6387
'Positive' Class : 1
LightGBM
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 40895 27288
1 996 14155
Accuracy : 0.6606
95% CI : (0.6574, 0.6638)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.3189
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.3416
Specificity : 0.9762
Pos Pred Value : 0.9343
Neg Pred Value : 0.5998
Prevalence : 0.4973
Detection Rate : 0.1699
Detection Prevalence : 0.1818
Balanced Accuracy : 0.6589
'Positive' Class : 1
Random Forest
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 40830 30566
1 1061 10877
Accuracy : 0.6205
95% CI : (0.6172, 0.6238)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.238
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.2625
Specificity : 0.9747
Pos Pred Value : 0.9111
Neg Pred Value : 0.5719
Prevalence : 0.4973
Detection Rate : 0.1305
Detection Prevalence : 0.1433
Balanced Accuracy : 0.6186
'Positive' Class : 1
99% Specificity
Logistic Ensemble
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 41437 32415
1 454 9028
Accuracy : 0.6056
95% CI : (0.6022, 0.6089)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.2079
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.2178
Specificity : 0.9892
Pos Pred Value : 0.9521
Neg Pred Value : 0.5611
Prevalence : 0.4973
Detection Rate : 0.1083
Detection Prevalence : 0.1138
Balanced Accuracy : 0.6035
'Positive' Class : 1
Neural Net
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 41415 33767
1 476 7676
Accuracy : 0.5891
95% CI : (0.5857, 0.5924)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.1746
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.18522
Specificity : 0.98864
Pos Pred Value : 0.94161
Neg Pred Value : 0.55086
Prevalence : 0.49731
Detection Rate : 0.09211
Detection Prevalence : 0.09782
Balanced Accuracy : 0.58693
'Positive' Class : 1
LightGBM
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 41487 32807
1 404 8636
Accuracy : 0.6015
95% CI : (0.5981, 0.6048)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.1996
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.2084
Specificity : 0.9904
Pos Pred Value : 0.9553
Neg Pred Value : 0.5584
Prevalence : 0.4973
Detection Rate : 0.1036
Detection Prevalence : 0.1085
Balanced Accuracy : 0.5994
'Positive' Class : 1
Random Forest
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 41471 35857
1 420 5586
Accuracy : 0.5647
95% CI : (0.5613, 0.568)
No Information Rate : 0.5027
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.1253
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.13479
Specificity : 0.98997
Pos Pred Value : 0.93007
Neg Pred Value : 0.53630
Prevalence : 0.49731
Detection Rate : 0.06703
Detection Prevalence : 0.07207
Balanced Accuracy : 0.56238
'Positive' Class : 1
Model | Probability Threshold | Specificity | Sensitivity |
---|---|---|---|
Optimize Both | |||
Linear Ensemble | 0.489 | 0.779 | 0.844 |
Neural Net | 0.494 | 0.772 | 0.830 |
LGB | 0.498 | 0.774 | 0.838 |
RF | 0.519 | 0.753 | 0.769 |
90% Specificity | |||
Linear Ensemble | 0.755 | 0.900 | 0.655 |
Neural Net | 0.704 | 0.900 | 0.616 |
LGB | 0.699 | 0.900 | 0.640 |
RF | 0.602 | 0.900 | 0.536 |
95% Specificity | |||
Linear Ensemble | 0.840 | 0.950 | 0.493 |
Neural Net | 0.791 | 0.950 | 0.439 |
LGB | 0.787 | 0.950 | 0.478 |
RF | 0.648 | 0.950 | 0.376 |
97.5% Specificity | |||
Linear Ensemble | 0.881 | 0.975 | 0.344 |
Neural Net | 0.843 | 0.975 | 0.298 |
LGB | 0.845 | 0.975 | 0.337 |
RF | 0.678 | 0.975 | 0.257 |
99% Specificity | |||
Linear Ensemble | 0.907 | 0.990 | 0.213 |
Neural Net | 0.886 | 0.990 | 0.181 |
LGB | 0.889 | 0.990 | 0.205 |
RF | 0.704 | 0.990 | 0.133 |
Call: glm(formula = shrub ~ ., family = "binomial", data = validation)
Coefficients:
(Intercept) tcb tcw tcg nbr
-4.0528297 0.0004629 0.0001641 -0.0005451 0.0005373
mag yod nys_precip nys_tmax nys_tmin
-0.0005495 0.0001269 0.0002575 0.0468812 -0.0569427
nys_aspect nys_dem nys_slope nys_twi lcsec_X2
0.0003939 -0.0001579 -0.0046514 0.0113271 -0.1513639
lcsec_X3 lcsec_X4 lcsec_X5 lcsec_X6 lcsec_X8
0.0045767 -0.0064030 -0.0359777 -0.1350724 0.3104493
lgb rf nnet
4.2314516 -1.3507346 2.6849262
Degrees of Freedom: 83333 Total (i.e. Null); 83311 Residual
Null Deviance: 115500
Residual Deviance: 69130 AIC: 69180
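A binomial GLM turns its inputs into a probability by summing the intercept and the coefficient-weighted inputs, then applying the logistic function. The Python sketch below applies the coefficients printed above. It assumes that `lgb`, `rf`, and `nnet` are the base models' predicted probabilities and that the `lcsec_X*` terms are 0/1 indicators; neither is stated in the output. The input values are made up for illustration.

```python
import math

INTERCEPT = -4.0528297
COEFFICIENTS = {
    "tcb": 0.0004629, "tcw": 0.0001641, "tcg": -0.0005451, "nbr": 0.0005373,
    "mag": -0.0005495, "yod": 0.0001269, "nys_precip": 0.0002575,
    "nys_tmax": 0.0468812, "nys_tmin": -0.0569427, "nys_aspect": 0.0003939,
    "nys_dem": -0.0001579, "nys_slope": -0.0046514, "nys_twi": 0.0113271,
    "lcsec_X2": -0.1513639, "lcsec_X3": 0.0045767, "lcsec_X4": -0.0064030,
    "lcsec_X5": -0.0359777, "lcsec_X6": -0.1350724, "lcsec_X8": 0.3104493,
    "lgb": 4.2314516, "rf": -1.3507346, "nnet": 2.6849262,
}


def ensemble_probability(features):
    """Logistic-regression prediction; predictors left out are treated as 0."""
    unknown = set(features) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"unknown predictors: {sorted(unknown)}")
    logit = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical input: every base model at 0.5, all other covariates left at 0
# purely for illustration (real pixels would not have all-zero covariates).
print(round(ensemble_probability({"lgb": 0.5, "rf": 0.5, "nnet": 0.5}), 4))
```

Because the `rf` coefficient is negative while `lgb` and `nnet` are positive, a higher random forest score lowers the ensemble probability when the other inputs are held fixed.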
Model
Model: "sequential"
______________________________________________________________________
Layer (type) Output Shape Param #
======================================================================
dense_features (DenseFeatures multiple 0
)
dense_5 (Dense) multiple 5120
dense_4 (Dense) multiple 32896
dense_3 (Dense) multiple 8256
dense_2 (Dense) multiple 2080
dense_1 (Dense) multiple 528
dropout (Dropout) multiple 0
dense (Dense) multiple 17
======================================================================
Total params: 48,897
Trainable params: 48,897
Non-trainable params: 0
______________________________________________________________________
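The parameter counts in the summary are enough to recover the layer widths, since a dense layer with a bias has (inputs + 1) × units parameters. Working back from a single output unit gives 19 → 256 → 128 → 64 → 32 → 16 → 1. These widths are an inference from the printed counts, not something the summary states, and the short Python check below confirms that they reproduce every reported number.

```python
def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return (n_in + 1) * n_out


# Inferred widths: input features first, then dense_5 ... dense_1, then dense.
widths = [19, 256, 128, 64, 32, 16, 1]
# Param # column from the summary, in the same order (dropout adds none).
reported = [5120, 32896, 8256, 2080, 528, 17]

counts = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
print(counts, sum(counts))
```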
Random Forest
$num.trees
[1] 3000
$mtry
[1] 1
$min.node.size
[1] 6
$replace
[1] TRUE
$sample.fraction
[1] 0.2
$formula
shrub ~ .
LightGBM
$params
$params$learning_rate
[1] 0.01
$params$nrounds
[1] 2500
$params$num_leaves
[1] 14
$params$max_depth
[1] -1
$params$extra_trees
[1] FALSE
$params$min_data_in_leaf
[1] 10
$params$bagging_fraction
[1] 0.5
$params$bagging_freq
[1] 1
$params$feature_fraction
[1] 0.9
$params$min_data_in_bin
[1] 3
$params$lambda_l1
[1] 0
$params$lambda_l2
[1] 0.5
$params$force_col_wise
[1] TRUE
For attribution, please cite this work as
Mahoney (2022, Jan. 21). CAFRI Labs: Shrubland 1.0.2: Balanced Diet. Retrieved from https://cafri-labs.github.io/acceptable-growing-stock/posts/shrubland-102-balanced-diet/
BibTeX citation
@misc{mahoney2022shrubland,
  author = {Mahoney, Mike},
  title = {CAFRI Labs: Shrubland 1.0.2: Balanced Diet},
  url = {https://cafri-labs.github.io/acceptable-growing-stock/posts/shrubland-102-balanced-diet/},
  year = {2022}
}