Model Bootstrap Uncertainty Estimates for Tax Parcels in WWE

Proof-of-concept analytical estimates of standard error (SE) for aboveground biomass (AGB) predictions aggregated within 2019 NYS tax parcels in the Warren, Washington, Essex (WWE) LiDAR coverage

Lucas Johnson
2022-02-16

Description

SE Computations

The SE computations (by way of variance) generally follow the approach described in the McRoberts 2011 paper and CEOS doc (Section 4.2.4).

One hundred models were developed, each fit to a different bootstrap sample of the training data. Each model was used to produce an AGB prediction surface (using the linear model ensemble) for the Warren, Washington, Essex LiDAR coverage.

The variance of each aggregate prediction (tax parcel AGB estimate) was computed as the sum of two components:

  1. Across iteration variance.
  2. Average within iteration variance.

Across iteration variance was computed as follows:

1.1 Compute tax parcel predictions for each of the 100 AGB surfaces.
1.2 Compute the variance across all 100 estimates for each tax parcel.
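
Steps 1.1 and 1.2 can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name, the dictionary layout, and the toy values are hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption, since the post does not say which variance estimator was used.

```python
from statistics import variance


def across_iteration_variance(parcel_estimates):
    """Variance of parcel-level AGB estimates across bootstrap iterations.

    parcel_estimates maps a parcel ID to a list holding one aggregated
    AGB estimate per bootstrap iteration (hypothetical layout).
    Assumption: sample variance (n - 1 denominator).
    """
    return {
        parcel_id: variance(estimates)
        for parcel_id, estimates in parcel_estimates.items()
    }


# Toy input with 4 iterations per parcel instead of 100.
toy_estimates = {
    "parcel_a": [10.0, 12.0, 11.0, 13.0],
    "parcel_b": [50.0, 50.0, 50.0, 50.0],
}
across_var = across_iteration_variance(toy_estimates)
```

A parcel whose estimate is identical in every iteration, like `parcel_b` here, gets an across-iteration variance of zero.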

Average within iteration variance was computed as follows:

2.1 Use each model to make plot predictions for the standard holdout set of plots.
2.2 Compute the sum of the squared errors across all test plots, and divide by the square of the number of test plots (see step 5b in Box 4.1 of the CEOS doc).
2.3 Average the values from 2.2 across all iterations.

Note: This within-iteration component is likely an underestimate, as it is computed from plots across the entire set of LiDAR coverages. A better estimate would use only plots within the Warren, Washington, Essex coverage. The CEOS doc uses this component of variance specifically for whole-map AGB estimates (global average, grand total); the disconnect is that we are leveraging it for small area estimation.
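
The within-iteration steps 2.1 to 2.3 can be sketched as follows. This is only an illustration under stated assumptions: the function name, argument layout, and toy numbers are hypothetical, and "number of test plots squared" is read here as n squared, where n is the number of holdout plots.

```python
def within_iteration_variance(observed, predictions_by_iteration):
    """Average within-iteration variance across bootstrap iterations.

    observed: reference AGB values for the holdout plots.
    predictions_by_iteration: one list of holdout predictions per model,
    in the same plot order as `observed` (hypothetical layout).
    """
    n_plots = len(observed)
    per_iteration = []
    for predicted in predictions_by_iteration:
        # Step 2.2: sum of squared errors divided by n_plots squared.
        sse = sum((p - o) ** 2 for p, o in zip(predicted, observed))
        per_iteration.append(sse / n_plots ** 2)
    # Step 2.3: average over all iterations.
    return sum(per_iteration) / len(per_iteration)


# Toy input: 4 holdout plots and 2 iterations instead of 100.
toy_observed = [100.0, 150.0, 200.0, 250.0]
toy_predictions = [
    [110.0, 140.0, 190.0, 260.0],
    [100.0, 150.0, 220.0, 250.0],
]
within_var = within_iteration_variance(toy_observed, toy_predictions)
```

Because the denominator grows with the square of the number of holdout plots, pooling plots from every LiDAR coverage shrinks this term, which is the reason the note above expects it to be an underestimate for a single coverage.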

Variance is converted to SE by taking the square root of the aggregate estimate variance and dividing it by the number of iterations (100).
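
As a rough sketch of how the two components combine, the snippet below adds them and applies the conversion exactly as stated in the sentence above (square root of the variance, divided by the number of iterations). The function names and toy values are hypothetical, and the helper for percentage shares is only an assumed way of expressing each component's contribution to the total.

```python
import math


def parcel_standard_error(across_var, within_var, n_iterations=100):
    """Combine both variance components and convert to SE as the text states."""
    total_var = across_var + within_var
    return math.sqrt(total_var) / n_iterations


def variance_shares(across_var, within_var):
    """Percentage of total variance from each component (assumed definition)."""
    total_var = across_var + within_var
    return 100.0 * across_var / total_var, 100.0 * within_var / total_var


# Toy values chosen so the arithmetic is easy to follow.
toy_se = parcel_standard_error(across_var=64.0, within_var=36.0)
toy_across_pct, toy_within_pct = variance_shares(64.0, 36.0)
```

With these toy inputs the total variance is 100, its square root is 10, and dividing by 100 iterations gives an SE of 0.1.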

Data

The Warren, Washington, Essex LiDAR region was leveraged for this analysis. Note that this area is relatively homogeneous with respect to landcover, so we might expect these error estimates to be optimistic relative to other regions.

The LINMOD ensemble predictions were used in this analysis. No masking was conducted as part of this analysis.

2019 tax parcels were used as aggregation units. These aggregation units make sense from an application standpoint, though perhaps more arbitrary units of aggregation would be more suitable.

Results

Summary of Error Components

Average Contributions:

  1. Across Iteration Variance: 70.6%
  2. Within Iteration Variance: 29.4%

Distribution of % Error (capped at 200%)

Parcel Errors By Parcel Size

SE By Parcel Size

% Error (capped at 200%) By Parcel Size

Summarized Parcel Errors for Size Groupings

Groups are mutually exclusive, and each point along the x-axis marks the center of a 10-acre summary group. For example, the point at ‘55’ summarizes all parcels larger than 50 acres and smaller than or equal to 60 acres.
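
The grouping rule can be illustrated with a short sketch. The function names and toy data are hypothetical, parcel sizes are assumed to be strictly positive, and the mean is used as the per-group summary only as an assumption, since the summary statistic is not stated here.

```python
import math
from collections import defaultdict


def size_group_center(acres, width=10):
    """Center of the (lower, upper] size group that contains `acres`."""
    upper = math.ceil(acres / width) * width
    return upper - width / 2


def summarize_by_size(parcels, width=10):
    """Mean percent error per size group (mean is an assumed summary).

    parcels: iterable of (acres, percent_error) pairs (hypothetical layout).
    """
    groups = defaultdict(list)
    for acres, pct_error in parcels:
        groups[size_group_center(acres, width)].append(pct_error)
    return {center: sum(v) / len(v) for center, v in sorted(groups.items())}


toy_parcels = [(50.0, 20.0), (50.1, 30.0), (60.0, 10.0), (3.5, 120.0)]
toy_summary = summarize_by_size(toy_parcels)
```

A 50-acre parcel falls in the group centered at 45, while parcels of 50.1 and 60 acres both fall in the group centered at 55, matching the "larger than 50, at most 60" rule.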

Parcel Maps

Percent errors capped at 200% for figure clarity

Citation

For attribution, please cite this work as

Johnson (2022, Feb. 16). CAFRI Labs: Model Bootstrap Uncertainty Estimates for Tax Parcels in WWE. Retrieved from https://cafri-labs.github.io/acceptable-growing-stock/posts/model-bootstrap-uncertainty-estimates-for-tax-parcels-in-wwe/

BibTeX citation

@misc{johnson2022model,
  author = {Johnson, Lucas},
  title = {CAFRI Labs: Model Bootstrap Uncertainty Estimates for Tax Parcels in WWE},
  url = {https://cafri-labs.github.io/acceptable-growing-stock/posts/model-bootstrap-uncertainty-estimates-for-tax-parcels-in-wwe/},
  year = {2022}
}