Validation

Overview

We provide several geostatistical cross-validation methods for comparing interpolation models or learning models. All of them are used through the cverror function:

GeoStatsValidation.cverror — Function

cverror(model::GeoStatsModel, geotable, method; kwargs...)

Estimate the cross-validation error of a geostatistical model on the given geotable with the given error estimation method. Predictions are made with Interpolate or InterpolateNeighbors, depending on kwargs.

cverror(model::StatsLearnModel, geotable, method)
cverror((model, feats => targs), geotable, method)

Estimate cross-validation error of statistical learning model on given geotable with error estimation method.


As an example, consider the block cross-validation error of the following decision tree learning model:

using GeoStats
using GeoIO

# load geospatial data
Ω = GeoIO.load("data/agriculture.csv", coords = ("x", "y"))

# 20%/80% split along the (1, -1) direction
Ωₛ, Ωₜ = geosplit(Ω, 0.2, (1.0, -1.0))

# features and label for supervised learning
feats = ["band1", "band2", "band3", "band4"]
label = "crop"

# learning model
model = DecisionTreeClassifier()

# loss function
loss = MisclassLoss()

# block cross-validation with blocks of side 30
bcv = BlockValidation(30., loss = Dict("crop" => loss))

# estimate of generalization error
ϵ̂ = cverror((model, feats => label), Ωₛ, bcv)

Dict{Symbol, Float64} with 1 entry:
  :crop => 0.226942

We can unhide the labels in the target domain and compute the actual error for comparison:

# train in Ωₛ and predict in Ωₜ
Ω̂ₜ = Ωₜ |> Learn(Ωₛ, model, feats => label)

# actual error of the model
ϵ = mean(loss.(Ωₜ.crop, Ω̂ₜ.crop))

0.23729575775745765

Methods

Leave-one-out

GeoStatsValidation.LeaveOneOut — Type

LeaveOneOut(; loss=Dict())

Leave-one-out validation. Optionally, specify a dictionary of loss functions from LossFunctions.jl for some of the variables.
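
To make the procedure concrete, here is a minimal sketch of leave-one-out error estimation in plain Julia. It does not use the GeoStats.jl API: the toy values, the predict stand-in model (the mean of the training values) and the squared loss are all assumptions chosen for illustration only.

```julia
using Statistics

# Hypothetical toy data: one value per sample
y = [1.0, 2.0, 3.0, 4.0]

# Hypothetical stand-in model: predict the mean of the training values
predict(train) = mean(train)

# Leave-one-out: hold out sample i, fit on the remaining samples,
# and score the prediction at the held-out sample with a squared loss
function loo_error(y)
    n = length(y)
    errs = map(1:n) do i
        train = y[setdiff(1:n, i)]
        (predict(train) - y[i])^2
    end
    mean(errs)
end

ϵ = loo_error(y)
```

Each sample is used exactly once for testing, so the estimate requires as many model fits as there are samples.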

References

Stone. 1974. Cross-Validatory Choice and Assessment of Statistical Predictions


Leave-ball-out

GeoStatsValidation.LeaveBallOut — Type

LeaveBallOut(ball; loss=Dict())

Leave-ball-out (a.k.a. spatial leave-one-out) validation. Optionally, specify a dictionary with loss functions from LossFunctions.jl for some of the variables.

LeaveBallOut(radius; loss=Dict())

When a radius is given instead of a ball, a Euclidean ball of that radius is used.
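
The idea behind the ball can be sketched in plain Julia as follows. This is not the library's implementation: the coordinates, the helper name train_indices, and the choice to treat the ball as closed are assumptions made for illustration.

```julia
# Hypothetical sample locations in 2D
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
radius = 2.0

# Samples available for training when sample i is held out:
# every sample strictly outside the Euclidean ball centered at sample i
function train_indices(coords, i, radius)
    center = coords[i]
    [j for j in eachindex(coords) if hypot((coords[j] .- center)...) > radius]
end

train_indices(coords, 1, radius)
```

Compared with plain leave-one-out, the samples near the held-out location are discarded, which reduces the optimism caused by spatial autocorrelation.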

References

Le Rest et al. 2014. Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation


K-fold

GeoStatsValidation.KFoldValidation — Type

KFoldValidation(k; shuffle=true, loss=Dict())

k-fold cross-validation. Optionally, shuffle the data, and specify a dictionary with loss functions from LossFunctions.jl for some of the variables.
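
As an illustration of the folding step, the sketch below partitions sample indices into k folds in plain Julia. The helper kfolds and its interleaved assignment are assumptions made for this example; the library may assign samples to folds differently.

```julia
using Random

# Partition the indices 1:n into k folds, optionally shuffling first.
# Fold j receives every k-th index starting at position j.
function kfolds(n, k; shuffle=true, rng=Random.default_rng())
    inds = shuffle ? randperm(rng, n) : collect(1:n)
    [inds[j:k:n] for j in 1:k]
end

folds = kfolds(10, 3; shuffle=false)
```

Each fold is held out in turn while the model is fitted on the union of the other folds, and the fold errors are then aggregated.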

References

Geisser, S. 1975. The predictive sample reuse method with applications
Burman, P. 1989. A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods


Block

GeoStatsValidation.BlockValidation — Type

BlockValidation(sides; loss=Dict())

Cross-validation with blocks of given sides. Optionally, specify a dictionary with loss functions from LossFunctions.jl for some of the variables.
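
One way to picture the blocks is as the cells of a regular grid laid over the samples. The plain-Julia sketch below groups samples by cell; the coordinates and the helpers blockof and blockfolds are hypothetical, and the sketch does not reproduce how the library treats the blocks around the held-out one.

```julia
# Hypothetical sample locations in 2D and block side
coords = [(5.0, 5.0), (12.0, 40.0), (31.0, 2.0), (45.0, 29.0), (61.0, 61.0)]
side = 30.0

# Integer index of the grid cell (block) that contains location p
blockof(p, side) = map(c -> floor(Int, c / side), p)

# Group sample indices by block; each group plays the role of a fold
function blockfolds(coords, side)
    folds = Dict{NTuple{2,Int},Vector{Int}}()
    for (i, p) in enumerate(coords)
        push!(get!(folds, blockof(p, side), Int[]), i)
    end
    folds
end

folds = blockfolds(coords, side)
```

Because samples in the same block are always held out together, the block side controls how far a test sample is from its nearest training sample.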

References

Roberts et al. 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure
Pohjankukka et al. 2017. Estimating the prediction performance of spatial models via spatial k-fold cross-validation


Weighted

GeoStatsValidation.WeightedValidation — Type

WeightedValidation(weighting, folding; lambda=1.0, loss=Dict())

An error estimation method in which samples are weighted with the weighting method and split into folds with the folding method. Weights are raised to the power lambda, which must lie in [0,1]. Optionally, specify a dictionary with loss functions from LossFunctions.jl for some of the variables.
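
The role of lambda can be illustrated with a small plain-Julia sketch in the spirit of the importance-weighted estimator of Sugiyama et al. The weights, losses and folds are made-up numbers, the helper weighted_error is hypothetical, and the normalization used by the library may differ from the simple averages used here.

```julia
using Statistics

# Hypothetical per-sample weights and losses, and a hypothetical folding
weights = [0.5, 2.0, 1.0, 4.0]
losses = [1.0, 0.0, 1.0, 1.0]
folds = [[1, 2], [3, 4]]

# Average, over folds, of the mean weighted loss within each fold.
# The weights are flattened by the exponent lambda before use.
weighted_error(w, l, folds; lambda=1.0) =
    mean(mean(w[f] .^ lambda .* l[f]) for f in folds)

ϵ₁ = weighted_error(weights, losses, folds; lambda=1.0)
ϵ₀ = weighted_error(weights, losses, folds; lambda=0.0)
```

With lambda = 0 every weight becomes 1 and the estimate reduces to ordinary unweighted cross-validation; with lambda = 1 the weights are applied in full.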

References

Sugiyama et al. 2006. Importance-weighted cross-validation for covariate shift
Sugiyama et al. 2007. Covariate shift adaptation by importance weighted cross validation


Density-ratio

GeoStatsValidation.DensityRatioValidation — Type

DensityRatioValidation(k; [options])

Density ratio validation where weights are first obtained with density ratio estimation, and then used in k-fold weighted cross-validation.

Options

shuffle - Shuffle the data before folding (defaults to true)
estimator - Density ratio estimator (defaults to LSIF())
optlib - Optimization library (defaults to default_optlib(estimator))
lambda - Power of density ratios (defaults to 1.0)
loss - Dictionary with loss functions (defaults to Dict())

Please see DensityRatioEstimation.jl for a list of supported estimators.

References

Hoffimann et al. 2020. Geostatistical Learning: Challenges and Opportunities
