Validation

GeoStats.jl was designed to, among other things, facilitate rigorous comparison of different geostatistical models in the literature. As a user of geostatistics, you may be interested in trying various models on a given data set to pick the one with the best performance. As a researcher in the field, you may be interested in benchmarking your new model against other established models.

The error of geostatistical models can be estimated with the cverror function:

GeoStatsValidation.cverror — Function

cverror(model::GeoStatsModel, geotable, method; kwargs...)

Estimate the error of model in a given geotable with the error estimation method, using Interpolate or InterpolateNeighbors depending on the passed kwargs.

cverror(model::StatsLearnModel, geotable, method)
cverror((model, invars => outvars), geotable, method)

Estimate the error of model in a given geotable with the error estimation method, using the Learn transform.
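Conceptually, cross-validation repeatedly holds out part of the samples, trains on the rest, predicts the held-out samples and aggregates the loss. The following is a minimal plain-Julia sketch of that loop, not the GeoStatsValidation implementation: fitpredict is a hypothetical stand-in model that predicts the training mean, sqloss stands in for a LossFunctions.jl loss, and pooling the loss over all held-out samples is an assumption about how the errors are aggregated.

```julia
using Statistics: mean

# Hypothetical stand-in model: predicts the mean of the training labels.
fitpredict(ytrain, ntest) = fill(mean(ytrain), ntest)

# Squared-error loss (stand-in for a LossFunctions.jl loss).
sqloss(yhat, y) = (yhat - y)^2

# Generic cross-validation error: for each fold, train on the other samples,
# predict the held-out ones, and average the loss over all predictions.
function cv_error(y::AbstractVector, folds::Vector{Vector{Int}})
    n = length(y)
    total, npred = 0.0, 0
    for test in folds
        train = setdiff(1:n, test)
        yhat = fitpredict(y[train], length(test))
        total += sum(sqloss.(yhat, y[test]))
        npred += length(test)
    end
    return total / npred
end

y = [1.0, 2.0, 3.0, 4.0]
cv_error(y, [[1, 2], [3, 4]])  # (6.25 + 2.25 + 2.25 + 6.25) / 4 = 4.25
```

The validation methods below differ mainly in how they choose the held-out and training samples, and in whether the losses are weighted.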

For example, we can perform block cross-validation on a decision tree model using the following code:

using GeoStats
using GeoIO
using Statistics # provides mean, used below

# load geospatial data
Ω = GeoIO.load("data/agriculture.csv", coords = ("x", "y"))

# 20%/80% split along the (1, -1) direction
Ωₛ, Ωₜ = geosplit(Ω, 0.2, (1.0, -1.0))

# features and label for supervised learning
feats = [:band1,:band2,:band3,:band4]
label = :crop

# learning model
model = DecisionTreeClassifier()

# loss function
loss = MisclassLoss()

# block cross-validation with blocks of side 30
bcv = BlockValidation(30., loss = Dict(:crop => loss))

# estimate of generalization error
ϵ̂ = cverror((model, feats => label), Ωₛ, bcv)

Dict{Symbol, Float64} with 1 entry:
  :crop => 0.227837

We can unhide the labels in the target domain and compute the actual error for comparison:

# train in Ωₛ and predict in Ωₜ
Ω̂ₜ = Ωₜ |> Learn(Ωₛ, model, feats => label)

# actual error of the model
ϵ = mean(loss.(Ωₜ.crop, Ω̂ₜ.crop))

0.2380452705741268
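For a categorical label such as crop, the misclassification loss is 1 for a wrong prediction and 0 otherwise, so the mean above is the fraction of misclassified samples in Ωₜ. A small stdlib-only illustration, with a hypothetical misclass function and made-up labels:

```julia
using Statistics: mean

# Misclassification loss: 1 if the predicted label differs from the true one.
misclass(yhat, y) = yhat == y ? 0.0 : 1.0

y    = [:corn, :soy, :corn, :wheat]   # true labels (made up)
yhat = [:corn, :corn, :corn, :wheat]  # predicted labels (made up)

# Fraction of misclassified samples: 1 out of 4.
err = mean(misclass.(yhat, y))  # 0.25
```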

Below is the list of currently implemented validation methods.

Leave-one-out

GeoStatsValidation.LeaveOneOut — Type

LeaveOneOut(; loss=Dict())

Leave-one-out validation. Optionally, specify a loss function from LossFunctions.jl for some of the variables.
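Leave-one-out is k-fold cross-validation with one sample per fold: each sample is predicted from all the others. A minimal sketch in plain Julia, assuming 1-D coordinates, a hypothetical nearest-neighbor model (nnpredict) and squared-error loss; this is not the GeoStatsValidation code.

```julia
using Statistics: mean

# Hypothetical model: predict the label of the nearest training sample.
function nnpredict(xtrain, ytrain, x)
    i = argmin(abs.(xtrain .- x))
    return ytrain[i]
end

# Leave-one-out error: predict each sample from all the others.
function loo_error(x, y)
    n = length(x)
    errs = map(1:n) do i
        train = [j for j in 1:n if j != i]
        yhat = nnpredict(x[train], y[train], x[i])
        (yhat - y[i])^2
    end
    return mean(errs)
end

x = [0.0, 1.0, 3.0, 7.0]
y = [0.0, 1.0, 4.0, 9.0]
loo_error(x, y)  # (1 + 1 + 9 + 25) / 4 = 9.0
```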

References

Stone. 1974. Cross-Validatory Choice and Assessment of Statistical Predictions

Leave-ball-out

GeoStatsValidation.LeaveBallOut — Type

LeaveBallOut(ball; loss=Dict())

Leave-ball-out (a.k.a. spatial leave-one-out) validation. Optionally, specify a loss function from the LossFunctions.jl package for some of the variables.

LeaveBallOut(radius; loss=Dict())

By default, use a Euclidean ball of the given radius in space.
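Compared with leave-one-out, leave-ball-out also removes from the training set every sample inside the ball centered at the held-out sample, which limits how much information leaks through spatial autocorrelation. A minimal plain-Julia sketch, assuming points stored as coordinate vectors, a Euclidean ball and a hypothetical nearest-neighbor model (nnpredict, lbo_error are illustrative names); with radius 0 it reduces to leave-one-out.

```julia
using LinearAlgebra: norm
using Statistics: mean

# Hypothetical model: predict the value of the nearest training point.
function nnpredict(pts, vals, p)
    i = argmin([norm(q - p) for q in pts])
    return vals[i]
end

# Leave-ball-out error: for each point, train only on points farther than
# `radius` from it (the point itself is always excluded).
function lbo_error(pts, vals, radius)
    n = length(pts)
    errs = map(1:n) do i
        train = [j for j in 1:n if norm(pts[j] - pts[i]) > radius]
        yhat = nnpredict(pts[train], vals[train], pts[i])
        (yhat - vals[i])^2
    end
    return mean(errs)
end

# Two made-up clusters with constant values inside each cluster.
pts  = [[0.0, 0.0], [0.5, 0.0], [3.0, 0.0], [3.5, 0.0]]
vals = [1.0, 1.0, 5.0, 5.0]

lbo_error(pts, vals, 0.0)  # 0.0  (leave-one-out)
lbo_error(pts, vals, 1.0)  # 16.0 (must predict across clusters)
```

In this example, leave-one-out reports zero error because each point has an identical neighbor, while a ball of radius 1 exposes the error of predicting away from the data. The sketch assumes the ball never covers all samples; otherwise the training set would be empty.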

References

Le Rest et al. 2014. Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation

K-fold

GeoStatsValidation.KFoldValidation — Type

KFoldValidation(k; shuffle=true, loss=Dict())

k-fold cross-validation. Optionally, shuffle the data and specify a loss function from LossFunctions.jl for some of the variables.
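A minimal sketch of how k folds could be built from n sample indices, with optional shuffling. The round-robin assignment inds[i:k:n] is an assumption that keeps fold sizes within one of each other, and kfolds is a hypothetical name, not part of GeoStatsValidation.

```julia
using Random

# Split indices 1:n into k folds, optionally shuffling them first.
function kfolds(n::Integer, k::Integer; shuffle::Bool=true, rng=Random.default_rng())
    inds = shuffle ? randperm(rng, n) : collect(1:n)
    # Round-robin assignment: fold i takes positions i, i+k, i+2k, ...
    return [inds[i:k:n] for i in 1:k]
end

kfolds(10, 3; shuffle=false)  # [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
folds = kfolds(10, 3; rng=MersenneTwister(42))
```

Shuffling only changes which samples land in each fold; every index still appears in exactly one fold.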

References

Geisser, S. 1975. The predictive sample reuse method with applications
Burman, P. 1989. A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods

Block

GeoStatsValidation.BlockValidation — Type

BlockValidation(sides; loss=Dict())

Cross-validation with blocks of the given sides. Optionally, specify a loss function from LossFunctions.jl for some of the variables. If only one side is provided, the blocks become cubes.
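Each block then serves as one fold, so samples in the held-out block are predicted only from samples in other blocks. A minimal sketch of the block assignment, assuming a regular grid of blocks anchored at the origin (GeoStatsValidation may anchor or lay out its blocks differently); blockfolds is a hypothetical name.

```julia
# Group points into blocks: each point goes to the block whose integer
# index is floor(coordinate / side) along every axis. `sides` may be a
# scalar (cubic blocks) or a vector with one side per axis.
function blockfolds(pts, sides)
    groups = Dict{Vector{Int},Vector{Int}}()
    for (i, p) in enumerate(pts)
        key = floor.(Int, p ./ sides)
        push!(get!(groups, key, Int[]), i)
    end
    return collect(values(groups))
end

pts = [[0.5, 0.5], [1.5, 0.2], [0.2, 0.9], [1.9, 1.9]]
blockfolds(pts, 1.0)         # blocks {1, 3}, {2}, {4}, in some order
blockfolds(pts, [2.0, 1.0])  # blocks {1, 2, 3}, {4}, in some order
```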

References

Roberts et al. 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure
Pohjankukka et al. 2017. Estimating the prediction performance of spatial models via spatial k-fold cross-validation

Weighted

GeoStatsValidation.WeightedValidation — Type

WeightedValidation(weighting, folding; lambda=1.0, loss=Dict())

An error estimation method in which samples are weighted with the weighting method and split into folds with the folding method. Weights are raised to the power lambda, with lambda in [0,1]. Optionally, specify a loss function from LossFunctions.jl for some of the variables.
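Raising the weights to lambda in [0,1] interpolates between ordinary cross-validation (lambda = 0, all weights equal to 1) and fully importance-weighted cross-validation (lambda = 1). A minimal sketch of the weighted error for one set of held-out losses, assuming a normalized weighted mean; the exact normalization used by GeoStatsValidation is not stated here, and weighted_error is a hypothetical name.

```julia
# Weighted average of held-out losses, with weights raised to the power lambda.
# lambda = 0 gives ordinary (unweighted) cross-validation; lambda = 1 uses
# the full importance weights.
function weighted_error(losses, weights; lambda=1.0)
    w = weights .^ lambda
    return sum(w .* losses) / sum(w)
end

losses  = [1.0, 0.0, 0.0, 1.0]
weights = [3.0, 1.0, 1.0, 1.0]
weighted_error(losses, weights; lambda=0.0)  # 2 / 4 = 0.5
weighted_error(losses, weights; lambda=1.0)  # 4 / 6 ≈ 0.667
```

Intermediate values of lambda trade the bias of unweighted estimates under covariate shift against the higher variance of fully weighted ones.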

References

Sugiyama et al. 2006. Importance-weighted cross-validation for covariate shift
Sugiyama et al. 2007. Covariate shift adaptation by importance weighted cross validation

Density-ratio

GeoStatsValidation.DensityRatioValidation — Type

DensityRatioValidation(k; [parameters])

Density ratio validation, in which weights are first obtained with density ratio estimation and then used in k-fold weighted cross-validation.

Parameters

shuffle - Shuffle the data before folding (defaults to true)
estimator - Density ratio estimator (defaults to LSIF())
optlib - Optimization library (defaults to default_optlib(estimator))
lambda - Power of density ratios (defaults to 1.0)

Please see DensityRatioEstimation.jl for a list of supported estimators.
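The weights approximate the density ratio r(x) = p_target(x) / p_source(x). DensityRatioEstimation.jl estimates r from samples; the sketch below instead uses two known Gaussian densities, purely to show what the weights mean, so it is an exact ratio rather than an estimate. All names and distributions here are hypothetical.

```julia
# Gaussian density with mean mu and standard deviation sigma.
gauss(x, mu, sigma) = exp(-((x - mu) / sigma)^2 / 2) / (sigma * sqrt(2pi))

# Exact density ratio between a hypothetical target N(1, 1) and source N(0, 1);
# for these two densities it simplifies to exp(x - 1/2).
ratio(x) = gauss(x, 1.0, 1.0) / gauss(x, 0.0, 1.0)

xs = [-1.0, 0.5, 2.0]  # source samples (made up)
w  = ratio.(xs)        # weights for weighted k-fold cross-validation
```

Source samples in regions over-represented relative to the target (here, x < 1/2) receive weights below 1, and under-represented ones receive weights above 1.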

References

Hoffimann et al. 2020. Geostatistical Learning: Challenges and Opportunities