Validation

GeoStats.jl was designed, among other things, to facilitate rigorous comparison of the different geostatistical models in the literature. As a user of geostatistics, you may want to try various models on a given data set and pick the one with the best performance. As a researcher in the field, you may want to benchmark your new model against established models.

Errors of geostatistical solvers can be estimated with the cverror function:

GeoStatsValidation.cverror - Function
cverror(model::GeoStatsModel, geotable, method; kwargs...)

Estimate the error of model on a given geotable with the error estimation method, using Interpolate or InterpolateNeighbors depending on the passed kwargs.

cverror(model::StatsLearnModel, geotable, method)
cverror((model, invars => outvars), geotable, method)

Estimate the error of model on a given geotable with the error estimation method, using the Learn transform.

For example, we can perform block cross-validation on a decision tree model using the following code:

using GeoStats
using GeoIO

# load geospatial data
Ω = GeoIO.load("data/agriculture.csv", coords = ("x", "y"))

# 20%/80% split along the (1, -1) direction
Ωₛ, Ωₜ = geosplit(Ω, 0.2, (1.0, -1.0))

# features and label for supervised learning
feats = [:band1,:band2,:band3,:band4]
label = :crop

# learning model
model = DecisionTreeClassifier()

# loss function
loss = MisclassLoss()

# block cross-validation with blocks of side 30
bcv = BlockValidation(30., loss = Dict(:crop => loss))

# estimate of generalization error
ϵ̂ = cverror((model, feats => label), Ωₛ, bcv)
Dict{Symbol, Float64} with 1 entry:
  :crop => 0.227837

We can unhide the labels in the target domain and compute the actual error for comparison:

# train in Ωₛ and predict in Ωₜ
Ω̂ₜ = Ωₜ |> Learn(Ωₛ, model, feats => label)

# actual error of the model
ϵ = mean(loss.(Ωₜ.crop, Ω̂ₜ.crop))
0.2380452705741268

Below is the list of currently implemented validation methods.

Leave-one-out

Leave-ball-out

K-fold

Block

GeoStatsValidation.BlockValidation - Type
BlockValidation(sides; loss=Dict())

Cross-validation with blocks of given sides. Optionally, specify a loss function from LossFunctions.jl for some of the variables. If only one side is provided, the blocks become cubes.
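
The idea of blocking can be sketched in plain Julia. The snippet below is only an illustration, not the code used by GeoStatsValidation.jl: the helper names blockindex and blockfolds are hypothetical, the points are 2D coordinate tuples, and each non-empty block is held out once while all other points are used for training. Actual implementations may additionally drop the blocks next to the held-out block from the training set; this sketch does not.

```julia
# Illustrative sketch only (hypothetical helpers, not the GeoStatsValidation.jl API).

# integer block index of a coordinate tuple for a given block side
blockindex(p, side) = map(c -> floor(Int, c / side), p)

# one (train, test) pair of point indices per non-empty block
function blockfolds(points, side)
  blocks = Dict{NTuple{2,Int},Vector{Int}}()
  for (i, p) in enumerate(points)
    # append point i to the list of its block, creating the list if needed
    push!(get!(blocks, blockindex(p, side), Int[]), i)
  end
  # sort the block indices so that the order of the folds is reproducible
  inds = sort!(collect(keys(blocks)))
  [(setdiff(1:length(points), blocks[k]), blocks[k]) for k in inds]
end

points = [(5.0, 5.0), (12.0, 7.0), (35.0, 2.0), (40.0, 58.0), (61.0, 61.0)]

# blocks of side 30: the first two points share a block
folds = blockfolds(points, 30.0)
```

With these five points and a side of 30, the first fold holds out points 1 and 2 together because they fall in the same block.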

Weighted

GeoStatsValidation.WeightedValidation - Type
WeightedValidation(weighting, folding; lambda=1.0, loss=Dict())

An error estimation method in which samples are weighted with the weighting method and split into folds with the folding method. Weights are raised to the power lambda, with lambda in [0,1]. Optionally, specify a loss function from LossFunctions.jl for some of the variables.
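
The role of lambda can be seen in a small plain-Julia sketch of the weighted error of a single fold. The function name weightederror is hypothetical, and normalizing by the sum of the weights is an assumption of this sketch rather than a statement about how GeoStatsValidation.jl aggregates losses.

```julia
# Illustrative sketch only (hypothetical helper, not the GeoStatsValidation.jl API).
# Normalizing by the sum of the weights is an assumption of this sketch.
function weightederror(losses, weights; lambda=1.0)
  w = weights .^ lambda          # raise each weight to the power lambda
  sum(w .* losses) / sum(w)      # weighted average of the per-sample losses
end

losses  = [0.0, 1.0, 1.0, 0.0]   # e.g. misclassification loss per sample
weights = [4.0, 1.0, 1.0, 4.0]   # e.g. importance weights of the samples

e0 = weightederror(losses, weights, lambda=0.0)  # weights ignored: plain mean
e1 = weightederror(losses, weights, lambda=1.0)  # weights fully applied
```

Setting lambda to 0 recovers the unweighted mean of the losses, while values closer to 1 let the weights dominate the estimate.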

Density-ratio

GeoStatsValidation.DensityRatioValidation - Type
DensityRatioValidation(k; [parameters])

Density ratio validation where weights are first obtained with density ratio estimation, and then used in k-fold weighted cross-validation.

Parameters

  • shuffle - Shuffle the data before folding (defaults to true)
  • estimator - Density ratio estimator (defaults to LSIF())
  • optlib - Optimization library (defaults to default_optlib(estimator))
  • lambda - Power of density ratios (defaults to 1.0)

Please see DensityRatioEstimation.jl for a list of supported estimators.
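
To make the notion of a density ratio concrete, the toy example below computes it in closed form for two known 1D Gaussian densities. This is only a sketch under that assumption: the helpers gauss and ratio are hypothetical, and estimators such as LSIF instead approximate the ratio from samples, without access to the densities themselves.

```julia
# Illustrative sketch only (hypothetical helpers, not the DensityRatioEstimation.jl API).

# Gaussian density with mean μ and standard deviation σ
gauss(x, μ, σ) = exp(-0.5 * ((x - μ) / σ)^2) / (σ * sqrt(2π))

# density ratio w(x) = p_target(x) / p_source(x) for a toy setup where
# the source is N(0, 1) and the target is N(1, 1)
ratio(x) = gauss(x, 1.0, 1.0) / gauss(x, 0.0, 1.0)

xs = [-1.0, 0.0, 0.5, 2.0]

# samples that are more likely under the target receive larger weights
ws = ratio.(xs)
```

In this setup the ratio increases with x, so source samples located where the target density is higher contribute more to the weighted error.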
