How to work with "rasters"?
A "raster" is nothing more than a Julia array that is georeferenced over a geospatial grid. The "raster" model is popular in the scientific community for two main reasons:
It provides convenient syntax to access grid elements within "rectangular" regions, including syntax to name array dimensions as "x", "y", ... and extract rectangular regions based on coordinate values.
It can rely on DiskArrays.jl and similar packages for loading large datasets lazily, without consuming all the available RAM.
We illustrate how GeoTable
s over Grid
s share these benefits.
Convenient syntax
n-dimensional array syntax
Any GeoTable
over a Grid
supports the n-dimensional array syntax in the row selector.
Consider the following example from the NaturalEarth dataset:
using GeoArtifacts
raster = NaturalEarth.naturalearth1("water") |> Upscale(10, 5)
raster |> viewer

We can check the size of the underlying grid with
size(raster.geometry)
(1620, 1620)
This dataset is stored with LatLon
coordinates. For convenience, we always store the "horizontal" coordinate along the first axis of the grid (longitude in this case). We can slice the Earth as follows:
raster[(800:1620, :), :] |> viewer

raster[(:, 1:800), :] |> viewer

raster[(:, 500:800), :] |> viewer

We can also slice after 2D projections, which is more common in the literature:
projec = raster |> Proj(Robinson)
projec |> viewer

projec[(300:800, 600:1400), :] |> viewer

Slice
transform syntax
To select a subset of the dataset from actual coordinate ranges, we can use the Slice
transform. To slice the x
and y
coordinates of the projected dataset with values in kilometers, we do:
km = u"km" # define kilometer unit
projec |> Slice(x=(0km, 20000km), y=(0km, 10000km)) |> viewer

Notice that the result is no longer a "raster" because the Robinson
projection deforms the graticule, and we can’t simply rely on the sorted x
and y
coordinates of the first row and column to find indices across the other rows and columns. Although the result cannot be represented with the "raster" model, it can be represented with the GeoTable
model over a lazy domain view.
The x
and y
options can be replaced by lon
and lat
if the crs
of the domain is geographic. Below is an example of subgrid over the ellipsoid using lat
and lon
ranges:
° = u"°" # define degree unit
brazil = raster |> Slice(lat=(-60°, 20°), lon=(-100°, -20°))
brazil |> Rotate(RotZ(90)) |> viewer

Lazy loading
The Grid itself is lazy, it doesn’t allocate memory:
Base.summarysize(raster.geometry)
272
Remember that a n-dimensional array is simply a flat memory buffer with an additional tuple containing the number of elements along each dimension. In Julia, the vec
and reshape
operations are lazy:
X = rand(1000, 1000)
@allocated vec(X)
32
@allocated reshape(vec(X), 1000, 1000)
80
If you have a DiskArray
or any other array type that satisfies your memory requirements, simply georef
it over a Grid
to obtain a "lazy" GeoTable
:
georef((; x=vec(X)), CartesianGrid(1000, 1000))
x | geometry |
---|---|
Continuous | Quadrangle |
[NoUnits] | 🖈 Cartesian{NoDatum} |
0.308291 | Quadrangle((x: 0.0 m, y: 0.0 m), ..., (x: 0.0 m, y: 1.0 m)) |
0.275683 | Quadrangle((x: 1.0 m, y: 0.0 m), ..., (x: 1.0 m, y: 1.0 m)) |
0.0850912 | Quadrangle((x: 2.0 m, y: 0.0 m), ..., (x: 2.0 m, y: 1.0 m)) |
0.352538 | Quadrangle((x: 3.0 m, y: 0.0 m), ..., (x: 3.0 m, y: 1.0 m)) |
0.670556 | Quadrangle((x: 4.0 m, y: 0.0 m), ..., (x: 4.0 m, y: 1.0 m)) |
0.380137 | Quadrangle((x: 5.0 m, y: 0.0 m), ..., (x: 5.0 m, y: 1.0 m)) |
0.48213 | Quadrangle((x: 6.0 m, y: 0.0 m), ..., (x: 6.0 m, y: 1.0 m)) |
0.356601 | Quadrangle((x: 7.0 m, y: 0.0 m), ..., (x: 7.0 m, y: 1.0 m)) |
0.453532 | Quadrangle((x: 8.0 m, y: 0.0 m), ..., (x: 8.0 m, y: 1.0 m)) |
0.211697 | Quadrangle((x: 9.0 m, y: 0.0 m), ..., (x: 9.0 m, y: 1.0 m)) |
⋮ | ⋮ |
The GeoIO.jl package is highly recommended as it will take care of these low-level memory details for you.