Another important tool in geospatial data science is the geospatial join. We will introduce the concept with a practical example, and will explain how it is related to the standard join of two tables.
9.1 Motivation
The split-apply-combine pattern that we learned in the previous chapter requires a single geotable with all the relevant information in it. However, many questions in geospatial data science can only be answered with information that is spread across multiple geotables. Hence, the need to join these geotables before attempting to @groupby, @transform and @combine the information.
Letâs consider a simple example where we are given two geotables, one containing people who shared their latitude and longitude coordinates:
The georef function has a method that accepts the names of columns with geospatial coordinates. In this example, the table already has the LATITUDE and LONGITUDE coordinates.
Letâs visualize the geometries of these two geotables:
fig = Mke.Figure()ax = Mke.Axis(fig[1,1], title ="People and countries", xlabel ="longitude", ylabel ="latitude")viz!(countries.geometry)viz!(people.geometry, color ="teal", pointsize =10)fig
â Warning: Found `resolution` in the theme when creating a `Scene`. The `resolution` keyword for `Scene`s and `Figure`s has been deprecated. Use `Figure(; size = ...` or `Scene(; size = ...)` instead, which better reflects that this is a unitless size and not a pixel resolution. The key could also come from `set_theme!` calls or related theming functions.
â @ Makie ~/.julia/packages/Makie/ND0gA/src/scenes.jl:220
Our goal in this example is to attach the âCOUNTRYâ and âREGIONâ information to each individual based on their location, represented as a point. In other words, we want to find the countries that contain the location, and then copy the columns to the people.
9.2 Joining geotables
The geojoin function can be used to join two geotables using a geometric predicate function:
geojoin(people, countries, pred =â)
5Ă6 GeoTable over 5 PointSet{2,Float64}
NAME
AGE
HEIGHT
COUNTRY
REGION
geometry
Categorical
Continuous
Continuous
Categorical
Categorical
Point2
[NoUnits]
[yr]
[m]
[NoUnits]
[NoUnits]
John
34.0 yr
1.78 m
Brazil
South America
(-43.1789, -22.9671)
Mary
12.0 yr
1.56 m
United States of America
Northern America
(-122.17, 37.4277)
Paul
23.0 yr
1.7 m
Australia
Australia and New Zealand
(153.044, -27.4862)
Anne
39.0 yr
1.8 m
China
Eastern Asia
(116.408, 39.9036)
Kate
28.0 yr
1.72 m
missing
missing
(-32.4114, -3.84731)
In the example above, we used the â predicate to check if a Point in the people geotable is in a MultiPolygon in the countries geotable. The default predicate in geojoin is the intersects function that we covered in previous chapters.
Notice that Kateâs âCOUNTRYâ and âREGIONâ are missing. That is because Kate lives in the Fernando de Noronha island, which is not present in the countries geotable. To retain just those people for which the geometric predicate evaluates true, we can use a different kind of geojoin known as inner:
geojoin(people, countries, kind =:inner)
4Ă6 GeoTable over 4 view(::PointSet{2,Float64}, [1, 2, 3, 4])
NAME
AGE
HEIGHT
COUNTRY
REGION
geometry
Categorical
Continuous
Continuous
Categorical
Categorical
Point2
[NoUnits]
[yr]
[m]
[NoUnits]
[NoUnits]
John
34.0 yr
1.78 m
Brazil
South America
(-43.1789, -22.9671)
Mary
12.0 yr
1.56 m
United States of America
Northern America
(-122.17, 37.4277)
Paul
23.0 yr
1.7 m
Australia
Australia and New Zealand
(153.044, -27.4862)
Anne
39.0 yr
1.8 m
China
Eastern Asia
(116.408, 39.9036)
By default, the left kind is used. Like in standard join, the geojoin is not commutative. If we swap the order of the geotables, the result will be different:
geojoin(countries, people, kind =:inner)
4Ă6 GeoTable over 4 view(::GeometrySet{2,Float64}, [5, 30, 138, 140])
COUNTRY
REGION
NAME
AGE
HEIGHT
geometry
Categorical
Categorical
Categorical
Continuous
Continuous
MultiPolygon
[NoUnits]
[NoUnits]
[NoUnits]
[yr]
[m]
United States of America
Northern America
Mary
12.0 yr
1.56 m
Multi(10ĂPolyArea)
Brazil
South America
John
34.0 yr
1.78 m
Multi(1ĂPolyArea)
Australia
Australia and New Zealand
Paul
23.0 yr
1.7 m
Multi(2ĂPolyArea)
China
Eastern Asia
Anne
39.0 yr
1.8 m
Multi(2ĂPolyArea)
To learn about the different kinds of join, check DataFrames.jlâs documentation on joins.
Note
In database-style join, the predicate function isequal is applied to arbitrary columns of tables. In geojoin, geometric predicate functions are applied to the geometry column of geotables.
The tablejoin function can be used for database-style join between a geotable and a simple table (e.g., DataFrame). The result will be a new geotable over a subset of geometries from the first argument. Please check the documentation for more details.
Tip for all users
Besides geojoin, the framework also provides vertical and horizontal concatenation of geotables. Vertical concatenation is achieved with vcat as follows:
All columns are preserved by default, which corresponds to the option kind = :union. Absent columns are filled with missing values. The option kind = :intersect can be used to retain only the columns that are present in all geotables:
vcat(gt1, gt2, kind =:intersect)
6Ă2 GeoTable over 6 GeometrySet{1,Float64}
b
geometry
Categorical
Segment
[NoUnits]
4
Segment((0.0,), (1.0,))
5
Segment((1.0,), (2.0,))
6
Segment((2.0,), (3.0,))
4
Segment((0.0,), (1.0,))
5
Segment((1.0,), (2.0,))
6
Segment((2.0,), (3.0,))
Horizontal concatenation is achieved with hcat as follows:
hcat(gt1, gt2)
3Ă5 GeoTable over 3 CartesianGrid{1,Float64}
a
b
b_
c
geometry
Categorical
Categorical
Categorical
Categorical
Segment
[NoUnits]
[NoUnits]
[NoUnits]
[NoUnits]
1
4
4
7
Segment((0.0,), (1.0,))
2
5
5
8
Segment((1.0,), (2.0,))
3
6
6
9
Segment((2.0,), (3.0,))
Unique column names are produced with underscore suffixes whenever a column appears in multiple geotables. Julia provides convenient syntax for vcat and hcat:
[gt1 gt2]
6Ă4 GeoTable over 6 GeometrySet{1,Float64}
a
b
c
geometry
Categorical
Categorical
Categorical
Segment
[NoUnits]
[NoUnits]
[NoUnits]
1
4
missing
Segment((0.0,), (1.0,))
2
5
missing
Segment((1.0,), (2.0,))
3
6
missing
Segment((2.0,), (3.0,))
missing
4
7
Segment((0.0,), (1.0,))
missing
5
8
Segment((1.0,), (2.0,))
missing
6
9
Segment((2.0,), (3.0,))
[gt1 gt2]
3Ă5 GeoTable over 3 CartesianGrid{1,Float64}
a
b
b_
c
geometry
Categorical
Categorical
Categorical
Categorical
Segment
[NoUnits]
[NoUnits]
[NoUnits]
[NoUnits]
1
4
4
7
Segment((0.0,), (1.0,))
2
5
5
8
Segment((1.0,), (2.0,))
3
6
6
9
Segment((2.0,), (3.0,))
9.3 Common predicates
In geospatial data science, the most common geometric predicates used in geojoin are â, â, ==, â and intersects as illustrated in Table 9.1. Specific applications may require custom predicates, which can be easily defined in pure Julia with the Meshes.jl module. For example, it is sometimes convenient to define geometric predicates in terms of a distance and a threshold:
Table 9.1: Common geometric predicates in geospatial join
PREDICATE
EXAMPLE
â
â
==
â
intersects
9.4 Multiple matches
Sometimes the predicate function will evaluate true for multiple rows of the geotables. In this case, we need to decide which value will be copied to the resulting geotable. The geojoin accepts reduction functions for each column. For example, suppose that Kate decided to move to the continent:
Now, both John and Kate live in Brazil according the countries geotable:
geojoin(people, countries)
5Ă6 GeoTable over 5 PointSet{2,Float64}
NAME
AGE
HEIGHT
COUNTRY
REGION
geometry
Categorical
Continuous
Continuous
Categorical
Categorical
Point2
[NoUnits]
[yr]
[m]
[NoUnits]
[NoUnits]
John
34.0 yr
1.78 m
Brazil
South America
(-43.1789, -22.9671)
Mary
12.0 yr
1.56 m
United States of America
Northern America
(-122.17, 37.4277)
Paul
23.0 yr
1.7 m
Australia
Australia and New Zealand
(153.044, -27.4862)
Anne
39.0 yr
1.8 m
China
Eastern Asia
(116.408, 39.9036)
Kate
28.0 yr
1.72 m
Brazil
South America
(-35.7126, -9.66628)
If we swap the order of the geotables in the geojoin, we need to decide how to reduce the multiple values of âNAMEâ, âAGEâ and âHEIGHTâ in the resulting geotable. By default, the mean function is used to reduce Continuous variables, and the first function is used otherwise:
geojoin(countries, people, kind =:inner)
4Ă6 GeoTable over 4 view(::GeometrySet{2,Float64}, [5, 30, 138, 140])
COUNTRY
REGION
NAME
AGE
HEIGHT
geometry
Categorical
Categorical
Categorical
Continuous
Continuous
MultiPolygon
[NoUnits]
[NoUnits]
[NoUnits]
[yr]
[m]
United States of America
Northern America
Mary
12.0 yr
1.56 m
Multi(10ĂPolyArea)
Brazil
South America
John
31.0 yr
1.75 m
Multi(1ĂPolyArea)
Australia
Australia and New Zealand
Paul
23.0 yr
1.7 m
Multi(2ĂPolyArea)
China
Eastern Asia
Anne
39.0 yr
1.8 m
Multi(2ĂPolyArea)
We can specify custom reduction functions using the following syntax:
Congratulations on finishing Part III of the book. Letâs quickly review what we learned so far:
In order to answer geoscientific questions with @groupby, @transform and @combine, we need a single geotable with all the relevant information. This single geotable is often the result of a geojoin with multiple geotables from different sources of information.
There are various kinds of geojoin such as inner and left, which match the behavior of standard join in databases. We recommend the DataFrames.jl documentation to learn more.
The geojoin produces matches with geometric predicate functions. The most common are illustrated in Table 9.1, but custom predicates can be easily defined in pure Julia using functions defined in the Meshes.jl module.
In the next part of the book, you will learn a final important tool that is missing in our toolkit for advanced geospatial data science. The concepts in the following chapters are extremely important.