from pyinaturalist import get_observations
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.geometry import Point
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import plotly.express as px
import zipfile, requests, io
Mapping Biodiversity: Visualizing iNaturalist Observations with Python
Today, we’ll embark on an adventure from the field to the digital realm. Let’s explore how to extract, map, and analyze global iNaturalist observations using Python.
Overview
In this blog post, we’ll walk through the following steps to map and analyze iNaturalist observations using Python:
- Download iNaturalist observations from around the world.
- Display these observations on a world map.
- Intersect observations with country and national boundaries.
- Generate interactive graphs to explore observations at different levels.
Tools and Libraries
I will be using a Python environment with a selection of my favourite libraries, as explained here.
We will be using the following libraries in this blog post:
pyinaturalist: A Python client for the iNaturalist API, allowing access to observation data.
Matplotlib basemap toolkit: A library for plotting 2D data on maps in Python.
geopandas: An extension of pandas that supports spatial data operations, making it easier to work with geographic datasets.
plotly: A graphing library that makes interactive plots and dashboards with ease.
Step-by-Step Guide
Step 1: Downloading iNaturalist Observations
Let’s begin by fetching iNaturalist observations with pyinaturalist
. We will need a selection of global observations so we select the user neomapas
1, and see where in the world they have been.
= 'neomapas'
username = get_observations(user_id=username, per_page=0)
observations = observations['total_results']
n_obs
# First we need to figure out how many observations to expect:
print("User _{}_ has {} observations in iNaturalist".format(username,n_obs))
User _neomapas_ has 1134 observations in iNaturalist
The maximum number of observations we can download in each query 200, so we need to use pagination to get all results. For each query we will extract a minimum selection of fields that we will use for summarising the data (coordinates, place and species guess), but there are many other fields that could be important to include for more in depth explorations.
=list()
records=1
jwhile len(records) < n_obs:
print("Requesting observations from user _{}_: page {}, total of {} observations downloaded".format(username,j,min(j*200,n_obs)))
= get_observations(user_id=username,per_page=200,page=j)
observations for obs in observations['results']:
= {
record 'longitude': obs['location'][1],
'latitude': obs['location'][0],
'location': obs['place_guess'],
'species guess': obs['species_guess'],
#'quality': obs['quality_grade'],
#'description': obs['description'],
#'observed on': obs['observed_on'],
}if len(obs['observation_photos'])>0:
'url'] = obs['observation_photos'][0]['photo']['url']
record['attribution'] = obs['observation_photos'][0]['photo']['attribution']
record[
records.append(record)=j+1 j
Requesting observations from user _neomapas_: page 1, total of 200 observations downloaded
Requesting observations from user _neomapas_: page 2, total of 400 observations downloaded
Requesting observations from user _neomapas_: page 3, total of 600 observations downloaded
Requesting observations from user _neomapas_: page 4, total of 800 observations downloaded
Requesting observations from user _neomapas_: page 5, total of 1000 observations downloaded
Requesting observations from user _neomapas_: page 6, total of 1134 observations downloaded
Now we need to bundle all these records into a data frame with geospatial information using geopandas
. We define a data frame with pandas
and transform the numeric variables latitude
and longitude
into a geometry with a explicit Coordinate Reference System (CRS):
= [Point(float(obs['longitude']), float(obs['latitude'])) for obs in records]
gs =gpd.GeoDataFrame(records, geometry=gs, crs="EPSG:4326") inat_obs_world
Step 2: Creating a Map of Observations
We will visualize these observation on top of a simple basemap using the Matplotlib basemap toolkit
.
We can use a nice global projection like Kavrayskiy VII, then add the points from the coordinates in our dataframe:
= Basemap(projection='kav7',lon_0=30)
m ='#99ffff')
m.drawmapboundary(fill_color='#cc9966',lake_color='#99ffff')
m.fillcontinents(color
= inat_obs_world.latitude
lats = inat_obs_world.longitude
lons = m(lons,lats)
x, y 3,marker='o',color='k')
m.scatter(x,y,
'Locations of %s inat Observations' %\
plt.title(len(lats)),fontsize=12)
( plt.show()
Step 3: Intersecting with Country and region Boundaries
We’ll overlay these observations on administrative boundaries using geopandas
.
World Bank data
I found a link to these good quality data in the World Bank Group Data Catalog
='https://datacatalogfiles.worldbank.org/ddh-published-v2/0038272/3/DR0046659/wb_countries_admin0_10m.zip'
zip_url= requests.get(zip_url)
r = zipfile.ZipFile(io.BytesIO(r.content))
z for j in z.namelist():
print(j)
WB_countries_Admin0_10m/
WB_countries_Admin0_10m/WB_countries_Admin0_10m.dbf
WB_countries_Admin0_10m/WB_countries_Admin0_10m.shp
WB_countries_Admin0_10m/WB_countries_Admin0_10m.shp.xml
WB_countries_Admin0_10m/WB_countries_Admin0_10m.prj
WB_countries_Admin0_10m/WB_countries_Admin0_10m.sbx
WB_countries_Admin0_10m/WB_countries_Admin0_10m.shx
WB_countries_Admin0_10m/WB_countries_Admin0_10m.cpg
WB_countries_Admin0_10m/WB_countries_Admin0_10m.sbn
Load the country boundaries from the web by combining the url to the zipfile and the file name within the zipfile:
= "WB_countries_Admin0_10m/WB_countries_Admin0_10m.shp"
shp_file = 'zip+{}!/{}'.format(zip_url, shp_file)
remote_path
= gpd.read_file(remote_path) countries
And now intersect the data with a spatial join:
= inat_obs_world.sjoin(countries, how='left') obs_with_countries
But wait, we have some observations unassigned to countries:
obs_with_countries.loc[obs_with_countries.WB_NAME.isna()]
longitude | latitude | location | species guess | url | attribution | geometry | index_right | OBJECTID | featurecla | ... | NAME_RU | NAME_SV | NAME_TR | NAME_VI | NAME_ZH | WB_NAME | WB_RULES | WB_REGION | Shape_Leng | Shape_Area | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
25 | 126.427191 | 35.739611 | Jeollabuk-do, KR | Black-spotted Frog | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (126.42719 35.73961) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
75 | 150.863021 | -34.671283 | Kiama NSW 2533, Australia | Red Wattlebird | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (150.86302 -34.67128) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
76 | 150.863021 | -34.671283 | Kiama NSW 2533, Australia | Little Wattlebird | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (150.86302 -34.67128) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
77 | 150.863021 | -34.671283 | Kiama NSW 2533, Australia | Purple Rock Crab | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (150.86302 -34.67128) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
160 | -79.879595 | -2.193212 | Guayaquil, Guayas, Ecuador | Golden-olive Woodpecker | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (-79.8796 -2.19321) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1076 | -69.332623 | 11.578649 | Falcón, VE | roughbark lignum-vitae | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (-69.33262 11.57865) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1082 | 19.841887 | 60.228436 | Finström, Åland | Small Copper | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (19.84189 60.22844) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1083 | 20.030967 | 60.207284 | Sund, Åland | ベニシジミ | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (20.03097 60.20728) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1084 | 19.943076 | 60.220928 | Finström, Åland | Ähriger Ehrenpreis | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (19.94308 60.22093) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1096 | -67.963415 | 10.495212 | Barrio de la Base Naval, Puerto Cabello 2050, ... | Blue Rainbow lizard | https://inaturalist-open-data.s3.amazonaws.com... | (c) JR Ferrer-Paris, some rights reserved (CC BY) | POINT (-67.96341 10.49521) | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
68 rows × 60 columns
We can improve this by using a join based on the nearest feature, but we will need to use an appropriate projection (for example the Eckert IV projection, ESRI:54012
):
= gpd.sjoin_nearest(
obs_with_countries "ESRI:54012"),
inat_obs_world.to_crs("ESRI:54012"),
countries.to_crs(="distances",
distance_col="left") how
ASGS structure
Since we have so many observations in Australia, we could break these down by regional areas. For example using the Australian Statistical Geographical Standard structure.
We go through similar steps as before, first check the zipfile:
='https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files/SA4_2021_AUST_SHP_GDA2020.zip'
zip_url= requests.get(zip_url)
r = zipfile.ZipFile(io.BytesIO(r.content))
z for j in z.namelist():
print(j)
SA4_2021_AUST_GDA2020.dbf
SA4_2021_AUST_GDA2020.prj
SA4_2021_AUST_GDA2020.shp
SA4_2021_AUST_GDA2020.shx
SA4_2021_AUST_GDA2020.xml
Read the data from the url:
= "SA4_2021_AUST_GDA2020.shp"
shp_file = 'zip+{}!/{}'.format(zip_url, shp_file)
remote_path = gpd.read_file(zip_url) australia
Use the join with the nearest feature but truncate to 50 km maximum distance:
= gpd.sjoin_nearest(
obs_with_countries_and_regions
obs_with_countries, "ESRI:54012"),
australia.to_crs(="distances",
distance_col="left",
how=50000,
max_distance='global',
lsuffix='australia') rsuffix
Let’s double check that observations with Australian regional data come from Australia:
obs_with_countries_and_regions.loc[
obs_with_countries_and_regions.STE_NAME21.notna(),"WB_NAME"].value_counts()
WB_NAME
Australia 712
Name: count, dtype: int64
And from which countries do the observations without Australian regional data come from?
obs_with_countries_and_regions.loc[
obs_with_countries_and_regions.STE_NAME21.isna(),"WB_NAME"].value_counts()
WB_NAME Venezuela, Republica Bolivariana de 172 Mexico 98 Colombia 65 South Africa 17 Kenya 14 Uganda 10 Poland 8 Finland 8 Rwanda 4 Germany 4 Spain 4 Belarus 3 Italy 3 Singapore 3 Panama 2 Ecuador 2 Trinidad and Tobago 2 Korea, Republic of 1 Guyana 1 Switzerland 1 Name: count, dtype: int64
We will need to fill in a placeholder value here to show these in the plot below:
obs_with_countries_and_regions.loc[
obs_with_countries_and_regions.STE_NAME21.isna(),"STE_NAME21"] = "..."
Step 4: Interactive Graphs by Continent, Country, and Region
Finally, we’ll visualize the complete data using an interactive sunburst plot in plotly
.
We start by grouping our data using continent, country and state and counting the number of unique species and places names.
= {'species guess':['count',pd.Series.nunique],
aggfuncs 'location':['count',pd.Series.nunique]}
= ['CONTINENT','WB_NAME', # from Worldbank data
colnames'STE_NAME21'] # from ASGS data
=obs_with_countries_and_regions.groupby(colnames).agg(aggfuncs).reset_index()
obs_by_group= [' '.join(col).strip() for col in obs_by_group.columns.values] obs_by_group.columns
Now we create the plotly graph using this object. We can select to visualise data by number of unique species:
= px.sunburst(obs_by_group, path=colnames, values='species guess nunique', )
fig fig.show()
Or the count of observations based on locations:
= px.sunburst(obs_by_group, path=colnames, values='location count', )
fig fig.show()
And that’s it!
Conclusion
In summary, we:
- Downloaded observations with
pyinaturalist
. - Mapped observations on a
basemap
of the world. - Intersected observations with country and state boundaries using
geopandas
. - Created insightful graphs with
plotly
.
These steps transform field observations into engaging digital narratives, offering powerful ways to visualize biodiversity. Whether you’re a scientist, a digital cartographer, or a nature enthusiast, these tools can enhance your connection to the natural world.
Happy mapping and exploring! 🌍
Feel free to share your own mapping adventures or ask questions in the comments.
Footnotes
My alter ego in the iNat world↩︎