Around the world with iNaturalist

Python
iNaturalist
Plotly
global
basemap
Author

José R. Ferrer-Paris

Published

March 17, 2025

Mapping Biodiversity: Visualizing iNaturalist Observations with Python

Today, we’ll embark on an adventure from the field to the digital realm. Let’s explore how to extract, map, and analyze global iNaturalist observations using Python.

Overview

In this blog post, we’ll walk through the following steps to map and analyze iNaturalist observations using Python:

  1. Download iNaturalist observations from around the world.
  2. Display these observations on a world map.
  3. Intersect observations with country and national boundaries.
  4. Generate interactive graphs to explore observations at different levels.

Tools and Libraries

I will be using a Python environment with a selection of my favourite libraries, as explained here.

We will be using the following libraries in this blog post:

from pyinaturalist import get_observations
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.geometry import Point
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import plotly.express as px
import zipfile, requests, io
  • pyinaturalist: A Python client for the iNaturalist API, allowing access to observation data.

  • Matplotlib basemap toolkit: A library for plotting 2D data on maps in Python.

  • geopandas: An extension of pandas that supports spatial data operations, making it easier to work with geographic datasets.

  • plotly: A graphing library that makes interactive plots and dashboards with ease.

Step-by-Step Guide

Step 1: Downloading iNaturalist Observations

Let’s begin by fetching iNaturalist observations with pyinaturalist. We will need a selection of global observations so we select the user neomapas1, and see where in the world they have been.

username = 'neomapas'
observations = get_observations(user_id=username, per_page=0)
n_obs = observations['total_results']

# First we need to figure out how many observations to expect:
print("User _{}_ has {} observations in iNaturalist".format(username,n_obs))
User _neomapas_ has 1134 observations in iNaturalist

The maximum number of observations we can download in each query 200, so we need to use pagination to get all results. For each query we will extract a minimum selection of fields that we will use for summarising the data (coordinates, place and species guess), but there are many other fields that could be important to include for more in depth explorations.

records=list()
j=1
while len(records) < n_obs:
    print("Requesting observations from user _{}_: page {}, total of {} observations downloaded".format(username,j,min(j*200,n_obs)))
    observations = get_observations(user_id=username,per_page=200,page=j)
    for obs in observations['results']:
        record = {
            'longitude': obs['location'][1],
            'latitude': obs['location'][0],
            'location': obs['place_guess'],
            'species guess': obs['species_guess'],
            #'quality': obs['quality_grade'],
            #'description': obs['description'],
            #'observed on': obs['observed_on'],
        }
        if len(obs['observation_photos'])>0:
            record['url'] = obs['observation_photos'][0]['photo']['url']
            record['attribution'] = obs['observation_photos'][0]['photo']['attribution']
        records.append(record)
    j=j+1
Requesting observations from user _neomapas_: page 1, total of 200 observations downloaded
Requesting observations from user _neomapas_: page 2, total of 400 observations downloaded
Requesting observations from user _neomapas_: page 3, total of 600 observations downloaded
Requesting observations from user _neomapas_: page 4, total of 800 observations downloaded
Requesting observations from user _neomapas_: page 5, total of 1000 observations downloaded
Requesting observations from user _neomapas_: page 6, total of 1134 observations downloaded

Now we need to bundle all these records into a data frame with geospatial information using geopandas. We define a data frame with pandas and transform the numeric variables latitude and longitude into a geometry with a explicit Coordinate Reference System (CRS):

gs = [Point(float(obs['longitude']), float(obs['latitude']))  for obs in records]
inat_obs_world=gpd.GeoDataFrame(records, geometry=gs, crs="EPSG:4326")

Step 2: Creating a Map of Observations

We will visualize these observation on top of a simple basemap using the Matplotlib basemap toolkit.

We can use a nice global projection like Kavrayskiy VII, then add the points from the coordinates in our dataframe:

m = Basemap(projection='kav7',lon_0=30)
m.drawmapboundary(fill_color='#99ffff')
m.fillcontinents(color='#cc9966',lake_color='#99ffff')

lats = inat_obs_world.latitude
lons = inat_obs_world.longitude
x, y = m(lons,lats)
m.scatter(x,y,3,marker='o',color='k')

plt.title('Locations of %s inat Observations' %\
        (len(lats)),fontsize=12)
plt.show()

Step 3: Intersecting with Country and region Boundaries

We’ll overlay these observations on administrative boundaries using geopandas.

World Bank data

I found a link to these good quality data in the World Bank Group Data Catalog

zip_url='https://datacatalogfiles.worldbank.org/ddh-published-v2/0038272/3/DR0046659/wb_countries_admin0_10m.zip'
r = requests.get(zip_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
for j in z.namelist():
    print(j)
WB_countries_Admin0_10m/
WB_countries_Admin0_10m/WB_countries_Admin0_10m.dbf
WB_countries_Admin0_10m/WB_countries_Admin0_10m.shp
WB_countries_Admin0_10m/WB_countries_Admin0_10m.shp.xml
WB_countries_Admin0_10m/WB_countries_Admin0_10m.prj
WB_countries_Admin0_10m/WB_countries_Admin0_10m.sbx
WB_countries_Admin0_10m/WB_countries_Admin0_10m.shx
WB_countries_Admin0_10m/WB_countries_Admin0_10m.cpg
WB_countries_Admin0_10m/WB_countries_Admin0_10m.sbn

Load the country boundaries from the web by combining the url to the zipfile and the file name within the zipfile:

shp_file = "WB_countries_Admin0_10m/WB_countries_Admin0_10m.shp"
remote_path = 'zip+{}!/{}'.format(zip_url, shp_file)

countries = gpd.read_file(remote_path)

And now intersect the data with a spatial join:

obs_with_countries = inat_obs_world.sjoin(countries, how='left')

But wait, we have some observations unassigned to countries:

obs_with_countries.loc[obs_with_countries.WB_NAME.isna()]

longitude latitude location species guess url attribution geometry index_right OBJECTID featurecla ... NAME_RU NAME_SV NAME_TR NAME_VI NAME_ZH WB_NAME WB_RULES WB_REGION Shape_Leng Shape_Area
25 126.427191 35.739611 Jeollabuk-do, KR Black-spotted Frog https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (126.42719 35.73961) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
75 150.863021 -34.671283 Kiama NSW 2533, Australia Red Wattlebird https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (150.86302 -34.67128) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
76 150.863021 -34.671283 Kiama NSW 2533, Australia Little Wattlebird https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (150.86302 -34.67128) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
77 150.863021 -34.671283 Kiama NSW 2533, Australia Purple Rock Crab https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (150.86302 -34.67128) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
160 -79.879595 -2.193212 Guayaquil, Guayas, Ecuador Golden-olive Woodpecker https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (-79.8796 -2.19321) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1076 -69.332623 11.578649 Falcón, VE roughbark lignum-vitae https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (-69.33262 11.57865) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1082 19.841887 60.228436 Finström, Åland Small Copper https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (19.84189 60.22844) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1083 20.030967 60.207284 Sund, Åland ベニシジミ https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (20.03097 60.20728) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1084 19.943076 60.220928 Finström, Åland Ähriger Ehrenpreis https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (19.94308 60.22093) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1096 -67.963415 10.495212 Barrio de la Base Naval, Puerto Cabello 2050, ... Blue Rainbow lizard https://inaturalist-open-data.s3.amazonaws.com... (c) JR Ferrer-Paris, some rights reserved (CC BY) POINT (-67.96341 10.49521) NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

68 rows × 60 columns

We can improve this by using a join based on the nearest feature, but we will need to use an appropriate projection (for example the Eckert IV projection, ESRI:54012):

obs_with_countries = gpd.sjoin_nearest(
    inat_obs_world.to_crs("ESRI:54012"), 
    countries.to_crs("ESRI:54012"), 
    distance_col="distances", 
    how="left")

ASGS structure

Since we have so many observations in Australia, we could break these down by regional areas. For example using the Australian Statistical Geographical Standard structure.

We go through similar steps as before, first check the zipfile:

zip_url='https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files/SA4_2021_AUST_SHP_GDA2020.zip'
r = requests.get(zip_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
for j in z.namelist():
    print(j)
SA4_2021_AUST_GDA2020.dbf
SA4_2021_AUST_GDA2020.prj
SA4_2021_AUST_GDA2020.shp
SA4_2021_AUST_GDA2020.shx
SA4_2021_AUST_GDA2020.xml

Read the data from the url:

shp_file = "SA4_2021_AUST_GDA2020.shp"
remote_path = 'zip+{}!/{}'.format(zip_url, shp_file)
australia = gpd.read_file(zip_url)

Use the join with the nearest feature but truncate to 50 km maximum distance:

obs_with_countries_and_regions = gpd.sjoin_nearest(
    obs_with_countries, 
    australia.to_crs("ESRI:54012"), 
    distance_col="distances", 
    how="left", 
    max_distance=50000,
    lsuffix='global',
    rsuffix='australia')

Let’s double check that observations with Australian regional data come from Australia:

obs_with_countries_and_regions.loc[
    obs_with_countries_and_regions.STE_NAME21.notna(),
    "WB_NAME"].value_counts()

WB_NAME
Australia    712
Name: count, dtype: int64

And from which countries do the observations without Australian regional data come from?

obs_with_countries_and_regions.loc[
    obs_with_countries_and_regions.STE_NAME21.isna(),
    "WB_NAME"].value_counts()

WB_NAME
Venezuela, Republica Bolivariana de    172
Mexico                                  98
Colombia                                65
South Africa                            17
Kenya                                   14
Uganda                                  10
Poland                                   8
Finland                                  8
Rwanda                                   4
Germany                                  4
Spain                                    4
Belarus                                  3
Italy                                    3
Singapore                                3
Panama                                   2
Ecuador                                  2
Trinidad and Tobago                      2
Korea, Republic of                       1
Guyana                                   1
Switzerland                              1
Name: count, dtype: int64

We will need to fill in a placeholder value here to show these in the plot below:

obs_with_countries_and_regions.loc[
    obs_with_countries_and_regions.STE_NAME21.isna(),
    "STE_NAME21"] = "..."

Step 4: Interactive Graphs by Continent, Country, and Region

Finally, we’ll visualize the complete data using an interactive sunburst plot in plotly.

We start by grouping our data using continent, country and state and counting the number of unique species and places names.

aggfuncs = {'species guess':['count',pd.Series.nunique],
           'location':['count',pd.Series.nunique]}
colnames= ['CONTINENT','WB_NAME', # from Worldbank data
    'STE_NAME21'] # from ASGS data
obs_by_group=obs_with_countries_and_regions.groupby(colnames).agg(aggfuncs).reset_index()
obs_by_group.columns = [' '.join(col).strip() for col in obs_by_group.columns.values]

Now we create the plotly graph using this object. We can select to visualise data by number of unique species:

fig = px.sunburst(obs_by_group, path=colnames, values='species guess nunique', )
fig.show()

Or the count of observations based on locations:

fig = px.sunburst(obs_by_group, path=colnames, values='location count', )
fig.show()

And that’s it!

Conclusion

In summary, we:

  1. Downloaded observations with pyinaturalist.
  2. Mapped observations on a basemap of the world.
  3. Intersected observations with country and state boundaries using geopandas.
  4. Created insightful graphs with plotly.

These steps transform field observations into engaging digital narratives, offering powerful ways to visualize biodiversity. Whether you’re a scientist, a digital cartographer, or a nature enthusiast, these tools can enhance your connection to the natural world.

Happy mapping and exploring! 🌍

Feel free to share your own mapping adventures or ask questions in the comments.

Footnotes

  1. My alter ego in the iNat world↩︎