from pyinaturalist import get_observations
import pandas as pd
import numpy as np
import geopandas as gpd
from IPython.display import display, HTML

All my Butterfly observations around the world
Papilionoidea
My butterfly observations in iNaturalist
Butterflies are one of my favourite subjects for iNat explorations. I have a long history with them: during my biology studies I worked with endemic brown butterflies (Satyridae) in the Páramos of Venezuela; in my PhD I monitored the distribution and abundance of white and yellow butterflies (Pieridae) across the country; and for my postdoc in South Africa I analysed the global relationships between butterfly species and their larval host plants. My research focus has since shifted, but I still like to record the butterflies that I spot on my nature walks.
Here I want to find out how many species I have recorded in each of the countries I have visited, and break this down by families and subfamilies.
Overview
Here I reproduce the minimum number of steps required to create a summary of observations for the selected taxonomic group and regions of interest.
- Download iNaturalist observation and identification information for a specific query.
- Download spatial data representing regions of interest.
- Intersect observations with these boundaries.
- Group and summarise information based on taxonomic and geographic information.
Tools and Libraries
I will be using a Python environment with a selection of my favourite libraries, as explained here.
We will be using the following libraries in this blog post: pyinaturalist, pandas, numpy, geopandas and IPython.display.
Step-by-Step Guide
Step 1: Downloading iNaturalist observations
Let’s begin by fetching iNaturalist observations with pyinaturalist. We need a set of observations from around the world, so we select the user neomapas and see where in the world they have been.
username = 'neomapas'
taxonid = 47224
observations = get_observations(user_id=username, taxon_id=taxonid, per_page=0)
n_obs = observations['total_results']

Let’s print an overview of these total results:

print("User {} has {} observations of Papilionoidea (taxon id {}) in iNaturalist".format(username,n_obs,taxonid))

User neomapas has 401 observations of Papilionoidea (taxon id 47224) in iNaturalist
Updating this after my visit to Bali!
The maximum number of observations we can download in each query is 200, so we need to use pagination to get all results. For each query we will extract a selection of fields that we will use for summarising the observation records (coordinates, species guess, quality grade), and at the same time we will extract the taxonomic information from each identification.
records = list()
idrecords = list()
msg = "Requesting observations from user _{}_: page {}, total of {} observations downloaded"
j = 1
while len(records) < n_obs:
    print(msg.format(username, j, min(j*200, n_obs)))
    # Request one page of results (at most 200 observations per page)
    observations = get_observations(
        user_id=username,
        taxon_id=taxonid,
        per_page=200,
        page=j)
    # Stop if a page comes back empty, to avoid looping forever
    if len(observations['results']) == 0:
        break
    for obs in observations['results']:
        # Fields used to summarise each observation
        record = {
            'uuid': obs['uuid'],
            'quality': obs['quality_grade'],
            'description': obs['description'],
            'location': obs['place_guess'],
            'longitude': obs['location'][1],
            'latitude': obs['location'][0],
            'species guess': obs['species_guess'],
            'observed on': obs['observed_on'],
            'points': obs['faves_count'] * 10 + obs['comments_count'] + obs['identifications_count'] * 3,
        }
        # One row per identification, with one column per taxonomic rank
        for ident in obs['identifications']:
            idrecord = {
                'uuid': obs['uuid'],
                'quality_grade': obs['quality_grade'],
                'id_category': ident['category'],
                'created': ident['created_at']}
            for anc in ident['taxon']['ancestors']:
                idrecord[anc['rank']] = anc['name']
            idrecord[ident['taxon']['rank']] = ident['taxon']['name']
            idrecords.append(idrecord)
        # Keep the first photo (if any) to display it later
        if len(obs['observation_photos']) > 0:
            record['url'] = obs['observation_photos'][0]['photo']['url']
            record['attribution'] = obs['observation_photos'][0]['photo']['attribution']
        records.append(record)
    j = j + 1

Requesting observations from user _neomapas_: page 1, total of 200 observations downloaded
Requesting observations from user _neomapas_: page 2, total of 400 observations downloaded
Requesting observations from user _neomapas_: page 3, total of 401 observations downloaded
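The pagination pattern used above can be tried out without touching the API. The following is a minimal sketch, not the way pyinaturalist works internally: `fetch_all` and `fake_fetch` are hypothetical helpers, and `fake_fetch` only imitates a response with `total_results` and `results` keys, using integers in place of observation records.

```python
import math

PAGE_SIZE = 200  # maximum page size mentioned in the text


def fetch_all(fetch_page, page_size=PAGE_SIZE):
    """Collect every result from a paginated source.

    `fetch_page(page, per_page)` is a hypothetical callable returning a dict
    with 'total_results' and 'results' keys.
    """
    first = fetch_page(page=1, per_page=page_size)
    results = list(first["results"])
    n_pages = math.ceil(first["total_results"] / page_size)
    for page in range(2, n_pages + 1):
        batch = fetch_page(page=page, per_page=page_size)["results"]
        if not batch:  # stop early if fewer rows come back than announced
            break
        results.extend(batch)
    return results


def fake_fetch(page, per_page, total=401):
    """Stand-in for a real API call; records are just integers."""
    start = (page - 1) * per_page
    stop = min(start + per_page, total)
    return {"total_results": total, "results": list(range(start, stop))}


all_results = fetch_all(fake_fetch)
print(len(all_results))
```

Computing the number of pages up front (three pages for 401 records) means the loop ends even if the source returns fewer rows than announced.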
This example requires extracting some additional information that is nested within the json structure of the API response. I explain some of the details in this post.
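To make the nested structure more concrete, here is a small sketch of how one identification can be flattened into a single row with one column per taxonomic rank. The `flatten_identification` helper and the `example_ident` dictionary are made up for illustration; the dictionary is a heavily trimmed imitation, not a real API response.

```python
def flatten_identification(obs_uuid, ident):
    """Return one flat dict with a column per taxonomic rank."""
    row = {
        "uuid": obs_uuid,
        "id_category": ident["category"],
        "created": ident["created_at"],
    }
    taxon = ident["taxon"]
    # Each ancestor contributes one column named after its rank
    for ancestor in taxon.get("ancestors", []):
        row[ancestor["rank"]] = ancestor["name"]
    # The identified taxon itself is stored under its own rank
    row[taxon["rank"]] = taxon["name"]
    return row


# Hypothetical, trimmed identification used only for this example
example_ident = {
    "category": "improving",
    "created_at": "2024-01-01T00:00:00+00:00",
    "taxon": {
        "rank": "species",
        "name": "Danaus plexippus",
        "ancestors": [
            {"rank": "superfamily", "name": "Papilionoidea"},
            {"rank": "family", "name": "Nymphalidae"},
            {"rank": "subfamily", "name": "Danainae"},
        ],
    },
}

row = flatten_identification("example-uuid", example_ident)
print(row["family"], row["subfamily"], row["species"])
```

An identification made at genus or family level simply produces a row without a species column, which later shows up as a missing value in the data frame.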
We transform these sets of records into two data frames with pandas:
inat_obs=pd.DataFrame(records)
inat_ids=pd.DataFrame(idrecords)

Step 2: Merging observation and identification information
A tricky step is to transform species guess information into full taxonomic information. Here I am using the identification information included in the response from the get_observations function to reconstruct the taxonomic information. The problem is that there are multiple id suggestions per observation, and we have to filter out the unvalidated ids first.
In this case most of my butterfly observations are research grade:
inat_obs.groupby(['quality']).agg({"uuid": pd.Series.nunique})

| quality | uuid |
|---|---|
| casual | 1 |
| needs_id | 157 |
| research | 243 |
Research grade observations will always have improving and supporting identifications:
inat_ids.groupby(['quality_grade','id_category']).agg({"uuid": pd.Series.nunique})

| quality_grade | id_category | uuid |
|---|---|---|
| casual | improving | 1 |
| casual | leading | 1 |
| needs_id | improving | 78 |
| needs_id | leading | 152 |
| needs_id | maverick | 2 |
| needs_id | supporting | 6 |
| research | improving | 243 |
| research | leading | 24 |
| research | maverick | 44 |
| research | supporting | 231 |
We can use this trick to select the taxonomic information from the best id of each observation:
ss=inat_ids.id_category.isin(['improving','supporting'])
cols=['uuid','family','subfamily']
best_ids = inat_ids.loc[ss,cols].drop_duplicates().dropna()

And now merge these best ids back with the observation records.

inat_obs_ids = inat_obs.join(best_ids.set_index('uuid'), on='uuid')

Later in the code, I will need to format an HTML string to define figures with captions, so let’s do this now for each record in this list of records:
inat_obs_ids['figure'] = [
    "<figure class='mini'><a href='https://www.inaturalist.org/observations/%s' target=_blank><img src='%s' height=50><figcaption class='mini'>%s: <i>%s</i></figcaption></a></figure>" % (
        record['uuid'],
        record['url'],
        record['family'],
        record['species guess'])
    for idx,record in inat_obs_ids.iterrows()
]

Step 3: Number of unique observations and species per family
Now we can summarise the information in this combined dataframe to get the unique number of observations (with their unique universal ids, or uuid) and species for each family and subfamily of Butterflies:
aggfuns = {
    "uuid": pd.Series.nunique,
    "species guess": pd.Series.nunique
}
inat_obs_ids.groupby(['family','subfamily']).agg(aggfuns)

| family | subfamily | uuid | species guess |
|---|---|---|---|
| Hesperiidae | Eudaminae | 1 | 1 |
| Hesperiidae | Hesperiinae | 3 | 2 |
| Hesperiidae | Pyrginae | 3 | 3 |
| Hesperiidae | Tagiadinae | 1 | 1 |
| Hesperiidae | Trapezitinae | 3 | 2 |
| Lycaenidae | Lycaeninae | 4 | 3 |
| Lycaenidae | Polyommatinae | 42 | 21 |
| Lycaenidae | Theclinae | 4 | 4 |
| Nymphalidae | Apaturinae | 1 | 1 |
| Nymphalidae | Biblidinae | 10 | 9 |
| Nymphalidae | Charaxinae | 1 | 1 |
| Nymphalidae | Danainae | 20 | 9 |
| Nymphalidae | Heliconiinae | 14 | 10 |
| Nymphalidae | Limenitidinae | 5 | 4 |
| Nymphalidae | Nymphalinae | 36 | 26 |
| Nymphalidae | Satyrinae | 31 | 15 |
| Papilionidae | Papilioninae | 19 | 14 |
| Papilionidae | Parnassiinae | 1 | 1 |
| Pieridae | Coliadinae | 17 | 14 |
| Pieridae | Pierinae | 43 | 28 |
| Riodinidae | Euselasiinae | 1 | 1 |
| Riodinidae | Riodininae | 3 | 3 |
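The filter, join and aggregation from Steps 2 and 3 are easier to follow on a toy example. This is a minimal sketch with made-up values: the `obs` and `ids` data frames stand in for inat_obs and inat_ids, and the uuids, species guesses and taxon assignments are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for inat_obs and inat_ids (made-up values)
obs = pd.DataFrame({
    "uuid": ["a", "b", "c"],
    "species guess": ["sp1", "sp2", "sp2"],
})
ids = pd.DataFrame({
    "uuid": ["a", "a", "b", "c", "c"],
    "id_category": ["improving", "supporting", "leading", "improving", "maverick"],
    "family": ["Pieridae", "Pieridae", "Nymphalidae", "Pieridae", "Lycaenidae"],
    "subfamily": ["Pierinae", "Pierinae", None, "Pierinae", "Theclinae"],
})

# Keep only improving/supporting ids, one row per observation and taxon
keep = ids["id_category"].isin(["improving", "supporting"])
best = ids.loc[keep, ["uuid", "family", "subfamily"]].drop_duplicates().dropna()

# Attach family and subfamily to each observation
merged = obs.join(best.set_index("uuid"), on="uuid")

# Count unique observations and species guesses per family and subfamily
summary = merged.groupby(["family", "subfamily"]).agg(
    {"uuid": pd.Series.nunique, "species guess": pd.Series.nunique}
)
print(summary)
```

Observation `b` only has a leading identification, so it gets no family after the join and drops out of the grouped summary, because groupby skips missing keys by default. Passing `dropna=False` to groupby would keep such rows in a separate group.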
Step 4: Number of unique observations and species ids per country
In order to combine the observation records with external spatial information, we need to add proper geospatial information to this data frame using geopandas. For this, we first transform the numeric variables latitude and longitude into a geometry with an explicit Coordinate Reference System (CRS):
gs = gpd.points_from_xy(inat_obs_ids.longitude, inat_obs_ids.latitude, crs="EPSG:4326")
inat_obs_xy=gpd.GeoDataFrame(inat_obs_ids, geometry=gs)

Now we need to download the external geospatial data representing the country boundaries. This is very easy to do thanks to the geopandas.read_file function.
For global data I like to use the high resolution World Bank Official Administrative Boundaries available from the World Bank Group Data Catalog. This service provides a link to a zip file with the spatial vector files, and we need to construct a remote path to read the shapefile inside the zip file:
zip_url='https://datacatalogfiles.worldbank.org/ddh-published-v2/0038272/3/DR0046659/wb_countries_admin0_10m.zip'
shp_file = "WB_countries_Admin0_10m/WB_countries_Admin0_10m.shp"
remote_path = 'zip+{}!/{}'.format(zip_url, shp_file)

In this way geopandas is able to download and read the file from the cloud:

WB0 = gpd.read_file(remote_path)

Now we are ready to overlay the iNaturalist observations onto the administrative boundaries using another geopandas function: sjoin.

inat_obs_world = inat_obs_xy.sjoin(WB0, how="left")

And finally we can aggregate information by country name (column WB_NAME):
aggfuns = {
    "uuid": pd.Series.nunique,
    "species guess": pd.Series.nunique
}
inat_obs_world.groupby("WB_NAME").agg(aggfuns)

| WB_NAME | uuid | species guess |
|---|---|---|
| Australia | 104 | 45 |
| Colombia | 3 | 3 |
| Costa Rica | 2 | 2 |
| Finland | 5 | 5 |
| Germany | 1 | 1 |
| Indonesia | 46 | 28 |
| Kenya | 5 | 5 |
| Mexico | 36 | 34 |
| Panama | 3 | 3 |
| Peru | 7 | 4 |
| Rwanda | 7 | 6 |
| Singapore | 2 | 1 |
| South Africa | 33 | 22 |
| Switzerland | 1 | 1 |
| Tajikistan | 4 | 4 |
| Uganda | 2 | 2 |
| United Arab Emirates | 1 | 1 |
| Venezuela, Republica Bolivariana de | 100 | 81 |
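Conceptually, the spatial join in this step asks one question for every observation: which polygon contains this point? The sketch below illustrates that idea in plain Python with a ray-casting test. It is not how geopandas does it (geopandas relies on real geometries and a spatial index), and the `point_in_polygon` and `assign_region` helpers and the rectangular regions are hypothetical, used only for illustration.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside a polygon given as (x, y) pairs?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_region(lon, lat, regions):
    """Return the name of the first region containing the point, else None."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical rectangular "countries" in lon/lat, for illustration only
regions = {
    "Region A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Region B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}

points = [(5, 5), (25, 2), (15, 5)]
labels = [assign_region(lon, lat, regions) for lon, lat in points]
print(labels)
```

The third point falls between the two regions and gets no label. In the same way, with how="left" an observation that does not fall inside any country polygon keeps a missing WB_NAME and is left out of the grouped table.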
Step 5: Displaying a sample of observations
Now we combine spatial and taxonomic information to get a wall of pictures showing the most interesting observations for each butterfly family in each of the countries I have visited.
These lines of code perform a couple of tricks. I group the data twice: first I select one record for each combination of country and family based on the points column, then I iterate across the countries and join the figures into a list of sections. I then use the display and HTML functions to render the formatted text strings as HTML elements and organise the figures and captions on this webpage.
selection = (
    inat_obs_world
    # highest points first, so that first() keeps the top record of each group
    .sort_values('points', ascending=False)
    .groupby(['WB_NAME','family'])
    .first()
    .groupby(['WB_NAME'])
    .agg({'figure':'unique'})
)
sections = list()
for idx,row in selection.iterrows():
    # one section per country: a title followed by its figures
    sectionfigures=" ".join(row['figure'])
    sectionname="<figure class='mini'><p class='figsection'>%s </p></figure>" % idx
    sections.append(sectionname + sectionfigures)
allsections="<div class='container'>%s</div>" % ("".join(sections))
display(HTML(allsections))

Australia





Colombia

Costa Rica

Finland


Germany

Indonesia




Kenya


Mexico






Panama

Rwanda



Singapore

South Africa



Switzerland

Tajikistan



Uganda


United Arab Emirates

Venezuela, Republica Bolivariana de





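The figure strings joined above are built with plain string formatting, so an observation without photos (a missing url) or a species guess with special characters would end up directly in the HTML. Here is a minimal sketch of a more defensive variant, assuming a hypothetical `make_figure` helper that is not part of the original code:

```python
from html import escape


def make_figure(uuid, url, family, species_guess):
    """Build one thumbnail figure, or return '' when there is no photo URL."""
    if not isinstance(url, str) or not url:
        return ""  # a NaN or None coming from pandas ends up here
    caption = "%s: <i>%s</i>" % (escape(str(family)), escape(str(species_guess)))
    return (
        "<figure class='mini'>"
        "<a href='https://www.inaturalist.org/observations/%s' target=_blank>"
        "<img src='%s' height=50>"
        "<figcaption class='mini'>%s</figcaption></a></figure>"
    ) % (escape(str(uuid)), escape(url), caption)


# Made-up values, for illustration only
fig = make_figure("abc-123", "https://example.org/p.jpg", "Pieridae", "Large <White>")
print(fig)
print(repr(make_figure("abc-124", float("nan"), "Pieridae", "Large White")))
```

Returning an empty string for records without a photo keeps the later join of figures working without adding broken image tags.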
Conclusion
In summary, we:
- Downloaded observations with pyinaturalist.
- Merged the identification information with the observation information.
- Intersected observations with country boundaries using geopandas.
- Used pandas functions to group and aggregate the data.