from pyinaturalist import get_observations
import pandas as pd
import numpy as np
from IPython.display import display, HTML
Curious curlews
Stone curlews
The stone-curlews, also known as dikkops or thick-knees, belong to the family Burhinidae. There are 10 species distributed throughout the tropical and temperate parts of the world. Most species in this family prefer arid or semiarid habitats, and I have encountered a few of them in my iNaturalist adventures.
Stone curlews are curious-looking birds with long legs, big eyes and an intense stare. The Bush stone-curlew is quite the Boogie Woogie Bird, and I heard their blood-curdling shrieks during a visit to Cairns.
In this post I summarise these observations from different continents and keep track of the species in this family that I have already observed.
Overview
Here I reproduce the minimum number of steps required to create a summary of observations for the selected taxonomic group.
- Download iNaturalist observation and identification information for a specific query.
- Summarise identification information to organise records by family, genus and species.
- Group and summarise information based on taxonomic and geographic information.
Tools and Libraries
I will be using a Python environment with a selection of my favourite libraries, as explained here.
We will be using the following libraries in this blog post: pyinaturalist, pandas, numpy and IPython.display (imported at the top of this post).
Step-by-Step Guide
Step 1: Downloading iNaturalist observations
Let’s begin by fetching iNaturalist observations with pyinaturalist. We need a selection of global observations, so we select the user neomapas [1] and see where in the world they have been.
username = 'neomapas'
taxonid=[4917, ]
observations = get_observations(user_id=username, taxon_id=taxonid, per_page=0)
n_obs = observations['total_results']
Despite the name, the stone-curlews are not closely related to the curlews in the sandpiper family Scolopacidae. For curlews I would use taxon id 3894.
Let’s print an overview of these total results:
print("User {} has {} observations of Stone-curlews (taxon id {}) in iNaturalist".format(username,n_obs,taxonid))
User neomapas has 7 observations of Stone-curlews (taxon id [4917]) in iNaturalist
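The response reports the total number of results even when no records are requested, which is enough to plan a paginated download for larger queries. Here is a minimal sketch, assuming a page-based API where each response is a dict with 'total_results' and 'results' keys and a page size of 200. The names fetch_all and fake_fetch are hypothetical and introduced only for illustration; the fake fetcher stands in for the real API so the example runs offline.

```python
import math


def fetch_all(fetch_page, per_page=200):
    """Collect the results from every page of a paginated query.

    fetch_page is any callable taking (page, per_page) and returning a dict
    with 'total_results' and 'results' keys.
    """
    first = fetch_page(page=1, per_page=per_page)
    results = list(first['results'])
    # Number of requests needed to cover all results
    n_pages = math.ceil(first['total_results'] / per_page)
    for page in range(2, n_pages + 1):
        results.extend(fetch_page(page=page, per_page=per_page)['results'])
    return results


# Offline stand-in for the API (hypothetical), serving 450 made-up records
def fake_fetch(page, per_page, _data=tuple(range(450))):
    start = (page - 1) * per_page
    return {
        'total_results': len(_data),
        'results': list(_data[start:start + per_page]),
    }


all_results = fetch_all(fake_fetch)
print(len(all_results))
```

For a real query, fetch_page could be a small wrapper that forwards page and per_page to get_observations. With only 7 records here, a single call is enough.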
The maximum number of observations we can download in each query is 200, so we can download all results in one call. We extract a selection of fields that we will use for summarising the observation records (coordinates, species guess, quality grade) and, at the same time, the taxonomic information from each identification.
records = list()
idrecords = list()
observations = get_observations(
    user_id=username,
    taxon_id=taxonid,
    per_page=200)  # 200 is the maximum page size
for obs in observations['results']:
    # One summary record per observation
    record = {
        'uuid': obs['uuid'],
        'quality': obs['quality_grade'],
        'description': obs['description'],
        'location': obs['place_guess'],
        # Last comma-separated piece of the place label, without padding
        'country': obs['place_guess'].split(',')[-1].strip(),
        'longitude': obs['location'][1],
        'latitude': obs['location'][0],
        'species guess': obs['species_guess'],
        'observed on': obs['observed_on']
    }
    # One taxonomy record per identification of this observation
    for ident in obs['identifications']:
        ca = ident['category']
        fch = ident['created_at']
        idrecord = {
            'uuid': obs['uuid'],
            'quality_grade': obs['quality_grade'],
            'id_category': ca,
            'created': fch}
        # Ancestors give the higher ranks (family, genus, ...)
        for anc in ident['taxon']['ancestors']:
            idrecord[anc['rank']] = anc['name']
        # The identified taxon itself gives the lowest rank
        idrecord[ident['taxon']['rank']] = ident['taxon']['name']
        idrecords.append(idrecord)
    # Keep the URL of the first photo, if there is one
    if len(obs['observation_photos']) > 0:
        record['url'] = obs['observation_photos'][0]['photo']['url']
    records.append(record)
This example requires extracting some additional information that is nested within the JSON structure of the API response. I explain some of the details in this post.
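To make the nested part concrete, here is a minimal sketch of the taxonomy extraction for a single identification. The function name flatten_taxonomy and the example dict are made up for illustration; the dict only mimics the fields that the loop above reads ('taxon', 'ancestors', 'rank', 'name') and is not a real API response.

```python
def flatten_taxonomy(identification):
    """Return {rank: name} for the identified taxon and all its ancestors."""
    taxon = identification.get('taxon') or {}
    ranks = {}
    # Higher ranks come from the list of ancestors
    for ancestor in taxon.get('ancestors') or []:
        ranks[ancestor['rank']] = ancestor['name']
    # The identified taxon itself supplies the lowest rank
    if 'rank' in taxon and 'name' in taxon:
        ranks[taxon['rank']] = taxon['name']
    return ranks


# Made-up identification with the same shape as the fields used above
example_id = {
    'category': 'improving',
    'taxon': {
        'rank': 'species',
        'name': 'Burhinus grallarius',
        'ancestors': [
            {'rank': 'family', 'name': 'Burhinidae'},
            {'rank': 'genus', 'name': 'Burhinus'},
        ],
    },
}
print(flatten_taxonomy(example_id))
# {'family': 'Burhinidae', 'genus': 'Burhinus', 'species': 'Burhinus grallarius'}
```

Using .get with fallbacks means an identification without taxon details yields an empty dict instead of raising a KeyError.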
Later in the code, I will need an HTML string that defines a figure with a caption for each observation. Let’s build it now for each record in the list:
for record in records:
    record['figure'] = "<figure class='medium'><a href='https://www.inaturalist.org/observations/%s' target=_blank><img src='%s' height=90><figcaption class='medium'>%s</figcaption></a></figure>" % (
        record['uuid'],
        record['url'],
        record['location']
    )
We transform these sets of records into two data frames with pandas:
inat_obs=pd.DataFrame(records)
inat_ids=pd.DataFrame(idrecords)
Step 2: Merging observation and identification information
A tricky step is to transform the species guess into full taxonomic information. Here I am using the identification information included in the response from the get_observations function to reconstruct the taxonomy. The problem is that there are multiple id suggestions per observation, and we have to filter out the unvalidated ids first.
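The filtering idea can be sketched in plain Python on a few made-up identification records. This is an illustration under assumptions, not the method used in this post: select_best_ids is a hypothetical helper, the uuids are invented, and it keeps the first qualifying id per observation, which is a simplification.

```python
def select_best_ids(idrecords, keep=('improving', 'supporting')):
    """Return {uuid: {'family': ..., 'genus': ..., 'species': ...}}."""
    ranks = ('family', 'genus', 'species')
    best = {}
    for rec in idrecords:
        if rec.get('id_category') not in keep:
            continue  # e.g. 'maverick' ids that disagree with the community
        if not all(rec.get(rank) for rank in ranks):
            continue  # this id does not reach species level
        # Keep the first qualifying id per observation (a simplification)
        best.setdefault(rec['uuid'], {rank: rec[rank] for rank in ranks})
    return best


# Made-up identification records, shaped like the ones collected in Step 1
example_ids = [
    {'uuid': 'obs-1', 'id_category': 'improving', 'family': 'Burhinidae',
     'genus': 'Burhinus', 'species': 'Burhinus grallarius'},
    {'uuid': 'obs-1', 'id_category': 'supporting', 'family': 'Burhinidae',
     'genus': 'Burhinus', 'species': 'Burhinus grallarius'},
    {'uuid': 'obs-1', 'id_category': 'maverick', 'family': 'Burhinidae',
     'genus': 'Esacus', 'species': 'Esacus magnirostris'},
    {'uuid': 'obs-2', 'id_category': 'improving', 'family': 'Burhinidae'},
]
print(select_best_ids(example_ids))
```

In this example only obs-1 gets a species-level taxonomy: the maverick id is ignored, and obs-2 is dropped because its only id stops at family level.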
In this case all of the observations are research grade:
inat_obs.groupby(['quality']).agg({"uuid": pd.Series.nunique})
| quality | uuid |
|---|---|
| research | 7 |
Research grade observations will always have improving and supporting identifications:
inat_ids.groupby(['quality_grade','id_category']).agg({"uuid": pd.Series.nunique})
| quality_grade | id_category | uuid |
|---|---|---|
| research | improving | 7 |
| research | supporting | 7 |
We can use this trick to select the taxonomic information from the best id of each observation:
ss=inat_ids.id_category.isin(['improving','supporting'])
cols=['uuid','family','genus','species']
best_ids = inat_ids.loc[ss,cols].drop_duplicates().dropna()
And now merge these best ids back with the observation records.
inat_obs_ids = inat_obs.join(best_ids.set_index('uuid'), on='uuid')
Step 3: Number of unique observations and species per family
Now we can summarise the information in this combined data frame to count the unique observations (identified by their universally unique ids, or uuid) and the unique locations for each combination of family, genus, country and species:
aggfuns = {
    "uuid": pd.Series.nunique,
    "location": pd.Series.nunique
}
inat_obs_ids.groupby(['family','genus','country','species']).agg(aggfuns)
| family | genus | country | species | uuid | location |
|---|---|---|---|---|---|
| Burhinidae | Burhinus | AU | Burhinus grallarius | 2 | 2 |
| Burhinidae | Burhinus | South Africa | Burhinus vermiculatus | 2 | 2 |
| Burhinidae | Esacus | AU | Esacus magnirostris | 1 | 1 |
| Burhinidae | Hesperoburhinus | PE | Hesperoburhinus superciliaris | 1 | 1 |
| Burhinidae | Hesperoburhinus | Venezuela | Hesperoburhinus bistriatus | 1 | 1 |
Check out my posts about my observations in Cairns, Lima and my 2010 trip to South Africa.
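The country column in the table mixes short codes (AU, PE) with full names (South Africa, Venezuela), because it is simply the last comma-separated piece of the place label. Here is a minimal sketch of one way to tidy this up, assuming a small hand-written lookup table. COUNTRY_NAMES, country_from_place and the example place labels are all hypothetical and only cover the two codes seen in this post.

```python
# Hypothetical lookup table: extend it with whatever labels appear in your data
COUNTRY_NAMES = {'AU': 'Australia', 'PE': 'Peru'}


def country_from_place(place_guess):
    """Take the last comma-separated piece of a place label and normalise it."""
    if not place_guess:
        return None  # some records may have no place label at all
    label = place_guess.split(',')[-1].strip()
    # Fall back to the label itself when it is not in the lookup table
    return COUNTRY_NAMES.get(label, label)


# Made-up place labels for illustration
print(country_from_place('Cairns, QLD, AU'))            # Australia
print(country_from_place('Kruger Park, South Africa'))  # South Africa
```

A lookup table this small is easy to audit by eye. For a larger set of observations, a dedicated country-code library would probably be a better choice.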
Step 4: Gallery of images
These lines of code will prepare html code to preview photos for each observation. I group the photos by species, then I iterate across the grouped data frame to combine the information in column figure and join the figures in a container. I use the functions display and HTML to display the output in this document.
selection = (
    inat_obs_ids
    .groupby(['species'])
    .agg({'figure':'unique'})
)
sections = list()
for idx,row in selection.iterrows():
    sectionfigures=" ".join(row['figure'])
    sectionname="<figure class='medium'><p class='figsection-medium'><i>%s</i></p></figure>" % idx
    sections.append(sectionname + sectionfigures)
allsections="<div class='container'>%s</div>" % ("".join(sections))
display(HTML(allsections))
[Gallery output: observation photos grouped by species: Burhinus grallarius, Burhinus vermiculatus, Esacus magnirostris, Hesperoburhinus bistriatus, Hesperoburhinus superciliaris]
The look of the output html code depends on the site’s css style definitions. Look at this file if you want to reuse/adapt my style.
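Since the figure snippets are assembled by string formatting, place labels containing characters such as < or & could break the markup, and observations without a photo have no URL to show. Here is a minimal sketch of a more defensive variant of the figure string from Step 1, using html.escape from the standard library. The helper figure_html and the example records are hypothetical and not part of the code used for this post.

```python
from html import escape


def figure_html(record):
    """Build the figure snippet for one record, or '' if it has no photo."""
    url = record.get('url')
    if not url:
        return ''  # nothing to display for observations without photos
    template = (
        "<figure class='medium'>"
        "<a href='https://www.inaturalist.org/observations/{uuid}' target=_blank>"
        "<img src='{src}' height=90>"
        "<figcaption class='medium'>{caption}</figcaption></a></figure>"
    )
    return template.format(
        uuid=escape(str(record['uuid']), quote=True),
        src=escape(url, quote=True),
        caption=escape(record.get('location') or ''),
    )


# Made-up records for illustration
with_photo = {'uuid': 'abc', 'url': 'https://example.org/p.jpg',
              'location': 'Cairns <QLD> & surrounds'}
without_photo = {'uuid': 'def', 'location': 'Lima'}
print(figure_html(with_photo))
print(repr(figure_html(without_photo)))
```

Empty strings join cleanly, so records without photos simply disappear from the gallery instead of raising a KeyError.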
Conclusion
In summary, we:
- Downloaded observations with pyinaturalist.
- Merged the identification information with the observation information.
- Used pandas functions to group and aggregate the data.
- Used IPython.display to create a gallery of images.
Footnotes
[1] My alter ego in the iNat world