What information is there in a single iNat obs?
The iNaturalist API provides a lot of information for each observation. Most of the time we are interested in the basic data about the observation (quality grade, species name, location, date observed, etc.). Sometimes we need to drill down to get more complex data out of it, for example photos, identifications or the taxonomic hierarchy.
Let’s dissect a single iNaturalist observation as queried by the pyinaturalist library in Python. I use this example to illustrate some of the more complex code that I use in other posts.
Import libraries
We need to import the following libraries:
from pyinaturalist import get_observations
import pandas as pd
from IPython.display import display, Markdown
Querying the iNaturalist API
Here I use the function get_observations to query the iNaturalist API for one record:
qry = get_observations(uuid='03d78f46-b606-46f3-a3d1-b9b88baded80')
This query returns a Python dictionary, a powerful built-in data type that stores key-value pairs for efficient data retrieval and manipulation. Let’s check the keys for this dictionary:
qry.keys()
dict_keys(['total_results', 'page', 'per_page', 'results'])
Here we check the value for total_results and confirm that the query returned one, and only one, observation:
qry['total_results']
1
The observation information is stored as a list of dictionaries under the results key:
type(qry['results'])
<class 'list'>
So in this case we can extract the first element of the list with index 0:
obs = qry['results'][0]
This observation is a dictionary with many, many key-value pairs:
print(obs.keys())
dict_keys(['quality_grade', 'taxon_geoprivacy', 'annotations', 'uuid', 'observed_on_details', 'id', 'cached_votes_total', 'identifications_most_agree', 'created_at_details', 'species_guess', 'identifications_most_disagree', 'tags', 'positional_accuracy', 'comments_count', 'site_id', 'created_time_zone', 'license_code', 'observed_time_zone', 'quality_metrics', 'public_positional_accuracy', 'reviewed_by', 'oauth_application_id', 'flags', 'created_at', 'description', 'time_zone_offset', 'project_ids_with_curator_id', 'observed_on', 'observed_on_string', 'updated_at', 'sounds', 'place_ids', 'captive', 'taxon', 'ident_taxon_ids', 'outlinks', 'faves_count', 'ofvs', 'num_identification_agreements', 'preferences', 'identification_disagreements_count', 'comments', 'map_scale', 'uri', 'project_ids', 'community_taxon_id', 'geojson', 'owners_identification_from_vision', 'identifications_count', 'obscured', 'num_identification_disagreements', 'geoprivacy', 'location', 'votes', 'spam', 'user', 'mappable', 'identifications_some_agree', 'project_ids_without_curator_id', 'place_guess', 'identifications', 'project_observations', 'observation_photos', 'photos', 'faves', 'non_owner_ids'])
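The extraction steps above (check how many results came back, then take the first element of results) can be wrapped in a small helper. This is only a minimal sketch: the name single_result is hypothetical and not part of pyinaturalist, and the response used here is a hand-written stand-in with the same top-level keys, not a live API call.

```python
def single_result(response):
    """Return the only observation in an API response dictionary.

    Hypothetical helper (not part of pyinaturalist): raises ValueError
    unless the response holds exactly one result.
    """
    results = response.get("results", [])
    if len(results) != 1:
        raise ValueError(f"expected 1 result, got {len(results)}")
    return results[0]


# Hand-written stand-in with the same top-level keys as the real response
fake_response = {
    "total_results": 1,
    "page": 1,
    "per_page": 30,
    "results": [{"id": 1486060, "quality_grade": "research"}],
}

obs_example = single_result(fake_response)
print(obs_example["id"])
```

Failing loudly when the query returns zero or several records avoids silently working with the wrong observation.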
Some of the values in this dictionary are simple strings or numbers, others are lists or dictionaries, and some are nested structures with a certain degree of complexity. Here are some examples:
Main bits of information
Id
The id and uuid values are unique identifiers for the observation. Both can be used to query the API, as exemplified above, or to construct a URL to view the observation on the iNaturalist website:
oid = obs['uuid']
print("https://www.inaturalist.org/observations/{}".format(oid))
https://www.inaturalist.org/observations/03d78f46-b606-46f3-a3d1-b9b88baded80
But there is also a uri value that provides a direct URL as well:
print(obs['uri'])
http://www.inaturalist.org/observations/1486060
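The two ways of building a link can be combined in one small function. This is a sketch under stated assumptions: observation_url and BASE_URL are hypothetical names, and the dictionary below only copies the id, uuid and uri values shown above instead of querying the API.

```python
BASE_URL = "https://www.inaturalist.org/observations/{}"


def observation_url(obs, use_uuid=False):
    """Build a website URL from the numeric id (default) or the uuid."""
    key = "uuid" if use_uuid else "id"
    return BASE_URL.format(obs[key])


# Values copied from the observation shown above
obs_example = {
    "id": 1486060,
    "uuid": "03d78f46-b606-46f3-a3d1-b9b88baded80",
    "uri": "http://www.inaturalist.org/observations/1486060",
}

print(observation_url(obs_example))
print(observation_url(obs_example, use_uuid=True))
```

Note that the stored uri value uses http, while the URL built here uses https.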
Quality grade
The quality_grade key stores the result of iNat’s Data Quality Assessment. Verifiable observations are labeled Needs ID until they either attain Research Grade status, or are voted to Casual.
for k in [
    'quality_grade',
    'quality_metrics']:
    print(k, obs[k])
quality_grade research
quality_metrics []
Place and species guess
The simplest way to query spatial information is to use the location value, which returns a pair of coordinates (latitude and longitude). The place_guess value provides a name or sometimes an address summary for that location.
for k in [
    'place_guess',
    'location',
    'positional_accuracy',
    'public_positional_accuracy',
    'place_ids',
    'geojson', 'geoprivacy', 'mappable',
    'site_id']:
    print(k, obs[k])
place_guess Ecoparque Ojo de Agua El Cardón
location [10.47873, -71.16042]
positional_accuracy 1000
public_positional_accuracy 1000
place_ids [1303, 7502, 47887, 56773, 66741, 82257, 97389]
geojson {'type': 'Point', 'coordinates': [-71.16042, 10.47873]}
geoprivacy None
mappable True
site_id 9
The other keys provide other supporting spatial information. In this case, the place guess does not include the country or other context for the location. This information can be obtained using the place_ids, but I will need to explain that in another post.
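One detail worth checking in the output above is the coordinate order: location lists latitude first, while the geojson point follows the GeoJSON convention of longitude first. A minimal sketch, using a small dictionary that copies only those two values from the output above:

```python
# Values copied from the printed output above
obs_example = {
    "location": [10.47873, -71.16042],
    "geojson": {"type": "Point", "coordinates": [-71.16042, 10.47873]},
}

# location is [latitude, longitude]
lat, lon = obs_example["location"]

# GeoJSON coordinates are [longitude, latitude]
geo_lon, geo_lat = obs_example["geojson"]["coordinates"]

print(lat == geo_lat, lon == geo_lon)
```

Unpacking into named variables makes it harder to swap the two axes by accident when passing the coordinates to a mapping library.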
Species guess
Similarly, for the taxonomic information, the species_guess value is a condensed identification summary.
print(obs['species_guess'])
Arawacus lincoides
In this case the species guess is the same as the species name. In other cases it might correspond to a different taxonomic rank, depending on the community identification for the observation. For this observation, the following values show some disagreement on the identification, but a consensus was reached on the current species name. I will discuss this further below.
for k in [
    'owners_identification_from_vision',
    'identifications_count',
    'identifications_most_agree',
    'community_taxon_id',
    'identifications_most_disagree',
    'num_identification_disagreements']:
    print(k, obs[k])
owners_identification_from_vision False
identifications_count 4
identifications_most_agree True
community_taxon_id 144209
identifications_most_disagree False
num_identification_disagreements 1
Date of observation and creation
Temporal information about when the observation was made in the field and when the record was created is stored in several key-value pairs using different formats:
for k in [
    'observed_on',
    'observed_on_string',
    'observed_on_details',
    'observed_time_zone',
    'created_at',
    'created_at_details',
    'updated_at']:
    print(k, obs[k])
observed_on 2015-01-10
observed_on_string 2015-01-10
observed_on_details {'date': '2015-01-10', 'day': 10, 'month': 1, 'year': 2015, 'hour': 0, 'week': 2}
observed_time_zone America/Caracas
created_at 2015-05-13 19:28:38-04:30
created_at_details {'date': '2015-05-13', 'day': 13, 'month': 5, 'year': 2015, 'hour': 19, 'week': 20}
updated_at 2021-10-21 19:47:00-04:00
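The observed_on_details dictionary already splits the date into parts, so it can be turned into a standard date object without parsing a string. This sketch copies the dictionary from the output above. Comparing the week value with the ISO week number is my own check: whether iNaturalist defines week as the ISO week is an assumption, although the two agree for this record.

```python
from datetime import date

# Copied from the printed output above
observed_on_details = {
    "date": "2015-01-10", "day": 10, "month": 1,
    "year": 2015, "hour": 0, "week": 2,
}

# Rebuild a date object from the year, month and day parts
observed = date(
    observed_on_details["year"],
    observed_on_details["month"],
    observed_on_details["day"],
)

print(observed.isoformat())
# ISO week number of that date, to compare with the 'week' value
print(observed.isocalendar()[1])
```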
Photo
Information about the photos associated with the observation is stored under two keys. The value of observation_photos is a list of dictionaries, each wrapping a photo dictionary together with its position:
obs['observation_photos']
[ { 'id': 1723528, 'position': 0, 'uuid': '51723b5e-2eec-4bd4-ba40-2b6d08d700d6', 'photo_id': 1841539, 'photo': { 'id': 1841539, 'license_code': 'cc-by', 'original_dimensions': {'width': 2048, 'height': 1536}, 'url': 'https://inaturalist-open-data.s3.amazonaws.com/photos/1841539/square.JPG', 'attribution': '(c) JR Ferrer-Paris, some rights reserved (CC BY)', 'flags': [], 'moderator_actions': [], 'hidden': False } } ]
The value of photos is also a list of dictionaries. This observation has only one photo:
obs['photos']
[ { 'id': 1841539, 'license_code': 'cc-by', 'original_dimensions': {'width': 2048, 'height': 1536}, 'url': 'https://inaturalist-open-data.s3.amazonaws.com/photos/1841539/square.JPG', 'attribution': '(c) JR Ferrer-Paris, some rights reserved (CC BY)', 'flags': [], 'moderator_actions': [], 'hidden': False } ]
We can use this information to generate a markdown text with the image and attribution text. I replace square with medium in the URL to show a larger version of the image.
photo = "![]({url}) {attr}".format(
    url=obs['photos'][0]['url'].replace('square','medium'),
    attr=obs['photos'][0]['attribution']
)
photo
'![](https://inaturalist-open-data.s3.amazonaws.com/photos/1841539/medium.JPG) (c) JR Ferrer-Paris, some rights reserved (CC BY)'
And then use the Markdown and display functions to display this image in this document.
mdtext = Markdown(photo)
display(mdtext)
(c) JR Ferrer-Paris, some rights reserved (CC BY)
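Replacing the word square anywhere in the URL works for this photo, but it would also change any other part of the address that happened to contain that word. A slightly more careful sketch only touches the file name at the end of the URL. The function name photo_url_for_size is hypothetical, and the code assumes the URL ends in a size name followed by a file extension, as in the example above.

```python
def photo_url_for_size(url, size="medium"):
    """Swap the size name in the last path segment of a photo URL.

    Assumes the URL ends with '<size>.<extension>', as in the example.
    """
    head, filename = url.rsplit("/", 1)
    _old_size, extension = filename.split(".", 1)
    return f"{head}/{size}.{extension}"


# URL copied from the photos output above
square = "https://inaturalist-open-data.s3.amazonaws.com/photos/1841539/square.JPG"
print(photo_url_for_size(square))
```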
OAuth application id
This field has information about the software app that was used to upload the record to iNaturalist. It is empty when the record is created directly in the iNaturalist website.
obs['oauth_application_id']
I used this to summarise records created with different tools.
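A summary of that kind can be sketched with a counter over many observations. The records below are made up purely for illustration (the ids and application ids are not real data), and labelling empty values as "website" follows the description above rather than anything stated by the API.

```python
from collections import Counter

# Hypothetical records: ids and application ids are made up for illustration
records = [
    {"id": 1, "oauth_application_id": None},
    {"id": 2, "oauth_application_id": 3},
    {"id": 3, "oauth_application_id": 3},
    {"id": 4, "oauth_application_id": None},
    {"id": 5, "oauth_application_id": 2},
]

# Group empty values under one label, keep the numeric ids otherwise
by_app = Counter(
    "website" if r["oauth_application_id"] is None else r["oauth_application_id"]
    for r in records
)

for app, n in by_app.most_common():
    print(app, n)
```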
Multiple ids and their taxonomic information
One of the complex aspects of iNat observations is the accumulation over time of multiple identifications for a single observation. Research Grade observations have identifications supported by two or more users. Each identification is categorised depending on how it relates to the community taxon id.
Let’s see an example: here we iterate over all elements under the identifications key, extract the category and created_at values, then add the taxonomic names of all the ancestors and the name of the taxon itself.
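idrecords = list()
# 'ident' avoids shadowing the built-in id() function
for ident in obs['identifications']:
    ca = ident['category']
    fch = ident['created_at']
    idrecord = {'id_category': ca, 'created': fch}
    # one column per ancestor rank (kingdom, phylum, ...)
    for anc in ident['taxon']['ancestors']:
        idrecord[anc['rank']] = anc['name']
    # the identified taxon itself, keyed by its own rank
    idrecord[ident['taxon']['rank']] = ident['taxon']['name']
    idrecords.append(idrecord)
Here we have this number of identifications: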
len(idrecords)
5
We transform these idrecords into a dataframe and sort the data by the date of the identification:
ids = pd.DataFrame(idrecords).sort_values(['created'])
Now we can explore the table:
ids
| | id_category | created | kingdom | phylum | subphylum | class | subclass | order | superfamily | family | subfamily | tribe | subtribe | genus | species |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | maverick | 2015-05-13T23:58:39+00:00 | Animalia | Arthropoda | Hexapoda | Insecta | Pterygota | Lepidoptera | Papilionoidea | Lycaenidae | Theclinae | Eumaeini | Strymonina | Arawacus | Arawacus togarna |
| 4 | maverick | 2018-04-22T18:28:16+00:00 | Animalia | Arthropoda | Hexapoda | Insecta | Pterygota | Lepidoptera | Papilionoidea | Lycaenidae | Theclinae | Eumaeini | Strymonina | Arawacus | Arawacus togarna |
| 1 | improving | 2020-12-08T19:51:16+00:00 | Animalia | Arthropoda | Hexapoda | Insecta | Pterygota | Lepidoptera | Papilionoidea | Lycaenidae | Theclinae | Eumaeini | Strymonina | Arawacus | Arawacus lincoides |
| 2 | supporting | 2020-12-09T01:50:31+00:00 | Animalia | Arthropoda | Hexapoda | Insecta | Pterygota | Lepidoptera | Papilionoidea | Lycaenidae | Theclinae | Eumaeini | Strymonina | Arawacus | Arawacus lincoides |
| 3 | supporting | 2021-10-21T23:47:00+00:00 | Animalia | Arthropoda | Hexapoda | Insecta | Pterygota | Lepidoptera | Papilionoidea | Lycaenidae | Theclinae | Eumaeini | Strymonina | Arawacus | Arawacus lincoides |
In this case the two oldest identifications used a different species name under the genus Arawacus and are considered maverick ids. The third identification is the first to suggest the name Arawacus lincoides and is categorised as improving. The last two identifications agree with it and are categorised as supporting.
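The same reading of the table can be checked with a quick tally. This sketch copies the category and species columns from the table above into plain tuples, so it runs without pandas or an API call. The simple vote count shown here is only an illustration and is not iNaturalist’s actual community taxon algorithm.

```python
from collections import Counter

# (category, species) pairs copied from the table above, in date order
id_rows = [
    ("maverick", "Arawacus togarna"),
    ("maverick", "Arawacus togarna"),
    ("improving", "Arawacus lincoides"),
    ("supporting", "Arawacus lincoides"),
    ("supporting", "Arawacus lincoides"),
]

# How many identifications fall in each category
category_counts = Counter(category for category, _ in id_rows)

# Which species name was suggested most often
species_counts = Counter(species for _, species in id_rows)
top_species, top_votes = species_counts.most_common(1)[0]

print(dict(category_counts))
print(top_species, top_votes)
```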
Comments and annotations
There are several keys with information added to the observation in the form of tags, flags, annotations, etc.
And comments can be extracted as a list of dictionaries: