Bee-utiful world!

Anthophila

Python
global
Apoidea
plotly
Author

José R. Ferrer-Paris

Published

February 14, 2026

Modified

May 2, 2026

My Bee observations in iNaturalist

My daughter loves bees and bees love her. A colony of bees nested on our house when she was about to be born, and we have had this family connection with them ever since. Whenever I see a bee I smile… and if I have a camera at hand, I also take a photo to upload to iNat.

Here is a summary of the observations that are currently there, with regular updates coming each time I do a field trip or excursion. This post includes a count of how many species I have recorded in each of the countries I have visited, broken down by families and subfamilies.

To bee or not to bee

Following Laurence Packer's book Bees of the World, I am using the taxon Anthophila as the root for my search. This is ranked as an epifamily in iNaturalist's taxonomy, and includes seven families of bees.

Overview

In this post I bring together a personal and data‑driven exploration of my bee observations on iNaturalist. Starting from a single taxonomic concept — Anthophila, the bees — I retrieve all my observations worldwide and enrich them with taxonomic and geographic context. The workflow combines:

  • Automated data retrieval from the iNaturalist API
  • Taxonomic enrichment, from family down to species
  • Spatial summarisation, linking observations to countries and subnational regions
  • Interactive visualisation, using treemaps and image grids to explore patterns

The result is both a snapshot of my own natural history journey and a reproducible approach for anyone interested in analysing their biodiversity records.

Tools and Libraries

I will be using a Python environment with a selection of my favourite libraries, as explained here.

We will be using the following libraries in this blog post:

from itertools import islice

import pandas as pd
import plotly.express as px
from IPython.display import display, HTML
from pyinaturalist import (
    get_observations,
    get_taxa_by_id,
    get_places_by_id,
)

Step-by-Step Guide

The following steps take us from raw API responses to structured summaries and visual outputs. While the implementation is specific to bees and my own account, the same approach can be generalised to any taxon or iNaturalist user.

Step 1: Downloading iNaturalist observations

Let’s begin by fetching iNaturalist observations with pyinaturalist. We will need a selection of global observations so we select the user neomapas, and see where in the world they have been.

username = 'neomapas'
taxonid=630955
observations = get_observations(user_id=username, taxon_id=taxonid, per_page=0)
n_obs = observations['total_results']

Let’s print an overview of these total results:

print("User {} has {} observations of Anthophila (taxon id {}) in iNaturalist".format(username,n_obs,taxonid))
User neomapas has 35 observations of Anthophila (taxon id 630955) in iNaturalist

The maximum number of observations we can download in each query is 200, so we need to use pagination to get all results. For each query we will extract a selection of fields that we will use for summarising the observation records (coordinates, species guess, quality grade), and at the same time we will extract the taxon and place IDs linked to each observation.

records = []
taxa = []
places = []
msg = "Requesting observations from user _{}_: page {}, total of {} observations downloaded"
per_page = 200  # maximum page size allowed per query
j = 1
while len(records) < n_obs:
    print(msg.format(username, j, min(j * per_page, n_obs)))
    observations = get_observations(
        user_id=username,
        taxon_id=taxonid,
        per_page=per_page,
        page=j)
    if not observations['results']:
        break  # an empty page means there is nothing left to download
    for obs in observations['results']:
        # 'location' holds [latitude, longitude]; it may be missing
        lat, lon = obs['location'] or (None, None)
        record = {
            'uuid': obs['uuid'],
            'quality': obs['quality_grade'],
            'description': obs['description'],
            'location': obs['place_guess'],
            'longitude': lon,
            'latitude': lat,
            'species guess': obs['species_guess'],
            'observed on': obs['observed_on'],
            # engagement score: favourites weigh most, then identifications, then comments
            'points': obs['faves_count'] * 10 + obs['comments_count'] + obs['identifications_count'] * 3,
        }
        # one row per (observation, place) pair
        for pid in obs['place_ids']:
            places.append({'uuid': obs['uuid'], 'place id': pid})
        # one row per (observation, identified taxon) pair
        for tid in obs['ident_taxon_ids']:
            taxa.append({'uuid': obs['uuid'], 'taxon id': tid})
        # keep the first photo, if any, for the image wall
        if len(obs['observation_photos']) > 0:
            record['url'] = obs['observation_photos'][0]['photo']['url']
            record['attribution'] = obs['observation_photos'][0]['photo']['attribution']
        records.append(record)
    j = j + 1
Requesting observations from user _neomapas_: page 1, total of 35 observations downloaded
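
The paging arithmetic behind this loop can be sketched on its own. This is only an illustration and not part of the original workflow: pages_needed is a hypothetical helper, and the default page size of 200 is the per-query limit mentioned above.

```python
import math

def pages_needed(total, per_page=200):
    """Number of requests needed to fetch `total` records (hypothetical helper)."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(total / per_page)

print(pages_needed(35))   # 1
print(pages_needed(450))  # 3
```

With 35 observations a single page is enough, which matches the single request logged above.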

This example requires extracting some additional information that is nested within the JSON structure of the API response. I explain some of the details in this post.
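
To make the flattening step easier to follow without calling the API, here is a minimal sketch that works on a single made-up observation. flatten_observation and mock_obs are hypothetical names; the field names and the points weighting are the ones used in the download loop, and the handling of a missing location or photo is my own assumption about how to treat such records.

```python
def flatten_observation(obs):
    """Flatten one nested observation dict into a flat record (hypothetical helper)."""
    # 'location' is a [latitude, longitude] pair, or None when it is missing
    lat, lon = obs.get('location') or (None, None)
    photos = obs.get('observation_photos') or []
    first_photo = photos[0]['photo'] if photos else {}
    return {
        'uuid': obs['uuid'],
        'latitude': lat,
        'longitude': lon,
        'url': first_photo.get('url'),
        'attribution': first_photo.get('attribution'),
        # Same weighting as the 'points' score in the download loop
        'points': obs['faves_count'] * 10
                  + obs['comments_count']
                  + obs['identifications_count'] * 3,
    }

# Made-up observation, for illustration only
mock_obs = {
    'uuid': 'abc-123',
    'location': [-33.87, 151.21],
    'faves_count': 1,
    'comments_count': 2,
    'identifications_count': 3,
    'observation_photos': [],
}
record = flatten_observation(mock_obs)
print(record['points'])  # 21
```

One favourite, two comments and three identifications give 1 × 10 + 2 + 3 × 3 = 21 points.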

Step 2: Structuring observations and resolving taxonomy

The iNaturalist API returns observations as nested JSON objects, which are flexible but not immediately convenient for analysis. In this step, we transform the raw records collected in the previous step into structured pandas data frames that we can efficiently manipulate and join.

We create three separate tables:

  • Observations (inat_obs): one row per observation, containing coordinates, dates, species guesses, quality metrics, and image metadata.
  • Taxa links (taxa): a long table linking each observation to all taxon IDs involved in its identifications.
  • Place links (places): a long table linking each observation to all associated place IDs.

inat_obs=pd.DataFrame(records)
taxa = pd.DataFrame(taxa)
places = pd.DataFrame(places)

Once these tables are created, we extract the full list of unique taxon IDs used in the identifications and query the iNaturalist API again to retrieve their taxonomic metadata.

all_taxa=list(set(taxa['taxon id']))
def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())
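
A quick check of what chunk yields may help here. The function is repeated so that the snippet runs on its own, and the input range is made up for illustration.

```python
from itertools import islice

def chunk(it, size):
    it = iter(it)
    # iter(callable, sentinel) keeps calling the lambda until it returns ()
    return iter(lambda: tuple(islice(it, size)), ())

batches = list(chunk(range(1, 8), 3))
print(batches)  # [(1, 2, 3), (4, 5, 6), (7,)]
```

The last batch is simply shorter, so no taxon ID is dropped when the list length is not a multiple of the batch size.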

By requesting multiple rank levels in a single query, we can reconstruct the taxonomic hierarchy — from family down to species — and attach it back to the observation table.

for slc in chunk(all_taxa,30):
    taxa_query = get_taxa_by_id(slc, rank_level=[30,29,28,27,26,25,20,10,5])
    for res in taxa_query['results']:
        qry = taxa.loc[taxa['taxon id'] == res['id'],'uuid']
        inat_obs.loc[inat_obs.uuid.isin(qry), res['rank']] = res['name']

This enrichment step allows each observation to carry its full taxonomic context, making it possible to summarise bee diversity at multiple ranks and to explore how taxonomic resolution varies across space and observations.
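
The logic of this join can be illustrated in plain Python with made-up data. This is a sketch of the idea rather than the code used in this post: taxa_links and taxa_results are hypothetical stand-ins for the taxa table and for the 'results' list of the API response.

```python
# Made-up links and taxon metadata, for illustration only
taxa_links = [
    {'uuid': 'obs-1', 'taxon id': 1},
    {'uuid': 'obs-1', 'taxon id': 2},
    {'uuid': 'obs-2', 'taxon id': 1},
]
taxa_results = [
    {'id': 1, 'rank': 'family', 'name': 'Apidae'},
    {'id': 2, 'rank': 'genus', 'name': 'Apis'},
]

# Index the taxon metadata by ID, then attach rank -> name to each observation
taxon_by_id = {res['id']: res for res in taxa_results}
taxonomy = {}
for link in taxa_links:
    res = taxon_by_id.get(link['taxon id'])
    if res is None:
        continue  # taxon not returned for the requested rank levels
    taxonomy.setdefault(link['uuid'], {})[res['rank']] = res['name']

print(taxonomy)
# {'obs-1': {'family': 'Apidae', 'genus': 'Apis'}, 'obs-2': {'family': 'Apidae'}}
```

An observation identified only to family ends up with fewer ranks filled in, which is what later shows up as unassigned levels in the summaries.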

Step 3: Resolving geographic information

Each observation in iNaturalist is linked to one or more places, represented internally by numeric place IDs. These may correspond to countries, states, parks, or other administrative or user‑defined regions.

At this step, we extract all unique place IDs associated with the observations and query the iNaturalist API to retrieve their metadata. We are specifically interested in two administrative levels:

  • admin level 0 → country
  • admin level 10 → state / first‑level administrative division

For each matching place, we assign the corresponding country or state name to all observations linked to that place. This allows us to later summarise bee observations geographically, both across and within countries.

all_places = list(set(places['place id']))
response = get_places_by_id(all_places,
                            admin_level=[0, 10])
# Map admin levels to the column that stores the place name
level_names = {0: 'country', 10: 'state'}
for res in response['results']:
    level = level_names.get(res['admin_level'])
    if level is None:
        continue  # skip places that are neither a country nor a state
    qry = places.loc[places['place id'] == res['id'], 'uuid']
    inat_obs.loc[inat_obs.uuid.isin(qry), level] = res['name']

Step 4: Grouping and visualising observations

With taxonomic and geographic information now attached to every observation, we can start exploring patterns in the data.

In this step, we define a helper function that:

  1. Groups observations by one or more categorical variables (e.g. family, genus, country),
  2. Aggregates them using a chosen summary statistic (here, simple counts),
  3. Visualises the result as an interactive Plotly treemap.

def group_and_plot_data(x, aggfuncs, groupcols):
    """Group x by groupcols, aggregate with aggfuncs and return a treemap figure."""
    gd = x.groupby(groupcols).agg(aggfuncs).reset_index()
    # flatten the two-level column names, e.g. ('uuid', 'count') -> 'uuid count'
    gd.columns = [' '.join(col).strip() for col in gd.columns.values]
    # the aggregated value is the last column
    value_col = gd.columns.values[-1]
    fig = px.treemap(gd,
        path=[px.Constant("Bees obs"),] + groupcols,
        values=value_col,
        color=value_col,
        hover_data=[value_col],
        color_continuous_scale='RdBu')
    fig.update_layout(margin=dict(t=5, l=5, r=5, b=5))
    return fig
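
To see what the grouping step hands over to the treemap, here is a small sketch on a made-up table. The toy data frame is hypothetical; the groupby, agg and column-flattening calls are the same as in the function above.

```python
import pandas as pd

# Made-up observations, for illustration only
toy = pd.DataFrame({
    'uuid': ['a', 'b', 'c', 'd'],
    'family': ['Apidae', 'Apidae', 'Apidae', 'Halictidae'],
    'genus': ['Apis', 'Apis', 'Bombus', 'Lasioglossum'],
})

gd = toy.groupby(['family', 'genus']).agg({'uuid': ['count']}).reset_index()
# Columns are now two-level, e.g. ('family', '') and ('uuid', 'count');
# joining the levels gives flat names that can be passed to Plotly.
gd.columns = [' '.join(col).strip() for col in gd.columns.values]
print(gd)
```

The result has one row per family and genus combination, with the columns family, genus and uuid count.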

Treemaps are particularly well‑suited for this type of hierarchical data, as they allow us to simultaneously represent taxonomic structure and relative abundance.

group_columns = ['family','subfamily','tribe','genus','species']
agg_funcs = {'uuid':['count']}
fig1 = group_and_plot_data(
    inat_obs.fillna('-- unassigned --'), 
    agg_funcs, 
    group_columns)
fig1.show()

By changing the grouping variables, we can switch perspectives — from taxonomic composition to geographic distribution — without changing the underlying data pipeline.

group_columns = ['country','state','family']
agg_funcs = {'uuid':['count']}
fig1 = group_and_plot_data(
    inat_obs.fillna('-- unassigned --'), 
    agg_funcs, 
    group_columns)
fig1.show()

Step 5: Displaying a sample of observations

Now we combine spatial and taxonomic information to get a wall of pictures showing the most interesting observations for each bee family in each of the countries I have visited.

These lines of code perform a couple of tricks. I group the data twice: first I select one observation, based on the points column, for each combination of country, state and family; then I collect the selected figures for each country and state. Finally I iterate across these groups and join the figures in a list. I then use the display and HTML functions to render the formatted text strings as HTML elements (see footnote 1) to organise the figures and captions on this webpage.

inat_obs['figure'] = [
    "<figure class='mini'><a href='https://www.inaturalist.org/observations/%s' target=_blank><img src='%s' height=50><figcaption class='mini'>%s: <i>%s</i></figcaption></a></figure>" % (
        record['uuid'],
        record['url'],
        record['family'],
        record['species guess'])
    for idx,record in inat_obs.iterrows() 
]
selection = (
    inat_obs
    # highest score first, so that first() keeps the top observation per group
    .sort_values('points', ascending=False)
    .groupby(['country','state','family'])
    .first()
    .groupby(['country','state'])
    .agg({'figure':'unique'})
)


sections = list()
for idx,row in selection.iterrows():
    sectionfigures="&nbsp;".join(row['figure'])
    sectionname="<figure class='mini'><p class='figsection'>%s<br>%s</p></figure>" % idx
    sections.append(sectionname + sectionfigures)

allsections="<div class='container'>%s</div>" % ("".join(sections))

display(HTML(allsections))
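
Picking the best record per group relies on the sort order: groupby keeps the row order within each group, so first() returns the highest score only when the table is sorted in descending order. A minimal sketch on made-up data (toy and its values are hypothetical):

```python
import pandas as pd

# Made-up scores, for illustration only
toy = pd.DataFrame({
    'uuid': ['a', 'b', 'c', 'd'],
    'country': ['Country A', 'Country A', 'Country A', 'Country B'],
    'family': ['Apidae', 'Apidae', 'Halictidae', 'Apidae'],
    'points': [3, 13, 6, 4],
})

best = (
    toy
    .sort_values('points', ascending=False)  # highest score first
    .groupby(['country', 'family'], as_index=False)
    .first()                                 # first row of each group
)
print(best[['country', 'family', 'uuid', 'points']])
```

For Country A and Apidae, observation b (13 points) is kept instead of observation a (3 points).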

Interpreting the results

The treemaps above highlight two complementary dimensions of my observations:

  • Taxonomic depth: how records are distributed across families, subfamilies, tribes, genera, and species
  • Geographic breadth: how bee diversity varies across the countries and regions I have visited

Unassigned taxa remain visible in the plots, which is intentional: they reflect observations that still need identification or where the taxonomic resolution has not yet reached species level. In that sense, the figures are not static summaries but living dashboards that improve over time as identifications are refined.

The final image wall brings everything back to the observational level — individual records, individual encounters — reminding us that behind each aggregated count there is a moment in the field.

Conclusion

In summary, we:

  1. Downloaded iNaturalist observations for a specific user and taxon using the pyinaturalist API.
  2. Parsed and normalised nested API responses into tidy pandas data frames.
  3. Enriched observations with taxonomy, retrieving family‑to‑species information for all associated identifications.
  4. Linked observations to places, resolving countries and subnational regions from place IDs.
  5. Explored patterns interactively, using Plotly treemaps to visualise taxonomic and geographic structure.
  6. Curated representative observations, combining spatial, taxonomic, and engagement metrics into an image gallery.

Beyond bees, this workflow illustrates how personal biodiversity data can be transformed into an exploratory research object — one that evolves as new observations are added, identifications improve, and journeys continue.

And perhaps most importantly: every dot, every image, every branch in these figures comes from stopping for a moment, smiling at a bee, and paying attention.

Footnotes

  1. The look of the output HTML code depends on the site’s CSS style definitions. Look at this file if you want to reuse/adapt my style.