iDigBio API
iDigBio API Overview
This document serves as the official documentation for the iDigBio Application Programming Interface (API).
The iDigBio API is an abstraction layer for retrieving data from the iDigBio back-end data systems. It is a RESTful HTTP API that primarily delivers data in JSON format. This abstraction allows aggregated data to be reused and combined ("mashed up") without needing to understand the complex details of the underlying back-end data storage. Currently, the public API supports only GET requests (read operations).
Programmatic search is a special case; see the Searching iDigBio section below for more information.
Quick Start
iDigBio API endpoints follow the general form:
http://api.idigbio.org/{api_version}{endpoint}{optional_parameters}
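The general form above can be assembled programmatically. The following is a minimal Python sketch using only the standard library. The helper name `build_url` is hypothetical and not part of any iDigBio client library; it assumes `endpoint` begins with a slash (as in the examples on this page) and that optional parameters are sent as a URL query string.

```python
from typing import Optional
from urllib.parse import urlencode

API_ROOT = "http://api.idigbio.org"  # base URL from the general form above


def build_url(api_version: str, endpoint: str, params: Optional[dict] = None) -> str:
    """Assemble http://api.idigbio.org/{api_version}{endpoint}{optional_parameters}.

    Hypothetical helper: `endpoint` must start with "/" and `params`
    (e.g. {"limit": 100}) is encoded as the query string.
    """
    if not endpoint.startswith("/"):
        raise ValueError("endpoint must start with '/'")
    url = f"{API_ROOT}/{api_version}{endpoint}"
    if params:
        url += "?" + urlencode(params)
    return url


print(build_url("v1", "/mediarecords", {"limit": 100, "offset": 100}))
```

The printed URL is the same as the paging example in the Optional API Parameters table.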
Experienced programmers may wish to jump straight to the iDigBio API v1 Specification, or to read through the iDigBio API Examples page, which contains many more examples of the iDigBio API in action.
Simple Example: Curculionidae
Suppose we have already located the specimen record for a particular specimen of Curculionidae (the weevil family). The record for our example is identified by the following iDigBio GUID:
"idigbio:uuid" : "354210ae-4aa3-49d2-8a66-78a86b019c7b"
To retrieve a specimen record from v1 of the API with the above iDigBio GUID, we issue an HTTP "GET" request to the following endpoint:
http://api.idigbio.org/v1/records/354210ae-4aa3-49d2-8a66-78a86b019c7b
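A minimal Python sketch of this GET request, using only the standard library. It assumes the v1 endpoint is still being served at this address; `record_url` and `fetch_record` are hypothetical helper names. Validating the identifier with `uuid.UUID` first catches malformed GUIDs before any network call is made.

```python
import json
import uuid
from urllib.request import urlopen

API_BASE = "http://api.idigbio.org"  # base URL used throughout this page


def record_url(record_uuid: str) -> str:
    """Build the v1 record endpoint URL for an iDigBio UUID."""
    # uuid.UUID raises ValueError for malformed identifiers and
    # str() returns the canonical lowercase, hyphenated form.
    canonical = str(uuid.UUID(record_uuid))
    return f"{API_BASE}/v1/records/{canonical}"


def fetch_record(record_uuid: str, timeout: float = 30.0) -> dict:
    """Issue an HTTP GET for one record and decode the JSON body."""
    with urlopen(record_url(record_uuid), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_record("354210ae-4aa3-49d2-8a66-78a86b019c7b")` would be expected to return a dictionary like the document shown below, assuming the record has not changed since this page was written.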
and receive the following JSON document from the API (in this case, formatted for readability):
{
  "idigbio:uuid" : "354210ae-4aa3-49d2-8a66-78a86b019c7b",
  "idigbio:etag" : "02736fd7318eafed62a4a5ff35175a27fa63983e",
  "idigbio:links" : {
    "mediarecord" : [ "http://api.idigbio.org/v1/mediarecords/59141135-813a-4db1-a527-009ae6d17101" ],
    "owner" : [ "872733a2-67a3-4c54-aa76-862735a5f334" ],
    "recordset" : [ "http://api.idigbio.org/v1/recordsets/69037495-438d-4dba-bf0f-4878073766f1" ]
  },
  "idigbio:version" : 2,
  "idigbio:createdBy" : "872733a2-67a3-4c54-aa76-862735a5f334",
  "idigbio:recordIds" : [ "urn:uuid:b036a012-ba1e-41e0-a39a-76fc253640cf" ],
  "idigbio:dateModified" : "2014-04-22T07:33:16.129Z",
  "idigbio:data" : {
    "dwc:day" : "16",
    "dwc:identifiedBy" : "CPMAB",
    "idigbio:recordId" : "urn:uuid:b036a012-ba1e-41e0-a39a-76fc253640cf",
    "dwc:catalogNumber" : "NAUF4A0013309",
    "dwc:locality" : "Box Cyn. Santa Rita Mts.",
    "dwc:occurrenceID" : "1063507",
    "dwc:year" : "1967",
    "dwc:recordedBy" : "C.D. Johnson",
    "dwc:scientificName" : "Curculionidae",
    "dwc:basisOfRecord" : "PreservedSpecimen",
    "dwc:family" : "Curculionidae",
    "symbiotaverbatimScientificName" : "Curculionidae",
    "dwc:collectionCode" : "NAUF",
    "dcterms:modified" : "2013-12-20 13:00:36",
    "dwc:country" : "USA",
    "dcterms:references" : "http://symbiota4.acis.ufl.edu/scan/portal/collections/individual/index.php?occid=1063507",
    "dwc:eventDate" : "1967-08-16",
    "dwc:scientificNameAuthorship" : "Latreille, 1802",
    "dwc:collectionID" : "urn:uuid:c87a0756-fdd7-4cb6-9921-ca5774f8330e",
    "dwc:minimumElevationInMeters" : "1524",
    "dwc:verbatimElevation" : "5000'",
    "dwc:startDayOfYear" : "228",
    "dwc:month" : "8",
    "dwc:rights" : "http://creativecommons.org/licenses/by-nc-sa/3.0/",
    "dwc:stateProvince" : "Arizona",
    "dwc:genus" : "Curculionidae",
    "dwc:institutionCode" : "NAU",
    "dwc:county" : "Pima"
  }
}
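Once decoded, the document is an ordinary nested dictionary: system-level properties sit at the top level and the Darwin Core values sit under "idigbio:data". A small Python sketch that pulls a few readable fields out of it; the function name and the choice of fields are illustrative only.

```python
def summarize_record(doc: dict) -> dict:
    """Pick a few human-readable fields out of a v1 record document."""
    data = doc.get("idigbio:data", {})
    # In the example record, Darwin Core values are strings, even numeric ones like "dwc:year".
    place_keys = ("dwc:locality", "dwc:county", "dwc:stateProvince", "dwc:country")
    place = [data.get(k) for k in place_keys]
    return {
        "uuid": doc.get("idigbio:uuid"),
        "version": doc.get("idigbio:version"),
        "scientificName": data.get("dwc:scientificName"),
        "eventDate": data.get("dwc:eventDate"),
        "collector": data.get("dwc:recordedBy"),
        # Skip missing parts so the place string never contains "None".
        "place": ", ".join(p for p in place if p),
    }


example = {
    "idigbio:uuid": "354210ae-4aa3-49d2-8a66-78a86b019c7b",
    "idigbio:version": 2,
    "idigbio:data": {
        "dwc:scientificName": "Curculionidae",
        "dwc:eventDate": "1967-08-16",
        "dwc:recordedBy": "C.D. Johnson",
        "dwc:locality": "Box Cyn. Santa Rita Mts.",
        "dwc:county": "Pima",
        "dwc:stateProvince": "Arizona",
        "dwc:country": "USA",
    },
}
print(summarize_record(example))
```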
The iDigBio API Examples page shows how to drill down into this specimen record and retrieve an image associated with the specimen, as well as how one might search to locate the specimen record in the first place.
Endpoint Basics
Requesting the base URL alone returns a list of the available API version endpoints, along with a "check" endpoint. For example, an HTTP GET request to "http://api.idigbio.org" returns the following JSON data:
{
  "v1" : "http://api.idigbio.org/v1/",
  "check" : "http://api.idigbio.org/check",
  "v0" : "http://api.idigbio.org/v0/"
}
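Because the root listing mixes version entries with other endpoints such as "check", a client that wants to discover the newest version should filter the keys rather than use them all. A Python sketch, assuming version keys always have the form `v` followed by digits (true of the listing above, but an assumption for future versions); the function names are hypothetical.

```python
import re


def version_endpoints(root_listing: dict) -> dict:
    """Keep only entries whose key looks like an API version (v0, v1, ...)."""
    return {k: v for k, v in root_listing.items() if re.fullmatch(r"v\d+", k)}


def latest_version_url(root_listing: dict) -> str:
    """Return the URL of the highest-numbered version in the root listing."""
    versions = version_endpoints(root_listing)
    if not versions:
        raise ValueError("no version endpoints found")
    # Compare numerically so that "v10" would sort after "v9".
    newest = max(versions, key=lambda k: int(k[1:]))
    return versions[newest]


listing = {
    "v1": "http://api.idigbio.org/v1/",
    "check": "http://api.idigbio.org/check",
    "v0": "http://api.idigbio.org/v0/",
}
print(latest_version_url(listing))
```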
Endpoint Properties
The iDigBio API tries to follow the HATEOAS (Hypermedia as the Engine of Application State) model of the REST paradigm: each API response includes links to related resources and further API actions. These links are typically stored in the "idigbio:links" property.
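A client can use "idigbio:links" to move from one resource to the next instead of constructing URLs itself. In the example record, however, not every link value is a URL: "owner" holds a bare UUID. A Python sketch that keeps only followable links; it assumes each relation maps to a list, as in the example, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def link_targets(doc: dict) -> dict:
    """Map each relation in "idigbio:links" to the values that are http(s) URLs."""
    out = {}
    for relation, values in doc.get("idigbio:links", {}).items():
        # Bare identifiers such as the "owner" UUID have no URL scheme.
        urls = [v for v in values if urlparse(v).scheme in ("http", "https")]
        if urls:
            out[relation] = urls
    return out
```

Applied to the example record, this returns the "mediarecord" and "recordset" URLs, each of which can be fetched with another GET request.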
Other system-level property names include:
For Entity Endpoints:
- etag - The opaque identifier assigned to a specific version of a resource found at a URL
- dateModified - The date the entity was last modified
- version - The entity's version number
- type - The entity's type
- uuid - The entity's UUID
- siblings - Any siblings the entity may have, as a dictionary of UUIDs
- recordIds - A list of lookup keys for the entity
- data - The entity's encapsulated data element
For Collection Endpoints:
- items - The list of items in the collection
- itemCount - The total number of items in the collection
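In the example record, the entity properties above appear with an "idigbio:" prefix (for example "idigbio:etag" and "idigbio:version"). A Python sketch that reads them back under their short names. Whether collection responses prefix "items" and "itemCount" the same way is not shown on this page, so the hypothetical `item_count` helper accepts either spelling as an assumption.

```python
PREFIX = "idigbio:"


def system_properties(doc: dict) -> dict:
    """Return top-level "idigbio:" properties with the prefix removed.

    The bulky data and links elements are left out.
    """
    skip = (PREFIX + "data", PREFIX + "links")
    return {
        key[len(PREFIX):]: value
        for key, value in doc.items()
        if key.startswith(PREFIX) and key not in skip
    }


def item_count(collection_doc: dict) -> int:
    """Read itemCount from a collection response (prefixed or not: an assumption)."""
    for key in (PREFIX + "itemCount", "itemCount"):
        if key in collection_doc:
            return int(collection_doc[key])
    raise KeyError("itemCount")
```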
Entity Data
The data element of each entity can include any number of key-value pairs describing the entity's properties; values may themselves be lists or dictionaries. The entity types, in roughly decreasing order of usefulness, and the key namespaces typically found in each are:
- Records: Typically contain Darwin Core elements ( http://rs.tdwg.org/dwc/terms/index.htm ) describing a physical specimen; may also contain custom elements or elements defined by other standards.
- Mediarecords: Typically contain Audubon Core elements ( http://terms.gbif.org/wiki/Audubon_Core_Term_List_(1.0_normative) ) describing a media capture event; may also contain custom elements or elements defined by other standards.
- Publishers: A top-level entity in the data ingestion process. Each publisher contains metadata about a publishing location, such as an IPT installation or a Symbiota portal.
- Recordsets: An entity largely derived from the publisher metadata. Recordsets serve as the join point between multiple data files for a single collection, and every record and mediarecord in iDigBio is expected to be associated with a recordset that links it to its source.
- All other entities exposed via the API are either internal-only concepts with no fixed definition or are unused.
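Because the data element mixes standards, it can help to separate keys by their namespace prefix (dwc for Darwin Core, dcterms for Dublin Core terms, and so on, as in the example record). A Python sketch; treating keys without a colon as custom elements is an assumption based on the "symbiotaverbatimScientificName" key in the example, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_namespace(data: dict) -> dict:
    """Split "prefix:term" keys into {prefix: {term: value}}.

    Keys without a colon are treated as custom elements and grouped under "".
    """
    groups = defaultdict(dict)
    for key, value in data.items():
        namespace, sep, term = key.partition(":")
        if not sep:
            # No colon, e.g. "symbiotaverbatimScientificName".
            namespace, term = "", key
        groups[namespace][term] = value
    return dict(groups)
```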
Optional API Parameters
Parameter | Endpoint type | Values | Description | Example |
---|---|---|---|---|
limit | Collections | [1-] | Controls the number of records returned by a collection URL. Large values may cause requests to time out, but are significantly more efficient when retrieving large numbers of records. | http://api.idigbio.org/v1/mediarecords?limit=100 |
offset | Collections | [0-] | Controls how many records to skip forward when paging through the API. Large offsets are extremely inefficient, so combinations of small limits and large offsets may cause requests to fail. | http://api.idigbio.org/v1/mediarecords?limit=100&offset=100 |
version | Entities | [0-current version], -1 for latest version | Returns a specific version of a record from the data store. Can be used to query historical data for iDigBio records. | http://api.idigbio.org/v1/records/c93ebbee-64b5-4452-9e80-93bbfb11b815?version=0 |
quality | Entities (mediarecord media) | ["thumbnail", "web"] | Specifies the quality of the image returned by the API: "thumbnail" and "web" return images 260 and 600 pixels wide, respectively. Omitting quality returns the full-size, high-quality image. | http://api.idigbio.org/v1/mediarecords/55dd6860-213d-4478-8bfa-b5486afcffda/media?quality=thumbnail or http://api.idigbio.org/v1/mediarecords/55dd6860-213d-4478-8bfa-b5486afcffda/media?quality=web |
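The parameters in the table are ordinary query-string arguments. A Python sketch that builds the example URLs from the table; the helper names and the default page size of 100 are assumptions. As the table notes, the choice of `limit` trades the risk of timeouts against the inefficiency of large offsets.

```python
from typing import Iterator, Optional
from urllib.parse import urlencode

API_V1 = "http://api.idigbio.org/v1"
QUALITIES = {"thumbnail", "web"}  # omit quality entirely for the full-size image


def page_urls(collection: str, item_count: int, limit: int = 100) -> Iterator[str]:
    """Yield limit/offset URLs that together cover item_count items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for offset in range(0, item_count, limit):
        yield f"{API_V1}/{collection}?{urlencode({'limit': limit, 'offset': offset})}"


def record_version_url(record_uuid: str, version: int = -1) -> str:
    """URL for a specific stored version of a record; -1 requests the latest."""
    return f"{API_V1}/records/{record_uuid}?{urlencode({'version': version})}"


def media_url(mediarecord_uuid: str, quality: Optional[str] = None) -> str:
    """URL for a mediarecord's image, optionally at a reduced quality."""
    url = f"{API_V1}/mediarecords/{mediarecord_uuid}/media"
    if quality is None:
        return url
    if quality not in QUALITIES:
        raise ValueError(f"quality must be one of {sorted(QUALITIES)}")
    return f"{url}?{urlencode({'quality': quality})}"
```

In practice `item_count` would come from the itemCount property of a first collection request.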
Searching iDigBio
Search Portal and Bulk Record Downloads
The recommended method for searching iDigBio is the Portal search, not the API. The Portal also provides bulk download capabilities for acquiring larger sets of data. See: https://www.idigbio.org/portal
Elasticsearch Overview
The iDigBio API does not currently (yet!) provide query/search capabilities. However, the back-end Elasticsearch interface is public-facing and available for use by advanced users and programmers. This is the same interface that is used by the iDigBio Portal search.
Note: Direct queries to the iDigBio Elasticsearch service should be considered an Advanced operation.
According to the Elasticsearch project site, Elasticsearch is a "flexible and powerful open source, distributed, real-time search and analytics engine." Elasticsearch provides a RESTful web service interface and returns data in JSON format.
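Elasticsearch queries are themselves JSON documents sent to a `_search` URL. A Python sketch of that round trip using the standard Elasticsearch query DSL (a `bool` query with `match` clauses). The iDigBio search URL is deliberately left as a parameter because it is documented in the v1 Specification rather than here, and field names such as `family` are hypothetical examples, not confirmed iDigBio index fields.

```python
import json
from urllib.request import Request, urlopen


def build_search_body(filters: dict, size: int = 10) -> dict:
    """Build a query-DSL body requiring every field/value pair to match."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {field: value}} for field, value in filters.items()]
            }
        },
    }


def run_search(search_url: str, body: dict, timeout: float = 30.0) -> dict:
    """POST the JSON query body to an Elasticsearch _search URL and decode the reply."""
    req = Request(
        search_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `build_search_body({"family": "curculionidae"}, size=5)` produces a body that could be passed to `run_search` together with the search URL from the specification; in standard Elasticsearch responses, matching documents are listed under `["hits"]["hits"]`.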
More detailed information on iDigBio Elasticsearch capabilities is available in the Search section of the iDigBio API v1 Specification.
See iDigBio API Examples for Elasticsearch examples that are specific to iDigBio.