iDigBio API

Revision as of 14:58, 21 May 2014 by Dstoner (talk | contribs) (add section for api examples, re-arrange)


iDigBio API Overview

This document serves as the official documentation for the iDigBio Application Programming Interface (API).

The iDigBio API is an abstraction layer for retrieving data from the iDigBio back-end data systems. It is a RESTful HTTP API that delivers data primarily in JSON format. This abstraction allows aggregated data to be reused and combined in mashups without knowledge of the complex details of the back-end data storage. Currently, the public API supports only GET requests, that is, read operations.

Programmatic search is a special case. See the section Searching iDigBio below for more information.

Specification

The current iDigBio API v1 Specification includes detailed information about the API endpoints, parameters, and values.

Examples

The iDigBio API Examples page contains many more examples of the iDigBio API in action.

Quick Start

iDigBio API endpoints follow the general form:

http://api.idigbio.org/{api_version}{endpoint}{optional_parameters}
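
The general form can be assembled programmatically. The following is a minimal sketch, not part of the iDigBio tooling: the function name build_url is hypothetical, and it assumes that the endpoint part starts with "/" and that optional parameters are passed as an ordinary URL query string.

```python
from urllib.parse import urlencode

# Host taken from the general form above.
API_ROOT = "http://api.idigbio.org"


def build_url(api_version, endpoint, params=None):
    """Join {api_version}, {endpoint} and {optional_parameters} into one URL.

    Assumption: optional parameters are sent as a query string.
    """
    url = f"{API_ROOT}/{api_version}{endpoint}"
    if params:
        # urlencode escapes each value and joins the pairs with "&".
        url += "?" + urlencode(params)
    return url
```

For example, build_url("v1", "/records/354210ae-4aa3-49d2-8a66-78a86b019c7b") yields a v1 record URL of the form described above.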

Simple Example - Curculionidae

Suppose we have already located the specimen record for a particular Curculionidae specimen (Curculionidae is a family of weevils). The record is identified by the following iDigBio UUID:

"idigbio:uuid" : "354210ae-4aa3-49d2-8a66-78a86b019c7b"

To retrieve a specimen record from v1 of the API with the above iDigBio UUID, we issue an HTTP "GET" request to the following endpoint:

http://api.idigbio.org/v1/records/354210ae-4aa3-49d2-8a66-78a86b019c7b

and receive the following JSON document from the API (in this case, formatted for readability):

{
   "idigbio:uuid" : "354210ae-4aa3-49d2-8a66-78a86b019c7b",
   "idigbio:etag" : "02736fd7318eafed62a4a5ff35175a27fa63983e",
   "idigbio:links" : {
      "mediarecord" : [
         "http://api.idigbio.org/v1/mediarecords/59141135-813a-4db1-a527-009ae6d17101"
      ],
      "owner" : [
         "872733a2-67a3-4c54-aa76-862735a5f334"
      ],
      "recordset" : [
         "http://api.idigbio.org/v1/recordsets/69037495-438d-4dba-bf0f-4878073766f1"
      ]
   },
   "idigbio:version" : 2,
   "idigbio:createdBy" : "872733a2-67a3-4c54-aa76-862735a5f334",
   "idigbio:recordIds" : [
      "urn:uuid:b036a012-ba1e-41e0-a39a-76fc253640cf"
   ],
   "idigbio:dateModified" : "2014-04-22T07:33:16.129Z",
   "idigbio:data" : {
      "dwc:day" : "16",
      "dwc:identifiedBy" : "CPMAB",
      "idigbio:recordId" : "urn:uuid:b036a012-ba1e-41e0-a39a-76fc253640cf",
      "dwc:catalogNumber" : "NAUF4A0013309",
      "dwc:locality" : "Box Cyn. Santa Rita Mts.",
      "dwc:occurrenceID" : "1063507",
      "dwc:year" : "1967",
      "dwc:recordedBy" : "C.D. Johnson",
      "dwc:scientificName" : "Curculionidae",
      "dwc:basisOfRecord" : "PreservedSpecimen",
      "dwc:family" : "Curculionidae",
      "symbiotaverbatimScientificName" : "Curculionidae",
      "dwc:collectionCode" : "NAUF",
      "dcterms:modified" : "2013-12-20 13:00:36",
      "dwc:country" : "USA",
      "dcterms:references" : "http://symbiota4.acis.ufl.edu/scan/portal/collections/individual/index.php?occid=1063507",
      "dwc:eventDate" : "1967-08-16",
      "dwc:scientificNameAuthorship" : "Latreille, 1802",
      "dwc:collectionID" : "urn:uuid:c87a0756-fdd7-4cb6-9921-ca5774f8330e",
      "dwc:minimumElevationInMeters" : "1524",
      "dwc:verbatimElevation" : "5000'",
      "dwc:startDayOfYear" : "228",
      "dwc:month" : "8",
      "dwc:rights" : "http://creativecommons.org/licenses/by-nc-sa/3.0/",
      "dwc:stateProvince" : "Arizona",
      "dwc:genus" : "Curculionidae",
      "dwc:institutionCode" : "NAU",
      "dwc:county" : "Pima"
   }
}
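
The same request can be issued from a script. The sketch below uses only the Python standard library. The helper names fetch_record and parse_record are hypothetical, and the choice of fields to extract is arbitrary; it assumes the response body is UTF-8 encoded JSON shaped like the document above.

```python
import json
from urllib.request import urlopen

# Record endpoint as used in the example above.
RECORD_URL = "http://api.idigbio.org/v1/records/{uuid}"


def parse_record(body):
    """Decode a JSON record and pull out a few of the fields shown above."""
    record = json.loads(body)
    data = record.get("idigbio:data", {})
    return {
        "uuid": record["idigbio:uuid"],
        "version": record.get("idigbio:version"),
        "family": data.get("dwc:family"),
        "locality": data.get("dwc:locality"),
    }


def fetch_record(uuid, timeout=10):
    """Issue the HTTP GET request and return the summarised record."""
    with urlopen(RECORD_URL.format(uuid=uuid), timeout=timeout) as response:
        return parse_record(response.read().decode("utf-8"))
```

Calling fetch_record("354210ae-4aa3-49d2-8a66-78a86b019c7b") performs the request described above. parse_record is kept separate so that the decoding step can be exercised without network access.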

The iDigBio API Examples page shows how to drill down into this specimen record and retrieve an image associated with the specimen, as well as how one might search to locate the specimen record in the first place.

Endpoint Properties

 This section needs to be moved to specification page and re-written -dstoner 

The iDigBio API tries to follow the REST paradigm's HATEOAS (Hypermedia as the Engine of Application State) model: each API response includes a list of relevant links to further API actions. This list is typically stored in "idigbio:links".
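
As an illustration, a client can walk "idigbio:links" to find related resources. The sketch below assumes the layout seen in the Curculionidae sample response, where each relation maps to a list of strings. In that sample the "owner" relation holds a bare UUID rather than a URL, so the second helper keeps only values that look like full URLs. Both function names are hypothetical.

```python
def linked(entity, relation):
    """Return the list stored under one relation in "idigbio:links"."""
    return list(entity.get("idigbio:links", {}).get(relation, []))


def followable(entity):
    """Yield (relation, url) pairs for link values that are full URLs."""
    for relation, targets in entity.get("idigbio:links", {}).items():
        for target in targets:
            # Bare identifiers (such as the sample's "owner") are skipped.
            if target.startswith(("http://", "https://")):
                yield relation, target
```

Each URL yielded by followable can then be requested in the same way as the original record.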

Other system-level property names include:

For Entity Endpoints:

  • etag - The opaque identifier assigned to a specific version of a resource found at a URL
  • dateModified - The date the entity was modified
  • version - The entity's version number
  • type - The entity's type
  • uuid - The entity's uuid
  • siblings - Any siblings the entity may have as a dictionary of uuids
  • recordIds - A list of lookup keys for the entity
  • data - The entity's encapsulated data element

For Collection Endpoints:

  • items - The list of items in the collection
  • itemCount - The total number of items in the collection
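
In the Curculionidae sample response, the entity properties appear with an "idigbio:" prefix (for example "idigbio:etag" and "idigbio:version"). The sketch below separates those system-level properties from the encapsulated data element. The function name is hypothetical, and the page does not state whether collection endpoints use the same prefix, so that is left as an open assumption.

```python
PREFIX = "idigbio:"


def system_properties(entity):
    """Return the "idigbio:"-prefixed keys with the prefix removed.

    The "idigbio:data" element is left out, since it holds the entity's
    own data rather than system-level metadata.
    """
    return {
        key[len(PREFIX):]: value
        for key, value in entity.items()
        if key.startswith(PREFIX) and key != PREFIX + "data"
    }
```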

Entity Data

 This section needs to be re-written, moved to specification page, or deleted. -dstoner 

The data element for each entity can include any number of key-value pairs containing properties of the entity, potentially including values that are themselves lists or dictionaries. The key namespaces that typically appear for each entity type are (in order of decreasing usefulness):

  • Records: typically contain Darwin Core elements (http://rs.tdwg.org/dwc/terms/index.htm) describing a physical specimen; may also contain custom elements or elements defined by other standards. See the complete list of terms here.
  • Mediarecords: typically contain Audubon Core elements (http://terms.gbif.org/wiki/Audubon_Core_Term_List_(1.0_normative)) describing a media capture event; may also contain custom elements or elements defined by other standards. See the complete list of terms here.
  • Publishers: A top-level entity for the data ingestion process. Each publisher contains metadata about a publishing location such as an IPT installation or Symbiota portal.
  • Recordsets: An entity largely derived from the publisher metadata. Recordsets serve as the join point between multiple data files for a single collection, and all records and mediarecords in iDigBio are expected to be associated with a recordset that links them to a source.
  • All other entities exposed via the API are either internal-only concepts with no fixed definition, or are unused.
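
Because the data element mixes terms from several standards, it can help to group its keys by namespace prefix. The sketch below is one possible approach, with a hypothetical function name. It assumes the "prefix:term" key convention seen in the sample record (for example "dwc:family" and "dcterms:modified") and files keys without a colon, such as the sample's "symbiotaverbatimScientificName", under the empty string.

```python
from collections import defaultdict


def group_by_namespace(data):
    """Group the keys of a data element by their namespace prefix."""
    groups = defaultdict(dict)
    for key, value in data.items():
        # Split on the first colon only; values are never inspected.
        namespace, separator, term = key.partition(":")
        if not separator:
            # No colon in the key: treat it as having no namespace.
            namespace, term = "", key
        groups[namespace][term] = value
    return dict(groups)
```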

Searching iDigBio

Search Portal and Bulk Record Downloads

The recommended method for searching iDigBio is to use the Portal search, not the API. The portal also provides bulk download capabilities for acquiring larger sets of data. See: https://www.idigbio.org/portal

Elasticsearch Overview

The iDigBio API does not yet provide query/search capabilities. However, the back-end Elasticsearch interface is public-facing and available for use by advanced users and programmers. This is the same interface used by the iDigBio Portal search, and it lets experienced data users query iDigBio without navigating the portal web site.

Note: Direct queries to the iDigBio Elasticsearch service should be considered an advanced operation.

According to the Elasticsearch project site, Elasticsearch is a "flexible and powerful open source, distributed, real-time search and analytics engine." Elasticsearch provides a RESTful web services interface and returns data in JSON format.
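
Elasticsearch queries are themselves JSON documents. The sketch below builds the body of a standard Elasticsearch "match" query. It is an illustration only: this page gives neither the search URL nor the indexed field names, so both are left to the caller, and the function name is hypothetical.

```python
import json


def match_query(field, value, size=10):
    """Build the JSON body of an Elasticsearch "match" query.

    "field" must be a field name that exists in the target index;
    the iDigBio field names are not listed on this page.
    """
    body = {
        "query": {"match": {field: value}},
        "size": size,  # maximum number of hits to return
    }
    return json.dumps(body)
```

The resulting string would be sent as the request body to the Elasticsearch search endpoint described in the specification.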

More detailed information on iDigBio Elasticsearch capabilities is available in the Search section of the iDigBio API v1 Specification.

See iDigBio API Examples for Elasticsearch examples that are specific to iDigBio.