Talk:IDigBio API: Difference between revisions

Revision as of 14:00, 2 May 2014


iDigBio API Overview

This document is the starting point for an introduction to the iDigBio Application Programming Interface (API).

The iDigBio API Examples page includes additional examples of how to use the suite of APIs available.

iDigBio API Basics and Quick Start

The iDigBio API is a RESTful HTTP API that primarily delivers data in JSON format. The API currently supports only GET requests, that is, read-only data operations.

API URLs (endpoints) are built from several parts and typically follow the form of

<base url>/<version>/<type>/<id> 

For example:

 Using the following parameters for an API request

 base url = http://api.idigbio.org
 version = v1
 type = records
 id = 00000230-01bc-4a4f-8389-204f39da9530

 would produce a URL of the following form

 "http://api.idigbio.org/v1/records/00000230-01bc-4a4f-8389-204f39da9530" 
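The URL form above can be sketched in Python as follows. This is a minimal illustration that only joins the four parts described in the text; the function name entity_url is an assumption made for this example and is not part of any iDigBio client library.

```python
from urllib.parse import quote

BASE_URL = "http://api.idigbio.org"  # base url from the example above


def entity_url(record_type, entity_id, version="v1", base_url=BASE_URL):
    """Join the parts as <base url>/<version>/<type>/<id>."""
    segments = [version, record_type, entity_id]
    # quote() percent-encodes characters that are unsafe in a path segment;
    # plain names and UUIDs pass through unchanged.
    encoded = [quote(segment, safe="") for segment in segments]
    return "/".join([base_url.rstrip("/")] + encoded)


print(entity_url("records", "00000230-01bc-4a4f-8389-204f39da9530"))
```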


There are two major types of API endpoints:

  • Collection - A group endpoint that returns lists of multiple records. These URLs are of the form <base url>/<version>/<type>, such as http://api.idigbio.org/v1/mediarecords/ . A collection endpoint also accepts two optional query parameters: ?limit sets the number of records returned in the collection and defaults to 1000, and ?offset sets the number of records to skip before returning a set of records and defaults to 0. If a collection request matches more records than the limit, the response includes a "next page" link for retrieving the next set of records in the collection. See the Endpoint Properties section for more information on the properties returned.
  • Entity - A single-item endpoint that returns all of the data available about an object. These URLs are of the form <base url>/<version>/<type>/<id>, like the example used above.

NOTE: at this time the API does not support search capabilities on entities or collections.

Examples:

collection:
"http://api.idigbio.org/v1/mediarecords"
collection w/ optional query parameters:
"http://api.idigbio.org/v1/mediarecords?limit=100&offset=100"
entity:
"http://api.idigbio.org/v1/mediarecords/00000230-01bc-4a4f-8389-204f39da9530"
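As a rough sketch, the collection examples above could be generated in Python like this. The helper name collection_url is hypothetical; only the ?limit and ?offset parameters and their value ranges come from this page.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.idigbio.org"


def collection_url(record_type, limit=None, offset=None, version="v1"):
    """Build <base url>/<version>/<type>, adding ?limit and ?offset if given."""
    url = f"{BASE_URL}/{version}/{record_type}"
    params = {}
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        params["limit"] = limit
    if offset is not None:
        if offset < 0:
            raise ValueError("offset must not be negative")
        params["offset"] = offset
    # Omitting both parameters leaves the server defaults (1000 and 0) in effect.
    return f"{url}?{urlencode(params)}" if params else url


print(collection_url("mediarecords"))
print(collection_url("mediarecords", limit=100, offset=100))
```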


Endpoint Basics

Calling just the base URL returns a list of API version endpoints. For example, a GET request to "http://api.idigbio.org" returns JSON data like the example below.

{
   "v1" : "http://api.idigbio.org/v1/",
   "check" : "http://api.idigbio.org/check",
   "v0" : "http://api.idigbio.org/v0/"
}

Calling a version URL endpoint returns a list of the major data types available for that version. For example, for v1 of the API, a GET request to "http://api.idigbio.org/v1" returns:

{
   "aggregates" : "http://api.idigbio.org/v1/aggregates",
   "records" : "http://api.idigbio.org/v1/records",
   "mediaaps" : "http://api.idigbio.org/v1/mediaaps",
   "taxa" : "http://api.idigbio.org/v1/taxa",
   "people" : "http://api.idigbio.org/v1/people",
   "organizations" : "http://api.idigbio.org/v1/organizations",
   "recordsets" : "http://api.idigbio.org/v1/recordsets",
   "mediarecords" : "http://api.idigbio.org/v1/mediarecords"
}
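A client can read this kind of response to discover endpoints instead of hard-coding them. The sketch below covers only the parsing step, using an abbreviated copy of the v1 response shown above as a string; fetching it over HTTP is left out, and the function name list_endpoints is an assumption made for this example.

```python
import json

# Abbreviated copy of the sample v1 response shown above.
sample = """{
   "records" : "http://api.idigbio.org/v1/records",
   "recordsets" : "http://api.idigbio.org/v1/recordsets",
   "mediarecords" : "http://api.idigbio.org/v1/mediarecords"
}"""


def list_endpoints(body):
    """Return a {name: url} dict, sorted by name, from a version-level response."""
    return dict(sorted(json.loads(body).items()))


endpoints = list_endpoints(sample)
for name, url in endpoints.items():
    print(f"{name:15} {url}")
```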

Endpoint Properties

The iDigBio API tries to follow the REST paradigm's HATEOAS (Hypermedia as the Engine of Application State) model: each API endpoint response provides a list of relevant links to further API actions. This list is typically stored in "idigbio:links".

Other system-level property names include:

For Entity Endpoints:

  • etag - the opaque identifier assigned to a specific version of a resource found at a URL
  • dateModified - The date the entity was modified
  • version - The entity's version number
  • type - The entity's type
  • uuid - The entity's uuid
  • siblings - Any siblings the entity may have as a dictionary of uuids
  • recordIds - A list of lookup keys for the entity
  • data - The entity's encapsulated data element

For Collection Endpoints:

  • items - the list of items in the collection
  • itemCount - the number of total items in the collection
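One use of itemCount is working out how many pages a collection needs. The following is a minimal sketch under stated assumptions: the first_page dictionary is an invented stand-in for a real response, only the documented itemCount and items keys are used, and page_urls is a hypothetical helper name.

```python
import math

BASE_URL = "http://api.idigbio.org/v1"


def page_urls(record_type, item_count, limit=1000):
    """Return one collection URL per page needed to cover item_count items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    pages = math.ceil(item_count / limit)
    return [
        f"{BASE_URL}/{record_type}?limit={limit}&offset={page * limit}"
        for page in range(pages)
    ]


# Hypothetical first-page response; the values are invented for illustration.
first_page = {"itemCount": 2500, "items": []}

for url in page_urls("mediarecords", first_page["itemCount"]):
    print(url)
```

Where a response carries a "next page" link, following that link is likely the simpler option; computing offsets by hand is shown here only to illustrate how itemCount, limit, and offset relate.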


Entity Data

The data element for each entity can include any number of key-value pairs containing properties of the entity, potentially including values that are themselves lists or dictionaries. Typical key namespaces that might appear in each type are (in order of decreasing usefulness):

  • Records: Typically contain Darwin Core elements ( http://rs.tdwg.org/dwc/terms/index.htm ) describing a physical specimen. May also contain custom elements or elements defined by other standards.
  • Mediarecords: Typically contain Audubon Core elements ( http://terms.gbif.org/wiki/Audubon_Core_Term_List_(1.0_normative) ) describing a media capture event. May also contain custom elements or elements defined by other standards.
  • Publishers: A top-level entity for the data ingestion process. Each publisher contains metadata about a piece of publishing software such as an IPT installation or Symbiota portal.
  • Recordsets: An entity largely derived from the publisher metadata. These serve as the join point between multiple data files for a single collection, and all records and mediarecords in iDigBio are expected to be associated with a recordset that links them to a source.
  • All other entities exposed via the API are either internal-only concepts with no fixed definition, or are unused.


Available API endpoints

All endpoints follow the form of "http://api.idigbio.org/{api_version}{endpoint}"

Endpoint                    Method  API Versions Available  Description
'/mediarecords'             GET     v0, v1                  returns a collection of media record IDs
'/mediarecords/{ID}'        GET     v0, v1                  returns a media record with the specific entity ID
'/mediarecords/{ID}/media'  GET     v0, v1                  returns an image associated with the specific entity ID
'/records'                  GET     v0, v1                  returns a collection of record IDs
'/records/{ID}'             GET     v0, v1                  returns a record with the specific entity ID
'/records/{ID}/media'       GET     v0, v1                  returns an image associated with the specific entity ID
'/publishers'               GET     v0, v1                  returns a collection of publisher IDs
'/publishers/{ID}'          GET     v0, v1                  returns a publisher with the specific entity ID
'/recordsets'               GET     v0, v1                  returns a collection of recordset IDs
'/recordsets/{ID}'          GET     v0, v1                  returns a recordset with the specific entity ID

Optional API Parameters

Each entry below lists the parameter, its endpoint type, its accepted values, a description, and an example.

  • limit - Endpoint type: Collections. Values: [1-]. Controls the number of records returned by a collection URL. Large values may cause requests to time out, but are significantly more efficient when retrieving large numbers of records. Example: http://api.idigbio.org/v1/mediarecords?limit=100
  • offset - Endpoint type: Collections. Values: [0-]. Controls how many records to skip forward when paging through the API. Large offsets are extremely inefficient, so combinations of small limits and large offsets may cause requests to fail. Example: http://api.idigbio.org/v1/mediarecords?limit=100&offset=100
  • version - Endpoint type: Entities. Values: [0-current version], or -1 for the latest version. Returns a specific version of a record from the data store. Can be used to query historical data for iDigBio records. Example: http://api.idigbio.org/v1/records/c93ebbee-64b5-4452-9e80-93bbfb11b815?version=0
  • quality - Endpoint type: Entities. Values: ["thumbnail", "web"]. Specifies the quality of the image returned from the API. "thumbnail" and "web" return images of width 260 and 600 pixels respectively. Omitting quality returns the full-size, high-quality image. Examples: http://api.idigbio.org/v1/mediarecords/55dd6860-213d-4478-8bfa-b5486afcffda/media?quality=thumbnail and http://api.idigbio.org/v1/mediarecords/55dd6860-213d-4478-8bfa-b5486afcffda/media?quality=web
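The entity parameters can be illustrated with a short Python sketch. The helper names media_url and versioned_record_url are assumptions made for this example. The quality check is done on the client side for convenience, and nothing here describes how the server itself treats unexpected values.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.idigbio.org/v1"
QUALITIES = ("thumbnail", "web")  # the two values documented on this page


def media_url(media_id, quality=None):
    """URL for the image of a media record, optionally at reduced quality."""
    url = f"{BASE_URL}/mediarecords/{media_id}/media"
    if quality is None:
        return url  # omitting quality requests the full-size image
    if quality not in QUALITIES:
        raise ValueError(f"quality must be one of {QUALITIES}")
    return f"{url}?{urlencode({'quality': quality})}"


def versioned_record_url(record_id, version=-1):
    """URL for a specific version of a record; -1 asks for the latest."""
    if version < -1:
        raise ValueError("version must be -1 or a non-negative integer")
    return f"{BASE_URL}/records/{record_id}?{urlencode({'version': version})}"


print(media_url("55dd6860-213d-4478-8bfa-b5486afcffda", quality="thumbnail"))
print(versioned_record_url("c93ebbee-64b5-4452-9e80-93bbfb11b815", version=0))
```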

Searching iDigBio

While iDigBio does not currently provide Search API endpoints to facilitate queries via the API, we do offer direct access to the backend Elasticsearch index servers.

Elasticsearch Overview

Elasticsearch provides Lucene-style indexes that can be queried and that return JSON-formatted data. This is the same interface that the iDigBio Portal search uses to query Elasticsearch.

Direct queries to the iDigBio Elasticsearch service should be considered an advanced operation.

More information about querying an Elasticsearch server can be found at: http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl.html

The iDigBio Elasticsearch service is accessed via a base URL combined with the query specification. The indexes provide two document types to query on: Records and Media Records.

The search base URL is:

https://search.idigbio.org/

Elasticsearch records

https://search.idigbio.org/idigbio/records/_search

Available fields:

"barcodevalue"
"catalognumber"
"class"
"collectioncode"
"collectionid"
"collectionname"
"collector"
"commonname"
"continent"
"country"
"county"
"datecollected"
"datemodified"
"etag"
"family"
"fieldnumber"
"genus"
"geopoint"
"hasImage"
"highertaxon"
"infraspecificepithet"
"institutioncode"
"institutionid"
"institutionname"
"kingdom"
"locality"
"maxdepth"
"maxelevation"
"mediarecords"
"mindepth"
"minelevation"
"municipality"
"occurenceid"
"order"
"phylum"
"recordset"
"scientificname"
"specificepithet"
"stateprovince"
"typestatus"
"uuid"
"verbatimlocality"
"version"
"waterbody"


TODO: this section should include something about the hasImage and scientificname fields.


Sample query:

curl -s -XGET 'https://search.idigbio.org/idigbio/records/_search?q=uuid:a5eef658-07c3-4c45-91ef-17f21f7ccff8' | json_pp
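The same kind of query URL can be assembled in Python. This is a sketch only: search_url is a hypothetical helper, the check against field names uses a small subset of the list above, and the q=field:value form is simply the one used by the sample query. The colon is percent-encoded as %3A by urlencode, which is the standard encoding of the same query string.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://search.idigbio.org/idigbio"

# A few of the record fields listed above; extend as needed.
KNOWN_FIELDS = {"uuid", "genus", "family", "country", "scientificname"}


def search_url(doc_type, field, value):
    """Build a _search URL with a single field:value query."""
    if field not in KNOWN_FIELDS:
        raise ValueError(f"unknown field: {field}")
    query = urlencode({"q": f"{field}:{value}"})
    return f"{SEARCH_BASE}/{doc_type}/_search?{query}"


print(search_url("records", "uuid", "a5eef658-07c3-4c45-91ef-17f21f7ccff8"))
```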

Elasticsearch mediarecords

https://search.idigbio.org/idigbio/mediarecords/_search