Latest revision as of 08:05, 10 July 2018


iDigBio API Overview

This document serves as the starting page of official documentation for the iDigBio Application Programming Interfaces (APIs).

Integrated Digitized Biocollections (iDigBio) is the National Resource for Advancing Digitization of Biodiversity Collections (ADBC) funded by the National Science Foundation. Through ADBC, data and images for millions of biological specimens are being made available in electronic format for the research community, government agencies, students, educators, and the general public. iDigBio is a data aggregator. This means that data is provided to iDigBio through various publishing mechanisms.

Many consumers of iDigBio's aggregated data will choose to use the iDigBio Portal web site (https://www.idigbio.org/portal). The portal has many features and is the easiest interface to use. To facilitate integration of iDigBio data with other web sites, services, and research uses, iDigBio also provides APIs that give direct access to our data. Our portal and other data services are built on these same APIs, so any functionality you see there, as well as some functions not available through the portal, is available to you through the APIs.

The iDigBio APIs are an abstraction layer for retrieving data from the iDigBio back-end data systems. This abstraction allows reuse and mashup of aggregated data without needing to understand the complex underlying details of the back-end data storage. Currently, the public APIs support HTTP GET and POST requests for data read operations only. The iDigBio APIs are RESTful web services that deliver data primarily as JSON documents.

All iDigBio APIs are publicly available with no user authentication required or rate limits applied.

Quick Start Example

The Search API is the most useful API for most people. Below is a simple example that searches for all records in a given genus to show what the API looks like. You can click the link in your browser to make the API call and get results:

https://search.idigbio.org/v2/search/records?rq={"genus":"acer"}

(You might want to install a JSON viewing plugin in your browser such as JSONView for Chrome.)
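
A browser percent-encodes the JSON in the link above automatically, but a script has to do that itself. The following is a minimal sketch using only the Python standard library. The helper names build_search_url and fetch_json are illustrative and not part of any iDigBio library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the quick start link above.
SEARCH_RECORDS_URL = "https://search.idigbio.org/v2/search/records"


def build_search_url(rq):
    """Return the records search URL with the rq query JSON-encoded and percent-escaped."""
    # Compact separators keep spaces out of the encoded JSON.
    query = urlencode({"rq": json.dumps(rq, separators=(",", ":"))})
    return f"{SEARCH_RECORDS_URL}?{query}"


def fetch_json(url, timeout=30):
    """Issue an HTTP GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_json(build_search_url({"genus": "acer"})) should then return the same JSON document the browser link shows, decoded into Python dictionaries and lists.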

iDigBio API Mailing List

Please join our list at idigbio-api-users-l@lists.ufl.edu if you are using our APIs. Besides providing notification of API changes, the mailing list gives direct access to the iDigBio API developers and is a good place to ask questions or give and receive feedback.

You can join via the web at: http://lists.ufl.edu/cgi-bin/wa?A0=IDIGBIO-API-USERS-L

iDigBio APIs

There are several APIs you can use to retrieve data from iDigBio: Search, Download, Record, and Media.

Search API

The version 2 Search API was released in June 2015 and is the current API that the portal and other iDigBio services are based on. Full documentation for this API is available, alongside the source code, in the idigbio-search-api GitHub wiki (https://github.com/idigbio/idigbio-search-api/wiki). The Search API lets you retrieve limited sets of data in response to custom queries and is the API most people use.

More iDigBio Search API examples are available in the GitHub wiki (https://github.com/iDigBio/idigbio-search-api/wiki/Examples).
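
Because the APIs accept POST as well as GET, longer queries can be sent as a JSON request body instead of a query string. The sketch below only builds such a request with the Python standard library and does not send it. It assumes the records search endpoint accepts a JSON body, and that "limit" and "offset" are the paging parameter names; check both against the Search API wiki before relying on them. The helper name build_search_request is illustrative.

```python
import json
from urllib.request import Request

SEARCH_RECORDS_URL = "https://search.idigbio.org/v2/search/records"


def build_search_request(rq, limit=100, offset=0):
    """Build (but do not send) a POST request carrying the query as a JSON body."""
    # Assumption: "limit" and "offset" are the paging parameter names.
    body = {"rq": rq, "limit": limit, "offset": offset}
    return Request(
        SEARCH_RECORDS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to perform the call. Increasing offset by limit on each call would step through a result set one page at a time, under the same assumptions.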

Download API

The Download API can be used to generate Darwin Core archives containing any amount of data, up to everything that is in iDigBio. It is documented on the iDigBio Download API wiki page and is useful when the data you want exceeds what you can get through the Search API. Be aware that this is not an interactive API: each request queues a job that builds the Darwin Core archive, and you are informed later when the archive is ready to download.

For a detailed discussion of the data included in one of our downloads, you can read our Understanding iDigBio's Data Downloads blog post (https://www.idigbio.org/content/understanding-idigbios-data-downloads).
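
Since a download is a queued job rather than an immediate response, a script typically has to wait for it to finish. The sketch below shows one generic way to do that by polling, assuming the job exposes a status document that can be fetched repeatedly. The field names "complete" and "download_url" and the helper name wait_for_download are assumptions made for illustration; the iDigBio Download API wiki page is the authority on how job status is actually reported.

```python
import time


def wait_for_download(fetch_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll a queued job until it reports completion and return its download URL.

    fetch_status is any callable that returns the current job status as a dict.
    """
    for _ in range(max_polls):
        status = fetch_status()
        # Assumption: "complete" and "download_url" are hypothetical field names.
        if status.get("complete"):
            return status.get("download_url")
        sleep(poll_seconds)
    raise TimeoutError("download job did not finish within the polling budget")
```

Passing the status fetcher in as a callable keeps the waiting logic independent of how the status is retrieved, so the loop can be exercised without any network access.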

Record & Media APIs

The Record and Media APIs are of limited use for most people outside iDigBio. They return the unprocessed information for a single specimen record or media record, and you must know the iDigBio record or media identifier to request data from them. The most common use of the Media API is to retrieve a link to an image as stored at iDigBio from a media record identifier. The significant additional functionality of the Record API is access to prior versions of records stored in iDigBio; the other APIs only interact with the most recent record version.

ToDo: Make documentation for v2 of this API publicly accessible.

Other Ways of Accessing Data

Specimen Web Map Module

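The map on the iDigBio Portal Search page (https://www.idigbio.org/portal/search) is also provided as a stand-alone module for use in any third-party web site. If you are a software developer interested in integrating this map, please see the documentation on GitHub provided as part of the portal source code (https://github.com/iDigBio/idb-portal/blob/master/README.md#stand-alone-map-module-use).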

Client Libraries

Client libraries, packages, and modules are pieces of software that make it easier to interface with the iDigBio API from a specific programming language. We have developed and maintain libraries for the following languages:

ridigbio R Package for the Search API

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and macOS.

The production version of this package is on CRAN (http://cran.r-project.org/web/packages/ridigbio/index.html). The latest development version is available at https://github.com/idigbio/ridigbio

The ridigbio R package is an "official" client library in the iDigBio code repository.

iDigBio Python Library for Search API

The Python client library is available via PyPI:

https://pypi.python.org/pypi/idigbio

or just use pip:

pip install idigbio

The client code can also be found on GitHub if you wish to help develop the library:

https://github.com/idigbio/idigbio-python-client/

The iDigBio Python library is an "official" client library in the iDigBio code repository.
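
As a rough illustration of what the library saves you, the sketch below wraps a genus search in a small helper. It assumes the client object offers a search_records method that accepts rq and limit arguments, and that the response lists matches under "items", each with a "uuid" key. Both points should be checked against the library's README. The helper name genus_record_uuids is illustrative.

```python
def genus_record_uuids(api, genus, limit=10):
    """Return the uuids of up to `limit` records whose genus matches."""
    # Assumption: the client exposes search_records(rq=..., limit=...).
    result = api.search_records(rq={"genus": genus}, limit=limit)
    # Assumption: matches are listed under "items", each carrying a "uuid".
    return [item["uuid"] for item in result.get("items", [])]
```

With the package installed, the client object would be created with idigbio.json() after import idigbio, and then passed in as genus_record_uuids(api, "acer").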

idigbio_client Ruby Gem for the Search API

The idigbio_client Ruby gem is available at:

https://rubygems.org/gems/idigbio_client

which can be installed via:

gem install idigbio_client

The idigbio_client Ruby gem is a "community-supported" client library whose software repository is currently at:

https://github.com/GlobalNamesArchitecture/idigbio_client

Suggestions?

If you need other client libraries or iDigBio API features, please use the feedback button (https://www.idigbio.org/modal_forms/nojs/contact) to submit your request.


Technical Details

We describe the iDigBio API Software used to provide these APIs in a separate document.