IDigBio Download API

From iDigBio

Latest revision as of 12:05, 8 August 2024


Overview

The Download API works by performing the requested query and building a Darwin Core Archive. Once archive generation has begun, the status endpoint can be polled to determine if the generation has been completed. Once the archive generation is completed, the API provides a link to the file for download. If the optional email parameter is supplied on the query request, an email notification will be sent that includes a link to the downloadable file.

Large queries (and thus large archive file creation) can take multiple hours to complete.

GET requests

A query submitted as a GET request must be URL-encoded.
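For illustration, here is a minimal Python sketch of what URL-encoding the query involves: the query is serialized as JSON and then percent-encoded with the standard library. The helper name `encode_query` is hypothetical and is not part of any iDigBio client.

```python
import json
from urllib.parse import quote


def encode_query(query: dict) -> str:
    """Serialize a query to compact JSON and percent-encode it for a GET URL.

    encode_query is a hypothetical helper for illustration only.
    """
    # Compact separators drop optional whitespace, giving a shorter URL.
    compact = json.dumps(query, separators=(",", ":"))
    # safe="" forces reserved characters such as ":" and "/" to be encoded too.
    return quote(compact, safe="")


if __name__ == "__main__":
    # Prints the same rq value used in the examples on this page.
    print(encode_query({"genus": "acer"}))
```

A query containing spaces, such as a scientific name, is encoded the same way; each space becomes `%20`.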

POST requests

A query submitted as a POST request must be supplied as JSON in the content body and specify the "Content-Type: application/json" request header.

Documentation for POST requests will be provided in the future. Please contact us if you need this documentation.
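Because the POST body format is not documented here, the following is only a sketch under an explicit assumption: it builds (but does not send) a `urllib.request.Request` whose JSON body hypothetically mirrors the GET parameters, an `rq` object plus an optional `email`. Only the `Content-Type: application/json` header comes from this page; confirm the body shape with iDigBio before relying on it.

```python
import json
import urllib.request
from typing import Optional

DOWNLOAD_ENDPOINT = "https://api.idigbio.org/v2/download/"


def build_post_request(query: dict, email: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the download endpoint.

    ASSUMPTION: the body shape {"rq": ..., "email": ...} mirrors the GET
    parameters; the POST body format is not documented on this page.
    """
    body = {"rq": query}
    if email is not None:
        body["email"] = email
    return urllib.request.Request(
        DOWNLOAD_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        # This header is required for POST queries, per the text above.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_post_request({"genus": "acer"})
    print(req.get_method(), req.get_header("Content-type"))
    # Sending it would be: urllib.request.urlopen(req)
```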

Query Endpoint

The download service query URL:

https://api.idigbio.org/v2/download/?rq={Query in iDigBio query format}[&email={valid email address}]

See iDigBio query format for more information on writing queries.

The "email" parameter is optional. Specifying a valid email address will cause an email notification to be sent.
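A minimal Python sketch of assembling the query URL, adding `email` only when one is supplied. `build_download_url` is a hypothetical helper; the endpoint and parameter names come from the URL template above.

```python
import json
from typing import Optional
from urllib.parse import urlencode

DOWNLOAD_ENDPOINT = "https://api.idigbio.org/v2/download/"


def build_download_url(query: dict, email: Optional[str] = None) -> str:
    """Return a GET URL for the query endpoint (hypothetical helper)."""
    params = {"rq": json.dumps(query, separators=(",", ":"))}
    if email is not None:
        # Optional: requests an email notification for this download.
        params["email"] = email
    # urlencode percent-encodes every value, e.g. "{" -> "%7B" and "@" -> "%40".
    return DOWNLOAD_ENDPOINT + "?" + urlencode(params)
```

For example, `build_download_url({"genus": "acer"}, email="donotreply@idigbio.org")` yields the same URL used in the email example on this page.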

A successful request to the query endpoint will return a JSON document that includes a "complete" status flag and a "status_url" link. The "status_url" is a link to the Status Endpoint (see below) which can safely be polled until "complete" is "true".
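Because large archives can take hours to generate, a client will typically poll the "status_url" with a delay between requests. The sketch below assumes a caller-supplied `fetch` callable that GETs a URL and returns the decoded JSON (for example, one built on `urllib.request.urlopen`). The 60-second interval and 240-attempt limit (about four hours) are arbitrary choices, not documented limits.

```python
import time
from typing import Callable


def wait_for_archive(status_url: str,
                     fetch: Callable[[str], dict],
                     interval: float = 60.0,
                     max_attempts: int = 240) -> dict:
    """Poll status_url until "complete" is true and return the final status.

    ASSUMPTION: interval and max_attempts are arbitrary defaults, not
    limits published by iDigBio.
    """
    for attempt in range(max_attempts):
        status = fetch(status_url)
        if status.get("complete") is True:
            return status
        # Skip the sleep after the last attempt; we are about to give up.
        if attempt < max_attempts - 1:
            time.sleep(interval)
    raise TimeoutError(f"archive not complete after {max_attempts} polls")
```

Injecting `fetch` keeps the polling logic separate from the HTTP client, which also makes it easy to exercise with canned responses.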

Status Endpoint

The download service status URL:

https://api.idigbio.org/v2/download/{status uuid}

Using the "status_url" returned in the JSON from the query endpoint, a successful request to the status endpoint returns a JSON document whose fields include "complete", a status flag, and "download_url", which holds a link to the generated archive once generation has completed.
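Since the status URL ends with the status UUID, a client can recover the UUID from a returned "status_url" and rebuild the URL against the public host. This is a hypothetical sketch; it uses https, as the curl examples on this page do.

```python
import uuid
from urllib.parse import urlparse

# ASSUMPTION: public status URLs live under this base, as in the curl examples.
STATUS_BASE = "https://api.idigbio.org/v2/download/"


def status_uuid(status_url: str) -> str:
    """Extract and validate the trailing {status uuid} from a status URL."""
    last_segment = urlparse(status_url).path.rstrip("/").rsplit("/", 1)[-1]
    # uuid.UUID raises ValueError if the segment is not a well-formed UUID.
    return str(uuid.UUID(last_segment))


def status_url_for(status_id: str) -> str:
    """Build a status endpoint URL from a status UUID (hypothetical helper)."""
    return STATUS_BASE + str(uuid.UUID(status_id))
```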

Query Example - genus acer with email

Consider the following query:

{ "genus" : "acer" }

We could request a download and notification sent to email address "donotreply@idigbio.org" via the following url:

https://api.idigbio.org/v2/download/?rq=%7B%22genus%22%3A%22acer%22%7D&email=donotreply%40idigbio.org

After the downloadable archive file is generated, the "complete" field will be set to "true" and the "download_url" field will include a link to the available file.
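Reading the archive link can be sketched as follows: the hypothetical `download_link` helper returns "download_url" only when "complete" is true. Any URL passed to it in testing is a made-up placeholder, not a real archive location.

```python
from typing import Optional


def download_link(status: dict) -> Optional[str]:
    """Return the archive link once generation is complete, else None."""
    # Until "complete" is true there is no archive to download.
    if status.get("complete") is True:
        return status.get("download_url")
    return None
```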

Using curl (output formatted with JSON "prettify"), we can submit the same query without the email parameter. The initial response shows that the archive is not yet complete and includes a "status_url":

$ curl "https://api.idigbio.org/v2/download/?rq=%7B%22genus%22%3A%22acer%22%7D"
{
  "complete": false, 
  "core_source": "indexterms", 
  "core_type": "records", 
  "expires": "2015-07-21T14:17:19.715995", 
  "form": "dwca-csv", 
  "mediarecord_fields": null, 
  "mq": null, 
  "record_fields": null, 
  "rq": {
    "genus": "acer"
  }, 
  "status_url": "https://localhost:30000/v2/download/fd8de83e-7fb7-4edd-8a6a-f11234eec664", 
  "task_status": "PENDING"
}

Once the archive is complete, the "download_url" field holds the same link that would be included in the "iDigBio Download Ready" email.

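If we follow the "status_url" from the response above (the sample output shows a localhost address, so the request below uses api.idigbio.org instead), we see similar information: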

$ curl -s "https://api.idigbio.org/v2/download/fd8de83e-7fb7-4edd-8a6a-f11234eec664"
{
  "complete": false, 
  "core_source": "indexterms", 
  "core_type": "records", 
  "expires": "2015-07-21T14:17:19.715995", 
  "form": "dwca-csv", 
  "mediarecord_fields": null, 
  "mq": null, 
  "record_fields": null, 
  "rq": {
    "genus": "acer"
  }, 
  "status_url": "https://localhost:30000/v2/download/fd8de83e-7fb7-4edd-8a6a-f11234eec664", 
  "task_status": "PENDING"
}
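A rough Python equivalent of the curl requests above, using only the standard library. The `opener` parameter is a hypothetical seam that defaults to `urllib.request.urlopen` and can be swapped for a fake in tests.

```python
import json
import urllib.request
from typing import Any, Callable


def get_json(url: str,
             opener: Callable[..., Any] = urllib.request.urlopen,
             timeout: float = 60.0) -> dict:
    """GET url and decode the JSON response body, like the curl calls above."""
    with opener(url, timeout=timeout) as resp:
        # json.load reads the whole body; bytes input is accepted.
        return json.load(resp)


# Usage (performs a network request):
#   status = get_json("https://api.idigbio.org/v2/download/fd8de83e-7fb7-4edd-8a6a-f11234eec664")
#   print(json.dumps(status, indent=2, sort_keys=True))  # "prettified" output
```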

Status Endpoint Example

The following query JSON:

{ "genus" : "acer" }

becomes the following request URL when URL-encoded:

https://api.idigbio.org/v2/download/?rq=%7B%22genus%22%3A%22acer%22%7D
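To check that an encoded URL carries the intended query, the `rq` parameter can be decoded back into JSON. This is a minimal sketch; `decode_rq` is a hypothetical helper.

```python
import json
from urllib.parse import parse_qs, urlparse


def decode_rq(url: str) -> dict:
    """Recover the query JSON from a download URL's rq parameter."""
    # parse_qs percent-decodes values and returns a list per key.
    params = parse_qs(urlparse(url).query)
    return json.loads(params["rq"][0])
```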

Immediately after the query is run, the "complete" flag is still false, and no archive is available for download yet:

{
  "complete": false, 
  "core_source": "indexterms", 
  "core_type": "records", 
  "expires": "2015-07-21T14:17:19.715995", 
  "form": "dwca-csv", 
  "mediarecord_fields": null, 
  "mq": null, 
  "record_fields": null, 
  "rq": {
    "genus": "acer"
  }, 
  "status_url": "https://localhost:30000/v2/download/fd8de83e-7fb7-4edd-8a6a-f11234eec664", 
  "task_status": "PENDING"
}

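If we wait a while and try again, the status changes: "complete" is now true and "task_status" is "SUCCESS". At this point the archive is available at the link in the "download_url" field (which does not appear in the sample output below):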

{
  "complete": true, 
  "core_source": "indexterms", 
  "core_type": "records", 
  "expires": "2015-07-21T14:17:19.715995", 
  "form": "dwca-csv", 
  "mediarecord_fields": null, 
  "mq": null, 
  "record_fields": null, 
  "rq": {
    "genus": "acer"
  }, 
  "status_url": "https://localhost:30000/v2/download/fd8de83e-7fb7-4edd-8a6a-f11234eec664", 
  "task_status": "SUCCESS"
}