iDigBio API Hackathon Media Ingestion Group Report

June 3-5, 2015 (iDigBio API Hackathon) – Team Media Ingestion blog, by Ben Anhalt (University of Kansas/Specify), Benjamin Brandt (Arizona State University/Symbiota), Caitlin Chapman (Northern Arizona University), Renato Figueiredo (UF/iDigBio), Edward Gilbert (Arizona State University/Symbiota), Kyuho Jeong (UF/iDigBio), Shaun Mahmood (American Museum of Natural History), Dan Stoner (UF/iDigBio)

The Media Ingestion Group worked on integration with the iDigBio media API as well as 3D imaging. Shaun Mahmood focused on 3D images during the hackathon and did some of the legwork required before iDigBio can display 3D images in the iDigBio specimen portal. Shaun gave an overview of the scanning process and its digital outputs, covering file formats, software tools, and examples of 3D images available on various websites. Dan Stoner outlined some of the requirements for media ingestion from the iDigBio side, and Shaun explored post-processing options for reaching a compromise between full resolution and "web quality" resolution. Shaun also demoed the Adobe browser plugin capable of rendering 3D PDF files and a JavaScript viewer that does not require a browser plugin.

The iDigBio programming team plans to leverage this work in the future for:

Updating iDigBio media guidelines to include information on 3D images
Adding support for ingestion of 3D images as media associated with specimen records
Display of 3D images in the iDigBio web portal

Here are some of the samples of 3D images that were presented:

The other members worked toward closer integration between iDigBio and third-party software, including the Symbiota Software Project and the Specify Software Project, two widely used collection management systems.

The iDigBio Image Ingestion Appliance is a piece of software that uses the iDigBio Media APIs to upload image media into iDigBio. (It may be renamed the "Media Ingestion Appliance" in the future as it gains features to handle media other than images, such as the 3D images mentioned above.) Renato Figueiredo and Kyuho Jeong from iDigBio were available to walk through the features and operation of the Appliance and to demonstrate its interaction with the iDigBio Media APIs. After seeing the Appliance in action, the participants from Symbiota and Specify thought that adapting it might be more useful than writing code from scratch.

Symbiota users who produce digital images currently use FTP upload as part of their imaging workflow, which has a number of quirks when it comes to publishing media to iDigBio. During the hackathon, Ed Gilbert and Ben Brandt came up with a workflow that replaces the existing FTP processes with uploads via the iDigBio Image Ingestion Appliance. One of the biggest challenges was linking image files to specimen records. After the team brought in Alex Thompson from iDigBio to hammer out the details, this obstacle was overcome.

During the hackathon, the Symbiota team wrote code that lets Symbiota link images to specimens based on information in the CSV file that is uploaded along with the images from the iDigBio Image Ingestion Appliance.
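As a rough illustration of CSV-driven linking, the sketch below groups image file names by the specimen identifier listed beside them. It is not the Symbiota code, which is not shown in this report. The column names `file_name` and `specimen_id`, the function name, and the sample rows are all assumptions made for the example; the actual layout of the CSV file is not described here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real CSV layout is not described in this report.
FILE_COLUMN = "file_name"
SPECIMEN_COLUMN = "specimen_id"


def link_images_to_specimens(csv_text):
    """Group image file names by the specimen identifier in the same row."""
    links = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        specimen_id = (row.get(SPECIMEN_COLUMN) or "").strip()
        file_name = (row.get(FILE_COLUMN) or "").strip()
        # Skip rows that cannot be linked because either value is missing.
        if specimen_id and file_name:
            links[specimen_id].append(file_name)
    return dict(links)


# Made-up sample data, for illustration only.
SAMPLE_CSV = """file_name,specimen_id
sheet_0001_a.jpg,HERB-0001
sheet_0001_b.jpg,HERB-0001
sheet_0002.jpg,HERB-0002
"""

if __name__ == "__main__":
    for specimen, files in link_images_to_specimens(SAMPLE_CSV).items():
        print(specimen, files)
```

Keying the result by specimen identifier allows several images per specimen, and rows with a missing value are skipped rather than linked to the wrong record.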

On the final day, Caitlin Chapman performed a live demo of the new workflow in Symbiota, using the iDigBio Image Ingestion Appliance to upload an image. The demo succeeded as a test of the workflow as imaging teams out in the collections could use it.

Ben Anhalt of the University of Kansas / Specify wrote a first-person account of his iDigBio hackathon experience, which we reproduce below in its entirety.

----

Having never attended a hackathon before, I was not sure what to expect when signing up for the iDigBio API Hackathon. I had sort of a notion that a hackathon was an event where a bunch of developers are invited to work together on one particular task, so I was surprised that participants were being asked to bring their own goals to the event.

At any rate, my mandate was to do something Specify related. Specify does collection management meaning that Specify and similar tools are the ultimate source of the data that ends up at iDigBio. The most obvious thing to do would be to leverage any new API to simplify the process of moving data from Specify to iDigBio. This turned out not to be possible because the iDigBio APIs are primarily focused on searching and retrieving the data that is already held by iDigBio. In fact, the only available ingestion API was for media.

In Specify, media may be attached to various types of records. These attachments are stored separately from the records database by an independent media server via an ad hoc, internal API. I figured that it should be possible to adapt Specify 7 to allow the iDigBio media API to be used in place of the existing internal API. The advantage of having such an option is that Specify 7 collections that intend to send all of their data to iDigBio, including media attachments, would be able to forgo using the Specify attachment server completely thereby reducing their infrastructure and IT requirements. In other words, by making Specify 7 work directly with the iDigBio media API the redundancy of having two media servers storing the same data would be removed. This is the goal I settled on for the hackathon.

During the media ingestion group discussion on the first day, I learned that iDigBio has produced a tool that utilizes the media ingestion API to bulk upload resources from iDigBio contributors. This iDigBio-ingestion-tool obviously knew how to upload files through the media API and was written in Python like the Specify 7 server. This gave me the idea of trying to directly import the existing tool into Specify 7 and use its existing routines for the API. It seemed that such an approach would be in true hacker spirit.

I began by forking the existing ingestion tool on GitHub and making a few adjustments that would allow it to be imported as a library. Then, in a branch of the Specify 7 repository I added that fork as a git submodule. I added some options to the Specify 7 settings to allow the iDigBio API to be selected for the media server and for setting the iDigBio API credentials. The Specify 7 code was then adjusted to be able to call into the existing ingestion tool routines for uploading and to also request files using the iDigBio API. In this fashion I was able to quickly get a working prototype.

Despite the pleasing hacker aesthetic of simply wiring-in an existing tool to accomplish the task, it didn't make too much sense to create a dependency from Specify 7 to an independent tool with all its own external dependencies, especially since the API is quite user friendly. Now that I knew it could be made to work, I replaced the calls into the ingestion tool with newly written code to access the API directly. The result was a nearly complete iDigBio media server option for Specify 7.

I believe this work can be refined into a deliverable for Specify. There are a few bugs to be squashed related to when thumbnails are generated for uploaded files. I would also like to remove some code duplication between the interfaces to the Specify media API and the iDigBio media API.

Overall, I had a very good experience at the hackathon. It was fun meeting other developers working on similar problems and seeing how they worked. Thanks again to the organizers and iDigBio staff for hosting an enjoyable and productive get together!

----

On the final day of the hackathon, Ben ran a demo and succeeded in linking a specimen record in Specify to an image that had been uploaded to the iDigBio Media API.
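The settings-driven choice of media server that Ben describes can be pictured with a small sketch. This is not Specify 7 code and it makes no calls to the iDigBio API: the class names, the `MEDIA_BACKEND` setting, and the in-memory stand-in used for both backends are hypothetical, chosen only to show how one shared interface lets a configuration value pick the storage backend.

```python
from abc import ABC, abstractmethod


class MediaStore(ABC):
    """Hypothetical interface that every media backend would implement."""

    @abstractmethod
    def upload(self, name, data):
        """Store the bytes and return a reference for later retrieval."""

    @abstractmethod
    def fetch(self, ref):
        """Return the bytes previously stored under the reference."""


class InMemoryStore(MediaStore):
    """Stand-in for a real backend; keeps files in a dict, no network calls."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._files = {}

    def upload(self, name, data):
        ref = f"{self.prefix}:{name}"
        self._files[ref] = data
        return ref

    def fetch(self, ref):
        return self._files[ref]


# Hypothetical registry: a real system would map each key to a client for
# the corresponding server instead of the in-memory stand-in.
BACKENDS = {
    "specify": lambda settings: InMemoryStore("specify"),
    "idigbio": lambda settings: InMemoryStore("idigbio"),
}


def make_media_store(settings):
    """Pick a backend from a settings dict (the key name is an assumption)."""
    kind = settings.get("MEDIA_BACKEND", "specify")
    if kind not in BACKENDS:
        raise ValueError(f"unknown media backend: {kind!r}")
    return BACKENDS[kind](settings)


if __name__ == "__main__":
    store = make_media_store({"MEDIA_BACKEND": "idigbio"})
    ref = store.upload("specimen.jpg", b"image bytes")
    print(ref, len(store.fetch(ref)))
```

Because callers only see `MediaStore`, code that attaches media to records stays the same whichever backend is configured, which is one way to limit the kind of duplication between interfaces that Ben mentions wanting to remove.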

