Data Ingestion Guidance
Revision as of 21:34, 10 January 2014
Guidance When First Considering iDigBio Data Ingestion
Contact Info
For assistance, contact data@idigbio.org
Below are a few things that we ask of the data to make it fit for use in the cyberinfrastructure we are building:
Data Requirements
Three kinds of data are submitted: specimen data, media related to and attached to specimen records, and media files.
- each specimen record needs to have a unique (within the dataset) identifier in the occurrenceID field.
- you need to have permission to submit the data
- we would like the data to be available to our harvester via IPT and RSS if possible; otherwise, a CSV file in Darwin Core format would work too.
- dates need to be in ISO 8601 format, i.e., YYYY-MM-DD
- take care to preserve diacritics in people and place names (save the data in UTF-8 encoding).
- For all images/media objects:
- each media record needs to have a GUID: a persistent globally unique identifier or at least a unique (within the dataset) identifier in the occurrenceID field.
- we need an Audubon Core metadata file, with one record for each media record, and we can provide coaching to help you create that file. The more detail you can provide about each image, the easier it will be to retrieve.
- just like the catalog records, the media records need to be provided freely and with permission, and each record needs to have at least a Creative Commons license of "CC BY"
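The specimen-record requirements above (a unique occurrenceID, ISO 8601 dates, UTF-8 encoding) can be checked before submission. What follows is a minimal sketch in Python, not an iDigBio tool: the function name `check_specimen_csv` is hypothetical, and it assumes the dates are stored in the Darwin Core `eventDate` column as full YYYY-MM-DD values. Adjust the column name if your dataset keeps dates elsewhere.

```python
import csv
import io
import re
from collections import Counter
from datetime import date

# Strict YYYY-MM-DD shape; calendar validity is checked separately below.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_specimen_csv(raw_bytes):
    """Return a list of human-readable problems found in a Darwin Core CSV."""
    # Decoding strictly as UTF-8 surfaces files saved in another encoding,
    # which is the usual way diacritics get damaged. "utf-8-sig" also
    # tolerates a leading byte-order mark.
    try:
        text = raw_bytes.decode("utf-8-sig")
    except UnicodeDecodeError as err:
        return [f"file is not valid UTF-8: {err}"]

    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []

    # occurrenceID must be present and unique within the dataset.
    ids = [(row.get("occurrenceID") or "").strip() for row in rows]
    for row_no, value in enumerate(ids, start=1):
        if not value:
            problems.append(f"row {row_no}: missing occurrenceID")
    for value, count in Counter(ids).items():
        if value and count > 1:
            problems.append(f"occurrenceID {value!r} appears {count} times")

    # Assumption: dates live in the eventDate column; empty values are skipped.
    for row_no, row in enumerate(rows, start=1):
        value = (row.get("eventDate") or "").strip()
        if not value:
            continue
        ok = ISO_DATE.fullmatch(value) is not None
        if ok:
            try:
                date.fromisoformat(value)  # rejects dates such as 2014-02-30
            except ValueError:
                ok = False
        if not ok:
            problems.append(f"row {row_no}: eventDate {value!r} is not a valid YYYY-MM-DD date")

    return problems
```

To use it, read the file in binary mode and pass the bytes in, for example `check_specimen_csv(open("occurrences.csv", "rb").read())`, where the file name is a placeholder. An empty list means none of these three checks found a problem; it does not mean the data meets every ingestion requirement.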
The methods for linking the catalog records to the media records are in this document, as well as an explanation of how to create GUIDs for the records:
Details about data ingestion requirements and guidelines are here:
Additional info about image format is here:
If you need to learn about acceptable Creative Commons licenses in iDigBio: