Managing Natural History Collections Data for Global Discoverability
| Managing Natural History Collections Data for Global Discoverability | |
|---|---|
| Quick Links for Managing NHC Data for Global Discoverability wiki | |
| Managing NHC Data Announcement | |
| Managing NHC Data for Global Discoverability - Agenda | |
| Managing NHC Data for Global Discoverability Biblio Entries | |
| Managing NHC Data for Global Discoverability Report | |
This wiki supports the Managing Natural History Collections (NHC) Data for Global Discoverability Workshop and is in development. The workshop is sponsored by iDigBio and hosted by the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new Alameda space on September 15-17, 2015. It is the fourth in a series of biodiversity informatics workshops held in fiscal year 2014-2015. The first three were 1) Data Carpentry, 2) Data Sharing, Data Standards, and Demystifying the IPT, and 3) Field to Database (March 9-12, 2015).
General Information
Description and Overview of Workshop. Are you:
- actively digitizing NHC data and looking to do it more efficiently?
- getting ready to start digitizing NHC data and looking to learn some new skills to enhance your workflow?
- digitizing someone else’s specimens (e.g., as part of a research project)?
- finding yourself in the role of the museum database manager (even though it may not be your title or original job)?
- someone with a private research collection who wishes to donate specimens and data to a public collection?
The "Collections Data for Global Discoverability" workshop is ideally suited for natural history collections specialists aiming to increase the "research readiness" of their biodiversity data at a global scale. Do you find yourself needing to manage larger quantities of collection records, or running into challenges when carrying out updates or quality checks? Do you mainly use spreadsheets (such as Excel) to clean and manage specimen-level datasets before uploading them into your collections database? The workshop is most appropriate for those who are relatively new to collections data management and are motivated to provide the global research community with accessible biodiversity data that complies with standards and best practices.
During the workshop, essential information science and biodiversity data concepts will be introduced (e.g., data tables, data sharing, data quality and cleaning, Darwin Core, APIs). Participants will perform hands-on data cleaning exercises using spreadsheet programs and free, readily usable software. The workshop is platform independent: rather than focusing on the specifics of any one locally preferred biodiversity database platform, it addresses fundamental themes and solutions that apply to a variety of database applications.
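As a rough illustration of the kind of data cleaning and Darwin Core mapping described above, the following Python sketch tidies a small spreadsheet export and renames its columns to Darwin Core terms. It is only a sketch under stated assumptions: the local column headers, the `HEADER_TO_DWC` mapping, the helper function names, and the sample records are all invented for this example and are not taken from the workshop materials. Only the Darwin Core term names themselves (`catalogNumber`, `scientificName`, `recordedBy`, `eventDate`, `stateProvince`) come from the standard.

```python
import csv
import io

# Hypothetical mapping from local spreadsheet headers to Darwin Core terms.
# The left-hand headers are invented; adapt them to your own spreadsheet.
HEADER_TO_DWC = {
    "Catalog #": "catalogNumber",
    "Species": "scientificName",
    "Collector": "recordedBy",
    "Date Collected": "eventDate",
    "State": "stateProvince",
}


def clean_value(value):
    """Trim the value and collapse internal runs of whitespace."""
    return " ".join(value.split())


def to_darwin_core(rows):
    """Rename known headers to Darwin Core terms and tidy every value."""
    cleaned = []
    for row in rows:
        record = {}
        for header, value in row.items():
            if header is None:
                # Extra cells beyond the header row; skipped in this sketch.
                continue
            term = HEADER_TO_DWC.get(header.strip(), header.strip())
            record[term] = clean_value(value or "")
        cleaned.append(record)
    return cleaned


def find_duplicates(records, key="catalogNumber"):
    """Return the values of `key` that occur in more than one record."""
    seen, dupes = set(), set()
    for record in records:
        value = record.get(key, "")
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


# Invented sample data standing in for a spreadsheet saved as CSV.
SAMPLE = """Catalog #,Species,Collector,Date Collected,State
EX0001,  Quercus  gambelii ,J. Smith,2015-03-09,Arizona
EX0002,Larrea tridentata,J. Smith ,2015-03-10, Arizona
EX0001,Quercus gambelii,J. Smith,2015-03-09,Arizona
"""

records = to_darwin_core(csv.DictReader(io.StringIO(SAMPLE)))
duplicates = find_duplicates(records)

for record in records:
    print(record)
print("Duplicate catalog numbers:", duplicates)
```

The sketch uses only the Python standard library, so it runs on any platform, and the same three steps (tidy values, map headers, flag duplicate identifiers) can also be carried out in a spreadsheet program or a dedicated data cleaning tool.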
To Do For You: Pre-reading materials 
Updates will be posted to this website as they become available.
Planning Team
Collaboratively brought to you by: Katja Seltmann (AMNH - TTD-TCN), Amber Budden (DataONE), Edward Gilbert (ASU - Symbiota), Nico Franz (ASU), Mark Schildhauer (NCEAS), Greg Riccardi (FSU - iDigBio), Reed Beaman (NSF), Cathy Bester (iDigBio), Shari Ellis (iDigBio), Kevin Love (iDigBio), Deborah Paul (FSU - iDigBio)
About
Instructors (iDigBio): Katja Seltmann, Amber Budden, Edward Gilbert, Nico Franz, Greg Riccardi, Deborah Paul, Joanna McCaffrey, Kevin Love, Anne Thessen
Skill Level: We are focusing our efforts in this workshop on beginners.
Where and When: Tempe, AZ at the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new Alameda space, September 15 - 17, 2015
Requirements: Participants must bring a laptop.
Contact (iDigBio Participants): Please email Deb Paul (dpaul@fsu.edu) with questions or for information not covered here.
Tuition for the course is free, but there is an application process and spots are limited (the class is now full).
Software Installation Details
A laptop and a web browser are required for participants. 
We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.
- Adobe Connect Systems Test
- Note: when you follow the link to install the Add-In and perform the test, some software will install even though nothing appears to happen. To check, simply re-run the test.
 
Agenda
- Managing NHC Data Adobe Connect Room http://idigbio.adobeconnect.com/nhcdata
- Monday evening, September 14th: pre-workshop informal get-together at Vine Tavern and Eatery, 6 PM.
Schedule - subject to change.
| Course Overview - Day 1 - Tuesday September 15th | | |
|---|---|---|
| 8:15-8:30 | Check-in, name tags, log in, connect to wireless and Adobe Connect | All |
| 8:30-9:15 | Welcome, Logistics, Intro to the Workshop, Why Share Data? Why this workshop? | Deb Paul, Amber Budden |
| 9:15-9:35 | General Concepts and Best Practices | Ed Gilbert and Amber Budden |
| 9:35-9:55 | Overview of Data standards | Ed Gilbert, Deb Paul |
| 10:00-10:30 | Introduction to Mapping Data | All |
| 10:30-10:50 | Break | |
| 10:50-11:30 | Data Management Planning | Amber Budden and Joanna McCaffrey |
| 11:30-12:00 | DataONE Lesson 4 | Amber Budden |
| 12:00-1:00 | Lunch (Provided by Panera) | |
| 1:00-1:30 | Images and media issues: a brief intro | Ed Gilbert and Joanna McCaffrey |
| 1:30-2:00 | Digitization workflows and process: Common Workflows and Optimization | Katja Seltmann, Deb Paul & Ed Gilbert |
| 2:00-3:00 | Collections Tours and Symbiota Demo. (groups of 10) | All |
| 3:00-3:20 | Break | |
| 3:20-3:50 | Georeferencing Data (Georeferencing Workflow) | Ed Gilbert |
| 3:50-4:10 | GEOLocate Exercise (May be DEMO) | Ed Gilbert |
| 4:40-5:30 | Conversation, overview of day, preview for tomorrow, backpack logistics for tomorrow, ... | All |
| Course Overview - Day 2 - Wednesday September 16th | | |
| 8:30-12:00 | Desert Botanical Garden (DBG) Field Trip and Lunch | |
| 11:30-12:30 | Lunch at Gertrude's (in the Garden) | |
| 1:00-1:25 | Welcome Back and Intro to Data Quality | Amber Budden, Greg Riccardi, (Ed Gilbert) |
| 1:25-1:45 | Review Tools for Data Cleaning, Data Manipulation, and Visualization (and Lessons) | Deb Paul |
| 1:45-2:00 | Data Cleaning | Deb Paul & Katja Seltmann |
| 2:00-2:50 | Data Cleaning Exercise I | Katja Seltmann & Deb Paul |
| 2:50-3:10 | Break | |
| 3:10-3:40 | Data Cleaning Exercise II | Deb Paul & Katja Seltmann |
| 3:40-4:00 | Feedback: iDigBio recordset data cleaning | Kevin Love |
| 4:00-5:00 | Conversation, overview of day for context and questions, homework and preview for tomorrow... | Deb Paul & Katja Seltmann |
| Evening Activity (opt) | Night-lighting Opportunity | Host - Nico Franz |
| Course Overview - Day 3 - Thursday September 17th | | |
| 8:45-9:00 | Discussion of Material Covered so far and Overview of Day 3 | Katja Seltmann |
| 9:00-10:00 | Potential break out groups | All |
| 10:00-10:35 | Break | |
| 10:35-10:55 | Sharing Data: Preparing and Moving Data to the Internet | Greg Riccardi |
| 10:55-11:20 | Data Publishing: in the context of the data life cycle | Anne Thessen, http://datadetektiv.com/ |
| 11:20-11:40 | Getting Your Data Published: Sending Data to iDigBio | Joanna McCaffrey |
| 12:00-1:00 | Lunch (Provided by Panera) | |
| 1:00-1:45 | iDigBio Portal Exercise | Katja Seltmann |
| 1:45-2:05 | Copyright / Intellectual Property | David Bloom, Jonathan Rees, Greg Riccardi |
| 3:00-3:20 | Break | |
| 3:20-4:45 | Second round of break-out groups | Edward Gilbert |
| 4:45-5:30 | Closing topics | Katja Seltmann & Nico Franz, all |
Logistics
- Map showing Hotel and Workshop Locations (pdf)
- Logistics for hotel / per diem / contacts / transportation (pdf)
- Some restaurants near the Hotel 1333 (pdf)
- List of Restaurants (pdf)
 
- Workshop Calendar Announcement
- Participant List
Adobe Connect Access
Adobe Connect will be used to give everyone access to the sessions, including remote participants who wish to listen to the lectures.
Workshop Documents, Presentations, and Links
- Google Collaborative Notes
- These are notes with benefits.
 
- Links to any presentations (e.g., PowerPoint slides) go here
- Darwin Core Terms
- Participant Presentations
Pre-Workshop Reading List
Links beneficial for review
- Darwin Core Terms Index
- Mapping to Old Versions (for those who might be familiar with older versions of DwC)
- Audubon Media Description standard terms index
- others? (Perhaps Canadensys and VertNet Norms, Canadensys Creative Commons licensing for occurrence data and VertNet Data Licensing Guide)
- iDigBio Data Ingestion Guidance
Workshop Recordings
Day 1
- 8:30am-10:15am
- 10:45am-11:00am
- 11:15am-12pm
- 1:00pm-2:30pm
- 3:00-5:00pm
Day 2
- 1:00pm-2:30pm
- 3:00-5:00pm
Day 3
- 8:30am-10:15am
- 10:45am-11:00am
- 11:15am-12pm
- 1:00pm-3:30pm
- 3:30-5:00pm
Resources and Links
- Got a favorite resource (a book? a website?) to share with your classmates?
- Canadensys Introduction to Darwin Core
- Experts Workshop on the GBIF Integrated Publishing Toolkit (IPT) v. 2
- Summary resources are available from the IPT workshop held 20-22 June in Copenhagen, Denmark.
 
- Example of a Data Paper: Yves Bousquet, Patrice Bouchard, Anthony E. Davies, and Derek S. Sikes. 2013. Data associated with CHECKLIST OF BEETLES (COLEOPTERA) OF CANADA AND ALASKA. SECOND EDITION. DATA PAPER. ZooKeys. http://dx.doi.org/10.5886/998dbs2a
- For more Data Papers: http://biodiversitydatajournal.com/
- Darwin Core extension for germplasm Dag Endresen (on slideshare)
- Data exchange standards, protocols and formats relevant for the collection data domain within the GFBio network
- Check out this link if you'd like to see one example page about the multitude of current standards in use in the Natural History Collections and Culture Collections world; this example is from the German Federation for the Curation of Biological Data (GFBio).
 
- GBIF Darwin Core Archive, How-to Guide (download the pdf).
- GBIF Metadata Profile Reference Guide (download the pdf).
- Darwin Core Quick Reference Guide (download the pdf).
- Guide on how to use the BioVeL portal (includes a section on OpenRefine).
- lynda.com offers a useful collection of tutorials on IT and other topics, e.g. relational databases
- For example see relational database fundamentals
 
- Do you want to share genetic sequence data for your specimens? Are the sequences in a database like GenBank? You can use the dwc:associatedSequences field to share links to the sequences and metadata about them. Note that you can soon use the Material Sample Core, share more complex genomic data using the GGBN extensions, and use an extension to share information about the specimen from which the samples were taken.
- INHS digitization system shopping list + info on setup
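For the dwc:associatedSequences suggestion in the list above, here is a minimal Python sketch of how such a field value might be assembled from a list of GenBank accession numbers. Several things here are assumptions rather than workshop guidance: the function name, the placeholder accession numbers (`XX000001`, `XX000002`), the example catalog number, and the choice of the NCBI nucleotide URL pattern as the link target. The " | " separator follows the general Darwin Core recommendation for list-valued fields.

```python
# Assumed link pattern for GenBank nucleotide records at NCBI.
NUCCORE_BASE = "https://www.ncbi.nlm.nih.gov/nuccore/"


def associated_sequences(accessions, separator=" | "):
    """Build a dwc:associatedSequences value from accession numbers.

    Blank entries are dropped and repeated accessions are kept only once,
    preserving the order in which they were first seen.
    """
    unique = []
    for accession in accessions:
        accession = accession.strip()
        if accession and accession not in unique:
            unique.append(accession)
    return separator.join(NUCCORE_BASE + accession for accession in unique)


# Placeholder accessions and catalog number, invented for illustration only.
record = {
    "catalogNumber": "EXAMPLE-0001",
    "associatedSequences": associated_sequences(
        ["XX000001", " XX000002 ", "XX000001"]
    ),
}

print(record["associatedSequences"])
```

Storing full links rather than bare accession numbers lets a reader of the published record follow them directly, although a given portal or aggregator may have its own preferred format.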
