Field to Database
Revision as of 12:25, 14 April 2015
| Field to Database | |
|---|---|
| Quick Links for Field to Database | |
| Field to Database Workshop Agenda | |
| Field to Database Workshop Biblio Entries | |
| Field to Database Workshop Report (Workshop Blog) | |
Apply Now
Workshop is full. Application Form is closed.
General Information
This workshop investigates current trends in collecting and focuses on best practices and skills development for collecting and sharing robust, fit-for-research-use data. This 4-day short course is designed to be hands-on and will mix lectures with field work and participant exercises and presentations.
Planning Team
Deb Paul (iDigBio), Katja Seltmann (TTD-TCN, AMNH), François Michonneau (FLMNH - iDigBio), Derek Masaki (USGS - BISON), Pam Soltis (FLMNH - iDigBio PI), Shari Ellis (iDigBio), Kevin Love (iDigBio)
About
Skill Level
Some experience with R is required. If you are new-ish to R, please take an introductory R course before the workshop. There are several good options:
- Try R (Code School course)
- Intro to R (Coursera course, starts Feb 2nd).
- Beginner Course: Up and Running with R with Barton Poulson (course at lynda.com)
- Intermediate Course: R Statistics Essential Training with Barton Poulson (course at lynda.com)
Instructors: François Michonneau (FLMNH - iDigBio), Katja Seltmann (TTD-TCN, AMNH), Derek Masaki (USGS), Matt Collins (ACIS - iDigBio)
Assistants: Deborah Paul (FSU - iDigBio), Matt Cannister (USGS) 
Who: The course is aimed at graduate students, postdocs, research staff, and other researchers.
Where: iDigBio in Gainesville, FL
Requirements:
- Participants must bring a laptop with a few specific software packages installed.
- Participants must have some knowledge of R. This is not a beginner-level course. There are introductions to R you can take on your own before the workshop.
- If you will be traveling from out of town, you will need to make your own travel arrangements.
Contact: Please email Deb Paul, dpaul@fsu.edu for questions and information not covered here.
Twitter: #field2db @idigbio
 
Tuition for the course is free, but prior registration is required for attending. You can register here.
Software Installation Requirements
Software needed for Field to Database Course at iDigBio 
Mac OS X
- Text Editor
- We recommend Text Wrangler. In a pinch, you can use nano, which should be pre-installed.
 
- RStudio + R
- Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.
 
- Spreadsheet
- If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
 
PC
- Text Editor
- Notepad++ is a popular free code editor for Windows. Be aware that you must add its installation directory to your system path in order to launch it from the command line (or have other tools like Git launch it for you). The instructions to modify your path are available online here. Please ask your instructor to help you do this.
 
- RStudio + R
- Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE.
 
- Spreadsheet
- If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
 
Linux
- Text Editor
- Kate is one option for Linux users. In a pinch, you can use nano, which should be pre-installed.
 
- RStudio + R
- You can download the binary files for your distribution from CRAN. Or you can use your package manager, e.g. for Debian/Ubuntu run apt-get install r-base. Also, please install the RStudio IDE.
 
- Spreadsheet
- If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
 
- You must RSVP that the required software is installed, prior to the workshop. Instructors are available to help - see your email for their contact information.
We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.
Goals
- Investigate, observe, discover leading-edge trends in field collecting.
- Provide examples of best practices for data collecting and data sharing including such data as field data, identifiers, trait data, and environmental variables.
- Explore data tools, to include software such as R, but also field apps.
- Convey the concept and importance of reproducible research workflows, and methods for creating them.
- Illustrate how data gets from the field into a collection database and into an aggregator's database.
- Discuss how data gets published and discovered.
Objectives
- Students participate in field collecting with subject-matter experts and present what changes they plan to make to their collecting practices in a workshop presentation.
- Subject-matter experts share what they have learned from seeing / talking with others on this topic.
- Students work through examples to demonstrate mastery of skills for transforming, enhancing, standardizing data.
- Through comments, discussion, and perhaps post-workshop survey, students demonstrate they grasp the importance of metadata and understand the conceptual difference between data and metadata.
- Students write a post-workshop blog post, prepare a report, or presentation, to synthesize what was learned and pay-it-forward.
Our curriculum overview
- Day 1: Why a Field-to-Database Biodiversity Informatics Workshop? On Site Field Demos from Invited Experts from Paleontology, Ornithology, Ecology, Marine Science, Entomology, and Botany
- Day 2: Student 3-minute presentations. General issues in field data collection to data synthesis. Getting started with R.
- Day 3: Data exploration using R. Import and display. From raw data to technically correct data. From technically correct data to consistent data. File output. Writing processed data to file.
- Day 4: Using R to access biodiversity APIs. Publishing data on iDigBio. Publishing data on DataDryad. Review, Wrap-up, Survey, Next Steps.
The concepts, skills, and tools we teach are domain-independent, but example problem cases and datasets will be taken from organismal and evolutionary biology, biodiversity science, ecology, and environmental science.
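The Day 3 progression (raw data, then technically correct data, then consistent data, then file output) can be sketched in a few lines. The workshop itself taught these steps in R; the sketch below uses Python with only the standard library, and it is an illustration under stated assumptions rather than the lesson's actual code. The column names are Darwin Core terms, but the sample records, the two accepted date formats, and the function names are all invented for this example.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw field records. Column names are Darwin Core terms;
# the values are invented for illustration.
RAW_CSV = """scientificName,eventDate,decimalLatitude,decimalLongitude
 Danaus plexippus ,2015-03-09,29.6436,-82.3549
Apis mellifera,03/09/2015,29.6440,-82.3551
Apis mellifera,2015-03-09,129.6440,-82.3551
"""


def parse_date(text):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_technically_correct(row):
    """Step 1: give every field the right type (clean text, ISO date, float)."""
    return {
        "scientificName": " ".join(row["scientificName"].split()),
        "eventDate": parse_date(row["eventDate"]),
        "decimalLatitude": float(row["decimalLatitude"]),
        "decimalLongitude": float(row["decimalLongitude"]),
    }


def is_consistent(rec):
    """Step 2: check typed values against simple domain rules."""
    return (
        rec["eventDate"] is not None
        and -90 <= rec["decimalLatitude"] <= 90
        and -180 <= rec["decimalLongitude"] <= 180
    )


def clean(raw_text):
    """Run steps 1 and 2 over CSV text and keep only consistent records."""
    rows = csv.DictReader(io.StringIO(raw_text))
    typed = [to_technically_correct(r) for r in rows]
    return [r for r in typed if is_consistent(r)]


def write_csv(records, handle):
    """Step 3: write processed records to an open file-like object."""
    writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


cleaned = clean(RAW_CSV)
out = io.StringIO()
write_csv(cleaned, out)
```

Here the third record is dropped because a latitude of 129.644 is outside the valid range, and the second record's date is normalized to ISO 8601. In practice you would probably flag inconsistent records for review rather than discard them silently, and `write_csv` assumes at least one record survived.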
Updates to course Wiki will be posted to this website as they become available.
Workshop Evaluation
- Our pre-workshop survey simply asked participants to rank their R skills. With 19 respondents, our participants formed a heterogeneous group:
- 6 chose "Low. I am a total beginner, have no or little experience, or have only gone through the R tutorial."
- 5 chose "Somewhat low. I have used R, but only under the guidance of someone more expert (e.g., during a course or workshop)."
- 5 chose "Neither high nor low. I can use and adapt scripts written by other people."
- 3 chose "Somewhat high. I can write my own scripts."
 
- Post Workshop Survey Results for Field to Database
Agenda
- AdobeConnect http://idigbio.adobeconnect.com/field2db
- Pre-workshop dinner at Piesano's, 630 PM Sunday March 8th, 2015. Piesano's is at NW 13th St. and 1250 W. University Ave. in Gainesville. All are welcome. Please do RSVP to Deb Paul, dpaul@fsu.edu
- Google Notes Doc
| Course Overview - Day 1, Monday March 9th | ||
|---|---|---|
| Time | Activity | Responsible | 
| 800 - 830 | Registration: name tags, wired/wireless, adobeconnect, check-in. | All, Deb Paul (iDigBio) | 
| 830 - 850 | Welcome and Introduction to iDigBio. (pptx) Motivation = Research! (pptx)(pdf) | Deb Paul (iDigBio) & Pam Soltis (iDigBio PI) | 
| 850 - 910 | Why a Field-to-Database Biodiversity Informatics Workshop? (pptx)(pdf) R_files_modeling | Charlotte Germain-Aubrey (iDigBio Post Doc) and Katja Seltmann (TTD-TCN) | 
| 910 - 930 | Let's go to the field! Where the best places are wet, isolated, and without internet. A story of the trials of typical fieldwork. | Emilio Bruna | 
| 930 - 940 | Using Digital Resources to Plan Field Expeditions How to prioritize where you collect? How do you plan a collecting trip? What kind of resources do you bring in the field? | Grant Godden | 
| 940 - 1000 | Tips and Workflows for Managing Field Data Field templates, workflow, and planning ahead for better results. | Andrew Short | 
| 1000 - 1010 | Standards for Collection of Genomic Resources Collecting RNA, DNA & flower color. Lessons from a recent field trip. | Grant Godden | 
| 10:00-10:30 | Break (remember Pascal's) | tea and drip coffee free with your name tag, check in at the counter | 
| 1030 - 1110 | Data and metadata standards for biodiversity media: the past, present and future. | Mike Webster | 
| 1110 - 1130 | Top 10 mobile applications every biologist should know about. Download and try. Here are some. | Emilio Bruna | 
| 11:30 - 12:00 | Transport to Natural Teaching Area | (vans) | 
| 12:00 - 1:00 | Lunch (Brown Bag provided) | (organizers set up demo areas) | 
| 1200 - 1230 | Brown bag lunch discussion. Standards: Darwin Core and more. Emphasis of benefits of starting off using them right away. Presented in field using a handout and conversation regarding Darwin Core and other standards. Input from outside experts important for addressing sound/image/paleontological and ecological standards. Metadata. Field Handout - 1) Summary of some relevant standards including: Darwin Core, Ecological Metadata Language (EML), Audubon Media Extension, Global Genome Biodiversity Network (GGBN) and 2) Best practices for writing a locality description. | Deb Paul | 
| 1230 - 100 | Brown bag lunch discussion. Students try one of the cell phone or tablet applications presented by Emilio. Download a GPS app if you do not have one! Sharing is encouraged for students who do not have a mobile device. | Everyone | 
| 100 - 330 | Breakout Group 1: Activity (60min): Students are grouped into pairs or groups of three. Each team does two rounds of mini-collecting, 10 minutes each for total of 20 minutes. For the first 10 min: Each team has to collect and record data for a few insects they collect on blank paper (e.g. a journal page). For the second 10 minutes, each team repeats this process but now is given a generic data sheet to fill in. The collecting focus is insects on plants. | Andrew Short & Grant Godden | 
| 130 - 330 | Breakout Group 2: Activity (60min): Collecting media in the field. Audio and video recordings, as well as photographs, of animals in nature are increasingly becoming important sources of data for biodiversity studies, yet there are few standards for how these should be collected in the field, the sorts of metadata that should be included, and how to preserve and make them accessible to the research community. In this activity we will demonstrate and discuss basic techniques for collecting biodiversity media and metadata in the field, as well as techniques that are being developed to deposit those data quickly and easily in a secure archive. | Mike Webster | 
| 245 - 315 | Break between breakout group exercises | Everyone | 
| 330 - 400 | Group Photo! Travel back to Classroom and begin discussion and debriefing from Field experience. Discussions will run into the morning of day 2. | Everyone | 
| 400 - 430 | Review of field apps with students. Which worked and which didn’t? How would students imagine applying these applications in the field? | Emilio Bruna | 
| 430 - 500 | Recap and homework (videos) for tomorrow, and further presentations and discussion. | Katja Seltmann | 
| 6:00 | Dinner on your own. | Potential to have dinners together if desired. | 
| Course Overview - Day 2, Tuesday March 10th | ||
| 8:30-9:00 | Check in, answer questions | All, Deb Paul | 
| 900 - 940 | Fossil field collection and field site 3D reconstruction including present paleo databases and standards. | Justin Woods | 
| 940 - 1000 | Efficient workflow from collection to cataloging for marine invertebrates. | François Michonneau | 
| 1000 - 1020 | Discussion of template field exercise. | Andrew Short & Grant Godden | 
| 1020 - 1100 | General Discussion: General issues in field data collection to data synthesis. Describe common problems with field data sources and impacts of these problems. | All, Katja Seltmann | 
| 1100-1120 | Break | All | 
| 1120-1200 | Reproducible Research | Derek Masaki | 
| 12:30-1:30 | Lunch | (on your own) | 
| 1:30-5:00 | Getting started with R | François Michonneau (Lead) | 
| 5:00-5:30 | Review / Homework? / Preview of tomorrow | |
| Course Overview - Day 3, Wednesday March 11th | ||
| 8:30-9:00 | Check in, answer questions | All, Deb Paul | 
| 9:00-9:20 | Review of new specimen data set for today's R lesson: identify issues, errors. iDigBio R for Data Processing unzip this to open HTML file of iDigBio R for Data Processing Lesson Steps | Derek Masaki (Lead) | 
| 9:20-10:20 | Data exploration using R. Import and display. | Derek Masaki (Lead) | 
| 10:20-12:30 | From raw data to technically correct data. From technically correct data to consistent data. | Derek Masaki (Lead) | 
| 12:30-1:30 | Lunch | on your own | 
| 1:00-1:45 | File output. Writing processed data to file. | Derek Masaki (Lead), François Michonneau | 
| 1:45-2:45 | Review. Work-on-your-own data set. | |
| 2:45-4:45 | Intro to R Markdown OR Break Outs | François Michonneau | 
| 5:00-5:30 | Review / Wrap-up / Preview of tomorrow | |
| Course Overview - Day 4, Thursday March 12th | ||
| 8:30-9:00 | Check in, answer questions | All, Deb Paul | 
| 9:00-12:00 | Using R to access biodiversity APIs (Media:2015-03-12-F2DB-Apis.pdf) | François Michonneau, Matt Collins (Leads) | 
| 12:00-1:00 | Lunch | on your own | 
| 1:00-1:45 | Getting your data out there: publishing & standards with iDigBio | Molly Phillips, Matt Collins (Leads) | 
| 2:30-4:00 | Publishing data on Dryad | Todd Vision, Dryad (http://datadryad.org) (Lead) | 
| 4:00-5:00 | Review, Wrap-up, Survey, Next Steps. | 1 slide lightning talks by participants | 
| Optional Evening Session -- on working with their own data? | ||
Future plans: Scaling it up: Demo using the iPlant Discovery Environment (DE)
Link to Workshop Report
Logistics
- Logistics & Hotel Information (for any out-of-towners)
- Where to find food
- Workshop Calendar Announcement
- F2DB Participant List
Adobe Connect Access
Adobe Connect will be used to provide communication between all present at the workshop.
Remote participants will be able to listen to lecture portions only.
We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.
Presentation Documents and Links
More Field to Database Workflows
Leading-Edge and Trends in Collecting Methods
People from across the planet joined in to our call to send in more examples of how data gets from the field, into a database.
- Biocode Field Information Management System ppt youtube. A Field Information Management System (FIMS) enables data collection at the source (in the field) by generating spreadsheet templates, validating data, and assigning persistent identifiers for every unique biological sample. The following diagram shows how the system works. The most typical functions are Generating Templates and Validating Data, both of which can be found under the Tools menu.
 Generate a Template
 Validate data
 How FIMS works
- Field Host Collecting Workflow with Arthropod Easy Capture (mp4). A highly efficient and field tested workflow for recording and databasing insects and host plants developed by Randall Schuh (and others) during the Plant Bug Planetary Biodiversity Project. Record collecting events information in great detail, including images and host plant material. The AEC database is open-source, easily installed, and submits data to iDigBio and Discover Life.
- Field to Freezer: Low tech collecting; high quality data. Shelley James, Herbarium Pacificum, Bishop Museum
- From the Field Into Specify: several options. (mp4) Andrew Bentley, Specify, University of Kansas Biodiversity Institute
 Installation Package for Specify
- From the Field Into Symbiota
 Part 1: Field Researcher's perspective (time: 10:20) – Shows how a field researcher can enter a voucher specimen along with a field image, link the voucher to a checklist, and print labels to be distributed with the specimen vouchers.
 Part 2: Curator’s perspective (time: 11:05) – Shows how a curator can import a record from the collector’s data set to their own collection rather than retyping the label data from scratch. Also includes how identification annotations can filter down the network of specimen duplicates to correct a misidentification within the original checklist.
- Digitally archiving localities through the use of their coordinates. Amy Smith, Collections Manager of Earth Sciences, Perot Museum of Nature and Science
 Digitally visualizing and archiving coordinates using KML files
 PDF to accompany video
- Filling Biodiversity Knowledge Gaps (GBIF video) Dr Arturo Ariño discusses potential information gaps that exist between different sources of data, using two case studies: the UN Biosphere Reserves in Mexico and Spain.
- ABC Taxa: the Journal Dedicated to Capacity Building in Taxonomy and Collection Management.
 - Volume 8 - Manual on Field Recording Techniques and Protocols for All Taxa Biodiversity Inventories. (2010) Jutta Eymann, Jérôme Degreef, Christoph Häuser, Juan Carlos Monje, Yves Samyn & Didier VandenSpiegel Eds. Field recording techniques in ABC Taxa; beyond traditional collecting and preserving of organismal life (including soil sampling) it includes camera trapping and bio-acoustics as well.
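The three functions described for a Field Information Management System in the first item above (generating spreadsheet templates, validating data, and assigning an identifier to every sample) can be illustrated with a small sketch. This is not the Biocode FIMS implementation: the required columns, the `sampleID` field name, and the use of random UUIDs as stand-ins for persistent identifiers are all assumptions made for this example, written in Python with the standard library only.

```python
import csv
import io
import uuid

# Assumed required columns for this sketch (Darwin Core term names).
# A real FIMS project defines its own template.
REQUIRED = ("scientificName", "eventDate", "decimalLatitude", "decimalLongitude")


def make_template(columns=REQUIRED):
    """Return CSV text holding only a header row, ready to fill in."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(columns)
    return buf.getvalue()


def validate(rows, required=REQUIRED):
    """Return (row_number, message) pairs for blank or missing required fields."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for column in required:
            if not (row.get(column) or "").strip():
                problems.append((number, f"missing {column}"))
    return problems


def assign_identifiers(rows):
    """Attach a random UUID to each row in a hypothetical sampleID field."""
    return [dict(row, sampleID=f"urn:uuid:{uuid.uuid4()}") for row in rows]


# Invented sample rows; the second is missing its scientific name.
samples = [
    {"scientificName": "Apis mellifera", "eventDate": "2015-03-09",
     "decimalLatitude": "29.6440", "decimalLongitude": "-82.3551"},
    {"scientificName": "", "eventDate": "2015-03-09",
     "decimalLatitude": "29.6436", "decimalLongitude": "-82.3549"},
]

template = make_template()
problems = validate(samples)
bad_rows = {number for number, _ in problems}
valid = [row for i, row in enumerate(samples, start=1) if i not in bad_rows]
tagged = assign_identifiers(valid)
```

Validating before assigning identifiers means that only records that pass the checks receive an ID. A random UUID is unique but is not resolvable on its own, so a production system would register its identifiers with a resolver service.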
 
 
Biodiversity APIs
- taxize tutorial
- taxize on github
- ridigbio
- Open Tree of Life APIs
- Introduction to the VertNet API
- rgbif on github
- rgbif tutorial
- rgbif: Interface to the Global Biodiversity Information Facility API
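The R packages listed above wrap web APIs that return JSON over HTTP. As a rough illustration of what such a wrapper does underneath, the Python sketch below builds a query URL for the GBIF occurrence search endpoint and extracts coordinates from a response. The helper names are hypothetical, and the sample response is an invented fragment shaped like the JSON the API is assumed to return; check the GBIF API documentation for the authoritative parameter and field list.

```python
import json
from urllib.parse import urlencode

# GBIF occurrence search endpoint (public REST API, version 1).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_query(scientific_name, limit=20, offset=0):
    """Build an occurrence-search URL for one scientific name."""
    params = {"scientificName": scientific_name, "limit": limit, "offset": offset}
    return f"{GBIF_SEARCH}?{urlencode(params)}"


def extract_points(payload):
    """Pull (name, latitude, longitude) from records that carry coordinates."""
    points = []
    for record in payload.get("results", []):
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if lat is not None and lon is not None:
            points.append((record.get("scientificName"), lat, lon))
    return points


# Invented response fragment, used here instead of a live network call.
SAMPLE = json.loads("""
{"offset": 0, "limit": 20, "endOfRecords": true, "count": 2,
 "results": [
   {"scientificName": "Puma concolor", "decimalLatitude": 29.64, "decimalLongitude": -82.35},
   {"scientificName": "Puma concolor"}
 ]}
""")

url = build_query("Puma concolor", limit=5)
points = extract_points(SAMPLE)
```

To query the live service, the URL could be passed to `urllib.request.urlopen` and the body parsed with `json.load`. Records without coordinates are skipped rather than treated as errors, since many occurrence records are not georeferenced.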
Useful Links and Materials
- SQL commands list
- SHELL commands list
- R lesson
- Getting started with Open Refine
- Google's R Style Guide
- make code easier for you, and others, to understand
 
- Rseek
- a search platform just for R resources
 
- R documentation An extensive search interface for R packages
- Download RStudio, newest version for Macs with new operating systems
- The Software Carpentry Git lesson
- A web-based tutorial to Git
- A web-based tutorial to Markdown
- An introduction to RMarkdown with RStudio
- Using version control (Git) with RStudio
- Here's a download page for RStudio, INCLUDING the newest version for Mac OSX Yosemite (Mac OS X 10.6+ (64-bit)) (which has had some incompatibilities with certain packages)
 
- Data publishing links (from Todd Vision)
- The re3data registry of repositories and the BioSharing registry of policies, databases and standards
- Listing of journals that have adopted the Joint Data Archiving Policy or a similar policy
- Guidance on creating a Data Management Plan for NSF or other agencies and an example plan
- DataONE educational modules
- Best practices in preparing data for archiving
- Borer, ET et al. (2009) Some simple guidelines for effective data management. Bulletin of the ESA 48, 205-214
- Hook, LA et al. (2010) Best Practices for Preparing Environmental Data Sets to Share and Archive
- Penev L et al. (2011) Pensoft Data Publishing Policies and Guidelines for Biodiversity Data
- Whitlock MC (2010) Data archiving in ecology and evolution: best practices Trends in Ecology & Evolution 26, 61-65.
 
- Dryad FAQ and API (some functionality, but still under development)
- Examples of articles w/ data packages in Dryad
- from Biodiversity Data Journal
- from NPG Scientific Data
 
- Other examples
- Manatee paper - not meant as an exemplar of data availability!
- Heliconia data package in Dryad
 
 
Workshop Recordings
Day 1
- 9:00am-10:00am http://idigbio.adobeconnect.com/p7nh5z5qljf/
- 10:15am-11:30am http://idigbio.adobeconnect.com/p60a38m2qhy/
- 4:00pm-5:00pm http://idigbio.adobeconnect.com/p88qtlumpjb/
Day 2
- 9:00am-11:00am http://idigbio.adobeconnect.com/p2p6ezjdwdo/
- 11:30am-12:30pm http://idigbio.adobeconnect.com/p7woy8hro5x/
- 1:30pm-2:30pm http://idigbio.adobeconnect.com/p96rvexycsl/
- 1:45-5:00pm http://idigbio.adobeconnect.com/p5l1dc47t1p/
Day 3
- 9:00-12:30 http://idigbio.adobeconnect.com/p21s71147nh/
- 1:30-3:30 http://idigbio.adobeconnect.com/p6ipnr4eh4v/
- 3:45-5 http://idigbio.adobeconnect.com/p8fha4j15ex/
Day 4
- 9:00-10:30 http://idigbio.adobeconnect.com/p9b106642l6/
- 11:15-12:15 http://idigbio.adobeconnect.com/p285w4uu5xr/
- 1:30-2:30 http://idigbio.adobeconnect.com/p30irmqksq8/
- 2:30-5:00 http://idigbio.adobeconnect.com/p7kabi2d68f/
Related Workshop Resources and Links
- Data Carpentry Materials on GitHub
- Ten Simple Rules for the Care and Feeding of Scientific Data. Goodman et al
- Code and Data for the Social Sciences: A Practitioner's Guide. Matthew Gentzkow, Jesse M. Shapiro, Chicago Booth and NBER, March 10, 2014
- Nine simple ways to make it easier to (re)use your data. White et al.
- You want to learn SQL independently? Try Head First SQL
- Head First Excel, O'Reilly
- Check out DataONE
- They've got a great Software Tools Catalog
 
- Put standard metadata with your data. Wondering how to do that? Check out DataONE's Morpho Tool available under the tools menu at https://knb.ecoinformatics.org/.
- Why? It makes your data reusable and, better still, discoverable. Get cited for your datasets in addition to your published papers!
 
- Making Sense of Data Free online course at Google.
- " Do you work with surveys, demographic information, evaluation data, test scores or observation data? What questions are you looking to answer, and what story are you trying to tell with your data? This self-paced, online course is intended for anyone who wants to learn more about how to structure, visualize, and manipulate data. This includes students, educators, researchers, journalists, and small business owners."
 
- Using Open Refine? Want to compare your taxon names against a standard list? Try this reconciliation service.
- Read Gaurav's blog post first: http://gbif.blogspot.com/2013/07/validating-scientific-names-with.html
- Then, give it a try. The Google+ Open Refine community will help you figure it out (it's not hard).
- Or use BioVel's extension for checking taxon names and atomizing them.
- Install Open Refine (http://openrefine.org), then add the Open Refine extension developed by BioVel.
 
 
Links from You
- How about you? Got a favorite resource (a book? a website?) to share with your classmates?
- Data Science at the Command Line
- Free Training Resources for UF students, faculty, and staff UF provides free access to over 2600 online training courses through Lynda.com. Does your institution have similar free training opportunities?
- Very easy-to-use map making service: cartodb.com
Related Blog Posts and Photos
- Inaugural Data Carpentry Workshop by Tracy K. Teal
- Our First Data Carpentry Workshop by Karen Cranston
- Tales from the First Data Carpentry Workshop by Deb Paul, May 2014
- Data Carpentry, Please can we have some more?! by Deb Paul, 15 Oct 2014
- Data Carpentry Facebook Photo Album
