Georeferencing for Research Use
Post Workshop Publication
Organizers and participants co-wrote a summary of the lessons learned and key observations from this workshop and published it as:
- Seltmann K, Lafia S, Paul D, James S, Bloom D, Rios N, Ellis S, Farrell U, Utrup J, Yost M, Davis E, Emery R, Motz G, Kimmig J, Shirey V, Sandall E, Park D, Tyrrell C, Thackurdeen R, Collins M, O'Leary V, Prestridge H, Evelyn C, Nyberg B (2018) Georeferencing for Research Use (GRU): An integrated geospatial training paradigm for biocollections researchers and data providers. Research Ideas and Outcomes 4: e32449. https://doi.org/10.3897/rio.4.e32449
iDigBio - CCBER GWG Georeferencing for Research Use, a short course
| Georeferencing for Research Use, a short course | |
|---|---|
| Quick Links for GWG Second Train the Trainers Workshop | |
| Georeferencing for Research Use - link to agenda | |
| Biblio entries | |
| Georeferencing for Research Use, short course report | |
October 4 - 7, 2016 at NCEAS (https://www.nceas.ucsb.edu/), Santa Barbara, California
We welcome you to this short course, with a focus on research use of georeferenced natural history collections data. We will include activities and discussions about best practices and tools for georeferencing, capturing locality data in the field, and using georeferenced specimen locality data in research. Attendees must have a basic level of experience with georeferencing techniques and tools and be researchers or directly involved with researchers.
After the workshop, we will encourage our participants to share use cases and any training materials developed, and to offer workshops, webinars, talks, or other events aimed at increasing use of best practices for georeferencing legacy locality data, for capturing locality data from future biological and paleontological collecting and sampling events, and for using the data in research.
Some anticipated course content includes discussion and activities about georeferencing integration, georeferenced data visualization, and georeferences for modeling and research.
Logistics:
- Hotel and NCEAS Map
- NCEAS is on the 3rd floor of the Balboa Building, 735 State Street
- Local restaurant list
Course Instructor List
(in alphabetical order) David Bloom, Matt Collins, Una Farrell, Shelley James, Sara Lafia, Deborah Paul, Marcy Revelez, Nelson Rios, Katja Seltmann, Jessica Utrup, Mike Yost
Bring your Datasets and Laptops:
Participants are strongly encouraged to bring representative datasets from their collections or research that need georeferencing to expose everyone to the variety of locality data georeferencing issues and give the experts and participants a chance to work together to address any challenges.
Participants must bring their own laptops and everyone will have wired access to facilitate the best possible workshop experience.
Reading Materials and Resources:
- Georeferencing.org
- Georeferencing Quick Reference Guide, version 2012-10-08. John Wieczorek, David Bloom, Heather Constable, Janet Fang, Michelle Koo, Carol Spencer, Kristina Yamamoto
- Guide to Best Practices for Georeferencing - Chapman, A.D. and J. Wieczorek (eds). 2006
- Georeferencing Working Group Training Videos
- Georeferencing Incidents from Locality Descriptions and its Applications: a Case Study from Yosemite National Park Search and Rescue. Transactions in GIS, 2011, 15(6): 775–793. Authors: Doherty, Guo, Liu, Wieczorek, Doke
- iDigBio Georeferencing Wiki http://tinyurl.com/idbgeowiki
- HerpNET Georeferencing Resources
- Take Workshop Notes Together Here
- Post - Workshop Survey Questions
- Got a Georeferencing Question? Post it on the iDigBio Georeferencing List Serve
- BITC Global Online Seminar #25: Simple Workflow for Data Cleaning
Wireless / Wired Access Issues:
Both wired and wireless access will be provided to workshop participants. Connectivity instructions will be provided at the workshop.
Goals of the Workshop:
- Best practices for researchers creating new locality data in the field and georeferencing legacy data.
- Tools (hardware and software) and standards (what to document, datum etc.).
- How to repatriate data, and best practices for putting data into a data repository if it can't be repatriated (what the obstacles are and how to minimize data loss).
 
- How to evaluate already georeferenced data. Current tools for visualization and evaluation.
- Metrics to look for
- Current tools for georeferencing
- Online tools
- R
- QGIS
 
- Researchers give input on the challenges of georeferencing and of using existing georeferences.
- Review of workflows for research use of georeferenced data (Katja, Shelley, ...)
Ultimate goal: Each participant can point to aspects they have learned (tool, standard, etc.) during the workshop and can indicate how they will use those aspects for their research goal/purpose (present or future).
Workshop Objectives:
Topics to be covered
Pre-workshop materials
- Introductory information about datums, mapping, coordinate systems
- Basic georeferencing how-to
During workshop
- Data standards, DwC terminology and fields (e.g. lat, long, datum), differences among disciplines (neo- and paleontological fields)
- Georeferencing toolkit and workflow examples (GEOLocate, maps, other resources, pros and cons)
- Best practices for field collection of data (locality strings and GPS units, precision, datum) 
- How best to record and store georeferencing notes and other data sources (database/CMS dependent)
- Best practices for georeferencing of legacy data given:
 - Varied research requirements for accuracy and precision
- Project and collection management limitations
- Uncertainty data - polygon vs. point radius, description and metadata, etc.
- Datum - georectify to a standard versus verbatim
 
- Workflows for incorporating data into different collections databases
- Best practice syntax in locality descriptions for use in automation vs verbatim strings
- Database limitations
- Multiple geopoint values and storage (verbatim, automated-non-vetted value, nearest named place, update to more accurate value, etc.)
 
- Downloading datasets - sources, different mechanisms
- Assessing data quality
- Uncertainty data - availability in data sources and interpretation
 
- Tools for aggregating, cleaning, visualizing and analyzing data
- R, QGIS, OpenRefine
- Creating maps
- Spatial analyses
- Automated, online tools and applications using geospatial data (e.g. LifeMapper)
 
- Difficult cases, such as geopolitically fluid locations over time, offshore localities
- Hands-on practice & case studies
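As a rough illustration of the data-standards and data-quality topics above, the sketch below checks one occurrence record for a few common georeferencing problems. The field names are Darwin Core terms; the specific checks, the messages, and the helper names `georeference_issues` and `_to_float` are illustrative assumptions, not workshop material.

```python
from typing import Dict, List, Optional

# Darwin Core terms that a usable georeference normally carries.
REQUIRED_TERMS = (
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
    "coordinateUncertaintyInMeters",
)


def _to_float(value) -> Optional[float]:
    """Parse a number, returning None when the value is absent or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def georeference_issues(record: Dict[str, object]) -> List[str]:
    """List the problems found in one record (an empty list means none found)."""
    issues = []
    for term in REQUIRED_TERMS:
        value = record.get(term)
        if value is None or not str(value).strip():
            issues.append(f"missing {term}")

    lat = _to_float(record.get("decimalLatitude"))
    lon = _to_float(record.get("decimalLongitude"))
    if lat is None or lon is None:
        # Report non-numeric values only when both coordinate fields were filled in.
        if not any(i.startswith("missing decimalL") for i in issues):
            issues.append("non-numeric coordinates")
        return issues

    if not -90 <= lat <= 90:
        issues.append("latitude out of range")
    if not -180 <= lon <= 180:
        issues.append("longitude out of range")
    if lat == 0 and lon == 0:
        # 0,0 is a frequent placeholder for "unknown" rather than a real locality.
        issues.append("coordinates are 0,0")

    uncertainty = _to_float(record.get("coordinateUncertaintyInMeters"))
    if uncertainty is not None and uncertainty <= 0:
        issues.append("uncertainty must be positive")
    return issues


if __name__ == "__main__":
    example = {"decimalLatitude": "0", "decimalLongitude": "0"}
    print(georeference_issues(example))
```

Whether a flagged record is actually unfit for use depends on the research question, so a check like this is best treated as a first pass before manual review.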
Schedule of Events - Agenda
Breakfast, lunch, and dinner are on our own every day (not provided).
Day 1, Tuesday October 4th
| Time | Activity | Presenter | 
|---|---|---|
| 8:45 | Pick up Name Tags, Wireless Log-In, Wired Setup, Collaborative Notes (google doc) | |
| 9:00 | Welcome by NCEAS host, Logistics, Trainer Introductions, Introduction to iDigBio, CCBER | Katja Seltmann - CCBER, Debbie Paul - iDigBio, Ben Halpern - Director NCEAS, Ginger Gillquist - Logistics NCEAS | 
| 9:20 | From the participants and instructors: a quick informal survey; quick Name/Rank/Serial# introductions | Deb Paul |
| 10:00 | Standards, Terms & Fields: Darwin Core Standard, Key Terminology | David Bloom, Shelley James | 
| 10:15 | Georeferencing Quick Reference Guide, and Georeferencing Template | Una Farrell | 
| 10:30 | Coffee Klatch w/ NCEAS | |
| 11:15 | Locality Types | Una Farrell | 
| 11:45 | Georeferencing Calculator, Calculator Manual | David Bloom | 
| 12:10 | Lunch | |
| 13:10 | Georeferencing Calculator Example and Exercises, MaNIS/HerpNET/ORNIS Georeferencing Guidelines | David Bloom | 
| 13:40 | Internet Resources - Where to Begin? georeferencing.org | Una Farrell | 
| 14:40 | Break | |
| 15:10 | Exercises cont. | |
| 15:30 | GEOLocate: Overview, Basics & Demos GEOLocate Introduction | Nelson Rios | 
| 17:00 | Day in Review; Trivia Question of the Day; Survey (15 min) | |
| 17:30 | End | |
Dinner on our own - See list of local restaurants. Optional Evening Activity: Happy hour and joyful GeoGathering at Hoffmann Brat Haus
Day 2, Wednesday October 5th
| Time | Activity | Presenter | 
|---|---|---|
| 8:50 | Please complete Survey for Day 1! | |
| 9:00 | Two! Trivia Questions; Review and Questions; Software Installs check for tomorrow | All |
| 9:10 | GEOLocate: Advanced Features, Collaborative Georeferencing and the GEOLocate API | Nelson Rios | 
| 10:00 | Importance of Polygons | Mike Yost, Nelson Rios | 
| 10:30 | Break | |
| 11:00 | GPS Units and APPs: Exercise Introduction | David Bloom, Mike Yost, Shelley James, Katja Seltmann | 
| 11:15 | GPS Exercises (continued outside) | All | 
| 12:15 | Lunch | |
| 13:15 | GPS Exercises (continued outside) Please upload your GPS Data here | All | 
| 13:30 | Good and Bad Localities; Field Locality Handout: MVZ and iDigBio GWG Guide for Recording Localities in Field Notes; Field Information Management Systems (FIMS); Paper maps | David Bloom |
| 14:15 | Georeferencing Workflows: presentations and discussion. Researcher and Collections perspectives: Producers and Consumers | All |
| 15:15 | Break | |
| 15:45 | Online Exercises, Review of known answers | |
| 16:30 | GPS Exercise - Review (.kmz), Summary Spreadsheet, Field Worksheet, Locality Descriptions | David Bloom, Jessica Utrup |
| 16:45 | Day in Review; Download dataset for tomorrow | |
| 17:15 | Survey (15 min) | |
| 17:30 | End | |
Dinner on our own - See list of local restaurants. Optional Evening Activities: TBA
Day 3, Thursday October 6th
Download zipped dataset. This dataset contains specimens in the family Carabidae that have geocoordinates and were collected in California, about 25,000 records in total.
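For readers who want to rebuild a similar extract themselves, the sketch below assembles a search URL with the three filters described above (family, state, and presence of coordinates). The endpoint, the `rq` parameter, and the field names `family`, `stateprovince`, and `geopoint` reflect one reading of the iDigBio search API and should be checked against its current documentation; `build_query_url` is a hypothetical helper name. The code only builds the URL and does not contact any server.

```python
import json
from urllib.parse import urlencode

# Assumed iDigBio search endpoint; verify against the API documentation.
SEARCH_URL = "https://search.idigbio.org/v2/search/records"


def build_query_url(family: str, state: str, limit: int = 100) -> str:
    """Build a record-search URL for georeferenced specimens of one family in one state."""
    record_query = {
        "family": family.lower(),
        "stateprovince": state.lower(),
        # Assumed syntax for "this field must be present".
        "geopoint": {"type": "exists"},
    }
    params = {"rq": json.dumps(record_query), "limit": limit}
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url("Carabidae", "California", limit=5))
```

Keeping the filters in a dictionary makes it easy to record exactly which query produced a dataset, which helps when documenting a workflow.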
Recording Day 3
| Time | Activity | Presenter | 
|---|---|---|
| 9:00 | Review and Questions | All | 
| 9:05 | Georeferencing for Research Use Workshop - iDigBio Datasets: filter and get the dataset | Matthew Collins (remote), Katja Seltmann, Shelley James |
| 10:00 | Data Quality: How to evaluate existing georeferenced data/Fitness for Use | Katja Seltmann, Shelley James |
| 10:30 | Break | |
| 11:00 | Cleaning Datasets: Spreadsheets, Open Refine, tracking your work | Deb Paul, Nelson Rios, Katja Seltmann | 
| 12:00 | Lunch | |
| 13:00 | Cleaning Datasets: Spreadsheets, Open Refine, tracking your work (2) | Deb Paul, Nelson Rios, Katja Seltmann | 
| 13:30 | Visualizing datasets: Set up QGIS and load data. Auxiliary datasets: Download any additional datasets of interest. Online Tutorial | Sara Lafia |
| 15:00 | Break | |
| 15:30 | Visualizing datasets: Preview and explore toolkits & saving your maps and data | Sara Lafia | 
| 17:15 | Survey (15 min) | |
| 17:30 | End | |
Dinner: TBD
Day 4, Friday October 7th
Download zipped QGIS project. The project, saved at the point we reached on Day 3, is available for download in the same folder as the auxiliary data. Launch the QGIS project from the Tutorial.qgs file.
Recording Day 4
| Time | Activity | Presenter | 
|---|---|---|
| 9:00 | Questions and Review. Share your datasets! Upload your research datasets that you'd like to work on. | All |
| 9:10 | Exploring datasets: Aggregating by Regions | Sara Lafia, Katja Seltmann, Nelson Rios |
| 9:50 | Exploring datasets: Time animation | Sara Lafia |
| 10:30 | Break | |
| 11:00 | Exploring datasets: Uncertainty | Sara Lafia |
| 11:30 | Exploring datasets: Spatial autocorrelation | Sara Lafia |
| 12:00 | Lunch on our own. | |
| 13:00 | LifeMapper LIVE DEMO | Jeffrey Cavner, James Beach, et al | 
| 13:15 | Work on own data sets/Open question time/Practice. Polygon practice | Nelson Rios, et al | 
| 13:45 | Breakout sessions: Cleaning data using R | |
| 15:30 | Break | |
| 16:00 | Research Use of the Data. A conversation from the collective point-of-view of the researchers present. Challenges? Experiences? Needs (software, skills, infrastructure)? What changes might you make now to your workflows? | Ed Davis, Katja Seltmann, Shelley James, Nelson Rios, Sara Lafia | 
| 16:30 | Day & Workshop in Review; iDigBio Webinar on your calendar: Oct 12th, 2016 - Isn't that Spatial?; Post Workshop Survey | |
| 17:30 | Beer | |
Dinner on our own - See list of local restaurants. 
Some software install instructions from Data and Software Carpentry
Requests for the Future
- Scripts/tools for repeated cleaning/analysis
- Using the iDigBio API (API for dummies)
- Inselect (note: we provided workshop participants with links for more on this tool; see the Google doc)
- Automated data cleaning - iDigBio and VertNet activities
- What to do with quantified uncertainties & polygons - Jorge Soberon (KU team, others in the fitness for use GBIF working group - see Final Report of the Task Group on GBIF Data Fitness for Use in Distribution Modelling)
- QGIS layers - use cases (e.g. elevation)
- Detailed Workflows - for georeferencing, when not to georeference (see iDigBio Georeferencing Working Group - https://www.idigbio.org/wiki/index.php/IDigBio_Working_Groups#Georeferencing_Working_Group_.28GWG.29), cleaning
- Documentation for tutorials
- Standards/possibility for storing multiple georeferences (and other possibilities such as annotations within iDigBio)
- QGIS tutorial as a Software/Data Carpentry format
- QGIS working group
- GEOLocate with R webinar (follow-on from the Symbiota webinar https://www.idigbio.org/content/symbiota-webinar-geolocate-toolkit and https://www.idigbio.org/content/coge-collaborative-georeferencing-demo-webinar)
Trained Georeferencers
- Map of Participants and Instructors for TTT1 and TTT2
- Wiki for all TTT1 and TTT2 Participants
Pre-Workshop Assignments
- Attend pre-workshop online meeting. Two options, choose one.
- Thursday September 15th - two times to choose from:
- 11am EDT (10am CDT, 9am MDT, 8am PDT)
- 3pm EDT (2pm CDT, 1pm MDT, 12pm PDT)
 
- Sign Up Here: https://goo.gl/forms/WmJO6z79rx5nHlv32
- Meet: http://idigbio.adobeconnect.com/geotrain
 
- Please watch the following videos before the workshop (flipped classroom). Be sure to note any questions / insights to share with the group.
- Collaboration to Automation: https://vimeo.com/53006304 (25 min lecture, 10 min discussion)
- Geographical Concepts: https://vimeo.com/53008556 (4 min lecture, 2 min discussion)
- https://vimeo.com/album/2163673/video/63692461 (4 min lecture only)
 
- Point Radius Method and Best Practices: https://vimeo.com/53006303 (20 min lecture, 5 min discussion)
- OPTIONAL video: BITC Global Online Seminar #25: Simple Workflow for Data Cleaning (1 hour)
 
- Please install the following software
- QGIS and then QGIS Plugins. NOTE it's easy to install all the plugins from inside QGIS once you have it installed. 
- QGIS: http://qgis.org/en/site/forusers/download.html
-  QGIS Plug-ins: Open your QGIS installation on your laptop > navigate to Plugins > Manage and Install Plugins (as seen in the screenshots). You can then add these plugins within QGIS by typing the tool name into the search box and clicking on "Install Plugin": Clipper, Coordinate Capture, GPS Tools, Heatmap, Interpolation, OpenLayers, Processing, TimeManager, and Lifemapper.
- Clipper (clip intersecting vector features)
- Coordinate Capture (find coordinates in various coordinate reference systems (CRS) via mouse-over)
- Gazetteer Search (finding named places via a search bar): NOTE: The Gazetteer Plugin is not "discoverable" through the Plugins manager in QGIS. You'll need to follow the installation steps listed here: https://github.com/AstunTechnology/QGIS-Gazetteer-Plugin#Installation
- Manual
- find where your QGIS is installed on your machine
- right click the folder to see contents and find the folder for Plugins
- for example, on Deb's Windows 10 laptop, the path to the correct QGIS plugins folder is C:\Users\dlpss\.qgis2\python\plugins
 
- make a folder called gazetteersearch inside of the QGIS Plugins directory
- download the contents from GitHub and move them into the gazetteersearch folder
- close and reopen QGIS in order for the plugin to show up
 
- via Git
- clone the repository into your QGIS Plugins folder following the steps from the link above. Please let Sara know if you have any other questions.
 
 
- GPS Tools (loading and importing GPS data)
- Heatmap (generate a heatmap raster given input vector points)
- Interpolation (interpolation techniques given vertices of a vector layer)
- OpenLayers (load basemaps from OpenStreetMap, Google, etc.)
- Processing (spatial data processing framework)
- TimeManager (event-visualization animation for vector features)
- Lifemapper: Plugin for Lifemapper webservices for SDM modeling, and multispecies Presence Absence Matrix (PAM) analysis. The tool allows you to build SDM models using GBIF, iDigBio, or user supplied species occurrence data.
 
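The manual Gazetteer Search steps above (find the plugins folder, then create a `gazetteersearch` folder inside it) can be sketched in a few lines. This assumes the QGIS 2.x layout `.qgis2/python/plugins` under the user's home directory, inferred from the Windows example path above; other QGIS versions or custom installs may use a different location. `gazetteer_plugin_dir` is a hypothetical helper name, and the sketch does not download the plugin itself.

```python
from pathlib import Path
from typing import Optional, Union


def gazetteer_plugin_dir(home: Optional[Union[str, Path]] = None) -> Path:
    """Create (if needed) and return the gazetteersearch folder in the QGIS plugins directory."""
    base = Path(home) if home is not None else Path.home()
    # Assumed QGIS 2.x per-user plugin location.
    target = base / ".qgis2" / "python" / "plugins" / "gazetteersearch"
    target.mkdir(parents=True, exist_ok=True)
    return target


if __name__ == "__main__":
    folder = gazetteer_plugin_dir()
    print(f"Copy the plugin files from GitHub into: {folder}")
```

After copying the downloaded files into the printed folder, close and reopen QGIS so the plugin shows up.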
 
 
- OpenRefine (previously Google Refine) is a tool for data cleaning that runs in a web browser; any browser (Safari, Firefox, Chrome) should work fine (Internet Explorer is not recommended). You will need to download OpenRefine and install it. When you open it, it runs in the browser, but you don't need an internet connection, and all of the data is stored on your computer. (Use these resources, Open Refine Install or Install Open Refine, for more help if you run into any OpenRefine install issues.)
- Windows
- Go to the OpenRefine download page.
- Click on Windows kit to download the install file
- To use it, unzip, and double-click on openrefine.exe (if you're having issues with openrefine.exe try refine.bat instead)
- OpenRefine will then open in your web browser.
- If it doesn't open automatically, open a web browser after you've started the program and go to the URL http://localhost:3333 and you should see OpenRefine.
 
- MacOS
- Go to the OpenRefine download page.
- Click on Mac kit to download the install file
- Open the downloaded .dmg file
- Drag the icon in to the Applications folder
- Double click on the icon and OpenRefine will then open in your web browser.
- If it doesn't open automatically, open a web browser after you've started the program and go to the URL http://localhost:3333 and you should see OpenRefine.
 
- Linux
- Go to the OpenRefine download page.
- Click on Linux kit to download the install file
- Download and extract
- Type ./refine in your terminal and OpenRefine will then open in your web browser.
- If it doesn't open automatically, open a web browser after you've started the program and go to the URL http://localhost:3333 and you should see OpenRefine.
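If the browser does not open on its own, a quick way to confirm that OpenRefine is actually running is to request the local URL from a script. The sketch below uses only the Python standard library and the http://localhost:3333 address given in the steps above; the function name `openrefine_is_running` and the two-second timeout are illustrative choices.

```python
import urllib.error
import urllib.request


def openrefine_is_running(url: str = "http://localhost:3333", timeout: float = 2.0) -> bool:
    """Return True if something answers with HTTP 200 at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: treat as not running.
        return False


if __name__ == "__main__":
    if openrefine_is_running():
        print("OpenRefine is responding at http://localhost:3333")
    else:
        print("No response; start OpenRefine and check that Java is installed.")
```

A True result only shows that a web server answered on that port, so it is worth opening the page once in a browser to confirm it is OpenRefine.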
 
 
- Spreadsheet software (your choice, Libre Office, Excel, etc.,)
- We'll be using a spreadsheet program. If you already have a spreadsheet program installed, like LibreOffice, Excel or OpenOffice, you can use whatever you already have. If you don't have a spreadsheet program, please download and install LibreOffice from http://www.libreoffice.org/download/libreoffice-fresh/
 
- Java: Please make sure you have Java installed (needed for Open Refine to work).
 
- OPTIONAL software install and tutorials - if you are interested in the R breakout section we will offer at the workshop.
- R & RStudio: R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.
- Windows
- Video Tutorial
- Install R by downloading and running this .exe file from CRAN (http://cran.r-project.org/index.html).
- Also, please install the RStudio IDE.
 
- Mac OS X
- Video Tutorial
- Install R by downloading and running this .pkg file from CRAN (http://cran.r-project.org/index.html).
- Also, please install the RStudio IDE.
 
- Linux
- You can download the binary files for your distribution from CRAN. Or you can use your package manager
- e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo yum install R.
 
- Also, please install the RStudio IDE.
 
 
- Then install packages:
- R Tutorials. OPTIONAL: take a short course in R. If you are a novice, take a beginner course. We don't expect you to know R well, but we do need you to be familiar enough to follow along with one of our optional hands-on sessions. There are several good options:
- Try R (Code School course)
- Beginner Course: Up and Running with R with Barton Poulson (course at lynda.com)
- Intermediate Course: R Statistics Essential Training with Barton Poulson (course at lynda.com)
- For the future, you could take a Coursera class: Intro to R (Coursera course, started August 22nd).
 
- Georeferencing using Apps: please install either of these on your device, if you want to try georeferencing this way to compare with results from a GPS unit.
 


