Welcome to the iConference 2013 iDigBio AOCR Wiki

Short URL to this iConference 2013 wiki: http://tinyurl.com/aocriConference2013
Note: This wiki page is undergoing frequent updates. Some participants have wiki edit permissions and will add to, update, and edit these pages before, during, and after iConference 2013.
AOCR Working Group Wiki
2013 AOCR Hackathon Wiki
AOCR October 2012 Working Group Meeting Presentations

Links to Logistics, Communication, and Participant Information

Participant List
Participant Related Projects
Travel, Food, Lodging, Connectivity Logistics
2013 Hackathon Listserv: a mailing list for hackathon participants (aocr-hackathon-l@lists.ufl.edu)

iConference 2013 Participation

Panel Workshop

Integrated Digitized Biodiversity Collections (iDigBio) is an initiative funded under the National Science Foundation's (NSF) Advancing Digitization of Biological Collections (ADBC) program. It was set up to help natural history museums get specimen data for hundreds of millions of specimens out of drawers, off of labels, out of field notebooks, and out of old publications, and into integrated databases for everyone's use.

The iDigBio Augmenting OCR (AOCR) Working Group needs your wisdom, knowledge, and collaboration as part of our multi-faceted approach to improving the optical character recognition (OCR) strategies and natural language processing (NLP) algorithms used in digitization. Our workshop panelists, five members of our working group, are eager to introduce the iSchools community to our challenges and get your input in our break-out sessions. Our research areas of interest include image segmentation, autocorrection of typographical errors, semantic autocorrection, autonormalization, automated text segmentation, generating consensus records, and user interfaces for these tasks.

We seek your insights, collective experiences, and partnership to find ways to improve the digitization process and create a national, searchable, online, specimen-based data set that is fit for use by scientists and the public. Some ideas generated in this session may be implemented at the iDigBio hackathon being held at the Botanical Research Institute of Texas (BRIT) during the iConference.

Poster
Notes (short paper)
Alternative Event

Overview of the related Hackathon Challenge

2013 iDigBio AOCR Hackathon Challenge
- overall description of The Challenge
- The Specific Task: parse OCR output to find values for these 2013 hackathon data elements
- Metrics and Evaluation to be used
- Three Data Sets
  - There are three data sets, that is, three different sets of images of museum specimen labels. Participants, working alone or in groups, may work on one or more data sets as they choose. The sets are ranked easy, medium, and hard, as an estimate of how difficult it may be to get good parsed data from the OCR output of each data set.
- Accessing the Data
