Transcription Hackathon Reconciliation of Replicates Planning
We worked on tools to help with reconciling and interpreting crowd-sourced data. One possible workflow might go like this:
Start with crowd-sourced transcriptions.
→ reconcile ( → filter out irreconcilables?)
if locality:
→ place name matching
→ geocoding
if names:
→ name splitting
→ name list lookup
Reconciliation: There is a range of approaches:
- Get a super-user to finalize / approve transcriptions, instead of trying to resolve multiple submissions
- Or, given multiple transcriptions, pick the one that minimizes some edit distance to the others.
- Or, use sequence alignment tools to find the best transcription of subregions in a larger string. (GitHub code does this.)
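As an illustration of the edit-distance approach, here is a minimal sketch. It assumes Levenshtein distance as the metric and picks the transcription with the smallest total distance to all the others. The function names `levenshtein` and `pick_consensus` are made up for this example and are not taken from the GitHub code.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def pick_consensus(transcriptions: List[str]) -> str:
    """Return the transcription with the smallest summed distance to the rest.
    Ties go to the earliest submission (an arbitrary choice for this sketch)."""
    if not transcriptions:
        raise ValueError("need at least one transcription")
    return min(transcriptions,
               key=lambda cand: sum(levenshtein(cand, other)
                                    for other in transcriptions))


replicates = ["Smith, J.", "Smith, J.", "Smyth, J."]
print(pick_consensus(replicates))
```

A large minimum total distance could also serve as a signal that a set of replicates is irreconcilable and should be filtered out, although the threshold would need to be tuned on real data.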
Locality: Again, there is a range of approaches, but we probably want to clean up the transcribed string before sending it to a geocoding service.
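A possible clean-up step might look like the sketch below. The abbreviation table and the function name `clean_locality` are assumptions made for illustration; a real table would have to be built from the transcriptions actually seen.

```python
import re
import unicodedata

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "co.": "County",
    "mt.": "Mount",
    "mtn.": "Mountain",
    "nr.": "near",
}


def clean_locality(raw: str) -> str:
    """Normalize a transcribed locality string before geocoding."""
    # Fold compatibility characters (e.g. non-breaking spaces) to plain forms.
    text = unicodedata.normalize("NFKC", raw)
    # Collapse runs of whitespace (tabs, newlines, double spaces).
    text = re.sub(r"\s+", " ", text).strip()
    # Expand known abbreviations word by word, case-insensitively.
    words = [ABBREVIATIONS.get(word.lower(), word) for word in text.split(" ")]
    # Drop stray punctuation left at either end of the string.
    return " ".join(words).strip(" .,;:")


print(clean_locality("  3 mi  N of Mt. Hood,\tHood River Co. "))
# prints: 3 mi N of Mount Hood, Hood River County
```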
Names: Processing will depend on the target database structure: maybe you just want one string, or maybe you want to try to separate the names. If the names are separated, they could be compared/linked to an outside list of collectors. (... and that could be part of a larger QA process: Does the collection date make sense, given the life span of the collector?) (GitHub code tries to do this.)
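The name steps could be sketched roughly as follows. Everything here is an assumption for illustration: the collector records are invented placeholders, the separators (`;`, `&`, "and") are a guess at common usage, and fuzzy matching uses `difflib.get_close_matches` from the Python standard library rather than whatever the GitHub code does.

```python
import re
from dataclasses import dataclass
from difflib import get_close_matches
from typing import List, Optional


@dataclass
class Collector:
    name: str
    birth_year: int
    death_year: int


# Invented placeholder records standing in for an outside collector list.
COLLECTORS = {c.name: c for c in [
    Collector("Jane Example", 1850, 1920),
    Collector("Robert Sample", 1801, 1870),
]}


def split_names(raw: str) -> List[str]:
    """Split a transcribed collector string on ';', '&' or the word 'and'."""
    parts = re.split(r"\s*(?:;|&|\band\b)\s*", raw)
    return [p.strip() for p in parts if p.strip()]


def match_collector(name: str, cutoff: float = 0.8) -> Optional[Collector]:
    """Return the closest known collector, or None if nothing is similar enough."""
    hits = get_close_matches(name, list(COLLECTORS), n=1, cutoff=cutoff)
    return COLLECTORS[hits[0]] if hits else None


def date_plausible(collector: Collector, collection_year: int) -> bool:
    """QA check: was the collector alive in the collection year?"""
    return collector.birth_year <= collection_year <= collector.death_year


for name in split_names("Jane Exampel & Robert Sample"):
    found = match_collector(name)
    if found is not None:
        print(name, "->", found.name, date_plausible(found, 1900))
```

The life-span check is deliberately coarse: a stricter version might require the collector to be older than some minimum age in the collection year.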