Resources for using OpenRefine
Revision as of 14:19, 26 May 2022
Why use OpenRefine?
OpenRefine is an open-source tool for manipulating small or large datasets in numerous formats (CSV, JSON, XML, etc.). Because its barrier to entry is low and no prior programming knowledge is needed, OpenRefine is an excellent tool for improving and maintaining data integrity, in line with best practices in collections management. Data transformations are reversible and repeatable, and the original data are preserved locally. The learning curve for OpenRefine is moderate, and a large community of users and a shared knowledge base are available for help. You can use the resources on this wiki page as a starting point!
When to use OpenRefine
- For quality control, e.g. to clean recent data entry prior to (or after) database ingestion, or to clean legacy data.
- For combining and manipulating existing datasets, e.g. to transform or integrate your data with external resources like those in a taxonomic authority or Wikidata.
When not to use OpenRefine
- For adding new records individually to an existing dataset, e.g. when transcribing specimen labels.
- For text-heavy one-off data entry, e.g. when typing a sentence in a notes field associated with each row.
- For projects with multiple users on separate computers.
Getting started
Download OpenRefine from https://openrefine.org.
Basic tutorials
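See the links below for our recommended tutorials on how to use OpenRefine. OpenRefine itself maintains a more comprehensive list of externally produced tutorials (https://github.com/OpenRefine/OpenRefine/wiki/External-Resources), and searching YouTube (https://www.youtube.com/results?search_query=openrefine) and Vimeo (https://vimeo.com/search?q=openrefine) will also lead to many relevant videos.
- Data Carpentry lessons: OpenRefine for Natural History Collection Data (https://data-lessons.github.io/OpenRefine-nhcdata-lesson/) and Data Cleaning with OpenRefine for Ecologists (https://datacarpentry.org/OpenRefine-ecology-lesson/)
- Library Carpentry lesson: OpenRefine (https://librarycarpentry.org/lc-open-refine/)
- OpenRefine Walk-through (http://bit.ly/BITW13_OpenRefine), a step-by-step orientation by Javier Otegui using natural history museum data as a subject
- Clean Your Data: Getting Started with OpenRefine (https://www.youtube.com/watch?v=wGVtycv3SS0), a workshop recording produced by the University of Idaho Library Digital Initiatives (2017-02-15)
- Handouts created for use during the 2019 VRA Annual Conference workshop, Clean, Transform and Enhance Your Data: Download and Install OpenRefine (https://docs.google.com/document/d/1Z863T411TKd1FnmKrbEAERCPHNzxj4enscjTe3OnfgM/edit?usp=sharing) and Getting Started with OpenRefine (https://docs.google.com/document/d/1fH_kqo5QtrovLk63uRf4ixScMMy-jO5IikrCOeZl6JM/edit?usp=sharing)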
- Data Cleaning with OpenRefine (https://www.youtube.com/watch?v=6DIsErw8noM), an online short seminar organized by the Harvard Library (2020-06-25)
Scripting
value+"yourtexthere"
value.toDate().toString('yyyy-MM-dd')
value.replace(/\s+/,' ')
value.trim() : trims leading and trailing whitespace in all cells
value.replace("EXISTING-VALUE","NEW-VALUE")
cells["COLUMN-1"].value[0] == cells["COLUMN-2"].value[0]
cell.cross("TABLE-2", "COLUMN-TO-MATCH-ON")[0].cells["COLUMN-TO-GET-VALUE-FROM"].value
forEach(row.record.cells['COLUMN'].value,v,v).uniques().length()
forEach(value.parseJson().results[0].TARGET,x,[x.types[0], x.TARGET].join("::")).join("|")
"https://maps.googleapis.com/maps/api/geocode/json?latlng="+value+"&key=KEY"
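To make the behavior of a few of the expressions above concrete, here is a minimal Python sketch that mirrors them outside OpenRefine. It is not GREL and not part of OpenRefine: the function names and sample values are hypothetical, chosen only for illustration, and edge cases such as null cells or non-string values are ignored.

```python
import re


def append_text(value: str, suffix: str) -> str:
    # Rough analogue of: value + "yourtexthere"
    return value + suffix


def collapse_whitespace(value: str) -> str:
    # Rough analogue of: value.replace(/\s+/, ' ')
    # Every run of whitespace becomes a single space.
    return re.sub(r"\s+", " ", value)


def trim(value: str) -> str:
    # Rough analogue of: value.trim()
    return value.strip()


def replace_text(value: str, existing: str, new: str) -> str:
    # Rough analogue of: value.replace("EXISTING-VALUE", "NEW-VALUE")
    return value.replace(existing, new)


def count_unique(record_values: list) -> int:
    # Rough analogue of:
    # forEach(row.record.cells['COLUMN'].value, v, v).uniques().length()
    # Counts the distinct values that one record holds in a column.
    return len(set(record_values))


def geocode_url(latlng: str, key: str) -> str:
    # Builds the same request URL as the geocoding expression, where
    # `latlng` stands in for the cell value and `key` for the API key.
    return (
        "https://maps.googleapis.com/maps/api/geocode/json?latlng="
        + latlng
        + "&key="
        + key
    )


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(collapse_whitespace("Quercus   alba\t L."))
    print(trim("  Quercus alba "))
    print(count_unique(["A", "B", "A"]))
```

Inside OpenRefine itself you would still enter the GREL expressions as listed; the sketch is only meant to show what kind of result each one is expected to produce.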
Join the community
There are many audiences for OpenRefine, and the best community to join is one that aligns with your usage context and skill level. The OpenRefine Google Group (https://bit.ly/3kqiGIR) is maintained by OpenRefine, and most messages posted there tend to be technical.