Resources for using OpenRefine

From iDigBio
=Tutorials=


[https://bit.ly/3KrPR9x Data Carpentry: OpenRefine for Natural History Collection Data]
See links below for our recommended tutorials on how to use OpenRefine. OpenRefine itself maintains a more comprehensive list of externally produced tutorials [https://github.com/OpenRefine/OpenRefine/wiki/External-Resources here], and searching on [https://www.youtube.com/results?search_query=openrefine YouTube] and [https://vimeo.com/search?q=openrefine Vimeo] will also lead to many relevant videos.


[https://bit.ly/3vSrtJ5 Griffith University Library: Data Wrangling Introduction]
* Data Carpentry lessons: [https://data-lessons.github.io/OpenRefine-nhcdata-lesson/ OpenRefine for Natural History Collection Data] and [https://datacarpentry.org/OpenRefine-ecology-lesson/ Data Cleaning with OpenRefine for Ecologists]
 
* Library Carpentry lesson: [https://librarycarpentry.org/lc-open-refine/ OpenRefine]
[https://bit.ly/39kKmfX Library Carpentry]
* [http://bit.ly/BITW13_OpenRefine OpenRefine Walk-through], step-by-step orientation by Javier Otegui using natural history museum data as a subject
 
* [https://www.youtube.com/watch?v=wGVtycv3SS0 Clean Your Data: Getting Started with OpenRefine], a workshop recording produced by the University of Idaho Library Digital Initiatives (2017-02-15)
[https://bit.ly/3wcBIYX YouTube] & [https://bit.ly/37RoiJz Vimeo]
* Handouts created for use during the 2019 VRA Annual Conference workshop, ''Clean, Transform and Enhance Your Data'': [https://docs.google.com/document/d/1Z863T411TKd1FnmKrbEAERCPHNzxj4enscjTe3OnfgM/edit?usp=sharing Download and Install OpenRefine] and [https://docs.google.com/document/d/1fH_kqo5QtrovLk63uRf4ixScMMy-jO5IikrCOeZl6JM/edit?usp=sharing Getting Started with OpenRefine]
* [https://www.youtube.com/watch?v=6DIsErw8noM Data Cleaning with OpenRefine], an online short seminar organized by the Harvard Library (2020-06-25)

Revision as of 14:17, 26 May 2022


Why use OpenRefine?

OpenRefine is an open-source tool for manipulating small or large datasets in numerous formats (CSV, JSON, XML, etc.). Because it has a low barrier to entry and requires no prior programming knowledge, OpenRefine is an excellent tool for improving and maintaining data integrity as part of best practices in collections management. Data transformations are reversible and repeatable, and original data are preserved locally. The learning curve for OpenRefine is moderate, and a large community of users and a shared knowledge base are available for help. You can use the resources on this wiki page as a starting point!

When to use OpenRefine

  • For quality control, e.g. to clean recent data entry prior to (or after) database ingestion, or to clean legacy data.
  • For combining and manipulating existing datasets, e.g. to transform or integrate your data with external resources like those in a taxonomic authority or Wikidata.

When not to use OpenRefine

  • For adding new records individually to an existing dataset, e.g. when transcribing specimen labels.
  • For text-heavy one-off data entry, e.g. when typing a sentence in a notes field associated with each row.
  • For projects with multiple users on separate computers.

Getting started

Download OpenRefine from https://openrefine.org.

Handy expressions

value+"yourtexthere"

value.toDate().toString('yyyy-MM-dd')

value.replace(/\s+/,' ')

value.trim() (trims leading and trailing whitespace in every cell)

value.replace("EXISTING-VALUE","NEW-VALUE")

cells["COLUMN-1"].value[0] == cells["COLUMN-2"].value[0]

cell.cross("TABLE-2", "COLUMN-TO-MATCH-ON")[0].cells["COLUMN-TO-GET-VALUE-FROM"].value

forEach(row.record.cells['COLUMN'].value,v,v).uniques().length()

forEach(value.parseJson().results[0].TARGET,x,[x.types[0], x.TARGET].join("::")).join("|")

"https://maps.googleapis.com/maps/api/geocode/json?latlng="+value+"&key=KEY"
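
For readers who find it easier to reason about these expressions in a general-purpose language, the following is a rough Python sketch of what several of them do. It is not how OpenRefine implements them, and the details are assumptions made for illustration: all function names are hypothetical, the input date format is assumed to be day/month/year, and the field names address_components and long_name in the usage example stand in for the TARGET placeholders.

```python
import json
import re
from datetime import datetime


def collapse_whitespace(value):
    # Rough analogue of value.replace(/\s+/,' '): each run of whitespace
    # becomes a single space.
    return re.sub(r"\s+", " ", value)


def trim(value):
    # Rough analogue of value.trim(): drop leading and trailing whitespace.
    return value.strip()


def to_iso_date(value, input_format="%d/%m/%Y"):
    # Rough analogue of the toDate()/toString() expression.
    # input_format is an assumption; OpenRefine guesses the input format itself.
    return datetime.strptime(value, input_format).strftime("%Y-%m-%d")


def count_unique(values):
    # Rough analogue of forEach(...).uniques().length(): the number of
    # distinct values found in one column of a record.
    return len(set(values))


def cross_lookup(key, other_rows, match_column, value_column):
    # Rough analogue of cell.cross(...)[0].cells[...].value: take the value
    # from the first row of another table whose match column equals the key.
    # Unlike the expression, this returns None when nothing matches.
    matches = [row for row in other_rows if row[match_column] == key]
    return matches[0][value_column] if matches else None


def geocode_url(latlng, key="KEY"):
    # Builds the same request URL as the last expression; KEY is a placeholder.
    return (
        "https://maps.googleapis.com/maps/api/geocode/json?latlng="
        + latlng
        + "&key="
        + key
    )


def summarize_results(json_text, list_field, text_field):
    # Rough analogue of the parseJson() expression: for every item in the
    # first result's list, join its first type and its text with "::",
    # then join all items with "|".
    first_result = json.loads(json_text)["results"][0]
    return "|".join(
        "::".join([item["types"][0], item[text_field]])
        for item in first_result[list_field]
    )
```

A short usage example with made-up data:

```python
sample = '{"results": [{"address_components": [{"types": ["locality"], "long_name": "Gainesville"}]}]}'
print(collapse_whitespace("Quercus   alba"))                           # Quercus alba
print(to_iso_date("26/05/2022"))                                       # 2022-05-26
print(summarize_results(sample, "address_components", "long_name"))    # locality::Gainesville
```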

Join the community

There are many audiences for OpenRefine, and the best community to join is one that aligns with your usage context and skill level. The OpenRefine Google Group is maintained by OpenRefine, and most messages posted there are fairly technical.

Tutorials

See links below for our recommended tutorials on how to use OpenRefine. OpenRefine itself maintains a more comprehensive list of externally produced tutorials here, and searching on YouTube and Vimeo will also lead to many relevant videos.