Resources for using OpenRefine

From iDigBio
Revision as of 14:19, 26 May 2022 by Ekrimmel (talk | contribs)

Why use OpenRefine?

OpenRefine is an open-source tool for manipulating small or large datasets in numerous formats (CSV, JSON, XML, etc.). Because it requires no prior programming knowledge, OpenRefine is an excellent tool for improving and maintaining data integrity as part of best practices in collections management. Data transformations are reversible and repeatable, and original data are preserved locally. The learning curve for OpenRefine is moderate, and a large community of users and a shared knowledge base are available for help. You can use the resources on this wiki page as a starting point!

When to use OpenRefine

  • For quality control, e.g. to clean recent data entry prior to (or after) database ingestion, or to clean legacy data.
  • For combining and manipulating existing datasets, e.g. to transform or integrate your data with external resources like those in a taxonomic authority or Wikidata.

When not to use OpenRefine

  • For adding new records individually to an existing dataset, e.g. when transcribing specimen labels.
  • For text-heavy one-off data entry, e.g. when typing a sentence in a notes field associated with each row.
  • For projects with multiple users on separate computers.

Getting started

Download OpenRefine from https://openrefine.org.

Basic tutorials

See the links below for our recommended tutorials on how to use OpenRefine. The OpenRefine project also maintains a more comprehensive list of externally produced tutorials, and searching YouTube and Vimeo will turn up many relevant videos.

Scripting

value+"yourtexthere"

value.toDate().toString('yyyy-MM-dd')

value.replace(/\s+/,' ')

value.trim()   (trims leading and trailing whitespace from every cell in the column)

value.replace("EXISTING-VALUE","NEW-VALUE")
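For readers who think more easily in a general-purpose language, the text-cleaning expressions above can be approximated in Python. This is only a sketch of their intended effect on a single cell value, not a description of how OpenRefine evaluates GREL; the function name `clean_cell` and the sample strings are made up for illustration.

```python
import re

def clean_cell(value: str) -> str:
    """Collapse runs of whitespace to one space, then trim both ends."""
    collapsed = re.sub(r"\s+", " ", value)  # same idea as replacing /\s+/ with ' '
    return collapsed.strip()                # same idea as trim()

print(clean_cell("  Quercus   alba \n"))      # Quercus alba
print(clean_cell("Quercus alba") + " L.")     # appending text, like value + "yourtexthere"
print("USA".replace("USA", "United States"))  # literal find-and-replace
```

Whitespace handling may differ slightly between Python and GREL for unusual Unicode characters, so treat the output as indicative only.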

cells["COLUMN-1"].value[0] == cells["COLUMN-2"].value[0]

cell.cross("TABLE-2", "COLUMN-TO-MATCH-ON")[0].cells["COLUMN-TO-GET-VALUE-FROM"].value
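The `cell.cross(...)` lookup above behaves roughly like a keyed join: find the rows in a second project whose match column equals the current cell, then read a value from the first match. A minimal Python sketch of that idea, assuming a small in-memory table; the helper `cross_first`, the column names, and the records are all invented for illustration.

```python
def cross_first(value, other_rows, match_column, value_column):
    """Return value_column from the first row in other_rows whose match_column equals value."""
    for row in other_rows:
        if row.get(match_column) == value:
            return row.get(value_column)
    return None  # no matching row in the other table

# Hypothetical second table, standing in for "TABLE-2" in the expression above.
table_2 = [
    {"catalogNumber": "A-001", "county": "Alachua"},
    {"catalogNumber": "A-002", "county": "Marion"},
]

print(cross_first("A-002", table_2, "catalogNumber", "county"))  # Marion
print(cross_first("A-999", table_2, "catalogNumber", "county"))  # None
```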

forEach(row.record.cells['COLUMN'].value,v,v).uniques().length()

forEach(value.parseJson().results[0].TARGET,x,[x.types[0], x.TARGET].join("::")).join("|")
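The `forEach(value.parseJson()...)` expression above walks a list inside the first result of a JSON response and builds one delimited string from it. A Python sketch of the same shape follows. `TARGET` is a placeholder in the original expression, so the keys `components` and `name`, the helper `summarize`, and the sample payload are hypothetical stand-ins rather than fields of any particular API.

```python
import json

def summarize(raw: str, list_key: str, field_key: str) -> str:
    """Join 'first type::field' pairs from results[0][list_key] with '|'."""
    data = json.loads(raw)
    items = data["results"][0][list_key]
    return "|".join("::".join([item["types"][0], item[field_key]]) for item in items)

# Invented sample response; real responses will have different keys and values.
sample = json.dumps({
    "results": [{
        "components": [
            {"types": ["country", "political"], "name": "Exampleland"},
            {"types": ["locality"], "name": "Sampleville"},
        ]
    }]
})

print(summarize(sample, "components", "name"))
# country::Exampleland|locality::Sampleville
```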

"https://maps.googleapis.com/maps/api/geocode/json?latlng="+value+"&key=KEY"
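The URL expression above is plain string concatenation around the cell value. As a rough Python sketch of the same construction, using only the standard library: `KEY` remains a placeholder for your own API key, the helper `geocode_url` is hypothetical, and the coordinates are made-up sample values. No request is sent.

```python
from urllib.parse import urlencode

BASE = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode_url(latlng: str, key: str = "KEY") -> str:
    """Build the reverse-geocoding URL for a 'lat,lng' string."""
    # safe="," keeps the comma literal so the result matches simple concatenation
    return BASE + "?" + urlencode({"latlng": latlng, "key": key}, safe=",")

print(geocode_url("29.65,-82.32"))
# https://maps.googleapis.com/maps/api/geocode/json?latlng=29.65,-82.32&key=KEY
```

Building the query with `urlencode` rather than `+` also escapes stray spaces or other characters in the cell value, which plain concatenation would pass through unchanged.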

Join the community

There are many audiences for OpenRefine, and the best community to join is one that aligns with your usage context and skill level. The OpenRefine Google Group is maintained by the OpenRefine project, and most messages posted there are fairly technical.