Symbiota Data Quality Toolkit: Difference between revisions

From iDigBio
(Replaced content with "Symbiota has created a Data Quality Toolkit on their [https://biokic.github.io/symbiota-docs/editor/quality/ Documentation Site].")
Tag: Replaced
 
(2 intermediate revisions by the same user not shown)
[[Category:Data Quality]]
Symbiota has created a Data Quality Toolkit on their [https://biokic.github.io/symbiota-docs/editor/quality/ Documentation Site].
[[Category:Workshop]]
 
= Overview =
 
This toolkit contains Symbiota-specific resources for the [[Data Quality Toolkit 2024]].
 
== Catalog Numbers and Other Identifiers ==
 
=== Duplicate Catalog Numbers ===
 
'''Problem:''' The same catalog number is used multiple times within your dataset. This may or may not be intentional, depending on your collection's policies, but it is generally best to avoid duplicate catalog numbers when possible.
 
'''Solution:''' Symbiota includes a built-in tool for identifying and resolving duplicate catalog numbers, described [https://biokic.github.io/symbiota-docs/coll_manager/data_cleaning/dupes/ in this tutorial]. This tool shows a list of all duplicates and allows the collection administrator to merge records if necessary.
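
For a quick check outside the portal, the same kind of list can be approximated from an exported occurrence file. The following is a minimal sketch, not the built-in tool: it assumes a CSV export with a Darwin Core <code>catalogNumber</code> column, and the function name <code>duplicate_catalog_numbers</code> and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_catalog_numbers(rows):
    """Return {catalogNumber: count} for values that occur more than once."""
    counts = Counter((row.get("catalogNumber") or "").strip() for row in rows)
    counts.pop("", None)  # blank catalog numbers are a separate problem
    return {number: n for number, n in counts.items() if n > 1}


# Tiny in-memory sample standing in for an exported occurrence file (assumed layout).
SAMPLE = """catalogNumber,scientificName
ABC-001,Quercus alba
ABC-002,Acer rubrum
ABC-001,Quercus rubra
"""

duplicates = duplicate_catalog_numbers(csv.DictReader(io.StringIO(SAMPLE)))
print(duplicates)
```

Whether a flagged pair should be merged or kept remains a decision for the collection administrator.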
 
== Dates ==
 
=== Identified Date Earlier than Collected Date ===
 
'''Problem:''' The date the specimen was identified (dateIdentified field) is earlier than the date the specimen was collected (eventDate).
 
'''How to FIND this Problem in Your Dataset:'''
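
One possible approach outside the portal is to compare the two fields in an exported occurrence file. The sketch below assumes a CSV export with <code>catalogNumber</code>, <code>eventDate</code>, and <code>dateIdentified</code> columns and compares only complete <code>YYYY-MM-DD</code> values; partial dates and date ranges are skipped rather than guessed at. The function names are hypothetical.

```python
import csv
import io
from datetime import date


def parse_full_date(value):
    """Return a date for a complete YYYY-MM-DD string, otherwise None."""
    try:
        return date.fromisoformat((value or "").strip())
    except ValueError:
        return None  # blank, partial date, range, or free text


def identified_before_collected(rows):
    """Return rows whose dateIdentified is earlier than eventDate."""
    flagged = []
    for row in rows:
        identified = parse_full_date(row.get("dateIdentified"))
        collected = parse_full_date(row.get("eventDate"))
        if identified and collected and identified < collected:
            flagged.append(row)
    return flagged


# Tiny in-memory sample standing in for an exported occurrence file (assumed layout).
SAMPLE = """catalogNumber,eventDate,dateIdentified
A1,2001-06-15,2001-07-01
A2,2001-06-15,1999-03-02
A3,2001-06,1999-03-02
"""

flagged = identified_before_collected(csv.DictReader(io.StringIO(SAMPLE)))
print([row["catalogNumber"] for row in flagged])
```

Record A3 is not flagged because its <code>eventDate</code> is incomplete; such records need manual review.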
 
'''How to FIX this Problem in your Dataset:'''
 
== Geography ==
 
=== Improperly Negated Latitudes/Longitudes ===
'''Problem:''' The sign of the latitude (decimalLatitude) or longitude (decimalLongitude) does not match the hemisphere of the given country. For example, longitudes in the contiguous U.S. should be negative.
 
'''How to FIND this Problem in Your Dataset:'''
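
One possible approach outside the portal is to check each coordinate against the signs expected for its country. The sketch below assumes a CSV export with <code>country</code>, <code>decimalLatitude</code>, and <code>decimalLongitude</code> columns. The <code>EXPECTED_SIGNS</code> table is a small hypothetical lookup that you would need to extend for your own data, and it deliberately ignores edge cases such as islands that cross the 180th meridian.

```python
import csv
import io

# Hypothetical lookup: country -> (expected latitude sign, expected longitude sign).
# Illustrative only; territories and antimeridian-crossing islands are not handled.
EXPECTED_SIGNS = {
    "United States": (1, -1),
    "Canada": (1, -1),
    "Australia": (-1, 1),
}


def sign(value):
    """Return 1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_mismatches(rows):
    """Return rows whose coordinate signs disagree with the country lookup."""
    flagged = []
    for row in rows:
        expected = EXPECTED_SIGNS.get((row.get("country") or "").strip())
        if expected is None:
            continue  # country not in the lookup: nothing to compare against
        try:
            lat = float(row["decimalLatitude"])
            lon = float(row["decimalLongitude"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric coordinates are a separate problem
        lat_bad = lat != 0 and sign(lat) != expected[0]
        lon_bad = lon != 0 and sign(lon) != expected[1]
        if lat_bad or lon_bad:
            flagged.append(row)
    return flagged


# Tiny in-memory sample standing in for an exported occurrence file (assumed layout).
SAMPLE = """catalogNumber,country,decimalLatitude,decimalLongitude
B1,United States,33.4,-112.1
B2,United States,33.4,112.1
B3,Australia,33.9,151.2
B4,Mexico,19.4,99.1
"""

flagged = sign_mismatches(csv.DictReader(io.StringIO(SAMPLE)))
print([row["catalogNumber"] for row in flagged])
```

Record B4 is not flagged only because Mexico is missing from the sample lookup, which shows why the table needs to cover every country in your dataset.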
 
'''How to FIX this Problem in your Dataset:'''
 
=== Missing Latitudes/Longitudes ===
'''Problem:''' A record has a latitude value but no longitude value, or a longitude value but no latitude value.
 
'''How to FIND this Problem in Your Dataset:'''
 
Use the [https://biokic.github.io/symbiota-docs/editor/edit/ Record Search form]. For Custom Field 1, select Decimal Latitude IS NULL. For Custom Field 2, select Decimal Longitude IS NOT NULL. Then conduct a similar search with Decimal Latitude IS NOT NULL and Decimal Longitude IS NULL.
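
The two searches above can also be approximated in a single pass over an exported occurrence file. This is a minimal sketch assuming a CSV export with <code>decimalLatitude</code> and <code>decimalLongitude</code> columns; the function name <code>orphaned_coordinates</code> is hypothetical.

```python
import csv
import io


def orphaned_coordinates(rows):
    """Return rows where exactly one of latitude/longitude is filled in."""
    flagged = []
    for row in rows:
        has_lat = bool((row.get("decimalLatitude") or "").strip())
        has_lon = bool((row.get("decimalLongitude") or "").strip())
        if has_lat != has_lon:  # one present, the other blank
            flagged.append(row)
    return flagged


# Tiny in-memory sample standing in for an exported occurrence file (assumed layout).
SAMPLE = """catalogNumber,decimalLatitude,decimalLongitude
C1,33.4,-112.1
C2,33.4,
C3,,-112.1
C4,,
"""

flagged = orphaned_coordinates(csv.DictReader(io.StringIO(SAMPLE)))
print([row["catalogNumber"] for row in flagged])
```

Records with both values blank (C4 in the sample) are not flagged, matching the two IS NULL / IS NOT NULL searches.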
 
'''How to FIX this Problem in your Dataset:'''
 
No batch fix is possible. You will need to review each record and either add the missing latitude or longitude value or remove the orphaned one.
 
=== Misspelled Geographic Unit Names ===
'''Problem:''' The geographic units (e.g., country, state, county) are misspelled, resulting in poor matching of geographic unit names to existing geographic lists.
 
'''How to FIND this Problem in Your Dataset:'''
 
Use the [https://biokic.github.io/symbiota-docs/coll_manager/data_cleaning/geography/ Geography Cleaning Tools].
 
'''How to FIX this Problem in your Dataset:'''
 
Use the [https://biokic.github.io/symbiota-docs/coll_manager/data_cleaning/geography/ Geography Cleaning Tools].
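
To preview likely misspellings before opening the cleaning tools, geographic names can be compared against a reference list with approximate string matching. The sketch below is not how the Geography Cleaning Tools work internally; it uses Python's standard <code>difflib</code> module, and the reference list, the <code>cutoff</code> value, and the function name <code>suggest_corrections</code> are all assumptions for illustration.

```python
import difflib


def suggest_corrections(names, reference, cutoff=0.8):
    """Map each unrecognized name to its closest reference name, if any."""
    suggestions = {}
    for name in set(names):
        if not name or name in reference:
            continue  # blank or already an exact match
        matches = difflib.get_close_matches(name, reference, n=1, cutoff=cutoff)
        if matches:
            suggestions[name] = matches[0]
    return suggestions


# Hypothetical reference list and values as they might appear in a state/province column.
REFERENCE = ["Arizona", "California", "Nevada"]
VALUES = ["Arizona", "Arizna", "Californa", "Sonora"]

suggestions = suggest_corrections(VALUES, REFERENCE)
print(suggestions)
```

Names with no close match (such as "Sonora" against this short list) are left out of the result; they may be valid units missing from the reference list rather than misspellings.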
 
== Taxonomy ==
 
=== Misspelled Taxonomic Names ===
'''Problem:''' Scientific names are misspelled, resulting in poor matching of taxonomic names to taxonomic databases.
 
'''How to FIND this Problem in Your Dataset:'''
 
Use the [https://biokic.github.io/symbiota-docs/coll_manager/data_cleaning/taxonomy/ Taxonomic Cleaning Tool].
 
'''How to FIX this Problem in your Dataset:'''
 
Use the [https://biokic.github.io/symbiota-docs/coll_manager/data_cleaning/taxonomy/ Taxonomic Cleaning Tool].

Latest revision as of 13:34, 21 March 2024
