Text Transcription Issues

From iDigBio
== About Standards for Transcribing Text ==
*Content here begins with resources compiled by Jason Best (thank you, Jason) in an email sent to the AOCR working group on 19 December 2012.


*In our last meeting (18 Dec 2012) we discussed some of the challenges of transcribing text with corrections, alterations, strikeouts, ambiguous letters, etc., and I [Jason Best] briefly mentioned some transcription projects that have dealt with similar issues. A hackathon participant, Ben Brumfield, has much more experience with this topic, so first I'll point you to some information he has compiled. His blog home page (http://manuscripttranscription.blogspot.com) currently has a transcript of his talk on the variety of formats that various projects are using. It is a worthwhile read.


*If we decide to transcribe or preserve ambiguous, corrected, or struck-out characters, then the Text Encoding Initiative (TEI) format might be a good start, though it would require the use of XML elements in angle brackets. A lighter-weight approach might be to use one of the plain-text markup formats common on wikis, such as:
**Markdown (http://daringfireball.net/projects/markdown/syntax) or
**Textile (http://txstyle.org).
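To make the trade-off between the two approaches concrete, here is a minimal sketch in Python of how a lightweight notation could be mapped onto TEI afterwards. The markers `~~…~~`, `++…++`, and `??…??` are hypothetical conventions invented for this illustration and are not part of Markdown, Textile, or any project listed here. The element names `del`, `add`, and `unclear` are TEI P5 elements for deletions, additions, and uncertain readings. Nested or overlapping markers are not handled.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical lightweight markers (an assumption for this sketch),
# each mapped to the name of a TEI P5 element.
MARKERS = [
    ("~~", "del"),      # text struck out on the label
    ("++", "add"),      # text inserted by the original writer
    ("??", "unclear"),  # reading the transcriber is unsure of
]


def to_tei(line: str) -> str:
    """Convert one transcribed line from the ad hoc markers to TEI-style XML."""
    # Escape &, <, > first so the transcription itself cannot break the XML.
    out = escape(line)
    for marker, element in MARKERS:
        # Non-greedy match of the text between a pair of identical markers.
        pattern = re.escape(marker) + r"(.+?)" + re.escape(marker)
        out = re.sub(pattern, rf"<{element}>\1</{element}>", out)
    return out


print(to_tei("Coll. ~~June~~ ++July++ 12, 18??9?? by J. Smith & Co."))
```

Run as written, this prints `Coll. <del>June</del> <add>July</add> 12, 18<unclear>9</unclear> by J. Smith &amp; Co.`. With a mapping like this, transcribers would type only short plain-text markers, and the angle-bracket XML would be generated later.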


*Below I've listed some projects that either establish transcription or markup standards or have published guidelines or suggestions about how to transcribe text.
**TEI elements for representing primary documents (in particular, errors, corrections, alterations, ambiguity, etc) - http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PHCH  
**FreeReg, register transcription - http://www.freereg.org.uk/howto/transcribe.htm  
**Transcribe Bentham Guidelines (seems to be based on TEI) - http://www.transcribe-bentham.da.ulcc.ac.uk/td/Help:Transcription_Guidelines
**New York Public Library Menu transcription guidelines - http://menus.nypl.org/help
**National Archives Transcription tips - http://transcribe.archives.gov/tips
**Leiden+ notation, used by classicists to mark damage and unclear readings in Greek papyri - http://papyri.info/editor/documentation?docotype=text (In use since the mid-1930s; updated and translated to TEI by the Integrating Digital Papyrology group.)


*Projects that might have additional approaches to transcription
**http://scripto.org
**http://www.uscript.org
**http://transcriptorium.eu
**http://t-pen.org

Back to the [[2013 AOCR Hackathon Wiki]]

Latest revision as of 16:31, 17 January 2013
