ObTIC is continuing its digital workshops for the year 2021-2022 with a series of thematic sessions on digital humanities and new technologies for textual analysis (natural language processing, machine learning, corpus analysis...).
Each of these introductory workshops will consist of a lecture followed by a practical session.
First thematic series:
Transkribus, Kraken, eScriptorium and Tesseract OCR systems
Abstract: In projects that produce scholarly editions of texts through the lens of digital humanities, text digitization is the very first step in the corpus processing chain. In this session, we will present three open source optical character recognition (OCR) tools, Transkribus, Kraken and eScriptorium, which are already considered state of the art in the field. To understand the specificities of each system, we will run them on an example corpus and then evaluate the quality of their outputs. Finally, we will see how to train a new model for a corpus whose text is poorly recognized.
Workshop on October 28, 2021 (2-5pm):
Introduction to and use of the OCR tools Transkribus, Kraken and eScriptorium
Led by: Ljudmila Petković, ObTIC PhD student
Workshop on November 18, 2021 (2-5pm):
Advanced OCR usage with Tesseract
Led by: Johanna Cordova, ObTIC engineer
Workshop on November 25, 2021 (2-5pm):
Automatic Correction of OCR Output
Led by: Ljudmila Petković and Angélique Allaire, ObTIC PhD students
The Observatoire des textes, des idées et des corpus (ObTIC), SCAI's project team dedicated to digital humanities, draws on established expertise in digital editions and in data exploration and production for humanities research. ObTIC team members are actively involved in designing and applying software and algorithms for humanities researchers, and in developing and evaluating new digital research methodologies across these same fields of inquiry.
ObTIC’s distinctive approach: opening up traditional humanities disciplines to digital methodologies by approaching texts and corpora through cross-cutting concepts independent of any particular textual genre.