SCAI x Datacraft - Image annotation applied to Egyptology: classification of ancient texts by scribe

Research event. April 2021, 12:00 to 14:00, Paris.

Workshop co-hosted by Chloé Ragazzoli, Egyptologist and HDR lecturer at Sorbonne University, and Amir Nakib, Head of AI research & CTO at Vinci Autoroutes, accompanied by Nadiya Shvai, Senior data scientist at Vinci Autoroutes

One of the tasks of Egyptologists is to examine individual writing styles in order to bring together different documents written by the hand of the same scribe. Algorithms based on artificial intelligence could bring new results in character recognition and the decoding of ancient scripts. This introductory workshop will be followed by a second workshop (6 May), which will implement the most relevant approaches discussed during this first session.

On the agenda:

Presentation of the dataset
Formulation of the problem
Presentation of the annotation choices
Discussions on pattern detection and classification approaches

Introduction to the topic:
Since 2019, the French Institute of Oriental Archaeology in Cairo and Sorbonne University have been jointly conducting a research programme (ÉCRITURES - Pour une archéologie et une anthropologie des écritures de l'Égypte ancienne) to better understand the uses of the different Egyptian scripts and the actors involved. The texts of everyday life are written in hieratic, a cursive script derived from hieroglyphs.

The tools of palaeography make it possible to compare the shapes of the same sign across documents in order to recognise texts that could have been written by the same person. But many characteristics must be taken into account: the general shape of the sign, the number of strokes, the size, the dynamism of the writing, the layout, the regularity. The corpus considered comes from the Ramesside period (c. 1295-1069 BC), in the Egyptian New Kingdom. It includes ostraca from Deir el-Medina, the village of Pharaoh's craftsmen, and papyri from various scribal libraries of the period. The corpus has been annotated for classification by scribe and for detection based on two frequent signs.

Algorithms based on artificial intelligence could bring new results in character recognition and the decoding of ancient scripts. However, applying them raises a number of problems that remain unsolved in the discipline: learning from a very limited volume of data, designing efficient architectures, limiting overfitting while maintaining high performance, recognising non-alphabetic scripts, coping with the volume of annotations required, and clustering.
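To make the small-data problem listed above more concrete, here is a minimal Python sketch of one common way to evaluate a classifier when only a handful of annotated examples exist: leave-one-out validation of a nearest-centroid classifier. It is not the method used in the workshop. The feature vectors (stroke count, relative height, regularity score), the scribe labels and the function names are all hypothetical and chosen only for illustration.

```python
from math import dist
from statistics import mean

# Hypothetical toy data: (stroke_count, relative_height, regularity) for one
# sign instance, labelled with an invented scribe id. Not taken from the corpus.
SAMPLES = [
    ((3.0, 1.0, 0.9), "scribe_A"),
    ((3.0, 1.1, 0.8), "scribe_A"),
    ((3.0, 0.9, 0.85), "scribe_A"),
    ((5.0, 1.6, 0.4), "scribe_B"),
    ((5.0, 1.5, 0.5), "scribe_B"),
    ((4.0, 1.7, 0.45), "scribe_B"),
]


def centroids(samples):
    """Average the feature vectors of each scribe, column by column."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(features, samples):
    """Return the scribe whose centroid is closest in Euclidean distance."""
    cents = centroids(samples)
    return min(cents, key=lambda label: dist(features, cents[label]))


def leave_one_out_accuracy(samples):
    """Hold out each sample once, fit on the rest, and score the prediction."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        hits += predict(features, training) == label
    return hits / len(samples)


if __name__ == "__main__":
    print(leave_one_out_accuracy(SAMPLES))
```

Because every sample is held out exactly once, all annotated examples contribute to both training and evaluation, which is useful when annotations are scarce. A score computed this way on real data would still need to be read with care, since signs taken from the same document are not independent of one another.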