ELA - Eurasian Latin Archive is a platform under construction. It aims to host an open-access library of Latin and multilingual texts concerning East Asia from the medieval and early modern periods. The platform includes tools for investigating the linguistic and semantic aspects of the documents. The start-up phase (March 2018 to February 2020) has been co-financed by Regione Toscana within DASMeMo (Data-mining e analisi statistica su fonti testuali storiche del periodo medieval e moderno, "Data mining and statistical analysis of historical textual sources of the medieval and modern period"). This project involves the Department of Philology and Literary Criticism of the University of Siena, together with its Center for Comparative Studies, and the IT company QuestIT, which specializes in Artificial Intelligence and Machine Learning. This complex and demanding project also offers exciting opportunities from the perspective of digital humanities, because it prompts reflection on methodological issues and the search for solutions across a wide range of topics. The challenges begin with defining the corpus. They continue through the digitization and transcription of large amounts of text and their encoding, and extend to the development of text analysis tools and the automatic extraction of semantic information with Natural Language Processing methods. One of the most interesting test beds of the project is the treatment of multilingual texts, which we are currently addressing with excerpts from Intorcetta’s Sapientia Sinica. This paper introduces the project and explains the requirements analysis and the general architecture. It also gives the reasons behind some technical and methodological choices and describes the tasks planned for the medium and long term. The paper also presents the first prototype of the platform, available at http://ela-unisi.it. In the prototype, documents are freely searchable through an Elasticsearch-based search engine developed by Dr. Nicola Giannelli (QuestIT).
All texts are being encoded in XML/TEI following the guidelines adopted by the ALIM Project (alim.unisi.it), and the encoding also includes our first experiments with named-entity recognition.
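The abstract does not specify how recognized entities are encoded. As a hedged illustration only, the Python sketch below assumes the following encoding conventions. Entities are marked with the standard TEI elements `persName` and `placeName` and are linked to an authority entry through `@ref`. The language of each passage is given by `xml:lang`. The actual ALIM/ELA guidelines may use different elements or attributes. The sample fragment, the `#confucius`/`#lu` identifiers and the helper names `extract_entities` and `count_refs` are all hypothetical. The sketch collects each entity in document order together with its inherited language.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI P5 namespace and the predefined XML namespace used by xml:lang.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# ASSUMPTION: entities are tagged with TEI <persName> and <placeName>;
# the real ELA/ALIM encoding may use other elements or attributes.
ENTITY_TAGS = {
    f"{{{TEI_NS}}}persName": "persName",
    f"{{{TEI_NS}}}placeName": "placeName",
}

# Invented sample fragment (NOT a quotation from Sapientia Sinica).
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text>
    <body>
      <p xml:lang="la"><persName ref="#confucius">Confucius</persName> in regno
        <placeName ref="#lu">Lu</placeName> natus est.</p>
      <p xml:lang="zh"><persName ref="#confucius">孔子</persName></p>
    </body>
  </text>
</TEI>"""


def extract_entities(tei_xml: str) -> list[tuple[str, str, str | None, str | None]]:
    """Return (entity type, surface form, @ref, inherited xml:lang) in document order."""
    root = ET.fromstring(tei_xml)
    found: list[tuple[str, str, str | None, str | None]] = []

    def walk(el: ET.Element, lang: str | None) -> None:
        lang = el.get(XML_LANG, lang)  # xml:lang is inherited by descendants
        kind = ENTITY_TAGS.get(el.tag)
        if kind is not None:
            surface = "".join(el.itertext()).strip()
            found.append((kind, surface, el.get("ref"), lang))
        for child in el:
            walk(child, lang)

    walk(root, None)
    return found


def count_refs(entities) -> Counter:
    """Count mentions per @ref, so forms in different languages are merged."""
    return Counter(ref for _, _, ref, _ in entities if ref is not None)


if __name__ == "__main__":
    ents = extract_entities(SAMPLE)
    for ent in ents:
        print(ent)
    print(count_refs(ents))
```

The sketch groups mentions by `@ref` rather than by surface form, so the Latin "Confucius" and the Chinese "孔子" count as the same entity. This is one plausible way to handle the multilingual material described above, not necessarily the approach ELA takes.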

Carbé, E. (2020). Eurasian Latin Archive. ITINERARIA, 19, 247-261.

Eurasian Latin Archive

Carbé, Emmanuela
2020-01-01


Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1123837