Eurasian Latin Archive

IRIS

ELA - Eurasian Latin Archive is a platform under construction that aims to host an open-access library of Latin and multilingual texts from the medieval and early modern periods concerning East Asia. The platform includes tools for investigating the linguistic and semantic aspects of the documents. The start-up phase (March 2018-February 2020) was co-financed by Regione Toscana within DASMeMo (Data-mining e analisi statistica su fonti testuali storiche del periodo medieval e moderno), a project that involves the Department of Philology and Literary Criticism of the University of Siena, along with its Center for Comparative Studies, and the IT company QuestIT, which specializes in Artificial Intelligence and Machine Learning. This complex and demanding project also offers opportunities from the point of view of digital humanities studies: it allows us to reflect on methodological issues and to seek solutions on a wide range of topics. The challenges start with the definition of the corpus and continue through the digitization and transcription of large amounts of text, the encoding, the development of text analysis tools, and the automatic extraction of semantic information with Natural Language Processing methods. One of the most interesting test benches of the project concerns the treatment of multilingual texts, on which we are currently working using some excerpts from Intorcetta’s Sapientia Sinica. The aim of this paper is to provide an introduction to the project, explaining the requirements analysis and the general architecture, the reasons for some technical and methodological choices, and the tasks planned for the medium and long term. The paper also presents the first prototype of the platform, available at the URL http://ela-unisi.it. In the prototype, documents are freely searchable by means of an Elasticsearch-based search engine developed by Dr. Nicola Giannelli (QuestIT). All texts are being encoded in XML/TEI, following the guidelines adopted by the ALIM Project (alim.unisi.it), and also include our first experiments in named-entity recognition.

Carbe', E. (2020). Eurasian Latin Archive. ITINERARIA, 19, 247-261.

Carbe', Emmanuela

2020-01-01

	Year
	
				2020
			
	Journal in which the work is published
	
				ITINERARIA
			
	Citation
	
				Carbe', E. (2020). Eurasian Latin Archive. ITINERARIA, 19, 247-261.
			
	Appears in categories:
	
				1.1 Journal article

Files in this item:

File	Size	Format
2020_carbe_itineraria.pdf (not available; Type: publisher's PDF; License: NOT PUBLIC - private/restricted access)	372.64 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1123837