2017
Lexos is a browser-based suite of tools that helps lower the barriers to entry to computational text analysis for humanities scholars and students. Situated within a clean and simple interface, Lexos consolidates the common pre-processing operations needed for subsequent analysis, whether with Lexos itself or with external tools. It is especially useful for scholars who wish to engage in research involving computational text analysis, or to teach their students how to do so, but who lack the time to prepare texts manually, the skill sets needed to prepare their texts for analysis, or the intellectual contexts for situating computational methods within their work. Lexos is also targeted at researchers studying early texts and texts in non-Western languages, which may require specialized processing rules. It is thus designed to facilitate advanced research in these fields even for users less familiar with computational techniques. Lexos is developed by the Lexomics research group led by Mi...
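The pre-processing ("scrubbing") step described above can be pictured with a short sketch. This is not Lexos's actual implementation, only an illustration of the kind of operations such a tool consolidates; the sample text and the consolidation map are invented for the example:

```python
import re

def scrub(text, consolidations=None):
    """Illustrative scrubbing pass: lowercase, strip punctuation and digits,
    then apply an optional map that consolidates spelling variants."""
    text = text.lower()
    text = re.sub(r"[^\w\s]|\d", " ", text)  # punctuation/digits -> spaces
    tokens = text.split()
    if consolidations:
        tokens = [consolidations.get(t, t) for t in tokens]
    return tokens

# Hypothetical input and consolidation map, purely for illustration:
tokens = scrub("Hwaet! We Gar-Dena, in gear-dagum...",
               consolidations={"hwaet": "hwæt"})
# tokens == ['hwæt', 'we', 'gar', 'dena', 'in', 'gear', 'dagum']
```

Rules like the consolidation map are where specialized handling of early or non-Western texts would plug in.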
2017
This project hybridizes traditional humanistic approaches to textual scholarship, such as source study and the analysis of style, with advanced computational and statistical comparative methods, allowing scholars "deep access" to digitized texts and textual corpora. Our multi-disciplinary collaboration enables us to discover patterns in (and between) texts previously invisible to traditional methods. Going forward, we will build on the success of our previous Digital Humanities Start-up Grant by further developing tools and documentation (in an open, on-line community) for applying advanced statistical methodologies to textual and literary problems. At the same time we will demonstrate the value of the approach by applying the tools and methods to texts from a variety of languages and time periods, including Old English, medieval Latin, and Modern English works from the twentieth-century Harlem Renaissance.
The Journal of Medieval Latin, 2014
In this paper we demonstrate how "lexomic" methods of computer-assisted statistical analysis can be adapted to the investigation of the structures, styles, sources, and authorship of Medieval Latin texts. The methods, which compare the vocabulary distributions of segments of texts, are first shown to produce results consistent with "control" texts whose characteristics have been determined by more traditional approaches. These controls include Waltharius, the Vita sancti Martini of Sulpicius Severus, Alan of Lille's De planctu naturae, the Vita Merlini by Geoffrey of Monmouth, and the Gesta Friderici Imperatoris, and thus represent both poetry and prose from multiple genres, centuries, and geographical locations. After successfully testing the methods against the controls, we demonstrate how they may be applied to investigate texts whose characteristics are unknown or disputed. Our analysis of Dante's Epistolae provides evidence that paragraphs 5-33 of the letter to Can Grande are not by Dante, a point disputed in Dante scholarship. We then examine Bede's Historia Ecclesiastica and discover evidence that strongly suggests that Ceolfrid, Bede's one-time abbot, was the author not only of the letter to King Nechtan that Bede quotes, but also of a computus-focused account of the Synod at Whitby that appears in Book 3, chapter 25, of the Ecclesiastical History. We thus demonstrate that lexomic methods, when used in conjunction with traditional forms of analysis, are useful for the investigation of Medieval Latin texts.
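The core lexomic operation, comparing the vocabulary distributions of fixed-size segments of a text, can be sketched in a few lines. The segment size, the toy "text", and the plain Euclidean distance below are illustrative choices only; actual lexomic studies choose segment sizes carefully and feed the pairwise distances into hierarchical clustering to produce dendrograms:

```python
from collections import Counter
from math import sqrt

def segment(tokens, size):
    """Split a token list into consecutive segments of `size` words."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]

def freq_vector(seg, vocab):
    """Relative frequency of each vocabulary word within one segment."""
    counts = Counter(seg)
    n = len(seg)
    return [counts[w] / n for w in vocab]

def distance(u, v):
    """Euclidean distance between two frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# A toy "text" whose first and second halves have disjoint vocabularies:
tokens = ("arma virumque cano " * 50 + "gallia est omnis divisa " * 50).split()
segs = segment(tokens, 100)
vocab = sorted(set(tokens))
vecs = [freq_vector(s, vocab) for s in segs]

# Segments drawn from the same part of the text lie closer together:
d_within = distance(vecs[0], vecs[1])    # adjacent, overlapping vocabulary
d_between = distance(vecs[0], vecs[-1])  # opposite ends, disjoint vocabulary
```

A shift in vocabulary distribution between segments, visible as a large distance, is the kind of signal used to flag a change of source, style, or author.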
Zenodo (CERN European Organization for Nuclear Research), 2023
CARMEN Working Papers 3, 2022
Research in the humanities is, of course, contemporary in its questions and in its use of technologies. It is our desire as well as our duty to make use of the facilities of our age, to offer keys to the past, and to meet the needs and interests of the users. We thus saw the need to revise our methodologies and to apply an innovative digital approach to the study of Latin epigraphic verse. This approach has allowed us to make our studies and editions as a whole available to the research community and a broader public interested in Roman history and poetry. At the moment, we focus on epigraphic poetry from Hispania, Britannia, and partly from Gaul, although we have an eye on gradually completing a digital corpus of inscriptions in verse encompassing the Roman world as a whole.
2019
This paper presents a combination of R packages—user-contributed toolkits written in a common core programming language—to facilitate the humanistic investigation of digitised, text-based corpora. Our survey of text analysis packages includes those of our own creation (cleanNLP and fasttextM) as well as packages built by other research groups (stringi, readtext, hyphenatr, quanteda, and hunspell). By operating on generic object types, these packages unite research innovations in corpus linguistics, natural language processing, machine learning, statistics, and digital humanities. We begin by expanding on the theoretical benefits of R as an elaborate gluing language for bringing together several areas of expertise and compare it to linguistic concordancers and other tool-based approaches to text analysis in the digital humanities. We then showcase the practical benefits of an ecosystem by illustrating how R packages have been integrated into a digital humanities project. Throughou...
Project Deliverable, 2022
This landscape review focuses on intellectual access, i.e. providing guidance for finding and sharing literary data, while D6.1 approaches the task from a more technological side, collecting and analyzing literary corpora, available formats, tools, and metadata in order to create an exploratory catalogue/inventory of literary corpora and to provide a transformation matrix/toolbox for solving common issues. The two efforts are coordinated, beginning with the compilation of the table of literary collections, and can be regarded as two sides of the same coin. The review's point of departure is the abundance of existing data and their diversity or heterogeneity as regards corpus design and underlying concepts: for example, the definition of text (is it a source, an edition, a data set? see chapter 3); the purpose of a corpus (e.g. general, reference, or monitoring corpora, or special-purpose corpora; see chapter 4); and central considerations or criteria regarding the construction of a corpus (sampling, balancing, representativeness, annotation model(s), data format(s); see likewise chapter 4). How can one obtain data without transgressing ethical or legal boundaries (see chapter 5)? We ask: how can we assist literary scholars in searching for and finding existing data relevant to their own research questions? And, additionally, what kinds of research question are relevant given the present-day state of the data landscape, literariness, and textuality?
New Methods in Historical Corpus Linguistics, 2013
Tools for historical corpus research, and a corpus of Latin. We present LatinISE, a Latin corpus for the Sketch Engine. LatinISE consists of Latin works comprising a total of 13 million words, covering the time span from the 2nd century BC to the 21st century AD. LatinISE is provided with rich metadata markup, including author, title, genre, era, date, and century, as well as book, section, paragraph, and line of verse. We have automatically annotated LatinISE with lemma and part-of-speech information. The annotation enables users to search the corpus by a number of criteria, ranging from lemma, part of speech, and context to subcorpora defined chronologically or by genre. We also illustrate word sketches: one-page summaries of a word's corpus-based collocational behaviour. Our future plan is to produce word sketches for Latin words by adding richer morphological and syntactic annotation to the corpus.
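A corpus annotated with lemma and part-of-speech layers supports exactly the kind of multi-criteria search described above. The sketch below uses an invented six-token sample and illustrative tags; it is not LatinISE's actual data model or the Sketch Engine's query language:

```python
# Toy annotated corpus: (surface form, lemma, POS) tuples, standing in
# for a lemma/POS annotation layer. Tags here are illustrative only.
corpus = [
    ("puella",    "puella", "NOUN"),
    ("cantat",    "canto",  "VERB"),
    ("puellae",   "puella", "NOUN"),
    ("cantabant", "canto",  "VERB"),
    ("in",        "in",     "ADP"),
    ("horto",     "hortus", "NOUN"),
]

def query(lemma=None, pos=None):
    """Return all surface forms matching the given lemma and/or POS tag."""
    return [form for (form, lem, tag) in corpus
            if (lemma is None or lem == lemma)
            and (pos is None or tag == pos)]

forms = query(lemma="canto")  # all inflected forms of the lemma 'canto'
nouns = query(pos="NOUN")     # all tokens tagged as nouns
```

Searching by lemma rather than surface form is what makes such queries useful for a highly inflected language like Latin, where one lemma can surface in dozens of forms.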
2018
The design of LitText follows the traditional research approach in the digital humanities (DH): collecting texts for critical reading and underlining passages of interest. Texts, in multiple languages, are prepared with a minimal markup language and processed by NLP services. The result is converted to RDF (i.e. Semantic Web, linked-data) triples. Additional data available as linked data on the web (e.g. Wikipedia data) can be added. The DH researcher can then harvest the corpus with SPARQL queries. The approach is demonstrated with the construction of a 20-million-word corpus from English, German, Spanish, French, and Italian texts, and an example query that identifies texts in which animals behave like humans, as is the case in fables.
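The triple-plus-query workflow can be miniaturized as follows. The example uses plain Python tuples and invented URIs as a stand-in for a real RDF store and SPARQL engine; in practice one would load the generated triples into a library such as rdflib and run actual SPARQL against them:

```python
# Minimal sketch: text annotations as subject-predicate-object triples,
# queried by pattern matching (a toy stand-in for SPARQL over RDF).
# All prefixes and URIs below are hypothetical.
triples = {
    ("fable:Fox",    "rdf:type",     "lit:AnimalCharacter"),
    ("fable:Fox",    "lit:performs", "act:Speaking"),
    ("fable:Crow",   "rdf:type",     "lit:AnimalCharacter"),
    ("fable:Crow",   "lit:performs", "act:Singing"),
    ("act:Speaking", "rdf:type",     "lit:HumanBehaviour"),
}

def match(pattern):
    """Return all triples matching one pattern; None marks a variable."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which animal characters perform a behaviour typed as human?"
human_acts = {t[0] for t in match((None, "rdf:type", "lit:HumanBehaviour"))}
animals = {t[0] for t in match((None, "rdf:type", "lit:AnimalCharacter"))}
anthropomorphic = {s for (s, p, o) in match((None, "lit:performs", None))
                   if s in animals and o in human_acts}
# anthropomorphic == {"fable:Fox"}
```

The join across three patterns (type, behaviour, behaviour-type) is the same shape as the fable query sketched in the abstract, and is what SPARQL's basic graph patterns express declaratively.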