Automatic Spanish text complexity detection in financial documents with low data availability

Romero-Sandoval, Mario Alberto

Full text (PDF)

TF10078_BIB314399_Mario_Alberto_Romero-Sandoval.pdf (676.4 KB)

Date

2024-05-29



Abstract

Access to information is a fundamental human right in modern society. Nevertheless, not everyone has equal access to information, and one reason is that we do not all understand texts in the same way. Education level, age, disabilities, and cultural context can affect how a text is read and understood by the public. The ability to distinguish between complex and simple text segments has many applications: improving the efficiency of text simplification systems, helping educational applications determine whether a text is appropriate for a given student's level, and monitoring whether institutions communicate their decisions properly to the public. In this work, we explore different methods and techniques for classifying Spanish text by complexity, as well as methods to address the general lack of data for the task of Spanish text complexity discrimination. Specifically, we focus on leveraging existing language models and transfer learning, and on using synthetic data generation to augment the available data and measure the impact of this augmentation on text complexity discrimination.

Description

Graduation Project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2024.

URI

https://hdl.handle.net/2238/16405

Collections

Maestría en Computación (Master's in Computing)