Automatic Spanish text complexity detection in financial documents with low data availability
Abstract
Access to information is a fundamental human right in modern society. Nevertheless, not everyone has equal access to information, and one reason is that we do not all understand everything in the same way. Education level, age, disabilities, and cultural context may affect how a text is read and understood by the public. Being able to discriminate between complex and simple segments of text has many applications: improving the efficiency of simplification systems, supporting educational applications that determine whether a text is appropriate for a given student level, and monitoring whether institutions are communicating their decisions properly to the public. In this work, we explore different methods and techniques for classifying text by complexity, specifically Spanish text, as well as methods to address the general lack of data for the task of Spanish text complexity discrimination. Specifically, we focus on leveraging existing language models and transfer learning, and on measuring the impact of augmenting the data through synthetic data generation, for the problem of text complexity discrimination.
Description
Graduation Project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2024.
Collections
- Maestría en Computación [113]

