Automatic Spanish text complexity detection in financial documents with low data availability
Date
2024-05-29
Author
Romero-Sandoval, Mario Alberto
Abstract
Access to information is a fundamental human right in modern society. Nevertheless, not everyone has equal access to information, and one reason is that we do not all understand texts in the same way. Education level, age, disabilities, and cultural context can affect how the public reads and understands a text. The ability to discriminate between complex and simple text segments has many applications: improving the efficiency of text simplification systems, supporting educational applications that determine whether a text is appropriate for a given student's level, and monitoring whether institutions communicate their decisions properly to the public. In this work, we explore different methods and techniques for classifying text by complexity, specifically Spanish text. We also explore methods to address the general lack of data for the task of Spanish text complexity discrimination. In particular, we focus on leveraging existing language models and transfer learning. We also measure the impact of data augmented through synthetic data generation on the text complexity discrimination problem.
Description
Graduation project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2024.
Collections
- Master's in Computing

