Data quality metrics for unlabelled datasets in medical imaging

dc.contributor.advisor | Calderón-Ramírez, Saúl | es
dc.contributor.author | Díaz-Villaplana, Ana Catalina
dc.date.accessioned | 2026-03-18T17:06:35Z
dc.date.available | 2026-03-18T17:06:35Z
dc.date.issued | 2024-07
dc.description | Graduation project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2024. | es
dc.description.abstract | Deep learning models typically require large labeled datasets to perform well. However, in real-world applications such as medical imaging, labeled data can be scarce. Semi-supervised deep learning addresses this challenge by leveraging both labeled and unlabeled data to improve model accuracy. Most semi-supervised methods assume that the labeled and unlabeled datasets follow similar distributions, an assumption that may not hold in practice. To assess data quality and consistency, we introduce Mahalanobis-based and Frobenius-based distance measures, computed in the embedding space of the deep learning model, that evaluate the similarity between the labeled and unlabeled datasets. Our findings reveal that the Mahalanobis-based distance correlates strongly with the accuracy of the popular semi-supervised method MixMatch, whereas the Frobenius-based distance behaves inconsistently. Moreover, the proposed approach is significantly more efficient than existing methods in the field. | es
dc.identifier.uri | https://hdl.handle.net/2238/16498
dc.language.iso | eng | es
dc.publisher | Instituto Tecnológico de Costa Rica | es
dc.rights | open access | es
dc.subject | Métrica -- Calidad de datos | es
dc.subject | Conjuntos de datos | es
dc.subject | Imágenes médicas digitales | es
dc.subject | Aprendizaje profundo (Aprendizaje automático) | es
dc.subject | Datos etiquetados | es
dc.subject | Imágenes -- Radiografía | es
dc.subject | Medición -- Tiempo -- Procesamiento | es
dc.subject | Rayos X -- Imágenes | es
dc.subject | Metrics -- Data quality | es
dc.subject | Datasets | es
dc.subject | Digital medical images | es
dc.subject | Deep learning (Machine learning) | es
dc.subject | Labeled data | es
dc.subject | Images -- Radiography | es
dc.subject | Measurement -- Time -- Processing | es
dc.subject | X-rays -- Images | es
dc.subject | Research Subject Categories::TECHNOLOGY::Information technology::Computer science | es
dc.title | Data quality metrics for unlabelled datasets in medical imaging | es
dc.type | master's thesis | es
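
The abstract does not spell out how the two distances are defined. Purely as an illustration, the following Python/NumPy sketch assumes that each dataset has already been passed through a trained model and reduced to an embedding matrix (one row per image), and it uses one plausible reading of each measure: the mean Mahalanobis distance of the unlabeled embeddings to the mean and covariance of the labeled embeddings, and the Frobenius norm of the difference between the two covariance matrices. The function names and the `eps` regularizer are hypothetical; this is not the thesis implementation.

```python
import numpy as np


def _covariance(x: np.ndarray) -> np.ndarray:
    """Sample covariance of row-wise embeddings, always returned as a (d, d) matrix."""
    return np.atleast_2d(np.cov(x, rowvar=False))


def mahalanobis_dataset_distance(labeled: np.ndarray, unlabeled: np.ndarray,
                                 eps: float = 1e-6) -> float:
    """Mean Mahalanobis distance from each unlabeled embedding to the labeled set.

    Assumption (not taken from the thesis): the labeled embeddings are summarized
    by their sample mean and covariance; eps * I keeps the covariance invertible.
    """
    mu = labeled.mean(axis=0)
    cov = _covariance(labeled) + eps * np.eye(labeled.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = unlabeled - mu  # shape (m, d)
    # Squared distance per sample: diff_i^T @ cov_inv @ diff_i
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return float(np.sqrt(np.maximum(sq, 0.0)).mean())


def frobenius_dataset_distance(labeled: np.ndarray, unlabeled: np.ndarray) -> float:
    """Frobenius norm of the difference between the two covariance matrices.

    Assumption (not taken from the thesis): only second-order statistics are
    compared, so a pure shift of the mean is invisible to this measure.
    """
    return float(np.linalg.norm(_covariance(labeled) - _covariance(unlabeled), ord="fro"))


if __name__ == "__main__":
    # Synthetic stand-ins for model embeddings (rows = images, columns = features).
    rng = np.random.default_rng(0)
    labeled = rng.normal(size=(500, 8))
    similar = rng.normal(size=(500, 8))            # same distribution
    shifted = rng.normal(loc=2.0, size=(500, 8))   # mean shift only
    scaled = rng.normal(scale=2.0, size=(500, 8))  # variance change only
    for name, u in [("similar", similar), ("shifted", shifted), ("scaled", scaled)]:
        print(name,
              round(mahalanobis_dataset_distance(labeled, u), 3),
              round(frobenius_dataset_distance(labeled, u), 3))
```

In this synthetic setup, the mean-shifted set raises the Mahalanobis sketch well above the same-distribution set, while the Frobenius sketch stays near its baseline because a mean shift does not change the covariance; the rescaled set is what moves the Frobenius value. Nothing here is meant to reproduce the results reported in the thesis.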

Files

Original bundle

Name: TF10140_BIB314988_Ana_Catalina_Diaz-Villaplana.pdf
Size: 1.16 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.77 KB
Format: Item-specific license agreed to upon submission