Data quality metrics for unlabelled datasets in medical imaging

Díaz-Villaplana, Ana Catalina

Date

2024-07

Abstract

Deep learning models typically require large, labeled datasets for optimal performance. However, in real-world applications such as medical imaging, labeled data can be scarce. Semi-supervised deep learning addresses this challenge by leveraging both labeled and unlabeled data to enhance model accuracy. Most semi-supervised methods assume similar distributions between labeled and unlabeled datasets, an assumption that may not hold in practice. To ensure data quality and consistency, we introduce Mahalanobis-based and Frobenius-based distance measures in the embedding space of the deep learning model to evaluate the similarity between labeled and unlabeled datasets. Our findings reveal that the Mahalanobis-based distance correlates strongly with the accuracy of the popular semi-supervised method MixMatch, whereas Frobenius distance results show inconsistent behavior. Moreover, the proposed approach is significantly more efficient than existing methods in the field.
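
As a rough illustration of the two kinds of measure named in the abstract, the sketch below compares a labeled and an unlabeled set of embedding vectors. The abstract does not give the exact formulations, so everything here is an assumption: the function names are hypothetical, the Mahalanobis variant measures how far the unlabeled mean lies from the labeled distribution, and the Frobenius variant takes the norm of the difference between the two covariance matrices. The embeddings are assumed to have already been extracted from the model as NumPy arrays of shape (samples, features).

```python
import numpy as np


def mahalanobis_dataset_distance(labeled_emb, unlabeled_emb, eps=1e-6):
    """Mahalanobis distance from the unlabeled mean to the labeled distribution.

    Assumed formulation (not taken from the thesis): the labeled embeddings
    define the mean and covariance, and the unlabeled set is summarized by
    its mean vector.
    """
    mu_labeled = labeled_emb.mean(axis=0)
    mu_unlabeled = unlabeled_emb.mean(axis=0)
    # A small ridge term keeps the covariance matrix invertible.
    cov = np.cov(labeled_emb, rowvar=False) + eps * np.eye(labeled_emb.shape[1])
    diff = mu_unlabeled - mu_labeled
    # Solve cov @ x = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def frobenius_dataset_distance(labeled_emb, unlabeled_emb):
    """Frobenius norm of the difference between the two covariance matrices.

    Assumed formulation (not taken from the thesis).
    """
    cov_labeled = np.cov(labeled_emb, rowvar=False)
    cov_unlabeled = np.cov(unlabeled_emb, rowvar=False)
    return float(np.linalg.norm(cov_labeled - cov_unlabeled, ord="fro"))


if __name__ == "__main__":
    # Synthetic stand-ins for embeddings; real use would take model features.
    rng = np.random.default_rng(0)
    labeled = rng.normal(size=(500, 8))
    similar = rng.normal(size=(500, 8))
    shifted = similar + 3.0  # unlabeled set drawn from a shifted distribution

    print("Mahalanobis, similar:", mahalanobis_dataset_distance(labeled, similar))
    print("Mahalanobis, shifted:", mahalanobis_dataset_distance(labeled, shifted))
    print("Frobenius, similar:", frobenius_dataset_distance(labeled, similar))
    print("Frobenius, rescaled:", frobenius_dataset_distance(labeled, 2.0 * similar))
```

Under these assumptions, a larger value suggests a greater mismatch between the labeled and unlabeled sets. Both functions need only one pass over each set to compute means and covariances.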

Description

Graduation Project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2024.

URI

https://hdl.handle.net/2238/16498

Collections

Maestría en Computación [120]