Data quality metrics for unlabelled datasets in medical imaging


Authors

Díaz-Villaplana, Ana Catalina

Publisher

Instituto Tecnológico de Costa Rica

Abstract

Deep learning models typically require large, labeled datasets for optimal performance. However, in real-world applications such as medical imaging, labeled data can be scarce. Semi-supervised deep learning addresses this challenge by leveraging both labeled and unlabeled data to enhance model accuracy. Most semi-supervised methods assume similar distributions between labeled and unlabeled datasets, an assumption that may not hold in practice. To ensure data quality and consistency, we introduce Mahalanobis-based and Frobenius-based distance measures in the embedding space of the deep learning model to evaluate the similarity between labeled and unlabeled datasets. Our findings reveal that the Mahalanobis-based distance correlates strongly with the accuracy of the popular semi-supervised method MixMatch, whereas Frobenius distance results show inconsistent behavior. Moreover, the proposed approach is significantly more efficient than existing methods in the field.
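As a rough illustration of the kind of measure the abstract describes, the sketch below compares two sets of embedding vectors. This record does not give the thesis's exact formulas, so the definitions here are assumptions: the Mahalanobis-based score is taken as the distance from the unlabeled mean to the labeled mean under the labeled covariance, and the Frobenius-based score as the Frobenius norm of the difference between the two covariance matrices. The function names and the `eps` regularizer are hypothetical, and the embeddings are assumed to be already extracted as NumPy arrays of shape (samples, features).

```python
import numpy as np


def embedding_stats(feats):
    """Return the mean vector and covariance matrix of a (n_samples, n_features) array."""
    mean = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False)
    return mean, cov


def mahalanobis_between_sets(labeled, unlabeled, eps=1e-6):
    """Assumed definition: Mahalanobis distance between the two set means,
    measured with the (regularized) covariance of the labeled embeddings."""
    mu_l, cov_l = embedding_stats(labeled)
    mu_u, _ = embedding_stats(unlabeled)
    # A small ridge term keeps the covariance invertible (hypothetical choice).
    cov_reg = cov_l + eps * np.eye(cov_l.shape[0])
    diff = mu_u - mu_l
    quad = diff @ np.linalg.solve(cov_reg, diff)
    return float(np.sqrt(max(quad, 0.0)))


def frobenius_between_sets(labeled, unlabeled):
    """Assumed definition: Frobenius norm of the difference between the
    covariance matrices of the two embedding sets."""
    _, cov_l = embedding_stats(labeled)
    _, cov_u = embedding_stats(unlabeled)
    return float(np.linalg.norm(cov_l - cov_u, ord="fro"))
```

Under these assumed definitions, an unlabeled set whose embeddings are only shifted relative to the labeled set would show a large Mahalanobis-based score while leaving the Frobenius-based score near zero, since a shift changes the mean but not the covariance.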

Description

Graduation Project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2024.
