Data quality metrics for unlabelled datasets in medical imaging
| dc.contributor.advisor | Calderón-Ramírez, Saúl | es |
| dc.contributor.author | Díaz-Villaplana, Ana Catalina | |
| dc.date.accessioned | 2026-03-18T17:06:35Z | |
| dc.date.available | 2026-03-18T17:06:35Z | |
| dc.date.issued | 2024-07 | |
| dc.description | Graduation Project (Master's in Computing), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2024. | es |
| dc.description.abstract | Deep learning models typically require large, labeled datasets for optimal performance. However, in real-world applications such as medical imaging, labeled data can be scarce. Semi-supervised deep learning addresses this challenge by leveraging both labeled and unlabeled data to enhance model accuracy. Most semi-supervised methods assume similar distributions between labeled and unlabeled datasets, an assumption that may not hold in practice. To ensure data quality and consistency, we introduce Mahalanobis-based and Frobenius-based distance measures in the embedding space of the deep learning model to evaluate the similarity between labeled and unlabeled datasets. Our findings reveal that the Mahalanobis-based distance correlates strongly with the accuracy of the popular semi-supervised method MixMatch, whereas Frobenius distance results show inconsistent behavior. Moreover, the proposed approach is significantly more efficient than existing methods in the field. | es |
| dc.identifier.uri | https://hdl.handle.net/2238/16498 | |
| dc.language.iso | eng | es |
| dc.publisher | Instituto Tecnológico de Costa Rica | es |
| dc.rights | open access | es |
| dc.subject | Metrics -- Data quality | es |
| dc.subject | Datasets | es |
| dc.subject | Digital medical images | es |
| dc.subject | Deep learning (Machine learning) | es |
| dc.subject | Labeled data | es |
| dc.subject | Images -- Radiography | es |
| dc.subject | Measurement -- Time -- Processing | es |
| dc.subject | X-rays -- Images | es |
| dc.subject | Research Subject Categories::TECHNOLOGY::Information technology::Computer science | es |
| dc.title | Data quality metrics for unlabelled datasets in medical imaging | es |
| dc.type | master's thesis | es |
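
The abstract describes Mahalanobis-based and Frobenius-based distances computed in a model's embedding space to compare a labeled dataset with an unlabeled one. This record does not give the thesis's exact definitions, so the following is only a minimal sketch under stated assumptions: embeddings are already extracted as NumPy arrays of shape (n_samples, n_features) with at least two features; the Mahalanobis-based score is taken as the mean Mahalanobis distance of unlabeled embeddings to the labeled mean and covariance; and the Frobenius-based score is taken as the Frobenius norm of the difference between the two covariance matrices. The function names and the `eps` regularizer are hypothetical and not taken from the thesis.

```python
import numpy as np


def mahalanobis_dataset_distance(labeled, unlabeled, eps=1e-6):
    """Mean Mahalanobis distance of unlabeled embeddings to the labeled set.

    Assumption: this is one plausible reading of a "Mahalanobis-based"
    dataset distance, not necessarily the thesis's definition.
    """
    mu = labeled.mean(axis=0)
    # Regularize the covariance so it stays invertible (eps is an assumption).
    cov = np.cov(labeled, rowvar=False) + eps * np.eye(labeled.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = unlabeled - mu
    # Per-sample squared distance: diff_i^T * cov_inv * diff_i
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return float(np.sqrt(np.maximum(sq, 0.0)).mean())


def frobenius_dataset_distance(labeled, unlabeled):
    """Frobenius norm of the difference between the two covariance matrices.

    Assumption: one plausible reading of a "Frobenius-based" distance.
    """
    cov_l = np.cov(labeled, rowvar=False)
    cov_u = np.cov(unlabeled, rowvar=False)
    return float(np.linalg.norm(cov_l - cov_u, ord="fro"))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for embeddings; real ones would come from the model.
    labeled = rng.normal(size=(200, 8))
    similar = rng.normal(size=(200, 8))
    shifted = rng.normal(loc=3.0, size=(200, 8))
    print("similar:", mahalanobis_dataset_distance(labeled, similar))
    print("shifted:", mahalanobis_dataset_distance(labeled, shifted))
    print("frobenius:", frobenius_dataset_distance(labeled, shifted))
```

Under these assumptions, an unlabeled set drawn from a shifted distribution should score a larger Mahalanobis-based distance than one drawn from the same distribution as the labeled set.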
Files
Original bundle
- Name: TF10140_BIB314988_Ana_Catalina_Diaz-Villaplana.pdf
- Size: 1.16 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.77 KB
- Format: Item-specific license agreed upon to submission