Background: Malignant melanoma (MM) is the most aggressive skin cancer, and early diagnosis is essential for better outcomes. Deep learning models have shown promise in dermatological image analysis, but their performance is constrained by limited training data. Generative Adversarial Networks (GANs) offer a solution by generating synthetic images for data augmentation. However, assessing the clinical reliability of these images remains difficult, because automated metrics may not fully capture visual realism or clinical usability.
Objective: This study presents a framework for creating high-quality synthetic dermatoscopic images of MM lesions, together with a validation methodology that combines quantitative metrics with qualitative assessment by dermatologists to evaluate the generated images clinically. Three GAN architectures (DCGAN, StyleGAN2, and StyleGAN3-t) are explored. Lesions on the face, palms, and soles are excluded because of their unique dermoscopic patterns.
Materials and Methods: A dataset of 1,774 dermatoscopic images of MM lesions on the body was used to train the models, which were assessed with Fréchet Inception Distance (FID), Kernel Inception Distance (KID), precision, and recall. A panel of 17 dermatologists with different levels of expertise then assessed image quality on a 7-point Likert scale; accuracy, sensitivity, specificity, and inter-rater agreement were analysed.
Results: StyleGAN2 achieved the best image fidelity (FID: 18.89, KID: 0.0025), while StyleGAN3-t showed stable but slower convergence. Both StyleGAN models outperformed DCGAN in diversity and fidelity. The validation study showed that StyleGAN2-generated images were often indistinguishable from real ones, as reflected in the low specificity and accuracy values among evaluators.
Conclusions: The study highlights the effectiveness of GANs in generating high-quality synthetic images and proposes a validation framework that integrates expert assessment with state-of-the-art quantitative metrics. This approach advances standardisation in GAN evaluation and helps ensure that synthetic images are clinically relevant for dermatological AI applications.
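As an illustration of the reader-study metrics named in the abstract, the following minimal Python sketch scores one evaluator's real-versus-synthetic judgements. It is not the authors' analysis code. The function name `reader_metrics`, the label strings, and the choice of "real" as the positive class are assumptions made here for illustration; the abstract does not state how the 7-point Likert ratings were converted to binary calls, so the sketch starts from binary calls. Under this convention, specificity is the fraction of synthetic images recognised as synthetic, so a low value means synthetic images were frequently taken for real ones.

```python
from dataclasses import dataclass


@dataclass
class ReaderMetrics:
    accuracy: float
    sensitivity: float
    specificity: float


def reader_metrics(truth, calls, positive="real"):
    """Score one evaluator's calls against the ground-truth labels.

    `positive` names the class treated as positive (assumed: "real").
    Any other label is treated as the negative class.
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")

    tp = fp = tn = fn = 0
    for t, c in zip(truth, calls):
        if t == positive:
            if c == positive:
                tp += 1  # real image called real
            else:
                fn += 1  # real image called synthetic
        else:
            if c == positive:
                fp += 1  # synthetic image called real
            else:
                tn += 1  # synthetic image called synthetic

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in truth")

    return ReaderMetrics(
        accuracy=(tp + tn) / len(truth),
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


# Hypothetical evaluator: 4 real and 4 synthetic images (made-up data).
truth = ["real"] * 4 + ["synthetic"] * 4
calls = ["real", "real", "real", "synthetic",
         "real", "real", "real", "synthetic"]
print(reader_metrics(truth, calls))
```

In this made-up example the evaluator labels three of the four synthetic images as real, which gives a specificity of 0.25 and an accuracy of 0.5, the pattern the abstract describes for StyleGAN2 images.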
Luschi, A., Tognetti, L., Cartocci, A., Cevenini, G., Rubegni, P., Iadanza, E. (2025). Advancing synthetic data for dermatology: GAN comparison with multi-metric and expert validation approach. Health and Technology. doi:10.1007/s12553-025-00971-x
Advancing synthetic data for dermatology: GAN comparison with multi-metric and expert validation approach
Luschi, Alessio; Tognetti, Linda; Cartocci, Alessandra; Cevenini, Gabriele; Rubegni, Pietro; Iadanza, Ernesto
2025-01-01
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11365/1291974