Luschi, A., Tognetti, L., Cartocci, A., Cevenini, G., Rubegni, P., Iadanza, E. (2025). Advancing synthetic data for dermatology: GAN comparison with multi-metric and expert validation approach. Health and Technology. doi:10.1007/s12553-025-00971-x.

Advancing synthetic data for dermatology: GAN comparison with multi-metric and expert validation approach

Luschi, Alessio; Tognetti, Linda; Cartocci, Alessandra; Cevenini, Gabriele; Rubegni, Pietro; Iadanza, Ernesto
2025-01-01

Abstract

Background: Malignant melanoma (MM) is the most aggressive skin cancer, and early diagnosis is key to better outcomes. While deep learning models have shown promise in dermatological image analysis, their performance is constrained by limited training data. Generative Adversarial Networks (GANs) offer a solution by generating synthetic images for data augmentation. However, assessing the clinical reliability of these images remains difficult, as automated metrics may not fully capture visual realism or clinical usability. Objective: This study presents a comprehensive framework for generating high-quality synthetic dermatoscopic images of MM lesions, together with a holistic validation methodology that combines quantitative metrics with qualitative assessment by dermatologists to provide a full clinical evaluation of the generated images. Three GAN architectures (DCGAN, StyleGAN2, and StyleGAN3-t) are explored. Lesions on the face, palms, and soles are excluded because of their distinctive dermoscopic patterns. Materials and Methods: The models were trained on a dataset of 1,774 dermatoscopic images of MM lesions on the body and evaluated with Fréchet Inception Distance (FID), Kernel Inception Distance (KID), precision, and recall. A panel of 17 dermatologists with different levels of expertise then rated image quality on a 7-point Likert scale, and accuracy, sensitivity, specificity, and inter-rater agreement were analysed. Results: StyleGAN2 achieved the best image fidelity (FID: 18.89, KID: 0.0025), while StyleGAN3-t showed stable but slower convergence. Both StyleGAN models outperformed DCGAN in diversity and fidelity. The validation study showed that StyleGAN2-generated images were often indistinguishable from real ones, as reflected in the evaluators' low specificity and accuracy.
Conclusions: The study highlights the effectiveness of GANs in generating high-quality synthetic images and proposes a validation framework that integrates expert assessment with state-of-the-art quantitative metrics. This approach advances standardisation in GAN evaluation and helps ensure that synthetic images are clinically relevant for dermatological AI applications.
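The abstract reports FID and KID as fidelity metrics without defining them. A minimal sketch of how both are commonly computed from image embeddings follows; it is not the authors' pipeline. FID fits a Gaussian to each feature set and computes ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2(Sigma_r Sigma_g)^(1/2)); KID is the unbiased squared maximum mean discrepancy under the cubic polynomial kernel k(x, y) = (x.y/d + 1)^3, averaged over random subsets. It assumes the features have already been extracted (in standard practice from an Inception-v3 network; random arrays stand in here), and the function names and the KID subset count and size are illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy import linalg


def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets (rows = images)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    # Matrix square root of the covariance product; any tiny imaginary part is numerical noise.
    covmean = np.real(linalg.sqrtm(sigma_r @ sigma_f))
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(sigma_r + sigma_f - 2.0 * covmean))


def polynomial_kernel(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3, the usual choice for KID."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3


def kid(feats_real, feats_fake, n_subsets=10, subset_size=100, seed=0) -> float:
    """Mean unbiased MMD^2 over random subsets (illustrative defaults, not the study's)."""
    rng = np.random.default_rng(seed)
    m = min(subset_size, len(feats_real), len(feats_fake))
    estimates = []
    for _ in range(n_subsets):
        x = feats_real[rng.choice(len(feats_real), m, replace=False)]
        y = feats_fake[rng.choice(len(feats_fake), m, replace=False)]
        k_xx = polynomial_kernel(x, x)
        k_yy = polynomial_kernel(y, y)
        k_xy = polynomial_kernel(x, y)
        # Unbiased estimator: exclude the diagonal (self-similarity) terms of k_xx and k_yy.
        mmd2 = ((k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
                + (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
                - 2.0 * k_xy.mean())
        estimates.append(mmd2)
    return float(np.mean(estimates))


if __name__ == "__main__":
    # Random stand-ins for embeddings; real use would pass Inception features of each image set.
    rng = np.random.default_rng(42)
    real = rng.normal(size=(500, 8))
    close = rng.normal(size=(500, 8))               # same distribution as `real`
    shifted = rng.normal(0.5, 1.0, size=(500, 8))   # mean-shifted distribution
    print(fid(real, close) < fid(real, shifted), kid(real, close) < kid(real, shifted))
    # expected: True True
```

Lower values of both metrics indicate that the generated feature distribution is closer to the real one. KID is often preferred for small datasets such as this one because its estimator is unbiased, whereas FID is biased at small sample sizes.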
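The abstract derives accuracy, sensitivity, specificity, and inter-rater agreement from 7-point Likert ratings, but it does not state how the ratings were dichotomised or which agreement statistic was used. The sketch below shows one plausible reading, not the study's protocol. It assumes that "real image" is the positive class, treats a rating at or above a hypothetical threshold of 5 as a "judged real" call, pools calls across raters, and uses Fleiss' kappa on the binary calls as a stand-in agreement statistic. The names `REAL_THRESHOLD`, `reader_metrics`, and `fleiss_kappa` are illustrative. Under this convention, low specificity means that synthetic images were frequently judged to be real.

```python
import numpy as np

# Hypothetical cut-off: ratings >= 5 on the 1-7 scale are read as "this image is real".
REAL_THRESHOLD = 5


def reader_metrics(ratings, is_real, threshold=REAL_THRESHOLD):
    """Pooled sensitivity/specificity/accuracy with 'real image' as the positive class.

    ratings: (n_raters, n_images) Likert scores; is_real: (n_images,) ground truth.
    """
    ratings = np.asarray(ratings)
    truth = np.broadcast_to(np.asarray(is_real, dtype=bool), ratings.shape)
    called_real = ratings >= threshold
    tp = np.sum(called_real & truth)
    tn = np.sum(~called_real & ~truth)
    fp = np.sum(called_real & ~truth)
    fn = np.sum(~called_real & truth)
    return {
        "sensitivity": tp / (tp + fn),   # real images recognised as real
        "specificity": tn / (tn + fp),   # synthetic images recognised as synthetic
        "accuracy": (tp + tn) / ratings.size,
    }


def fleiss_kappa(labels, n_categories):
    """Fleiss' kappa for labels of shape (n_raters, n_items) with integer categories."""
    labels = np.asarray(labels)
    n_raters, n_items = labels.shape
    # counts[i, j] = number of raters who assigned item i to category j
    counts = np.stack([(labels == j).sum(axis=0) for j in range(n_categories)], axis=1)
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.sum(p_j ** 2)
    return float((p_i.mean() - p_e) / (1.0 - p_e))


if __name__ == "__main__":
    # Toy data: 3 raters x 4 images (first two real, last two synthetic).
    ratings = [[7, 6, 2, 5],
               [6, 3, 1, 6],
               [7, 5, 2, 7]]
    is_real = [True, True, False, False]
    m = reader_metrics(ratings, is_real)
    print({k: round(float(v), 3) for k, v in m.items()})
    # {'sensitivity': 0.833, 'specificity': 0.5, 'accuracy': 0.667}
    calls = (np.asarray(ratings) >= REAL_THRESHOLD).astype(int)
    print(round(fleiss_kappa(calls, n_categories=2), 3))  # 0.625
```

Pooling across raters is only one option. Per-rater metrics grouped by expertise level would be another, and an ordinal statistic (for example, a weighted kappa or an intraclass correlation on the raw Likert scores) would avoid discarding information through dichotomisation.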
Files in this record:
Advancing synthetic data for dermatology GAN comparison with multi-metric and expert validation approach.pdf (open access; description: article; type: publisher's PDF; licence: Creative Commons; size: 962.8 kB; format: Adobe PDF)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1291974