Advancing synthetic data for dermatology: GAN comparison with multi-metric and expert validation approach


Background: Malignant melanoma (MM) is the most aggressive skin cancer, requiring early diagnosis for better outcomes. While deep learning models have shown promise in dermatological image analysis, their performance is constrained by limited training data. Generative Adversarial Networks (GANs) offer a solution by generating synthetic images for data augmentation. However, assessing their clinical reliability remains difficult, as automated metrics may not fully capture visual realism or clinical usability.

Objective: This study presents a comprehensive framework for creating high-quality synthetic dermatoscopic images of MM lesions, together with a holistic validation methodology that combines quantitative metrics with qualitative dermatologist assessment to provide a full clinical evaluation of the generated images. Three GAN architectures (DCGAN, StyleGAN2, and StyleGAN3-t) are explored. Lesions on the face, palms, and soles are excluded due to their unique dermoscopic patterns.

Materials and Methods: A dataset of 1,774 dermatoscopic images of MM on the body was used to train the models, which were assessed with Fréchet Inception Distance (FID), Kernel Inception Distance (KID), precision, and recall. Afterwards, a panel of 17 dermatologists with different levels of expertise assessed image quality using a 7-point Likert scale, with accuracy, sensitivity, specificity, and inter-rater agreement analysed.

Results: StyleGAN2 achieved the best image fidelity (FID: 18.89, KID: 0.0025), while StyleGAN3-t demonstrated stable but slower convergence. Both StyleGAN models outperformed DCGAN in diversity and fidelity. The validation study showed that StyleGAN2-generated images were often indistinguishable from real ones, reflected in low specificity and accuracy values among evaluators.

Conclusions: The study highlights the effectiveness of GANs in generating high-quality synthetic images, proposing a validation framework that integrates expert assessments with state-of-the-art quantitative metrics. This approach advances standardisation in GAN evaluation, ensuring synthetic images are clinically relevant for dermatological AI applications.
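To make the reader-study metrics concrete, the following is a minimal sketch of how accuracy, sensitivity, and specificity could be computed for one evaluator who labels each image as real or synthetic. It is not the authors' analysis code. The choice of "real" as the positive class, the function name `rater_metrics`, and the toy data are all assumptions for illustration; the abstract does not state how the 7-point Likert ratings were mapped to a binary call, so the sketch starts from labels that are already binary.

```python
from dataclasses import dataclass


@dataclass
class RaterMetrics:
    accuracy: float
    sensitivity: float  # share of real images called "real"
    specificity: float  # share of synthetic images called "synthetic"


def rater_metrics(truth, calls):
    """Score one evaluator's real-vs-synthetic calls.

    Assumption: "real" is treated as the positive class, so a synthetic
    image mistaken for a real one counts as a false positive.
    Assumes each class appears at least once in `truth`.
    """
    if len(truth) != len(calls):
        raise ValueError("truth and calls must have the same length")
    pairs = list(zip(truth, calls))
    tp = sum(t == "real" and c == "real" for t, c in pairs)
    fn = sum(t == "real" and c == "synthetic" for t, c in pairs)
    tn = sum(t == "synthetic" and c == "synthetic" for t, c in pairs)
    fp = sum(t == "synthetic" and c == "real" for t, c in pairs)
    return RaterMetrics(
        accuracy=(tp + tn) / len(pairs),
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


# Toy data (hypothetical): 4 real and 4 synthetic images, where the
# evaluator mistakes 3 of the 4 synthetic images for real ones.
truth = ["real"] * 4 + ["synthetic"] * 4
calls = ["real", "real", "real", "synthetic",
         "real", "real", "real", "synthetic"]
example = rater_metrics(truth, calls)
```

Under this convention, convincing synthetic images push specificity down (here 0.25) and pull accuracy towards chance (here 0.5), which matches the pattern the abstract reports for StyleGAN2.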

Luschi, A., Tognetti, L., Cartocci, A., Cevenini, G., Rubegni, P., Iadanza, E. (2025). Advancing synthetic data for dermatology: GAN comparison with multi-metric and expert validation approach. HEALTH AND TECHNOLOGY [10.1007/s12553-025-00971-x].


Luschi, Alessio;Tognetti, Linda;Cartocci, Alessandra;Cevenini, Gabriele;Rubegni, Pietro;Iadanza, Ernesto

2025-01-01



Year: 2025
Journal: HEALTH AND TECHNOLOGY
Type: 1.1 Journal article

Files in this record:

File	Size	Format
Advancing synthetic data for dermatology GAN comparison with multi-metric and expert validation approach.pdf (open access; description: Article; type: Publisher's PDF; licence: Creative Commons)	962.8 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1291974