Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage

Becattini F.
2023-01-01

Abstract

Cultural heritage applications and advanced machine learning models are creating a fruitful synergy that provides effective and accessible ways of interacting with artworks. Smart audio guides, personalized art-related content, and gamification approaches are just a few examples of how technology can be exploited to provide additional value to artists or exhibitions. Nonetheless, from a machine learning point of view, the amount of available artistic data is often not enough to train effective models. Off-the-shelf computer vision modules can still be exploited to some extent, yet the severe domain shift between art images and the standard natural-image datasets used to train such models can degrade performance. This paper introduces a novel approach to address the challenges of limited annotated data and domain shift in the cultural heritage domain. By leveraging generative vision-language models, we augment art datasets by generating diverse variations of artworks conditioned on their captions. This augmentation strategy enhances dataset diversity, bridges the gap between natural images and artworks, and improves the alignment of visual cues with knowledge from general-purpose datasets. The generated variations help train vision-and-language models that acquire a deeper understanding of artistic characteristics and generate better captions with appropriate jargon.
2023
ISBN: 979-8-3503-0744-3
Cioni, D., Berlincioni, L., Becattini, F., Del Bimbo, A. (2023). Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023 (pp. 1699-1708). Los Alamitos, CA: Institute of Electrical and Electronics Engineers Inc. [10.1109/ICCVW60793.2023.00186].
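
The augmentation strategy described in the abstract above, generating caption-conditioned variations of each artwork with a diffusion model, can be sketched with an off-the-shelf image-to-image pipeline. The snippet below is a minimal illustration, not the authors' implementation: the Stable Diffusion checkpoint, the strength and guidance values, and the augment_artwork helper are assumptions made for the example.

# Minimal sketch of caption-conditioned augmentation using an off-the-shelf
# Stable Diffusion image-to-image pipeline (Hugging Face diffusers).
# Checkpoint, strength, guidance, and the helper below are illustrative
# assumptions, not the settings used in the paper.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def augment_artwork(image_path: str, caption: str, n_variations: int = 4):
    """Generate diverse variations of an artwork, conditioned on its caption."""
    artwork = Image.open(image_path).convert("RGB").resize((512, 512))
    result = pipe(
        prompt=caption,                      # the artwork's caption guides the generation
        image=artwork,                       # the original painting is the starting point
        strength=0.6,                        # how far variations may depart from the original
        guidance_scale=7.5,                  # adherence to the caption
        num_images_per_prompt=n_variations,
    )
    return result.images                     # PIL images to add to the training set

# Hypothetical usage; the file name and caption are placeholders:
# variations = augment_artwork("water_lilies.jpg",
#                              "an impressionist painting of water lilies on a pond")

In a training loop, each original artwork and its caption would be paired with such generated variations, increasing visual diversity while leaving the caption-level supervision unchanged.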
Files in this record:

Cioni_Diffusion_Based_Augmentation_for_Captioning_and_Retrieval_in_Cultural_Heritage_ICCVW_2023_paper.pdf
  Access: open access
  Type: post-print
  License: public, with copyright
  Size: 1.18 MB
  Format: Adobe PDF

Diffusion_Based_Augmentation_for_Captioning_and_Retrieval_in_Cultural_Heritage.pdf
  Access: not available (private/restricted)
  Type: publisher's PDF
  License: non-public, private/restricted access
  Size: 1.6 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1277506