Multi-Modal Large Language Model Driven Augmented Reality Situated Visualization: The Case of Wine Recognition

Stacchio, L.;
2025-01-01

Abstract

Situated Visualizations (SV) and reality-based information retrieval systems, enhanced by Mixed Reality (MR) and Augmented Reality (AR), overlay digital information onto real-world objects and use computer vision to provide context-aware content. Despite their potential, these systems face significant challenges in scalability and adaptability, particularly in domains such as wine recognition, where diverse label designs, frequent updates, and limited historical databases complicate automated analysis. SOLLAMA (SOmmeLier LlAMA) is a novel wine recognition framework designed to address these challenges for AR systems. Leveraging Multimodal Large Language Models (MLLMs), SOLLAMA integrates visual and textual cues for accurate label interpretation, removing the need for extensive image datasets and traditional OCR methods. Built on the Augmented Wine Recognition (AWR) system, it replaces the OCR module with LLAMA 3.2 for advanced text recognition and contextual understanding. Key benefits include scalability across diverse label designs and simplified, server-free deployment. Experimental validation on a dataset of wine labels from Italy's Emilia-Romagna region highlights the system's effectiveness and its potential to transform wine recognition in AR-based applications.
2025
9798331536626

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11393/365131