Projects

Current Projects

InterVisions – Participatory AI for Intersectional Bias Auditing (2025–2026)
Funding: EU – CERV   |   Grant ID: 101214711   |   Budget: €245,417.34

Coordinated by: ALIA – Associació Cultural de Dones per a la Recerca i l’Acció
Participants:
Centre de Visió per Computador (CVC-UAB), Research Organisation
Diputació de Barcelona, Associated Partner

Goal:
InterVisions aims to build a participatory bias audit tool for vision and language foundation models. It integrates intersectional feminist theory, deep learning, and participatory AI practices to identify and mitigate social biases in large-scale multimodal AI systems.

Activities:
– Community-driven workshops to audit foundation models
– Co-creation of a technical fairness benchmark (a minimal audit sketch follows this list)
– Development of intersectional impact assessment guidelines
– Promotion of ethical AI practices in line with the EU Charter of Fundamental Rights
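
As a rough illustration only, the sketch below computes a WEAT-style association score between image embeddings and two contrasting sets of attribute-text embeddings, broken down by intersectional group. It is not InterVisions code: the groups, attribute sets, and embeddings are hypothetical placeholders, with random tensors standing in for a real model's encoder outputs.

    # Hypothetical sketch of a WEAT-style association audit for a
    # vision-language model. Random tensors stand in for real encoder
    # outputs; group labels and attribute phrases are illustrative only.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    DIM = 512  # embedding dimension of the model under test

    def cosine(a, b):
        # Pairwise cosine similarity between two sets of embeddings.
        return F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T

    def association(targets, attr_pos, attr_neg):
        # Mean difference in similarity to the two attribute sets.
        return cosine(targets, attr_pos).mean() - cosine(targets, attr_neg).mean()

    # Stand-ins for image embeddings of people, keyed by intersectional group.
    groups = {
        ("woman", "older"):   torch.randn(100, DIM),
        ("woman", "younger"): torch.randn(100, DIM),
        ("man", "older"):     torch.randn(100, DIM),
        ("man", "younger"):   torch.randn(100, DIM),
    }
    # Stand-ins for text embeddings of two contrasting attribute phrase sets.
    attr_pos = torch.randn(8, DIM)  # e.g. "a competent person", ...
    attr_neg = torch.randn(8, DIM)  # e.g. "an incompetent person", ...

    # A systematic per-group gap in these scores is one signal an audit would flag.
    for group, emb in groups.items():
        score = association(emb, attr_pos, attr_neg).item()
        print(f"{'/'.join(group):14s} association = {score:+.4f}")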

Keywords: Bias in AI, Ethical AI, Participatory AI, Intersectionality, Fairness Benchmark, Vision & Language Models

Project Website: TBD

FairCLIP – Training a Fair CLIP Model with Hybrid Real and Synthetic Data (2024–2025)
Funding: EuroHPC AI & Data-Intensive Applications Access Call   |   Grant ID: EHPC-AI-2024A02-040   |   Resources: 32,000 node hours on MareNostrum5

Coordinated by: Universitat Autònoma de Barcelona / Computer Vision Center (Spain)
Team:
– Dr. Lluís Gómez (PI) • Dr. Lei Kang • Dr. Mohamed Ali Souibgui • Mr. Francesc Net • Mr. Joan Masoliver • Dr. Sonia Ruiz • Prof. Yuki M. Asano (University of Amsterdam)

Objective:
The FairCLIP project aims to mitigate bias in large-scale vision-language models by training a new CLIP model on a hybrid dataset combining real and synthetic data, ensuring balanced demographic representation. The project contributes to fairness in AI with both technical and ethical innovations.

Key Methods:
– Synthetic data generation via state-of-the-art diffusion models
– Real data from the CommonPool dataset
– OpenCLIP framework for scalable training
– Contrastive learning with demographic control (see the sampling sketch after this list)
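
The exact training recipe belongs to the project deliverables; as a minimal sketch of what demographic control can look like at the data-loading level, the snippet below uses PyTorch's WeightedRandomSampler to draw batches in which each demographic group is equally likely. The dataset, group labels, and group sizes are invented for illustration.

    # Hypothetical sketch: demographic-balanced sampling for CLIP-style
    # contrastive training in PyTorch. Group labels and sizes are invented;
    # the actual FairCLIP pipeline is defined in the project repository.
    from collections import Counter
    import torch
    from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

    # Toy stand-in: (image, caption) pairs represented by indices, each
    # annotated with one of four demographic groups, heavily imbalanced.
    sizes = torch.tensor([7_000, 2_000, 800, 200])
    group = torch.repeat_interleave(torch.arange(4), sizes)
    dataset = TensorDataset(torch.arange(group.numel()), group)

    # Weight each sample inversely to its group's frequency so that every
    # group is equally likely to be drawn into a batch.
    weights = (1.0 / sizes.float())[group]
    sampler = WeightedRandomSampler(weights, num_samples=group.numel(),
                                    replacement=True)
    loader = DataLoader(dataset, batch_size=512, sampler=sampler)

    # The group mix of a batch is now roughly uniform in expectation.
    _, batch_groups = next(iter(loader))
    print(Counter(batch_groups.tolist()))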

Milestones:
– Small-scale (12M samples), medium-scale (128M), and large-scale (400M) experiments
– Total: 32,000 node hours over 12 months (Aug 2024–Jul 2025)

Expected Outcomes:
– A fairness-optimized CLIP model
– A reusable hybrid dataset
– Open-source technical deliverables

Keywords: Fair AI, CLIP, Synthetic Data, Bias Mitigation, Diffusion Models, Vision-Language Models, HPC

Code: FairCLIP GitHub Repository

COELI-IA – From Text to Media: A Paradigm Shift in Cultural Heritage Management (2023–2025)
Funding: INNOTEC R+D Grant (Catalonia)   |   Grant ID: RDECR20/EMT/1791/2021   |   Budget: €195,530.02

Coordinated by: Nubilum SL (SME)
Research Partner: Centre de Visió per Computador (CVC), Universitat Autònoma de Barcelona

Objective:
COELI-IA aims to transform how cultural heritage content is managed and disseminated by applying AI techniques. The project explores automatic classification, indexing, and improved accessibility for digital archives through multimodal models that understand and connect text and media.

Key Innovations:
– Development of AI-driven cultural heritage content engines
– New interfaces and recommendation systems based on content relevance (see the retrieval sketch after this list)
– Fine-tuning of AI models for domain-specific archives
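
As a minimal sketch of the retrieval step behind such content engines, the snippet below scores archive items against a free-text query in a shared embedding space, as a CLIP-style model would provide. It is illustrative only: random tensors stand in for encoder outputs, and the record titles are invented.

    # Hypothetical sketch of cross-modal search over a digital archive:
    # items and a free-text query share one embedding space; retrieval
    # is a cosine top-k. Random tensors stand in for encoder outputs.
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    DIM = 512

    records = ["poster, 1936", "portrait photograph, 1921",
               "theatre programme, 1954", "map of Barcelona, 1888"]
    item_emb = F.normalize(torch.randn(len(records), DIM), dim=-1)  # image encoder
    query_emb = F.normalize(torch.randn(1, DIM), dim=-1)            # text encoder

    scores = (query_emb @ item_emb.T).squeeze(0)  # cosine similarities
    topk = torch.topk(scores, k=2)
    for score, idx in zip(topk.values, topk.indices):
        print(f"{score.item():+.3f}  {records[idx.item()]}")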

Funding Structure:
– Total accepted budget: €195,530.02
– CVC share: €84,446.05 (43.19%)
– Nubilum SL share: €111,083.98 (56.81%)

Team:
– Dr. Lluís Gómez (CVC Lead) • Pep Casals Puig (Nubilum Lead) • Marc Folia Campos (Nubilum) • Francesc Net Barnes (CVC research staff)

Keywords: Cultural Heritage, AI for Archives, Multimodal Indexing, Recommendation Systems, Computer Vision, NLP

More Info: Video • coeli.cat • cvc.uab.cat


Past Projects

  • ReadQA – Reading systems for Visual Question Answering
    Funded by: Ministerio de Ciencia e Innovación (PID2020-116298GB-I00) • €89,419
    Period: Jan 2021 – Dec 2023 • PIs: Lluís Gómez & Dimosthenis Karatzas
    Aimed to improve scene-text-based VQA systems using advanced multimodal models.

  • BeARS – Beyond Automatic Reading Systems
    Funded by: AGAUR (Catalan University and Research Agency) • €97,000
    Period: 2020–2021 • PIs: M. Rusiñol & Lluís Gómez
    Focused on broadening the capabilities of reading systems beyond OCR.

  • DeepPhotoArchive
    Funded by: TECNIOspring PLUS / H2020 MSCA / ACCIÓ • €113,339
    Period: 2018–2020 • PI: Lluís Gómez
    Applied deep learning to build semantic search engines for photo archives.

  • READS – Reading the Scene
    Funded by: Ministerio de Economía, Industria y Competitividad (TIN2017-89779P) • €81,554
    Period: 2018–2020 • PIs: D. Karatzas & E. Valveny
    Core research on text-in-scene interpretation and representation.

  • Semantic Search in Digital Newspaper Libraries
    Funded by: Fundación BBVA • €74,526
    Period: 2018–2019 • PI: M. Rusiñol • Role: Core Researcher
    Developed multimodal search tools for historical digital newspapers.

  • RAW – Reading in the Wild
    Funded by: Ministerio de Economía y Competitividad (TIN2014-52072P) • €109,021
    Period: 2015–2017 • PI: D. Karatzas • Role: Core Researcher
    Addressed robust scene text understanding in unconstrained environments.

  • Text and the City – Human-Centred Scene Text Understanding
    Funded by: Ministerio de Ciencia e Innovación (TIN2011-24631) • €78,045
    Period: 2012–2014 • PI: D. Karatzas • Role: Core Researcher
    Explored user-centric models for text interpretation in urban imagery.

  • Knowledge Extraction from Document Images with Heterogeneous Contents
    Funded by: Ministerio de Ciencia e Innovación (TIN2009-14633-C03-03) • €195,000
    Period: Jan 2010 – Aug 2013 • PI: J. Lladós • Role: Core Researcher
    Investigated document image understanding for structured and unstructured content.

  • HuPerText – Human Perception Inspired Text Technologies
    Funded by: Ministerio de Ciencia e Innovación (TIN2008-04998) • €49,610
    Period: 2009–2011 • PI: D. Karatzas • Role: Core Researcher
    Focused on perceptually motivated scene text modeling and reading.