About

I am a TECNIOspring Research Fellow (H2020 Marie Skłodowska-Curie actions of the European Union) at the Computer Vision Center - Universitat Autonoma de Barcelona. I am a member of the Intelligent Reading Systems Group, where I work with my colleagues on developing algorithms that make computers read and understand textual information in unconstrained scenarios. I have done research stays at the Media Integration and Communication Center (MICC) - University of Florence, and the Intelligent Media Processing Group - Osaka Prefecture University, Japan. I have also collaborated with other prominent research groups in the organization of the ICDAR Robust Reading Competitions.

My research interests include a variety of topics in machine learning and computer vision. Currently I work on embedding models, multimodal self-supervised learning, joint modelling of textual and visual information, and single shot CNN architectures for scene text understanding.

Curriculum vitae

email: lgomez {AT} cvc.uab.es
tel: +34 93 581 18 28
address: Edifici O, Campus UAB - 08193 Bellaterra (Cerdanyola). Barcelona, Spain.

Research




Deep-embeddings of images into text topic spaces

Topic modeling frameworks, such as the Latent Dirichlet Allocation (LDA) algorithm, are statistical models for discovering the latent topics that occur in a corpus of textual documents. This way, each individual text document can be represented as a probability distribution over the set of discovered topics, and thus can be projected to a point in a topic space.

Our research puts forward the idea of embedding images into text topic spaces by mining a large-scale collection of multi-modal (text and image) documents. To do so, we first learn a topic model on the text corpus of a dataset composed of pairs of correlated texts and images. Then, we train a deep CNN model to predict text representations (topic probabilities) directly from the image pixels. In other words, the learned topic model teaches the CNN to predict the semantic context of images.

This deep-embedding framework can be used to perform different tasks, such as self-supervised learning of visual features, multi-modal image retrieval, or even to generate contextualized lexicons for scene text recognition.
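As a rough illustration of the training signal described above, the following minimal sketch compares a predicted topic distribution with a target topic distribution. It assumes a softmax output and a cross-entropy loss with soft targets, which is one common choice and not necessarily the objective used in the papers listed below. The function names and the three-topic example values are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw network outputs into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topic_cross_entropy(logits, topic_probs, eps=1e-12):
    """Cross-entropy between predicted and target topic distributions."""
    predicted = softmax(logits)
    return -sum(t * math.log(p + eps) for t, p in zip(topic_probs, predicted))


# Hypothetical example with three topics. In the framework described above,
# the target would come from the topic model applied to the text that
# accompanies the image, and the logits from the CNN applied to the image.
target = [0.7, 0.2, 0.1]
good_logits = [2.0, 0.5, 0.0]  # ranks topics in the same order as the target
bad_logits = [0.0, 0.5, 2.0]   # ranks topics in the opposite order

loss_good = topic_cross_entropy(good_logits, target)
loss_bad = topic_cross_entropy(bad_logits, target)
```

Under these assumptions, a prediction that agrees with the target distribution gets a lower loss than one that disagrees, so minimizing the loss pushes the image representation towards the topic-space point of its paired text.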

  • Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, & C.V. Jawahar. (2017). "Self-supervised learning of visual features through embedding images into text topic spaces." In Proc. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. [PDF] [CODE]
  • Yash Patel, Lluis Gomez, Marçal Rusiñol, & Dimosthenis Karatzas. (2016). "Dynamic Lexicon Generation for Natural Scene Images." In Proc. 2nd International Workshop on Robust Reading, ECCV Workshops 2016. [PDF]
  • Raul Gomez, Lluis Gomez, Jaume Gibert, & Dimosthenis Karatzas. (2018). "Learning to Learn from Web Data through Deep Semantic Embeddings." In 1st Multimodal Learning and Applications Workshop, ECCV Workshops 2018.
  • Raul Gomez, Lluis Gomez, Jaume Gibert, & Dimosthenis Karatzas. (2018). "Learning from #Barcelona Instagram data." In 1st Multimodal Learning and Applications Workshop, ECCV Workshops 2018.

  • Scene text detection with Fully Convolutional Networks

    Text Proposals have emerged as a class-dependent version of object proposals -- efficient approaches to reduce the search space of possible text object locations and extents in an image. Combined with strong word classifiers, text proposals currently yield state-of-the-art results in end-to-end scene text recognition. In this paper we propose an improvement over the original Text Proposals algorithm (Gomez and Karatzas 2015), combining it with Fully Convolutional Networks to improve the ranking of proposals. Results on the ICDAR RRC and the COCO-Text datasets show superior performance over the current state-of-the-art.
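One simple way to picture how a text heatmap could improve the ranking of proposals is sketched below. It assumes the Fully Convolutional Network outputs a per-pixel text probability map and that each proposal is scored by the mean probability inside its box. This scoring rule, the function names, and the toy values are illustrative assumptions, not the procedure from the papers listed below.

```python
def mean_heat(heatmap, box):
    """Average text probability inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    values = [heatmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


def rerank(proposals, heatmap):
    """Sort proposal boxes by decreasing mean text probability."""
    return sorted(proposals, key=lambda box: mean_heat(heatmap, box), reverse=True)


# Hypothetical 3x4 heatmap: the top-right corner looks like text.
heatmap = [
    [0.0, 0.1, 0.9, 0.8],
    [0.0, 0.2, 0.9, 0.9],
    [0.1, 0.0, 0.1, 0.0],
]
# Hypothetical proposals in their original (arbitrary) order.
proposals = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 4, 3)]

ranked = rerank(proposals, heatmap)
```

With this toy input the box covering the high-probability corner moves to the front of the list, so a downstream word classifier would evaluate the most promising candidates first.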

  • Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, & Andrew D. Bagdanov. (2017). "FAST: Facilitated and Accurate Scene Text Proposals through FCN Guided Pruning." Pattern Recognition Letters.
  • Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, & Andrew D. Bagdanov. (2016). "Improving Text Proposals for Scene Images with Fully Convolutional Networks." In Proc. 1st International Workshop on Deep Learning for Pattern Recognition, ICPR Workshops 2016. [PDF]

  • Patch-based scene text script identification

    This work focuses on the problem of script identification in scene text images. Tackling this problem with state-of-the-art CNN classifiers is not straightforward, as they fail to address a key characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed aspect ratio as in the typical use of holistic CNN classifiers, we propose here a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class.

    We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-part representations and their relative importance in a patch-based classification scheme. Our experiments with this learning procedure demonstrate state-of-the-art results on two public script identification datasets.
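The patch-based idea can be sketched in a few lines: instead of resizing a text line to a fixed aspect ratio, square patches are taken along its width, each patch is classified, and the per-patch scores are combined with patch weights. The sliding scheme, the weighted sum, the function names, and all numeric values below are assumptions made for illustration. In the method described above the patch representations and their importance are learned jointly by the network ensemble.

```python
def patch_offsets(width, patch_size, stride):
    """Left offsets of square patches sliding across a text line image."""
    if width <= patch_size:
        return [0]
    offsets = list(range(0, width - patch_size + 1, stride))
    if offsets[-1] != width - patch_size:
        offsets.append(width - patch_size)  # make sure the right edge is covered
    return offsets


def weighted_vote(patch_scores, patch_weights):
    """Combine per-patch class scores into one label using patch weights."""
    totals = {}
    for scores, weight in zip(patch_scores, patch_weights):
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + weight * score
    return max(totals, key=totals.get)


# A 100-pixel-wide line (already scaled to a height of 32) yields six patches.
offsets = patch_offsets(width=100, patch_size=32, stride=16)

# Hypothetical scores for three patches. The middle patch contains a highly
# discriminative stroke, so it carries a larger (hypothetical) weight.
patch_scores = [
    {"Latin": 0.6, "Greek": 0.4},
    {"Latin": 0.2, "Greek": 0.8},
    {"Latin": 0.55, "Greek": 0.45},
]
patch_weights = [0.2, 1.0, 0.2]

label = weighted_vote(patch_scores, patch_weights)
```

In this toy case a plain majority vote would pick "Latin", while the weighted combination picks "Greek" because the single discriminative patch dominates, which is the intuition behind learning the relative importance of patches.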

  • Lluis Gomez, Anguelos Nicolaou, & Dimosthenis Karatzas. (2017). "Improving Patch-based Scene Text Script Identification with Ensembles of Conjoined Networks." Pattern Recognition. [PDF | CODE]
  • Lluis Gomez, & Dimosthenis Karatzas. (2016). "A Fine Grained Classification Approach to Scene Text Script Identification." In Proc. Document Analysis Systems (DAS), 2016 12th IAPR International Workshop on. IEEE, 2016. [PDF | CODE]

  • Exploiting Similarity Hierarchies for Multi-script Scene Text Understanding

    Optical Character Recognition (OCR) is nowadays considered a solved problem when a clean, binarized, and well-formatted input image, with text in a standard font and language, is provided. In contrast, the automated localization, extraction and recognition of "scene text" in uncontrolled environments is still an open Computer Vision problem. At the core of the problem lies the extensive variability of scene text in terms of its location, rotation, physical appearance and design.

    Scene text extraction methodologies have traditionally been based on the classification of individual regions or patches, using a priori knowledge for a given script or language. Human perception of text, on the other hand, is based on perceptual organisation through which text emerges as a perceptually significant group of atomic objects. Therefore humans are able to detect text even in languages and scripts never seen before. My research revolves around these ideas and poses the text extraction problem as the detection of meaningful groups of regions. I'm working on a text detection method built around a perceptual organisation framework that exploits the collaboration of proximity and similarity laws to create text-group hypotheses.
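A heavily simplified sketch of grouping by proximity and similarity is given below. It assumes each region is described only by a horizontal centre and a height, and links two regions when they are both close and of similar height, then reports the connected groups. The features, the thresholds, and the function name are hypothetical simplifications. The publications listed below build hierarchies over richer similarity cues.

```python
def group_regions(regions, max_gap, max_height_ratio):
    """Group regions that are both close together and of similar height.

    regions is a list of (x_centre, height) tuples; the result is a sorted
    list of groups, each group being a list of region indices.
    """
    parent = list(range(len(regions)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, hi) in enumerate(regions):
        for j in range(i + 1, len(regions)):
            xj, hj = regions[j]
            close = abs(xi - xj) <= max_gap                          # proximity
            similar = max(hi, hj) / min(hi, hj) <= max_height_ratio  # similarity
            if close and similar:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(regions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical regions: three letter-like blobs in a row, one tall blob
# nearby (for example a lamp post), and one letter-sized blob far away.
regions = [(10, 20), (32, 22), (55, 19), (60, 80), (200, 21)]
groups = group_regions(regions, max_gap=30, max_height_ratio=1.5)
```

Here the three neighbouring regions of similar height form one text-group hypothesis, while the tall region is rejected by the similarity cue and the distant region by the proximity cue, without any knowledge of the script involved.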

  • Lluis Gomez, & Dimosthenis Karatzas. (2017). "TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild." Pattern Recognition. [PDF | CODE]
  • Lluis Gomez, & Dimosthenis Karatzas. (2016). "A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction." International Journal on Document Analysis and Recognition. [PDF]
  • Lluis Gomez, & Dimosthenis Karatzas. (2015). "Object Proposals for Text Extraction in the Wild." In Proc. 13th International Conference on Document Analysis and Recognition. [PDF | CODE]
  • Lluis Gomez, & Dimosthenis Karatzas. (2014). "MSER-based Real-Time Text Detection and Tracking." In Proc. 22nd International Conference on Pattern Recognition (pp. 3110–3115). [PDF]
  • Lluis Gomez, & Dimosthenis Karatzas. (2014). "Scene Text Recognition: No Country for Old Men?" In Proc. 1st International Workshop on Robust Reading, ACCV Workshops. [CODE]
  • Lluis Gomez, & Dimosthenis Karatzas. (2013). "Multi-script Text Extraction from Natural Scenes." In Proc. 12th International Conference on Document Analysis and Recognition (pp. 467–471). [PDF | CODE]
  • Publications

    2018
    Lluis Gomez, Andres Mafla, Marçal Rusiñol, & Dimosthenis Karatzas. (2018). "Single Shot Scene Text Retrieval." European Conference on Computer Vision (ECCV).
    Raul Gomez, Lluis Gomez, Jaume Gibert, & Dimosthenis Karatzas. (2018). "Learning to Learn from Web Data through Deep Semantic Embeddings." In 1st Multimodal Learning and Applications Workshop, ECCV Workshops 2018.
    Raul Gomez, Lluis Gomez, Jaume Gibert, & Dimosthenis Karatzas. (2018). "Learning from #Barcelona Instagram data." In 1st Multimodal Learning and Applications Workshop, ECCV Workshops 2018.
    Lluis Gomez, Marçal Rusiñol, & Dimosthenis Karatzas. (2018). "Cutting Sayre's Knot: Reading Scene Text without Segmentation. Application to Utility Meters." In 13th IAPR Workshop on Document Analysis Systems (DAS).
    Dimosthenis Karatzas, Lluis Gomez, Marçal Rusiñol, & Anguelos Nicolaou. (2018). "The Robust Reading Competition Annotation and Evaluation Platform." In 13th IAPR Workshop on Document Analysis Systems (DAS).
    2017
    Lluis Gomez, Marçal Rusiñol, & Dimosthenis Karatzas. (2017). "LSDE: Levenshtein Space Deep Embedding for Query-by-string Word Spotting." In 14th International Conference on Document Analysis and Recognition (ICDAR).
    Raul Gomez, Baoguang Shi, Lluis Gomez, Lukas Neumann, Andreas Veit, Jiri Matas, Serge Belongie, & Dimosthenis Karatzas. (2017). "ICDAR2017 Robust Reading Challenge on COCO-Text." In 14th International Conference on Document Analysis and Recognition (ICDAR).
    Masakazu Iwamura, Naoyuki Morimoto, Keishi Tainaka, Dena Bazazian, Lluis Gomez, & Dimosthenis Karatzas. (2017). "ICDAR2017 Robust Reading Challenge on Omnidirectional Video." In 14th International Conference on Document Analysis and Recognition (ICDAR).
    Dimosthenis Karatzas, Lluis Gomez, & Marçal Rusiñol. (2017). "The Robust Reading Competition Annotation and Evaluation Platform." In 1st International Workshop on Open Services and Tools for Document Analysis.
    Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, & Andrew D. Bagdanov. (2017). "FAST: Facilitated and Accurate Scene Text Proposals through FCN Guided Pruning." Pattern Recognition Letters.
    Lluis Gomez, Yash Patel, Marçal Rusiñol, Dimosthenis Karatzas, & C.V. Jawahar. (2017). "Self-supervised learning of visual features through embedding images into text topic spaces." IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017.
    Lluis Gomez, & Dimosthenis Karatzas. (2017). "TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild." Pattern Recognition.
    Lluis Gomez, Anguelos Nicolaou, & Dimosthenis Karatzas. (2017). "Improving Patch-based Scene Text Script Identification with Ensembles of Conjoined Networks." Pattern Recognition.
    2016
    Lluis Gomez, & Dimosthenis Karatzas. (2016). "A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction." International Journal on Document Analysis and Recognition.
    Dena Bazazian, Raul Gomez, Anguelos Nicolaou, Lluis Gomez, Dimosthenis Karatzas, & Andrew D. Bagdanov. (2016). "Improving Text Proposals for Scene Images with Fully Convolutional Networks." In Proc. 1st International Workshop on Deep Learning for Pattern Recognition, ICPR Workshops 2016.
    Yash Patel, Lluis Gomez, Marçal Rusiñol, & Dimosthenis Karatzas. (2016). "Dynamic Lexicon Generation for Natural Scene Images." In Proc. 2nd International Workshop on Robust Reading, ECCV Workshops 2016.
    Lluis Gomez, & Dimosthenis Karatzas. (2016). "A Fine Grained Classification Approach to Scene Text Script Identification." In Proc. Document Analysis Systems (DAS), 2016 12th IAPR International Workshop on. IEEE, 2016.
    Anguelos Nicolaou, Lluis Gomez, & Dimosthenis Karatzas. (2016). "Visual Script and Language Identification." In Proc. Document Analysis Systems (DAS), 2016 12th IAPR International Workshop on. IEEE, 2016.
    2015
    Lluis Gomez, & Dimosthenis Karatzas. (2015). "Object Proposals for Text Extraction in the Wild." In Proc. 13th International Conference on Document Analysis and Recognition.
    Dimosthenis Karatzas, Lluis Gomez, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, et al. (2015). "ICDAR 2015 Competition on Robust Reading." In Proc. 13th International Conference on Document Analysis and Recognition (pp. 1156–1160).
    Suman Ghosh, Lluis Gomez, Dimosthenis Karatzas, & Ernest Valveny. (2015). "Efficient indexing for Query By String text retrieval." In Proc. 6th IAPR International Workshop on Camera Based Document Analysis and Recognition (pp. 1236–1240).
    2014
    Dimosthenis Karatzas, Sergi Robles, & Lluis Gomez. (2014). "An on-line platform for ground truthing and performance evaluation of text extraction systems." In Proc. 11th IAPR International Workshop on Document Analysis and Systems (pp. 242–246).
    Lluis Gomez, & Dimosthenis Karatzas. (2014). "MSER-based Real-Time Text Detection and Tracking." In Proc. 22nd International Conference on Pattern Recognition (pp. 3110–3115).
    Lluis Gomez, & Dimosthenis Karatzas. (2014). "Scene Text Recognition: No Country for Old Men?" In Proc. 1st International Workshop on Robust Reading, ACCV Workshops.
    2013
    Dimosthenis Karatzas, Faisal Shafait, Seiichi Uchida, Masakazu Iwamura, Lluis Gomez, Sergi Robles, et al. (2013). "ICDAR 2013 Robust Reading Competition." In Proc. 12th International Conference on Document Analysis and Recognition (pp. 1484–1493).
    Lluis Gomez, & Dimosthenis Karatzas. (2013). "Multi-script Text Extraction from Natural Scenes." In Proc. 12th International Conference on Document Analysis and Recognition (pp. 467–471).

    Code

    If you are looking for code implementations of my research papers, you may want to visit my GitHub repositories.

    I am an enthusiast of Free Software in general and an advocate of the GNU/Linux project. I have contributed to several Open Source projects, e.g. the Pure Data visual programming language, the OpenCV Computer Vision library, the Giss open broadcast platform, and the FreeJ video mixer.

    I have been selected by the OpenCV Foundation for participation in the Google Summer of Code program as a developer (2013 and 2014 editions) and as a mentor (2016).

    Teaching

    2017/18: M3 - Machine Learning for Computer Vision (MSc. Computer Vision), Invited Lecturer at Universitat Autonoma de Barcelona (UAB).

    2017/18: M5 - Visual Recognition (MSc. Computer Vision), Assistant Professor at Universitat Autonoma de Barcelona (UAB).

    2017/18: Artificial Intelligence (B.S. Interactive Digital Content), Lecturer at ENTI (School of New Interactive Technologies) - Universitat de Barcelona (UB).

    2016/17: M5 - Visual Recognition (MSc. Computer Vision), Assistant Professor at Universitat Autonoma de Barcelona (UAB).

    2014/15: 43340 - Pattern Recognition (MSc. Computer Engineering), Assistant Professor at Universitat Autonoma de Barcelona (UAB).

    2011/12: Computer Vision (MSc. Visual Arts), Invited Lecturer at Universidad Politécnica de Valencia (UPV).

    Others

    Reviewer for Computer Vision and Image Understanding, International Journal on Document Analysis and Recognition, Neurocomputing, Transactions on Image Processing, Packt Publishing.
    Workshop Chair, International Workshop on Robust Reading (2018).
    Programme Committee member, Document Analysis Systems (2018).
    Programme Committee member, First Workshop on Computer Vision for Fashion, Art and Design (2018).
    Area Chair, International Conference on Document Analysis and Recognition (2017).
    Workshop Chair, International Workshop on Camera Based Document Analysis (2017).
    Programme Committee member, International Workshop on Robust Reading (2014, 2016).
    Programme Committee member, Workshop on Camera Based Document Analysis (2013, 2015).
    Member, International Association of Pattern Recognition, 2013–Present.

    I have developed part of my career at the crossroads between Arts and Computer Science. Between 2005 and 2010 I held an engineering position at Hangar, a Visual Arts Production and Research Center in Barcelona, where I was responsible for the area of Software Development in the MediaLab and directly involved in free software development for art projects. During that time I had the opportunity to work with amazingly creative people such as Antoni Abad, Daniel G. Andújar, Ricardo Iglesias, Shu Lea Cheang, Salud Lopez, Straddle3, Simona Levi, Xavi Manzanares, Hackitectura, Denis Roio (a.k.a. Jaromil), minipimer, Oscar Martin, PlayModes, Ramiro Cosentino, Pedro Soler, Sergi Lario, Telenoika, and Yves Degoyon, among many others.

    I love music. I play the guitar and recently I started to learn piano in my spare time.

    I love the mountains and the natural environment in general. When I have time, I like to visit Montserrat, Pirineus, or Costa Brava.