13K+

Citations

140+

Publications

4

Patents

15+

Years of Research

David Vázquez

I study how machines learn to act in the world.

A decade ago, that meant building Elektra, an autonomous vehicle at the Universitat Autònoma de Barcelona, where I led the perception system: object detection, semantic segmentation, 3D reconstruction, and SLAM. Today, it means leading a research team that builds AI agents capable of navigating enterprise software, understanding multimodal documents, and reasoning over complex tasks.

I am Research Director of the Foundational AI Research (FAR) team at ServiceNow, where I lead a group working at the frontier of AI agents, AI security, multimodal learning, and data analytics. Our work includes BrowserGym (an open environment for evaluating web agents), WorkArena (a benchmark for knowledge work tasks, published at ICML), BigDocs (large-scale multimodal document understanding), and the Apriel family of open models. We publish regularly at NeurIPS, ICLR, ICML, CVPR, ACL, and EMNLP.

In parallel, I hold an Adjunct Professor appointment at Polytechnique Montréal, affiliated with MILA (the Quebec AI Institute), where I co-supervise graduate students. I am also an Adjunct Professor at the Universitat Autònoma de Barcelona (UAB), where I teach in the Master's in Computer Vision and AI. My academic work currently focuses on multimodal AI tools for underrepresented languages, including an NSERC-funded project to build translation and literacy tools for the Matsigenka and Inuktitut communities.

Before ServiceNow, I was a researcher at Element AI (acquired by ServiceNow in 2021). Before that, I completed postdoctoral fellowships at the Computer Vision Center in Barcelona and at MILA under Aaron Courville, supported by a Marie Curie International Outgoing Fellowship. I am a member of ELLIS (the European Laboratory for Learning and Intelligent Systems) and have accumulated over 13,000 citations across my research portfolio.

I build teams that produce rigorous, open science with real industry impact. Alumni of my research group hold positions at Meta AI, Google, Apple, NUS, UC Berkeley, NVIDIA, and leading AI startups worldwide. If you are a researcher or engineer excited about agents, multimodal AI, or benchmarking, my team is frequently hiring — reach out.

Research Areas
🤖

AI Agents

Building autonomous agents that navigate complex software environments, from web browsers to enterprise platforms. Key projects: BrowserGym, WorkArena, StarUI.

📄

Multimodal AI

Training models that jointly reason over text, images, charts, and documents. Key projects: AlignVLM, BigDocs, BigCharts, StarVector.

🚗

Autonomous Driving

Pioneered the use of synthetic data and domain adaptation for visual perception in self-driving vehicles. Key projects: SYNTHIA, Elektra, V-AYLA.

🔬

Open Models & Benchmarks

Creating open-source models, datasets, and evaluation frameworks that advance reproducible research. Key projects: Apriel, GEO-Bench, InsightBench, Synbols.

🧠

Continual & Few-Shot Learning

Developing methods that learn efficiently from limited data and adapt without forgetting. Key projects: OSAKA, Sequoia, MILe.

🌍

AI for Social Good

Applying AI to environmental monitoring, healthcare, and tools for underrepresented languages including Matsigenka and Inuktitut.