Hi! I'm Roman Bachmann, a Machine Learning Research Scientist at Apple. My research is focused on building scalable any-to-any multimodal foundation models for world modelling and visual reasoning. My goal is to build adaptable world priors that enable quick understanding of the environment and allow for global, out-of-sight reasoning.

Previously, I was an EPFL PhD student at VILAB, advised by Amir Zamir. I received my M.Sc. in Data Science from EPFL, where I also completed my B.Sc. in Computer Science. During my studies, I interned as a research scientist at Apple and RIKEN AIP.

2025


How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks

Rahul Ramachandran, Ali Garjani, Roman Bachmann, Andrei Atanov*, Oğuzhan Fatih Kar*, Amir Zamir*

ICLR 2026

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Roman Bachmann*, Jesse Allardice*, David Mizrahi*, Enrico Fini, Oğuzhan Fatih Kar, Elmira Amirloo, Alaaeldin El-Nouby, Amir Zamir, Afshin Dehghan

ICML 2025

2024


4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Roman Bachmann*, Oğuzhan Fatih Kar*, David Mizrahi*, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir

NeurIPS 2024

ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Sogand Salehi, Mahdi Shafiei, Teresa Yeo, Roman Bachmann, Amir Zamir

ECCV 2024

2023


4M: Massively Multimodal Masked Modeling

David Mizrahi*, Roman Bachmann*, Oğuzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir

NeurIPS 2023

★ Spotlight

Modality-invariant Visual Odometry for Embodied Vision

Marius Memmel, Roman Bachmann, Amir Zamir

2023 Conference on Computer Vision and Pattern Recognition

2022


MultiMAE: Multi-modal Multi-task Masked Autoencoders

Roman Bachmann*, David Mizrahi*, Andrei Atanov, Amir Zamir

2022 European Conference on Computer Vision

CLIPasso: Semantically-Aware Object Sketching

Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, Ariel Shamir

SIGGRAPH 2022

★ Best Paper Award winner

2021


Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans

Ainaz Eftekhar*, Alexander Sax*, Roman Bachmann, Jitendra Malik, Amir Zamir

2021 International Conference on Computer Vision

2020


Training Binary Neural Networks using the Bayesian Learning Rule

Xiangming Meng, Roman Bachmann, and Mohammad Emtiyaz Khan

2020 International Conference on Machine Learning

2019


Motion Capture from Pan-Tilt Cameras with Unknown Orientation

Roman Bachmann, Jörg Spörri, Pascal Fua, and Helge Rhodin

2019 International Conference on 3D Vision

★ Selected for oral presentation

Global Motion Estimation from Pan-Tilt Cameras

Roman Bachmann, Helge Rhodin, and Pascal Fua

2019 Central European Seminar on Computer Graphics (non-peer-reviewed)

★ Voted best presentation and third best paper

Automatic 3D motion capture in alpine skiing using deep learning and computer vision

Roman Bachmann, Helge Rhodin, Jörg Spörri, and Pascal Fua

8th Int. Congress on Science and Skiing (non-peer-reviewed)

★ Won first place in the Young Investigator Award competition
