Roman Bachmann – EPFL PhD Student in Computer Vision

Hi! I'm Roman Bachmann, an EPFL PhD student at VILAB, where I'm advised by Amir Zamir. My current research is focused on building scalable multi-modal foundation models for world modelling and visual reasoning. My goal is to build adaptable world priors that enable quick understanding of the environment and allow for global, out-of-sight reasoning.

I received my M.Sc. in Data Science from EPFL, where I also completed my B.Sc. in Computer Science. Before my PhD, I interned as a research scientist at Apple and RIKEN AIP.

Publications

2025

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Roman Bachmann*, Jesse Allardice*, David Mizrahi*, Enrico Fini, Oğuzhan Fatih Kar, Elmira Amirloo, Alaaeldin El-Nouby, Amir Zamir, Afshin Dehghan

arXiv 2025

PDF Website Code

2024

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Roman Bachmann*, Oguzhan Fatih Kar*, David Mizrahi*, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir

NeurIPS 2024

PDF Website Code

ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Sogand Salehi, Mahdi Shafiei, Teresa Yeo, Roman Bachmann, Amir Zamir

ECCV 2024

PDF Website Code

2023

4M: Massively Multimodal Masked Modeling

David Mizrahi*, Roman Bachmann*, Oguzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir

NeurIPS 2023

Spotlight

PDF Website Code

Modality-invariant Visual Odometry for Embodied Vision

Marius Memmel, Roman Bachmann, Amir Zamir

CVPR 2023

PDF Website Code

2022

MultiMAE: Multi-modal Multi-task Masked Autoencoders

Roman Bachmann*, David Mizrahi*, Andrei Atanov, Amir Zamir

ECCV 2022

PDF Website Code

CLIPasso: Semantically-Aware Object Sketching

Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, Ariel Shamir

SIGGRAPH 2022

Selected as one of the five Best Paper Award winners

PDF Website Code

2021

Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans

Ainaz Eftekhar*, Alexander Sax*, Roman Bachmann, Jitendra Malik, Amir Zamir

ICCV 2021

PDF Website

2020

Training Binary Neural Networks using the Bayesian Learning Rule

Xiangming Meng, Roman Bachmann, and Mohammad Emtiyaz Khan

ICML 2020

PDF Code

2019

Motion Capture from Pan-Tilt Cameras with Unknown Orientation

Roman Bachmann, Jörg Spörri, Pascal Fua, and Helge Rhodin

3DV 2019

Selected for oral presentation

PDF Dataset

Global Motion Estimation from Pan-Tilt Cameras

Roman Bachmann, Helge Rhodin, and Pascal Fua

2019 Central European Seminar on Computer Graphics (non-peer-reviewed)

Voted best presentation and third best paper

PDF Dataset

Automatic 3D motion capture in alpine skiing using deep learning and computer vision

Roman Bachmann, Helge Rhodin, Jörg Spörri, and Pascal Fua

8th International Congress on Science and Skiing (non-peer-reviewed)

Won first place in the Young Investigator Award competition

PDF Dataset