More Publications

Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets.
Presented at 37th Conference on Neural Information Processing Systems (NeurIPS), 2023.

Model-based Offline Reinforcement Learning with Local Misspecification.
Oral at 37th AAAI Conference on Artificial Intelligence (AAAI), 2023.

Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data.
Presented at 36th Conference on Neural Information Processing Systems (NeurIPS), 2022.
Oral at AAAI 2023 Workshop on Reinforcement Learning Ready for Production, 2023.

Offline Policy Optimization with Eligible Actions.
Presented at 38th Conference on Uncertainty in Artificial Intelligence (UAI), 2022.

SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics.
Presented at 5th Conference on Reinforcement Learning and Decision Making (RLDM), 2022.

Sample-Efficient Deep Reinforcement Learning for Control, Exploration and Safety.
PhD Thesis, 2021.

Adversarially Guided Actor-Critic.
Presented at 9th International Conference on Learning Representations (ICLR), 2021.

Learning Value Functions in Deep Policy Gradients using Residual Variance.
Presented at 9th International Conference on Learning Representations (ICLR), 2021.

Only Relevant Information Matters: Filtering Out Noisy Samples to Boost RL.
Presented at 29th International Joint Conference on Artificial Intelligence (IJCAI), 2020.

Temperature Decreases Spread Parameters of the New Covid-19 Case Dynamics.
Biology, 9(5), p.94, 2020.

Invited Talks

Efficient Actor-Critics under the Prism of Variance
July 2021
Oral & Panel Discussion: Do we control the algorithms we create?
November 2019
Improving Policy Gradient Updates with MERL and SAUNA
October 2019
Deep Reinforcement Learning at Scale
April 2019
QA and Deep Learning for Language Understanding
November 2017

Selected Software

A Reinforcement Learning Library for Research and Education (PyTorch)

AGAC: Adversarially Guided Actor-Critic (PyTorch & TensorFlow)

AVEC: Actor with Variance Estimated Critic (TensorFlow)

Materials for the Reinforcement Learning Summer School 2019: Bandits, RL & Deep RL (PyTorch)

Reinforcement Learning - Fall 2019 - MVA - ENS Paris-Saclay

Teaching Assistant
Instructors: Alessandro Lazaric, Matteo Pirotta

Reinforcement Learning Summer School 2019

Teaching Assistant
Instructors: Felix Berkenkamp, Tristan Cazenave, Ludovic Denoyer, Gabriel Dulac-Arnold, Audrey Durand, Vincent François-Lavet, Matteo Hessel, Emilie Kaufmann, Marc Lanctot, Max Lapan, Alessandro Lazaric, Odalric-Ambrym Maillard, Jérémie Mary, Gerhard Neumann, Guillaume Obozinski, Olivier Pietquin, Bilal Piot, Matteo Pirotta, Bruno Scherrer, Florian Strub, Eleni Vasilaki, Oriol Vinyals

Professional Experience

January 2022 – Present
Palo Alto, CA, USA

Postdoctoral Scholar

Stanford University

November 2017 – October 2018
Nantes, FR

Machine Learning Engineer


Designed and executed product-focused research agendas, which led to building a conversational model for a human-machine interface using deep learning.

August 2017 – November 2017
Copenhagen, DK

Research Assistant


Conducted research at the DTU Compute laboratory on deep convolutional neural network models for image classification and generative adversarial network models for image generation from a mixture of human artworks and photographs.

March 2017 – August 2017
Copenhagen, DK

Machine Learning Researcher

Soply (part-time during MSc)

Defined, together with the co-founders, a roadmap for the company's ML projects, which led to building a system that recommends artists according to their photographic style and three artwork classification models (content, style, and type) in collaboration with the National Gallery of Denmark.

November 2015 – January 2017
Copenhagen, DK

Machine Learning Engineer

EasyTranslate (part-time during MSc)

Carried out several research projects in collaboration with the product team, including a seq2seq machine translation model for specialized text and a recommendation system for human translators using LDA models trained on Wikipedia, deployed on AWS.


Projects, Summer Schools, etc.