
I am a principal investigator (PI) at the ELLIS Institute Tübingen and an independent group leader at the MPI for Intelligent Systems. I lead the Deep Models and Optimization group, am a lecturer at the University of Tübingen, and am a faculty member of the CLS, ELLIS, and IMPRS-IS PhD programs. I am proud to be a Schmidt Sciences AI2050 Early Career Fellow.

My goal is to improve the efficiency and accessibility of deep learning technologies in science and engineering by pioneering new architectures and training techniques grounded in theory. My work spans two main areas: understanding the intricacies of large-scale optimization dynamics, and designing innovative architectures and powerful optimizers capable of reasoning over complex data. Central to my research is the exploration of new techniques for decoding patterns in complex sequential data, with implications spanning biology, neuroscience, natural language processing, and music generation.

I received my PhD from the Data Analytics Lab at ETH Zürich under the supervision of Prof. Dr. Thomas Hofmann and Dr. Aurelien Lucchi. Before that, I obtained my master’s degree in Robotics, Systems, and Control from ETH. During my PhD, I interned at DeepMind London, Meta (FAIR) Seattle, MILA, and Inria Paris. During my master’s and PhD studies, I was involved in several computational systems biology projects at ETH, such as SignalX. I also regularly contributed to rare-disease research through bioinformatic analysis of genome sequence data from patients with EEC syndrome.

For more details, you can check my curriculum vitae.

In my free time, I travel and read philosophy books. My favorite authors are Kierkegaard, Lévi-Strauss, Meister Eckhart, and Nietzsche. I also play a few instruments: I studied the cello in Venice and Klagenfurt for more than 10 years, and I am currently trying to learn the oboe. I occasionally bring out my transverse flute and acoustic bass.

Recent & Future Talks

July 2025: Invited talk @ ICCOPT 2025, US
March 2025: Lecture @ LOT Spring School, Pisa, IT
March 2025: Talk @ Feuromed, Euromediterranean Economics Festival, IT
January 2025: Invited talk @ Oberwolfach, DE
October 2024: Invited talk @ INRIA Paris, FR
October 2024: Invited talk @ University of Basel, CH
October 2024: Invited talk @ Suvrit Sra’s lab, TUM, DE
September 2024: Keynote @ ML4ITS2024, ECML PKDD 2024, LT
April 2024: Invited talk @ the mathematics of data streams workshop, DE
April 2024: University of Michigan, US
March 2024: AWS AI Fundamental Research Reading Group, US
March 2024: AstraZeneca Centre for AI, UK
January 2024: EPFL, CH
December 2023: Oxford Department of Statistics, UK
November 2023: Google DeepMind, UK
November 2023: Inria Paris, FR
November 2023: Tübingen AI Center, DE

Preprints

Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations, 2025
S. Movahedi*, F. Sarnthein*, N. Muca Cirone, A. Orvieto

Generalized Interpolating Discrete Diffusion, 2025
D. von Rütte, J. Fluri, Y. Ding, A. Orvieto, B. Schölkopf, T. Hofmann

An Uncertainty Principle for Linear Recurrent Neural Networks, 2025
A. François, A. Orvieto, F. Bach

When, Where and Why to Average Weights, 2025
N. Ajroldi, A. Orvieto, J. Geiping

Towards Robust and Principled Processing of Point Clouds With SSMs, 2024
N. Köprücü, D. Okpekpe, A. Orvieto

An Adaptive Method with Non-negative Gauss-Newton Stepsizes, 2024
A. Orvieto, L. Xiao

Gradient Descent on Logistic Regression with Non-Separable Data and Large Steps, 2024
S. Y. Meng, A. Orvieto, D. Y. Cao, C. De Sa

Publications (Conference/Journal)

Adaptive Methods through the Lens of SDEs, ICLR 2024
E. Monzio Compagnoni, T. Liu, R. Islamov, F. N. Proske, A. Orvieto, A. Lucchi

Geometric Inductive Biases of Deep Networks: Role of Data and Architecture, ICLR 2024
S. Movahedi, A. Orvieto, S. Moosavi-Dezfooli

A Novel Approach to Loss Landscape Characterization without Over-Parametrization, NeurIPS 2024.
R. Islamov, N. Ajroldi, A. Orvieto, A. Lucchi

Theoretical Foundations of Deep Selective State-Space Models, NeurIPS 2024.
N. Muca Cirone, A. Orvieto, B. Walker, C. Salvi, T. Lyons

RNNs: Vanishing gradients are not the end of the story, NeurIPS 2024.
N. Zucchet, A. Orvieto

Understanding the differences in Foundation Models, NeurIPS 2024.
J. Sieber, C. Amo Alonso, A. Didier, M. N. Zeilinger, A. Orvieto

Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning, NeurIPS 2024.
A. Meterez, L. Noci, T. Hofmann, A. Orvieto

Universality of Linear Recurrences Followed by Nonlinear Projections, ICML 2024.
A. Orvieto, S. De, C. Gulcehre, R. Pascanu, S. L. Smith

Recurrent Distance-Encoding Neural Networks for Graph Representation Learning, ICML 2024.
Y. Ding, A. Orvieto, B. He, T. Hofmann

An Accelerated Lyapunov Function for Polyak’s Heavy-Ball on Convex Quadratics,
Optimization Letters, 2024
A. Orvieto

SDEs for Minimax Optimization, AISTATS 2024
E. Monzio Compagnoni, A. Orvieto, H. Kersting, F. Proske, A. Lucchi

Resurrecting Recurrent Neural Networks for Long Sequences, ICML 2023 (Oral)
A. Orvieto, S. L. Smith, A. Gu, A. Fernando, C. Gulcehre, R. Pascanu, S. De

An SDE for Modeling SAM: Theory and Insights, ICML 2023
E. Monzio Compagnoni, L. Biggio, A. Orvieto, H. Kersting, F. N. Proske, A. Lucchi

Mean First Exit Times of Ornstein-Uhlenbeck Processes in High Dimensions,
Journal of Physics A: Mathematical and Theoretical, 2023
H. Kersting, A. Orvieto, F. Proske, A. Lucchi

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning, CVPR, 2023
S. Kim, L. Noci, A. Orvieto, T. Hofmann

Explicit Regularization in Overparametrized Models via Noise Injection, AISTATS, 2023
A. Orvieto*, A. Raj*, H. Kersting*, F. Bach

On the Effectiveness of Randomized Signatures as Reservoir for Learning Rough Dynamics, IJCNN, 2023.
E. Monzio Compagnoni, A. Scampicchio, L. Biggio, A. Orvieto, T. Hofmann, J. Teichmann

On the Theoretical Properties of Noise Correlation in SGD, NeurIPS, 2022
H. Kersting, A. Orvieto, F. Bach, F. Proske, A. Lucchi

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse, NeurIPS, 2022
L. Noci*, S. Anagnostidis*, L. Biggio*, A. Orvieto*, S. Pal Singh*, A. Lucchi

Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution, NeurIPS, 2022.
A. Orvieto, S. Lacoste-Julien, N. Loizou

Analysis of Pharmacological Modulation of Senescence in Human Epithelial Stem Cells,
Journal of Cellular and Molecular Medicine, 2022.
V. Barbaro, A. Orvieto, et al.

Anticorrelated Noise Injection for Improved Generalization, ICML, 2022.
A. Orvieto*, H. Kersting*, F. Proske, F. Bach, A. Lucchi

Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity, AISTATS, 2022.
J. Yang, A. Orvieto, A. Lucchi, N. He

Vanishing Curvature in Randomly Initialized Deep ReLU Networks, AISTATS, 2022.
A. Orvieto*, J. Kohler*, D. Pavllo, T. Hofmann, A. Lucchi

On the Second-order Convergence of Random Search Methods, NeurIPS, 2021.
A. Lucchi*, A. Orvieto*, A. Solomou*

Rethinking the Variational Interpretation of Nesterov’s Method, NeurIPS, 2021.
P. Zhang*, A. Orvieto*, H. Daneshmand

Learning explanations that are hard-to-vary, ICLR, 2021.
G. Parascandolo*, A. Neitz*, A. Orvieto, L. Gresele, B. Schölkopf

Revisiting the Role of Symplectic Numerical Integration on Acceleration and Stability in Convex Optimization, AISTATS, 2021.
P. Zhang, A. Orvieto, H. Daneshmand, R. Smith, T. Hofmann

Momentum Improves Optimization on Riemannian Manifolds, AISTATS, 2021.
F. Alimisis, A. Orvieto, G. Becigneul, A. Lucchi

An Accelerated DFO Algorithm for Finite-sum Convex Functions, ICML, 2020.
Y. Chen, A. Orvieto, A. Lucchi

Continuous-time Acceleration in Riemannian Optimization, AISTATS, 2020.
F. Alimisis, A. Orvieto, G. Becigneul, A. Lucchi

Shadowing Properties of Optimization Algorithms, NeurIPS, 2019.
A. Orvieto, A. Lucchi

Continuous-time Models for Stochastic Optimization Algorithms, NeurIPS, 2019.
A. Orvieto, A. Lucchi

The Role of Memory in Stochastic Optimization, UAI, 2019.
A. Orvieto, J. Kohler, A. Lucchi

Workshops

On the low-shot transferability of [V]-Mamba, CVPR 2024 PV Workshop
D. Misra, J. Gala, A. Orvieto

Escaping Random Teacher Initialization Enhances Signal Propagation and Representations, NeurIPS 3ML Workshop, 2023.
F. Sarnthein, S. Pal Singh, A. Orvieto, T. Hofmann

On the Advantage of Lion Compared to signSGD with Momentum, ICML High-dimensional Learning Dynamics Workshop, 2023.
A. Noiato, A. Orvieto

A New Adaptive Method for Minimizing Non-negative Losses, ICML High-dimensional Learning Dynamics Workshop, 2023.
A. Orvieto, L. Xiao

Batch-size Selection by Stochastic Optimal Control, NeurIPS HITY Workshop, 2022.
J. Zhao, A. Lucchi, F. N. Proske, A. Orvieto, H. Kersting

Achieving a Better Stability-Plasticity Tradeoff in Continual Learning, NeurIPS MetaLearn Workshop, 2022.
S. Kim, L. Noci, A. Orvieto, T. Hofmann

Should you follow the gradient flow? ICML Continuous-time Perspectives workshop, 2022.
X. Li, A. Orvieto

Enhancing Unit-Tests for Invariance Discovery, ICML Spurious Correlations workshop, 2022.
P. De Bartolomeis, A. Orvieto, G. Parascandolo

Empirics on the expressiveness of Randomized Signature, NeurIPS DLDE workshop, 2021.
E. Monzio Compagnoni, L. Biggio, A. Orvieto

Two-Level K-FAC Preconditioning for Deep Learning, NeurIPS OPT workshop, 2020.
N. Tselepidis, J. Kohler, A. Orvieto

Patents
Setting Method For Threaded Connection by means of Impact Wrench,
Inventors: M. Alberding, D. Bralla, A. Orvieto
Current Assignee: Hilti AG
European Patent Office, 2019, Publication number: 3501740

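“Verum, sine mendacio certum et verissimum, quod est inferius, est sicut quod est superius, et quod est superius, est sicut quod est inferius: ad perpetranda miracula rei unius.” (True, without falsehood, certain and most true: that which is below is like that which is above, and that which is above is like that which is below, to accomplish the miracles of the one thing.)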

Hermes Trismegistus, Emerald Table

“He studied the leaves of the tiny plant; how daintily, with what strange intelligence they were arranged around the stem. Virgil’s verses were beautiful, and he loved them; still, there was more than one verse in Virgil that was not half as clear and intelligent, beautiful and meaningful as the spiraled order of those tiny leaves climbing the stem. What pleasure, what ecstasy, what a delightful, noble, meaningful task it would be for a man to be able to create just one such flower! But no man was able to do that—no hero, no emperor, no pope or saint!”

Narcissus and Goldmund, Hermann Hesse

“While the heart beats, bruise it–it is your only opportunity; while the eye can still turn towards you with moist, timid entreaty, freeze it with an icy unanswering gaze; while the ear, that delicate messenger to the inmost sanctuary of the soul, can still take in the tones of kindness, put it off with hard civility, or sneering compliment, or envious affectation of indifference; while the creative brain can still throb with the sense of injustice, with the yearning for brotherly recognition–make haste–oppress it with your ill-considered judgements, your trivial comparisons, your careless misrepresentations. The heart will by and by be still–‘ubi saeva indignatio ulterius cor lacerare nequit’ [where fierce indignation can no longer tear the heart]; the eye will cease to entreat; the ear will be deaf; the brain will have ceased from all wants as well as from all work. Then your charitable speeches may find vent; then you may remember and pity the toil and the struggle and the failure; then you may give due honour to the work achieved; then you may find extenuation for errors, and may consent to bury them.”

George Eliot, The Lifted Veil

“And so proceed, and do not be afraid, without considering whether this is right, lest you take false steps. For if a painter, having to give the first stroke of his pen, were to consider all the others, he would conclude nothing.”

Meister Eckhart, Gott hat die Armen (sorry for the bad translation)