I’m a PhD student at the University of Geneva, advised by Slava Voloshynovskiy. Previously, I graduated with distinction from the Higher School of Economics in 2021 with a master’s degree in Data Science and did my undergrad at Saint Petersburg State University. Before joining the Stochastic Information Processing group, I was part of JetBrains Research.
I’m supported by the Swiss National Science Foundation.
Research interests
I am interested in deep learning in general.
My current research focuses on the following topics:
- Computer vision
- Generative modelling
- Adversarial attacks and defenses
- Machine learning for anti-counterfeiting and brand protection
Some of my previous research topics:
- Black-box optimization
- Reinforcement learning, simulation to reality transfer
Teaching
Teaching assistant for:
- Data Science (14X026) – Fall 2024, 2023, 2022, 2021
- Multimedia Security and Privacy (14x016) – Spring 2025, 2024, 2023, 2022
Awards and achievements
- NeurIPS 2020 Competition Track challenges:
- Third place in the Black box optimization challenge
- Third place in the AI Driving Olympics (AI-DO) challenge
- NeurIPS 2021 Competition Track challenges:
- Fourth place in the MineRL Diamond Competition
- NeurIPS 2022 Competition Track challenges:
- Third place in the Weather4cast Competition — Super-Resolution Rain Movie Prediction under Spatio-temporal Shifts
- Second place in the Second AmericasNLP Competition on Speech-to-Text Translation
- All-Russian Student Olympiad “Ya – Professional”:
- Silver medal in the category of “Programming and Information Technologies” (2021)
- Winner in the categories “Artificial Intelligence” (2021, 2020, 2019), “Software Engineering” (2021, 2019), “Business Informatics” (2021), “Big Data” (2020, 2019), “Programming and Information Technologies” (2020), and “Internet of Things” (2019)
- Won eight hackathons
Publications
Selected publications; for a full list, please see my Google Scholar profile:
Task-Agnostic Attacks Against Vision Foundation Models
Abstract
The study of security in machine learning mainly focuses on downstream task-specific attacks, where the adversarial example is obtained by optimizing a loss function specific to the downstream task. At the same time, it has become standard practice for machine learning practitioners to adopt publicly available pre-trained vision foundation models, effectively sharing a common backbone architecture across a multitude of applications such as classification, segmentation, depth estimation, retrieval, question-answering and more. The study of attacks on such foundation models and their impact on multiple downstream tasks remains largely unexplored. This work proposes a general framework that forges task-agnostic adversarial examples by maximally disrupting the feature representation obtained with foundation models. We extensively evaluate the security of the feature representations obtained by popular vision foundation models by measuring the impact of this attack on multiple downstream tasks and its transferability between models.
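To give a flavor of the idea, here is a minimal numpy sketch (not the paper’s exact method or models): a projected gradient-ascent attack that maximizes the feature-space distance between the clean and adversarial inputs, with a toy linear-tanh `encoder` standing in for a frozen foundation model. All names, shapes, and hyper-parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen foundation-model encoder (hypothetical).
W = rng.standard_normal((16, 8))
def encoder(x):
    return np.tanh(x @ W)

def feature_disruption_attack(x, eps=0.1, alpha=0.02, steps=40):
    """Maximize ||encoder(x_adv) - encoder(x)||^2 under an L-inf budget."""
    f_clean = encoder(x)
    x_adv = x.copy()
    for _ in range(steps):
        # Numerical gradient of the feature distance w.r.t. the input
        # (a real attack would backpropagate through the network instead).
        grad = np.zeros_like(x_adv)
        h = 1e-4
        for i in range(x_adv.size):
            e = np.zeros_like(x_adv)
            e.flat[i] = h
            d_plus = np.sum((encoder(x_adv + e) - f_clean) ** 2)
            d_minus = np.sum((encoder(x_adv - e) - f_clean) ** 2)
            grad.flat[i] = (d_plus - d_minus) / (2 * h)
        x_adv = x_adv + alpha * np.sign(grad)     # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the budget
    return x_adv

x = rng.standard_normal(16)
x_adv = feature_disruption_attack(x)
```

Because the objective never mentions a downstream task, the same perturbation degrades any head built on the shared features.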
Robustness Tokens: Towards Adversarial Robustness of Transformers
Abstract
Recently, large pre-trained foundation models have become widely adopted by machine learning practitioners for a multitude of tasks. Given that such models are publicly available, relying on their use as backbone models for downstream tasks might result in high vulnerability to adversarial attacks crafted with the same public model. In this work, we propose Robustness Tokens, a novel approach specific to the transformer architecture that fine-tunes a few additional private tokens with low computational requirements instead of tuning model parameters as done in traditional adversarial training. We show that Robustness Tokens make Vision Transformer models significantly more robust to white-box adversarial attacks while also retaining the original downstream performances.
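The core mechanism can be sketched in a few lines of numpy: extra learnable tokens are appended to the input sequence of a frozen transformer block, and only those tokens would be trained. The single-head attention layer below is a toy stand-in for a pre-trained ViT backbone; every name and shape here is an illustrative assumption, not the paper’s implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_patches, n_rob = 8, 4, 2

# Frozen backbone weights (toy stand-in for a pre-trained ViT block).
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

def attention(tokens):
    """Single-head self-attention over a (sequence, dim) array."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Private, learnable robustness tokens; the backbone stays frozen.
rob_tokens = rng.standard_normal((n_rob, d)) * 0.01

def forward(patches, rob):
    seq = np.concatenate([patches, rob], axis=0)  # append the extra tokens
    out = attention(seq)
    return out[:len(patches)].mean(axis=0)        # pooled patch features
```

Adversarial training would then update only `rob_tokens` (e.g., by gradient descent on an adversarial loss), which is far cheaper than fine-tuning the backbone weights.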
Beyond Classification: Evaluating Diffusion Denoised Smoothing for Security-Utility Trade off
Abstract
While foundation models demonstrate impressive performance across various tasks, they remain vulnerable to adversarial inputs. Current research explores various approaches to enhance model robustness, with Diffusion Denoised Smoothing emerging as a particularly promising technique. This method employs a pretrained diffusion model to preprocess inputs before model inference. Yet, its effectiveness remains largely unexplored beyond classification. We aim to address this gap by analyzing three datasets with four distinct downstream tasks under three different adversarial attack algorithms. Our findings reveal that while foundation models maintain resilience against conventional transformations, applying high-noise diffusion denoising to clean images without any distortions significantly degrades performance by as much as 57%. Low-noise diffusion settings preserve performance but fail to provide adequate protection across all attack types. Moreover, we introduce a novel attack strategy specifically targeting the diffusion process itself, capable of circumventing defenses in the low-noise regime. Our results suggest that the trade-off between adversarial robustness and performance remains a challenge to be addressed.
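The defense being evaluated follows a simple recipe: add Gaussian noise to the input, denoise it, then run the model, averaging over several noisy copies. The numpy sketch below illustrates that pipeline with a hypothetical shrinkage `toy_denoiser` and a toy linear classifier standing in for a pretrained diffusion model and a downstream head; all names and parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x_noisy, sigma):
    # Stand-in for a pretrained diffusion denoiser: Wiener-style
    # shrinkage toward zero under a unit-variance prior (hypothetical).
    return x_noisy / (1 + sigma ** 2)

def denoised_smoothing_predict(x, classify, sigma=0.25, n_samples=64):
    """Majority vote over noisy-then-denoised copies of the input."""
    votes = []
    for _ in range(n_samples):
        x_noisy = x + sigma * rng.standard_normal(x.shape)    # add noise
        votes.append(classify(toy_denoiser(x_noisy, sigma)))  # denoise, classify
    return np.bincount(votes).argmax()                        # majority vote

# Toy linear classifier over two classes.
w = np.array([1.0, -1.0, 0.5, 0.0])
classify = lambda z: int(z @ w > 0)
x = np.array([0.5, -0.2, 0.3, 0.1])
```

The trade-off discussed in the abstract is visible even here: larger `sigma` washes out adversarial perturbations but also distorts clean inputs, while smaller `sigma` preserves them at the cost of weaker protection.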
Stochastic Digital Twin for Copy Detection Patterns
Abstract
Copy detection patterns (CDP) present an efficient technique for product protection against counterfeiting. However, the complexity of studying CDP production variability often results in time-consuming and costly procedures, limiting CDP scalability. Recent advancements in computer modelling, notably the concept of a “digital twin” for printing-imaging channels, allow for enhanced scalability and the optimization of authentication systems. Yet, the development of an accurate digital twin is far from trivial. This paper extends previous research which modelled a printing-imaging channel using a machine learning-based digital twin for CDP. This model, built upon an information-theoretic framework known as “Turbo”, demonstrated superior performance over traditional generative models such as CycleGAN and pix2pix. However, the emerging field of Denoising Diffusion Probabilistic Models (DDPM) presents a potential advancement in generative models due to its ability to stochastically model the inherent randomness of the printing-imaging process, and its impressive performance in image-to-image translation tasks. This study aims to compare the capabilities of the Turbo framework and DDPM on the same CDP datasets, with the goal of establishing the real-world benefits of DDPM models for digital twin applications in CDP security. Furthermore, the paper seeks to evaluate the generative potential of the studied models in the context of mobile phone data acquisition. Despite the increased complexity of DDPM methods when compared to traditional approaches, our study highlights their advantages and explores their potential for future applications.
A machine learning-based digital twin for anti-counterfeiting applications with copy detection patterns
Abstract
In this paper, we present a new approach to model a printing-imaging channel using a machine learning-based “digital twin” for copy detection patterns (CDP). CDP are considered modern anti-counterfeiting features in multiple applications. Our digital twin is formulated within the information-theoretic framework of TURBO, initially developed for high energy physics simulations, using variational approximations of mutual information for both encoder and decoder in the bidirectional exchange of information. This model extends various architectural designs, including paired pix2pix and unpaired CycleGAN, for image-to-image translation. Applicable to any type of printing and imaging devices, the model needs only training data comprising digital templates sent to a printing device and data acquired by an imaging device. The data can be paired, unpaired, or hybrid, ensuring architectural flexibility and scalability for multiple practical setups. We explore the influence of various architectural factors, metrics, and discriminators on the overall system’s performance in generating and predicting printed CDP from their digital versions and vice versa. We also performed a comparison with several state-of-the-art methods for image-to-image translation applications.
Digital twins of physical printing-imaging channel
Abstract
In this paper, we address the problem of modeling a printing-imaging channel built on a machine learning approach, a.k.a. a digital twin, for anti-counterfeiting applications based on copy detection patterns (CDP). The digital twin is formulated on an information-theoretic framework called Turbo that uses variational approximations of mutual information developed for both encoder and decoder in a two-directional information passage. The proposed model generalizes several state-of-the-art architectures such as adversarial autoencoder (AAE), CycleGAN and adversarial latent space autoencoder (ALAE). This model can be applied to any type of printing and imaging, and it only requires training data consisting of digital templates or artworks that are sent to a printing device and data acquired by an imaging device. Moreover, these data can be paired, unpaired or hybrid paired-unpaired, which makes the proposed architecture very flexible and scalable to many practical setups. We demonstrate the impact of various architectural factors, metrics and discriminators on the overall system performance in the task of generation/prediction of printed CDP from their digital counterparts and vice versa. We also compare the proposed system with several state-of-the-art methods used for image-to-image translation applications.
Solving the Weather4cast Challenge via Visual Transformers for 3D Images
Abstract
Accurately forecasting the weather is an important task, as many real-world processes and decisions depend on future meteorological conditions. The NeurIPS 2022 challenge entitled Weather4cast poses the problem of predicting rainfall events for the next eight hours given the preceding hour of satellite observations as a context. Motivated by the recent success of transformer-based architectures in computer vision, we implement and propose two methodologies based on this architecture to tackle this challenge. We find that ensembling different transformers with some baseline models achieves the best performance we could measure on the unseen test data. Our approach has been ranked 3rd in the competition.
Weather4cast at NeurIPS 2022: Super-Resolution Rain Movie Prediction under Spatio-temporal Shifts
Abstract
Weather4cast again advanced modern algorithms in AI and machine learning through a highly topical interdisciplinary competition challenge: The prediction of hi-res rain radar movies from multi-band satellite sensors, requiring data fusion, multi-channel video frame prediction, and super-resolution. Accurate predictions of rain events are becoming ever more critical, with climate change increasing the frequency of unexpected rainfall. The resulting models will have a particular impact where costly weather radar is not available. We here present highlights and insights emerging from the thirty teams participating from over a dozen countries. To extract relevant patterns, models were challenged by spatio-temporal shifts. Geometric data augmentation and test-time ensemble models with a suitable smoother loss helped this transfer learning. Even though, in ablation, static information like geographical location and elevation was not linked to performance, the general success of models incorporating physics in this competition suggests that approaches combining machine learning with application domain knowledge seem a promising avenue for future research. Weather4cast will continue to explore the powerful benchmark reference data set introduced here, advancing competition tasks to quantitative predictions, and exploring the effects of metric choice on model performance and qualitative prediction properties.
Findings of the Second AmericasNLP Competition on Speech-to-Text Translation
Abstract
Indigenous languages, including those from the Americas, have received very little attention from the machine learning (ML) and natural language processing (NLP) communities. To tackle the resulting lack of systems for these languages and the accompanying social inequalities affecting their speakers, we conduct the second AmericasNLP competition (and the first one in collaboration with NeurIPS), which is centered around speech-to-text translation systems for Indigenous languages of the Americas. The competition features three tasks – (1) automatic speech recognition, (2) text-based machine translation, and (3) speech-to-text translation – and two tracks: constrained and unconstrained. Five Indigenous languages are covered: Bribri, Guarani, Kotiria, Wa’ikhana, and Quechua. In this overview paper, we describe the tasks, tracks, and languages, introduce the baseline and participating systems, and end with a summary of ongoing and future challenges for the automatic translation of Indigenous languages.
MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned
Abstract
Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these restrictions come at a cost: increased difficulty. If the barrier for entry is too high, many potential participants are demoralized. With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers. With this track and more extensive tutorials and support, we saw an increased number of submissions. The participants of this easier track were able to obtain a diamond, and the participants of the harder track progressed the generalizable solutions in the same task.
Solving Black-Box Optimization Challenge via Learning Search Space Partition for Local Bayesian Optimization
Abstract
Black-box optimization is one of the vital tasks in machine learning, since it approximates real-world conditions, in that we do not always know all the properties of a given system, up to knowing almost nothing but the results. This paper describes our approach to solving the black-box optimization challenge at NeurIPS 2020 through learning search space partition for local Bayesian optimization. We describe the task of the challenge as well as our algorithm for low budget optimization that we named SPBOpt. We optimize the hyper-parameters of our algorithm for the competition finals using multi-task Bayesian optimization on results from the first two evaluation settings. Our approach has ranked third in the competition finals.
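The overall shape of a partition-then-search strategy can be sketched in plain numpy. This is a deliberately simplified illustration, not the SPBOpt algorithm itself: a minimal k-means splits the evaluated points into two regions, the region with the better average objective is kept, and plain random search stands in for the local Bayesian optimizer. The objective, dimensions, and budgets are all toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Toy black-box function (hypothetical); its minimum is at x = 0.3.
    return np.sum((x - 0.3) ** 2)

def two_means(points, iters=10):
    """Minimal k-means with k = 2, used to partition the search space."""
    centers = points[rng.choice(len(points), 2, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels

def partitioned_search(dim=2, n_init=20, n_local=30):
    # 1) Evaluate random points over the whole unit box.
    X = rng.uniform(0, 1, size=(n_init, dim))
    y = np.array([objective(x) for x in X])
    # 2) Partition the evaluated points and keep the region that
    #    looks more promising on average.
    labels = two_means(X)
    if np.bincount(labels, minlength=2).min() == 0:
        labels = (X[:, 0] > np.median(X[:, 0])).astype(int)  # fallback split
    best_k = min((0, 1), key=lambda k: y[labels == k].mean())
    region = X[labels == best_k]
    lo, hi = region.min(axis=0), region.max(axis=0)
    # 3) Search locally inside the selected region; random search stands
    #    in here for the local Bayesian optimizer used in SPBOpt.
    for _ in range(n_local):
        x_new = rng.uniform(lo, hi)
        X = np.vstack([X, x_new])
        y = np.append(y, objective(x_new))
    return X[y.argmin()], y.min()

x_best, y_best = partitioned_search()
```

Restricting the local model to a learned sub-region keeps the surrogate accurate where it matters, which is what makes this family of methods effective under the low evaluation budgets used in the challenge.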