Bower, Craig

Adversarial Thresholding Semi-Bandits

Thesis

Bower, Craig 2020. Adversarial Thresholding Semi-Bandits. Thesis

Publication process dates
Authors	Bower, Craig
Qualification name	PhD
Abstract	The classical multi-armed bandit is one of the most common examples of sequential decision-making, either by trading-off between exploiting and exploring arms to maximise some payoff or purely exploring arms until the optimal arm is identified. In particular, a bandit player wanting to only pull arms with stochastic feedback exceeding a given threshold, has been studied extensively in a pure exploration context. However, numerous applications fail to be expressed, where a player wishes to balance the need to observe regions of an uncertain environment that are currently interesting (exploit) and checking if neglected regions have become interesting since last observed (explore). We introduce the adversarial thresholding semi-bandit problem: a non-stochastic bandit model, where a player wants to only pull (potentially several) arms with feedback meeting some threshold condition. Our main objective is to design algorithms that meet the requirements of the adversarial thresholding semi-bandit problem theoretically, empirically and algorithmically, for a given application. In other words, we want to develop a machine that learns to select options according to some threshold condition and adapts quickly if the feedback from selecting an option unexpectedly changes. This work has many real-world applications and is motivated by online detector control monitoring in high-energy physics experiments, on the Large Hadron Collider. We begin by describing the adversarial thresholding semi-bandit problem (ATSBP) in terms of a multi-armed bandit with multiple plays and extending the stochastic thresholding bandit problem to the adversarial setting. The adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (T-Exp3.M) and an algorithm combining label efficient prediction (LET-Exp3.M), are introduced that satisfy theoretical and computational Research specifications, but either perform poorly or fail completely under certain threshold conditions. To meet empirical performance requirements, we propose the dynamic label efficient adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (dLET-Exp3.M). Whilst computational requirements match those for T-Exp3.M, theoretical upper bounds on performance are proven to be worse. We also introduce an ATSBP algorithm (AliceBandit) that decomposes the action of pulling an arm into selection and observation decisions. Computational complexity and empirical performance under two different threshold conditions are significantly improved, compared with exponentially weighted adversarial thresholding semi-bandits. Theoretical upper bounds on performance are also significantly improved, for certain environments. In the latter part of this thesis, we address the challenge of efficiently monitoring multiple condition parameters in high-energy experimental physics. Due to the extreme conditions experienced in heavy-ion particle colliders, the power supply to any device exceeding safe operating parameters is automatically shut down or tripped, to preserve integrity and functionality of the device. Prior to recent upgrades, a device or channel trip would halt data-taking for the entire experiment. Post-trip recovery requires a costly procedure both in terms of expertise and data-taking time. After the completion of the current upgrading phase (scheduled for 2021), the detector will collect data continuously. In this new regime, a channel trip will result in only the affected components of the experiment being shut down. However, since the new upgraded experiment will enable data-taking to increase by a factor of 100, each trip will have a significant impact on the experiments ability to provide physicists with reliable data to analyse. We demonstrate that adversarial thresholding semi-bandits efficiently identify device channels either exceeding a fixed threshold or deviating by more than a prescribed range prior to a trip, extending the state-of-the-art in high-energy physics detector control.
Keywords	Multi-armed bandits, Adversarial bandits, Thresholding bandits, Multi-agent bandits, Online machine learning, High-energy physics detector control
Year	2020
Publisher	University of Derby
Web address (URL)	hdl:10545/625725
File	license.txt File Access Level Open
File	license_rdf File Access Level Open
File	ATSB_Final_draft (2).pdf File Access Level Open
Deposited	29 Apr 2021, 09:47
Publication dates	Dec 2020
Rights	CC0 1.0 Universal
Contributors	Anjum, Ashiq (Advisor), Bagdasar, Ovidiu (Advisor) and Xue, Yong (Advisor)

Permalink -

https://repository.derby.ac.uk/item/940z7/adversarial-thresholding-semi-bandits

Log in to edit

Download files

File

	license.txt
File access level: Open

	license_rdf
File access level: Open

	ATSB_Final_draft (2).pdf
File access level: Open

115
total views
49
total downloads
12
views this month
2
downloads this month

Export as

Related outputs

A deep reinforcement learning based homeostatic system for unmanned position control

Manning, Warren, Anjum, Ashiq, Bower, Craig and Dassanayake, Priyanthi 2019. A deep reinforcement learning based homeostatic system for unmanned position control. Association for Computing Machinery.