Adversarial Thresholding Semi-Bandits

Thesis


Bower, Craig 2020. Adversarial Thresholding Semi-Bandits. Thesis
AuthorsBower, Craig
Qualification namePhD
Abstract

The classical multi-armed bandit is one of the most common examples of sequential decision-making, either by trading-off between exploiting and exploring arms to maximise some payoff or purely exploring arms until the optimal arm is identified. In particular, a bandit player wanting to only pull arms with stochastic feedback exceeding a given threshold, has been studied extensively in a pure exploration context. However, numerous applications fail to be expressed, where a player wishes to balance the need to observe regions of an uncertain environment that are currently interesting (exploit) and checking if neglected regions have become interesting since last observed (explore).

We introduce the adversarial thresholding semi-bandit problem: a non-stochastic bandit model, where a player wants to only pull (potentially several) arms with feedback meeting some threshold condition. Our main objective is to design algorithms that meet the requirements of the adversarial thresholding semi-bandit problem theoretically, empirically and algorithmically, for a given application. In other words, we want to develop a machine that learns to select options according to some threshold condition and adapts quickly if the feedback from selecting an option unexpectedly changes. This work has many real-world applications and is motivated by online detector control monitoring in high-energy physics experiments, on the Large Hadron Collider.

We begin by describing the adversarial thresholding semi-bandit problem (ATSBP) in terms of a multi-armed bandit with multiple plays and extending the stochastic thresholding bandit problem to the adversarial setting. The adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (T-Exp3.M) and an algorithm combining label efficient prediction (LET-Exp3.M), are introduced that satisfy theoretical and computational Research specifications, but either perform poorly or fail completely under certain threshold conditions. To meet empirical performance requirements, we propose the dynamic label efficient adversarial thresholding exponentially-weighted exploration and exploitation with multiple plays algorithm (dLET-Exp3.M). Whilst computational requirements match those for T-Exp3.M, theoretical upper bounds on performance are proven to be worse. We also introduce an ATSBP algorithm (AliceBandit) that decomposes the action of pulling an arm into selection and observation decisions. Computational complexity and empirical performance under two different threshold conditions are significantly improved, compared with exponentially weighted adversarial thresholding semi-bandits. Theoretical upper bounds on performance are also significantly improved, for certain environments.

In the latter part of this thesis, we address the challenge of efficiently monitoring multiple condition parameters in high-energy experimental physics. Due to the extreme conditions experienced in heavy-ion particle colliders, the power supply to any device exceeding safe operating parameters is automatically shut down or tripped, to preserve integrity and functionality of the device. Prior to recent upgrades, a device or channel trip would halt data-taking for the entire experiment. Post-trip recovery requires a costly procedure both in terms of expertise and data-taking time. After the completion of the current upgrading phase (scheduled for 2021), the detector will collect data continuously. In this new regime, a channel trip will result in only the affected components of the experiment being shut down. However, since the new upgraded experiment will enable data-taking to increase by a factor of 100, each trip will have a significant impact on the experiments ability to provide physicists with reliable data to analyse. We demonstrate that adversarial thresholding semi-bandits efficiently identify device channels either exceeding a fixed threshold or deviating by more than a prescribed range prior to a trip, extending the state-of-the-art in high-energy physics detector control.

KeywordsMulti-armed bandits, Adversarial bandits, Thresholding bandits, Multi-agent bandits, Online machine learning, High-energy physics detector control
Year2020
PublisherUniversity of Derby
Web address (URL)hdl:10545/625725
File
File Access Level
Open
File
File Access Level
Open
File
File Access Level
Open
Publication process dates
Deposited29 Apr 2021, 09:47
Publication datesDec 2020
Rights

CC0 1.0 Universal

ContributorsAnjum, Ashiq (Advisor), Bagdasar, Ovidiu (Advisor) and Xue, Yong (Advisor)
Permalink -

https://repository.derby.ac.uk/item/940z7/adversarial-thresholding-semi-bandits

Download files


File
license.txt
File access level: Open

license_rdf
File access level: Open

ATSB_Final_draft (2).pdf
File access level: Open

  • 46
    total views
  • 16
    total downloads
  • 0
    views this month
  • 2
    downloads this month

Export as

Related outputs

A deep reinforcement learning based homeostatic system for unmanned position control
Manning, Warren, Anjum, Ashiq, Bower, Craig and Dassanayake, Priyanthi 2019. A deep reinforcement learning based homeostatic system for unmanned position control. Association for Computing Machinery.