Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states
We initiate the study of tradeoffs between exploration and exploitation in online learning of properties of quantum states. Given sequential oracle access to an unknown quantum state, in each round we are tasked with choosing an observable from a set of actions, aiming to maximize its expectation value on the state (the reward). Information gained about