IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 41, NO. 3, MARCH 1996

Multi-Armed Bandits with Switching Penalties

Abstract- The multi-armed bandit problem with switching penalties (switching cost and switching delays) is investigated. It is shown that under an optimal policy, decisions about the processor allocation need to be made only at stopping times that achieve an appropriate index, the well-known “Gittins index” or a “switching index” that is defined for switching cost and switching delays. An algorithm for the computation of the “switching index” is presented. Furthermore, sufficient conditions for optimality of allocation strategies, based on limited look-ahead techniques, are established. These conditions together with the above-mentioned feature of optimal scheduling policies simplify the search for an optimal allocation policy. For a special class of multi-armed bandits (scheduling of parallel queues with switching penalties and no arrivals), it is shown that the aforementioned property of optimal policies is sufficient to determine an optimal allocation strategy. In general, the determination of optimal allocation policies remains a difficult and challenging task.

Manjari Asawa and Demosthenis Teneketzis, Member, IEEE

interchangeably) and one server. Let $x_i(t)$ denote the state of machine $i$ at time $t$, where $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots$. At each time instant $t$, the server must select exactly one machine for operation (or service). Denote this machine by $m(t)$. If $m(t) = i$, i.e., machine $i$ is selected for operation at time $t$, an immediate reward $R(t) := R_i(x_i(t))$ is obtained, and the state of machine $i$ changes to $x_i(t+1)$ according to a stationary Markov transition rule. The states of the idle machines remain frozen, i.e., $x_j(t+1) = x_j(t)$, $j \neq i$. The states of all machines are perfectly observed, and the objective is to schedule the order in which machines are to be operated so as to maximize an infinite horizon expected discounted...
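The dynamics described above can be illustrated with a minimal simulation sketch. It is not the paper's algorithm and computes no index; it only shows that the operated machine moves while idle machines stay frozen. Everything named here is an assumption made for illustration: the function names `simulate` and `make_greedy`, the discount factor `beta`, the finite `horizon` that truncates the infinite-horizon sum, the table layout of `P` (transition probabilities) and `R` (rewards), and the way a constant `switch_cost` is charged whenever the selected machine changes. States and machines are indexed from 0 in the code.

```python
import random

def simulate(P, R, policy, x0, beta=0.9, horizon=50, switch_cost=0.0, seed=0):
    """Simulate one sample path of the bandit model (illustrative sketch).

    P[i][s] : list of transition probabilities of machine i from state s
    R[i][s] : reward for operating machine i in state s
    policy  : function (states, previous_machine) -> machine index m(t)
    Returns the discounted reward over `horizon` steps and the final states.
    """
    rng = random.Random(seed)
    x = list(x0)          # x[i] plays the role of x_i(t)
    prev = None           # machine operated at the previous time instant
    total = 0.0
    for t in range(1, horizon + 1):
        discount = beta ** (t - 1)
        i = policy(x, prev)                      # m(t) = i
        if prev is not None and i != prev:
            total -= discount * switch_cost      # assumed form of the penalty
        total += discount * R[i][x[i]]           # R(t) := R_i(x_i(t))
        # Only machine i changes state; all x[j], j != i, remain frozen.
        weights = P[i][x[i]]
        x[i] = rng.choices(range(len(weights)), weights=weights)[0]
        prev = i
    return total, x

def make_greedy(R):
    """Myopic policy: operate the machine with the largest immediate reward."""
    def policy(x, prev):
        return max(range(len(x)), key=lambda i: R[i][x[i]])
    return policy

# Two hypothetical machines with two states each.
P = [[[0.0, 1.0], [0.0, 1.0]],   # machine 0 moves to state 1 and stays there
     [[1.0, 0.0], [0.0, 1.0]]]   # machine 1 never leaves its current state
R = [[1.0, 0.0],
     [0.5, 0.5]]

value, final_states = simulate(P, R, make_greedy(R), [0, 0],
                               beta=0.5, horizon=3, switch_cost=0.2)
print(value, final_states)
```

The myopic policy is used only because it is easy to state; the abstract's point is that optimal decisions are tied to index-achieving stopping times, which this sketch does not attempt to compute.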