0:00:15that
0:00:17it there is a joint work with uh
0:00:19and i is due that of a seems to the found my
0:00:22the
0:00:23and you guy and uh a a separation much very from us C
0:00:27um so first
0:00:29the problem we are considering a a is that dynamic spectrum access and on the model
0:00:34and the is illustrated that system where looking at to um
0:00:38there are and need given then channels uh
0:00:41i i don't get it it to a primary system that's using a slot transmission structure
0:00:46and uh of second and their use of the objective here is to find the the i'd slot and the
0:00:52limit its sensing
0:00:53oh we a at any given so uh a set use the can only choose K channels to set
0:00:59and if
0:01:00the channel is i do it what transmit and uh
0:01:04if you transmit if you patch what i'd slot you will receive one unit to work then essentially is you
0:01:09get one packet delete
0:01:11and the problem here is how to way i choose
0:01:14which K channels was we should sense and the beginning of a
0:01:18based on
0:01:20observation as a visual have so far
0:01:21and to answer this question the first thing we need to specify
0:01:25is the model of channel all two
0:01:27so if we look at
0:01:29each channel so
0:01:30each channel is essentially piece the i do is a stochastic process
0:01:34and each channel is model as a stock as a process
0:01:37with on parameter
0:01:39so we do know what kind of stochastic process with be ways
0:01:42but the prime in
0:01:44so the simple liz
0:01:45so the process we can be
0:01:47i be here will be a i pro
0:01:50so the channel or as the ah for channel i will the i people in the only with a known
0:01:55means it
0:01:56so see the i'd say is the probability channellise i'd in any given as
0:02:01and
0:02:02changing
0:02:02i
0:02:04and not
0:02:05we want to choose
0:02:07without knowing which channel was more likely to be bus so i don't
0:02:11imagining for pink is with on the by S and you want to get as many had
0:02:16oh
0:02:17so now how do we choose
0:02:18channel
0:02:19um K out of
0:02:20you know there is lot
0:02:22and objective is to maximise house swoop
0:02:25so what is the maximum throughput would we kind
0:02:28even assume
0:02:29we not the perfect
0:02:31we know all the C
0:02:33in that scenario we no we should always sense the channel with the largest to seat
0:02:38with
0:02:39the max probability
0:02:40i
0:02:41and you that case
0:02:42i was we'll put we look at it is just see
0:02:45as every i does well we get
0:02:46when you in
0:02:48the now without knowing all the see guys is it possible to achieve the same maximum throughput be fine by
0:02:54the non model seen that
0:02:55and if yes how
0:02:58a turns of this is actually a well studied problem you've we take a D model
0:03:03this is you was special case of more the amp and
0:03:06vol
0:03:07and the region problem we can explain using this slot machine with a dependent are
0:03:14and that you just time count is that like one um to play
0:03:17but we don't know the we word statistics
0:03:20and the we word processor soon i
0:03:23and maximise long term
0:03:25i
0:03:26as essence of this problem is in the tradeoff between exploitation and exploration
0:03:32so you to it to like
0:03:33we want to play a a a um that has a large is the sample me based on what we
0:03:37have
0:03:38is there are so
0:03:39but that same time
0:03:40we have to play each are sufficiently many both
0:03:44a sufficient number of times it so that our some whole me is close enough to the true
0:03:49oh that's essentially the
0:03:51i which a problem
0:03:53and uh you can be read was we formulate it in the following way
0:03:56see the one to see that and is the we what mean of each are which is a little or
0:04:01no
0:04:02and the maximum we what we can i get over a reading of lance T
0:04:07you
0:04:08it has it's no is given by the maximal see that i use subscript the one here to in the
0:04:13extreme the here
0:04:14to see the white the largest one multiplied by right
0:04:17which is achieved by always playing the bass ah
0:04:20very into it
0:04:22or not we don't know all the see so we don't know which way is the best um
0:04:26we can design design a policy high
0:04:29that tell us which um to play based on what we have
0:04:32there are so far
0:04:33and uh this is the re what we can get
0:04:36for a given policy
0:04:38and how do we know how to these policy is
0:04:41it's measured by that you friends
0:04:43to these ideals in their real way where you do not
0:04:46so this is called the required
0:04:48also called cost of nor learning that's the cost we pay
0:04:51for not snowy them uh
0:04:53and
0:04:54very intuitive to this
0:04:56every time we spend time
0:04:58a a a a a a a that's not the
0:05:01best
0:05:02we waiting incur a loss of C that one minus seat
0:05:05we are spending time
0:05:07i
0:05:08a multiplied by the expected time
0:05:10we
0:05:11yeah play a the ice best
0:05:13so to minimize wet
0:05:14we want to minimize a model
0:05:16spend a each
0:05:18and uh object here is to minimize the growth rate of the web
0:05:23she
0:05:23it's intuitive
0:05:25whether always grew of with time because we and never stop playing
0:05:28each a
0:05:29to make sure our sample means zach
0:05:32and we can also see
0:05:34if we have a sampling year roles of whether with time T then we can achieve the maximum average reward
0:05:40of the best uh
0:05:41a find five
0:05:42the normal models area because if you divide by T C these is not be know you with
0:05:49and there is a problem is nice solved by at all
0:05:53i five
0:05:54they show that
0:05:55the minimum required will street is lot
0:05:58with
0:05:58T
0:05:59they also found the best leading constant uh and given in terms of kl divergence
0:06:04and they the also construct a the policy to achieve peace past
0:06:08well
0:06:09but was rather complicated policy
0:06:11and uh about ten to a fifty years later
0:06:14there's simple policies to achieve lot
0:06:17at
0:06:17this is
0:06:18essentially a index
0:06:20type of policy based on simple me
0:06:22one of the most uh uh uh well no one is
0:06:25proposed by our uh
0:06:27is the so called U C B one policy
0:06:29so in this case what compute the in that
0:06:31for each a are at each time T
0:06:34the in is them play the sum of me of these are
0:06:38class
0:06:39this
0:06:39turn is called of the confidence of on
0:06:42used term is given by a square with the two of log T divide by G i
0:06:47yeah i is the number of times we have played arm
0:06:50i
0:06:51not a car
0:06:53so you can see
0:06:54if we are playing a a a a significant a last log T used remote allow
0:07:00it will make the index the large as
0:07:02that we will have to
0:07:03so the is always played around with a large
0:07:06oh use actually were thing
0:07:08the policy to at you spend a lot of keep time
0:07:11each uh
0:07:12which that's what line all be has shown that
0:07:15what we have
0:07:16so you've what it you know i be model though wow a channel occupancy
0:07:20the problem is completely so
0:07:22it's just a special case of the classic the um and
0:07:27so you we consider a more general more um
0:07:30complex model that's the each channel is a markov chain
0:07:33to state B the i don't
0:07:35it change according to this
0:07:37a a transition that
0:07:38here
0:07:39and we don't know the transition problem it's we don't know P zero one Q one
0:07:43at all the channels are still has had an equal they have the same transition
0:07:48not
0:07:48how do we design channels action policy
0:07:51to maximise
0:07:52swoop
0:07:53turns out
0:07:54this is a variation of the uh a more yeah and it problem but has never been self so we
0:08:00are
0:08:00essentially a on the first to to address
0:08:03these
0:08:04more down band
0:08:05and um my
0:08:07and the what to challenges here
0:08:10so first here we can see if we have a markovian model for each channel
0:08:15then the optimal policy even if we know the transition probability
0:08:19is no longer the state home one channel
0:08:21we need to
0:08:23so each across channels based on our prediction or which channel is more likely to
0:08:27i do like these
0:08:29a fixed
0:08:29because of the channel memory makes prediction
0:08:33so that in this case we can not simply
0:08:36minimize our required by funding at that time was spent on a bad channel
0:08:41because
0:08:41there's no really bad channel
0:08:43say we need to switch the was smart we are all switching on much
0:08:48so you actually large mine to or what is the best way
0:08:51to switching on among channels based on how observation
0:08:55and in general
0:08:56there are infinite possibilities of
0:08:58how to to switching
0:08:59so the wheel are
0:09:00from these infinite possibility which one is the bad
0:09:03so that's really that difficult part
0:09:05and perhaps the reason that this problem has not addressed
0:09:09a lead to each
0:09:10so actually turns out
0:09:12if we do one of the transition probability
0:09:15it's is this problem itself is complex enough not this is is actually the S is multi um band problem
0:09:21and they're not that that makes
0:09:23uh formulate that we do ninety eight eight
0:09:26and this problem in general is peace space ha
0:09:29that's shown but have them each one that's
0:09:32the cool think here is we have a special recipe spend it
0:09:35the think we are you ways here is the only a two state markov G channel each channel
0:09:40and we have shown and they no model
0:09:42the optimal policy is a simple now
0:09:46and do this is uh a a a a real work uh um have down with a a couple of
0:09:50clever with er
0:09:51so the optimality is
0:09:53current to and there's certain conditions
0:09:55so if we have a key one one we to that P zero one which be is is that daisy
0:09:59chain
0:10:00it's more likely to stay rather than change state
0:10:03then of to melody is generally um shoe
0:10:06but that you are the think like you P was model that P zero once a sure what three channels
0:10:12and the all K equal to and mine there's one
0:10:14but we come it's can gently these holds for all and for
0:10:18you you a rising case we have be able to prove
0:10:21so but that's
0:10:22assume we can this time your to project or is true then we know and the not model myopic policy
0:10:28is the optimal
0:10:30and the more significant to these on the the problem
0:10:33is that the mouth you policy has a simple so a universal struck
0:10:39oh which means a we need to know is what if P one mine's squid to that P zero one
0:10:43we always switch in this way
0:10:46as we stay at a i don't channel if this channel with current the sense is i don't who was
0:10:50stay with
0:10:51you it becomes a it then we switched to the channel visited the log is
0:10:56that's are are way of speech
0:10:58and if
0:10:59you want one small the
0:11:00if zero one then way switching a you from way
0:11:03we actually stay if the channel was a the and the was which if becomes i with where
0:11:08why you re what then we go to somewhere else
0:11:11and how the with switch with switched to the channel most recently visited a are all channels with a T
0:11:17that you most of all
0:11:18you've there are not such an no
0:11:20the we switch to the channel busy long as
0:11:23anyway
0:11:24there essentially a of this says is all optimal we of change
0:11:29among channels
0:11:30i'm only takes two possible
0:11:32depending on the transition
0:11:34so that's really
0:11:36makes the are no problem
0:11:38significant a simpler to some instead of searching my infinite possible ways of how to switching on channels
0:11:44we know there are only two possible
0:11:47if we don't to P zero one
0:11:48what is either there this way or that
0:11:51the
0:11:52we can formulate a sort of conceptual more the and it
0:11:55well
0:11:56so we treat each way of channel switching as a a
0:12:00you really can see that what is a a a remote on a problem
0:12:04is nothing but a choice
0:12:05a possible option
0:12:07have and we don't know which of change the bass
0:12:09as exactly what do we have
0:12:11so then not we'll a which um is that what are we only have forms we of switching
0:12:16so we want to do which one is the good one
0:12:18because we don't know the transition probably so we don't know which one to choose
0:12:22so it same we just read of the problem to the class more they are banded problem
0:12:27not that i C there are two channel is we have to address
0:12:31the first one the price are we have to answer is how long to play each i
0:12:35so a region that we just play one as we got some word for the classic
0:12:39model
0:12:40but not a a a a a one way of switching on much
0:12:43so we have to follow that we of switching among channels long enough you all there to see if i
0:12:50so how long we play
0:12:52and terms not to the optimal lattice of formal each way use spell star
0:12:57depends on the transition probability
0:12:59unfortunately fortunately and that's something we don't not
0:13:02so that's one they we have to dress
0:13:04and the out way is the reward in this case and the longer i be kinda
0:13:08not even independent
0:13:10oh i be of rows are
0:13:12because
0:13:12each um is are only able way of switching on among channels
0:13:15and that they are also reaching a on the same as that of and J
0:13:19they are no longer
0:13:20and that right
0:13:22so to address these two challenges that's our uh solution here
0:13:26so first how long to play each uh
0:13:28the optimal lattice of playing
0:13:30each arm of only each we of a channel switching
0:13:34depends on the parameters are no priming mean
0:13:37so to get a a a round of was back to the trick we have here is
0:13:41we play a each and with increasing that
0:13:43so we do not fixed
0:13:45a a specific lance for playing each a
0:13:47but rather choose a sequence that rules with
0:13:51and
0:13:51we will pace
0:13:52certain price in terms of required
0:13:55but we can choose the increasing read of view sequence which really small a lot a lot of T
0:14:01and
0:14:02do the price a pay we pay will be how which we really
0:14:05a
0:14:06yeah a thing to deal with not i T samples
0:14:10um what we did it is we um proved a modified each one of thing
0:14:15which essentially is the chief for proving a lot required to in the classic i D and eight uh
0:14:21model
0:14:22so the uh channel thing an eh
0:14:25uh are we should not foundation is all the conditional um
0:14:29expectations should have should have the same me so that a conditional exactly expectation should be the same me
0:14:35and then we come by to the probability that
0:14:38the sample mean is to far from the truth
0:14:41and in this case we how we don't have identical um every we sample anymore
0:14:46so we need to to modify that because the conditional expectation no longer the same
0:14:50but we can get a the result if the conditional expectations are with thing
0:14:54a certain range of each other
0:14:56so
0:14:57then we can use similar techniques to prove there is
0:15:01and the what we have shown here is
0:15:03we can achieve a which very close to a log all there for the required
0:15:08essentially
0:15:09in in front of a lot we don't have a constant anymore is
0:15:12it's but a you know increasing sequence
0:15:14but this sequence can be a it is small
0:15:17as slow as you want
0:15:18so you choose
0:15:19a small you want to you hug close you want to get a lot T and based on that
0:15:24we can choose
0:15:25the increasing the for each
0:15:27L N
0:15:28so i was give the details
0:15:30on the relation between G T and uh L T actually very simple
0:15:34is just showing at each time
0:15:36what is the current L where you
0:15:39and uh that's a it or what is conclude um
0:15:43this paper here
0:15:44so what
0:15:45we have considered a a a a of the problem was a multi bit it by um dynamic spectrum access
0:15:52um we believe this general approach would be useful for for solving a either
0:15:57uh rest we spend it probably even a mark of general mark of C your process is on the primary
0:16:03a what is the message here
0:16:05the method here is if we are dealing with a small T are bandit problem
0:16:09with a known
0:16:10uh then them
0:16:12and we want to achieve
0:16:14the same i reach word as we can get and the not
0:16:18so in general problem is intractable
0:16:21but the problem may have a whole
0:16:24if
0:16:24and the the not model
0:16:26the optimal policy has what we call a find that option
0:16:30by means
0:16:32there are only fun and two ways of
0:16:34do we find the and optimal policy only has four that possible for
0:16:39depending on the primary
0:16:42don't we treat each possible for
0:16:44as as a a a a we don't to know the prior
0:16:47parameters then we tried to or which one is actually a one for that and a line on them
0:16:53so that's the general message
0:16:54but of course for each specific problem we have to but yeah sure
0:16:59whether log
0:17:00oh i'm sure close to log where can be but she
0:17:04and in the context of dynamic spectrum access that's great we have been focusing are
0:17:08the problem is really possible because house
0:17:11of this i a universal structure of the our P policy for that
0:17:15no model pace
0:17:16and uh i a little bit advertisement
0:17:19um i be giving a another problem related a problem being about forty minutes
0:17:25uh
0:17:25because this one
0:17:27a uh we can only somehow for a very special cases where you have only to state markov chain and
0:17:33all the channels are are are arms a stochastic identical
0:17:37the what if we're dealing with a general fine state markov chain with the no in the transition problem
0:17:44what can we do there
0:17:45so this is paper actually is a dressing
0:17:48this complete be general model
0:17:50um but we can also achieve not require
0:17:53but
0:17:54the definition of the required it is uh
0:17:56we
0:17:57oh
0:17:58oh
0:18:02path
0:18:03i don't one person something to sort of question from the speech
0:18:08true
0:18:15oh i yes
0:18:16so that's essentially here
0:18:20you that cool
0:18:21yes yeah
0:18:22so
0:18:23that the required we get it is these G Q locked chi right so we want chi G to grow
0:18:28very slow is time T
0:18:30and uh so if you want to achieve a lot a lot should be
0:18:34uh for chi chi
0:18:36then
0:18:36eventually do sequence
0:18:38uh
0:18:39this is just
0:18:41the number of
0:18:43but what is the value of L for the current
0:18:46so
0:18:47L one i is
0:18:48well we start we play one i L L one
0:18:51slot
0:18:52so then for the first L once lots the these sequence was what have the same value a one
0:18:58for the that
0:18:59what a a a a a model be out to we play out to time
0:19:02so for the next L tools as well as the but will be out
0:19:06so
0:19:06L is increasing
0:19:08so these sequence will be also be in crazy but even slower or
0:19:12because it's repeating the same value
0:19:15but it
0:19:16yeah
0:19:19okay
0:19:20or running times to time well just a record some
0:19:31um
0:19:33i field to see if a connection was not little huh but this is really trying to learn
0:19:38um
0:19:38trying to maximise this throughput put and there known parameters
0:19:42but a a is more like the inclusion was your peer users here we even only have one
0:19:48uh and secondary user
0:19:49i i i probably miss
0:19:51um
0:19:52but an action that i i don't really see whether two problems i really to
0:19:56um
0:19:57maybe we can talk more data
0:19:59you to to rooms
0:20:01so so let's
0:20:02that's
0:20:02the speaker