0:00:15 | that |
---|
0:00:17 | it there is a joint work with uh |
---|
0:00:19 | and i is due that of a seems to the found my |
---|
0:00:22 | the |
---|
0:00:23 | and you guy and uh a a separation much very from us C |
---|
0:00:27 | um so first |
---|
0:00:29 | the problem we are considering a a is that dynamic spectrum access and on the model |
---|
0:00:34 | and the is illustrated that system where looking at to um |
---|
0:00:38 | there are and need given then channels uh |
---|
0:00:41 | i i don't get it it to a primary system that's using a slot transmission structure |
---|
0:00:46 | and uh of second and their use of the objective here is to find the the i'd slot and the |
---|
0:00:52 | limit its sensing |
---|
0:00:53 | oh we a at any given so uh a set use the can only choose K channels to set |
---|
0:00:59 | and if |
---|
0:01:00 | the channel is i do it what transmit and uh |
---|
0:01:04 | if you transmit if you patch what i'd slot you will receive one unit to work then essentially is you |
---|
0:01:09 | get one packet delete |
---|
0:01:11 | and the problem here is how to way i choose |
---|
0:01:14 | which K channels was we should sense and the beginning of a |
---|
0:01:18 | based on |
---|
0:01:20 | observation as a visual have so far |
---|
0:01:21 | and to answer this question the first thing we need to specify |
---|
0:01:25 | is the model of channel all two |
---|
0:01:27 | so if we look at |
---|
0:01:29 | each channel so |
---|
0:01:30 | each channel is essentially piece the i do is a stochastic process |
---|
0:01:34 | and each channel is model as a stock as a process |
---|
0:01:37 | with on parameter |
---|
0:01:39 | so we do know what kind of stochastic process with be ways |
---|
0:01:42 | but the prime in |
---|
0:01:44 | so the simple liz |
---|
0:01:45 | so the process we can be |
---|
0:01:47 | i be here will be a i pro |
---|
0:01:50 | so the channel or as the ah for channel i will the i people in the only with a known |
---|
0:01:55 | means it |
---|
0:01:56 | so see the i'd say is the probability channellise i'd in any given as |
---|
0:02:01 | and |
---|
0:02:02 | changing |
---|
0:02:02 | i |
---|
0:02:04 | and not |
---|
0:02:05 | we want to choose |
---|
0:02:07 | without knowing which channel was more likely to be bus so i don't |
---|
0:02:11 | imagining for pink is with on the by S and you want to get as many had |
---|
0:02:16 | oh |
---|
0:02:17 | so now how do we choose |
---|
0:02:18 | channel |
---|
0:02:19 | um K out of |
---|
0:02:20 | you know there is lot |
---|
0:02:22 | and objective is to maximise house swoop |
---|
0:02:25 | so what is the maximum throughput would we kind |
---|
0:02:28 | even assume |
---|
0:02:29 | we not the perfect |
---|
0:02:31 | we know all the C |
---|
0:02:33 | in that scenario we no we should always sense the channel with the largest to seat |
---|
0:02:38 | with |
---|
0:02:39 | the max probability |
---|
0:02:40 | i |
---|
0:02:41 | and you that case |
---|
0:02:42 | i was we'll put we look at it is just see |
---|
0:02:45 | as every i does well we get |
---|
0:02:46 | when you in |
---|
0:02:48 | the now without knowing all the see guys is it possible to achieve the same maximum throughput be fine by |
---|
0:02:54 | the non model seen that |
---|
0:02:55 | and if yes how |
---|
0:02:58 | a turns of this is actually a well studied problem you've we take a D model |
---|
0:03:03 | this is you was special case of more the amp and |
---|
0:03:06 | vol |
---|
0:03:07 | and the region problem we can explain using this slot machine with a dependent are |
---|
0:03:14 | and that you just time count is that like one um to play |
---|
0:03:17 | but we don't know the we word statistics |
---|
0:03:20 | and the we word processor soon i |
---|
0:03:23 | and maximise long term |
---|
0:03:25 | i |
---|
0:03:26 | as essence of this problem is in the tradeoff between exploitation and exploration |
---|
0:03:32 | so you to it to like |
---|
0:03:33 | we want to play a a a um that has a large is the sample me based on what we |
---|
0:03:37 | have |
---|
0:03:38 | is there are so |
---|
0:03:39 | but that same time |
---|
0:03:40 | we have to play each are sufficiently many both |
---|
0:03:44 | a sufficient number of times it so that our some whole me is close enough to the true |
---|
0:03:49 | oh that's essentially the |
---|
0:03:51 | i which a problem |
---|
0:03:53 | and uh you can be read was we formulate it in the following way |
---|
0:03:56 | see the one to see that and is the we what mean of each are which is a little or |
---|
0:04:01 | no |
---|
0:04:02 | and the maximum we what we can i get over a reading of lance T |
---|
0:04:07 | you |
---|
0:04:08 | it has it's no is given by the maximal see that i use subscript the one here to in the |
---|
0:04:13 | extreme the here |
---|
0:04:14 | to see the white the largest one multiplied by right |
---|
0:04:17 | which is achieved by always playing the bass ah |
---|
0:04:20 | very into it |
---|
0:04:22 | or not we don't know all the see so we don't know which way is the best um |
---|
0:04:26 | we can design design a policy high |
---|
0:04:29 | that tell us which um to play based on what we have |
---|
0:04:32 | there are so far |
---|
0:04:33 | and uh this is the re what we can get |
---|
0:04:36 | for a given policy |
---|
0:04:38 | and how do we know how to these policy is |
---|
0:04:41 | it's measured by that you friends |
---|
0:04:43 | to these ideals in their real way where you do not |
---|
0:04:46 | so this is called the required |
---|
0:04:48 | also called cost of nor learning that's the cost we pay |
---|
0:04:51 | for not snowy them uh |
---|
0:04:53 | and |
---|
0:04:54 | very intuitive to this |
---|
0:04:56 | every time we spend time |
---|
0:04:58 | a a a a a a a that's not the |
---|
0:05:01 | best |
---|
0:05:02 | we waiting incur a loss of C that one minus seat |
---|
0:05:05 | we are spending time |
---|
0:05:07 | i |
---|
0:05:08 | a multiplied by the expected time |
---|
0:05:10 | we |
---|
0:05:11 | yeah play a the ice best |
---|
0:05:13 | so to minimize wet |
---|
0:05:14 | we want to minimize a model |
---|
0:05:16 | spend a each |
---|
0:05:18 | and uh object here is to minimize the growth rate of the web |
---|
0:05:23 | she |
---|
0:05:23 | it's intuitive |
---|
0:05:25 | whether always grew of with time because we and never stop playing |
---|
0:05:28 | each a |
---|
0:05:29 | to make sure our sample means zach |
---|
0:05:32 | and we can also see |
---|
0:05:34 | if we have a sampling year roles of whether with time T then we can achieve the maximum average reward |
---|
0:05:40 | of the best uh |
---|
0:05:41 | a find five |
---|
0:05:42 | the normal models area because if you divide by T C these is not be know you with |
---|
0:05:49 | and there is a problem is nice solved by at all |
---|
0:05:53 | i five |
---|
0:05:54 | they show that |
---|
0:05:55 | the minimum required will street is lot |
---|
0:05:58 | with |
---|
0:05:58 | T |
---|
0:05:59 | they also found the best leading constant uh and given in terms of kl divergence |
---|
0:06:04 | and they the also construct a the policy to achieve peace past |
---|
0:06:08 | well |
---|
0:06:09 | but was rather complicated policy |
---|
0:06:11 | and uh about ten to a fifty years later |
---|
0:06:14 | there's simple policies to achieve lot |
---|
0:06:17 | at |
---|
0:06:17 | this is |
---|
0:06:18 | essentially a index |
---|
0:06:20 | type of policy based on simple me |
---|
0:06:22 | one of the most uh uh uh well no one is |
---|
0:06:25 | proposed by our uh |
---|
0:06:27 | is the so called U C B one policy |
---|
0:06:29 | so in this case what compute the in that |
---|
0:06:31 | for each a are at each time T |
---|
0:06:34 | the in is them play the sum of me of these are |
---|
0:06:38 | class |
---|
0:06:39 | this |
---|
0:06:39 | turn is called of the confidence of on |
---|
0:06:42 | used term is given by a square with the two of log T divide by G i |
---|
0:06:47 | yeah i is the number of times we have played arm |
---|
0:06:50 | i |
---|
0:06:51 | not a car |
---|
0:06:53 | so you can see |
---|
0:06:54 | if we are playing a a a a significant a last log T used remote allow |
---|
0:07:00 | it will make the index the large as |
---|
0:07:02 | that we will have to |
---|
0:07:03 | so the is always played around with a large |
---|
0:07:06 | oh use actually were thing |
---|
0:07:08 | the policy to at you spend a lot of keep time |
---|
0:07:11 | each uh |
---|
0:07:12 | which that's what line all be has shown that |
---|
0:07:15 | what we have |
---|
0:07:16 | so you've what it you know i be model though wow a channel occupancy |
---|
0:07:20 | the problem is completely so |
---|
0:07:22 | it's just a special case of the classic the um and |
---|
0:07:27 | so you we consider a more general more um |
---|
0:07:30 | complex model that's the each channel is a markov chain |
---|
0:07:33 | to state B the i don't |
---|
0:07:35 | it change according to this |
---|
0:07:37 | a a transition that |
---|
0:07:38 | here |
---|
0:07:39 | and we don't know the transition problem it's we don't know P zero one Q one |
---|
0:07:43 | at all the channels are still has had an equal they have the same transition |
---|
0:07:48 | not |
---|
0:07:48 | how do we design channels action policy |
---|
0:07:51 | to maximise |
---|
0:07:52 | swoop |
---|
0:07:53 | turns out |
---|
0:07:54 | this is a variation of the uh a more yeah and it problem but has never been self so we |
---|
0:08:00 | are |
---|
0:08:00 | essentially a on the first to to address |
---|
0:08:03 | these |
---|
0:08:04 | more down band |
---|
0:08:05 | and um my |
---|
0:08:07 | and the what to challenges here |
---|
0:08:10 | so first here we can see if we have a markovian model for each channel |
---|
0:08:15 | then the optimal policy even if we know the transition probability |
---|
0:08:19 | is no longer the state home one channel |
---|
0:08:21 | we need to |
---|
0:08:23 | so each across channels based on our prediction or which channel is more likely to |
---|
0:08:27 | i do like these |
---|
0:08:29 | a fixed |
---|
0:08:29 | because of the channel memory makes prediction |
---|
0:08:33 | so that in this case we can not simply |
---|
0:08:36 | minimize our required by funding at that time was spent on a bad channel |
---|
0:08:41 | because |
---|
0:08:41 | there's no really bad channel |
---|
0:08:43 | say we need to switch the was smart we are all switching on much |
---|
0:08:48 | so you actually large mine to or what is the best way |
---|
0:08:51 | to switching on among channels based on how observation |
---|
0:08:55 | and in general |
---|
0:08:56 | there are infinite possibilities of |
---|
0:08:58 | how to to switching |
---|
0:08:59 | so the wheel are |
---|
0:09:00 | from these infinite possibility which one is the bad |
---|
0:09:03 | so that's really that difficult part |
---|
0:09:05 | and perhaps the reason that this problem has not addressed |
---|
0:09:09 | a lead to each |
---|
0:09:10 | so actually turns out |
---|
0:09:12 | if we do one of the transition probability |
---|
0:09:15 | it's is this problem itself is complex enough not this is is actually the S is multi um band problem |
---|
0:09:21 | and they're not that that makes |
---|
0:09:23 | uh formulate that we do ninety eight eight |
---|
0:09:26 | and this problem in general is peace space ha |
---|
0:09:29 | that's shown but have them each one that's |
---|
0:09:32 | the cool think here is we have a special recipe spend it |
---|
0:09:35 | the think we are you ways here is the only a two state markov G channel each channel |
---|
0:09:40 | and we have shown and they no model |
---|
0:09:42 | the optimal policy is a simple now |
---|
0:09:46 | and do this is uh a a a a real work uh um have down with a a couple of |
---|
0:09:50 | clever with er |
---|
0:09:51 | so the optimality is |
---|
0:09:53 | current to and there's certain conditions |
---|
0:09:55 | so if we have a key one one we to that P zero one which be is is that daisy |
---|
0:09:59 | chain |
---|
0:10:00 | it's more likely to stay rather than change state |
---|
0:10:03 | then of to melody is generally um shoe |
---|
0:10:06 | but that you are the think like you P was model that P zero once a sure what three channels |
---|
0:10:12 | and the all K equal to and mine there's one |
---|
0:10:14 | but we come it's can gently these holds for all and for |
---|
0:10:18 | you you a rising case we have be able to prove |
---|
0:10:21 | so but that's |
---|
0:10:22 | assume we can this time your to project or is true then we know and the not model myopic policy |
---|
0:10:28 | is the optimal |
---|
0:10:30 | and the more significant to these on the the problem |
---|
0:10:33 | is that the mouth you policy has a simple so a universal struck |
---|
0:10:39 | oh which means a we need to know is what if P one mine's squid to that P zero one |
---|
0:10:43 | we always switch in this way |
---|
0:10:46 | as we stay at a i don't channel if this channel with current the sense is i don't who was |
---|
0:10:50 | stay with |
---|
0:10:51 | you it becomes a it then we switched to the channel visited the log is |
---|
0:10:56 | that's are are way of speech |
---|
0:10:58 | and if |
---|
0:10:59 | you want one small the |
---|
0:11:00 | if zero one then way switching a you from way |
---|
0:11:03 | we actually stay if the channel was a the and the was which if becomes i with where |
---|
0:11:08 | why you re what then we go to somewhere else |
---|
0:11:11 | and how the with switch with switched to the channel most recently visited a are all channels with a T |
---|
0:11:17 | that you most of all |
---|
0:11:18 | you've there are not such an no |
---|
0:11:20 | the we switch to the channel busy long as |
---|
0:11:23 | anyway |
---|
0:11:24 | there essentially a of this says is all optimal we of change |
---|
0:11:29 | among channels |
---|
0:11:30 | i'm only takes two possible |
---|
0:11:32 | depending on the transition |
---|
0:11:34 | so that's really |
---|
0:11:36 | makes the are no problem |
---|
0:11:38 | significant a simpler to some instead of searching my infinite possible ways of how to switching on channels |
---|
0:11:44 | we know there are only two possible |
---|
0:11:47 | if we don't to P zero one |
---|
0:11:48 | what is either there this way or that |
---|
0:11:51 | the |
---|
0:11:52 | we can formulate a sort of conceptual more the and it |
---|
0:11:55 | well |
---|
0:11:56 | so we treat each way of channel switching as a a |
---|
0:12:00 | you really can see that what is a a a remote on a problem |
---|
0:12:04 | is nothing but a choice |
---|
0:12:05 | a possible option |
---|
0:12:07 | have and we don't know which of change the bass |
---|
0:12:09 | as exactly what do we have |
---|
0:12:11 | so then not we'll a which um is that what are we only have forms we of switching |
---|
0:12:16 | so we want to do which one is the good one |
---|
0:12:18 | because we don't know the transition probably so we don't know which one to choose |
---|
0:12:22 | so it same we just read of the problem to the class more they are banded problem |
---|
0:12:27 | not that i C there are two channel is we have to address |
---|
0:12:31 | the first one the price are we have to answer is how long to play each i |
---|
0:12:35 | so a region that we just play one as we got some word for the classic |
---|
0:12:39 | model |
---|
0:12:40 | but not a a a a a one way of switching on much |
---|
0:12:43 | so we have to follow that we of switching among channels long enough you all there to see if i |
---|
0:12:50 | so how long we play |
---|
0:12:52 | and terms not to the optimal lattice of formal each way use spell star |
---|
0:12:57 | depends on the transition probability |
---|
0:12:59 | unfortunately fortunately and that's something we don't not |
---|
0:13:02 | so that's one they we have to dress |
---|
0:13:04 | and the out way is the reward in this case and the longer i be kinda |
---|
0:13:08 | not even independent |
---|
0:13:10 | oh i be of rows are |
---|
0:13:12 | because |
---|
0:13:12 | each um is are only able way of switching on among channels |
---|
0:13:15 | and that they are also reaching a on the same as that of and J |
---|
0:13:19 | they are no longer |
---|
0:13:20 | and that right |
---|
0:13:22 | so to address these two challenges that's our uh solution here |
---|
0:13:26 | so first how long to play each uh |
---|
0:13:28 | the optimal lattice of playing |
---|
0:13:30 | each arm of only each we of a channel switching |
---|
0:13:34 | depends on the parameters are no priming mean |
---|
0:13:37 | so to get a a a round of was back to the trick we have here is |
---|
0:13:41 | we play a each and with increasing that |
---|
0:13:43 | so we do not fixed |
---|
0:13:45 | a a specific lance for playing each a |
---|
0:13:47 | but rather choose a sequence that rules with |
---|
0:13:51 | and |
---|
0:13:51 | we will pace |
---|
0:13:52 | certain price in terms of required |
---|
0:13:55 | but we can choose the increasing read of view sequence which really small a lot a lot of T |
---|
0:14:01 | and |
---|
0:14:02 | do the price a pay we pay will be how which we really |
---|
0:14:05 | a |
---|
0:14:06 | yeah a thing to deal with not i T samples |
---|
0:14:10 | um what we did it is we um proved a modified each one of thing |
---|
0:14:15 | which essentially is the chief for proving a lot required to in the classic i D and eight uh |
---|
0:14:21 | model |
---|
0:14:22 | so the uh channel thing an eh |
---|
0:14:25 | uh are we should not foundation is all the conditional um |
---|
0:14:29 | expectations should have should have the same me so that a conditional exactly expectation should be the same me |
---|
0:14:35 | and then we come by to the probability that |
---|
0:14:38 | the sample mean is to far from the truth |
---|
0:14:41 | and in this case we how we don't have identical um every we sample anymore |
---|
0:14:46 | so we need to to modify that because the conditional expectation no longer the same |
---|
0:14:50 | but we can get a the result if the conditional expectations are with thing |
---|
0:14:54 | a certain range of each other |
---|
0:14:56 | so |
---|
0:14:57 | then we can use similar techniques to prove there is |
---|
0:15:01 | and the what we have shown here is |
---|
0:15:03 | we can achieve a which very close to a log all there for the required |
---|
0:15:08 | essentially |
---|
0:15:09 | in in front of a lot we don't have a constant anymore is |
---|
0:15:12 | it's but a you know increasing sequence |
---|
0:15:14 | but this sequence can be a it is small |
---|
0:15:17 | as slow as you want |
---|
0:15:18 | so you choose |
---|
0:15:19 | a small you want to you hug close you want to get a lot T and based on that |
---|
0:15:24 | we can choose |
---|
0:15:25 | the increasing the for each |
---|
0:15:27 | L N |
---|
0:15:28 | so i was give the details |
---|
0:15:30 | on the relation between G T and uh L T actually very simple |
---|
0:15:34 | is just showing at each time |
---|
0:15:36 | what is the current L where you |
---|
0:15:39 | and uh that's a it or what is conclude um |
---|
0:15:43 | this paper here |
---|
0:15:44 | so what |
---|
0:15:45 | we have considered a a a a of the problem was a multi bit it by um dynamic spectrum access |
---|
0:15:52 | um we believe this general approach would be useful for for solving a either |
---|
0:15:57 | uh rest we spend it probably even a mark of general mark of C your process is on the primary |
---|
0:16:03 | a what is the message here |
---|
0:16:05 | the method here is if we are dealing with a small T are bandit problem |
---|
0:16:09 | with a known |
---|
0:16:10 | uh then them |
---|
0:16:12 | and we want to achieve |
---|
0:16:14 | the same i reach word as we can get and the not |
---|
0:16:18 | so in general problem is intractable |
---|
0:16:21 | but the problem may have a whole |
---|
0:16:24 | if |
---|
0:16:24 | and the the not model |
---|
0:16:26 | the optimal policy has what we call a find that option |
---|
0:16:30 | by means |
---|
0:16:32 | there are only fun and two ways of |
---|
0:16:34 | do we find the and optimal policy only has four that possible for |
---|
0:16:39 | depending on the primary |
---|
0:16:42 | don't we treat each possible for |
---|
0:16:44 | as as a a a a we don't to know the prior |
---|
0:16:47 | parameters then we tried to or which one is actually a one for that and a line on them |
---|
0:16:53 | so that's the general message |
---|
0:16:54 | but of course for each specific problem we have to but yeah sure |
---|
0:16:59 | whether log |
---|
0:17:00 | oh i'm sure close to log where can be but she |
---|
0:17:04 | and in the context of dynamic spectrum access that's great we have been focusing are |
---|
0:17:08 | the problem is really possible because house |
---|
0:17:11 | of this i a universal structure of the our P policy for that |
---|
0:17:15 | no model pace |
---|
0:17:16 | and uh i a little bit advertisement |
---|
0:17:19 | um i be giving a another problem related a problem being about forty minutes |
---|
0:17:25 | uh |
---|
0:17:25 | because this one |
---|
0:17:27 | a uh we can only somehow for a very special cases where you have only to state markov chain and |
---|
0:17:33 | all the channels are are are arms a stochastic identical |
---|
0:17:37 | the what if we're dealing with a general fine state markov chain with the no in the transition problem |
---|
0:17:44 | what can we do there |
---|
0:17:45 | so this is paper actually is a dressing |
---|
0:17:48 | this complete be general model |
---|
0:17:50 | um but we can also achieve not require |
---|
0:17:53 | but |
---|
0:17:54 | the definition of the required it is uh |
---|
0:17:56 | we |
---|
0:17:57 | oh |
---|
0:17:58 | oh |
---|
0:18:02 | path |
---|
0:18:03 | i don't one person something to sort of question from the speech |
---|
0:18:08 | true |
---|
0:18:15 | oh i yes |
---|
0:18:16 | so that's essentially here |
---|
0:18:20 | you that cool |
---|
0:18:21 | yes yeah |
---|
0:18:22 | so |
---|
0:18:23 | that the required we get it is these G Q locked chi right so we want chi G to grow |
---|
0:18:28 | very slow is time T |
---|
0:18:30 | and uh so if you want to achieve a lot a lot should be |
---|
0:18:34 | uh for chi chi |
---|
0:18:36 | then |
---|
0:18:36 | eventually do sequence |
---|
0:18:38 | uh |
---|
0:18:39 | this is just |
---|
0:18:41 | the number of |
---|
0:18:43 | but what is the value of L for the current |
---|
0:18:46 | so |
---|
0:18:47 | L one i is |
---|
0:18:48 | well we start we play one i L L one |
---|
0:18:51 | slot |
---|
0:18:52 | so then for the first L once lots the these sequence was what have the same value a one |
---|
0:18:58 | for the that |
---|
0:18:59 | what a a a a a model be out to we play out to time |
---|
0:19:02 | so for the next L tools as well as the but will be out |
---|
0:19:06 | so |
---|
0:19:06 | L is increasing |
---|
0:19:08 | so these sequence will be also be in crazy but even slower or |
---|
0:19:12 | because it's repeating the same value |
---|
0:19:15 | but it |
---|
0:19:16 | yeah |
---|
0:19:19 | okay |
---|
0:19:20 | or running times to time well just a record some |
---|
0:19:31 | um |
---|
0:19:33 | i field to see if a connection was not little huh but this is really trying to learn |
---|
0:19:38 | um |
---|
0:19:38 | trying to maximise this throughput put and there known parameters |
---|
0:19:42 | but a a is more like the inclusion was your peer users here we even only have one |
---|
0:19:48 | uh and secondary user |
---|
0:19:49 | i i i probably miss |
---|
0:19:51 | um |
---|
0:19:52 | but an action that i i don't really see whether two problems i really to |
---|
0:19:56 | um |
---|
0:19:57 | maybe we can talk more data |
---|
0:19:59 | you to to rooms |
---|
0:20:01 | so so let's |
---|
0:20:02 | that's |
---|
0:20:02 | the speaker |
---|