0:00:18 | hi everybody |
---|
0:00:19 | my name is Jérémie Jakubowicz, and today I'm going to |
---|
0:00:24 | talk about distributed optimization, |
---|
0:00:26 | as in the previous talk, |
---|
0:00:30 | and I should say two things |
---|
0:00:33 | before beginning. first, of course, I should acknowledge my |
---|
0:00:37 | co-authors, Gersende Fort and Walid Hachem; |
---|
0:00:40 | we are all working together, |
---|
0:00:44 | and Gersende and Walid are also affiliated with the CNRS. |
---|
0:00:49 | I should also say that some of the results |
---|
0:00:52 | I'm going to present were derived very recently and are not in |
---|
0:00:58 | our ICASSP paper. |
---|
0:01:01 | so, for starters, let |
---|
0:01:04 | us say a few words about the setting: imagine I have |
---|
0:01:09 | a function F and I want to minimize it over a finite-dimensional |
---|
0:01:13 | vector space. |
---|
0:01:15 | you all know, I'm sure, the deterministic gradient algorithm, which |
---|
0:01:21 | is an iterative algorithm and consists of |
---|
0:01:26 | following the gradient lines; |
---|
0:01:28 | actually, if I want to minimize, I take the opposite direction, with |
---|
0:01:33 | gamma_n being a positive step size. |
---|
0:01:37 | this is if I know the function F. |
---|
0:01:42 | if I only know the function F up to a random |
---|
0:01:47 | perturbation, |
---|
0:01:48 | which |
---|
0:01:50 | has to be unbiased, |
---|
0:01:53 | I can do the same thing, and the corresponding |
---|
0:01:57 | algorithm is the celebrated stochastic gradient algorithm. |
---|
0:02:05 | under classical assumptions and for well-behaved functions, one can show that |
---|
0:02:12 | the sequence of estimates converges to the roots of the gradient of F. |
---|
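For reference, here is a minimal sketch of the two iterations just described (deterministic and stochastic gradient). The quadratic objective, the noise model, and the step-size sequence `gamma` are illustrative assumptions, not the speaker's setting.

```python
import numpy as np

def grad_descent(grad, theta0, gamma, n_iter):
    """Deterministic gradient iteration: theta_{n+1} = theta_n - gamma(n) * grad F(theta_n)."""
    theta = np.asarray(theta0, dtype=float)
    for n in range(n_iter):
        theta = theta - gamma(n) * grad(theta)
    return theta

def stochastic_grad_descent(noisy_grad, theta0, gamma, n_iter, rng):
    """Stochastic gradient iteration: the gradient is only observed up to an unbiased perturbation."""
    theta = np.asarray(theta0, dtype=float)
    for n in range(n_iter):
        theta = theta - gamma(n) * noisy_grad(theta, rng)
    return theta

if __name__ == "__main__":
    # Toy objective F(theta) = 0.5 * ||theta - c||^2, whose gradient vanishes at c.
    c = np.array([1.0, -2.0])
    grad = lambda th: th - c
    noisy_grad = lambda th, r: th - c + r.normal(scale=0.5, size=th.shape)  # unbiased noise
    gamma = lambda n: 1.0 / (n + 1)  # positive, decreasing step sizes
    rng = np.random.default_rng(0)
    print(grad_descent(grad, np.zeros(2), gamma, 200))
    print(stochastic_grad_descent(noisy_grad, np.zeros(2), gamma, 5000, rng))
```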
0:02:19 | so we are going to talk about this very minimization problem, but in a distributed |
---|
0:02:26 | framework. |
---|
0:02:27 | so, first, |
---|
0:02:29 | the outline of my talk is the following: we are going to |
---|
0:02:33 | deal first with unconstrained optimization and |
---|
0:02:36 | analyse the algorithm asymptotically, |
---|
0:02:40 | and then move on to constrained optimization and illustrate it on a power allocation example. |
---|
0:02:46 | so, for unconstrained optimization now, |
---|
0:02:49 | imagine we have |
---|
0:02:52 | a network of N agents. |
---|
0:02:55 | the agents can be of very different natures: sensors, mobile phones, robots, |
---|
0:03:00 | et cetera, |
---|
0:03:01 | and the agents are connected according to a random graph whose |
---|
0:03:07 | topology is supposed to be time-varying, |
---|
0:03:10 | and they want |
---|
0:03:11 | to achieve a global mission; for that, they are able to talk to their neighbours, |
---|
0:03:16 | to cooperate. |
---|
0:03:19 | so, more precisely, in this talk we assume that each agent, |
---|
0:03:23 | say agent i, has a utility function |
---|
0:03:27 | f_i, and the network wants to minimize |
---|
0:03:33 | the sum of the utility functions. |
---|
0:03:36 | so now, |
---|
0:03:39 | we have to point out two difficulties. the first difficulty is that |
---|
0:03:45 | agent i ignores |
---|
0:03:48 | the utility functions of the other nodes, |
---|
0:03:51 | yet we want to optimize something that depends on them, |
---|
0:03:54 | so obviously cooperation is going to be needed. |
---|
0:03:58 | the second remark is that |
---|
0:04:00 | agent i only knows its own utility function up to a random perturbation, |
---|
0:04:07 | so we also have to use |
---|
0:04:09 | the stochastic approximation framework. |
---|
0:04:13 | so the algorithm we are going to analyse |
---|
0:04:18 | is |
---|
0:04:19 | stated here; |
---|
0:04:20 | I'm going to comment on it in a second. |
---|
0:04:23 | it is based on, |
---|
0:04:26 | it goes back to, the work of Tsitsiklis et al.; |
---|
0:04:29 | there have been many contributions afterwards, |
---|
0:04:32 | interesting work by Nedić et al., and many, many more works. |
---|
0:04:37 | so, |
---|
0:04:41 | it's exactly the very algorithm that we saw in the previous talk; |
---|
0:04:45 | there are two steps in this algorithm. |
---|
0:04:50 | in the first step, |
---|
0:04:52 | each agent receives |
---|
0:04:54 | a perturbed version |
---|
0:04:57 | of its gradient and updates its estimate into a temporary estimate, |
---|
0:05:02 | theta tilde |
---|
0:05:04 | at time n+1 for agent i; |
---|
0:05:06 | each agent does this, |
---|
0:05:09 | and then they talk. |
---|
0:05:11 | then each agent, |
---|
0:05:13 | for instance, here, agent i, |
---|
0:05:16 | updates its |
---|
0:05:17 | final estimate |
---|
0:05:19 | using the temporary |
---|
0:05:21 | estimates |
---|
0:05:22 | of its neighbours, |
---|
0:05:23 | in a weighted sum. |
---|
0:05:26 | so the weight w_{n+1}(i,j) |
---|
0:05:29 | has to be zero |
---|
0:05:31 | if nodes i and j are not connected; |
---|
0:05:33 | this is the first constraint. |
---|
0:05:38 | the network model is the following. there are also constraints on the weights W: |
---|
0:05:43 | the W's must sum up to one in a row, and W must be a doubly |
---|
0:05:50 | stochastic matrix. |
---|
0:05:52 | the first requirement here is not very costly: each |
---|
0:05:59 | agent i can tune its weights so that they sum up to one, but the |
---|
0:06:03 | second one is more difficult to achieve, because the agents have to |
---|
0:06:07 | cooperate to be able to make the columns sum up to one. |
---|
0:06:11 | there is a nice solution, |
---|
0:06:13 | using pairwise gossip, as introduced in the work of Boyd et al., |
---|
0:06:17 | which ensures that the W |
---|
0:06:20 | matrices are doubly stochastic. |
---|
0:06:23 | we also assume that the W's are i.i.d.; this |
---|
0:06:29 | can be weakened. the last assumption is a technical one, |
---|
0:06:33 | here, involving a spectral radius; essentially, loosely speaking, it says that the network |
---|
0:06:39 | is connected, has one connected component, |
---|
0:06:43 | so it's a very mild assumption. |
---|
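To make the two-step scheme concrete, here is a minimal sketch. It assumes a fixed doubly stochastic matrix built on a ring and toy quadratic utilities with unbiased gradient noise; the talk itself uses random, time-varying gossip matrices, so everything named here (`distributed_sgd`, `W`, `centers`, the noise scale) is illustrative.

```python
import numpy as np

def distributed_sgd(noisy_grads, W, theta0, gamma, n_iter, rng):
    """Two-step distributed stochastic gradient:
    1) each agent i takes a local step along its own perturbed gradient,
    2) agents average the temporary estimates of their neighbours with weights W[i, j]."""
    theta = np.array(theta0, dtype=float)            # shape (N, d): one row per agent
    for n in range(n_iter):
        temp = np.empty_like(theta)
        for i, g in enumerate(noisy_grads):
            temp[i] = theta[i] - gamma(n) * g(theta[i], rng)  # local stochastic gradient step
        theta = W @ temp                              # weighted averaging (W doubly stochastic)
    return theta

if __name__ == "__main__":
    # Toy example: agent i has utility f_i(x) = 0.5 * ||x - c_i||^2, observed with unbiased noise.
    # The sum of the utilities is minimized at the average of the c_i's.
    N, d = 5, 2
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(N, d))
    noisy_grads = [(lambda c: lambda x, r: x - c + r.normal(scale=0.3, size=d))(c) for c in centers]
    # Fixed doubly stochastic matrix on a ring (the talk uses random pairwise-gossip matrices).
    W = np.zeros((N, N))
    for i in range(N):
        W[i, i] = 0.5
        W[i, (i - 1) % N] = 0.25
        W[i, (i + 1) % N] = 0.25
    gamma = lambda n: 1.0 / (n + 1)
    estimates = distributed_sgd(noisy_grads, W, np.zeros((N, d)), gamma, 5000, rng)
    print("consensus estimates:\n", estimates)
    print("target (average of the centers):", centers.mean(axis=0))
```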
0:06:46 | so now let's |
---|
0:06:47 | move on to the |
---|
0:06:50 | asymptotic analysis. |
---|
0:06:52 | we define theta, which is obtained by stacking |
---|
0:06:56 | all the estimates of all the agents in a single vector theta, |
---|
0:07:00 | and we are interested in two things: first, in the average of the estimates, |
---|
0:07:06 | and also in the disagreement, |
---|
0:07:09 | that is, how the local estimates differ from the average. |
---|
0:07:13 | so the difficult technical point is to propose satisfying |
---|
0:07:20 | sufficient conditions, |
---|
0:07:22 | and the difficulty lies in the stability; |
---|
0:07:26 | this is the property that the whole vector theta_n remains bounded |
---|
0:07:32 | with probability one. |
---|
0:07:34 | we provide such a condition using a Lyapunov function; when |
---|
0:07:39 | we are interested in minimization, we can take |
---|
0:07:42 | the function F itself as our Lyapunov function. I'm not going to |
---|
0:07:46 | enter into the details. |
---|
0:07:48 | the first result is that, with probability one, consensus is achieved, |
---|
0:07:54 | meaning that all the estimates |
---|
0:07:55 | eventually reach the same value, |
---|
0:07:59 | and that the average of the estimates converges to the set of roots of |
---|
0:08:05 | the |
---|
0:08:06 | gradient of F, which are our targets. |
---|
0:08:09 | moreover, if this set is made of isolated points, then we converge to one of |
---|
0:08:16 | these points. |
---|
0:08:18 | we also prove another result, which is a |
---|
0:08:24 | central limit theorem. |
---|
0:08:26 | I'm not going to enter into the details, but |
---|
0:08:29 | it just says that, under |
---|
0:08:34 | good assumptions, |
---|
0:08:35 | the normalized disagreements converge in distribution |
---|
0:08:42 | to |
---|
0:08:43 | a Gaussian vector, |
---|
0:08:45 | and I should point out that here |
---|
0:08:48 | the distribution of this Gaussian vector is degenerate, since all the random vectors here are the |
---|
0:08:54 | same. |
---|
0:08:56 | so instead of going into the details, let me |
---|
0:09:01 | draw your attention to three consequences of this |
---|
0:09:05 | result. the first one is that |
---|
0:09:08 | the estimates converge |
---|
0:09:11 | at speed square root of gamma_n, |
---|
0:09:16 | and a consequence of this is that the algorithm performs as well as the centralized |
---|
0:09:23 | algorithm. |
---|
0:09:24 | also, from the degenerate nature of the distribution we saw, |
---|
0:09:28 | we can conclude that the disagreements |
---|
0:09:32 | are negligible |
---|
0:09:35 | with respect |
---|
0:09:37 | to the distance between the estimates and the target value, |
---|
0:09:41 | because the |
---|
0:09:43 | distribution of the vector |
---|
0:09:45 | puts mass only on the consensus line. |
---|
0:09:50 | so another remark |
---|
0:09:59 | is that, since this agreement is achieved |
---|
0:10:02 | at very high speed, |
---|
0:10:04 | compared to the |
---|
0:10:08 | speed at which we move towards the target value, |
---|
0:10:11 | we do not need to communicate too often; |
---|
0:10:13 | and actually we can show that if the probability |
---|
0:10:17 | that there is a communication goes to zero, |
---|
0:10:21 | then, as long as it decays more slowly than one over the square |
---|
0:10:26 | root of n, |
---|
0:10:27 | we can still guarantee convergence, and hence |
---|
0:10:30 | save network resources while keeping the same performance. |
---|
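As a rough illustration of this sparse-communication remark (following the same conventions as the sketch above), the gossip step below is performed only with a probability that decays to zero more slowly than 1/sqrt(n). The decay exponent 0.4 and the function name are assumptions made for the sketch, not the speaker's exact scheme.

```python
import numpy as np

def sparse_comm_distributed_sgd(noisy_grads, W, theta0, gamma, n_iter, rng):
    """Distributed stochastic gradient where, at time n, the agents gossip only with
    probability p_n; otherwise everyone keeps its own temporary estimate (W replaced by I)."""
    theta = np.array(theta0, dtype=float)
    N = theta.shape[0]
    identity = np.eye(N)
    for n in range(n_iter):
        temp = np.empty_like(theta)
        for i, g in enumerate(noisy_grads):
            temp[i] = theta[i] - gamma(n) * g(theta[i], rng)   # local stochastic gradient step
        p_n = min(1.0, (n + 1) ** (-0.4))   # communication probability, decaying slower than 1/sqrt(n)
        mix = W if rng.random() < p_n else identity
        theta = mix @ temp
    return theta
```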
0:10:35 | so now let's move to constrained optimization. my problem is the same, except |
---|
0:10:40 | that now my estimates are required to remain in a compact and convex set |
---|
0:10:47 | D, |
---|
0:10:48 | and this set is known by all the agents. |
---|
0:10:52 | so we can use pretty much the same algorithm, except that now, |
---|
0:10:57 | if one estimate |
---|
0:10:58 | goes out of the set D, we bring it back to the boundary of D using |
---|
0:11:04 | a projection, |
---|
0:11:07 | and the second step is unchanged. |
---|
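A minimal sketch of the projected variant, assuming the constraint set D is a simple box so that the projection reduces to clipping; the set, the bounds, and the names are illustrative.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto a box [lo, hi]^d (an easy special case of a compact convex set D)."""
    return np.clip(x, lo, hi)

def projected_distributed_sgd(noisy_grads, W, theta0, gamma, n_iter, rng, lo=0.0, hi=1.0):
    """Same two-step scheme as before, except the local step is projected back onto D."""
    theta = np.array(theta0, dtype=float)
    for n in range(n_iter):
        temp = np.empty_like(theta)
        for i, g in enumerate(noisy_grads):
            step = theta[i] - gamma(n) * g(theta[i], rng)
            temp[i] = project_box(step, lo, hi)   # bring the estimate back into D
        theta = W @ temp                           # the averaging step is unchanged
    return theta
```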
0:11:12 | so, two remarks. the first remark is that now there are no more stability issues, which |
---|
0:11:18 | was the difficult technical point, so that's good news; but the bad news is that the projection |
---|
0:11:23 | introduces other technical difficulties. |
---|
0:11:28 | our result is that consensus is still achieved with probability one, |
---|
0:11:35 | and that the average of the estimates converges to the set of KKT points; |
---|
0:11:40 | and again, if this set is made of isolated points, then the average converges to one |
---|
0:11:47 | of them. |
---|
0:11:50 | perhaps I should skip the sketch of the proof and come back |
---|
0:11:56 | to it at the end if I have time. |
---|
0:12:01 | as an illustration, we are going to address the problem of |
---|
0:12:06 | the interference channel: we have two |
---|
0:12:09 | source-destination pairs, |
---|
0:12:12 | and we assume that there is no channel state information at the transmitters; only the destinations |
---|
0:12:19 | observe the channel gains. |
---|
0:12:21 | and |
---|
0:12:25 | the goal here is for the destinations to adjust the rates |
---|
0:12:28 | in order to minimize |
---|
0:12:32 | the probability of error of the channels, |
---|
0:12:36 | and the exact function they want to minimize is actually |
---|
0:12:40 | a weighted sum of the probabilities of error of the source-destination pairs, |
---|
0:12:46 | and the constraints come from the fact that each transmitter cannot |
---|
0:12:51 | use a power higher than a given |
---|
0:12:56 | threshold. |
---|
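As a rough formalization of the problem just described, one can write it as follows; the symbols p_i, alpha_i, P_e^(i), and p_i^max are illustrative notation, not the speaker's.

```latex
% Illustrative formalization (notation assumed, not taken from the talk):
% p = (p_1, p_2): transmit powers, alpha_i >= 0: weights,
% P_e^{(i)}(p): error probability of source-destination pair i,
% p_i^max: power budget of transmitter i.
\begin{aligned}
\min_{p}\quad & \sum_{i} \alpha_i \, P_e^{(i)}(p) \\
\text{s.t.}\quad & 0 \le p_i \le p_i^{\max} \quad \text{for each transmitter } i .
\end{aligned}
```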
0:12:59 | each user has an estimate; |
---|
0:13:05 | its destination has an estimate of all the |
---|
0:13:09 | transmitters' powers, |
---|
0:13:12 | and we apply |
---|
0:13:15 | the algorithm we presented, just replacing the abstract gradient of F with our specific |
---|
0:13:21 | function to maximize, |
---|
0:13:23 | to minimize, sorry, and using |
---|
0:13:27 | the specific set D imposed by the constraints. |
---|
0:13:31 | and from what we've seen, we can guarantee that |
---|
0:13:37 | this algorithm is going to converge to a |
---|
0:13:40 | local minimum or, at least, a KKT point, |
---|
0:13:45 | with probability one. |
---|
0:13:48 | we could also use |
---|
0:13:49 | a more involved |
---|
0:13:54 | setting, with more than |
---|
0:13:57 | one channel |
---|
0:14:00 | from each source |
---|
0:14:01 | to its destination. |
---|
0:14:04 | so let me show, very rapidly, some numerical results. |
---|
0:14:09 | here we plotted the |
---|
0:14:13 | power allocation for the two |
---|
0:14:16 | transmitters, estimated |
---|
0:14:20 | over time, |
---|
0:14:22 | and |
---|
0:14:24 | there are two curves each time: one is for the centralized algorithm and the other is for |
---|
0:14:30 | the distributed algorithm. |
---|
0:14:31 | the centralized one here is in blue, |
---|
0:14:33 | the distributed one is in red; |
---|
0:14:35 | here, |
---|
0:14:38 | the centralized one is in green and |
---|
0:14:41 | this is the distributed one. |
---|
0:14:42 | this is for the first user and this is for the second user, |
---|
0:14:46 | and we see that eventually, well, here not quite, but here, |
---|
0:14:52 | we reach a stable |
---|
0:14:55 | point, |
---|
0:14:57 | as predicted by our results. |
---|
0:15:00 | so let me conclude. first, we |
---|
0:15:04 | studied a distributed stochastic gradient algorithm |
---|
0:15:07 | in both the constrained and unconstrained frameworks. |
---|
0:15:12 | we provided realistic sufficient conditions for convergence with probability one, |
---|
0:15:18 | and we stated a central limit theorem, |
---|
0:15:21 | only for the unconstrained case. |
---|
0:15:24 | now, |
---|
0:15:26 | as future work, we would like to get rid of this |
---|
0:15:30 | doubly stochastic, |
---|
0:15:32 | this doubly stochasticity constraint on the weight matrices, since it is |
---|
0:15:35 | a bit cumbersome for the network, |
---|
0:15:38 | and we would also like to establish such a central limit theorem |
---|
0:15:42 | in the constrained case. |
---|
0:15:44 | thank you for your attention. |
---|
0:16:04 | [audience question, largely inaudible] |
---|
0:16:09 | but why is this one given to the node? |
---|
0:16:15 | [inaudible] |
---|
0:16:29 | okay, if you want to get into the details, [largely inaudible] |
---|
0:16:33 | [inaudible] |
---|
0:16:44 | yeah, our point here is not that; we are absolutely aware |
---|
0:16:48 | of this algorithm. |
---|
0:16:52 | yes, sorry, that is what we say here: the algorithm is |
---|
0:16:56 | known, and we don't claim it as new |
---|
0:16:58 | [inaudible] |
---|
0:17:00 | [inaudible] |
---|
0:17:02 | [inaudible] |
---|