0:00:18 | hi everybody |
---|
0:00:19 | my name is Jérémie Jakubowicz, and today I'm going to |
---|
0:00:24 | talk about distributed optimization, |
---|
0:00:26 | as in the previous talk, |
---|
0:00:30 | and I should say two things |
---|
0:00:33 | before beginning. first, of course, I should acknowledge my |
---|
0:00:37 | co-authors, Gersende Fort and Walid Hachem; |
---|
0:00:40 | we are all working together, |
---|
0:00:44 | and Gersende and Walid are also affiliated with the CNRS. |
---|
0:00:49 | I should also say that some of the results |
---|
0:00:52 | I'm going to present were derived very recently and are not in |
---|
0:00:58 | our ICASSP paper. |
---|
0:01:01 | so, for starters, let |
---|
0:01:04 | us say a few words about the setting: imagine I have |
---|
0:01:09 | a function F and I want to minimize it over a finite-dimensional |
---|
0:01:13 | vector space. |
---|
0:01:15 | you all know, I'm sure, the deterministic gradient algorithm, which |
---|
0:01:21 | is an iterative algorithm and consists of |
---|
0:01:26 | following the gradient lines; |
---|
0:01:28 | actually, if I want to minimize, I take the opposite direction, with |
---|
0:01:33 | gamma_n being a positive step size. |
---|
0:01:37 | this is if I know the function F. |
---|
0:01:42 | if I only know the function F up to a random |
---|
0:01:47 | perturbation, |
---|
0:01:48 | which |
---|
0:01:50 | has to be unbiased, |
---|
0:01:53 | I can do the same thing, and the corresponding |
---|
0:01:57 | algorithm is the celebrated stochastic gradient algorithm. |
---|
0:02:05 | under classical assumptions and for well-behaved functions, one can show that |
---|
0:02:12 | the sequence of estimates converges to the roots of the gradient of F. |
---|
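For reference, here is a minimal sketch of the two iterations just described (deterministic and stochastic gradient). The quadratic objective, the noise model, and the step-size sequence `gamma` are illustrative assumptions, not the speaker's setting.

```python
import numpy as np

def grad_descent(grad, theta0, gamma, n_iter):
    """Deterministic gradient iteration: theta_{n+1} = theta_n - gamma(n) * grad F(theta_n)."""
    theta = np.asarray(theta0, dtype=float)
    for n in range(n_iter):
        theta = theta - gamma(n) * grad(theta)
    return theta

def stochastic_grad_descent(noisy_grad, theta0, gamma, n_iter, rng):
    """Stochastic gradient iteration: the gradient is only observed up to an unbiased perturbation."""
    theta = np.asarray(theta0, dtype=float)
    for n in range(n_iter):
        theta = theta - gamma(n) * noisy_grad(theta, rng)
    return theta

if __name__ == "__main__":
    # Toy objective F(theta) = 0.5 * ||theta - c||^2, whose gradient vanishes at c.
    c = np.array([1.0, -2.0])
    grad = lambda th: th - c
    noisy_grad = lambda th, r: th - c + r.normal(scale=0.5, size=th.shape)  # unbiased noise
    gamma = lambda n: 1.0 / (n + 1)  # positive, decreasing step sizes
    rng = np.random.default_rng(0)
    print(grad_descent(grad, np.zeros(2), gamma, 200))
    print(stochastic_grad_descent(noisy_grad, np.zeros(2), gamma, 5000, rng))
```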
0:02:19 | so we are going to talk about this very minimization problem, but in a distributed |
---|
0:02:26 | framework. |
---|
0:02:27 | so, first, |
---|
0:02:29 | the outline of my talk is the following: we are going to |
---|
0:02:33 | deal first with unconstrained optimization and |
---|
0:02:36 | analyse the algorithm asymptotically, |
---|
0:02:40 | and then move on to constrained optimization and illustrate it on a power allocation example. |
---|
0:02:46 | so, for unconstrained optimization now, |
---|
0:02:49 | imagine we have |
---|
0:02:52 | a network of N agents. |
---|
0:02:55 | the agents can be of very different natures: sensors, mobile phones, robots, |
---|
0:03:00 | et cetera, |
---|
0:03:01 | and the agents are connected according to a random graph whose |
---|
0:03:07 | topology is supposed to be time-varying, |
---|
0:03:10 | and they want |
---|
0:03:11 | to achieve a global mission; for that, they are able to talk to their neighbours, |
---|
0:03:16 | to cooperate. |
---|
0:03:19 | so, more precisely, in this talk we assume that each agent, |
---|
0:03:23 | say agent i, has a utility function |
---|
0:03:27 | f_i, and the network wants to minimize |
---|
0:03:33 | the sum of the utility functions. |
---|
0:03:36 | so now, |
---|
0:03:39 | we have to point out two difficulties. the first difficulty is that |
---|
0:03:45 | agent i ignores |
---|
0:03:48 | the utility functions of the other nodes, |
---|
0:03:51 | yet we want to optimize something that depends on them, |
---|
0:03:54 | so obviously cooperation is going to be needed. |
---|
0:03:58 | the second remark is that |
---|
0:04:00 | agent i only knows its own utility function up to a random perturbation, |
---|
0:04:07 | so we also have to use |
---|
0:04:09 | the stochastic approximation framework. |
---|
0:04:13 | so the algorithm we are going to analyse |
---|
0:04:18 | is |
---|
0:04:19 | stated here; |
---|
0:04:20 | I'm going to comment on it in a second. |
---|
0:04:23 | it is based on, |
---|
0:04:26 | it goes back to, the work of Tsitsiklis et al.; |
---|
0:04:29 | there have been many contributions afterwards, |
---|
0:04:32 | interesting work by Nedić et al., and many, many more works. |
---|
0:04:37 | so, |
---|
0:04:41 | it's exactly the very algorithm that we saw in the previous talk; |
---|
0:04:45 | there are two steps in this algorithm. |
---|
0:04:50 | in the first step, |
---|
0:04:52 | each agent receives |
---|
0:04:54 | a perturbed version |
---|
0:04:57 | of its gradient and updates its estimate into a temporary estimate, |
---|
0:05:02 | theta tilde |
---|
0:05:04 | at time n+1 for agent i; |
---|
0:05:06 | each agent does this, |
---|
0:05:09 | and then they talk. |
---|
0:05:11 | then each agent, |
---|
0:05:13 | for instance, here, agent i, |
---|
0:05:16 | updates its |
---|
0:05:17 | final estimate |
---|
0:05:19 | using the temporary |
---|
0:05:21 | estimates |
---|
0:05:22 | of its neighbours, |
---|
0:05:23 | in a weighted sum. |
---|
0:05:26 | so the weight w_{n+1}(i,j) |
---|
0:05:29 | has to be zero |
---|
0:05:31 | if nodes i and j are not connected; |
---|
0:05:33 | this is the first constraint. |
---|
0:05:38 | the network model is the following. there are also constraints on the weights W: |
---|
0:05:43 | the W's must sum up to one in a row, and W must be a doubly |
---|
0:05:50 | stochastic matrix. |
---|
0:05:52 | the first requirement here is not very costly: each |
---|
0:05:59 | agent i can tune its weights so that they sum up to one, but the |
---|
0:06:03 | second one is more difficult to achieve, because the agents have to |
---|
0:06:07 | cooperate to be able to make the columns sum up to one. |
---|
0:06:11 | there is a nice solution, |
---|
0:06:13 | using pairwise gossip, as introduced in the work of Boyd et al., |
---|
0:06:17 | which ensures that the W |
---|
0:06:20 | matrices are doubly stochastic. |
---|
0:06:23 | we also assume that the W's are i.i.d.; this |
---|
0:06:29 | can be weakened. the last assumption is a technical one, |
---|
0:06:33 | here, involving a spectral radius; essentially, loosely speaking, it says that the network |
---|
0:06:39 | is connected, has one connected component, |
---|
0:06:43 | so it's a very mild assumption. |
---|
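To make the two-step scheme concrete, here is a minimal sketch. It assumes a fixed doubly stochastic matrix built on a ring and toy quadratic utilities with unbiased gradient noise; the talk itself uses random, time-varying gossip matrices, so everything named here (`distributed_sgd`, `W`, `centers`, the noise scale) is illustrative.

```python
import numpy as np

def distributed_sgd(noisy_grads, W, theta0, gamma, n_iter, rng):
    """Two-step distributed stochastic gradient:
    1) each agent i takes a local step along its own perturbed gradient,
    2) agents average the temporary estimates of their neighbours with weights W[i, j]."""
    theta = np.array(theta0, dtype=float)            # shape (N, d): one row per agent
    for n in range(n_iter):
        temp = np.empty_like(theta)
        for i, g in enumerate(noisy_grads):
            temp[i] = theta[i] - gamma(n) * g(theta[i], rng)  # local stochastic gradient step
        theta = W @ temp                              # weighted averaging (W doubly stochastic)
    return theta

if __name__ == "__main__":
    # Toy example: agent i has utility f_i(x) = 0.5 * ||x - c_i||^2, observed with unbiased noise.
    # The sum of the utilities is minimized at the average of the c_i's.
    N, d = 5, 2
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(N, d))
    noisy_grads = [(lambda c: lambda x, r: x - c + r.normal(scale=0.3, size=d))(c) for c in centers]
    # Fixed doubly stochastic matrix on a ring (the talk uses random pairwise-gossip matrices).
    W = np.zeros((N, N))
    for i in range(N):
        W[i, i] = 0.5
        W[i, (i - 1) % N] = 0.25
        W[i, (i + 1) % N] = 0.25
    gamma = lambda n: 1.0 / (n + 1)
    estimates = distributed_sgd(noisy_grads, W, np.zeros((N, d)), gamma, 5000, rng)
    print("consensus estimates:\n", estimates)
    print("target (average of the centers):", centers.mean(axis=0))
```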
0:06:46 | so now let's |
---|
0:06:47 | move on to the |
---|
0:06:50 | asymptotic analysis. |
---|
0:06:52 | we define theta, which is obtained by stacking |
---|
0:06:56 | all the estimates of all the agents in a single vector theta, |
---|
0:07:00 | and we are interested in two things: first, in the average of the estimates, |
---|
0:07:06 | and also in the disagreement, |
---|
0:07:09 | that is, how the local estimates differ from the average. |
---|
0:07:13 | so the difficult technical point is to propose satisfying |
---|
0:07:20 | sufficient conditions, |
---|
0:07:22 | and the difficulty lies in the stability; |
---|
0:07:26 | this is the property that the whole vector theta_n remains bounded |
---|
0:07:32 | with probability one. |
---|
0:07:34 | we provide such a condition using a Lyapunov function; when |
---|
0:07:39 | we are interested in minimization, we can take |
---|
0:07:42 | the function F itself as our Lyapunov function. I'm not going to |
---|
0:07:46 | enter into the details. |
---|
0:07:48 | the first result is that, with probability one, consensus is achieved, |
---|
0:07:54 | meaning that all the estimates |
---|
0:07:55 | eventually reach the same value, |
---|
0:07:59 | and that the average of the estimates converges to the set of roots of |
---|
0:08:05 | the |
---|
0:08:06 | gradient of F, which are our targets. |
---|
0:08:09 | moreover, if this set is made of isolated points, then we converge to one of |
---|
0:08:16 | these points. |
---|
0:08:18 | we also prove another result, which is a |
---|
0:08:24 | central limit theorem. |
---|
0:08:26 | I'm not going to enter into the details, but |
---|
0:08:29 | it just says that, under |
---|
0:08:34 | good assumptions, |
---|
0:08:35 | the normalized disagreements converge in distribution |
---|
0:08:42 | to |
---|
0:08:43 | a Gaussian vector, |
---|
0:08:45 | and I should point out that here |
---|
0:08:48 | the distribution of this Gaussian vector is degenerate, since all the random vectors here are the |
---|
0:08:54 | same. |
---|
0:08:56 | so instead of going into the details, let me |
---|
0:09:01 | draw your attention to three consequences of this |
---|
0:09:05 | result. the first one is that |
---|
0:09:08 | the estimates converge |
---|
0:09:11 | at speed square root of gamma_n, |
---|
0:09:16 | and a consequence of this is that the algorithm performs as well as the centralized |
---|
0:09:23 | algorithm. |
---|
0:09:24 | also, from the degenerate nature of the distribution we saw, |
---|
0:09:28 | we can conclude that the disagreements |
---|
0:09:32 | are negligible |
---|
0:09:35 | with respect |
---|
0:09:37 | to the distance between the estimates and the target value, |
---|
0:09:41 | because the |
---|
0:09:43 | distribution of the vector |
---|
0:09:45 | puts mass only on the consensus line. |
---|
0:09:50 | so another remark |
---|
0:09:59 | is that, since this agreement is achieved |
---|
0:10:02 | at very high speed, |
---|
0:10:04 | compared to the |
---|
0:10:08 | speed at which we move towards the target value, |
---|
0:10:11 | we do not need to communicate too often; |
---|
0:10:13 | and actually we can show that if the probability |
---|
0:10:17 | that there is a communication goes to zero, |
---|
0:10:21 | then, as long as it decays more slowly than one over the square |
---|
0:10:26 | root of n, |
---|
0:10:27 | we can still guarantee convergence, and hence |
---|
0:10:30 | save network resources while keeping the same performance. |
---|
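As a rough illustration of this sparse-communication remark (following the same conventions as the sketch above), the gossip step below is performed only with a probability that decays to zero more slowly than 1/sqrt(n). The decay exponent 0.4 and the function name are assumptions made for the sketch, not the speaker's exact scheme.

```python
import numpy as np

def sparse_comm_distributed_sgd(noisy_grads, W, theta0, gamma, n_iter, rng):
    """Distributed stochastic gradient where, at time n, the agents gossip only with
    probability p_n; otherwise everyone keeps its own temporary estimate (W replaced by I)."""
    theta = np.array(theta0, dtype=float)
    N = theta.shape[0]
    identity = np.eye(N)
    for n in range(n_iter):
        temp = np.empty_like(theta)
        for i, g in enumerate(noisy_grads):
            temp[i] = theta[i] - gamma(n) * g(theta[i], rng)   # local stochastic gradient step
        p_n = min(1.0, (n + 1) ** (-0.4))   # communication probability, decaying slower than 1/sqrt(n)
        mix = W if rng.random() < p_n else identity
        theta = mix @ temp
    return theta
```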
0:10:35 | so now let's move to constrained optimization. my problem is the same, except |
---|
0:10:40 | that now my estimates are required to remain in a compact and convex set |
---|
0:10:47 | D, |
---|
0:10:48 | and this set is known by all the agents. |
---|
0:10:52 | so we can use pretty much the same algorithm, except that now, |
---|
0:10:57 | if one estimate |
---|
0:10:58 | goes out of the set D, we bring it back to the boundary of D using |
---|
0:11:04 | a projection, |
---|
0:11:07 | and the second step is unchanged. |
---|
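A minimal sketch of the projected variant, assuming the constraint set D is a simple box so that the projection reduces to clipping; the set, the bounds, and the names are illustrative.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto a box [lo, hi]^d (an easy special case of a compact convex set D)."""
    return np.clip(x, lo, hi)

def projected_distributed_sgd(noisy_grads, W, theta0, gamma, n_iter, rng, lo=0.0, hi=1.0):
    """Same two-step scheme as before, except the local step is projected back onto D."""
    theta = np.array(theta0, dtype=float)
    for n in range(n_iter):
        temp = np.empty_like(theta)
        for i, g in enumerate(noisy_grads):
            step = theta[i] - gamma(n) * g(theta[i], rng)
            temp[i] = project_box(step, lo, hi)   # bring the estimate back into D
        theta = W @ temp                           # the averaging step is unchanged
    return theta
```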
0:11:12 | so, two remarks. the first remark is that now there are no more stability issues, which |
---|
0:11:18 | was the difficult technical point, so that's good news; but the bad news is that the projection |
---|
0:11:23 | introduces other technical difficulties. |
---|
0:11:28 | our result is that consensus is still achieved with probability one, |
---|
0:11:35 | and that the average of the estimates converges to the set of KKT points; |
---|
0:11:40 | and again, if this set is made of isolated points, then the average converges to one |
---|
0:11:47 | of them. |
---|
0:11:50 | perhaps I should skip the sketch of the proof and come back |
---|
0:11:56 | to it at the end if I have time. |
---|
0:12:01 | as an illustration, we are going to address the problem of |
---|
0:12:06 | the interference channel: we have two |
---|
0:12:09 | source-destination pairs, |
---|
0:12:12 | and we assume that there is no channel state information at the transmitters; only the destinations |
---|
0:12:19 | observe the channel gains. |
---|
0:12:21 | and |
---|
0:12:25 | the goal here is for the destinations to adjust the rates |
---|
0:12:28 | in order to minimize |
---|
0:12:32 | the probability of error of the channels, |
---|
0:12:36 | and the exact function they want to minimize is actually |
---|
0:12:40 | a weighted sum of the probabilities of error of the source-destination pairs, |
---|
0:12:46 | and the constraints come from the fact that each transmitter cannot |
---|
0:12:51 | use a power higher than a given |
---|
0:12:56 | threshold. |
---|
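As a rough formalization of the problem just described, one can write it as follows; the symbols p_i, alpha_i, P_e^(i), and p_i^max are illustrative notation, not the speaker's.

```latex
% Illustrative formalization (notation assumed, not taken from the talk):
% p = (p_1, p_2): transmit powers, alpha_i >= 0: weights,
% P_e^{(i)}(p): error probability of source-destination pair i,
% p_i^max: power budget of transmitter i.
\begin{aligned}
\min_{p}\quad & \sum_{i} \alpha_i \, P_e^{(i)}(p) \\
\text{s.t.}\quad & 0 \le p_i \le p_i^{\max} \quad \text{for each transmitter } i .
\end{aligned}
```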
0:12:59 | each user has an estimate; |
---|
0:13:05 | its destination has an estimate of all the |
---|
0:13:09 | transmitters' powers, |
---|
0:13:12 | and we apply |
---|
0:13:15 | the algorithm we presented, just replacing the abstract gradient of F with our specific |
---|
0:13:21 | function to maximize, |
---|
0:13:23 | to minimize, sorry, and using |
---|
0:13:27 | the specific set D imposed by the constraints. |
---|
0:13:31 | and from what we've seen, we can guarantee that |
---|
0:13:37 | this algorithm is going to converge to a |
---|
0:13:40 | local minimum or, at least, a KKT point, |
---|
0:13:45 | with probability one. |
---|
0:13:48 | we could also use |
---|
0:13:49 | a more involved |
---|
0:13:54 | setting, with more than |
---|
0:13:57 | one channel |
---|
0:14:00 | from each source |
---|
0:14:01 | to its destination. |
---|
0:14:04 | so let me show, very rapidly, some numerical results. |
---|
0:14:09 | here we plotted the |
---|
0:14:13 | power allocation for the two |
---|
0:14:16 | transmitters, estimated |
---|
0:14:20 | over time, |
---|
0:14:22 | and |
---|
0:14:24 | there are two curves each time: one is for the centralized algorithm and the other is for |
---|
0:14:30 | the distributed algorithm. |
---|
0:14:31 | the centralized one here is in blue, |
---|
0:14:33 | the distributed one is in red; |
---|
0:14:35 | here, |
---|
0:14:38 | the centralized one is in green and |
---|
0:14:41 | this is the distributed one. |
---|
0:14:42 | this is for the first user and this is for the second user, |
---|
0:14:46 | and we see that eventually, well, here not quite, but here, |
---|
0:14:52 | we reach a stable |
---|
0:14:55 | point, |
---|
0:14:57 | as predicted by our results. |
---|
0:15:00 | so let me conclude. first, we |
---|
0:15:04 | studied a distributed stochastic gradient algorithm |
---|
0:15:07 | in both the constrained and unconstrained frameworks. |
---|
0:15:12 | we provided realistic sufficient conditions for convergence with probability one, |
---|
0:15:18 | and we stated a central limit theorem, |
---|
0:15:21 | only for the unconstrained case. |
---|
0:15:24 | now, |
---|
0:15:26 | as future work, we would like to get rid of this |
---|
0:15:30 | doubly stochastic, |
---|
0:15:32 | this doubly stochasticity constraint on the weight matrices, since it is |
---|
0:15:35 | a bit cumbersome for the network, |
---|
0:15:38 | and we would also like to establish such a central limit theorem |
---|
0:15:42 | in the constrained case. |
---|
0:15:44 | thank you for your attention. |
---|
0:16:04 | [audience question, largely inaudible] |
---|
0:16:09 | but why is this one given to the node? |
---|
0:16:15 | [inaudible] |
---|
0:16:29 | okay, if you want to get into the details, [largely inaudible] |
---|
0:16:33 | [inaudible] |
---|
0:16:44 | yeah, our point here is not that; we are absolutely aware |
---|
0:16:48 | of this algorithm. |
---|
0:16:52 | yes, sorry, that is what we say here: the algorithm is |
---|
0:16:56 | known, and we don't claim it as new |
---|
0:16:58 | [inaudible] |
---|
0:17:00 | [inaudible] |
---|
0:17:02 | [inaudible] |
---|