Thank you, Brian.
Good morning.
It is my honour to be here, and this is my first time at Interspeech.
I thought Interspeech had an interesting culture where before the talk there is dancing and singing, but that was not offered to me.
So
and I saw many old friends here, many familiar faces; this community is technically like a cousin to me. I have learned a lot from this community.
I work on signal processing in general, with applications to communications, multimedia and many other things, but not speech.
I don't know why, but I found the answer yesterday.
When I first came to America, I was a teaching assistant for signals and systems, and I had to teach students the concept of linear shift invariance.
And every time, the students were laughing behind me, so I asked them once: why are you laughing? And they told me: F. You need to pronounce your F.
That's why I cannot do speech processing.
So
today
just like Brian said, there is no speech here; if I submitted this keynote as a paper to Interspeech, it would be rejected, because it doesn't match any heading or keyword at all. However, I hope this can be useful to you. I learned things like hidden Markov models and deep learning from this community, and I hope the ideas you see today may be useful to you.
Okay,
so the story begins.
First
social media.
This is a new phenomenon: more and more decisions and activities in our daily life are being recorded, tracked and shared.
It's scary, right? Yes, it's scary. This is a kind of user-generated big data.
For example, our motion, our movement everywhere, is being tracked.
Yes, it is scary. However, it is also a plus.
It opens some potential, very interesting and important research directions for us, and machine learning is a very powerful tool, as used in this community and potentially in many other enterprises, such as social media.
Machine learning aims to use reasoning to find new and relevant information given some background knowledge. There are many applications, and, as many of you who use it know, machine learning consists of three elements: representation, evaluation, optimization. I will not get into the details of these.
Although it is a very powerful tool, there are limitations and constraints.
Why? Because the generalization assumption may not hold, especially in user-generated data scenarios.
Why? Because users behave differently at different times under different settings.
You behave very differently when you are hungry, or when you need to go to the restroom and cannot sit here, yeah? At different times we may behave in different ways.
And the training set may not be statistically consistent with the testing set.
That consistency was the assumption in these machine learning algorithms. Also, a single objective function cannot cover users' different interests.
Users have different interests, and therefore they have different objective functions.
Users are also rational, and therefore selfish: they all want to optimize their own utility.
And we need to consider this.
So the knowledge contained in the data can be difficult to fully exploit from this macroscopic machine learning view, powerful as it is, because the data are the outcome of users' interactions.
We cannot ignore those interactions. We need to explicitly consider the interactions of users, and basically that means their decision processes should be taken into consideration.
In the sequel you will find that game theory is a powerful tool for studying users' strategic decision making, because we can take multiple objective functions into account at the same time, and we can also take users' local and individual interests into consideration.
Okay,
let's face it.
Learning is for what? Learning is for making optimal decisions; we don't learn just for fun.
And learning and decision making are coupled together due to network externalities. What do I mean by network externalities? I mean that my decisions, my actions, will affect your decisions and your behavior.
We all affect each other. You made a decision to come here and I made a decision to come here; that's why we are all here.
We affect each other. However, what is the problem?
The problem is that there is a missing link: in the traditional sense, machine/social learning and strategic decision making are still two unrelated disciplines.
And so here we propose the notion of decision learning to bridge the two: when we are doing learning, we also need to consider the decision-making process.
That is the title of the talk, 'decision learning': learning with strategic decision making.
Okay,
I'm going to use three examples to illustrate how decision learning works.
First, decision learning with evolutionary user behavior; second, with sequential user behavior; and then we'll talk about how we design mechanisms from the system perspective.
First, let's talk about evolutionary user behavior. Here I want to use information diffusion over social networks as an example.
Okay,
now you see, these are the Twitter hashtags during the 2008 US presidential election.
They capture the spreading of comments and phrases by the candidates.
Look at the pink one: it says 'lipstick on a pig'.
That is a phrase by the famous Sarah Palin, if you know her.
And you can see that when a phrase is coined, there is a duration: it rises to a peak, and then eventually it decreases and dies down.
You see that everything has a duration.
And this was for political comments, done in 2008. Now, if you are a politician, you want to make sure your message gets delivered. When does it reach the peak? When is the starting point and when is the end point?
It has to be before the election, not after, right?
Okay, now here is the other example: online advertising for a new product.
How do we predict the popularity of this advertisement, and eventually the market share?
Can we predict that?
Okay, all of this relates to one problem that we call the information diffusion problem.
How does information diffuse? Users exchange information over social networks, and the study of this information diffusion is important. Why? Because if we understand the dynamics of information diffusion, we can predict and maybe control the start and peak timing and the distribution.
We can estimate the final reach of the population, and perhaps also identify the influential users and links for our purpose.
Okay,
now, the dynamics of information diffusion: it is a sequence of decision-making processes.
It is not, as we used to think, that information just spreads by itself. No.
For information to diffuse, it relies on other users forwarding or posting the information they receive.
I have my social network, and if you are in my network and I post something, you will see it, and once you see it you have to decide whether to post it or not.
So everybody in somebody's social network has a decision-making process: should I post it? Is the information exciting? Are my friends interested in it? Or should I not post it, because it could be embarrassing?
There is a decision-making process here, and information diffusion is an evolving process, like our evolution.
So for information diffusion we ask whether to post or forward or not, and for an evolutionary process, when a gene has a mutation, we ask whether to take the mutation or not.
They are similar, so we want to relate the two and model this problem.
A social network is always illustrated by a graph structure, which we understand, so we use the graphical evolutionary game.
Now
what is this graphical evolutionary game? Let me give a very simple explanation.
Each player has a notion of fitness. How fit are you? How fit am I?
This is the evolution we all know: the stronger, the fittest, will survive.
A user's fitness is determined by this equation: B is the baseline fitness, which depends on myself; U is the interaction payoff, determined by the environment; and alpha is the selection intensity.
If I interpret this, I would say that for someone, say Haizhou Li here in Singapore, his fitness depends on his own baseline fitness plus the selection intensity multiplied by the overall environmental fitness of Singapore; together, that is his fitness strength.
I think that makes sense.
So
an evolution process has two important elements: one is selection, and one is randomness. These two are very important in the evolutionary process.
Let me use this example. We have a graph, and we randomly select a node, which can be a user.
Once we select this one, we already have randomness; now we are going to make the selection.
We have to compute fitness: its own fitness and also its neighbours' fitness. We compute all these fitnesses and select the fittest to imitate.
It doesn't have to be imitation; I just use imitation as an example.
There are also birth-death and death-birth update processes; there are many different kinds of processes.
Okay,
so we need to know which one is the fittest, and then we imitate it.
This is a typical evolution process, determined by users' fitness.
Now
information diffusion over online social networks. I have just said that an online social network can be represented as a graph.
Information diffusion depends on the users' actions to forward or not, and those actions depend on utility.
If I like it, or my friends may like it, I calculate the utility, and if the utility is high, I am going to forward or post.
If it is embarrassing, I am not going to do it; it would damage my reputation.
So this is very similar to the graphical evolutionary game.
On the left-hand side is the graphical evolutionary game; on the right-hand side we have the social network.
On the left we have the notion of fitness, as I explained; on the right we have the user's utility. Basically, we try to model the entire problem using this utility.
The question on the left-hand side is how we can map and relate that utility, our interest, to the notion of fitness; then we can use the graphical evolutionary game, which is a very powerful tool that has been well studied.
Okay, so how can we calculate that? Let me use a very simple example here.
Look at this graph. We have a user who chooses the strategy to forward.
Okay, what is his own fitness? Here I choose B = 1; this is his baseline.
And his neighbours? He has Kf neighbours who also forward, and the rest of them, K − Kf, choose not to forward.
So this gives the utility from all his neighbours; weighting it by the selection intensity and combining with his own baseline fitness, we get the overall fitness of a user whose strategy is to forward.
For someone who does not forward, we do the same,
okay.
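To make this concrete, here is a minimal sketch in Python of that fitness calculation, assuming the common weak-selection blending f = (1 − α)·B + α·U; that convention and all the payoff numbers below are illustrative assumptions of mine, not necessarily the exact form on the slides.

```python
# Toy fitness calculation for the graphical evolutionary game.
def fitness(B, U, alpha):
    """Blend baseline fitness B with interaction payoff U under
    selection intensity alpha (weak-selection convention, assumed)."""
    return (1 - alpha) * B + alpha * U

def forwarder_fitness(K, Kf, u_ff, u_fn, alpha, B=1.0):
    """Fitness of a user who forwards: Kf of his K neighbours also
    forward (payoff u_ff each), the other K - Kf do not (u_fn each)."""
    U = Kf * u_ff + (K - Kf) * u_fn
    return fitness(B, U, alpha)

# Example: 10 neighbours, 6 of them forwarding (numbers made up).
f = forwarder_fitness(K=10, Kf=6, u_ff=0.8, u_fn=0.2, alpha=0.1)
```

The same function with the non-forwarding payoffs gives the fitness of a user whose strategy is not to forward.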
Now, for this network we have the configuration probability.
We have the fitness of the 'center guy'; now we need to calculate the fitness of all his neighbours.
For a neighbour whose strategy is also to forward, what is the fitness? And for one whose strategy is not to forward, what is the fitness? We calculate all of these.
With this, we can calculate the probability that someone will change from the forward strategy to the not-forward strategy.
We can calculate all of this, okay; I will omit the details.
By doing all this, we can find the dynamics of information diffusion: the percentage of users who will forward the information is captured by this equation, which we can calculate.
The final evolutionarily stable state can also be obtained, like this. Let me explain what this means.
If the utility of forwarding is much larger than that of those who will not do anything, then one hundred percent of the users will forward and post, because it is too advantageous for them.
If, on the other hand, not forwarding has the highest utility and forwarding the lowest, then no one is going to do it.
For anything in between, there will be a stable percentage of the population that will do it, okay. This looks complicated.
Now, K is the degree, meaning how many friends each node has. Assume K is sufficiently large; then you can see that the result is basically independent of the degree.
Meaning what? Meaning that only the utilities determine what percentage of the population will forward or post information.
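As a toy illustration of the two extreme regimes (not the paper's closed-form derivation), one can simulate an imitation update on a small complete graph; the payoff numbers, the 50/50 start, and the deterministic sweep order are my own assumptions.

```python
def imitation_dynamics(n, u_ff, u_fn, u_nn, alpha=0.1, B=1.0, sweeps=20):
    """Toy imitation update on a complete graph of n users: each node
    repeatedly copies the strategy of the fittest node (itself included).
    Returns the final fraction of forwarders from a half-and-half start."""
    forward = [i % 2 == 0 for i in range(n)]  # deterministic 50/50 start

    def fit(i):
        # interaction payoff summed over all neighbours, blended with baseline
        pay = {(True, True): u_ff, (True, False): u_fn,
               (False, True): u_fn, (False, False): u_nn}
        U = sum(pay[(forward[i], forward[j])] for j in range(n) if j != i)
        return (1 - alpha) * B + alpha * U

    for _ in range(sweeps):
        for i in range(n):
            forward[i] = forward[max(range(n), key=fit)]
    return sum(forward) / n
```

With u_ff dominant, forwarding takes over the whole population; with u_nn dominant, it dies out. The stable mixed populations of the analysis arise for utilities in between, which this crude imitate-the-best rule does not capture.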
Okay, now let's see this experiment. These are real data we acquired from Facebook: in this particular graph we have 4039 users with 88234 edges, and ten major user subgroups.
First, this is a social network, so you can see that in log scale the degree distribution is a straight line: a so-called scale-free network, meaning a few people have a very high degree of connectivity, very powerful, and the frequency falls off as a power law. That is something we all knew before.
Okay, now let's look at the evolutionarily stable states.
Look at the left-hand side first. We use four cases as examples. We know the utility function, okay; its four parameters are now our model parameters, and we want to model the information diffusion process.
So look at this. The first case: if the utility of forwarding is very high, higher than the others, then what? One hundred percent of the users, everybody, will post.
If it is reduced, not as good yet still good enough, then some percentage, in this case sixty-some percent, will post or forward.
If it gets lower, then about thirty-some percent will post and forward.
If, on the other hand, it is not good at all, the utility is so bad, then nobody is going to forward it or do anything.
And there is a rising time, which we can calculate, along with the eventual population percentage.
We said there are ten subgroups, and within each group they all behave the same; this is just to show that.
Okay, so that was: given the model parameters, what is the behavior?
Now, what if we don't have the model parameters? We don't. We only have the data, and this is data from MemeTracker.
MemeTracker builds maps of the daily news cycle by linking news reports of the same piece of information.
As an example, here is a comment: 'we're not commenting on that story, I'm afraid.' Then someone reports 'we're not commenting on that,' and there are three different links; then somebody changes it a little, 'we're not commenting on that story,' and there is another link.
So this is a very good source for finding how information diffuses. We had this data, and it is a huge amount: more than 172 million news articles and sources.
What we want to do from here is learn the utility: given this vast amount of data, can I find a utility, reduced to a few parameters, that describes how information diffuses?
Okay, this is the result: the dynamics of four pieces of information.
This curve is associated with the phrase 'Google Wave'; this one with the word 'Tehran'.
The grey line is the real data, and the red one is our fitted result. You can see it fits very well. The blue line uses only the machine learning approach, so you can see that what we do fits much better.
Most importantly, we describe this information diffusion phenomenon with, in this case, only two parameters, because we normalized; with those two parameters we can describe how the information diffuses, when it starts, when it reaches the peak, and how it ends.
And if we only have partial information, we can predict. I don't have the slides to show you, but we can predict as well.
Okay.
Now let's do this experiment. We have five groups of sites from the database, which we know is a large database; each group contains five hundred sites.
We estimate the final stable population: the percentage of users that post or forward this information.
For these five groups we find something very interesting: in the end, each group has a different percentage of users that will post or forward information.
The black bars are from our model, and the red ones are from the real dataset; you can see that they are very consistent.
Okay, now back to this. What does this mean?
This is very interesting. We can see that group five behaves cohesively, or shares major common interests.
That means that of the users in this particular group, more than eighty-one percent will agree with their own networking friends: whenever their friends post something, eighty-one percent of them will also post it.
However, in group one, only about nineteen percent will do so.
Therefore, if you are hired as an advertiser and you want to run advertisements, which group should you forward them to in order to expect a high percentage of people seeing them? It would be group five.
So group five in this particular case behaves cohesively and shares major common interests, while group one shares very little common interest.
This is mining the coherence of a group, and whether we can do anything with it or not.
Okay.
Now I'm going to change gears and talk about sequential user behavior.
Here I'm going to use some well-known websites that we collected data from to illustrate the problem. You all know Groupon; I think some of you may use it and take advantage of it, right? On Groupon we sometimes find a very good deal for a restaurant that we always go to, and we buy it there.
And Yelp: we also use Yelp. While I am here, I am using the Singapore app to decide which restaurant I want to go to.
Once we put the two together, it gets very interesting.
We found that, looking at the Yelp ratings of some very good restaurants, the Yelp star rating declines after a successful Groupon deal. You can see this: the rating is very high, many people buy the Groupon, and then over the next few months the rating declines.
Why? There is a negative network externality at play here, the one I mentioned; I'm going to explain it.
The degrading quality may be due to an overwhelming number of customers, because the deal is too successful: many people buy it, the following day a hundred people show up, and the restaurant has only three waitresses. The quality will not be good; the kitchen is limited.
That could be the reason.
Okay, so what is the phenomenon here?
Learning to get the best deal: everybody wants to get the best deal.
Then they make decisions to take advantage of it, and their decisions affect each other, reducing each other's utility.
That's what we call negative network externality: your decision and my decision eventually affect each other,
okay.
So how can we model this problem?
I'd like to go to a problem in machine learning called the Chinese restaurant problem, or the Chinese restaurant process; some of you use it also.
This is Singapore, so I don't need to explain; if you don't know what the Chinese restaurant problem is, just choose a Chinese restaurant and go, okay?
So, in the Chinese restaurant problem, we have a limited number of round tables of unknown sizes; such a round table could be a cloud service, or it could be a deal, okay.
And each customer chooses a table to sit at, or opens a new table.
For the Chinese restaurant process, we have an infinite number of tables with an infinite number of seats. A customer enters and, with predefined probabilities, either chooses a table to sit at or opens a new table.
This is non-strategic, because it is parametric, okay.
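For reference, the non-strategic Chinese restaurant process can be sampled in a few lines; this is the standard textbook form, with the concentration parameter gamma chosen arbitrarily here.

```python
import random

def crp_assign(n_customers, gamma, seed=0):
    """Chinese restaurant process: customer i joins an existing table k
    with probability n_k / (i + gamma), and opens a new table with
    probability gamma / (i + gamma), where n_k is its current occupancy."""
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for i in range(n_customers):
        r = rng.random() * (i + gamma)
        acc = 0.0
        for k, n_k in enumerate(tables):
            acc += n_k
            if r < acc:
                tables[k] += 1  # join existing table k
                break
        else:
            tables.append(1)    # open a new table

    return tables

seating = crp_assign(100, gamma=1.0)  # occupancy counts summing to 100
```

Note that the choice probabilities are fixed in advance, which is exactly what the talk means by "non-strategic": no customer reasons about utility or about future arrivals.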
So now we introduce the Chinese restaurant game.
We want to introduce strategic behaviour, decision making, here. Why? Because eventually we want to model and understand the negative externality effect.
Okay, so in this Chinese restaurant game, we have tables of unknown sizes Rx(θ), where θ is the system state. The system state is the condition of the restaurant: how much money they have, how much budget they have, the environment, okay.
A customer has a signal S about the system state, from advertisements, from friends, from wherever they have been before.
Which table should he sit at to have maximum utility?
You know that if you go to a Chinese restaurant, some days, very unfortunately, many people sit and eat with you, right? The utility is very bad; it's not comfortable at all.
And this is a negative network externality: more customers at one table means less space for each user, and therefore less utility, from both past and future customers.
What does 'future' mean? It means that when you come in and choose a table based on the current conditions, that may not be the best decision: in ten minutes, ten people may sit with you, okay.
So you need to predict the future; that is the problem, and the future affects your decision right now.
That makes it really difficult.
Okay, so this is a sequential decision-making problem: customers make decisions sequentially, and the observation each customer, each player, has when he arrives is how many people are seated at each table at that moment, plus the signals from those who were seated before him. That is all he can get.
So the first customer arrives and picks a table; then the second one picks a table, and so on: everybody picks one. A sequential decision-making process.
Can customer number three make the best decision at that moment?
No. If he cannot predict and see the future, his decision at that moment may not be the best, because the people who have not yet come are going to affect him,
okay.
So now let's assume there is a perfect signal, meaning that everybody knows all the signals and all the decisions other people will make given their conditions.
I will not get into the details, but then there is an equilibrium grouping, meaning the grouping gives the best utility: when you choose according to everybody's signals, it is the best, and when anyone changes to another table, his utility can only decrease.
So that is the best strategy, when the signal is perfect.
Let me explain this. What will the strategy be?
If, when I come in, I know all the conditions, and I know what people will do ten or thirty minutes later, then I have perfect information. And you know what? When I come in, I will know which table has the best utility, and I will choose it.
The second customer will choose it too, until the table is filled; then they go to the table with the second-best utility, and so on, because the signal is perfect, right?
So customers who come early have an advantage, but only with perfect signals.
In real life it is not that easy: the signal is imperfect.
We cannot have the perfect signal, and therefore we have to learn from the noisy signals to form our belief. This is the learning process, and we can use Bayesian learning or other methods to construct the belief that we have,
okay.
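A minimal sketch of such a Bayesian belief update, assuming a binary state ('H'igh or 'L'ow quality) and conditionally independent signals that report the true state correctly with probability q; both of those modelling assumptions are mine, for illustration only.

```python
def update_belief(prior_high, signals, q):
    """Sequential Bayesian update of the belief that the state is 'H',
    given binary signals that are correct with probability q."""
    belief = prior_high
    for s in signals:
        like_high = q if s == 'H' else 1 - q   # P(signal | state = H)
        like_low = 1 - q if s == 'H' else q    # P(signal | state = L)
        num = like_high * belief
        belief = num / (num + like_low * (1 - belief))
    return belief

# One 'H' signal of quality 0.8 moves a 0.5 prior up to 0.8; an 'H'
# followed by an 'L' of the same quality cancels back to the prior.
```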
So now, what is the best response? The best response is: choose the table with the best utility.
But in order to choose the best utility, we need to know the final distribution of the users; like I said, we need to know who will come in and where they will sit.
So the utility function depends on subsequent users' decisions; this is the negative network externality.
We need to predict the future decisions to find a best response, and therefore we use backward induction: assuming everybody makes the optimal decision, we can predict those decisions and apply backward induction from there.
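Here is a toy backward-induction sketch for a two-table version, assuming each table's value R[t] is shared equally among its final occupants and that every customer reasons about how the remaining arrivals will respond; these modelling choices and the numbers are mine, not the talk's exact formulation.

```python
from functools import lru_cache

R = (10, 6)  # value of each of two tables, shared equally at the end

@lru_cache(maxsize=None)
def play(n1, n2, remaining):
    """Equilibrium seating by backward induction: the arriving customer
    picks the table maximising R[t] / (final occupancy of t), where the
    final occupancy is found by recursing over the remaining arrivals."""
    if remaining == 0:
        return (n1, n2)
    best_u, best_final = -1.0, None
    for choice, state in ((0, (n1 + 1, n2)), (1, (n1, n2 + 1))):
        final = play(state[0], state[1], remaining - 1)
        u = R[choice] / final[choice]  # this customer's realised utility
        if u > best_u:
            best_u, best_final = u, final
    return best_final

equilibrium = play(0, 0, 4)  # final seating of four sequential customers
```

Each customer looks ahead to how the remaining arrivals will react to his choice, which is exactly why a myopic "best table right now" decision can be suboptimal.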
Okay. Now, this is the summary of the Chinese restaurant game, and here is an example.
Remember this? I said that after a good Groupon deal, the rating goes down like that, okay; the red line is the real data.
Now let's use a linear model to model the utility.
So this is the utility that we model; we learned it from the real data.
Based on this utility, we can obtain all the parameters we need, and then we can use what we just derived.
We consider all the negative network externalities; we can predict the future to find the optimal result.
As you can see, the blue line is learning with the negative network externality that we propose, and the red one is without, meaning decisions based only on the current moment.
We can indeed increase the rating by doing so.
So if we consider this negative network externality, the strategy performs much better.
In fact, we can also use this to devise strategies.
This is a new restaurant's strategy, okay. Now we have a new restaurant: a low-quality restaurant and a high-quality restaurant.
What about the number of customers? Look at this: for a low-quality restaurant, in order to increase the number of customers who arrive, the deal price has to be lower. If your quality is not good, you make it cheaper.
For a high-quality restaurant, even if the deal price is a little higher, you can still make a good profit.
Then there is the signal quality: your advertisements and word of mouth. If you are a high-quality restaurant, you want the signal quality to be very good, so people know this is a good restaurant.
If you have a low-quality restaurant, don't let people know too much about you, okay. That would be better.
And next we get to revenue. Now, what about revenue?
Both high-quality and low-quality restaurants can achieve peak revenue.
For a high-quality restaurant to achieve peak revenue, the signal quality has to be good; then the revenue is higher. Let people know, and through word of mouth people will come.
However, for a low-quality restaurant it's the opposite: don't let people know. The better the signal quality, the lower the revenue you are going to see.
Okay, so I think this is all common sense; however, it is highly nonlinear, and we had to model it, and it is indeed a very highly nonlinear problem. I don't have the slides to show you, but you can take a look at our paper.
So a high-quality restaurant should try to increase the signal quality, and a low-quality restaurant should hide the quality information and use a low deal price to attract customers.
Indeed, we developed a family of these. We have the Chinese restaurant game, and we also have the dynamic Chinese restaurant game, for people who come and go.
That is, for example, just like this Wi-Fi, okay: many people come here, many people may leave, and they decide which network they want to choose. And we also have the Indian buffet game.
That is interesting: we see all of these in the US. Chinese restaurants have these round tables, and Indian restaurants have the buffet, where we have multiple choices of what we can take.
And this can be applied to many different scenarios.
And we have applied this, for example, to reviews: we have seen this on Yelp, and also on Amazon; reviewing is a sequential decision. And to question-and-answer sites: we also modeled a problem on those.
And to many other sites, as long as there is this sequential decision process, because social computing systems in this case are all structured in the same fashion: users arrive sequentially and make decisions on whether to do something or not, whether to forward something or not.
There are also externalities among users: your decisions, your actions, affect each other, and there is some uncertainty.
In this sense, the Chinese restaurant game is a natural choice for modeling and analyzing these user behaviors, so that we can come up with the best predictions from the model and also devise some incentive mechanisms, okay, to attain the results we want.
And I believe that in the speech and language community you also have problems like this.
Speaking of mechanisms: I believe you use a lot of labeling in this community.
Now, you have seen all of this from the users' point of view, so you may ask: can a system designer design something based on this?
Yes. We want to see if we can design a mechanism to attain something that we really want.
In this case, what I want to show you is how we can collect good data,
okay.
Large-scale labeled datasets are very important in many applications; I think you all know this better than me, because more data leads to better accuracy.
However, large-scale in-house annotation is very expensive. You need to have deep pockets to do it, and it often becomes a bottleneck.
Recently, microtask crowdsourcing has become very popular. It claims to be a very promising way, because it can handle large volume in a short time at low cost: all the benefits.
The question is: is that so? There are many websites that offer this,
okay.
There is a problem. You see all the positive sides, but there is something they don't tell you: due to limited budgets (you and I don't have deep pockets, okay), the collected data often have low quality.
Why low quality? I will explain; it is an incentive issue, okay.
Now, in machine learning and signal processing, sometimes we say: okay, let's deal with it, let's cope with it. Let's filter out the low-quality data, or cope with the noise using modified algorithms that are more robust.
But that is an after-the-fact fix, okay; that has been our typical behavior.
Now, with this decision learning solution, maybe we can ask: can we do a better job before that?
Can we incentivize, with a mechanism, to ensure high-quality data in the first place?
Okay. You want me to give you more money? No, I don't have money to give you.
Let's see if we can design something that can make you happy: maybe increase your budget, maybe reduce your budget. Here is what that is,
okay.
This crowdsourcing has particular characteristics: repetitive and tedious small tasks. Yes or no, correct or not correct; and you do that.
And how much do workers earn? USD 0.04 for one task.
Many of them are at home with nothing else to do; when the baby is sleeping, they can earn a little extra budget to help pay some of the bills.
So a worker receives a small amount of reward, and there is no competition among workers.
So what's the problem? The problem is that this may lack proper incentives: it is profitable for a worker to submit poor solutions.
Yeah? Because the number of solutions is just too high; nobody is going to check them all. It would be too expensive for you to check, right?
Think about it. That's why it's expensive.
and however you want to check on all this, just like I said.
Provide incentive is challanging. Why? Requesters typically have low budgets for each task
and also to offer better incentive you have to pay more, so people can do
better job, you need more
money and also in order to verify the solution is also expensive. All these are
problems.
So what incentive mechanism should
requesters employ to collect high quality solutions in a cost-effective way?
So here we propose a mechanism
that sounds interesting; we also have theoretical analysis showing that it can work,
and we also did
experiments. So what is it? I'll show you.
Now, up to now there are two very basic reward mechanisms
that we can use to evaluate the solutions.
One is called the consensus mechanism. What does consensus mean? Okay,
now I have all these solutions people submitted.
I do a majority vote. I do not know the right answer, but
after the majority vote I treat it as correct:
correct is whatever fits the majority.
Or I can use the reward accuracy mechanism. Meaning I can
verify whether
a solution is correct or not, with some probability.
It is too expensive to verify all of them; I want to, but I don't
have the time.
If I had the time I would do it myself.
Okay, so for these workers I set some sampling probability,
and with that probability I take a sample
and then check the solution.
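As a concrete sketch of these two baseline mechanisms, the consensus mechanism can be written as a majority vote over the submitted solutions, while the reward accuracy mechanism verifies each solution against the ground truth only with some sampling probability p. This is only an illustration of the idea; the worker names, per-task reward, and function signatures are my own assumptions, not anything from the talk:

```python
import random
from collections import Counter

def consensus_reward(solutions, reward=1.0):
    """Consensus mechanism: no ground truth needed. A majority vote over
    the submitted solutions defines 'correct', and every worker whose
    answer matches the majority answer is paid."""
    majority, _ = Counter(solutions.values()).most_common(1)[0]
    return {w: reward if s == majority else 0.0 for w, s in solutions.items()}

def reward_accuracy(solutions, truth, p=0.3, reward=1.0):
    """Reward accuracy mechanism: each solution is verified against the
    ground truth only with sampling probability p (checking everything
    is too expensive); unchecked solutions are paid without question."""
    pay = {}
    for w, s in solutions.items():
        checked = random.random() < p
        pay[w] = 0.0 if (checked and s != truth) else reward
    return pay
```

With p = 1 every solution is checked (high cost, strong incentive); lowering p cuts the verification cost but, as the talk argues, also weakens the incentive.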
Now you can see that both of these have an evaluation cost,
and this cost relates to a parameter that is not under the requester's control.
So there's a problem here.
This cost, the one I mentioned, is a fundamental problem
for the traditional mechanisms.
So
now,
before we do so,
let's
introduce this incentive mechanism first; it's called Mt, okay.
This is called Incentive Mechanism via Training. The idea is this:
to employ quality-aware training to reduce the mechanism cost.
What does that mean?
We will assign
enforced, unpaid training to workers when they are
performing poorly.
If you flag one performing poorly,
they have to go through some training time, and in this training time they are
not going to make any money,
okay, and they have to regain their credential before they can work again.
So this is more like a punishment if they don't do well.
So,
okay, now everybody has two states.
What are they?
Okay, this is the normal working state:
they produce results, we give them money, so they are happy.
However,
if they don't do well,
they are going to go to the training state.
And in this training state workers have to do a set of training tasks.
Unpaid.
Okay,
to regain qualification
for the working state.
Okay, this can be just a policy
to deter;
it does not necessarily ever have to be invoked. You will see that.
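The two-state idea can be sketched as a small simulation: a worker in the working state earns a per-task reward, but a sampled check that catches a wrong answer demotes the worker to an unpaid training period. All the numbers below (check probability, training length, accuracies) are illustrative assumptions, not the talk's actual parameters:

```python
import random

def simulate_worker(effort_accuracy, n_tasks=1000, p_check=0.3,
                    n_training=50, reward=1.0, seed=0):
    """Toy simulation of the training-based incentive: a WORKING worker
    is paid `reward` per task unless a sampled check (probability
    p_check) catches a wrong answer, in which case the worker must do
    n_training unpaid training tasks before working again."""
    rng = random.Random(seed)
    state, earnings, training_left = "WORKING", 0.0, 0
    for _ in range(n_tasks):
        if state == "TRAINING":
            training_left -= 1               # unpaid training task
            if training_left == 0:
                state = "WORKING"            # qualification regained
            continue
        correct = rng.random() < effort_accuracy
        if rng.random() < p_check and not correct:
            state, training_left = "TRAINING", n_training   # caught
        else:
            earnings += reward               # accepted (or unchecked) answer
    return earnings
```

Under this sketch a high-accuracy worker ends up with far higher earnings over the same number of tasks than a careless one, because the careless worker keeps losing time to unpaid training.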
So everybody now
has two states.
Now you can see the two states.
The key is the worker's action in the working state.
Each worker's action in the working state
determines the probability that this worker continues in the working state
or
goes to the training state,
and for how long, before he gets from the training state back to the working state.
A worker will prefer to be here, in the working state, to make money.
However, if his work is not good enough he is going to go over there, and nobody
wants to go over there.
Okay, so a worker's action at this moment
affects not only his immediate utility, but also his future utility, both
together.
There's a notion of long-term expected utility here,
and basically when you formulate this problem
it is
like a
Markov Decision Process (MDP).
But this Markov Decision Process is not easy, because
the
MDP faced by each worker also depends on the others' actions.
My action, the quality of my work, depends on the quality of others' work,
so it is indeed a
Game-Theoretic Markov Decision Process. That means what you are doing affects what I am doing,
and what I am doing affects what you are doing.
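For a single worker, holding everyone else's behavior fixed, the resulting MDP can be solved by plain value iteration over one working state plus N training states. The two effort levels, their accuracies, and their costs below are made-up numbers chosen only to illustrate the structure, not anything from the talk:

```python
def best_effort(p_check=0.3, n_training=20, reward=10.0, gamma=0.95):
    """Value iteration for one worker's MDP (others held fixed).
    State 0 is WORKING; state k (1..n_training) means k unpaid training
    tasks remain.  Effort levels, accuracies, and costs are invented
    illustrative numbers."""
    actions = {"low": (0.30, 0.0), "high": (0.95, 4.0)}  # (accuracy, cost)
    V = [0.0] * (n_training + 1)
    vals = {}
    for _ in range(2000):
        for a, (acc, cost) in actions.items():
            q = p_check * (1 - acc)          # prob. a wrong answer is caught
            vals[a] = (reward * (1 - q) - cost
                       + gamma * (q * V[n_training] + (1 - q) * V[0]))
        V[0] = max(vals.values())            # best action in WORKING state
        for k in range(1, n_training + 1):
            V[k] = gamma * V[k - 1]          # training: no pay, one step closer
    return max(vals, key=vals.get)
```

With these toy numbers, a long training period (n_training=20) makes high effort the better policy, while a token one-task training period leaves low effort optimal; this is exactly the role the number of training tasks N plays in the analysis.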
Okay,
so from here
I will just summarize the results. Basically we show that
with this Game-Theoretic MDP
there is a guarantee of a Nash equilibrium
as long as the number of training tasks N is large enough.
If the number of training tasks,
I mean the number N of training problems to be done, is too small,
nobody will worry, okay: I go to the training state and very soon I can get
out. It is just like when somebody commits a crime
and you put them in jail for one day:
they come out, and you know what? Many people walk out and back in every day.
If you put them in for ten years, very few will; they will think
twice before doing something again.
Okay, now,
we also show that
if we
follow a certain design procedure
we can achieve arbitrarily low mechanism cost. The cost can be
very small, almost zero, if we design it correctly.
Okay, now let's see the experiment setup.
The task is a very simple task:
to calculate the sum of two randomly generated double-digit numbers.
A
second-grade student can do so, okay. And workers are given ten points for each
accepted solution.
And the goal is for
these participants to accumulate as many points as possible. This is not real money; we
work
with points, since we are doing an experiment.
And the tasks assigned to these participants come in three sets, and each one is
limited to three minutes, okay.
They have to finish within this limit.
And so we had these three sets.
The first set uses the reward accuracy mechanism with sampling probability 1. That means
in this
set
we tell all the participants
that we are going to check every one of your solutions.
And set number two
uses sampling probability 0.3:
we will only check your solutions with probability 0.3.
That is set two. Set three uses
sampling probability 0.3 as well; however, we also tell them that we have this incentive
mechanism:
if you don't do well enough,
you are going to do some training problems before you can come back to work
again.
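A back-of-the-envelope model of the three settings, with illustrative numbers only (the talk does not give these parameters): a wrong answer loses its reward only if it is sampled, and under the training mechanism a caught worker additionally loses time to unpaid training tasks:

```python
def points_per_task(accuracy, p_check, reward=10.0, n_training=0):
    """Expected points per unit of task time.  A submission is paid
    unless a sampled check (probability p_check) finds it wrong; being
    caught also costs n_training unpaid training tasks (a simple
    renewal-style correction).  All parameter values are illustrative."""
    q = p_check * (1 - accuracy)     # per-task probability of being caught
    return reward * (1 - q) / (1 + q * n_training)

# Careless (30% accurate) vs. careful (95% accurate) worker in each set:
for label, p, n in [("set 1: p=1.0", 1.0, 0),
                    ("set 2: p=0.3", 0.3, 0),
                    ("set 3: p=0.3 + training", 0.3, 30)]:
    print(label,
          round(points_per_task(0.30, p, n_training=n), 2),
          round(points_per_task(0.95, p, n_training=n), 2))
```

Under set 2 the careless worker's expected points barely drop, so cheating pays once answer speed is taken into account; under sets 1 and 3 carelessness is heavily penalized, which matches the pattern of the reported histograms.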
Okay, so what are the results that we get?
If we tell them that we are going to check with sampling probability 1, you
can see that
the accuracy is quite good; this is the histogram of the number of workers against their accuracy.
With probability 0.3 you can see that most of them don't care,
okay, because only
with probability 0.3 will they be checked.
And this is with that sampling probability.
You may say: how about this guy? You know, there's always a
nerd, okay, and they always do a good job.
And now with
sampling probability 0.3 plus the mechanism, we tell them: hey, we have this; if you get caught, you
are going to be put
into training. Now you can see the result is basically as good as the first result.
And you may say: okay, who are these students? Can they do this easily?
They are all
engineering graduate students, okay, with forty-one of them participating.
So this shows that
with such a mechanism we can indeed collect good data from the beginning. We don't
have to
accept that the data is not good enough and just work from there.
We can
do something, you know,
do better, by collecting much better data in the first place.
Okay, the conclusion and final thought: here we have seen three social media
examples
to illustrate the concept of decision learning.
In this information diffusion problem
we
try to learn
users' utility
functions
for understanding and modeling strategic decision making.
And for the Groupon and Yelp examples that we've seen,
we try to learn from each other's interactions,
with this negative network externality,
for better decision making.
And for this crowdsourcing example we design a mechanism to
guide users' strategic behaviour to attain better-quality data for better learning.
So
by decision learning
we mean learning together with strategic decision making.
These two have to be considered together.
And we can analyze users' optimal behaviour from the users' perspective,
and we can also design the optimal mechanism from the
system designer's perspective.
Now for the
coming big data tsunami. This is something that we envision.
We are going to have big data here,
and this big data
is not
static data just lying there.
Bob and Alice are going to
learn
from this data, with whatever model here; I said Markov decision process, but it
can be anything to learn from here.
Once you learn something from here, you are going to make decisions.
When you make a decision you are going to take actions, and most of these
are sequential actions, taken at
different places and by different people.
And then you have the outcome, and this outcome will come back to affect the
data.
Like I said, the data is not static data,
and therefore
Bob and Alice,
one perhaps in Singapore and one in the US, their actions and their
decisions will eventually affect each other,
because the data has been changed.
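The loop just described (learn from the data, decide, act, and the outcome changes the shared data) can be caricatured in a few lines. Everything here, including the agents' update rule, is an invented toy purely to illustrate the feedback structure, not a model from the talk:

```python
def feedback_loop(rounds=5):
    """Two agents repeatedly read a shared 'data' value, act on what
    they learned, and their actions feed back into the data, so neither
    ever faces a static dataset."""
    data = 1.0                       # shared state of the data
    history = []
    for _ in range(rounds):
        for agent in ("Alice", "Bob"):
            estimate = data          # "learning": read the current data
            action = 0.1 * estimate  # "decision": act on the estimate
            data += action           # outcome changes the shared data
            history.append((agent, round(data, 3)))
    return history
```

Each agent's action shifts the data that the other agent learns from next, which is the mutual-influence point made above.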
And this is something that
we believe is
both
challenging and a potential future for research. And I believe that
this is something that can also be
very promising in the speech and language community.
Okay, that's it. End of story. Thank you for your attention.
So I am glad; I think we all made a good decision to come to
Professor Ray Liu's talk.
We have time for a few questions.
I think this was a wonderful talk.
I've noticed that in your Chinese restaurant decision process you had this backward induction
step, and
you showed the backward induction for just one time step, from T+1 to T.
The question is: is it computationally feasible to do backward induction from the end of
time back to an earlier time?
Oh yes. That's correct.
I think it is an interesting issue, right? I mean, it's easy,
okay, to make one cent per task; if the tasks are more difficult they make ten
cents per task, and they have to think twice,
right? Which one makes a better profit?
So I think that is a different
dimension. What I showed is just one example, and in that
particular case we
designed the mechanism in that way.
In a different scenario
you can design a different mechanism in a different way. I think it should all be
fine. There's
no single solution here. This is just a concept, meaning that we can do so,
and I used an example to illustrate that.
Do I have to push? Yes.
I am a little intrigued by the experiments that you have done.
I'm just wondering: if we look at social networking,
you know, the first time you are using it, maybe you're kind of getting
swayed by that information diffusion that happens.
But, you know, a second- or third-time user will just ignore all the incentives:
"I'm just going to do what I normally do."
So every time you run that experiment, are you going to get a new set
of
people who are
using this and then making decisions based on that?
Okay, because there are some microphone problems I didn't hear clearly. But let me
guess: I think you are saying that
if they do it many times they have different experiences, so they may game the system, right?
I think that would not happen, because we use game theory to formulate this;
we found
the equilibrium of that solution, so
basically it doesn't matter how they game it:
they have a best strategy that achieves their best equilibrium solution,
so it doesn't matter how they game it. They only have
this best strategy that they can follow;
any other strategy would not get them a better outcome, so it's okay.
Any more questions?
So it seems like they want to make a decision to go for coffee
early, but before that
I would like to show our gratitude to Professor Ray Liu, and our vice-president
Haizhou will present
a souvenir to Professor Liu.