Thank you, Brian.
Good morning.
It is my honour to be here, and this is my first time at Interspeech.
I thought Interspeech had an interesting culture where before the talk there is dancing and singing, but that was not offered to me.
So
and I saw many old friends here, many familiar faces; this community is technically like a cousin to me. I have learned a lot from this community.
I work on signal processing in general, with applications to communications, multimedia and many other things, but not speech.
I don't know why, but I found the answer yesterday.
When I first came to America, I was a teaching assistant for signals and systems, and I had to teach students the concept of linear shift invariance.
And every time, the students were laughing behind me, so I asked them once: why are you laughing? And they told me: F. You need to pronounce your F.
That's why I cannot do speech processing.
So
today
just like Brian said, there is no speech here; if I submitted this keynote as a paper to Interspeech, it would be rejected, because it doesn't match any heading or keyword at all. However, I hope this can be useful to you. I learned things like hidden Markov models and deep learning from this community, and I hope the ideas you see today may be useful to you.
Okay,
so the story begins.
First
social media.
This is a new phenomenon: more and more decisions and activities in our daily life are being recorded, tracked and shared.
It's scary, right? Yes, it's scary. This is a kind of user-generated big data.
For example, our motion, our movement everywhere, is being tracked.
Yes, it is scary. However, it is also a plus.
It opens some potential, very interesting and important research directions for us, and machine learning is a very powerful tool, as used in this community and potentially in many other enterprises, such as social media.
Machine learning aims to use reasoning to find new and relevant information given some background knowledge. There are many applications, and, as many of you who use it know, machine learning consists of three elements: representation, evaluation, optimization. I will not get into the details of these.
Although it is a very powerful tool, there are limitations and constraints.
Why? Because the generalization assumption may not hold, especially in user-generated data scenarios.
Why? Because users behave differently at different times under different settings.
You behave very differently when you are hungry, or when you need to go to the restroom and cannot sit here, yeah? At different times we may behave in different ways.
And the training set may not be statistically consistent with the testing set.
That consistency was the assumption in these machine learning algorithms. Also, a single objective function cannot cover users' different interests.
Users have different interests, and therefore they have different objective functions.
Users are also rational, and therefore selfish: they all want to optimize their own utility.
And we need to consider this.
So the knowledge contained in the data can be difficult to fully exploit from this macroscopic machine learning view, powerful as it is, because the data are the outcome of users' interactions.
We cannot ignore those interactions. We need to explicitly consider the interactions of users, and basically that means their decision processes should be taken into consideration.
In the sequel you will find that game theory is a powerful tool for studying users' strategic decision making, because we can take multiple objective functions into account at the same time, and we can also take users' local and individual interests into consideration.
Okay,
let's face it.
Learning is for what? Learning is for making optimal decisions; we don't learn just for fun.
And learning and decision making are coupled together due to network externalities. What do I mean by network externalities? I mean that my decisions, my actions, will affect your decisions and your behavior.
We all affect each other. You made a decision to come here and I made a decision to come here; that's why we are all here.
We affect each other. However, what is the problem?
The problem is that there is a missing link: in the traditional sense, machine/social learning and strategic decision making are still two unrelated disciplines.
And so here we propose the notion of decision learning to bridge the two: when we are doing learning, we also need to consider the decision-making process.
That is the title of the talk, 'decision learning': learning with strategic decision making.
Okay,
I'm going to use three examples to illustrate how decision learning works.
First, decision learning with evolutionary user behavior; second, with sequential user behavior; and then we'll talk about how we design mechanisms from the system perspective.
First, let's talk about evolutionary user behavior. Here I want to use information diffusion over social networks as an example.
Okay,
now you see, these are the Twitter hashtags during the 2008 US presidential election.
They capture the spreading of comments and phrases by the candidates.
Look at the pink one: it says 'lipstick on a pig'.
That is a phrase by the famous Sarah Palin, if you know her.
And you can see that when a phrase is coined, there is a duration: it rises to a peak, and then eventually it decreases and dies down.
You see that everything has a duration.
And this was for political comments, done in 2008. Now, if you are a politician, you want to make sure your message gets delivered. When does it reach the peak? When is the starting point and when is the end point?
It has to be before the election, not after, right?
Okay, now here is the other example: online advertising for a new product.
How do we predict the popularity of this advertisement, and eventually the market share?
Can we predict that?
Okay, all of this relates to one problem that we call the information diffusion problem.
How does information diffuse? Users exchange information over social networks, and the study of this information diffusion is important. Why? Because if we understand the dynamics of information diffusion, we can predict and maybe control the start and peak timing and the distribution.
We can estimate the final reach of the population, and perhaps also identify the influential users and links for our purpose.
Okay,
now, the dynamics of information diffusion: it is a sequence of decision-making processes.
It is not, as we used to think, that information just spreads by itself. No.
For information to diffuse, it relies on other users forwarding or posting the information they receive.
I have my social network, and if you are in my network and I post something, you will see it, and once you see it you have to decide whether to post it or not.
So everybody in somebody's social network has a decision-making process: should I post it? Is the information exciting? Are my friends interested in it? Or should I not post it, because it could be embarrassing?
There is a decision-making process here, and information diffusion is an evolving process, like our evolution.
So for information diffusion we ask whether to post or forward or not, and for an evolutionary process, when a gene has a mutation, we ask whether to take the mutation or not.
They are similar, so we want to relate the two and model this problem.
A social network is always illustrated by a graph structure, which we understand, so we use the graphical evolutionary game.
Now
what is this graphical evolutionary game? Let me give a very simple explanation.
Each player has a notion of fitness. How fit are you? How fit am I?
This is the evolution we all know: the stronger, the fittest, will survive.
A user's fitness is determined by this equation: B is the baseline fitness, which depends on myself; U is the interaction payoff, determined by the environment; and alpha is the selection intensity.
If I interpret this, I would say that for someone, say Haizhou Li here in Singapore, his fitness depends on his own baseline fitness plus the selection intensity multiplied by the overall environmental fitness of Singapore; together, that is his fitness strength.
I think that makes sense.
So
an evolution process has two important elements: one is selection, and one is randomness. These two are very important in the evolutionary process.
Let me use this example. We have a graph, and we randomly select a node, which can be a user.
Once we select this one, we already have randomness; now we are going to make the selection.
We have to compute fitness: its own fitness and also its neighbours' fitness. We compute all these fitnesses and select the fittest to imitate.
It doesn't have to be imitation; I just use imitation as an example.
There are also birth-death and death-birth update processes; there are many different kinds of processes.
Okay,
so we need to know which one is the fittest, and then we imitate it.
This is a typical evolution process, determined by users' fitness.
Now
information diffusion over online social networks. I have just said that an online social network can be represented as a graph.
Information diffusion depends on the users' actions to forward or not, and those actions depend on utility.
If I like it, or my friends may like it, I calculate the utility, and if the utility is high, I am going to forward or post.
If it is embarrassing, I am not going to do it; it would damage my reputation.
So this is very similar to the graphical evolutionary game.
On the left-hand side is the graphical evolutionary game; on the right-hand side we have the social network.
On the left we have the notion of fitness, as I explained; on the right we have the user's utility. Basically, we try to model the entire problem using this utility.
The question on the left-hand side is how we can map and relate that utility, our interest, to the notion of fitness; then we can use the graphical evolutionary game, which is a very powerful tool that has been well studied.
Okay, so how can we calculate that? Let me use a very simple example here.
Look at this graph. We have a user who chooses the strategy to forward.
Okay, what is his own fitness? Here I choose B = 1; this is his baseline.
And his neighbours? He has Kf neighbours who also forward, and the rest of them, K − Kf, choose not to forward.
So this gives the utility from all his neighbours; weighting it by the selection intensity and combining with his own baseline fitness, we get the overall fitness of a user whose strategy is to forward.
For someone who does not forward, we do the same,
okay.
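To make this concrete, here is a minimal sketch in Python of that fitness calculation, assuming the common weak-selection blending f = (1 − α)·B + α·U; that convention and all the payoff numbers below are illustrative assumptions of mine, not necessarily the exact form on the slides.

```python
# Toy fitness calculation for the graphical evolutionary game.
def fitness(B, U, alpha):
    """Blend baseline fitness B with interaction payoff U under
    selection intensity alpha (weak-selection convention, assumed)."""
    return (1 - alpha) * B + alpha * U

def forwarder_fitness(K, Kf, u_ff, u_fn, alpha, B=1.0):
    """Fitness of a user who forwards: Kf of his K neighbours also
    forward (payoff u_ff each), the other K - Kf do not (u_fn each)."""
    U = Kf * u_ff + (K - Kf) * u_fn
    return fitness(B, U, alpha)

# Example: 10 neighbours, 6 of them forwarding (numbers made up).
f = forwarder_fitness(K=10, Kf=6, u_ff=0.8, u_fn=0.2, alpha=0.1)
```

The same function with the non-forwarding payoffs gives the fitness of a user whose strategy is not to forward.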
Now, for this network we have the configuration probability.
We have the fitness of the 'center guy'; now we need to calculate the fitness of all his neighbours.
For a neighbour whose strategy is also to forward, what is the fitness? And for one whose strategy is not to forward, what is the fitness? We calculate all of these.
With this, we can calculate the probability that someone will change from the forward strategy to the not-forward strategy.
We can calculate all of this, okay; I will omit the details.
By doing all this, we can find the dynamics of information diffusion: the percentage of users who will forward the information is captured by this equation, which we can calculate.
The final evolutionarily stable state can also be obtained, like this. Let me explain what this means.
If the utility of forwarding is much larger than that of those who will not do anything, then one hundred percent of the users will forward and post, because it is too advantageous for them.
If, on the other hand, not forwarding has the highest utility and forwarding the lowest, then no one is going to do it.
For anything in between, there will be a stable percentage of the population that will do it, okay. This looks complicated.
Now, K is the degree, meaning how many friends each node has. Assume K is sufficiently large; then you can see that the result is basically independent of the degree.
Meaning what? Meaning that only the utilities determine what percentage of the population will forward or post information.
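As a toy illustration of the two extreme regimes (not the paper's closed-form derivation), one can simulate an imitation update on a small complete graph; the payoff numbers, the 50/50 start, and the deterministic sweep order are my own assumptions.

```python
def imitation_dynamics(n, u_ff, u_fn, u_nn, alpha=0.1, B=1.0, sweeps=20):
    """Toy imitation update on a complete graph of n users: each node
    repeatedly copies the strategy of the fittest node (itself included).
    Returns the final fraction of forwarders from a half-and-half start."""
    forward = [i % 2 == 0 for i in range(n)]  # deterministic 50/50 start

    def fit(i):
        # interaction payoff summed over all neighbours, blended with baseline
        pay = {(True, True): u_ff, (True, False): u_fn,
               (False, True): u_fn, (False, False): u_nn}
        U = sum(pay[(forward[i], forward[j])] for j in range(n) if j != i)
        return (1 - alpha) * B + alpha * U

    for _ in range(sweeps):
        for i in range(n):
            forward[i] = forward[max(range(n), key=fit)]
    return sum(forward) / n
```

With u_ff dominant, forwarding takes over the whole population; with u_nn dominant, it dies out. The stable mixed populations of the analysis arise for utilities in between, which this crude imitate-the-best rule does not capture.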
Okay, now let's see this experiment. These are real data we acquired from Facebook: in this particular graph we have 4039 users with 88234 edges, and ten major user subgroups.
First, this is a social network, so you can see that in log scale the degree distribution is a straight line: a so-called scale-free network, meaning a few people have a very high degree of connectivity, very powerful, and the frequency falls off as a power law. That is something we all knew before.
Okay, now let's look at the evolutionarily stable states.
Look at the left-hand side first. We use four cases as examples. We know the utility function, okay; its four parameters are now our model parameters, and we want to model the information diffusion process.
So look at this. The first case: if the utility of forwarding is very high, higher than the others, then what? One hundred percent of the users, everybody, will post.
If it is reduced, not as good yet still good enough, then some percentage, in this case sixty-some percent, will post or forward.
If it gets lower, then about thirty-some percent will post and forward.
If, on the other hand, it is not good at all, the utility is so bad, then nobody is going to forward it or do anything.
And there is a rising time, which we can calculate, along with the eventual population percentage.
We said there are ten subgroups, and within each group they all behave the same; this is just to show that.
Okay, so that was: given the model parameters, what is the behavior?
Now, what if we don't have the model parameters? We don't. We only have the data, and this is data from MemeTracker.
MemeTracker builds maps of the daily news cycle by linking news reports of the same piece of information.
As an example, here is a comment: 'we're not commenting on that story, I'm afraid.' Then someone reports 'we're not commenting on that,' and there are three different links; then somebody changes it a little, 'we're not commenting on that story,' and there is another link.
So this is a very good source for finding how information diffuses. We had this data, and it is a huge amount: more than 172 million news articles and sources.
What we want to do from here is learn the utility: given this vast amount of data, can I find a utility, reduced to a few parameters, that describes how information diffuses?
Okay, this is the result: the dynamics of four pieces of information.
This curve is associated with the phrase 'Google Wave'; this one with the word 'Tehran'.
The grey line is the real data, and the red one is our fitted result. You can see it fits very well. The blue line uses only the machine learning approach, so you can see that what we do fits much better.
Most importantly, we describe this information diffusion phenomenon with, in this case, only two parameters, because we normalized; with those two parameters we can describe how the information diffuses, when it starts, when it reaches the peak, and how it ends.
And if we only have partial information, we can predict. I don't have the slides to show you, but we can predict as well.
Okay.
Now let's do this experiment. We have five groups of sites from the database, which we know is a large database; each group contains five hundred sites.
We estimate the final stable population: the percentage of users that post or forward this information.
For these five groups we find something very interesting: in the end, each group has a different percentage of users that will post or forward information.
The black bars are from our model, and the red ones are from the real dataset; you can see that they are very consistent.
Okay, now back to this. What does this mean?
This is very interesting. We can see that group five behaves cohesively, or shares major common interests.
That means that of the users in this particular group, more than eighty-one percent will agree with their own networking friends: whenever their friends post something, eighty-one percent of them will also post it.
However, in group one, only about nineteen percent will do so.
Therefore, if you are hired as an advertiser and you want to run advertisements, which group should you forward them to in order to expect a high percentage of people seeing them? It would be group five.
So group five in this particular case behaves cohesively and shares major common interests, while group one shares very little common interest.
This is mining the coherence of a group, and whether we can do anything with it or not.
Okay.
Now I'm going to change gears and talk about sequential user behavior.
Here I'm going to use some well-known websites that we collected data from to illustrate the problem. You all know Groupon; I think some of you may use it and take advantage of it, right? On Groupon we sometimes find a very good deal for a restaurant that we always go to, and we buy it there.
And Yelp: we also use Yelp. While I am here, I am using the Singapore app to decide which restaurant I want to go to.
Once we put the two together, it gets very interesting.
We found that, looking at the Yelp ratings of some very good restaurants, the Yelp star rating declines after a successful Groupon deal. You can see this: the rating is very high, many people buy the Groupon, and then over the next few months the rating declines.
Why? There is a negative network externality at play here, the one I mentioned; I'm going to explain it.
The degrading quality may be due to an overwhelming number of customers, because the deal is too successful: many people buy it, the following day a hundred people show up, and the restaurant has only three waitresses. The quality will not be good; the kitchen is limited.
That could be the reason.
Okay, so what is the phenomenon here?
Learning to get the best deal: everybody wants to get the best deal.
Then they make decisions to take advantage of it, and their decisions affect each other, reducing each other's utility.
That's what we call negative network externality: your decision and my decision eventually affect each other,
okay.
So how can we model this problem?
I'd like to go to a problem in machine learning called the Chinese restaurant problem, or the Chinese restaurant process; some of you use it also.
This is Singapore, so I don't need to explain; if you don't know what the Chinese restaurant problem is, just choose a Chinese restaurant and go, okay?
So, in the Chinese restaurant problem, we have a limited number of round tables of unknown sizes; such a round table could be a cloud service, or it could be a deal, okay.
And each customer chooses a table to sit at, or opens a new table.
For the Chinese restaurant process, we have an infinite number of tables with an infinite number of seats. A customer enters and, with predefined probabilities, either chooses a table to sit at or opens a new table.
This is non-strategic, because it is parametric, okay.
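For reference, the non-strategic Chinese restaurant process can be sampled in a few lines; this is the standard textbook form, with the concentration parameter gamma chosen arbitrarily here.

```python
import random

def crp_assign(n_customers, gamma, seed=0):
    """Chinese restaurant process: customer i joins an existing table k
    with probability n_k / (i + gamma), and opens a new table with
    probability gamma / (i + gamma), where n_k is its current occupancy."""
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers at table k
    for i in range(n_customers):
        r = rng.random() * (i + gamma)
        acc = 0.0
        for k, n_k in enumerate(tables):
            acc += n_k
            if r < acc:
                tables[k] += 1  # join existing table k
                break
        else:
            tables.append(1)    # open a new table

    return tables

seating = crp_assign(100, gamma=1.0)  # occupancy counts summing to 100
```

Note that the choice probabilities are fixed in advance, which is exactly what the talk means by "non-strategic": no customer reasons about utility or about future arrivals.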
So now we introduce the Chinese restaurant game.
We want to introduce strategic behaviour, decision making, here. Why? Because eventually we want to model and understand the negative externality effect.
Okay, so in this Chinese restaurant game, we have tables of unknown sizes Rx(θ), where θ is the system state. The system state is the condition of the restaurant: how much money they have, how much budget they have, the environment, okay.
A customer has a signal S about the system state, from advertisements, from friends, from wherever they have been before.
Which table should he sit at to have maximum utility?
You know that if you go to a Chinese restaurant, some days, very unfortunately, many people sit and eat with you, right? The utility is very bad; it's not comfortable at all.
And this is a negative network externality: more customers at one table means less space for each user, and therefore less utility, from both past and future customers.
What does 'future' mean? It means that when you come in and choose a table based on the current conditions, that may not be the best decision: in ten minutes, ten people may sit with you, okay.
So you need to predict the future; that is the problem, and the future affects your decision right now.
That makes it really difficult.
Okay, so this is a sequential decision-making problem: customers make decisions sequentially, and the observation each customer, each player, has when he arrives is how many people are seated at each table at that moment, plus the signals from those who were seated before him. That is all he can get.
So the first customer arrives and picks a table; then the second one picks a table, and so on: everybody picks one. A sequential decision-making process.
Can customer number three make the best decision at that moment?
No. If he cannot predict and see the future, his decision at that moment may not be the best, because the people who have not yet come are going to affect him,
okay.
So now let's assume there is a perfect signal, meaning that everybody knows all the signals and all the decisions other people will make given their conditions.
I will not get into the details, but then there is an equilibrium grouping, meaning the grouping gives the best utility: when you choose according to everybody's signals, it is the best, and when anyone changes to another table, his utility can only decrease.
So that is the best strategy, when the signal is perfect.
Let me explain this. What will the strategy be?
If, when I come in, I know all the conditions, and I know what people will do ten or thirty minutes later, then I have perfect information. And you know what? When I come in, I will know which table has the best utility, and I will choose it.
The second customer will choose it too, until the table is filled; then they go to the table with the second-best utility, and so on, because the signal is perfect, right?
So customers who come early have an advantage, but only with perfect signals.
In real life it is not that easy: the signal is imperfect.
We cannot have the perfect signal, and therefore we have to learn from the noisy signals to form our belief. This is the learning process, and we can use Bayesian learning or other methods to construct the belief that we have,
okay.
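A minimal sketch of such a Bayesian belief update, assuming a binary state ('H'igh or 'L'ow quality) and conditionally independent signals that report the true state correctly with probability q; both of those modelling assumptions are mine, for illustration only.

```python
def update_belief(prior_high, signals, q):
    """Sequential Bayesian update of the belief that the state is 'H',
    given binary signals that are correct with probability q."""
    belief = prior_high
    for s in signals:
        like_high = q if s == 'H' else 1 - q   # P(signal | state = H)
        like_low = 1 - q if s == 'H' else q    # P(signal | state = L)
        num = like_high * belief
        belief = num / (num + like_low * (1 - belief))
    return belief

# One 'H' signal of quality 0.8 moves a 0.5 prior up to 0.8; an 'H'
# followed by an 'L' of the same quality cancels back to the prior.
```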
So now, what is the best response? The best response is: choose the table with the best utility.
But in order to choose the best utility, we need to know the final distribution of the users; like I said, we need to know who will come in and where they will sit.
So the utility function depends on subsequent users' decisions; this is the negative network externality.
We need to predict the future decisions to find a best response, and therefore we use backward induction: assuming everybody makes the optimal decision, we can predict those decisions and apply backward induction from there.
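Here is a toy backward-induction sketch for a two-table version, assuming each table's value R[t] is shared equally among its final occupants and that every customer reasons about how the remaining arrivals will respond; these modelling choices and the numbers are mine, not the talk's exact formulation.

```python
from functools import lru_cache

R = (10, 6)  # value of each of two tables, shared equally at the end

@lru_cache(maxsize=None)
def play(n1, n2, remaining):
    """Equilibrium seating by backward induction: the arriving customer
    picks the table maximising R[t] / (final occupancy of t), where the
    final occupancy is found by recursing over the remaining arrivals."""
    if remaining == 0:
        return (n1, n2)
    best_u, best_final = -1.0, None
    for choice, state in ((0, (n1 + 1, n2)), (1, (n1, n2 + 1))):
        final = play(state[0], state[1], remaining - 1)
        u = R[choice] / final[choice]  # this customer's realised utility
        if u > best_u:
            best_u, best_final = u, final
    return best_final

equilibrium = play(0, 0, 4)  # final seating of four sequential customers
```

Each customer looks ahead to how the remaining arrivals will react to his choice, which is exactly why a myopic "best table right now" decision can be suboptimal.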
Okay. Now, this is the summary of the Chinese restaurant game, and here is an example.
Remember this? I said that after a good Groupon deal, the rating goes down like that, okay; the red line is the real data.
Now let's use a linear model to model the utility.
So this is the utility that we model; we learned it from the real data.
Based on this utility, we can obtain all the parameters we need, and then we can use what we just derived.
We consider all the negative network externalities; we can predict the future to find the optimal result.
As you can see, the blue line is learning with the negative network externality that we propose, and the red one is without, meaning decisions based only on the current moment.
We can indeed increase the rating by doing so.
So if we consider this negative network externality, the strategy performs much better.
In fact, we can also use this to devise strategies.
This is a new restaurant's strategy, okay. Now we have a new restaurant: a low-quality restaurant and a high-quality restaurant.
What about the number of customers? Look at this: for a low-quality restaurant, in order to increase the number of customers who arrive, the deal price has to be lower. If your quality is not good, you make it cheaper.
For a high-quality restaurant, even if the deal price is a little higher, you can still make a good profit.
Then there is the signal quality: your advertisements and word of mouth. If you are a high-quality restaurant, you want the signal quality to be very good, so people know this is a good restaurant.
If you have a low-quality restaurant, don't let people know too much about you, okay. That would be better.
And next we get to revenue. Now, what about revenue?
Both high-quality and low-quality restaurants can achieve peak revenue.
For a high-quality restaurant to achieve peak revenue, the signal quality has to be good; then the revenue is higher. Let people know, and through word of mouth people will come.
However, for a low-quality restaurant it's the opposite: don't let people know. The better the signal quality, the lower the revenue you are going to see.
Okay, so I think this is all common sense; however, it is highly nonlinear, and we had to model it, and it is indeed a very highly nonlinear problem. I don't have the slides to show you, but you can take a look at our paper.
So a high-quality restaurant should try to increase the signal quality, and a low-quality restaurant should hide the quality information and use a low deal price to attract customers.
Indeed, we developed a family of these. We have the Chinese restaurant game, and we also have the dynamic Chinese restaurant game, for people who come and go.
That is, for example, just like this Wi-Fi, okay: many people come here, many people may leave, and they decide which network they want to choose. And we also have the Indian buffet game.
That is interesting: we see all of these in the US. Chinese restaurants have these round tables, and Indian restaurants have the buffet, where we have multiple choices of what we can take.
And this can be applied to many different scenarios.
And we have applied this, for example, to reviews: we have seen this on Yelp, and also on Amazon; reviewing is a sequential decision. And to question-and-answer sites: we also modeled a problem on those.
And to many other sites, as long as there is this sequential decision process, because social computing systems in this case are all structured in the same fashion: users arrive sequentially and make decisions on whether to do something or not, whether to forward something or not.
There are also externalities among users: your decisions, your actions, affect each other, and there is some uncertainty.
In this sense, the Chinese restaurant game is a natural choice for modeling and analyzing these user behaviors, so that we can come up with the best predictions from the model and also devise some incentive mechanisms, okay, to attain the results we want.
And I believe that in the speech and language community you also have problems like this.
Speaking of mechanisms: I believe you use a lot of labeling in this community.
Now, you have seen all of this from the users' point of view, so you may ask: can a system designer design something based on this?
Yes. We want to see if we can design a mechanism to attain something that we really want.
In this case, what I want to show you is how we can collect good data,
okay.
Large-scale labeled datasets are very important in many applications; I think you all know this better than me, because more data leads to better accuracy.
However, large-scale in-house annotation is very expensive. You need to have deep pockets to do it, and it often becomes a bottleneck.
Recently, microtask crowdsourcing has become very popular. It claims to be a very promising way, because it can handle large volume in a short time at low cost: all the benefits.
The question is: is that so? There are many websites that offer this,
okay.
There is a problem. You see all the positive sides, but there is something they don't tell you: due to limited budgets (you and I don't have deep pockets, okay), the collected data often have low quality.
Why low quality? I will explain; it is an incentive issue, okay.
Now, in machine learning and signal processing, sometimes we say: okay, let's deal with it, let's cope with it. Let's filter out the low-quality data, or cope with the noise using modified algorithms that are more robust.
But that is an after-the-fact fix, okay; that has been our typical behavior.
Now, with this decision learning solution, maybe we can ask: can we do a better job before that?
Can we incentivize, with a mechanism, to ensure high-quality data in the first place?
Okay. You want me to give you more money? No, I don't have money to give you.
Let's see if we can design something that can make you happy: maybe increase your budget, maybe reduce your budget. Here is what that is,
okay.
This crowdsourcing has particular characteristics: repetitive and tedious small tasks. Yes or no, correct or not correct; and you do that.
And how much do workers earn? USD 0.04 for one task.
Many of them are at home with nothing else to do; when the baby is sleeping, they can earn a little extra budget to help pay some of the bills.
So a worker receives a small amount of reward, and there is no competition among workers.
So what's the problem? The problem is that this may lack proper incentives: it is profitable for a worker to submit poor solutions.
Yeah? Because the number of solutions is just too high; nobody is going to check them all. It would be too expensive for you to check, right?
Think about it. That's why it's expensive.
and however you want to check on all this, just like I said.
Provide incentive is challanging. Why? Requesters typically have low budgets for each task
and also to offer better incentive you have to pay more, so people can do
better job, you need more
money and also in order to verify the solution is also expensive. All these are
problems.
So what incentive mechanism should
requesters employ to collect high quality solutions in a cost-effective way?
So here we propose a mechanism
that sounds interesting; we also have theoretical analysis showing that it can work,
and we also did
experiments. So what is it? I'll show you.
Now, up to now there are two very basic reward mechanisms
that we can use to evaluate the solutions.
One is called the consensus mechanism. What does consensus mean? Okay,
now I have all these solutions people submitted.
I do a majority vote. I do not know the right answer, but
after the majority vote I treat it as correct:
correct is whatever fits the majority.
Or I can use the reward accuracy mechanism. Meaning I can
verify whether
a solution is correct or not, with some probability.
It is too expensive to verify all of them; I want to, but I don't
have the time.
If I had the time I would do it myself.
Okay, so for these workers I set some sampling probability,
and with that probability I take a sample
and then check the solution.
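As a concrete sketch of these two baseline mechanisms, the consensus mechanism can be written as a majority vote over the submitted solutions, while the reward accuracy mechanism verifies each solution against the ground truth only with some sampling probability p. This is only an illustration of the idea; the worker names, per-task reward, and function signatures are my own assumptions, not anything from the talk:

```python
import random
from collections import Counter

def consensus_reward(solutions, reward=1.0):
    """Consensus mechanism: no ground truth needed. A majority vote over
    the submitted solutions defines 'correct', and every worker whose
    answer matches the majority answer is paid."""
    majority, _ = Counter(solutions.values()).most_common(1)[0]
    return {w: reward if s == majority else 0.0 for w, s in solutions.items()}

def reward_accuracy(solutions, truth, p=0.3, reward=1.0):
    """Reward accuracy mechanism: each solution is verified against the
    ground truth only with sampling probability p (checking everything
    is too expensive); unchecked solutions are paid without question."""
    pay = {}
    for w, s in solutions.items():
        checked = random.random() < p
        pay[w] = 0.0 if (checked and s != truth) else reward
    return pay
```

With p = 1 every solution is checked (high cost, strong incentive); lowering p cuts the verification cost but, as the talk argues, also weakens the incentive.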
Now you can see that both of these have an evaluation cost,
and this cost relates to a parameter that is not under the requester's control.
So there's a problem here.
This cost, the one I mentioned, is a fundamental problem
for the traditional mechanisms.
So
now,
before we do so,
let's
introduce this incentive mechanism first; it's called Mt, okay.
This is called Incentive Mechanism via Training. The idea is this:
to employ quality-aware training to reduce the mechanism cost.
What does that mean?
We will assign
enforced, unpaid training to workers when they are
performing poorly.
If you flag one performing poorly,
they have to go through some training time, and in this training time they are
not going to make any money,
okay, and they have to regain their credential before they can work again.
So this is more like a punishment if they don't do well.
So,
okay, now everybody has two states.
What are they?
Okay, this is the normal working state:
they produce results, we give them money, so they are happy.
However,
if they don't do well,
they are going to go to the training state.
And in this training state workers have to do a set of training tasks.
Unpaid.
Okay,
to regain qualification
for the working state.
Okay, this can be just a policy
to deter;
it does not necessarily ever have to be invoked. You will see that.
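The two-state idea can be sketched as a small simulation: a worker in the working state earns a per-task reward, but a sampled check that catches a wrong answer demotes the worker to an unpaid training period. All the numbers below (check probability, training length, accuracies) are illustrative assumptions, not the talk's actual parameters:

```python
import random

def simulate_worker(effort_accuracy, n_tasks=1000, p_check=0.3,
                    n_training=50, reward=1.0, seed=0):
    """Toy simulation of the training-based incentive: a WORKING worker
    is paid `reward` per task unless a sampled check (probability
    p_check) catches a wrong answer, in which case the worker must do
    n_training unpaid training tasks before working again."""
    rng = random.Random(seed)
    state, earnings, training_left = "WORKING", 0.0, 0
    for _ in range(n_tasks):
        if state == "TRAINING":
            training_left -= 1               # unpaid training task
            if training_left == 0:
                state = "WORKING"            # qualification regained
            continue
        correct = rng.random() < effort_accuracy
        if rng.random() < p_check and not correct:
            state, training_left = "TRAINING", n_training   # caught
        else:
            earnings += reward               # accepted (or unchecked) answer
    return earnings
```

Under this sketch a high-accuracy worker ends up with far higher earnings over the same number of tasks than a careless one, because the careless worker keeps losing time to unpaid training.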
So everybody now
has two states.
Now you can see the two states.
The key is the worker's action in the working state.
Each worker's action in the working state
determines the probability that this worker continues in the working state
or
goes to the training state,
and for how long, before he gets from the training state back to the working state.
A worker will prefer to be here, in the working state, to make money.
However, if his work is not good enough he is going to go over there, and nobody
wants to go over there.
Okay, so a worker's action at this moment
affects not only his immediate utility, but also his future utility, both
together.
There's a notion of long-term expected utility here,
and basically when you formulate this problem
it is
like a
Markov Decision Process (MDP).
But this Markov Decision Process is not easy, because
the
MDP faced by each worker also depends on the others' actions.
My action, the quality of my work, depends on the quality of others' work,
so it is indeed a
Game-Theoretic Markov Decision Process. That means what you are doing affects what I am doing,
and what I am doing affects what you are doing.
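For a single worker, holding everyone else's behavior fixed, the resulting MDP can be solved by plain value iteration over one working state plus N training states. The two effort levels, their accuracies, and their costs below are made-up numbers chosen only to illustrate the structure, not anything from the talk:

```python
def best_effort(p_check=0.3, n_training=20, reward=10.0, gamma=0.95):
    """Value iteration for one worker's MDP (others held fixed).
    State 0 is WORKING; state k (1..n_training) means k unpaid training
    tasks remain.  Effort levels, accuracies, and costs are invented
    illustrative numbers."""
    actions = {"low": (0.30, 0.0), "high": (0.95, 4.0)}  # (accuracy, cost)
    V = [0.0] * (n_training + 1)
    vals = {}
    for _ in range(2000):
        for a, (acc, cost) in actions.items():
            q = p_check * (1 - acc)          # prob. a wrong answer is caught
            vals[a] = (reward * (1 - q) - cost
                       + gamma * (q * V[n_training] + (1 - q) * V[0]))
        V[0] = max(vals.values())            # best action in WORKING state
        for k in range(1, n_training + 1):
            V[k] = gamma * V[k - 1]          # training: no pay, one step closer
    return max(vals, key=vals.get)
```

With these toy numbers, a long training period (n_training=20) makes high effort the better policy, while a token one-task training period leaves low effort optimal; this is exactly the role the number of training tasks N plays in the analysis.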
Okay,
so from here
I will just summarize the results. Basically we show that
with this Game-Theoretic MDP
there is a guarantee of a Nash equilibrium
as long as the number of training tasks N is large enough.
If the number of training tasks,
I mean the number N of training problems to be done, is too small,
nobody will worry, okay: I go to the training state and very soon I can get
out. It is just like when somebody commits a crime
and you put them in jail for one day:
they come out, and you know what? Many people walk out and back in every day.
If you put them in for ten years, very few will; they will think
twice before doing something again.
Okay, now,
we also show that
if we
follow a certain design procedure
we can achieve arbitrarily low mechanism cost. The cost can be
very small, almost zero, if we design it correctly.
Okay, now let's see the experiment setup.
The task is a very simple task:
to calculate the sum of two randomly generated double-digit numbers.
A
second-grade student can do so, okay. And workers are given ten points for each
accepted solution.
And the goal is for
these participants to accumulate as many points as possible. This is not real money; we
work
with points, since we are doing an experiment.
And the tasks assigned to these participants come in three sets, and each one is
limited to three minutes, okay.
They have to finish within this limit.
And so we had these three sets.
The first set uses the reward accuracy mechanism with sampling probability 1. That means
in this
set
we tell all the participants
that we are going to check every one of your solutions.
And set number two
uses sampling probability 0.3:
we will only check your solutions with probability 0.3.
That is set two. Set three uses
sampling probability 0.3 as well; however, we also tell them that we have this incentive
mechanism:
if you don't do well enough,
you are going to do some training problems before you can come back to work
again.
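A back-of-the-envelope model of the three settings, with illustrative numbers only (the talk does not give these parameters): a wrong answer loses its reward only if it is sampled, and under the training mechanism a caught worker additionally loses time to unpaid training tasks:

```python
def points_per_task(accuracy, p_check, reward=10.0, n_training=0):
    """Expected points per unit of task time.  A submission is paid
    unless a sampled check (probability p_check) finds it wrong; being
    caught also costs n_training unpaid training tasks (a simple
    renewal-style correction).  All parameter values are illustrative."""
    q = p_check * (1 - accuracy)     # per-task probability of being caught
    return reward * (1 - q) / (1 + q * n_training)

# Careless (30% accurate) vs. careful (95% accurate) worker in each set:
for label, p, n in [("set 1: p=1.0", 1.0, 0),
                    ("set 2: p=0.3", 0.3, 0),
                    ("set 3: p=0.3 + training", 0.3, 30)]:
    print(label,
          round(points_per_task(0.30, p, n_training=n), 2),
          round(points_per_task(0.95, p, n_training=n), 2))
```

Under set 2 the careless worker's expected points barely drop, so cheating pays once answer speed is taken into account; under sets 1 and 3 carelessness is heavily penalized, which matches the pattern of the reported histograms.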
Okay, so what are the results that we get?
If we tell them that we are going to check with sampling probability 1, you
can see that
the accuracy is quite good; this is the histogram of the number of workers against their accuracy.
With probability 0.3 you can see that most of them don't care,
okay, because only
with probability 0.3 will they be checked.
And this is with that sampling probability.
You may say: how about this guy? You know, there's always a
nerd, okay, and they always do a good job.
And now with
sampling probability 0.3 plus the mechanism, we tell them: hey, we have this; if you get caught, you
are going to be put
into training. Now you can see the result is basically as good as the first result.
And you may say: okay, who are these students? Can they do this easily?
They are all
engineering graduate students, okay, with forty-one of them participating.
So this shows that
with such a mechanism we can indeed collect good data from the beginning. We don't
have to
accept that the data is not good enough and just work from there.
We can
do something, you know,
do better, by collecting much better data in the first place.
Okay, the conclusion and final thought: here we have seen three social media
examples
to illustrate the concept of decision learning.
In this information diffusion problem
we
try to learn
users' utility
functions
for understanding and modeling strategic decision making.
And for the Groupon and Yelp examples that we've seen,
we try to learn from each other's interactions,
with this negative network externality,
for better decision making.
And for this crowdsourcing example we design a mechanism to
guide users' strategic behaviour to attain better-quality data for better learning.
So
by decision learning
we mean learning together with strategic decision making.
These two have to be considered together.
And we can analyze users' optimal behaviour from the users' perspective,
and we can also design the optimal mechanism from the
system designer's perspective.
Now for the
coming big data tsunami. This is something that we envision.
We are going to have big data here,
and this big data
is not
static data just lying there.
Bob and Alice are going to
learn
from this data, with whatever model here; I said Markov decision process, but it
can be anything to learn from here.
Once you learn something from here, you are going to make decisions.
When you make a decision you are going to take actions, and most of these
are sequential actions, taken at
different places and by different people.
And then you have the outcome, and this outcome will come back to affect the
data.
Like I said, the data is not static data,
and therefore
Bob and Alice,
one perhaps in Singapore and one in the US, their actions and their
decisions will eventually affect each other,
because the data has been changed.
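The loop just described (learn from the data, decide, act, and the outcome changes the shared data) can be caricatured in a few lines. Everything here, including the agents' update rule, is an invented toy purely to illustrate the feedback structure, not a model from the talk:

```python
def feedback_loop(rounds=5):
    """Two agents repeatedly read a shared 'data' value, act on what
    they learned, and their actions feed back into the data, so neither
    ever faces a static dataset."""
    data = 1.0                       # shared state of the data
    history = []
    for _ in range(rounds):
        for agent in ("Alice", "Bob"):
            estimate = data          # "learning": read the current data
            action = 0.1 * estimate  # "decision": act on the estimate
            data += action           # outcome changes the shared data
            history.append((agent, round(data, 3)))
    return history
```

Each agent's action shifts the data that the other agent learns from next, which is the mutual-influence point made above.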
And this is something that
we believe is
both
challenging and a potential future for research. And I believe that
this is something that can also be
very promising in the speech and language community.
Okay, that's it. End of story. Thank you for your attention.
So I am glad; I think we all made a good decision to come to
Professor Ray Liu's talk.
We have time for a few questions.
I think this was a wonderful talk.
I've noticed that in your Chinese restaurant decision process you had this backward induction
step, and
you showed the backward induction for just one time step, from T+1 to T.
The question is: is it computationally feasible to do backward induction from the end of
time back to an earlier time?
Oh yes. That's correct.
I think it is an interesting issue, right? I mean, it's easy,
okay, to make one cent per task; if the tasks are more difficult they make ten
cents per task, and they have to think twice,
right? Which one makes a better profit?
So I think that is a different
dimension. What I showed is just one example, and in that
particular case we
designed the mechanism in that way.
In a different scenario
you can design a different mechanism in a different way. I think it should all be
fine. There's
no single solution here. This is just a concept, meaning that we can do so,
and I used an example to illustrate that.
Do I have to push? Yes.
I am a little intrigued by the experiments that you have done.
I'm just wondering: if we look at social networking,
you know, the first time you are using it, maybe you're kind of getting
swayed by that information diffusion that happens.
But, you know, a second- or third-time user will just ignore all the incentives:
"I'm just going to do what I normally do."
So every time you run that experiment, are you going to get a new set
of
people who are
using this and then making decisions based on that?
Okay, because there are some microphone problems I didn't hear clearly. But let me
guess: I think you are saying that
if they do it many times they have different experiences, so they may game the system, right?
I think that would not happen, because we use game theory to formulate this;
we found
the equilibrium of that solution, so
basically it doesn't matter how they game it:
they have a best strategy that achieves their best equilibrium solution,
so it doesn't matter how they game it. They only have
this best strategy that they can follow;
any other strategy would not get them a better outcome, so it's okay.
Any more questions?
So it seems like they want to make a decision to go for coffee
early, but before that
I would like to show our gratitude to Professor Ray Liu, and our vice-president
Haizhou will present
a souvenir to Professor Liu.