0:00:15 Okay, so I'm talking about understanding the user in social bot conversations.
0:00:24 I'm really representing a team of students here, so I want to acknowledge that the students are really the heart of this.
0:00:34 I also worked with co-faculty advisers on this, and it's been a lot of fun working with the students.
0:00:46 Okay, so not everyone knows about the Amazon Alexa Prize, so I should point out that our system here is Sounding Board, and I don't think that's a secret.
0:00:59 It's important to know that it was a response to a call for a competition, which was the Alexa Prize.
0:01:06 The idea, back in 2016, was that Amazon solicited proposals; they wanted university students to build social bots,
0:01:18 and a social bot is a bot that can converse, quote unquote, coherently and engagingly with people on popular topics and events, so it's very open domain.
0:01:29 So my graduate student, the team leader, said, "I want to do this," and I said, "I think you're crazy, but okay."
0:01:36 He got a team together, they wrote a proposal, we were selected, and then we could field the system and all that.
0:01:48 At the end of that we had about ten million, or more than ten million, conversations with real users.
0:01:55 Between that and the fact that we were working with a new type of conversational AI, we realized there are a lot of research problems in this kind of open-ended dialogue that I hadn't thought of before.
0:02:07 So the focus of this talk is going to be understanding the user, in particular including user modeling, but I want to start out with the overall big picture; I'll give you a little of that, because user modeling is just one small piece.
0:02:31 So what do I mean by social bot, and why do I think this is a new type of conversational AI?
0:02:38 A lot of work in conversational AI falls into two spaces, and people often talk about them as two different possible tasks.
0:02:48 There is the virtual assistant, which does task-oriented dialogue; in that type of dialogue system you're executing commands or answering questions, and there isn't much social back and forth.
0:03:04 On the opposite end of the spectrum is the chatbot, which is oriented towards chitchat ("how are you," "what are you doing today") but has really limited content to talk about.
0:03:20 I like to think of these not as two different options, but as two different types of conversation in a broader space that has at least two dimensions,
0:03:32 probably more. There is the accomplish-task dimension, where the virtual assistant is trying to do something and the chatbot is not, and there is the social-conversation dimension, where the chatbot is being social but doesn't have as much to talk about.
0:03:49 So what we are trying to do is something that's in between: we're a little bit less social, and a little bit less task-oriented, than the other two.
0:04:04 But I'd argue that it is to some extent task- or goal-oriented, because you're providing information; most social exchanges involve information.
0:04:19 So, with that background, what I'm going to talk about is, first, our definition of the social bot, specifically as a conversational gateway.
0:04:33 Then I'll give a system overview. I'm going to go through that quickly, because these are early days of working on social bots, and the architecture we used is not going to be the architecture that anybody will use a couple of years from now, but we need to understand it to see how we're collecting the data and what we're doing.
0:04:55 Then I want to focus in on characteristics of real users. This is an analysis that's somewhat anecdotal, but I think it's important for understanding where we're going.
0:05:05 And then I'll talk a little bit about our first steps in user modeling, and how this raises some new questions.
0:05:12 Okay, so first, the social bot as a conversational gateway.
0:05:20 What we see is that when people come to talk to a social bot, they don't have a specific task that they want done; they don't want to make a restaurant reservation, for example. But they do come with some sort of idea of what they might want to converse about.
0:05:40 They're after new information, and their interests and goals evolve over the conversation, so the social bot is continually negotiating what to talk about.
0:05:54 The users in this case are also coming to a little device to talk to, so they know they're talking to a bot;
0:06:05 we are not trying to pass a Turing test. I would argue that users should know that they're talking to a bot, and so making the system as human-like as possible, when the users may not know, may not be such a good thing to do.
0:06:24 I know that for some people chatbots are a little controversial, but I think there really are applications for this.
0:06:37 For example, you could imagine in language learning having a conversational agent to converse with, which is a good way to practice a language; or tutoring systems, a good way to interact with learning material at your own pace, depending on your own interests.
0:07:00 You could use a chatbot for information exploration, interactive health information, recommendations.
0:07:07 Just to give you an idea: when I come home I actually use my Alexa. I'm not a power user, but I use it, and oftentimes when I come home I want to listen to the news while I'm at dinner. Well, you can imagine that if you could interact with it, you could tailor the news to the stuff you're actually interested in.
0:07:32 And then there's the notion of an exercise coach or health coach. We ended up teaching a conversational AI course afterwards, building on what we had learned, with teams of students, and there was a great coaching AI system that one of the student teams built.
0:07:52 So there are a lot of actual applications I think this technology can lead to, and a lot of people have shown interest in it.
0:08:04 Okay, so our view is that it's a conversational gateway to online content. Again, when you get home, you might want to talk to the system to learn about what's going on in the world.
0:08:18 In this particular case we're scraping online content. The content could be a news source; it could be video, though actually for us it's all text. So it's news sources; it could be weather, though we're not using weather; we use a question answering service;
0:08:36 we read from Reddit discussion forums. All of that stuff that's online, you could interact with.
0:08:46 So just to give you an example of how this might go: this is an actual dialogue, and all the examples I'm going to give you are actual examples from the logs of our system.
0:08:57 In the first case, the user starts out by saying "let's chat," and that invokes the system. Because we were supposed to be anonymous in the competition, everybody was required to say "this is an Alexa Prize social bot," and after that it can just go on and chat.
0:09:16 You can chat about topics, you can play games, you can chat about the weather, and so on.
0:09:26 If somebody accepts a topic, we'll talk about that and try to lead the conversation, for instance when somebody's not saying too much.
0:09:38 So, for instance, in this case we're talking about movies, and we might talk about a director, or we might ask a question; that's how the dialogue goes.
0:09:56 In the beginning I'm showing a recognition error. The reason we could respond to it correctly is that we actually have the n-best alternatives from the recognizer,
0:10:09 and so we could dig down and figure out, based on the probabilities and based on the actual dialogue context, what the person actually said.
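To make the n-best idea concrete, here is a hypothetical sketch; the scoring rule, the weight, and the example hypotheses are all invented for illustration and are not the system's actual code.

```python
# Hypothetical n-best re-ranker: combine the ASR probability of each
# hypothesis with how well it overlaps the current dialogue context.
def rerank_nbest(nbest, context_words, context_weight=0.5):
    """nbest: list of (hypothesis, asr_probability) pairs.
    context_words: set of words salient for the current topic."""
    def score(hyp, prob):
        words = set(hyp.lower().split())
        overlap = len(words & context_words) / max(len(words), 1)
        return prob + context_weight * overlap
    return max(nbest, key=lambda pair: score(*pair))[0]

# In a movie discussion, the contextually plausible hypothesis wins
# even though it has a lower acoustic score.
nbest = [("score says he", 0.40), ("scorsese", 0.35)]
context = {"movie", "director", "scorsese", "film"}
print(rerank_nbest(nbest, context))  # -> scorsese
```

The point is just that context can overturn the 1-best hypothesis; a real system would combine proper probabilities rather than this ad hoc sum.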
0:10:21 Okay. So I want to highlight why and how this type of social bot is different from a virtual assistant, which has had much more research.
0:10:38 Both have the usual conversational AI system components, and even if you're training end to end, you're often building different stages: the stages of speech and language understanding, dialogue management, and response generation. And every system is going to have some sort of backend application that you're interacting with.
0:11:08 In a virtual assistant, the speech and language understanding is constrained in domain, which can make it an easier task; you have task intents, and oftentimes you're filling out forms, finding constraints to resolve what the person wants to do.
0:11:28 On the social bot end, the intents are more social, or information oriented ("I want information on this topic"), so the intents are a little bit different, and in terms of understanding, sentiment is going to play a role.
0:11:51 On the dialogue management side, in the virtual assistant you're trying to resolve ambiguities, scoring candidate options to figure out the best solution to the problem, and then execute the task; the reward would be timely completion of the task.
0:12:09 In the social bot, you're actually trying to learn about the interests of the user and make suggestions; at least in our system, which is information oriented, you want to make suggestions of things the user might want to hear about. And the reward is user satisfaction, which is not so concrete, and that's very challenging.
0:12:32 The backend for a virtual assistant would typically be a structured database; our backend is totally unstructured, so we have to add the structure.
0:12:46 And lastly, because it's a constrained domain, the virtual assistant's response generation is easier, whereas our case is open domain, because we could be presenting information on anything.
0:13:02 Okay, so let me tell you a little bit about our system. I'll get into how we developed it, give a little overview, and then say how we evaluate the system.
0:13:19 Again, this was a new problem. When we started, we had no experience with Alexa skills, we didn't have our own dialogue system, and using Amazon's tools wasn't really a good solution, because they're built for designing speech interfaces for form filling,
0:13:36 and that's not what we were doing; we were actually doing conversation, as opposed to the form-filling, task-oriented things that people had designed skills for.
0:13:48 So that was a little hard. And beyond that, there was no data. People often hear this challenge and say, "No, Amazon had data; they just should have given it to you." There was no data: Amazon did not have conversational data. They had interactions, transactional interactions, like setting a kitchen timer
0:14:12 or, you know, playing music. They did not have conversations.
0:14:17 This was, I'm sure, one of the reasons for the competition.
0:14:20 After the first round of the competition, with the data from the teams, the recognition error rate went down, according to one of their papers, by three percent. So they really didn't have the data.
0:14:38 So it was an unusual, new problem, and what that means is that there's no existing data for end-to-end training. We started out thinking that's what we would do; we started out doing sequence-to-sequence modeling, and it doesn't work, because there's no data.
0:14:58 So we had to say yes to starting from scratch.
0:15:04 And because we were starting from scratch, our system wasn't very good at first. So the data that we collected in the beginning was good for retraining the recognizer, but it was not so good for learning how to improve our system.
0:15:21 This is all to say that at the beginning the system wasn't so good; it had to evolve a lot. Okay, so that's setting the stage. Now to the system design.
0:15:33 All right. So when we first started building a system and getting data, we realized we had to step back and think about what we wanted in terms of designing this thing.
0:15:46 So think about what makes someone a good conversationalist. If you go to a party and you're looking for people to talk to, you generally want to talk to somebody who has something interesting to say.
0:16:01 And you also want to talk to somebody who listens to you and acts interested in what you have said.
0:16:10 These principles seemed reasonable to apply to a social bot, and in fact I think they really worked for us. Here are some examples.
0:16:20 We saw that users would react positively when we told them something new (I'll tell you later how we got that information). So, for example, around Christmas time, people liked to talk about Christmas, and in crawling our content we had found
0:16:37 this little tidbit: SpaceX sent beer ingredients to the International Space Station just in time for Christmas. A lot of people found that kind of interesting; they liked that piece of information.
0:16:52 They also liked, sort of, cool science; a lot of our users are techies. So they liked the fact that babies as young as ten months gauge how much someone values a particular goal by observing how hard they are willing to work to achieve it. People found that interesting.
0:17:11 They did not like old news. We had to fix that problem really early on: if we told them something that was two years old, that gave us bad reviews.
0:17:22 They also didn't like unpleasant news, and it turns out there's a lot of bad news in current events; if you're scraping, you will get plane accidents where people die and things like that.
0:17:37 So we started filtering, and we could see the improvement in users' reactions. But filtering is a really hard problem. We can filter for people dying, but one piece of news that people really didn't like was something about cutting a dog's head off; that's really unpleasant, and we want to filter that too.
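A first-pass filter of the kind described is essentially a keyword blocklist. This toy sketch (the term list is invented) also shows why that is not enough, since unpleasant stories that avoid the listed words slip right through.

```python
# Toy keyword filter for depressing news; the blocklist is illustrative only.
DEPRESSING_TERMS = {"dies", "died", "killed", "crash", "accident", "murder"}

def is_unpleasant(headline: str) -> bool:
    # Match on whole words after light normalization.
    words = set(headline.lower().replace(",", " ").split())
    return bool(words & DEPRESSING_TERMS)

print(is_unpleasant("Dozens killed in plane crash"))          # True
print(is_unpleasant("SpaceX sends beer ingredients to ISS"))  # False
```

The dog story from the talk would sail right past a filter like this, which is exactly the hard part.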
0:17:59 Another thing that we wanted to do is show interest in what the user says. Of course they're going to lose interest if you give them too much stuff they don't want to talk about; they want to get acknowledgment.
0:18:12 Something that really works in these conversations is giving people encouragement to express their opinions; they're not used to this.
0:18:20 So we ask questions like "Have you seen Superman?" and, later, "Which part did you like best?" That's an important part of the dialogue.
0:18:33 Unfortunately, to ask questions you need a little bit of knowledge of the world. You can ask some standard questions about movies, but once the domain gets broader, we might ask questions like, "This article mentioned Google. Have you heard of it?"
0:18:52 "Yes."
0:18:53 [laughter] This happened to us in the demo. When we show this, everybody laughs, but for the actual users it gets annoying.
0:19:06 All right, so this leads to our design philosophy, summarized briefly: we're content driven and user centered.
0:19:15 We had to crawl daily to keep our information fresh, so we had a large and dynamic content collection, represented with a knowledge graph, and a dialogue manager that promotes popular content and diverse sources.
0:19:33 On the user-centered side, we had language understanding that incorporates sentiment analysis; we tried to learn a user personality; and we allow for topic changes while tracking engagement. On the language generation side, we tried to use prosody and appropriate grounding.
0:20:01 This is the system. I'm not going to tell you everything; I'm just giving you the big picture. But you can see there's a language understanding component, a dialogue management component, language generation, and this backend where we're doing content management.
0:20:17 We're using a question answering system that Amazon provided, and we're using AWS for some text analysis.
0:20:29 So that's the big picture. There are lots of modules, because at these beginning stages we were constantly swapping in, changing, and enhancing things, so it's a modular architecture to enable rapid development.
0:20:45 So, very quickly, on each of the different components.
0:20:50 Natural language understanding is multidimensional; we're trying to capture different things. Responses can be long and can contain both questions and commands, and we have to detect the topics that people are trying to talk about, and the user reactions.
0:21:11 The dialogue manager is hierarchical: we have a master and miniskills. The master is trying to control the overall conversation, negotiating the right topics to talk about, thinking about coherence of topics and engagement of the user, and, of course, since the system is content driven,
0:21:36 it also considers content availability: you don't want to suggest talking about something that you don't have anything to say about.
0:21:42 The miniskills are focused things, related to social aspects of the conversation and to different types of news sources, because different types of information sources come with different types of metadata and extra information. With movies, we have relations between, you know, actors and movies, while for a general news source we just have the news and the metadata about the topic.
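The master/miniskill split can be sketched roughly like this. The skill names and the selection heuristic are invented for illustration; the real manager weighed topic coherence, engagement, and content availability together, not with this simple rule.

```python
# Hypothetical master/miniskill dialogue manager sketch.
class Miniskill:
    def __init__(self, name, topics):
        self.name = name
        self.topics = topics          # topics this skill has content for

    def can_handle(self, topic):
        return topic in self.topics

class Master:
    def __init__(self, skills):
        self.skills = skills

    def choose_skill(self, proposed_topic, engaged):
        # Content availability: never hand off a topic no skill can cover.
        candidates = [s for s in self.skills if s.can_handle(proposed_topic)]
        if candidates and engaged:
            return candidates[0].name
        # Otherwise, go back to negotiating a topic with the user.
        return "negotiate_topic"

master = Master([
    Miniskill("movies", {"movies", "actors"}),
    Miniskill("news", {"politics", "science"}),
])
print(master.choose_skill("science", engaged=True))   # news
print(master.choose_skill("cooking", engaged=True))   # negotiate_topic
```

The design point is that the master never commits to a topic it cannot back with content, which is the "content driven" constraint from the talk.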
0:22:15 Going back to the example from before: in this example there are stages of negotiation, and those are handled by the master, and there are different types of information sources that we're jumping around among, which are handled by the different miniskills.
0:22:35 So movies is one skill; we scrape from a celebrity news source, which is another; and then we also scrape from another source that gives us facts. And that's where the dialogue jumps between skills.
0:23:00 On the language generation side, basically, we get dialogue acts from the dialogue manager, and we get the information that's to be presented from the dialogue manager, and response generation is going to turn those into the actual text you're going to say. That includes phrase generation but also prosody adjustment.
0:23:25 The trick is that for the things you say a lot, you can adjust the prosody in the speech synthesis. We have no control over the audio, but we do have control using SSML.
0:23:37 So you can make yourself sound enthusiastic, which you have to do with the prosody, instead of having the flat, read-aloud intonation.
0:23:49 But for the facts and the news that we present, we pretty much read them as is. We rewrite bits to make things more conversational, but it's text from a pretty open domain, and that's really hard to control prosody for.
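For the fixed phrases, the prosody adjustment amounts to wrapping the text in SSML markup. The tags below follow the standard SSML `prosody` element, but the specific rate and pitch values are made-up examples, not the tuned ones.

```python
# Wrap a canned phrase in SSML to make it sound more enthusiastic.
# The rate/pitch settings are illustrative values, not tuned ones.
def enthusiastic(text: str) -> str:
    return ('<speak><prosody rate="110%" pitch="+15%">'
            + text + "</prosody></speak>")

print(enthusiastic("That is so cool!"))
# <speak><prosody rate="110%" pitch="+15%">That is so cool!</prosody></speak>
```

This only works because the phrase is known in advance; for arbitrary scraped text there is no good way to pick these values, which is the point made above.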
0:24:08 Actually, we also do some filtering in the response generation, which you'll see later.
0:24:15 Content management is the backend. We crawl online content; we have to filter inappropriate and depressing content; then we index it, using some parsing and entity detection.
0:24:30 We use metadata that we get from the source for topic information, but we also use popularity metadata.
0:24:39 And then we put it all into a big knowledge graph. Our knowledge graph had eighty thousand entries and three thousand topics, and an entry can have multiple topics.
0:24:51 So here's the idea: in the upper left is a bunch of news articles, or bits of content, that mention UT Austin; over here is a bunch of things that mention Google; et cetera.
0:25:12 Okay, so the system is evaluated the way Amazon decided, and basically that was one-to-five user ratings; that was the most important thing. And then, for the finals, there is also duration. The ultimate goal: if we had made it to twenty minutes
0:25:34 with all the judges, then the team would have gotten a million dollars.
0:25:38 So we actually did really well. I didn't expect us to get beyond five minutes, so ten minutes was pretty good. It's a really hard problem.
0:25:47 But the interesting thing is the judges. All of the development feedback was from Amazon users, but in the finals there were three people as interactors and three people as judges,
0:26:02 and they were motivated conversationalists: people who were, like, news reporters, professional conversationalists.
0:26:12 And the motivated conversationalists actually lasted a lot longer than the average Amazon user; however, they are more critical, so the average Amazon user gives a higher score. That's basically how it works.
0:26:26 So what we used was basically the average over the Amazon users. But the rating is given at the end of the conversation, and you have a huge amount of variance. And some of them decline to rate; actually, more than half of them decline to rate the system.
0:26:47 So the ratings are expensive, noisy, and sparse.
0:26:52 And on top of that, conversations are not uniform. We get word sense ambiguities, and a word sense ambiguity can lead you to do something that's off topic; you can have bad stretches, you can get the depressing news; you can have sections of the conversation that are working well
0:27:13 and sections that don't work so well.
0:27:15 So your overall score is not equally representative of all parts of the conversation.
0:27:22 And so, in order to actually use that overall score to meaningfully guide design, we take advantage of the fact that users give us more information: they accept or reject topics that we propose, they propose topics themselves, and their reaction to the content is informative.
0:27:45 So what we actually do is take the conversation-level rating and project it back onto dialogue segments. We can segment precisely because we know the topics from the system's perspective, and we project using the information about user engagement, so it can be projected non-uniformly.
0:28:07 Once we have those segment-level estimated ratings, we can aggregate across conversations: for example, we can aggregate across a topic, we can aggregate across specific content, or eventually we could aggregate across a user.
0:28:25 This is how we can figure out, "this is content a lot of people like; this is content a lot of people don't." So that's basically it.
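The projection step can be sketched as follows. The scaling rule and the engagement numbers are invented, and the real system's weighting was surely more careful, but the shape of the computation is the same: more-engaged segments absorb more of the credit.

```python
# Project a conversation-level rating onto topic segments, non-uniformly,
# using a per-segment engagement weight (illustrative scheme).
def project_rating(rating, segments):
    """segments: list of (topic, engagement) pairs, engagement > 0.
    Scaled so the plain average of segment ratings equals the rating."""
    mean_e = sum(e for _, e in segments) / len(segments)
    return {t: rating * e / mean_e for t, e in segments}

est = project_rating(3.0, [("movies", 1.5), ("politics", 0.5)])
print(est)  # {'movies': 4.5, 'politics': 1.5}
```

Aggregating these segment-level estimates across many conversations, by topic or by content item, is then just averaging.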
0:28:34 So now on to the users. First, just some operating constraints: we do not get the audio side.
0:28:43 For speech recognition, all we get is text; we get no audio, for privacy reasons. ASR is imperfect, and since we don't get any audio, we don't get pauses, we don't have sentence segmentation (that's been changed in a later version, but we didn't have it), and we don't have intonation. So there are a lot of things that we simply can't detect.
0:29:09 What we can do is use the ASR n-best list, but that's all we can do. So there are some constraints. And I'll just say that a lot of the errors are false-alarm errors, as you'll be able to appreciate in the examples.
0:29:25 Okay, so now to some actual conversations. What I want to do here is give some observations, then talk about their modeling implications, and after all of that I can talk about the user modeling.
0:29:46 There are four different points I want to make. First, users have different interests, and they may have different opinions on the same thing;
0:29:56 to use one example, in the US, news about Trump elicits wholly opposite reactions from users.
0:30:06 They have different senses of humor: some people like our jokes and some people don't.
0:30:13 They have different interaction styles, and they're different ages; this is a family device. So, just to give you an example of how this impacts the system:
0:30:23 one of the things that we found was that people like to talk about vampires, for some reason.
0:30:29 So there was a piece of information that got presented a lot to people, which basically says: did you know that some vampires are tiny monsters that burrow into people's heads?
0:30:45 Now, we don't control the prosody on this, because this is general content, so it's basically read prosody.
0:30:52 And so when people are listening to this, if they're actually listening, they are often amused and respond in kind. But sometimes they think it's bad, okay, so they're not amused; or they ask what we meant, because it didn't make sense to them.
0:31:18 And sometimes you can tell they're not really listening, because the response has nothing to do with what we said.
0:31:30 The user community is also a little more complicated: there are other kinds of users too, and those would result in topic changes for those people.
0:31:47 They have different interaction styles. So here is one user talking about vampires in a chatty way; this was a useful user, and I'll come back to them for other examples.
0:31:59 And then there's the terse user, which is actually the more frequent category, where a lot of the answers are one word.
0:32:10 This is important to appreciate, because it affects language understanding.
0:32:16 The chatty type of user is actually a lot harder for language understanding, because there are more recognition errors and it's harder to get the intent.
0:32:31 The terse type of user is also hard for language understanding, because we don't have prosody, so they might be saying "no" in a way we can't fully interpret.
0:32:46 If I ask a question, "do you want to hear more about this?", and the person says "no," that means they do not want to hear more about this. But if you present something and they answer with an emphatic "no!", that can mean they're surprised, and they do want to hear more about it.
0:33:03 And so, because we don't have prosody, it's important that we use state-dependent dialogue-act and language understanding, but even that doesn't always get it right.
0:33:15 This is my argument for why industry should give us the prosody.
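What "state-dependent" means here can be shown with a toy rule; the dialogue-act labels are invented, not the system's actual inventory.

```python
# Toy state-dependent reading of a bare "no": the preceding system act
# decides what it most likely means when prosody is unavailable.
def interpret_no(prev_system_act: str) -> str:
    if prev_system_act == "offer_more_content":
        return "reject"      # "Want to hear more?" -> "no" means stop
    if prev_system_act == "present_fact":
        return "surprise"    # a surprising fact -> "no!" invites more
    return "unknown"

print(interpret_no("offer_more_content"))  # reject
print(interpret_no("present_fact"))        # surprise
```

With the audio, the rising or emphatic contour would disambiguate directly; without it, the dialogue state is the only signal, which is the argument being made.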
0:33:21 Okay, so users also have different goals. Take the information-seeking goal: some people just generally want to know more, others ask specific questions, and others ask really hard questions, like "why?"
0:33:39 So with the vampire fact, one user will laugh and then start asking a question relevant to the topic of vampires,
0:33:48 while the terse user that we were talking about asks whether it's really true that there are tiny vampires; and then there's a speech recognition error.
0:34:00 Then there's opinion sharing. Some people like to be prompted, and a lot like to share their opinions. That's actually not so hard to deal with, because you can acknowledge it: "you might be right," "uh huh."
0:34:15 And then there are other people who want to get to know each other: they want to find out what Alexa's favorite X is, and tell us about their favorite Xs. So those are different goals you have to accommodate.
0:34:27 We also have adversarial users. We're supposed to be family friendly, and if we do things that are not, we get taken offline; this is a really big deal for us.
0:34:44 We did not want to get taken offline, and we did, a few times. So we worked really hard to build content filters and to come up with strategies to handle adversarial users.
0:34:57 So, in this particular case, we're not supposed to talk about anything related to pornography or sex or anything like that.
0:35:07 A lot of users bring it up, so you just have to have a strategy for dealing with that; in this case, we just tell people as much.
0:35:20 Then they use offensive language. One time we got taken offline because, when you don't understand what somebody said, sometimes a good strategy is to repeat what they said back to them; and while we were filtering all the content we were presenting, we forgot to filter what the people said.
0:35:38 So our solution there was to take the bad word and replace it with random funny words. One of my students came up with this; I thought it was a really stupid idea, but it actually made people laugh, so people really liked it.
0:35:54 So we say things like, "Unicorn? I can't imagine you said unicorn." It's actually even funnier if it's in the middle of a conversation and it's, you know, "butterfly" or whatever; and then we change the subject.
0:36:06 And then there are a lot of people who try to command and control it; you just have to have a strategy there too:
0:36:15 "I don't understand," or whatever. Okay.
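The funny-word substitution can be sketched in a few lines; the offensive-word list and the funny words here are stand-ins, not the real lists.

```python
import random

# Stand-in lists; a real filter would use a much larger vocabulary.
OFFENSIVE = {"badword", "swear"}
FUNNY = ["unicorn", "butterfly", "marshmallow"]

def sanitize(utterance: str, rng=random.Random(0)) -> str:
    """Replace each offensive word with a randomly chosen funny word."""
    return " ".join(rng.choice(FUNNY) if w.lower() in OFFENSIVE else w
                    for w in utterance.split())

print(sanitize("why did you say badword to me"))
# the offensive word comes back as, e.g., "unicorn"
```

This keeps the echoing strategy safe: the repeated-back utterance can no longer contain the user's own offensive word.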
0:36:20 The last problem is working with children, and we have a lot of children. Problem one in working with children is that speech recognition just doesn't work as well for young children; everybody knows that.
0:36:32 The companies have included some things to handle the age range and adapt to younger voices, but for really young children it doesn't work as well.
0:36:41 I'm quite sure, looking at the n-best list, that this is a kid talking about their pet hamster, but other than that it's really hard to figure out what they were talking about. In this case, asking them to repeat is not going to solve the problem; it's better to just change the topic.
0:37:00 The second thing is content filtering. So, when you're talking to a kid at Christmas time: in the US, a lot of people want to talk about Santa Claus.
0:37:13 Unfortunately, a lot of the content that we were scraping spoils the story,
0:37:22 and we could get taken offline for saying Santa Claus isn't real. And we were not the only ones with this issue; other teams saw their bots at some point start talking about the same thing.
0:37:39 So we had to filter for that as well.
0:37:42okay so we have a user personality module as well
0:37:45it's based on the five factor model we ask questions based
0:37:51on standard questionnaires but reworded to be more conversational
0:37:55we also mix in quiz questions that we don't actually use to make it
0:38:01more engaging for people but we can't ask too many because this is an
0:38:06interaction where we're supposed to talk about topics that people want not just
0:38:10you know quiz you with all sorts of questions
0:38:12so the data we have is very noisy and impoverished we're not asking that many
0:38:17questions
0:38:18but it turns out it does give us some information so what we can see
0:38:23is that personality for the things that we explored
0:38:27does correlate certain types of personality correlate with higher user ratings
0:38:32so people who are extroverted
0:38:35agreeable
0:38:36or open give us high ratings okay that sort of makes sense
0:38:44what i think is interesting is there is a statistically significant correlation
0:38:49between personality traits and some of the topics that they like
0:38:54you know not for the topics a lot of people like
0:38:59not everything but there is some of
0:39:01this correlation
0:39:04for certain topics the correlation actually holds
0:39:09and the data seemed to be pretty good some extroverts like recent
0:39:13fashion introverts like ai related topics
0:39:18if you are open and imaginative you like
0:39:22things like ai and time travel anyway and
0:39:27low conscientiousness was explained as you know you don't like cleaning your
0:39:31room or whatever and those people like pokemon go and minecraft
0:39:36so that data actually sort of makes sense
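the trait-topic analysis described here boils down to correlating per-user trait scores with per-user topic engagement and checking significance. a minimal Pearson correlation computed by hand, with toy numbers made up purely for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists,
    e.g. extroversion scores vs. engagement with a fashion topic."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: hypothetical extroversion scores and fashion-topic engagement.
extroversion = [1.0, 2.0, 3.0, 4.0, 5.0]
fashion_engagement = [0.2, 0.4, 0.5, 0.7, 0.9]
r = pearson_r(extroversion, fashion_engagement)  # close to 1: strong positive correlation
```

in practice one would also run a significance test on r, since the point in the talk is that only some of these correlations are statistically significant.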
0:39:39okay so just a summary here
0:39:43the implications are that
0:39:47user characteristics affect every single component of the system
0:39:55age and dialect affect speech recognition verbosity affects language understanding your interests affect
0:40:03dialogue management and if you talk a lot more there are more errors and that
0:40:10affects the dialogue management strategy
0:40:13your interests affect content management
0:40:15your age does too because of how the content filtering is done
0:40:20as we begin to do user modeling we want multidimensional evaluation
0:40:24so that we can get ratings for different user types
0:40:28and lastly the phrasing that we use in generation
0:40:34if we have more information about the user
0:40:36should be adjusted based on it
0:40:39so on to user modeling
0:40:42this is really early work
0:40:44so this is preliminary nothing published but i thought it would be fun to
0:40:48talk about in this audience
0:40:50so i'm gonna talk a little bit about why we care for content ranking and
0:40:54then about user feature and embedding models
0:40:57so
0:40:59what we wanted to do the task that we're interested in is given a particular
0:41:04piece of content
0:41:06predict whether the user is going to engage positively or negatively
0:41:10or neutrally with that content
0:41:13and so the content is gonna be characterized in terms of the information source
0:41:17topic entities
0:41:20and at some point later sentiment and valence but we haven't done that yet
0:41:24the user engagement is characterized in terms of what topics does the user suggest
0:41:31what topics
0:41:32does the user accept or reject
0:41:36positive or negative sentiment in reaction to the content but also positive or negative
0:41:42sentiment in reaction to the bot
0:41:45because that reflects being unhappy with the bot overall but maybe not with a specific piece of content
0:41:51necessarily
0:41:55so the types of features we're using
0:41:58include both some user independent stuff that's like the bias term
0:42:03so relatedness to the current topic and general popularity in dialogues
0:42:08but then the user specific features come from mapping these different types of measures of engagement
0:42:14into a few additional features
0:42:18and then we're trying to use the linguistic cues
0:42:24of the user to capture things like age and personality
0:42:29now the issue here is
0:42:32we have very little data per user so we don't know who is who
0:42:36we have to treat each conversation independently for conversations we know that the conversation came
0:42:41from the same device
0:42:43but these devices are used by families and oftentimes used by more than one person so
0:42:47you cannot assume that
0:42:49the person is the same from
0:42:54conversation to conversation
0:42:56for a specific device
0:42:58in the future you could still use that information but for this we have to
0:43:02use only a single conversation
0:43:04so the data is very sparse
0:43:07so you have to learn from other users
0:43:09so
0:43:11this is just a motivational slide
0:43:14this is just to say that the user is really important so when we're predicting the
0:43:19final rating of the conversation we consider topic factors
0:43:25agent factors and user factors so topic factors are what the topics are the
0:43:30topic coherence stuff like that
0:43:33agent factors are things contributed by the agent the things the
0:43:37agent says
0:43:40and how they say them and then the user factors are user engagement
0:43:46their response to the bot and things like that
0:43:49user factors alone
0:43:52give better performance than everything together
0:43:55in predicting the final conversation level rating so the user is really important
0:44:02okay so
0:44:04i did not mention neural networks except to say that we didn't do end-to-end training
0:44:11so i'm gonna mention them now that doesn't mean that they aren't in fact used
0:44:18because everything has to be fast et cetera but we are using them in terms
0:44:22of finding user embeddings
0:44:24so the first thing we did was actually not with a neural network
0:44:30it was latent dirichlet allocation
0:44:34which is a standard way to do topic modeling that works for many
0:44:42tasks
0:44:43so the way we think about this is each user is a
0:44:46bag of words
0:44:48and each user would be like a document
0:44:52and we're gonna come up with a representation where the lda clusters instead of being topics
0:44:59would be user types so unsupervised learning of user types
0:45:05so we just used a handful of latent topics or clusters
0:45:11because we don't think there's that many different user types and this would be
0:45:16somewhat interpretable
0:45:17and if you look at the most frequent words
0:45:22you see the following phenomena
0:45:26people who like to interact with certain types of things the people who like games show up in
0:45:30one particular cluster people who talk about music are another particular cluster
0:45:35and the personality quiz
0:45:38people who like that
0:45:39show up in another cluster
0:45:43interestingly
0:45:44a lot of interest in the bot itself
0:45:47shows up in
0:45:48another cluster
0:45:50there's the you oriented people with things like what's your name what's your favourite
0:45:57and then the self oriented person i think i am
0:46:03there's people who are generally positive
0:46:06a whole cluster which is interesting
0:46:09and there's people who are interested in media
0:46:14so that's the lda
0:46:18so first of all to apply the lda in order to get
0:46:23interesting interpretable clusters you have to do some pruning
0:46:27you have to drop frequent words it turns out though that we really needed to keep
0:46:32yes and no in there because there is positive people and negative people
0:46:35but because you get yes no questions those words are just gonna be
0:46:40in there for everyone
0:46:40so the standard filtering would throw them out
0:46:44so you need tricks to make it work and there is you know this question of
0:46:47perplexity which is fundamentally what we're optimizing
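a sketch of the preprocessing trick mentioned here: treat each user's side of their conversations as one bag-of-words "document", drop the most frequent words before running LDA, but whitelist "yes" and "no" so the positive and negative users stay distinguishable. (names and thresholds are illustrative; the actual pipeline details weren't given in the talk.)

```python
from collections import Counter

KEEP_ALWAYS = {"yes", "no"}  # needed to separate positive and negative users

def prune_vocab(user_docs, top_k=2):
    """Drop the top_k most frequent words across all user documents,
    except for the whitelisted words, before topic modeling."""
    counts = Counter(w for doc in user_docs for w in doc)
    too_frequent = {w for w, _ in counts.most_common(top_k)} - KEEP_ALWAYS
    return [[w for w in doc if w not in too_frequent] for doc in user_docs]

# Each "document" is everything one user said across their conversations.
users = [
    ["yes", "play", "music", "music", "play", "the"],
    ["no", "news", "the", "the", "play", "news"],
]
pruned = prune_vocab(users, top_k=2)
# The pruned documents would then go into a standard LDA implementation,
# with each latent cluster read off as a user type rather than a topic.
```

running LDA itself (e.g. with a handful of latent clusters, as in the talk) is then a standard library call on these pruned documents.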
0:46:53but is that the right objective to get the right user types
0:46:57well trained on another problem we played around with a different objective
0:47:03to learn user embeddings and this was user re-identification this is also unsupervised
0:47:09and the idea is
0:47:11you're gonna take a bunch of sentences from a user and a bunch of other sentences
0:47:18from the same user
0:47:21and try to learn embeddings that make those things from the same user closer together
0:47:28and things
0:47:29from a different user
0:47:32farther apart
0:47:33okay so we have a
0:47:35distance to self
0:47:37that we want to minimize
0:47:39and a distance to others that we're gonna maximize so there's a minus sign
0:47:43so when somebody's talking about tasks and they keep talking about tasks
0:47:46we want those to be close
0:47:48and when they talk about something totally different that's gonna be far away
0:47:52that is another way of grouping things
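the objective described — distance to text from the same user minus distance to text from a different user — is a standard contrastive setup; a minimal sketch with toy 2-d "embeddings" (the vectors and function names are illustrative, not the actual model):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(anchor, same_user, other_user):
    """Distance to text from the same user, minus distance to text
    from a different user: minimizing this pulls a user's utterances
    together and pushes different users apart."""
    return euclidean(anchor, same_user) - euclidean(anchor, other_user)

# Toy 2-d embeddings, illustrative only.
a = [0.0, 0.0]
pos = [0.1, 0.0]   # another utterance by the same user
neg = [3.0, 4.0]   # an utterance by a different user
loss = contrastive_loss(a, pos, neg)  # 0.1 - 5.0 = -4.9, already well separated
```

in a real trainer the embeddings would be produced by a neural encoder and this loss back-propagated through it.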
0:47:57so
0:48:00this work was actually done on twitter data
0:48:04and we have this problem where we're gonna take a tweet and ask are you
0:48:09this person or can i find somebody else like that
0:48:13identifying you from their tweets
0:48:15so using this unsupervised learning which we call re-identification it turns out when you're picking
0:48:21one person from forty three thousand random people we evaluated with mean
0:48:27reciprocal rank
0:48:29so basically the mean rank
0:48:32of our best
0:48:34system which was initialized with word2vec
0:48:37and then used the re-identification objective is twelve and twelve out of forty three thousand is
0:48:41pretty good
0:48:42lda is at five hundred
0:48:44so this type of user embedding i think is very promising
0:48:49for dealing with learning about user types
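mean reciprocal rank, the metric mentioned for the re-identification evaluation, is just the average of 1/rank of the true user over queries; a minimal implementation (the toy ranks are made up):

```python
def mean_reciprocal_rank(ranks):
    """ranks: for each query, the 1-based position of the true user
    in the ranked candidate list."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def mean_rank(ranks):
    """Plain average rank, the figure quoted in the talk."""
    return sum(ranks) / len(ranks)

# Toy example: true user ranked 1st, 2nd, and 12th in three queries.
ranks = [1, 2, 12]
mrr = mean_reciprocal_rank(ranks)   # (1 + 0.5 + 1/12) / 3
avg = mean_rank(ranks)              # 5.0
```

note MRR rewards getting the true user near the very top, whereas mean rank (the "twelve out of forty three thousand" number) is easier to interpret for large candidate pools.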
0:48:50okay so how do we evaluate them that's with this task of using the embeddings to predict
0:49:00engagement
0:49:02and conversation level ratings
0:49:05okay so in summary
0:49:07i'll summarize the sounding board stuff and then the user stuff so basically
0:49:13the social bot
0:49:16as a conversational gateway
0:49:18involves not
0:49:20accomplishing tasks
0:49:22but hearing about and helping the user with evolving goals and collaborating to learn their
0:49:28interests
0:49:30and what the user is doing is learning new facts
0:49:35exploring information and sharing opinions
0:49:37so that's the end user view of the conversational ai system
0:49:42the critical system components are basically related to the user on the one hand tracking
0:49:47the user intents
0:49:50and engagement
0:49:51but also managing an evolving collection of contents
0:49:58which you can think about as a social chat knowledge graph
0:50:01and as i said in the beginning
0:50:04millions of conversations with real users and this new form of conversational ai
0:50:10raised many problems and this is just the tip of the iceberg
0:50:14okay so the social bot involves asr for a broad user group doing information exploration
0:50:20with a lot of user variation so
0:50:23you know i'm sure that other conversational ai gets a lot of user variation
0:50:28but we got a lot
0:50:30understanding the user involves
0:50:34not just what they said and their intent but also who they are and lastly
0:50:38the user model has implications for all components of the dialogue system
0:50:43and for evaluation
0:50:45so lots of open issues this is the rest of the iceberg
0:50:49user dependent reward functions dialogue policy learning
0:50:54user response generation and context aware language modeling that uses the
0:50:59user model as an input
0:51:01and the same for user simulators all those kinds of things you could do we haven't
0:51:05started on most of this
0:51:09but user dependent reward functions we have done a little anyway so it's a
0:51:13great platform for language processing research and with that i will stop
0:51:43so that is the stuff that i know best about and there are definitely other people
0:51:48who
0:51:49participated who were interested in user modelling
0:51:53so the version of the system we fielded that had no user modelling is the
0:51:58closest
0:52:00thing to a controlled comparison
0:52:03using our data
0:52:05okay
0:52:05so there we had no user modelling
0:52:07we didn't have the detection of engagement and the personality stuff
0:52:13and when we did add it we started with using personality to predict topics
0:52:18so we had a little bit
0:52:19but not that much
0:52:23so there were other people interested in user modelling i don't know specifically
0:52:27what they did
0:52:30beyond the presentations so i know more about the three finalists because of their
0:52:37presentations
0:52:41i
0:52:42don't think there was
0:52:45much user modelling in those
0:52:50so
0:52:52so i would say i don't know as much
0:52:57about that what we did less of was
0:53:03trying to use reinforcement learning and that sort of stuff because
0:53:08we just felt we don't have the data
0:53:11so other people did more of
0:53:14that approach so i think there is a difference
0:53:19in terms of the style of the approaches
0:53:22and you know when you ask what mattered the thing is
0:53:26everything is important
0:53:27so you know what was most important
0:53:31you know i think the user modeling definitely the user
0:53:35centric stuff so the thing is in terms of being user centred we would
0:53:38change topics quickly
0:53:41if things were going sour
0:53:44so i think that helped us i think the prosody sensitive generation helped us
0:53:49but i think most importantly having lots of topics
0:53:52content
0:53:53interesting content
0:53:55helped us
0:53:57but you know the other stuff that other people did probably would have helped us
0:54:01if we had incorporated it it's just there was not always enough time
0:54:05so it's hard to compare what was more important
0:54:09across teams
0:54:43exactly and that was indeed the strategy
0:54:46i
0:54:51i agree and so we don't do it very often
0:54:54so what we did is we had
0:54:57a series of strategies
0:54:59for when we didn't understand what the person said
0:55:02that was one of them
0:55:05we also have the strategy of asking
0:55:07for repetition
0:55:10we also have the strategy of saying we don't understand
0:55:15so there were
0:55:17i think there were at least five different strategies
0:55:21we would cycle between with some randomness but also using the sentiment
0:55:28of that person's utterance to figure out
0:55:31the detected sentiment to figure out
0:55:34which to prioritise
0:55:36to bias
0:55:38between the different strategies so our way of dealing with it is to sample between
0:55:42different strategies
0:55:44there was actually at least one team maybe more than one team that actually used
0:55:50eliza
0:55:51and incorporated it in the same way as their other mini skills
0:55:58it was a little bit like a mini skill so they would actually shell out to
0:56:01eliza in the conversation
0:56:06we did not do that
0:56:08we just had that as one
0:56:11particular strategy our own implementation of it
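the fallback scheme described — several non-understanding strategies, sampled with some randomness but biased by the detected sentiment of the user's utterance — might look roughly like this (strategy names and weights are invented for illustration):

```python
import random

STRATEGIES = ["funny_word_swap", "ask_repeat", "admit_confusion",
              "change_topic", "eliza_style_reflect"]

def pick_strategy(detected_sentiment, rng=random):
    """Sample a fallback strategy; if the user already sounds annoyed
    (negative sentiment), weight topic changes more heavily rather
    than asking them to repeat themselves yet again."""
    if detected_sentiment < 0:
        weights = [1, 0.2, 1, 4, 1]   # prioritize moving on
    else:
        weights = [1, 2, 1, 1, 1]     # asking to repeat is fine
    return rng.choices(STRATEGIES, weights=weights, k=1)[0]

choice = pick_strategy(-0.5, random.Random(0))
```

sampling rather than always picking the top strategy keeps the bot from sounding repetitive when misrecognitions come in bursts.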
0:56:21very few
0:56:25but people do sometimes take it further
0:56:29and
0:56:31ask questions that are a little bit more difficult so that's like
0:56:35the why question
0:56:37very few people do that that's really hard we don't have
0:56:41a solution for that right now
0:56:44more often
0:56:46they'll ask a more or slightly more specific question
0:56:50and we can come up with a not great response at least
0:56:58better than i don't know
0:57:00the thing is when you say i don't know what did you find interesting it
0:57:06can be a valid but not great response
0:57:42wonderful question whether they are asking that question sincerely or not because we don't
0:57:46have the prosody we can't tell
0:57:50and so in a different version of this talk i had those examples and it's
0:57:56very frustrating
0:58:05if you had it you would have a i mean prosody analysis is not perfect right
0:58:09but you would have a much better idea so it would be easier
0:58:13to catch sarcasm
0:58:17no request
0:58:38so our natural language generation is not at all sophisticated
0:58:44that's an area where i would definitely want to improve it's just
0:58:49in my own mind it's not the highest priority so when we were generating the
0:58:54content
0:58:55about you know the news or the information or whatever it was
0:59:00basically we take what we got from reddit and we use that with minimal transformations
0:59:07so there are transformations to make it
0:59:11shorter
0:59:14there are transformations some simple things
0:59:18to make it a little bit more suited to a conversation
0:59:21but mostly things that are really not suited to conversation we just throw out
0:59:26so then it's really just
0:59:28the wrappers around the
0:59:31generated content and that's fairly straightforward
0:59:36so this is an area
0:59:37that
0:59:39we could do a whole lot better
0:59:52so the knowledge graph
0:59:56basically provides links
0:59:59now if you want the details the actual technical details
1:00:05they use dynamo db on the amazon cloud stuff and i can point you to
1:00:11my grad student on how we do that it's really important because we have to
1:00:16handle lots of conversations when we're live we have to handle conversations all over the
1:00:22country
1:00:23so everything had to be super efficient
1:00:27within a conversation you have to respond quickly so everything has to be super efficient
1:00:32so what the knowledge graph allows you to do is
1:00:37say from this point
1:00:40if i want to stay on topic
1:00:44or keep with related topics
1:00:45this is
1:00:46the region the set of things that i could go to and then we
1:00:50have a content ranking
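putting those two pieces together: the knowledge graph gives you the region of topics reachable from the current one, and the content ranker orders candidates within that region. a toy sketch (the graph, scores, and names are all illustrative, not the real system's data):

```python
# Hypothetical topic graph: edges link related topics.
TOPIC_GRAPH = {
    "music":    ["concerts", "movies", "instruments"],
    "movies":   ["music", "actors"],
    "concerts": ["music"],
}

# Hypothetical precomputed content scores per topic, e.g. from an
# engagement model like the one discussed earlier.
CONTENT_SCORE = {"concerts": 0.7, "movies": 0.4, "instruments": 0.9,
                 "music": 0.5, "actors": 0.3}

def candidate_topics(current, stay_on_topic=False):
    """Region of topics we could move to from the current node."""
    if stay_on_topic:
        return [current]
    return TOPIC_GRAPH.get(current, [])

def best_next_topic(current):
    """Rank the reachable region by content score and take the top."""
    region = candidate_topics(current)
    return max(region, key=CONTENT_SCORE.get) if region else None
```

in production the graph lives in a store like DynamoDB so the neighborhood lookup and ranking stay fast enough for live conversations.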