0:00:15 | alright first let me thank you for the invitation and the opportunity to |
---|
0:00:20 | to come to Olomouc |
---|
0:00:22 | it's so funny because a friend of mine said oh, you're going to the middle |
---|
0:00:25 | of nowhere, and i said no, i'm going to the middle of Moravia |
---|
0:00:30 | and i really enjoy coming to new places that i've never been to |
---|
0:00:35 | so what i'll talk about today is |
---|
0:00:38 | a new trend, a sort of technology trend, that is really emerging and taking off |
---|
0:00:43 | and that is this notion of anticipatory search and how much speech can contribute |
---|
0:00:49 | to that |
---|
0:00:52 | here's sort of our vision: imagine you're having a conversation with a friend and she |
---|
0:00:56 | says she'll meet you somewhere in five minutes, and as i'm putting down the phone |
---|
0:01:00 | and i look at the screen, this is what i wanna see, right |
---|
0:01:05 | i wanna |
---|
0:01:06 | basically have the directions of where do i need to go and where do i need |
---|
0:01:10 | to be in five minutes |
---|
0:01:12 | and if you think about it we have all the pieces already, right: we |
---|
0:01:16 | have user location, we have good maps, we have good directions, we have speech recognition |
---|
0:01:22 | we have some reasonable understanding, and so it's kind of a matter of putting it |
---|
0:01:27 | all together into one compelling application |
---|
0:01:32 | so that's kind of the premise we realize that the way that you find information |
---|
0:01:37 | is changing |
---|
0:01:39 | and we're moving towards kind of a query-free search, in the sense that instead |
---|
0:01:44 | of you proactively, when you have something you want to find out, having |
---|
0:01:48 | to fire up a browser, find |
---|
0:01:50 | a search box, and type in your query to get results, it can be much |
---|
0:01:55 | more proactive: given your context, and what you've said, and where you are |
---|
0:02:00 | the information can come to you, as opposed to you having to find the information |
---|
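the query-free idea above can be sketched in a few lines of python. everything here is illustrative: the context fields, the function name, and the example values are assumptions, and no real search backend is involved; it only shows how ambient signals could be assembled into a query the user never typed.

```python
# Sketch: form an implicit query from ambient context instead of typed input.
# All field names and values are made up for illustration.

def build_implicit_query(context):
    """Assemble a search query from contextual signals the user never typed."""
    parts = []
    if context.get("utterance"):          # something heard in the conversation
        parts.append(context["utterance"])
    if context.get("location"):           # where the device says the user is
        parts.append("near " + context["location"])
    return " ".join(parts)

# The friend's remark plus the phone's location become the query,
# which an anticipatory app would fire proactively at a places/maps index.
context = {
    "utterance": "meet in five minutes",
    "location": "Union Square",
}
implicit_query = build_implicit_query(context)
```

the point of the sketch is only the inversion: the query is derived from context, not typed by the user.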
0:02:06 | but of course we're not alone in this idea |
---|
0:02:09 | Ray Kurzweil, the technologist and futurist that recently joined Google, has |
---|
0:02:15 | a pretty similar vision: that search engines, maybe search engines soon, will |
---|
0:02:20 | not wait to be asked questions |
---|
0:02:22 | they'll listen in on our conversations |
---|
0:02:25 | what we say, what we |
---|
0:02:26 | write, where we are, and they'll anticipate our needs |
---|
0:02:30 | and that's |
---|
0:02:31 | pretty much the same premise that |
---|
0:02:34 | Expect Labs was built on |
---|
0:02:36 | so let's look at some of the enabling trends |
---|
0:02:40 | for anticipatory search |
---|
0:02:42 | there's mobile devices |
---|
0:02:44 | there's AI that is making progress |
---|
0:02:47 | and so if you put it together there's applications that can take contextual |
---|
0:02:52 | information and start making good predictions about what the informational needs of the |
---|
0:02:57 | user might be |
---|
0:02:58 | so let's look at these, you know, in more detail |
---|
0:03:02 | it's obviously no surprise that |
---|
0:03:04 | data about the world is everywhere: you can probably go anywhere |
---|
0:03:08 | and, you know, a few minutes later there's a couple of |
---|
0:03:13 | videos on youtube already about that event and, you know, hundreds of pictures; in |
---|
0:03:18 | fact there's technologies now that are trying to recreate some sort of a 3D |
---|
0:03:22 | map just based on the fact that you have images from different points of view |
---|
0:03:27 | so |
---|
0:03:29 | then there's the amazing sort of growth of mobile devices, so this is a statistic |
---|
0:03:35 | for smart phones and tablets, both running |
---|
0:03:38 | iOS and Android, and of course in absolute counts it's the US |
---|
0:03:43 | and china, because of the |
---|
0:03:44 | population, that have the highest numbers, but if you look at the growing markets |
---|
0:03:49 | it's basically southeast asia and, you know, south america and some other |
---|
0:03:54 | growing markets |
---|
0:03:56 | so |
---|
0:03:57 | we're ending up in a position where pretty much any adult is gonna have |
---|
0:04:03 | a smart phone in their pocket |
---|
0:04:05 | and so that really changes the possibilities of what you can do with that |
---|
0:04:11 | because these mobile devices have a lot of sensors, and you can |
---|
0:04:16 | think of, well, of course we have cameras, we have microphones, there's wifi, there's |
---|
0:04:22 | a gps |
---|
0:04:23 | but also if you look closely, for example in this, |
---|
0:04:26 | let's see, the galaxy s4, there's gesture sensors, proximity sensors, gyroscopes, accelerometers |
---|
0:04:33 | there's even a humidity sensor, so that if you drop your phone in the water |
---|
0:04:38 | they can void the warranty |
---|
0:04:41 | and |
---|
0:04:42 | a barometer |
---|
0:04:43 | so basically it turns out that these devices that we carry in our pockets |
---|
0:04:48 | to some extent know more about where we are than we ourselves might be |
---|
0:04:52 | aware |
---|
0:04:56 | and there's more right |
---|
0:04:58 | we all know about, sort of, Google Glass, that also has |
---|
0:05:02 | you know a bone-conduction transducer in addition to, well, other stuff, and then more futuristic things |
---|
0:05:08 | right, like there's research actually |
---|
0:05:12 | on, you know, sensors that are able to do recognition just based on the |
---|
0:05:18 | facial muscle activity, right: you have these sensors, so i could be talking, and |
---|
0:05:25 | if i spoke without phonation you'd be able to still recognise it; in fact i |
---|
0:05:30 | was thinking that may be an interesting challenge |
---|
0:05:34 | for some |
---|
0:05:36 | future evaluation |
---|
0:05:39 | then there's these more, you know, futuristic electroencephalogram headsets |
---|
0:05:45 | that it's still kind of, you know, not very clear what you can do with |
---|
0:05:50 | them, but they're becoming more stylish so people might start wearing them |
---|
0:05:55 | and then there's interesting things like this patent application from Motorola |
---|
0:06:00 | where |
---|
0:06:01 | basically they have this idea that we'll all be wearing an electronic tattoo here |
---|
0:06:07 | on our necks |
---|
0:06:07 | that is gonna have a microphone, and it can help also with speech recognition |
---|
0:06:13 | there's all kinds of ideas about how to |
---|
0:06:17 | collect more data about what we do and where we are |
---|
0:06:21 | and then there's sort of progress in the back end, right: once we get this |
---|
0:06:25 | information what can we do with it |
---|
0:06:28 | and there's been some talk here about how much progress we're making we're all familiar |
---|
0:06:33 | with this |
---|
0:06:34 | with this chart of the famous word error rates for different tasks |
---|
0:06:39 | now, are we reaching some sort of a plateau? we know that that's not |
---|
0:06:43 | the case because there's work in dynamic speaker adaptation, there's all this work in |
---|
0:06:49 | the deep neural networks that we've been talking about, also work in extremely large language |
---|
0:06:53 | models, that are making the recognition get better |
---|
0:06:58 | there's also some work in, you know, natural language understanding, around conversation and topic modeling |
---|
0:07:03 | there's the knowledge graph i'll talk about in a second, and so if you put all these |
---|
0:07:07 | together with some machine learning algorithms we're getting to a point where we can |
---|
0:07:13 | start to be reasonably good at understanding |
---|
0:07:16 | a human conversation |
---|
0:07:19 | so |
---|
0:07:20 | in this audience this is obviously very well known, but it |
---|
0:07:24 | is quite remarkable that we now have |
---|
0:07:27 | these fairly substantial improvements in word accuracy thanks to these |
---|
0:07:33 | deep neural networks, and there's work here from microsoft, ibm, google, and there's |
---|
0:07:37 | others in the room that are working on this |
---|
0:07:40 | something that you might not be as familiar with is the fact that deep learning |
---|
0:07:45 | is also being applied to natural language understanding |
---|
0:07:49 | and i would |
---|
0:07:51 | want you to, |
---|
0:07:52 | to make sure that you're aware of, the so-called Stanford sentiment treebank that |
---|
0:07:56 | was recently released by stanford university |
---|
0:08:00 | and there's a nice paper, "recursive deep models for semantic compositionality over a sentiment |
---|
0:08:05 | treebank", by Richard Socher and others, from the same group as Andrew Ng |
---|
0:08:09 | and Chris Manning |
---|
0:08:12 | and what they do is |
---|
0:08:14 | the |
---|
0:08:16 | they published, made available, this corpus of over eleven thousand annotated sentences that have been |
---|
0:08:25 | parsed into binary parse trees, and then every node has been annotated with a |
---|
0:08:30 | sentiment, from very negative through neutral to very positive |
---|
0:08:37 | and so then the interesting part is |
---|
0:08:41 | how |
---|
0:08:42 | they make use of these multiple |
---|
0:08:48 | layers, you know, a deep neural network, to actually model the different levels in |
---|
0:08:54 | the parse tree |
---|
0:08:56 | so that bottom-up the composition can really find the sentiment value at any node, you |
---|
0:09:03 | know, by doing these steps |
---|
0:09:05 | so for example if you look at the sentence "this film doesn't care about cleverness, |
---|
0:09:09 | wit, or any other kind of intelligent humour" |
---|
0:09:12 | there's words like humour, that gets a plus, a very positive one, intelligent also, so |
---|
0:09:17 | this whole parse tree |
---|
0:09:19 | is positive |
---|
0:09:20 | except when you reach the negation: it just doesn't |
---|
0:09:24 | care about these, and so the overall sentiment is negative |
---|
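the bottom-up composition just described can be illustrated with a toy version in python. the real model (Socher et al.'s recursive neural network) learns its composition function from the treebank; here a hand-written rule stands in for it, purely to show how a negation high in the tree flips positive leaves below it. the word lists and the simplified tree are made up.

```python
# Toy bottom-up sentiment composition over a binary parse tree.
# A hand-written rule stands in for the learned recursive network.

POSITIVE = {"humor", "intelligent", "cleverness"}
NEGATORS = {"doesn't", "not", "never"}

def sentiment(tree):
    """tree is a word (leaf) or a (left, right) pair; returns -1, 0, or +1."""
    if isinstance(tree, str):
        return 1 if tree in POSITIVE else 0
    left, right = tree
    # A negator node flips the sentiment of everything beneath it.
    if isinstance(left, str) and left in NEGATORS:
        return -sentiment(right)
    s = sentiment(left) + sentiment(right)
    return max(-1, min(1, s))  # clamp to the {-1, 0, +1} scale

# "this film doesn't care about intelligent humor", drastically simplified:
phrase = ("doesn't", ("care", ("about", ("intelligent", "humor"))))
```

the positive leaves compose to a positive subtree, and the negation at the top flips the whole sentence to negative, mirroring the talk's example.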
0:09:28 | and this is very powerful because up to now the traditional model has been bag of |
---|
0:09:33 | words |
---|
0:09:34 | in a vector space, and it's |
---|
0:09:38 | hard to model these relationships, and |
---|
0:09:41 | we all know that |
---|
0:09:42 | language has a deep structure, it's kind of a recursive structure, and |
---|
0:09:47 | there's long distance relationships between |
---|
0:09:49 | certain constituents within the sentence |
---|
0:09:51 | that are harder to capture |
---|
0:09:54 | unless you |
---|
0:09:56 | really take full account of the parse tree |
---|
0:09:58 | so applying this |
---|
0:10:01 | they get gains of, well |
---|
0:10:05 | you know, about twenty five percent |
---|
0:10:07 | improvement in the |
---|
0:10:08 | accuracy of the recognition of the sentiment over this corpus, which by the way is |
---|
0:10:13 | about movies, this is from |
---|
0:10:15 | movie reviews |
---|
0:10:17 | so it's very encouraging that |
---|
0:10:19 | this technique that is now popular in asr can also be transferred to natural |
---|
0:10:25 | language understanding |
---|
0:10:27 | then there's another very important trend |
---|
0:10:30 | in the way i see it, in how we can improve natural language understanding |
---|
0:10:35 | and |
---|
0:10:36 | someone was saying earlier today, well, kind of the "u" in |
---|
0:10:40 | asru has gone missing a bit |
---|
0:10:42 | i think knowledge graphs are really the answer to that |
---|
0:10:46 | and why is that? well, because |
---|
0:10:48 | we can go from these kind of disembodied strings |
---|
0:10:52 | to anchored entities in the real world, right; there's a nice blog post |
---|
0:10:57 | that says "from strings to things" |
---|
0:11:00 | so what is that, what is a |
---|
0:11:03 | knowledge graph? really you can think of it as this giant network where the |
---|
0:11:08 | nodes are concepts and then there's links that relate one entity to another, for example |
---|
0:11:13 | you know George Clooney appears in ocean's twelve, and you know these are movies and |
---|
0:11:19 | actors |
---|
0:11:20 | and how they relate to each other |
---|
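the "giant network" just described reduces, at its simplest, to subject–relation–object triples. a minimal sketch in python, using the talk's own George Clooney example (the relation names are invented):

```python
# A knowledge graph boiled down to (subject, relation, object) triples,
# with lookups in both directions. Relation names are illustrative.

triples = {
    ("George Clooney", "appears_in", "Ocean's Twelve"),
    ("George Clooney", "instance_of", "actor"),
    ("Ocean's Twelve", "instance_of", "movie"),
}

def objects(subject, relation):
    """All objects linked from `subject` via `relation`."""
    return {o for (s, r, o) in triples if s == subject and r == relation}

def subjects(relation, obj):
    """All subjects linked to `obj` via `relation`."""
    return {s for (s, r, o) in triples if r == relation and o == obj}
```

real knowledge graphs add typed schemas, identifiers, and scale, but the queryable triple structure is the core idea.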
0:11:23 | and the interesting part is if you know some history |
---|
0:11:27 | you might remember psych |
---|
0:11:30 | which was an attempt was still open sec still exist |
---|
0:11:34 | it's an attempt to kind of create these very complex representation of |
---|
0:11:39 | all known human |
---|
0:11:41 | knowledge especially strip common sense |
---|
0:11:44 | but the problem is that it was all built by hand |
---|
0:11:47 | and they spent a lot of time deciding whether a property of an object is |
---|
0:11:51 | intrinsic or extrinsic |
---|
0:11:54 | kind of splitting hairs on something that is not really that relevant; the |
---|
0:11:58 | way that these knowledge graphs are being built now is different |
---|
0:12:01 | you will start with, |
---|
0:12:05 | start with wikipedia |
---|
0:12:08 | and there, you know |
---|
0:12:09 | there's dbpedia, the machine readable version of wikipedia, that you |
---|
0:12:13 | can ingest, and then you can start extracting these entities and the relationships, and with |
---|
0:12:18 | some certain degree of manual curation you can get pretty far with an automatic process |
---|
0:12:22 | and so companies are doing this |
---|
0:12:25 | and |
---|
0:12:26 | one company for example has a knowledge graph that has ten million entities and thirty million properties and, |
---|
0:12:32 | you know, connections; microsoft have their own, they call it satori, and they have three |
---|
0:12:36 | hundred million entities |
---|
0:12:38 | google have five hundred seventy million entities and eighteen billion properties |
---|
0:12:43 | and then there's also more specialised ones |
---|
0:12:46 | like factual for example, which is a database of places, points of interest, local businesses |
---|
0:12:52 | and they're also getting to sixty six million entries |
---|
0:12:56 | in fifty different countries |
---|
0:12:58 | and then of course you can take social media |
---|
0:13:01 | and see their version of entities and relations, which is people, as a |
---|
0:13:07 | version of a knowledge graph, and so linkedin is now at, what, two hundred fifty million |
---|
0:13:11 | users, and facebook is over a billion |
---|
0:13:15 | so |
---|
0:13:16 | if you think carefully about this, it means that |
---|
0:13:20 | anytime that you refer to a concept |
---|
0:13:23 | or named entity, like a place, a product, an organisation, or a person |
---|
0:13:27 | you could actually, you're able to, grab that and map it onto one of these |
---|
0:13:32 | entities |
---|
0:13:34 | so the traditional idea, more on the linguistic side, of |
---|
0:13:39 | we do part-of-speech and we find the subject and the object |
---|
0:13:43 | and we can say there'll be some relationship |
---|
0:13:45 | but this is still not really grounded |
---|
0:13:48 | it's a bit immaterial; with the knowledge graph you can kind of infer these |
---|
0:13:53 | and say you're referring to this movie, you're referring to that person, and then there's |
---|
0:13:58 | all kinds of inferences and disambiguation that you can do |
---|
0:14:02 | with that knowledge, right |
---|
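the "map it onto one of these entities" step is essentially entity linking. a toy sketch, assuming a hand-made candidate table and scoring candidates by word overlap with the conversation's context (all entity ids and context words here are invented):

```python
# Toy entity linking: map a surface string to a knowledge-graph entity id,
# scoring candidates by overlap between conversation context words and
# words associated with each entity. Everything here is made up.

CANDIDATES = {
    "ocean's twelve": [
        {"id": "movie/oceans_twelve", "context": {"film", "heist", "clooney"}},
        {"id": "book/oceans_twelve", "context": {"novel", "paperback"}},
    ],
}

def link(mention, context_words):
    """Return the best-matching entity id for `mention`, or None."""
    best, best_score = None, -1
    for cand in CANDIDATES.get(mention.lower(), []):
        score = len(cand["context"] & context_words)
        if score > best_score:
            best, best_score = cand["id"], score
    return best
```

production linkers use popularity priors, embeddings, and the graph itself, but the shape of the decision, grounding a string in one of several candidate things, is the same.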
0:14:04 | so |
---|
0:14:04 | i think the fact that we can start to represent pretty much all human |
---|
0:14:09 | knowledge |
---|
0:14:10 | at least in terms of, sort of |
---|
0:14:12 | concepts and entities |
---|
0:14:14 | in a way that is, you know, a computational |
---|
0:14:18 | representation is very important, and that's a very big step towards real natural language understanding because |
---|
0:14:23 | it's more grounded |
---|
0:14:27 | one of the usages |
---|
0:14:29 | for |
---|
0:14:31 | for a knowledge graph is for disambiguation, and there's this classic sentence from linguistics, right: "i |
---|
0:14:38 | saw the man on the hill |
---|
0:14:39 | with the telescope" |
---|
0:14:41 | that can be interpreted in a variety of ways, some of which are depicted in this |
---|
0:14:45 | funny graphic, right, so it's what the linguists call a prepositional phrase attachment |
---|
0:14:51 | problem: is it, |
---|
0:14:52 | "with the telescope", is it attached to the hill or to the man |
---|
0:14:56 | or to the man on the hill; and again, does it attach to the man or to |
---|
0:14:59 | me, so |
---|
0:15:02 | traditionally there's been really no way to solve this except for context, but if you |
---|
0:15:07 | think about it, imagine that you have access to my amazon purchase history |
---|
0:15:14 | and you saw |
---|
0:15:15 | that i just bought a telescope, you know, two weeks ago; then you would have |
---|
0:15:19 | kind of this idea of the priors, right, you could have a very |
---|
0:15:22 | strong prior that it is me who is using the telescope to see the man |
---|
0:15:26 | on the hill |
---|
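that prior can be made concrete with a toy calculation: start with uniform scores over the three attachment readings, boost the "speaker holds the instrument" reading when the purchase history mentions a telescope, and renormalize. the attachment labels and the boost factor are arbitrary illustrations:

```python
# Toy PP-attachment disambiguation with an external-context prior.
# "I saw the man on the hill with the telescope" -- who has the telescope?

ATTACHMENTS = {
    "speaker_has_telescope": 1.0,  # I used the telescope to see him
    "man_has_telescope": 1.0,      # the man on the hill holds it
    "telescope_on_hill": 1.0,      # the telescope sits on the hill
}

def apply_prior(attachments, purchase_history):
    """Boost the speaker-instrument reading given supporting context."""
    scores = dict(attachments)
    if "telescope" in purchase_history:
        scores["speaker_has_telescope"] *= 5.0  # arbitrary strong prior
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

posterior = apply_prior(ATTACHMENTS, purchase_history={"telescope"})
```

without the purchase history the three readings stay tied; with it, the probability mass shifts to the reading the talk describes.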
0:15:27 | so |
---|
0:15:28 | it's obvious that the more context and the different sources of this context that we |
---|
0:15:33 | can have access to |
---|
0:15:34 | are gonna help disambiguate natural language |
---|
0:15:37 | that's context in one aspect, and then, going to a different idea, we also |
---|
0:15:42 | know that your intent and what you're looking for also depends on where you are |
---|
0:15:47 | so that's another |
---|
0:15:48 | place where |
---|
0:15:50 | location, now, is an important piece of context |
---|
0:15:54 | this is not new, there's a bunch of companies that are, for |
---|
0:15:58 | example, exploiting the user's location for local search: obviously if i search for japanese restaurants, depending |
---|
0:16:05 | on where i am i'm gonna get different results |
---|
0:16:07 | on yelp for example |
---|
0:16:09 | then there's also companies like Tempo AI that focus on |
---|
0:16:12 | sort of predicting what you might need based on your calendar entries; there's Cue, a |
---|
0:16:17 | startup that was recently bought by apple, also in this space, and then there's also |
---|
0:16:21 | obviously google now |
---|
0:16:22 | that |
---|
0:16:23 | sort of |
---|
0:16:24 | is able to ingest things like your email and make sense of it and understand that |
---|
0:16:29 | you're gonna have a flight or a hotel reservation, and then it makes use |
---|
0:16:32 | of that information to bring up relevant alerts when the time is right |
---|
0:16:38 | and finally the last piece is the recommender systems, right: we're all familiar with |
---|
0:16:44 | things like, in amazon you get recommendations for books depending on the stuff |
---|
0:16:48 | that you've bought before |
---|
0:16:49 | and the way these systems work is, they kind of mine a lot of |
---|
0:16:53 | data about the users and then they cluster the users and say, oh, you're |
---|
0:16:57 | similar to these users so you might also like this other book, and this |
---|
0:17:01 | is expanding, you know, netflix for movies, or spotify for music |
---|
0:17:05 | or linkedin and facebook for people that you might know, et cetera, so |
---|
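the "cluster users and recommend what similar users liked" mechanism can be sketched with a bare-bones user-based collaborative filter. the ratings and the similarity measure here are invented for illustration; production systems use far richer models:

```python
# Bare-bones user-based collaborative filtering: find the most similar user
# by rating overlap, then recommend what they liked and you haven't rated.
# Ratings are invented.

ratings = {
    "you":   {"book_a": 5, "book_b": 4},
    "alice": {"book_a": 5, "book_b": 4, "book_c": 5},
    "bob":   {"book_a": 1, "book_d": 5},
}

def similarity(u, v):
    """Higher when two users disagree little on co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    return 1.0 / (1.0 + sum(abs(ratings[u][i] - ratings[v][i]) for i in common))

def recommend(user):
    """Items the most similar peer rated that `user` has not."""
    peers = sorted((similarity(user, v), v) for v in ratings if v != user)
    best_peer = peers[-1][1]
    return [item for item in ratings[best_peer] if item not in ratings[user]]
```

the same neighbor-based idea generalizes from books to movies, music, and people-you-may-know.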
0:17:10 | all these |
---|
0:17:11 | systems are using context to kind of make predictions or anticipate things that you might |
---|
0:17:17 | need |
---|
0:17:18 | so |
---|
0:17:19 | it is within this general context of the emergence of anticipatory search that we started |
---|
0:17:26 | this company, and Expect Labs is a technology company based in san francisco |
---|
0:17:31 | that we started about |
---|
0:17:32 | two and a half years ago |
---|
0:17:34 | with this idea of creating a technology platform that is especially designed |
---|
0:17:41 | for |
---|
0:17:42 | these real-time applications that are gonna be able to ingest a lot of data |
---|
0:17:47 | and give you relevant contextual information |
---|
0:17:50 | so |
---|
0:17:51 | in sort of one step, |
---|
0:17:52 | the way it works is we |
---|
0:17:55 | are able to receive |
---|
0:17:57 | these real time updates about where you are |
---|
0:18:00 | what you might be saying |
---|
0:18:02 | what you're reading, like on a new email |
---|
0:18:05 | and you can |
---|
0:18:05 | assign different weights to some of these modalities, right, so something that i say or |
---|
0:18:10 | something that i tweet is gonna have a higher |
---|
0:18:14 | weight than something like |
---|
0:18:15 | an email that i receive, which i may just sort of skim |
---|
0:18:19 | as opposed to |
---|
0:18:21 | reading it |
---|
0:18:22 | deeply |
---|
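the modality weighting just described might look something like this sketch, where each context event contributes to a term score according to how deliberate the modality was. the weights and event names are illustrative assumptions, not the product's actual values:

```python
# Sketch of weighting context signals by modality: what the user said or
# wrote themselves counts more toward the user model than a skimmed email.
# Weights are arbitrary illustrations.

MODALITY_WEIGHT = {
    "spoken": 1.0,
    "tweeted": 0.9,
    "email_received": 0.3,
}

def score_terms(events):
    """events: list of (modality, terms). Returns term -> accumulated weight."""
    scores = {}
    for modality, terms in events:
        w = MODALITY_WEIGHT.get(modality, 0.1)  # unknown modalities count little
        for term in terms:
            scores[term] = scores.get(term, 0.0) + w
    return scores

scores = score_terms([
    ("spoken", ["dinner", "japanese"]),
    ("email_received", ["invoice", "dinner"]),
])
```

a term reinforced across modalities ("dinner") outranks one that only appeared in a low-weight source ("invoice"), which is the behavior the talk motivates.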
0:18:23 | so we take all these inputs in real time, and we |
---|
0:18:28 | process them, we extract important pieces of information from all the sources, and that creates a |
---|
0:18:33 | dynamic model, our best representation of what the user is doing and their intent, and |
---|
0:18:40 | therefore we're able to |
---|
0:18:42 | do all sorts of searches for information across many different data sources to try to provide information |
---|
0:18:47 | that's gonna be useful to that user at that point in time |
---|
0:18:52 | and as a first example of this platform |
---|
0:18:55 | we created mindmeld |
---|
0:18:58 | mindmeld is, right now, an ipad app |
---|
0:19:00 | that understands your conversation |
---|
0:19:02 | and finds content as you speak |
---|
0:19:05 | you can think of it a little bit like skype, where you can invite people and |
---|
0:19:09 | start talking |
---|
0:19:10 | and then you'll get |
---|
0:19:12 | interesting content based on that |
---|
0:19:16 | and i'm gonna give a demo in a second |
---|
0:19:19 | an important aspect of the design of mindmeld is that we wanted to make it |
---|
0:19:22 | very easy to share information, because if you've ever tried to have kind of |
---|
0:19:27 | a collaboration session using skype, people quickly find, especially |
---|
0:19:32 | on the ipad, that it's difficult: say you wanna share an article, you |
---|
0:19:36 | have to leave skype, and have to find a browser, and do some searches |
---|
0:19:41 | and then you find sort of the url, and then you try to send the |
---|
0:19:45 | url through the skype im, which may or may not be active, and |
---|
0:19:49 | so it's a bit cumbersome; so we wanted to |
---|
0:19:52 | make it very easy for users to be able to discover |
---|
0:19:55 | to |
---|
0:19:57 | to navigate and then to share information |
---|
0:20:02 | and the stuff that you share becomes a permanent archive of the conversation that you |
---|
0:20:07 | can look back to and use |
---|
0:20:10 | right, so with that, let's see, |
---|
0:20:13 | i'm gonna give a little demo |
---|
0:20:18 | on my ipad |
---|
0:20:20 | see how that |
---|
0:20:21 | works |
---|
0:20:24 | so this is mindmeld, and you can see that i have access to |
---|
0:20:27 | some of the sessions or conversations that have taken place in the past; you can |
---|
0:20:33 | think of, you may have recurring meetings, like every tuesday you have your update |
---|
0:20:38 | with your colleagues, and so you would join that session because everybody's already |
---|
0:20:43 | invited |
---|
0:20:44 | and plus you can have all the context |
---|
0:20:47 | all the things, |
---|
0:20:48 | the shared items, and the conversation that was previously happening in that |
---|
0:20:54 | session |
---|
0:20:55 | but for now i'm gonna start a new session |
---|
0:20:59 | and i can give it a name |
---|
0:21:03 | learn what's |
---|
0:21:04 | i can make it friends only |
---|
0:21:07 | or i can make it public, or invite only |
---|
0:21:11 | and |
---|
0:21:16 | let's see if the connection works |
---|
0:21:23 | this is now making a call to facebook |
---|
0:21:25 | the facebook api |
---|
0:21:27 | that |
---|
0:21:28 | okay here we go so |
---|
0:21:31 | let's say that i will invite alex |
---|
0:21:35 | likes my able |
---|
0:21:37 | okay |
---|
0:21:41 | so |
---|
0:21:42 | now i'm the only one in the conversation, and so otherwise, as soon |
---|
0:21:47 | as alex joins you would also see |
---|
0:21:49 | information about the speaker right |
---|
0:21:51 | you know, the thing that we found is, when you talk to people, like now, |
---|
0:21:54 | when people connect |
---|
0:21:56 | on |
---|
0:21:57 | some sort of a conference call |
---|
0:21:59 | people tend to kind of google each other and find the linkedin profile; well, here |
---|
0:22:03 | it just shows that to you, right |
---|
0:22:05 | and this is a discovery screen so i'm the only one seeing this information |
---|
0:22:11 | but if i decide to share then everybody else in the conversation would see that |
---|
0:22:15 | and you see, for example, |
---|
0:22:17 | you know, it finds the current location |
---|
0:22:21 | of the user, right here in the, |
---|
0:22:23 | in this congress hotel in Olomouc |
---|
0:22:28 | so |
---|
0:22:29 | the most interesting part is |
---|
0:22:31 | when you have multiple speakers, but for now i'm just gonna give |
---|
0:22:35 | sort of a quick real demo of how this looks |
---|
0:22:40 | okay mindmeld |
---|
0:22:46 | hey mindmeld |
---|
0:22:49 | so i was |
---|
0:22:50 | wondering whether you saw that part about president obama's brain mapping initiative |
---|
0:22:56 | i saw this new technique called clarity that makes brains transparent |
---|
0:23:00 | that might be a help for, |
---|
0:23:02 | for this mapping initiative |
---|
0:23:12 | so |
---|
0:23:12 | you can see that, you know, we show you the ticker items |
---|
0:23:18 | here of |
---|
0:23:19 | what we recognise, we try to extract some of the |
---|
0:23:23 | key phrases |
---|
0:23:26 | and |
---|
0:23:27 | and then, you know, we do some post processing and bring in relevant results |
---|
0:23:33 | see what else |
---|
0:23:36 | okay mindmeld |
---|
0:23:38 | so we're gonna have some friends over maybe we should cook some italian food |
---|
0:23:43 | maybe we can do a minestrone soup |
---|
0:23:46 | or a frittata |
---|
0:23:48 | maybe that would be nice |
---|
0:24:01 | so you can see the way it works |
---|
0:24:06 | if i like this for example i can drag |
---|
0:24:10 | and share it |
---|
0:24:11 | and this is what becomes part of the archive |
---|
0:24:15 | which then everybody in the conversation sees, and it also becomes a permanent archive, which i can |
---|
0:24:20 | also access through a browser |
---|
0:24:28 | anybody have a topic or something that you might be interested in? |
---|
0:24:42 | okay mindmeld, so a colleague of mine is interested in deep belief neural networks |
---|
0:24:48 | that's something that we've been talking about |
---|
0:24:51 | at this ieee asru |
---|
0:24:54 | conference in Olomouc |
---|
0:25:12 | so |
---|
0:25:14 | one of the issues is, i think, that he and i are not connected on facebook |
---|
0:25:19 | because otherwise we would have found |
---|
0:25:22 | the right profile for him |
---|
0:25:25 | i |
---|
0:25:36 | however if we are |
---|
0:25:42 | not even this one okay |
---|
0:25:44 | this is, well, you can see, right, so, something; |
---|
0:25:49 | let's stick to ieee, okay |
---|
0:25:54 | so one of the things that we do is we look at the intersection |
---|
0:25:58 | of the social graphs of the different participants in a call |
---|
0:26:01 | so that we can then |
---|
0:26:03 | be better at |
---|
0:26:06 | disambiguating |
---|
0:26:07 | named entities, right, so |
---|
0:26:09 | so if we had been connected, |
---|
0:26:12 | then the person i mentioned would have been the real one shown |
---|
0:26:15 | right here |
---|
0:26:23 | alright so |
---|
0:26:25 | but |
---|
0:26:27 | let me go back to the |
---|
0:26:29 | presentation real quick here |
---|
0:26:31 | so |
---|
0:26:32 | this is the platform that we've, that we've built, and |
---|
0:26:36 | if you wanna sort of |
---|
0:26:39 | dig a little bit deeper |
---|
0:26:41 | one of the novelties i think is that we're combining the traditional nlp |
---|
0:26:45 | with a more, what we call, ir search-style approach |
---|
0:26:50 | because the interesting part is that we're able to model |
---|
0:26:53 | semantic relevance |
---|
0:26:55 | based on the context |
---|
0:26:57 | on what the speakers have previously said, and the user model, and also on |
---|
0:27:01 | the different data sources that you have access to |
---|
0:27:05 | so somebody says something like "where can we go for dinner" and then the other person says |
---|
0:27:09 | "i don't know, you like japanese, sure, any good place around union square" |
---|
0:27:13 | we're building this incremental context |
---|
0:27:16 | about the overall intent of the conversation |
---|
0:27:19 | and so |
---|
0:27:21 | we're able to then, you know |
---|
0:27:23 | do natural language processing, the usual stuff: part-of-speech tagging, noun phrase chunking, named entity extraction |
---|
0:27:28 | anaphora resolution, semantic parsing, topic modeling, and some degree of discourse modelling and pragmatics |
---|
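the incremental context over the dinner exchange can be illustrated with a toy accumulator: each utterance updates a running intent model, so the cuisine and the location, mentioned in different turns, end up combined. the keyword lists below stand in for the real pipeline just listed:

```python
# Toy incremental conversational context: successive utterances refine one
# running intent model. Keyword matching stands in for real NLP.

CUISINES = {"japanese", "italian", "thai"}
PLACES = {"union square", "north beach"}

class ConversationContext:
    def __init__(self):
        self.intent = {}

    def update(self, utterance):
        """Fold one utterance into the running intent model."""
        text = utterance.lower()
        if "dinner" in text or "restaurant" in text:
            self.intent["activity"] = "dining"
        for cuisine in CUISINES:
            if cuisine in text:
                self.intent["cuisine"] = cuisine
        for place in PLACES:
            if place in text:
                self.intent["location"] = place
        return self.intent

ctx = ConversationContext()
ctx.update("where should we go for dinner")
intent = ctx.update("you like japanese, any good place around union square")
```

the second utterance never mentions dinner, yet the model still knows the activity from the first turn, which is the "incremental context" being described.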
0:27:35 | but then the other piece is that, depending on the signal |
---|
0:27:39 | that we get from each of these different data sources, and you can think of |
---|
0:27:42 | my social graph that i was mentioning |
---|
0:27:45 | the local businesses that factual or yelp can give you |
---|
0:27:48 | personal files, right: if you give us access to dropbox or to your google drive |
---|
0:27:54 | we can make that a data source |
---|
0:27:57 | and then there's the more general web, with news and general content and videos |
---|
0:28:04 | but what's interesting is that even the response that we get when we do |
---|
0:28:08 | all these searches |
---|
0:28:09 | that also informs us about what is relevant and what is not |
---|
0:28:13 | in that particular |
---|
0:28:14 | you know conversation |
---|
0:28:17 | put in other words: if for example you were to build an application that only |
---|
0:28:20 | deals with movies and tv shows and actors, then any reference to something else |
---|
0:28:25 | would not find a match |
---|
0:28:27 | would basically not give you |
---|
0:28:28 | results |
---|
0:28:29 | but that also means that it would be much more precise, right, in terms of the |
---|
0:28:34 | answers that you give, the relevancy of the content |
---|
0:28:38 | and so this is something that, |
---|
0:28:40 | because we have, well |
---|
0:28:42 | kind of a very scalable and fast backend |
---|
0:28:45 | allows us to do multiple searches |
---|
0:28:48 | and we have some caches as well, but basically this |
---|
0:28:50 | makes us |
---|
0:28:52 | able to compute the semantic relevance of an utterance in a very dynamic way |
---|
0:28:56 | based on context and also based on the type of results that we obtain |
---|
0:29:02 | so this is a, you know, technical conference, so some of |
---|
0:29:07 | the ongoing r and d, as you can imagine, is quite substantial |
---|
0:29:11 | on the speech side |
---|
0:29:13 | there's |
---|
0:29:14 | we have two engines: we have an embedded engine that runs on the ipad |
---|
0:29:18 | and also we have cloud-based speech processing, so an interesting |
---|
0:29:22 | research question is, you know, how to balance that, and how to |
---|
0:29:27 | how to be able to, on the one hand, listen continuously, but on the other |
---|
0:29:31 | also be robust to network issues |
---|
0:29:34 | and then, in terms of practical usage, there's things that you can imagine: detecting |
---|
0:29:38 | suboptimal audio conditions, like when the speaker is too far from the mic, noisy environments |
---|
0:29:44 | as we all know heavy accents are an issue |
---|
0:29:47 | and then |
---|
0:29:48 | one of the things we found is, because it's an ipad app, it's very natural for |
---|
0:29:51 | people to kind of leave it on the table, and two things happen: they speak |
---|
0:29:55 | to each other from far away, and also there can be multiple people |
---|
0:29:58 | speaking, you know, to the same device, and our models try to do some |
---|
0:30:02 | speaker adaptation |
---|
0:30:04 | and sometimes that doesn't work that well |
---|
0:30:08 | and then there's sort of the issue which is kind of the holy grail of could
---|
0:30:11 | we detect you know a sequence of wrong
---|
0:30:14 | and ungrammatical words
---|
0:30:18 | when it's kind of gibberish
---|
0:30:19 | and of course there are techniques to do that but
---|
0:30:21 | we're trying to
---|
0:30:23 | improve the accuracy of that
---|
0:30:24 | and then in terms of natural language processing and information retrieval also kind of a
---|
0:30:28 | design question there are things like the classic NLP problems like word sense disambiguation
---|
0:30:33 | although obviously the knowledge graph helps a lot
---|
0:30:36 | and then anaphora resolution and some of these things we do with the social graph
---|
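Knowledge-graph-assisted word sense disambiguation, as mentioned here, can be sketched in a toy form; the graph contents and function names below are invented examples, not the actual system:

```python
# Illustrative word-sense disambiguation with a toy knowledge graph: pick
# the sense whose graph neighbors overlap most with the rest of the
# utterance.

KNOWLEDGE_GRAPH = {
    "jaguar (animal)": {"cat", "jungle", "predator"},
    "jaguar (car)":    {"car", "engine", "luxury"},
}

def disambiguate(mention_senses, context_words):
    ctx = set(w.lower() for w in context_words)
    # choose the sense with the largest neighbor/context overlap
    return max(mention_senses, key=lambda s: len(KNOWLEDGE_GRAPH[s] & ctx))

senses = ["jaguar (animal)", "jaguar (car)"]
print(disambiguate(senses, "i test drove a jaguar the engine purred".split()))
```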
0:30:42 | an important aspect is |
---|
0:30:43 | this knowledge graph is useful but
---|
0:30:45 | how do you dynamically update it how do you keep it fresh
---|
0:30:49 | and we have some
---|
0:30:50 | some techniques for that but it's
---|
0:30:53 | it's still
---|
0:30:54 | ongoing research
---|
0:30:56 | then a very important aspect is
---|
0:30:59 | deciding what is worth processing right
---|
0:31:02 | as we all know if you leave a speech engine on
---|
0:31:05 | well i remember an anecdote from alex waibel that he told me once he
---|
0:31:09 | had an engine running in his house and then when he was doing the dishes
---|
0:31:13 | with all the clinging and clanging you know the speech engine was spouting all kinds
---|
0:31:17 | of interesting
---|
0:31:19 | hypotheses
---|
0:31:21 | this has been alluded to of course you can have fairly robust voice activity
---|
0:31:24 | detection
---|
0:31:25 | but there's |
---|
0:31:27 | there's always room for improvement |
---|
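A minimal energy-based voice activity detector illustrates the idea being discussed; real systems use far more robust features than frame energy, and the thresholds below are invented:

```python
# Toy VAD sketch: mark regions where frame energy stays above a threshold
# for a minimum number of consecutive frames.

def frame_energy(frame):
    return sum(s * s for s in frame) / len(frame)

def detect_speech(frames, threshold=0.01, min_run=3):
    """Return (start, end) frame index ranges judged to contain speech."""
    regions, start = [], None
    for i, frame in enumerate(frames):
        if frame_energy(frame) >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                regions.append((start, i))
            start = None
    if start is not None and len(frames) - start >= min_run:
        regions.append((start, len(frames)))
    return regions

silence = [0.001] * 160
speech  = [0.5, -0.4] * 80
frames  = [silence, speech, speech, speech, silence]
print(detect_speech(frames))   # one speech region spanning frames 1..3
```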
0:31:30 | the search model then as i mentioned is not just
---|
0:31:33 | understanding that something is speech but also detecting how relevant something is within this
---|
0:31:38 | within the context and this comes to this other point of interruptibility and
---|
0:31:45 | mind meld is a bit too verbose right this is just a showcase of what
---|
0:31:49 | you can do also because the iPad has a lot of real estate so it can show
---|
0:31:53 | different articles but in practice and through the API i'll talk about in a second
---|
0:31:57 | you have a lot of control over how likely you want it to be interrupted when
---|
0:32:02 | you want
---|
0:32:03 | a search result for an article to be
---|
0:32:07 | to be shown and this is
---|
0:32:09 | a function of at least two factors one is
---|
0:32:13 | how
---|
0:32:13 | explicit the request is how much the user wants to have certain information
---|
0:32:18 | and the other one is what i was mentioning about the nature of the information
---|
0:32:23 | found how strong is the signal from the data sources about the relevancy of what
---|
0:32:27 | i'm gonna show
---|
0:32:29 | and what i mean by that is |
---|
0:32:31 | you can think of |
---|
0:32:33 | well let's say
---|
0:32:36 | the difference between
---|
0:32:38 | what is the latest movie by woody allen
---|
0:32:42 | versus i've been talking about woody allen and
---|
0:32:44 | i mentioned that
---|
0:32:46 | his latest movie et cetera
---|
0:32:48 | right so one is a direct question where the intent is clear more like
---|
0:32:53 | a siri-like application where i'm trying to find that specific information the other one
---|
0:32:58 | is a reference sort of in passing about
---|
0:33:00 | something
---|
0:33:01 | and so
---|
0:33:02 | that
---|
0:33:03 | would be this understanding of
---|
0:33:06 | how eager i am to receive that bit of information
---|
0:33:09 | so that's work that is ongoing being able to model that |
---|
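The two-factor display decision described here, how explicit the request was times how strong the data sources' relevance signal is, can be sketched as follows; the function name and threshold are invented for illustration:

```python
# Hedged sketch of the interruptibility decision: show an article only when
# the combined score of request explicitness and source signal strength
# clears a threshold.

def should_show(explicitness: float, signal_strength: float,
                threshold: float = 0.25) -> bool:
    """A direct question ("what is the latest movie by woody allen") has
    high explicitness; a mention in passing has low explicitness and so
    needs a much stronger relevance signal before interrupting the user."""
    return explicitness * signal_strength >= threshold

print(should_show(0.9, 0.5))    # direct question, decent signal  -> shown
print(should_show(0.2, 0.5))    # passing mention, same signal    -> suppressed
print(should_show(0.3, 0.95))   # passing mention, strong signal  -> shown
```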
0:33:14 | and then finally |
---|
0:33:16 | we have a fair amount of feedback from this right especially when the user shares
---|
0:33:21 | an article that's a pretty strong signal that it was relevant
---|
0:33:25 | on the negative side i haven't shown you this but you can sort of flick one of the
---|
0:33:31 | entries on the left-hand side the eager items as
---|
0:33:34 | we call them you can delete them so that would be a kind of negative feedback
---|
0:33:38 | about a
---|
0:33:39 | certain entity or key phrase that was not
---|
0:33:41 | deemed relevant by the user
---|
0:33:44 | how to |
---|
0:33:45 | optimize the learning that we can obtain from taking that user feedback
---|
0:33:49 | is also something
---|
0:33:50 | that we're working on
---|
0:33:52 | especially because
---|
0:33:54 | the decision to show a certain article is complex enough that
---|
0:33:58 | sometimes it's hard to assign the right sort of credit or blame for how we
---|
0:34:03 | got there
---|
0:34:06 | so just to well
---|
0:34:09 | sort of
---|
0:34:10 | summarize what we're doing there are two products that we're offering
---|
0:34:14 | one is the mind meld
---|
0:34:16 | mind meld app which is what you see here
---|
0:34:18 | and as a matter of fact
---|
0:34:19 | the mind meld app
---|
0:34:21 | is gonna be live on the app store tonight
---|
0:34:25 | so |
---|
0:34:27 | we've been working on it for a while and it's finally happening
---|
0:34:30 | so you're welcome to try it out
---|
0:34:33 | i guess it will be tonight well
---|
0:34:36 | for whatever time zone your app store is set to so i think
---|
0:34:41 | new zealand users might already be able to download it
---|
0:34:44 | and then for the us it will be
---|
0:34:46 | in a few hours
---|
0:34:50 | so that's mind meld but then
---|
0:34:52 | the other thing is
---|
0:34:54 | we're also offering the same functionality via an api a rest-based api
---|
0:35:00 | that
---|
0:35:01 | you're able to well
---|
0:35:03 | it lets you create sessions and end users and give it real-time updates so
---|
0:35:08 | that then you can query for what is the most relevant you can also
---|
0:35:13 | select the different data sources and so at any given point you can ask for
---|
0:35:17 | what mind meld thinks is the most relevant set of articles
---|
0:35:22 | with certain parameters for ranking et cetera so we're having
---|
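The session / real-time update / relevance-query flow the API exposes could be sketched offline like this; the class and method names are invented to mirror the described endpoints, not the documented API:

```python
# Offline sketch of the anticipatory-search API flow: create a session,
# push real-time text updates, then query for the most relevant articles
# from the selected data sources.

class AnticipatorySession:
    def __init__(self, user_id, sources):
        self.user_id = user_id
        self.sources = sources      # e.g. ["news", "maps"]
        self.entries = []           # real-time text updates

    def push_text(self, text):
        """Equivalent of POSTing a text entry to the session."""
        self.entries.append(text.lower())

    def query_articles(self, catalog, limit=3):
        """Rank catalog articles by word overlap with the session so far,
        restricted to the session's selected data sources."""
        said = set(" ".join(self.entries).split())
        ranked = sorted(
            (a for a in catalog if a["source"] in self.sources),
            key=lambda a: -len(said & set(a["title"].lower().split())),
        )
        return ranked[:limit]

catalog = [
    {"title": "Best thai restaurants downtown", "source": "news"},
    {"title": "Woody Allen filmography", "source": "news"},
    {"title": "Thai cooking classes", "source": "classes"},
]
s = AnticipatorySession("user-123", sources=["news"])
s.push_text("let's get thai food downtown")
print([a["title"] for a in s.query_articles(catalog)])
```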
0:35:27 | already a system
---|
0:35:29 | a degree of well of scoring
---|
0:35:31 | how relevant
---|
0:35:32 | content is
---|
0:35:33 | for example some of our backers which include by the way google ventures
---|
0:35:38 | and also samsung and
---|
0:35:40 | intel capital telefonica
---|
0:35:42 | liberty global
---|
0:35:44 | they're among the backers that we're trying to do some prototypes with
---|
0:35:49 | so |
---|
0:35:50 | anyway you're encouraged to try it out and
---|
0:35:53 | i was thinking that
---|
0:35:55 | because i'm actually gonna be missing the launch party that is happening in san francisco
---|
0:35:59 | i'm gonna take our banquet at the bishop's palace as the launch party for mind
---|
0:36:04 | meld
---|
0:36:10 | that's all i wanted to say and we have some time for questions
---|
0:36:32 | (inaudible)
---|
0:36:34 | i was wondering how you track the user's state in the example they
---|
0:36:41 | want to eat something and then
---|
0:36:44 | is it still sticking to the restaurant domain and
---|
0:36:49 | in the example you showed you're always adding information but how about when you change
---|
0:36:55 | information that you previously used switch to another domain
---|
0:37:00 | how do you track this
---|
0:37:03 | there are two types of information that we use for that one is simply time right
---|
0:37:08 | so that as time passes you sort of decay certain previous
---|
0:37:12 | entries
---|
0:37:13 | the other one is some
---|
0:37:15 | kind of topic detection and clustering that we're doing so that
---|
0:37:19 | sentences that still seem to relate to the same topic kind of you know
---|
0:37:24 | help
---|
0:37:25 | sort of ground that topic
---|
0:37:28 | and then there's also
---|
0:37:31 | some user modeling about you know your previous sessions so that we have certain
---|
0:37:37 | prior weights
---|
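The time decay and topic reinforcement described in this answer might look like the following sketch; the half-life and boost constants are invented for illustration:

```python
# Sketch of exponential time decay for conversation entries: older entries
# lose weight, while entries matching the currently clustered topic are
# reinforced.

import math

def entry_weight(age_seconds, same_topic, half_life=120.0, topic_boost=1.5):
    w = math.pow(0.5, age_seconds / half_life)   # halves every two minutes
    return w * topic_boost if same_topic else w

print(round(entry_weight(0,   same_topic=False), 3))  # just said: full weight
print(round(entry_weight(120, same_topic=False), 3))  # two minutes old: halved
print(round(entry_weight(240, same_topic=True), 3))   # old but on-topic: boosted
```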
0:37:48 | what |
---|
0:37:53 | well so you know there there's
---|
0:37:58 | i'm not gonna cite any specific algorithm that we use but you can imagine there are
---|
0:38:03 | some you know statistical techniques to
---|
0:38:06 | to do that modeling
---|
0:38:09 | we're a small startup we cannot like reveal everything
---|
0:38:15 | thank you very much that's great another question so
---|
0:38:21 | at one point you happened to mention
---|
0:38:26 | asru in olomouc and funnily enough it came out as
---|
0:38:31 | slu in columbus
---|
0:38:34 | no it's
---|
0:38:36 | it would seem
---|
0:38:38 | that really what
---|
0:38:39 | what you've shown us are ways of organising information at the output end of the
---|
0:38:44 | process
---|
0:38:45 | but it would seem particularly in that example when you need to eat
---|
0:38:50 | not only that it's actually well it does know exactly where you are it shows the
---|
0:38:55 | map
---|
0:39:00 | and it might even figure out that you're at this venue
---|
0:39:00 | but these things are not being reflected in the lower-level transcription process so i
---|
0:39:06 | was wondering how you might you don't have to tell us anything that's proprietary
---|
0:39:12 | feed that back in to train these things
---|
0:39:15 | well it's obviously a research issue how you
---|
0:39:20 | make the most of the contextual information and unfortunately
---|
0:39:24 | asr especially well these cloud-based asr engines
---|
0:39:30 | at this point don't
---|
0:39:32 | fully support the kind of adaptation and dynamic modification that we would like to
---|
0:39:38 | do
---|
0:39:39 | but that's kind of an obvious thing to do in the same way
---|
0:39:43 | that you construct larger contexts and find you know all the people that you're
---|
0:39:47 | related to and add that to your specific lexicon having something like the location and the
---|
0:39:52 | towns nearby would be something
---|
0:39:55 | very you know sort of natural to do
---|
0:39:58 | but we're not doing this yet
---|
0:40:02 | i have to say your search for pedro moreno is an improvement because
---|
0:40:06 | the previous one used to be different and it has changed so that
---|
0:40:10 | you made it
---|
0:40:11 | so when you search for pedro moreno you go okay so this is better
---|
0:40:16 | well the asr was not a hundred percent accurate
---|
0:40:20 | which one do you use
---|
0:40:23 | actually we use a variety including nuance and google's
---|
0:40:35 | thanks for the talk i was wondering about privacy concerns i was under the impression
---|
0:40:41 | that the more
---|
0:40:43 | i want to
---|
0:40:45 | interact with this mind meld assistant the more i need to be transparent
---|
0:40:50 | with it about my personal data
---|
0:40:57 | well i have
---|
0:40:59 | actually a philosophical reflection
---|
0:41:02 | that
---|
0:41:02 | as a society with this technology we are going towards what i'm calling the
---|
0:41:07 | transparent brain
---|
0:41:08 | and
---|
0:41:10 | if you think closely about it
---|
0:41:12 | the better we are at collecting data about users and modeling their intentions
---|
0:41:18 | the closer we get to a point where
---|
0:41:21 | we can almost auto-complete your thought
---|
0:41:24 | right assume that you start typing a query and google knows what you might
---|
0:41:27 | want
---|
0:41:28 | and of course this is just a little bit of science fiction but
---|
0:41:31 | we're kind of getting there and so i think the way to address that is
---|
0:41:35 | by being very transparent about this process
---|
0:41:40 | and giving you full control over what it is that you wanna share and for how
---|
0:41:43 | long
---|
0:41:44 | because
---|
0:41:45 | that's really the only way to modulate it it's not just saying i'm gonna opt out
---|
0:41:49 | and i'm just not gonna use
---|
0:41:50 | any of this
---|
0:41:52 | anticipatory search because basically it will be unavoidable right so i think
---|
0:41:58 | it's
---|
0:41:59 | what we need to do is have well some clear
---|
0:42:03 | settings about
---|
0:42:04 | what you wanna share with this app and for how long
---|
0:42:07 | and then ensuring on the backend that that's really
---|
0:42:09 | the only way the only usage
---|
0:42:11 | of that information
---|
0:42:15 | but as an example |
---|
0:42:16 | we're not recording |
---|
0:42:18 | this |
---|
0:42:19 | the voice right
---|
0:42:20 | and the only things that are permanent in this particular mind meld application
---|
0:42:24 | are the articles that you've specifically shared
---|
0:42:28 | that's the only thing that persists
---|
0:42:34 | so i'm happy that maybe if you're looking at pedro if you were to ask for pedro's
---|
0:42:40 | police record would you see something
---|
0:42:44 | that maybe you wouldn't wanna see so is there any way like when you're
---|
0:42:47 | looking at your space
---|
0:42:48 | do you have certain
---|
0:42:52 | contexts that you're searching for things in when you bring information back like let's say you
---|
0:42:57 | know is this a work setting or social setting or some other context
---|
0:43:02 | yes so one of the shortcomings of the little demo i did is first of
---|
0:43:06 | all it was only one speaker it's always more interesting when it's a real conversation
---|
0:43:10 | and the second is it wasn't really a long-ranging conversation about a certain topic which is where
---|
0:43:16 | mind meld excels let's say you wanna
---|
0:43:20 | plan a vacation with you know some of your friends or somebody else and you say
---|
0:43:24 | we'll gonna go here then you explore different places you can stay things you can
---|
0:43:27 | do and you share that when you have
---|
0:43:30 | a long-ranging conversation with this kind of overarching goal
---|
0:43:34 | that's where it works the best if you keep sort of switching around then it
---|
0:43:38 | becomes more like a siri-like search that doesn't have as much context
---|
0:43:42 | just a quick question so how do you build your pronunciations so if you
---|
0:43:46 | look at asr you would spell it out but if you look at icassp you
---|
0:43:50 | actually say it as a word how does that work
---|
0:43:52 | that's it's mostly in the lexicon there are certain
---|
0:43:56 | abbreviations that are more typically
---|
0:44:00 | spelled out like you know a-s-r or some other ones like nato
---|
0:44:02 | alright that would be spoken as a word
---|
0:44:05 | so it ends up in the pronunciation lexicon pretty much
---|
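The lexicon behavior described in this answer, some abbreviations spelled out letter by letter and others spoken as a word, can be sketched as follows; the phone strings and entries are invented examples, not the system's actual lexicon:

```python
# Toy pronunciation lexicon: explicit entries are spoken as one word
# ("ICASSP", "NATO"); anything else is spelled out letter by letter
# ("ASRU" -> letter-name phones).

LETTER_NAMES = {"a": "ey", "s": "eh s", "r": "aa r", "u": "y uw",
                "i": "ay", "c": "s iy", "p": "p iy"}

SPOKEN_AS_WORD = {
    "icassp": "ay k ae s p",     # pronounced as a single word
    "nato":   "n ey t ow",
}

def pronunciation(token):
    token = token.lower()
    if token in SPOKEN_AS_WORD:          # explicit lexicon entry wins
        return SPOKEN_AS_WORD[token]
    # otherwise spell the abbreviation out letter by letter
    return " ".join(LETTER_NAMES[ch] for ch in token)

print(pronunciation("ASRU"))    # spelled out
print(pronunciation("ICASSP"))  # spoken as a word
```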
0:44:12 | any more questions
---|