So, as I mentioned, the goal of the presentation is to tell you: is there life beyond DARPA? And there is; there is lots of interesting stuff. Some of the things we do in companies like Google are very different; the scale is unique, and there are different challenges.

And I think it's really exciting to be at Interspeech, because suddenly, finally, speech is becoming useful. If you had told me four years ago that speech was something that would be really used by mainstream people, I wouldn't have believed it. In fact, before I came to Google I was orienting my career more into the area of data mining, of speech mining, because I thought that was the only way speech could be useful. I really did not believe in interactivity, you know, humans talking to computers. So I am very surprised, and it's working.
So, to tell you a little bit about the history of speech at Google, which I think is kind of interesting; people ask me about it all the time. The team started around 2005, and there were discussions internally about whether we would build our own systems or license the technology. The mindset at the time was more in favor of licensing, but some of us were pushing for "no, we need to build our own stuff". So we won, and I'm glad we pushed and convinced them.
On the other hand, Google has this culture where we build everything ourselves, even our own hardware, our own data centers, so it wasn't a surprising decision. In 2006 we basically started to build what at that time was a state-of-the-art system. And we were very lucky, because we got people with a lot of experience building recognition training infrastructure and decoders. Language modeling was easy for us because we could leverage all the work from the translation team. So really the challenge for us was to build this infrastructure on top of Google's distributed computing machinery; that was really different.
So in 2007, and this kind of reflects the mindset of the team, we built a system called GOOG-411. It was really directory assistance: you would call a telephone number and then you could ask, like, "tell me what is the closest pizzeria". So everything was going through the voice channel. Similarly, in 2007 we also started the voicemail transcription project, because we had a product, still available, called Google Voice: it's like a telephone number assigned to you, with call forwarding. So voicemail transcription was also relevant.
Then in 2007 and 2008 there was a radical change, with the appearance of smartphones and Android. But wait: if you hear me describing something in this talk, it doesn't mean I'm claiming we did it first. In fact there is a rich history of smartphones before: Nokia, probably Microsoft and Apple. But in any case, for us that was like an eye-opening experience, and we decided to basically stop any kind of work that was going through the telephone channel, switch directly to smartphones, leverage the Android operating system, and send everything through that channel.
In 2009 we also had a project on YouTube transcription and alignment; actually, initially I was working in that area, because that was what I had been doing before. The idea was to facilitate video transcription and caption alignment, and that is still around; people here have been doing a lot of work in that area, really interesting stuff.
And in 2009 we basically went from voice search on telephones to dictation: in the keyboard you will sometimes see a tiny microphone that allows you to dictate. In 2010 we also enabled what we call the intent API; basically it allows developers to leverage our servers so they can build speech into their applications on Android. And in 2010 we started down this path of going beyond transcription into semantics, with Voice Actions: basically understanding what you want from the voice transcription. In 2011 we went onto the desktop, bringing voice search into your laptop.
In 2011 we started the TTS project; that is when the team in London joined us. And in 2011 we started our language expansion; that's something I have been doing for the last three years, and I'll tell you a little bit about it, perhaps because it's a little bit relevant to this workshop.
In 2012 we started, in earnest, activities in speaker and language identification, with the goal of building state-of-the-art systems, and I think we're pretty much there. And this year we basically published a web interface for speech, so that you can call our recognizers from any web page.
And in 2013, also this year, we have gone into, and I will talk a little bit about this, this transformation of Google: from transcription, from the role of speech at Google being to provide transcriptions, to going into understanding, into an assistant.
People ask me all the time how the speech team is organized, so I also wanted to tell you, so you don't ask me anymore. What I want to say is that we focus exclusively on speech recognition; we don't focus on semantic understanding or the UI, there are other teams for that. I mean, we do a little bit of NLP to help us, for example to generate the language models, for data mining, things like that.
So the group is organized into several subgroups. There is the acoustic modeling team; they basically work on acoustic model algorithms, adaptation, and robustness. I would say they are probably the most research-oriented; they are really at the edge of new things, and all the work on DNNs is done in that group.
Then there's another group, which we call Languages. It's a recent name, because this team is the result of merging the language modeling team and the internationalization team, so now it's Languages. We do a lot of work related to building lexicons and language modeling; we take care of keeping our acoustic models fresh, and I'll mention that a little bit; we develop new languages; and we are in charge of improving the quality of our systems, you know, bringing in quality improvements. The speaker and language identification activities are also in this group. It's really large now; it's headed by me, in New York. We must be around thirty people now.
Then there is the services group. They take care of all the infrastructure, you know, the set of servers that are continuously running our products; they take care of deployment, scalability. These are the more software engineering activities, the core engineering of the team. And then we have a platform and decoder team; they are in charge of new decoder architectures, decoders that can run from the device all the way to the distributed system. There are also activities there on embedded recognition, word spotting, and speaker verification.
Then we have a large data operations team. This is the team that handles all of our data; their goal is to run the data collections and supervise the transcriptions. Once you start doing so many languages and so many products, you need data to be annotated, and of course they need tools, so that massaging all this data is not too much hassle.
And of course we have the TTS activities as well. I think the makeup of the team is probably fifty percent software engineers and fifty percent speech scientists. Lately we have been growing a lot on the software engineering side; I think we should really grow more on the speech science side.
And in addition to this core team, we have teams of linguists that are spread all over. What we do is we often bring up a linguistic team in a country; for example, we have a team in Ireland. They have been helping us with speech recognition and TTS. And I like to say that everybody codes, even some of our linguists. I remember there was this joke that at Google even the lawyers are able to write code.
Anyway, let me tell you a little bit more about some of the technologies. Basically everything we do in the speech team is big data, which kind of bothers me, because I think this community has been doing big data for a long time, so I don't know why it's suddenly fashionable and has a new name, but whatever. I will not tell you about DNNs, because you know more than me, and we have a whole session on that topic.
You know, we have a distributed infrastructure, and I can tell you it takes a week to train on a few thousand hours of data, with lots of tricks to make it faster. But there is something that is fundamentally different when it comes to acoustic modeling at Google, which is that we do not transcribe data. This may come as a surprise, but there are reasons for it. The main reason is that there is a lot of data to process. I think, if you do a back-of-the-envelope computation, every day our servers process from five to ten years of audio, more or less. It sounds like a lot, but if you talk to people with a background in call centers, you know, where they process telephone calls all day, it's not that much actually; they probably process more. But when you have so much data, it's like drinking from a fire hose. You have the speech flowing past your window, and you have to figure out how to get something out of it, because it's a lot of data.
Just to give you an idea, we break down our languages into tiers: the Tier 1 languages, the important ones, the ones that generate traffic; then Tier 2; and then there is Tier 3, like Icelandic, which still generates little traffic. A Tier 1 language generates more like a thousand hours per day of traffic, although obviously it depends on the language. And even the Tier 2 ones, the ones that are a little bit less important or that we launched recently, like Vietnamese or Ukrainian, are easily in the hundreds of hours per day.

So, and I was thinking about this yesterday: our limiting resource is probably not the data, like in a lot of the DARPA-sponsored work; our limiting resource is probably the people to look at all this data coming in.
So really, what we try to do is automate the hell out of it as much as we can. I personally dislike having an engineer looking at a language for more than three months, because it's not scalable. I prefer that we come up with a solution and we put it into an automatic infrastructure. So in our team we have been investing a lot in what we call our express pipeline; we have been at it for maybe a year now. Let me describe it briefly.
so
basically any way to comes into voice search is log like what is time for
spoken
so we keep you know what looks
a lots of late that's i mentioned from five to ten years of all you
per they
and you want like well
unlike in a typical acoustic modeling training set up you know the one you learning
the school
we don't transcribing because it's impossible there's no way to transcribe all that
and then follow the traditional approach was what you supervised training
so what we do is we
basically look at the transcription produced by the collection and you
and then we must as that they'd that's what's as we can trying to decide
when can we trust transcription when we cannot
we do a lot of data selection
it's a lot of work into that
apply combinations of statistics are
confidence scores
and then we train
So that's the image I would like you to use: the audio comes from the applications, goes into the recognizer, the transcription is provided to the user or the developer, and we log it. Then it goes into our offline infrastructure, where we massage the data extensively and actually score it. This is what we call our data express operation.
And one of the things we have been looking at, which I think is interesting, is how you sample from these ten years of data that you get per day. What weighting do you apply? Do you try random selection? Do you bin the data according to confidence scores and then try to select particular bins? Because once you organize the data by confidence, you might be tempted to use the data that is, I guess, most accurately recognized. But then you could argue that you're not learning anything new, because if the recognizer is correct, there is not much to learn. So you go a little bit deeper into the confidence range and select utterances that are harder, but not too much. It's not obvious what to do, and I think it's a very active area of work for us.
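As a rough illustration of the confidence-binned sampling just described, here is a minimal sketch. The bin edges, quotas, and record layout are all made up for illustration; they are not Google's production values.

```python
import random
from collections import defaultdict

# Hypothetical utterance record: (utterance_id, confidence, hypothesis).
BIN_EDGES = [0.0, 0.5, 0.7, 0.85, 0.95, 1.0]
# Drop the lowest bin (mostly wrong) and deliberately under-sample the top
# bin: near-perfect recognitions teach the model little that is new.
BIN_QUOTAS = [0, 2000, 8000, 12000, 6000]

def bin_index(conf):
    """Map a confidence score to its bin."""
    for i in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[i] <= conf < BIN_EDGES[i + 1]:
            return i
    return len(BIN_EDGES) - 2  # conf == 1.0 falls in the last bin

def sample_for_training(logged_utterances):
    """Select a training subset from one day of logs, by confidence bin."""
    bins = defaultdict(list)
    for utt_id, conf, hyp in logged_utterances:
        bins[bin_index(conf)].append((utt_id, conf, hyp))
    selected = []
    for i, quota in enumerate(BIN_QUOTAS):
        pool = bins.get(i, [])
        random.shuffle(pool)        # random within each bin
        selected.extend(pool[:quota])
    return selected
```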
Related to that, we do a lot of what we call distribution flattening, and we do this for two reasons. One reason is to increase the triphone coverage. For example, I can tell you we discovered this problem where somebody was asking for the weather in a particular city, and the recognizer was failing all the time. When we investigated, we discovered that a particular triphone occurred there but was barely covered in training. So there are good reasons not to train your systems to only model the head of the distribution when it comes to triphones or words; hence the flattening.

The other reason is that you have to be careful with words that are very popular. For example, in Korea, if you just select by high confidence scores and pick the utterances that way, ten or fifteen percent of your corpus is going to be composed of three queries; one of them is a game, and honestly I've forgotten the others. So if you are not careful, you're building a three-word recognizer. You really need to flatten, and not trust the distribution completely.
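A minimal sketch of what transcription flattening can look like: cap how many utterances any single transcription may contribute. The cap value is illustrative, and a production version would flatten triphone counts as well.

```python
from collections import Counter

MAX_PER_TRANSCRIPT = 100  # illustrative cap, not a production value

def flatten_by_transcript(utterances):
    """Drop utterances once their transcription hits the cap.

    `utterances` is an iterable of (utterance_id, transcript) pairs,
    assumed pre-shuffled so the kept subset is a random sample of each
    popular query rather than, say, its earliest occurrences.
    """
    seen = Counter()
    kept = []
    for utt_id, transcript in utterances:
        key = transcript.strip().lower()
        if seen[key] < MAX_PER_TRANSCRIPT:
            seen[key] += 1
            kept.append((utt_id, transcript))
    return kept
```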
But of course, unsupervised training is dangerous, right, and you have probably heard stories about it; I know people at Apple have worked with this. This is what we call the P problem; you may have heard colleagues talk about the P problem, and I'll tell you what it is. The story is that we launched the Korean system, we started to collect traffic, and we were going to do retraining with this unsupervised approach. And of course, when you have so much data, you look at the logs, right? You don't just push it into the system. At some point, looking at the logs, we noticed that thirty percent of the traffic was the token "P". We were a little bit mystified by that, so we listened to the data, and we noticed that when there is wind, or someone breathes into the microphone, or a car is passing by, the recognizer matches that with the token "P": something that sounds like that, for which our lexicon provides a phonetic sequence like /p i/. It's a plosive, so it matches this kind of noise, and it will do it with high confidence.
So these hypotheses go into training, and the P token starts to capture more and more. That is the P problem, and you have to be vigilant for it. We have observed that every language seems to have a P token; we have found P tokens in many other languages. Sometimes it's a sequence of consonants, because they kind of match noise; other times it's something else. There are some examples here.
So, you know, we deal with this in many ways. Some of the things we do, for example transcription flattening or triphone flattening, help a lot; they start to filter out these P transcriptions. But another simple thing we do is we have a test set that contains only noise, cars passing by, air blowing into the microphone, and we always evaluate what we call our reject set. If the reject set starts producing a lot of a particular transcription, that's really useful, because, number one, you identify the new P tokens, and then you can filter against them. We also model noise explicitly: we have noise models to capture this kind of problem and get rid of it in the transcriptions. And from time to time, I think, when you run this unsupervised transcription, there is a danger that the system is going to drift in its behavior. So from time to time it's not a bad idea to retranscribe your corpora with a new model, so you remove these bad corner cases and start again. But as I said, this unsupervised training is a very active area for us, and there are a lot of very interesting problems to work on.
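Here is a minimal sketch of the reject-set idea: decode a noise-only set, flag tokens the recognizer keeps hallucinating, and filter training hypotheses that contain them. The threshold is made up for illustration.

```python
from collections import Counter

def find_p_token_candidates(reject_set_hypotheses, min_rate=0.05):
    """Flag tokens that the recognizer keeps producing on pure noise.

    `reject_set_hypotheses` is a list of hypothesis strings obtained by
    decoding a test set known to contain only noise (wind, cars, breath).
    Any token appearing in more than `min_rate` of those hypotheses is a
    P-token candidate.
    """
    token_counts = Counter()
    for hyp in reject_set_hypotheses:
        for token in set(hyp.split()):
            token_counts[token] += 1
    n = max(len(reject_set_hypotheses), 1)
    return {tok for tok, c in token_counts.items() if c / n > min_rate}

def filter_training_data(utterances, p_tokens):
    """Drop machine transcriptions that contain a suspected P-token."""
    return [(utt_id, hyp) for utt_id, hyp in utterances
            if not set(hyp.split()) & p_tokens]
```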
So, given all this, with some safeguards, we basically select something like four thousand hours or so, and we retrain our acoustic models. We started doing this every six months; now I think we have a monthly cycle; and I'm hoping we get to a two-week cycle, so that every two weeks our acoustic models are retrained. There are two reasons for that. One reason is that we need to track the changing fleet of Android devices: there are many telephones, with different hardware, different microphone configurations, so it's important to track those changes, and every week there is a new phone model. So you need to track that. But you also want to track user behavior. There are different uses of our system: initially it was short queries; now the queries are longer, more conversational, and the acoustics change a little bit, so we want to track that too. And this continuous acoustic model training basically allows us not only to track but to improve performance; although at this point it's probably more the bigger acoustic models than the tracking itself. But it does help.
I also have to say that in the express pipeline, and I have been talking mostly about acoustic models, we also use similar ideas for language modeling and for pronunciation learning. The thing we do is work closely with the acoustic modeling team, so that whenever there are best practices, new ideas... as I said, recently they really like to prototype on Icelandic, a small language that trains fast. So whenever they discover something that works really well on Icelandic, we bring it into our pipeline; we basically track the work of the acoustic modeling team, and if something works well, we try to encode it into this massive workflow that, as I said, every two weeks does everything: it trains, it evaluates against the test sets (I'll tell you a little bit more later about our metrics), and the other thing it does, and this is really neat, is it creates a changelist, basically telling you "okay, this is all I changed", and it attaches an evaluation. So if you like the model, the only thing you have to do is say "yes, I like this model, submit". We still have a human saying yes or no. We could train a neural network to do that for us, I guess.
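A minimal sketch of one iteration of that recurring workflow; every function here is a hypothetical placeholder injected by the caller, not a real internal API.

```python
def biweekly_acoustic_model_cycle(select_data, train_model, evaluate,
                                  baseline_metrics, notify_owner):
    """One iteration: train on fresh logs, evaluate, propose a changelist."""
    training_set = select_data()           # confidence-filtered, flattened logs
    candidate = train_model(training_set)  # e.g. a DNN acoustic model
    metrics = evaluate(candidate)          # WER per standing test set

    # Human-readable changelist: how every test set moved vs. production.
    changelist = {name: metrics[name] - baseline_metrics[name]
                  for name in metrics}
    # The final gate is still a person saying "yes, submit this model".
    notify_owner(changelist, candidate)
```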
And another thing we do now, because we have been thinking about how to improve this even more: we have been following what we call the David Hasselhoff approach, and I'll show you why. The idea of the David Hasselhoff approach is that you have a very good-looking and fast recognizer, which is the production system; maybe it's a little bit dumb, but it's good-looking and fast. And then you have a really smart recognizer, which is slower, like the computer in the TV series. The idea is that we can reprocess a lot of our logs with rich acoustic models, with richer language models, with deeper beams, things you cannot put in production because the production system has to be real time. The goal is that, instead of taking just the transcriptions from the production system, we reprocess all the audio. And if we do that, we immediately see reductions in the transcription error rate of ten percent. Then you can, as I said, select data and retrain acoustic models. Actually, one of the things we're starting to do is to have production acoustic models and really rich acoustic models whose only purpose is to reprocess data, even if they are slower. And you can iterate; that is basically the approach we are taking in this process right now.
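A minimal sketch of that two-pass idea, where `rich_recognizer` is a hypothetical stand-in for an offline model with bigger acoustic and language models and wider beams than production can afford in real time:

```python
def redecode_logs(logged_utterances, rich_recognizer, min_confidence=0.9):
    """Re-transcribe logged audio with a big, slow, offline recognizer.

    The rich model's output replaces the production transcript as the
    (still machine-generated) training reference. The confidence
    threshold is illustrative.
    """
    improved = []
    for utt_id, audio, production_hyp in logged_utterances:
        hyp, conf = rich_recognizer(audio)  # hypothetical callable
        if conf >= min_confidence:
            improved.append((utt_id, hyp))
    return improved
```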
And the aim is that, through all these tricks, we can reduce the error rate on the transcriptions we use for training. Again, the goal is really not to transcribe anything we can avoid transcribing. And as I said, similar ideas are used in language modeling and in pronunciation modeling, where we try to learn from the audio.
Let me tell you a little bit about the metrics, because, surprisingly, word error rate is not the only thing we care about. Voice search basically exhibits a similar behavior to search: it's a head distribution with a really long tail. If the only thing you do is measure the word error rate of the task on a test set you transcribed a month or two ago, there are several problems. One is that most likely you're going to be measuring the head of the distribution, tokens like "facebook", things like that. So after a while you look very good on the common tokens, but you really care about the tail, those tokens that occur two or three times a day, and those test sets don't cover them. It's also not practical to transcribe every single day; I know it would be lovely, but it's not possible. And the queries are changing, the traffic is evolving all the time, so whatever test set was really good one month ago might no longer be representative. And, you know, even with the best speech transcribers, there is a delay between packaging the data and getting it back, so you can only use those test sets a month later.
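One concrete way to see the head/tail issue is to score WER separately by query frequency; a minimal sketch, where the frequency table and cutoff are assumptions for illustration:

```python
from collections import Counter

def word_errors(ref, hyp):
    """Levenshtein distance between reference and hypothesis word lists."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hw in enumerate(h, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                   prev + (rw != hw))
    return d[len(h)], len(r)

def wer_by_frequency_bucket(test_pairs, query_counts, head_cutoff=1000):
    """Report WER separately for head and tail queries.

    `test_pairs` holds (reference, hypothesis) strings; `query_counts`
    maps each reference query to its frequency in the logs.
    """
    errs, words = Counter(), Counter()
    for ref, hyp in test_pairs:
        bucket = "head" if query_counts.get(ref, 0) >= head_cutoff else "tail"
        e, n = word_errors(ref, hyp)
        errs[bucket] += e
        words[bucket] += n
    return {b: errs[b] / max(words[b], 1) for b in ("head", "tail")}
```

A system can show a flattering overall WER while the "tail" bucket, the part users actually notice on rare queries, is far worse.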
So we have moved to alternative metrics. I mean, we still use word error rate, but we also use two alternative metrics. One is what we call side-by-side testing. The idea is that you just want to measure the difference between two systems: the production system and a candidate system. The candidate could have a new acoustic model, a new language model, a new pronunciation lexicon, or a combination of the three, whatever it is. What we do is basically select, say, a thousand or three thousand utterances from yesterday; we have the transcriptions the production system gave us; and then we reprocess those with the new candidate system. We look at the differences; if the hypotheses are the same, we don't care. On the differences we do a side-by-side test, and Google search has had this infrastructure for many years. Think of it like a small Mechanical Turk, with raters distributed everywhere in the world who are pretty well calibrated, and the only thing we ask them is to listen to the audio and tell us which transcript more closely matches it. This is very fast to do, so you can get results in a couple of hours, sometimes less.
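The source doesn't say how the ratings are aggregated; a standard choice, sketched here as an assumption, is a two-sided sign test over the rater preferences on the differing utterances:

```python
from math import comb

def sign_test_p_value(wins, losses):
    """Two-sided sign test: could this win/loss split arise by chance?

    Ties (raters who preferred neither side) are excluded, the usual
    convention for a sign test.
    """
    n = wins + losses
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def judge_side_by_side(ratings, alpha=0.05):
    """`ratings`: one entry per differing utterance, each 'candidate',
    'production', or 'tie', as chosen by the human raters."""
    wins = sum(r == "candidate" for r in ratings)
    losses = sum(r == "production" for r in ratings)
    p = sign_test_p_value(wins, losses)
    verdict = "significant" if p < alpha else "not significant"
    return wins, losses, p, verdict
```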
The other thing we do, which I think is even more interesting, is what we call live experiments. The idea is that you have a new candidate system that you feel is pretty good, so we deploy it into production and it starts taking a little bit of the traffic: one percent, ten percent. And we basically track implicit metrics, indirect metrics like the click-through rate: whether users click on the results more or not; whether users correct the transcription we provide by hand; whether the user stays with the application; and so on. And of course there is a lot of statistical processing to understand when the results are significant and when they are not. But this is really useful, because it allows us to evaluate systems quickly while they serve only one or two percent of the traffic, before we increase it. And the nice thing is that the user doesn't even know that he's been subject to an experiment.
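The talk doesn't specify the significance machinery; a common choice for click-through comparisons, given here as an assumed sketch with made-up traffic numbers, is a pooled two-proportion z-test:

```python
from math import sqrt, erf

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Compare click-through rates of control (a) and experiment (b).

    Returns the z statistic and a two-sided p-value under the pooled
    normal approximation, fine when both arms have plenty of traffic.
    """
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative numbers only: 1% of traffic diverted to the candidate.
z, p = two_proportion_z_test(clicks_a=51200, n_a=1_000_000,
                             clicks_b=5350, n_b=100_000)
print(f"z={z:.2f}, p={p:.4f}")
```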
So those are the kinds of metrics we use. Now, related to how the system is doing: the traffic has been growing a lot, basically, in the last three years; it seems to double every six months or so, and we don't see the trend going down. Part of the reason is not just that speech is becoming more used; it's that we have been adding more languages. The role of English in our applications has been diminishing: it used to be, of course, the majority of our traffic; I think now it's a little bit less than fifty percent. The top ten non-English languages now generate most of the remaining traffic, and the other thirty-seven generate much less. But what we have actually seen in the past is that once the quality of a language improves, usage begins to show it. Another thing that is very interesting is the percentage of queries where there is a semantic interpretation: instead of just a raw transcription, we provide a parse in the output. It is increasing, and I think, this is only for English, it's beginning to be around twenty percent of the time that we act on the query: we parse it, we understand what it says. So, these are some graphs.
Here is, for example, how we track improvement in French, where we keep adding things: better pronunciations, better acoustic models, a new language model, a language model trained with clean data (we do a lot of massaging of our queries before we build language models, but I don't have time to tell you about that), a larger language model, and so on and so forth. It is a continuous process.
Let me talk a little bit about the languages effort, because I thought it might be relevant to this audience, and also I have been working on it for a while. So in 2011 we decided to focus on new languages. It was a necessity, because Android was becoming global, and we needed to bring voice to as many people as we could. We went through the initial analysis, and like many of you, I guess, we went to Ethnologue; we got these awesome charts of all the languages, organized by families and all that, really cool. And then you look at the statistics, like everybody else: you see there are around seven thousand languages, and so many families of languages, and so forth. And then you look at the numbers: about six percent of the languages are spoken by more than a million people, and a large fraction are spoken by fewer than a hundred people, so probably we will not bother with those. And then you do the math, and with a couple of hundred languages, more or less, you basically cover ninety-nine percent of the population.

So we internally keep talking about our goal being to build voice search in the top three hundred languages. I think it's a good selling point, a good sound bite. In reality, probably after we reach a hundred, and that's a lot, we will have to rethink what to do with the next ones, because for many of them there is, for example, no web in that language: there is nothing to search. But you could argue that it's a feedback loop, right? When you have speech recognition technology, maybe you facilitate the creation of content, so we need to break this loop somehow.
Let me tell you about our approach to rapid prototyping of languages. Rather than an algorithmic approach, which would have been a talk in itself, we decided to take a more process-oriented approach: focus on process. We basically focused on solving the two main problems, which you have heard about here this week again and again: how the hell do you get data, and how do you annotate it. So we spent a good amount of time developing tools to collect data very quickly and very efficiently. On the one hand, we built software tools that run on telephones and allow us to send out a team that can collect two hundred hours of data in a week. We also built a lot of web-based tools to do annotations. The end result is that in three years we collected around sixty languages, and at any given time we have teams out collecting. Right now we have teams in the field and we're planning more; we're starting on Indian languages, and Farsi we're going to collect in L.A.
So this is how our data collection application looks; it's called DataHound. And there is actually an open-source version of something like this that one of the attendees here put together, so I think you should go talk to him, because you can do this too. And this is how our web-based tool for annotations looks. This is the tool we use with our vendors and with our own linguistic teams distributed worldwide, so they can give us annotations. For example, this is a phonetic transcription task; or they can do a pronunciation selection task; or a transcription task, for test sets, things like that.
And, you know, just to talk a bit more about rapid prototyping: for lexicons, I think lexicons are an area where we're still not as fast as I would like. A lot of our lexicons are rule-based, which is good, because now that we have trained linguists, they can put together a lexicon for a language with regular spelling, like Spanish or Swahili, in a day, most likely. For languages that are more difficult, we rely on a more cumbersome lexicon process: we collect a seed corpus with our tools and then we train a G2P.
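To give a flavor of the rule-based case, here is a toy sketch. The Spanish-ish rule table is heavily simplified for illustration; a real lexicon needs context-dependent rules (c/g before e or i, stress, and so on).

```python
# Greedy longest-match grapheme-to-phoneme rules; multi-character
# graphemes are listed first so they win over single characters.
RULES = [
    ("ch", "tʃ"), ("ll", "ʝ"), ("rr", "r"), ("qu", "k"),
    ("ñ", "ɲ"), ("j", "x"), ("v", "b"), ("z", "s"), ("h", ""),
    ("a", "a"), ("e", "e"), ("i", "i"), ("o", "o"), ("u", "u"),
    ("b", "b"), ("c", "k"), ("d", "d"), ("f", "f"), ("g", "g"),
    ("k", "k"), ("l", "l"), ("m", "m"), ("n", "n"), ("p", "p"),
    ("r", "ɾ"), ("s", "s"), ("t", "t"), ("w", "w"), ("x", "ks"),
    ("y", "ʝ"),
]

def to_phonemes(word):
    """Apply the rule table left to right, longest match first."""
    word, out, i = word.lower(), [], 0
    while i < len(word):
        for graph, phon in RULES:
            if word.startswith(graph, i):
                if phon:                 # "h" maps to silence
                    out.append(phon)
                i += len(graph)
                break
        else:
            i += 1                       # skip characters with no rule
    return " ".join(out)

print(to_phonemes("chancho"))  # tʃ a n tʃ o
```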
For language modeling, that has never been a problem for us, because we just mine web pages, and we have the advantage of the query logs: what people search in that particular language, which is very useful. Of course, every language has its nuances, whether it's segmentation, or diacritics, or inflection, and we end up building a tool for it, like the inflection modeling we have been working on for Russian. But the good thing is that once you build the tools, you can deploy them for any language, so I'm hoping that at some point we run out of weird linguistic phenomena and we have tools for all of them. There is lots of data massaging and text normalization that I won't go into. One problem you do have is mixed languages in the queries, so you have to classify them or something like that. But the process is pretty automatic.
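The talk doesn't say how the query classification is done; a classic lightweight option, sketched here as an assumption with made-up training snippets, is a character-trigram language identifier:

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Pad and slice a string into character n-grams."""
    text = f"  {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class NgramLanguageID:
    """Tiny character-trigram classifier for routing mixed-language queries."""

    def __init__(self):
        self.models = {}

    def train(self, lang, texts):
        """Fit add-one-smoothed trigram counts from monolingual samples."""
        counts = Counter(g for t in texts for g in char_ngrams(t))
        self.models[lang] = (counts, sum(counts.values()), len(counts) + 1)

    def log_score(self, lang, text):
        counts, total, vocab = self.models[lang]
        return sum(math.log((counts[g] + 1) / (total + vocab))
                   for g in char_ngrams(text))

    def classify(self, text):
        return max(self.models, key=lambda lang: self.log_score(lang, text))

lid = NgramLanguageID()
lid.train("en", ["weather in new york", "pizza near me", "how old is obama"])
lid.train("es", ["tiempo en madrid", "pizzeria cerca de mi", "cuantos anos tiene"])
print(lid.classify("tiempo en nueva york"))  # likely "es"
```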
For acoustic modeling, it's the most automatic part: once you have the data ready, you push a button, and typically a day later you have a neural network trained. And the way we develop a language now is: the data operations team does the data collection, we do a lot of preparation, and then we meet in what we call a workshop on languages: we get together for a week in a room, and we typically have a success rate of fifty to seventy percent, meaning that within the week we get a system that is launchable, and we launch it; for us that means a word error rate of around ten percent. Some languages require a bit more work, and six months later we go back to them. So we have been launching languages at an average of four or five per year; last year we did more, I think.
So this is the language coverage we have; you can see it's forty-eight languages in production. Somebody asked me the other day why we have Basque, Galician, Catalan, and Spanish; you can figure it out. We have all these languages in preparation right now, and I may be hiding the ones we haven't announced yet. And as I said, our teams are collecting more data, so we will keep going. An interesting thing is that we have gone beyond natural languages: we have even built imaginary languages. So this is my challenge for the audience: let's see if you can tell me what language this is. Let me see... somebody must be downloading a movie... I can try again... if not, all right. Okay, I think we'll try again sometime today.
I wanted to briefly mention APIs; I am running out of time. Basically, all these languages are available through two APIs. One is the Android API; here is a pointer, just search for "speech android". There is also a web API; it's a very simple API: you send the waveform, we give you the transcripts, and we're thinking about enriching it a little bit more. A lot of developers have been building applications on top of these APIs, and I think for us they are very important, for two reasons. First, when we launch a new language, they really provide us with more data, and at the beginning more data is good. And I think it also exposes users and developers to the idea that "hey, I can build applications with speech recognition", and this is good for us. The other reason is that sometimes developers are faster at doing things that are useful. For example, when we started working on Google Now, which is a large, kind of semantic assistant system, we didn't have data; but because we have this API, and developers had been building Siri-like applications on Android for years, we could leverage that data, and it has really good semantic annotations.
And just to finish: I think we are now in the middle of this big transition in speech recognition, at least within Google, from transcription to a more conversational interface. There are all these new features (this is not speech; it's being done by other teams), things like coreference resolution, so it becomes more conversational; pronoun resolution; query refinements by voice; and more to come. They make the applications much more interesting. We really think the company is in the middle of this transformation where Google goes from this white box where you type into more of an assistant, where you talk, where you engage in a conversation. I like to think of Google as trying to become like a machine you can talk to, and that changes everything; it's the long-term vision of the computer from "2001", hopefully with a little bit better personality. That's a little bit of where we are trying to go: this pervasive role of speech, not only on your Android telephone but on your desktop, in your car, in your appliances at home. An assistant that makes access to information, which is what Google is about, easier and less intimidating for many users.

So the aim is to have, and this is related to speech technologies, not only the microphone here, but microphones that are always on, always listening. We have maybe the first steps of this with the hands-free hotword, this thing called "Okay Google", so you can talk to your device, at home, talk to your refrigerator, whatever it is. It's all about the conversation, predicated not only on what you say but on your data, with really high quality speech recognition that we try to make better all the time, and so on and so forth: really conversational.
So that's what I wanted to tell you. Questions? We're running a little bit late, but go ahead.

AUDIENCE: As far as data is concerned, I think you were right to collect as much data as you could; it was really a philosophical choice, versus spending more money on more careful annotations, as in translation. First, a comment: many students at universities use Google transcriptions as part of their projects, and it actually works very nicely; it's been a great resource to work with. But I have one question, which is: you cannot recognize my name. Why?
SPEAKER: Well, you know, within Google engineering we have this practice where, when we identify a problem, we get together in a room and we don't stop until we resolve it. Name recognition was identified as a problem in July, and actually I wasn't part of that effort, but we came up with solutions and they are being deployed, so try it today, actually. Or check tomorrow.

No, but seriously: named entities are difficult for open speech systems, so how do we do it? Name recognition is hard essentially because the space is pretty much infinite. So we do a variety of things, such as dynamic language models based on your data. I mean, recognizing your names, the names of the people you talk to, that you can do. But a generic system that somehow recognizes everybody's names, that is an extremely hard problem. We typically operate with a million words in our vocabulary, and we are going to two million soon; but still, the space of names is way bigger than that. The only way to handle this kind of problem is with more personalization, so we know about you, so we can adapt to you.
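As a rough illustration of that personalization idea, here is a minimal sketch; the class token, boost value, names, and scoring are all hypothetical, standing in for true class-based language model expansion rather than Google's actual mechanism.

```python
def build_contact_class(contacts, boost=4.0):
    """Return a uniform-within-class weight over the user's contact names."""
    weight = boost / max(len(contacts), 1)
    return {name.lower(): weight for name in contacts}

def biased_hypothesis_score(lm_score, tokens, contact_class):
    """Add a bonus whenever a hypothesized token is one of the user's
    contacts, so personal names can win over acoustically close words."""
    bonus = sum(contact_class.get(tok, 0.0) for tok in tokens)
    return lm_score + bonus

contacts = ["Marisol", "Thiago", "Xiomara"]
cls = build_contact_class(contacts)
print(biased_hypothesis_score(-12.3, ["call", "xiomara"], cls))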
Thank you.