So, as I mentioned, the goal of the presentation is to tell you that there is life beyond DARPA, that there is lots of interesting stuff out there. Some of the things we do in companies like Google are very different: the scale is unique, and there are different challenges.

And I think it is really exciting to be at Interspeech, because suddenly, finally, speech is becoming useful. If you had told me four years ago that speech was something that would really be used by mainstream people, I wouldn't have believed it. In fact, before I came to Google I was orienting my career more towards data mining and speech mining, because I thought that was the only way speech could be useful; I really did not believe in interactivity, you know, humans talking to computers. So I am the first one surprised that it is working.

So, let me tell you a little bit about the history of speech at Google, which I think is kind of interesting; people ask me about it all the time.

The team started around 2005, and there were discussions internally about whether we would build our own systems or license the technology. The prevailing opinion at the time was more in favour of licensing, but my colleagues and I were pushing for building it ourselves: we felt we needed to own this stuff. We won, and I'm glad we pushed and convinced people. On the other hand, Google has this culture where we build everything ourselves, even our own hardware and our own data centres, so it wasn't a strange decision.

In 2006 we basically started to build what at that time was a state-of-the-art system, and we were very lucky because we got people with a lot of experience building training infrastructure, acoustic modeling pipelines, and decoders. Language modeling was easy for us because we could leverage all the work from the translation team. The real challenge for us was to build this infrastructure on top of Google's distributed computing machinery; that was really different.

So in 2007, and this kind of reflects the mindset of the team, we built a system called GOOG-411, and it was really directory assistance: you would call a telephone number and then you could ask, for example, for the closest business you were looking for, so everything was going through the voice channel. Also in 2007 we started the voicemail transcription project, because we had a product, still available, called Google Voice; it's like a telephone number assigned to you, with call forwarding, so voicemail transcription was very relevant.

Then in 2007 and 2008 things changed radically with the appearance of smartphones, with Android I mean. But wait: if you hear me describing something in this talk, it doesn't mean I am claiming we did it first; in fact there is a rich history of smartphones before, from Nokia, probably Microsoft, and Apple. In any case, for us that was an eye-opening experience, and we decided to basically stop any kind of work where the audio was going through the telephone channel, switch directly to smartphones, leverage the operating system, and send everything through that channel.

In 2009 we also started a project on YouTube transcription and alignment, and initially I was actually working in that area, because that was what I had been doing before. The idea was to facilitate video transcription and caption alignment, and that is still around; people here have been doing a lot of work in that area, really interesting stuff.

And in 2009 we basically went from voice search on telephones to dictation: in the keyboard you will sometimes see a tiny microphone that allows you to dictate. Around the same time we also enabled what we call the intent API, which basically allows developers to leverage our servers so they can build speech into their applications on Android.

In 2010 we started down this path of going beyond transcription into semantics, basically understanding intent from the voice query. In 2011 we went into the desktop, bringing voice search into your laptop. In 2011 we also started the text-to-speech project; that was done by the team in London, who joined us. And in 2011 we started our language expansion; that's something I have been doing for the last three years, and I'll tell you a little bit about it, perhaps because it's a little bit relevant to the workshop here.

In 2012 we started in earnest activities in speaker and language identification, with the goal of building state-of-the-art systems, and I think we're pretty much there. And this year we basically published a web interface for speech, so that you can call our recognizers from any web page.

And in 2013, also this year, we have gone further into this transformation of Google, where the role of speech goes from providing transcriptions to basically going into understanding and assistance; I would really like to talk a little bit about that.

People ask me all the time how the speech team is organized, so I also wanted to tell you, so you don't ask me anymore.

What I want to say first is that we focus exclusively on speech recognition; we don't focus on semantic understanding or NLP, there are other teams for that. I mean, we do a little bit of NLP to help us, for example to generate language models, for data mining, things like that.

The group is organized in several subgroups. There is the acoustic modeling team; they basically work on acoustic modeling algorithms, adaptation, and robustness. I would say they are probably the most research-oriented; they are really at the edge of new things, and all the work on DNNs is done in that group.

Then there is another group which we call "languages" modeling; it's a strange name, because this team is the result of merging the language modeling team and the internationalization team, so we ended up calling it "languages". We do a lot of the work related to building lexicons and language modeling, we take care of keeping our acoustic models fresh (I'll mention that a little bit), we develop new languages, and we are in charge of improving the quality of our systems. We also have the speaker and language identification activities in this group. It is really large now; it's headed by me, and we must be around thirty people.

Then there is the services group; they take care of all the infrastructure, you know, the set of servers that are continuously running our products. They take care of deployment and scalability; these are the more software-engineering-oriented activities in the team. And then we have a platform and decoder team; they are in charge of our recognizers and decoders, decoders that can run anywhere from the device to the distributed system. There are activities there too on keyword spotting and embedded speaker ID.

Then we have a large data operations team; their goal is to run the data collections, the annotations, the transcriptions. Once you start doing so many languages and so many products, you need data to be annotated, and of course they need tools so they can handle this data without too much hassle. And of course we have the TTS activities as well.

I think the makeup of the team is probably fifty percent software engineers and fifty percent speech scientists. Lately we have been growing a lot on the software engineering side; I think we have grown more there than on the speech science side.

In addition to this core team we have teams of linguists; they are spread all over. What we often do is bring up a linguistic team in a country; for example we have a team in Iceland, and they have been helping us with speech recognition. But I like to say that everybody codes, even some of our linguists; I remember there was this joke that at Google even the lawyers are able to write code.

Anyway, let me tell you a little bit more about some of the technologies. Basically everything we do in the speech team is big data, which kind of bothers me, because I think this community has been doing big data for a long time, so I don't know why it is now fashionable and has a new name, but whatever.

I will not tell you much about DNNs, because you probably know more about them than me. You know, we have a distributed infrastructure, and I can tell you it takes about a week to train on a few thousand hours of data; we are trying lots of things to make it faster.

But there is something that is fundamentally different when it comes to acoustic modeling at Google, which is that we do not transcribe data. This may come as a surprise, but there are reasons for it. The main reason is that there is a lot of data to process. I have done the back-of-the-envelope computations, and I think every day our servers process from five to ten years of audio, more or less. That sounds like a lot, but if you talk to people who have a background in call centres, where they process telephone calls, it's actually not that much; they probably process more. Still, when you have so much data, it's like drinking from a fire hose: you have this huge flow of speech coming at you, and you have to figure out how to get something out of it, because it is a lot of data.

Just to give you an idea, we break down our languages into tiers: the tier one languages are the important ones, the ones that generate the traffic; then there is tier two; and then there is tier three, like Icelandic, which still generates a little less. A tier one language generates something more like a thousand hours of traffic per day, though of course it depends on the language. But even a tier two language, the ones that are a little bit less important or that we launched recently, like Vietnamese or Ukrainian, is basically at hundreds of hours per day.

So for us, and I was thinking about that yesterday, our scarcest resource is probably not the data; a lot of the work you hear about is on sparse data, but our scarcest resource is probably the people to look at all this data coming in.

So really what we try to do is automate the hell out of it as much as we can. I personally dislike having an engineer looking at a language for more than three months, because it's not scalable; I prefer that we come up with a solution and put it into our automatic infrastructure.

In our team we have been investing a lot in what we call our refresh pipeline; we have been using it for maybe around a year now.

Basically, every utterance that comes into voice search is logged, just like typed queries are. So we keep logs, lots of data; as I mentioned, from five to ten years of audio per day. And, unlike in a typical acoustic modeling training setup, you know, the one you learn in school, we don't transcribe it, because it's impossible; there is no way to transcribe all that. So instead of the traditional approach we do unsupervised training: we basically look at the transcription produced by the production recognizer, and then we massage that data as much as we can, trying to decide when we can trust a transcription and when we cannot. We do a lot of data selection, and a lot of work goes into that; we apply combinations of statistics and confidence scores, and then we train.

That's the image I'd like to use: the audio comes from the applications, goes into the recognizer, the transcription is provided to the user or the developer, we log it, and then it goes into our offline infrastructure, where we massage the data extensively and score it. This is what we call our data preparation.

One of the things we have been looking at, which I think is interesting, is how you sample from these ten years of data that you get every day. What strategy do you apply? Do you try random selection? Do you bin the data according to confidence scores and then try to select particular bins? Because once you organize the data by confidence, you might be tempted to use the data the recognizer is most confident about; but then you could argue that you are not learning anything new, right? If the recognizer is already correct, there is not much to learn. So you go a little bit deeper into the confidence range and select phrases and utterances that you trust, but not too much. It's not obvious what to do, and I think this is a very active area of work for us.
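
To make the selection idea concrete, here is a minimal sketch (not Google's actual pipeline) of confidence-binned sampling; the `Utterance` record, the bin edges, and the per-bin quota are illustrative assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class Utterance:
    audio_id: str
    hypothesis: str    # transcript produced by the production recognizer
    confidence: float  # utterance-level confidence in [0, 1]

def sample_by_confidence(utterances, min_conf=0.6,
                         bin_edges=(0.8, 0.95, 1.0),
                         per_bin_quota=10_000, seed=0):
    """Bin logged utterances by confidence and sample each bin separately.

    Very high-confidence utterances teach the model little it does not already
    know, so material from the middle bins is deliberately kept as well.
    """
    rng = random.Random(seed)
    bins = {edge: [] for edge in bin_edges}
    for utt in utterances:
        if utt.confidence < min_conf:
            continue  # too unreliable to use as a training target at all
        for edge in bin_edges:
            if utt.confidence <= edge:
                bins[edge].append(utt)
                break
    selected = []
    for bucket in bins.values():
        rng.shuffle(bucket)
        selected.extend(bucket[:per_bin_quota])  # cap each bin so easy data cannot dominate
    return selected
```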

Let me give another example: we do a lot of what we call distribution flattening, and we do this for two reasons. One reason is to increase the phonetic coverage. For example, I can tell you we discovered a problem where somebody was asking for the weather in a particular town, and the recognizer was failing all the time; when we investigated, we discovered that a particular triphone was barely covered in the training data. So there are good reasons not to train your system only on the head of the distribution, when it comes to triphones or words, but to flatten it.

The other reason is that you have to be careful because some queries are extremely popular. In Korea, for example, if you just select the utterances with high confidence scores, ten or fifteen percent of your corpus is going to be composed of three very popular queries. So if you are not careful, you are building a three-word recognizer. You really need to flatten, and not trust the distribution completely.
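
A sketch of what such flattening could look like, again with made-up names and caps (it reuses the hypothetical `Utterance` record from the previous sketch): cap how many times any single hypothesized transcript may enter the training pool, so a handful of very popular queries cannot dominate.

```python
from collections import Counter

def flatten_by_transcript(utterances, max_per_transcript=50):
    """Keep at most `max_per_transcript` copies of each hypothesized transcript.

    Without a cap, a few very popular queries (or a runaway "P token") can end
    up making a large fraction of the selected corpus.
    """
    seen = Counter()
    kept = []
    for utt in utterances:  # `utterances` are Utterance records as above
        key = utt.hypothesis.strip().lower()
        if seen[key] < max_per_transcript:
            seen[key] += 1
            kept.append(utt)
    return kept
```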

But of course unsupervised training is dangerous, right? You have probably heard stories about it; I know people at Apple have worked with this. This is what we call the "P problem"; if you hear someone from Google talk about the P problem, this is what it is.

The story is that we launched the Korean system, we started to collect traffic, and we were going to retrain with this unsupervised approach. Of course, when you have so much data you don't really look at the logs, you just push it through the system; but at some point we did look at the logs, and we noticed that thirty percent of the traffic was the token "P". We were a little bit mystified by that, so we listened to the data, and we noticed that whenever there was wind, or breath noise, or a car passing by, the recognizer was matching that with the token "P", which in our lexicon gets a phonetic sequence that, you know, sounds like that; so it's plausible, it matches this kind of noise, and it would do it with high confidence. And as that hypothesized data gets fed back into training, the "P" token starts to capture more and more.

So this is the P problem, and you have to be vigilant for it. We have observed that every language seems to have its own P token; we have found them in other languages as well. Sometimes it is a sequence of phones that happens to match noise; other times it is something else. There are some examples here on the slide.

We deal with this in many ways. Some of the things we already do, for example transcription flattening or triphone flattening, help a lot; they start to filter out these bogus transcriptions. Another simple thing we use: we have a test set that contains only noise, cars passing by, air blowing into the microphone, and we always evaluate on what we call this reject set. If the reject set starts producing a lot of those transcriptions, that is really useful, because, number one, you identify what the new P tokens are, and then you can filter against them. We also model this noise explicitly; we have noise models to capture this kind of problem and get rid of it in the transcriptions.
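
As a toy illustration of the reject-set idea: decode a set known to contain only noise and flag any token the recognizer hypothesizes suspiciously often; those are candidate P tokens to filter from the unsupervised pool. The threshold is an arbitrary choice for illustration.

```python
from collections import Counter

def find_p_token_candidates(reject_set_hypotheses, min_share=0.05):
    """reject_set_hypotheses: transcripts the recognizer produced on audio that
    should contain no speech (wind, cars passing by, breath noise).

    Any token accounting for more than `min_share` of all hypothesized tokens
    on this noise-only set is a candidate "P token".
    """
    counts = Counter()
    total = 0
    for hyp in reject_set_hypotheses:
        for token in hyp.split():
            counts[token] += 1
            total += 1
    if total == 0:
        return []
    return [tok for tok, c in counts.items() if c / total >= min_share]

# Example: a recognizer that keeps turning noise into "pi" gets flagged.
print(find_p_token_candidates(["pi", "pi pi", "weather seoul", ""], min_share=0.5))
```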

From time to time, I think when you only do unsupervised transcription, there is the danger that the system gets stuck in corners of its behaviour; so from time to time it's not a bad idea to retranscribe some corpora and build a new model, so you remove those corner cases and start again. But as I said, unsupervised training is a very active area for us, and there are a lot of very interesting problems to work on.

So, with all these safeguards in place, we basically select something like four thousand hours or so and we retrain our acoustic models, and we do this regularly. We started doing it every six months; now I think we have a monthly cycle, and I'm hoping we get to a two-week cycle, so that every two weeks our acoustic models are retrained.

There are two reasons for that. One reason is that we need to track the changing fleet of Android devices out there: there are many telephones with different hardware and different microphone configurations, so it's important to track those changes, and every week there is a new model out there, so you need to track that. You also want to track user behaviour and new usage patterns: initially it was short queries, now the queries are longer and more conversational, and the acoustics change a little bit, so we want to track those too.

And this continuous acoustic model training basically allows us not only to track the changes but also to improve performance. In this particular plot there are more things going on than just tracking, bigger acoustic models and so on, but it does help.

I also have to say that this refresh pipeline, and I have been talking mostly about acoustic models, also uses similar ideas for language modeling and for pronunciation learning.

Another thing we do is, obviously, work with the acoustic modeling team; whenever there are best practices or new ideas, and as I said, for some reason they really like to prototype on Icelandic, which amuses me, whenever they discover something that works really nicely on Icelandic, we bring it into our pipeline. We basically track the work of the acoustic modeling team, and when something works well we try to encode it into this massive workflow that, as I said, every two weeks retrains everything and evaluates it on the test sets; I'll tell you a little bit more later about our metrics. And the other thing it does, and this is really neat, is that it creates a changelist, basically telling you: okay, this is all I changed, and here is the evaluation. If you like the model, the only thing you have to do is say yes, I like this model, submit. So we still have a human saying yes or no; I guess we could train a neural network to do that for us.

Another thing we have been thinking about is how we can improve this even more, and we have been following what we jokingly call the "David Hasselhoff" approach. Let me show you what I mean. The idea is that you have a very good-looking and fast system, which is the production system; maybe it's a little bit dumb, but it is good-looking and fast. And then you have a really smart, slow system that runs offline.

The idea is that we can reprocess a lot of our logs with richer acoustic models, with bigger language models, with deeper beams, things you cannot put into production because the system would no longer be real time. The goal is that, instead of taking just the transcriptions from the production system, we reprocess the audio, and if we do that we immediately see reductions in the transcription error rate of around ten percent. Then you can run the statistics, select data, and retrain the acoustic model. Actually, one of the things we are starting to do is to maintain both production acoustic models and really rich acoustic models whose only purpose is to reprocess data, even if they are a lot slower; you can see there are a couple of pipelines in this process right now.

The aim is that through all these tricks we can reduce the error rate of the transcriptions we use for training, and again, the goal is really not to transcribe manually if we can avoid it.
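
Here is a minimal sketch of that two-pass idea, with placeholder functions standing in for the real recognizers: re-decode logged audio with the slower, richer system and keep only utterances whose transcripts can be trusted, either because the two systems agree or because the rich system is very confident.

```python
def select_with_rich_model(logged, decode_rich, min_rich_conf=0.9):
    """logged: iterable of (audio, production_transcript) pairs.
    decode_rich: callable mapping audio -> (transcript, confidence), using a
    richer acoustic/language model and deeper beams than production allows.

    Returns (audio, transcript) pairs to use as unsupervised training targets.
    """
    selected = []
    for audio, prod_hyp in logged:
        rich_hyp, rich_conf = decode_rich(audio)
        if rich_hyp == prod_hyp:
            # Agreement between the two systems is a cheap proxy for a
            # trustworthy transcript.
            selected.append((audio, rich_hyp))
        elif rich_conf >= min_rich_conf:
            # Otherwise prefer the rich system's hypothesis when it is very
            # confident, since offline it is measurably more accurate.
            selected.append((audio, rich_hyp))
    return selected
```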

And as I said, similar ideas are used in language modeling and in pronunciation modeling, where we try to learn from the audio.

Let me tell you a little bit about the metrics, because, surprisingly, word error rate is not the only thing we track. Voice search basically exhibits a behavior similar to search: it's a distribution with a heavy head and a really long tail. If the only thing you do is measure the word error rate on a test set that you transcribed a month or two ago, there are several problems. One is that most likely you are going to be measuring the head of the distribution, tokens like "Facebook", things like that; after a while you do very well on the common tokens, but you really care about the tail, those tokens that appear two or three times a day, and your test sets don't cover how well you do there. It's also not practical to transcribe every single day; it would be lovely, but it's not possible. And the queries are changing, evolving all the time, so whatever test set was representative one month ago might not be representative today. And, you know, even with the best speech transcribers there is a long turnaround between the data being captured and you getting it back so you can use it, often a month.

So we have turned to additional metrics. I mean, we still use word error rate, but we also use two alternative metrics.

One is what we call side-by-side testing. The idea is that you just want to measure the difference between two systems: the production system and a candidate system. The candidate system could be a system with a new acoustic model, or a new language model, or a new pronunciation lexicon, or a combination of the three, whatever it is. What we do is basically select, say, a thousand or three thousand utterances from yesterday; we have the transcriptions that the production system gave us, and then we reprocess those with the new candidate system and look at the differences; if the hypotheses are the same, we don't care. On the differences we do a side-by-side evaluation, using an infrastructure that the search side of Google has had for many years; think of it like a small Mechanical Turk, with raters distributed everywhere in the world who are pretty familiar with this. The only thing we ask them is to listen to the audio and tell us which transcription more closely matches it. This is very fast to do, so you can get results in a couple of hours, sometimes less.
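
A small sketch of how such side-by-side verdicts might be aggregated (the rater interface itself is not shown, and the significance test is my choice, not necessarily the one used): only utterances where the two systems disagree go to raters, and a two-sided sign test indicates whether the candidate wins more often than chance.

```python
from math import comb

def sign_test_p_value(wins, losses):
    """Two-sided sign test: probability of a split at least this extreme if the
    candidate and production systems were really equally good (ties excluded)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def summarize_side_by_side(ratings):
    """ratings: rater verdicts ('candidate', 'production' or 'tie') collected
    only on utterances where the two systems produced different transcripts."""
    wins = ratings.count('candidate')
    losses = ratings.count('production')
    ties = ratings.count('tie')
    return {'wins': wins, 'losses': losses, 'ties': ties,
            'p_value': sign_test_p_value(wins, losses)}

print(summarize_side_by_side(['candidate'] * 70 + ['production'] * 40 + ['tie'] * 10))
```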

The other thing we do, which I think is even more interesting, is what we call live experiments. The idea is that you have a new candidate system that you feel is pretty good, so we deploy it into production and let it take a little bit of the traffic, one percent, ten percent. Then we basically track a few indirect metrics, like the click-through rate: whether users are clicking on the results more or not, whether users correct by hand the transcription we provide, whether the user stays with the application or goes away. Of course there is a lot of statistical processing to understand when the results are significant and when they are not, but this is really useful because it allows us to evaluate systems quickly before we increase the traffic beyond one or two percent. And of course the nice thing is that the user doesn't even know that he has been part of an experiment.
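
For the live-experiment side, here is a minimal sketch of checking whether a click-through-rate difference between the production arm and the candidate arm is statistically significant, using a standard two-proportion z-test; the traffic numbers are invented, and the real traffic-splitting and logging infrastructure is of course not shown.

```python
from math import sqrt, erf

def two_proportion_z_test(clicks_a, queries_a, clicks_b, queries_b):
    """Return (z, two_sided_p) for the difference in click-through rate
    between arm A (production) and arm B (candidate)."""
    p_a = clicks_a / queries_a
    p_b = clicks_b / queries_b
    pooled = (clicks_a + clicks_b) / (queries_a + queries_b)
    se = sqrt(pooled * (1 - pooled) * (1 / queries_a + 1 / queries_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p

# Example: one percent of traffic diverted to the candidate system.
z, p = two_proportion_z_test(clicks_a=52_000, queries_a=1_000_000,
                             clicks_b=1_090, queries_b=20_000)
print(f"z={z:.2f}, p={p:.4f}")
```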

Other kinds of metrics we use are related to how the system is doing overall. Traffic has been growing a lot, basically over the last three years; it seems like it doubles every six months or so, and we don't see the trend slowing down. Part of the reason is not just that speech is becoming more used, it's that we have been adding more languages. The role of English in our applications has been diminishing: it used to be, of course, the majority of the traffic, and I think now it's a little bit less than fifty percent; the top ten non-English languages now generate more than fifty percent of our traffic, and the other thirty-seven quite a bit less.

But what we have actually seen in the past is that once the quality of a language improves, the traffic begins to show it.

Another thing that is very interesting is the percentage of queries for which there is a semantic interpretation, where instead of just providing a transcription we parse the output; that is increasing, and, this is only for English, it's beginning to be around twenty percent of the time that we act on the query: we parse it, we understand what it says.

So here are some graphs. This one, for example, shows how we track improvements in French, where we keep adding things: better pronunciations, larger acoustic models, a new language model, a language model trained with cleaner data, we do a lot of massaging of our queries before we build language models but I won't have time to tell you about that, a larger language model, and so on and so forth. So this is a continuous process.

Let me tell you a little bit about the languages effort, because I thought it might be relevant to this audience, and also I have been working on that for a while.

So in 2011 we decided to focus on new languages. It was a necessity, because Android was becoming global and we needed to bring voice to as many people as we could. We went through the initial analysis, and like many of you we went to Ethnologue, we got those awesome charts of all the languages, organized by families and all that, really cool. And then you look at the statistics like everybody else: there are around seven thousand languages, and families of languages, and so forth. Roughly six percent of the languages are spoken by more than a million people, and about six percent are spoken by fewer than a hundred people, so we probably will not bother with those. And then you do the math and, more or less, with the top three hundred you basically cover ninety-nine percent of the population. So internally we keep saying that our goal is to build voice search in the top three hundred languages. I think it's a good selling point, a good sound bite. In reality, probably after we reach a certain number, and that's already a lot, we will have to rethink what to do with the next ones, because for many of them there is no written form, or there is nothing to search for. But you could argue that it's a feedback loop, right? Once you have speech recognition technology, maybe you facilitate the creation of content, so we need to break this loop somehow.

What we are really proud of is our approach to rapid prototyping of languages. Rather than an algorithmic approach, which would have been interesting in itself, we decided to take a more process-oriented approach: focus on process. We basically focused on solving the main problem, the one you have heard about here this week, which is: how the hell do we get data?

So we spent a good bit of time developing tools to collect data very quickly and very efficiently. On one hand, we built software tools that run on telephones and allow us to send out a team that can collect around two hundred hours of data in a week. We also built a lot of web-based tools for annotation. The result is that in three years we have collected around sixty languages, and at any time we have teams out collecting; right now we have several teams in the field, we are planning more, along with studies of what is spoken where and all that. We are starting on Indian languages, and for Farsi we are going to collect in L.A.

This is what our data collection application looks like; it is called DataHound, and there is actually an open-source version that one of my colleagues put together, so I encourage you to talk to him, because you can use it too.

And this is what our web-based annotation tool looks like. This is the tool we use with our vendors and with our own linguistic teams distributed worldwide so they can give us annotations: this one is a phonetic transcription task; or they can do, for example, a pronunciation selection task; or they can do a plain transcription task, for test sets, things like that.

Just to talk a bit more about rapid prototyping: lexicons, I think, are an area where we are still not as fast as I would like. A lot of our lexicons are rule-based, which is good, because now that we have trained linguists they can put together a lexicon fairly quickly for languages with regular spelling, like Spanish or Swahili most likely. For languages that are more difficult, we rely on data-driven lexicon support: we collect a seed corpus of pronunciations with our tools and then train on it.

For language modeling, data has never been a problem for us, because we just mine web pages, and we have the advantage of knowing what people search for in a particular language, which is very useful. Of course, every language has its nuances, whether it's segmentation, word-boundary modeling, or inflection, and we end up building a tool for it, like the inflection modeling we have been working on. But the good thing is that once you build the tools you can deploy them for any language, so I'm hoping that at some point we run out of new linguistic phenomena to handle. There is a lot of data massaging and normalization that we apply before building the models; one problem you have is mixed languages in the queries, so you have to classify them or something like that. But the process is pretty automatic.

For acoustic modeling, it's the most automatic part: once you have the data, you push a button and typically a day later you have a neural network trained. The way we develop a language now is: the data operations team does the data collection, we do a lot of preparation, and then we hold what we call a language workshop, where we meet for a week in a room. We typically have a success rate of fifty to seventy percent, meaning that within the week we get a system that is launchable, roughly speaking with a word error rate of around ten percent. Some languages require quite a bit more work, and, you know, six months later we go back to them. So we have been launching languages at an average of four or five per year; last year we did more than that.

So this is the language coverage we have. You can see there are forty-eight languages in production. Somebody asked me the other day why we have Basque, Galician, Catalan and Spanish; you can figure that one out. We have all these languages in preparation, I may be hiding the unannounced ones of course, and as I said our teams are collecting more data, so we will keep going. An interesting thing is that we have even gone into dead languages, and we have built imaginary languages. So here is my challenge for the audience: let's see if you can tell me what language this is. Let me see... somebody is downloading a movie, so the network is slow... I can try to play it... if not, all right. Okay, well, I think we will try again sometime today.

I wanted to briefly mention APIs; I'm running out of time. Basically, all these languages are available through two APIs. One is the Android API; here is a pointer, just search for "speech Android". And there is also a web API, a very simple API where you send us the audio and we give you the transcripts; we are thinking about enriching it a little bit more. A lot of developers have been building applications on top of these APIs, and for us they are very important, for two reasons. First, when we launch a new language they really provide us with more data, and at the beginning more data is good. Second, it exposes users and developers to the idea that, hey, I can build applications with speech recognition, and that is good for us.

A third reason is that sometimes the developers are faster at doing things that turn out to be useful. For example, when we started working on Google Now, which is a large semantic assistant system of ours, we didn't have data; but because we have this API and developers had been building assistant-like applications on Android for years, we could leverage that data, and it gave us really good semantic annotations.

And just to finish: I think we are now in the middle of a big transition in speech recognition, at least within Google, from transcription to a more conversational interface. There are all these new features, and this is not speech per se, it's being done by other teams, things like coreference resolution so the interaction becomes more conversational, pronoun resolution, query refinements by voice, and more to come; they make the applications much more interesting.

I really think the company is in the middle of this transformation where Google goes from being this white box where you type into more of an assistant that you talk to, that you engage in a conversation with. I like to think of Google as trying to become like a machine you can talk to, and, you know, that changes everything, this long-term vision of a computer you can simply converse with, hopefully with a little bit better personality.

That is a little bit of where we are trying to go: this pervasive role of speech, not only on your Android telephone but on your desktop, in your car, in your appliances at home; an assistant that makes access to information, which is what Google is about, easier and less intimidating for many users.

So the aim is to have, and this is related to speech technologies, not just the microphone here, but various microphones that are always on, always listening to you. We have made the first steps with this "Okay Google" hotword, so that you can talk to your device at home, talk to your refrigerator, whatever it is. And it's all about the conversation, predicated not just on what you say but on your data, with really high-quality speech recognition that we keep trying to make better, and so on and so forth: a real conversation.

So that is what I wanted to tell you. Questions? We are running a little bit late, but go ahead.

[Answering an audience question:] As far as data is concerned, I think we chose to collect as much data as we could; it was really a philosophical choice whether instead to spend more money on more careful annotations, especially for translation, where we actually did not. It's not always the volume that matters.

First, a comment: many students at our university use the Google transcriptions as part of their projects; it actually works very nicely, it has been a great resource. But I have one question, which is: you cannot recognize my name. Why?

Well, you know, within engineering we have this practice where, when we identify a problem, we get together in a room and we don't stop until we resolve it. Name recognition was identified as such a problem in July; I actually wasn't part of that effort, but we came up with a solution, and it has been deployed, like, today actually, to production. So check tomorrow.

No, but seriously, there is a real question there: some words just don't show up in open-vocabulary speech systems, so how do we deal with that? Name recognition is difficult to excel at because the space is pretty much infinite. So we do a variety of things, such as dynamic language models based on your own data. I mean, doing name recognition for your names, the names of the people you talk to, that you can do; but when you have a generic system, that is ultimately a hard problem. We typically operate with a million tokens in our vocabulary, and we are going to two million soon, but still, the space of names is way larger than that. The only way to handle this kind of problem is with more personalization, so we know about you and we can adapt to you.

Thank you.
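
As a toy sketch of the personalization idea mentioned in that last answer: bias a generic language model toward the names in a user's contact list by interpolating a small per-user unigram model. The class, the weights, and the names here are illustrative assumptions, not the actual production mechanism.

```python
import math

class ContactBiasedLM:
    """Interpolates a generic word probability with a per-user contacts model."""

    def __init__(self, generic_logprob, contacts, contact_weight=0.1):
        self.generic_logprob = generic_logprob  # callable: word -> log P_generic(word)
        self.contact_weight = contact_weight
        # Uniform distribution over the tokens of the user's contact names.
        tokens = [t.lower() for name in contacts for t in name.split()]
        self.contact_prob = {t: 1.0 / len(tokens) for t in tokens} if tokens else {}

    def logprob(self, word):
        p_generic = math.exp(self.generic_logprob(word))
        p_contact = self.contact_prob.get(word.lower(), 0.0)
        lam = self.contact_weight
        return math.log((1 - lam) * p_generic + lam * p_contact + 1e-30)

# Usage with a dummy generic model that assigns every word the same probability:
lm = ContactBiasedLM(lambda w: math.log(1e-5),
                     contacts=["Xabier Etxeberria", "Ana Gomez"])
print(lm.logprob("xabier") > lm.logprob("randomword"))  # True: contact names get boosted
```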