0:00:16 Okay, so what should follow now is the wrap-up panel; at the end of the applications session we should have the selected posters, and as I found out, we somehow didn't manage to organise that well, so we didn't have exactly those posters. So I quickly ran around searching for the posters that would best fit the applications theme, and I actually found that we have them here — the best posters are all these: Google, Nuance, Microsoft Research. So next I would invite some people — these are probably not the authors of those posters, but I would invite some people to this panel so we can discuss the application issues. Maybe let me do it this way: I don't want to invite people again — I'm sorry — the last speaker is still here, so I would just invite all the speakers that we had here today to the seats here, and from the people that we see on the posters — somebody from Nuance or Microsoft, I don't know if we have anybody here — if you want to join us, if your company is on a poster, please join as well. Can I keep you here for a little longer?

0:01:27 Well, I hope that the audience will help me to ask the important questions that we can put to the people from industry and the people that build the applications. We had several talks about applications — do we have Niko here? — because all the people were talking about applications, or about how to calibrate systems so that they work at all the operating points and can be used for all the different applications.
0:01:59 So the first thing I want to ask is what you found most interesting today. My question here would be: did we actually find this useful, and did we learn something from the people who presented — something about what they are working on? And do we want to organise such sessions again, maybe at some other conference? Do we think this was actually relevant, that we learned anything useful, or what do the people on the panel think we should have learned from it? Maybe you now have a chance to tell us what the take-away message from your talks should have been, again in a short summary, and what you think we should have learned from your research.
0:02:59 I mean... [largely inaudible] ... it was very interesting... I mean, on the technology and product side, I think it is important for researchers who are working on this to be able to explain what we do, to show its importance, and ultimately the impact it can have.
0:03:52 And now we have all these — like this talk — and we make use of them, but I think there is also... Did you notice how much data they said they have? We have so much, actually. So you are collecting how much — like two thousand hours per second, or what was it, per hour?
0:04:17 I haven't done... no, my back-of-the-envelope estimate is... You once told me that there are speech analytics companies that process thousands of hours of audio — imagine all the audio recorded in call centres. A lot of it is recorded for liability purposes, right, so not much of it is processed, but more and more of it is being analysed by industry companies, and that means we are really talking about tens of thousands of hours.
0:05:17 So it sounds... I know there will be the privacy issues, but if you really collect something like a thousand hours, I guess you could even do things like negotiating with your customers that they would be willing to give us one second per hour for free, and if you were willing to share that, that would actually be nine thousand hours per year, and we would be pretty happy about that.
0:05:58 You know, the problem is the legal framework you've got. And, you know, many people — I don't know if I would like my own voice samples to be available. It's a lost battle; there's no way that works — for Nuance, for us, for whoever is doing speech at this scale, it is not in our favour. I was telling somebody before that, actually, we do collect these initial databases — at least in the cases where we send people to a country and collect a couple of hundred hours, those are collected with consent from the users. Those databases might be feasible to open-source; the problem is that I'm not sure the wording of the consent agreement says that the data can be made available outside. I don't know.
0:07:01 Anybody in the audience... [unclear] ... it does help me to push them if we can show that it should be possible.
0:07:12 Okay, so I think we sort of know what we want from you: just data. And I was curious — now that you are sitting on the other side of this table — what is it that you would like to see this community really be working on, from your perspective?
0:07:35 I mean, all the work done on neural networks is great, and we have been actively participating in that. There is another thing at Google, which is just funding: we spend something like a few million dollars a year on faculty research grants, many of which go to places like CMU — people at CMU get them. So it's not just that we keep the money. [unclear] I'm not sure I have, you know, any strong suggestions; I think the work presented here is by and large relevant. It is true that the kinds of things we care about involve more big data, and we cannot always share it; that's a problem. We need to think about some mechanism to help — I mean, we have released things like the n-gram corpora, because those are aggregated statistics over text and are not so subject to all these privacy considerations.

0:08:56 I think work related to semantic understanding and conversational systems is really relevant to us; I would call on universities to send proposals in that area — I think that will resonate well. The work on low-resource languages, I have to say, we don't feel is that relevant to us, because we care about languages that have a writing system. A lot of the limitations that we operate under are kind of self-imposed: we can collect two hundred hours, but a lot of the stuff is just not available outside. Lexical modeling, for example, is interesting — you know, learning pronunciations from data — and we have a lot of research in that area too.
0:09:52 I have another comment about sharing of data. This is not directly relevant for speech recognition, but it works for speaker and also for language recognition. Many of you probably already know what an i-vector is: you take a whole segment of speech, possibly even a few minutes long, you basically train a GMM to reflect what's happening in that speech, and you project the parameters of the GMM onto a relatively small vector, maybe four hundred to six hundred dimensions. That works really well for recognising languages and speakers. People are far less reluctant to ship data in that form, so people will give you — will let you take off their sites — a bunch of i-vectors, because you cannot tell what is being said. One example: NIST has just launched a new speaker recognition evaluation and has made a whole bunch of i-vectors available. This is data that would normally not be shared with the world — it's some LDC data, I believe, so there are strings attached to the LDC data, but they're giving away these i-vectors basically without conditions.
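(Editor's illustration, not part of the discussion: a minimal sketch of the i-vector idea described above, assuming a UBM and a total-variability matrix T that would have been trained offline — both are hypothetical placeholders here.)

```python
import numpy as np

def extract_ivector(frames, ubm_means, ubm_vars, ubm_weights, T):
    """Toy i-vector extractor. frames: (n_frames, feat_dim) features of one utterance.
    T is assumed to have shape (n_comp * feat_dim, tv_dim)."""
    n_comp, feat_dim = ubm_means.shape
    tv_dim = T.shape[1]                          # e.g. 400-600 dimensions, as mentioned above

    # Frame-level posterior of each UBM (GMM) component, diagonal covariances.
    log_post = np.stack([
        np.log(ubm_weights[c])
        - 0.5 * np.sum((frames - ubm_means[c]) ** 2 / ubm_vars[c]
                       + np.log(2 * np.pi * ubm_vars[c]), axis=1)
        for c in range(n_comp)], axis=1)
    post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and centred first-order sufficient statistics of the utterance.
    N = post.sum(axis=0)                               # (n_comp,)
    F = post.T @ frames - N[:, None] * ubm_means       # (n_comp, feat_dim)

    # MAP point estimate: w = (I + T' Sigma^-1 N T)^-1 T' Sigma^-1 F
    sigma_inv = (1.0 / ubm_vars).reshape(-1)           # flattened precisions
    TtS = T.T * sigma_inv                              # (tv_dim, n_comp*feat_dim)
    N_rep = np.repeat(N, feat_dim)
    precision = np.eye(tv_dim) + (TtS * N_rep) @ T
    return np.linalg.solve(precision, TtS @ F.reshape(-1))
```

Only this fixed-size vector — not the audio or the words — would need to be shared.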
0:11:22 So, I'd like to add a related question. I think there's actually a disconnect between the research here and where the industry is going with regard to the applications that are actually driving the speech work. Most of the bigger companies are going after conversational systems — Siri is an example, Google Now, and then there's Microsoft with the Xbox. So what I see, even though this is actually a speech recognition and understanding workshop, is that there are only a handful of papers on understanding and everyone is working on speech recognition. It's just not balanced right now. And if I look at EMNLP and ACL, they are more on the data modeling and theoretical side; there is not as much application work there. Since this is an applications-minded community, I see this as the community we should be investing in more, because these are the right people, but we're not doing that.

The second piece is about search. We observe that the Xbox actually launched with the TV feature — free-form, natural conversational search over entertainment — but if you look at the most frequent queries, people are using single-word or two-word queries; they are not really using it. You can say "show me movies with Tom Hanks from the nineteen-eighties", but today people don't search that way even though the system handles it. So there is a barrier now between keyword-based search and more natural, conversational search, and of course the priors in people's minds from keyword search and voice search are the blockers. How are we going to get over this — is it just going to take time, or what do we need to do about it?
0:13:37 I will make a comment on the question about the amount of data. The last speaker was right: on the internet there is a lot of data — what people upload to YouTube and other sites. People have made that data public, so we should find out how to use this source.
0:14:09 I was at IBM, in your position, and I understand the problems of sharing data. But there is also the other side; let me say a little bit about problems with models. I must say, from my perspective, the thing that you could do for us is to share the error analysis of your data. And I can say this as strongly as I can: I don't know of any scientific endeavour that made progress by just counting how big the number of errors is — that is simply counting. But an analysis of the kinds and types of errors that you see, and the types of conditions under which those errors happen, would be very helpful for the entire community. You see a tremendous amount of data, and I'm sure that you categorise the errors in that data. We would love to see that categorisation.
0:15:19 Somebody — I don't know if he is still here — argued earlier that quality was much more important than quantity of data. So we have the quality camp over here and the quantity camp at the back; could you argue it out — which is the way?
0:15:45 I think you need both, right? And... in the long run, relying on only one of them is a useless activity — well, I wouldn't call it useless. But, you know, even within our speech team we have a little bit of this split, because our acoustic modeling team for the most part uses annotated, transcribed data, while on my team we don't do that, because we are the ones in charge of maintaining forty-eight languages and doing all the training. So I always argue that some of the techniques or improvements that they manage to get may not be translatable to the other situation, where you are training in an unsupervised way. I think, realistically — personally I would argue that unsupervised is the way, and I would welcome it if the community did more and more research in this area, because it is very open; we still don't know. If you talk to people in machine learning about the way we do training, they will be shocked — like, "what the hell are you doing" — because, if you think about it, it is a bit of a scandal, right: you are using a system, and you are using the hypotheses it produces to train itself. It is something bizarre, and yet it works, right?

And I was trying to organise a workshop — I mean, we thought about this particular topic, unsupervised acoustic, language and lexical modeling, for the next Interspeech, you know, in Singapore. I was a little overloaded, maybe just lazy, but I would encourage somebody to organise a workshop on it, and I will certainly help.
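(Editor's illustration, not from the panel: a minimal sketch of the self-training loop described above, where a system is retrained on its own hypotheses. The decode and train functions are hypothetical placeholders, not any real production pipeline.)

```python
def self_training_round(model, unlabeled_utterances, conf_threshold=0.9):
    """Decode unlabeled audio with the current model, keep confident hypotheses,
    and retrain the model on those machine-generated transcripts."""
    pseudo_labeled = []
    for utt in unlabeled_utterances:
        hyp, conf = model.decode_with_confidence(utt)   # 1-best hypothesis + confidence
        if conf >= conf_threshold:                      # trust only confident output
            pseudo_labeled.append((utt, hyp))
    return model.train(pseudo_labeled)                  # returns the updated model

# Repeated rounds let the system "train itself" on its own output, as described above.
```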
0:18:13 So, I should be up there, but I am here... tired. There is an elephant in the room, and we heard only a little about it. We used to say that we are looking for the keys under the street light, and that's why we use the cepstrum. Now we are doing very well in ASR, but the real problem is not ASR, it is semantics, and that is not being addressed at all. In this community the "U" — understanding — is supposed to be very important. You can get very good at transcribing, but the amount of data you transcribe, as well as the amount of data you use for training, will never all be read by anybody. You really need to go much further, into language understanding of some sort, as was mentioned before.
0:19:19 I'd like to follow up with a related comment. All of you have seen lots of great papers and presentations here at ASRU, and a year from now we will have SLT. So I'd like to ask if anyone on the panel might have some suggestions on challenges — things that you have seen here that might motivate a challenge or some type of collaborative effort that would take things we have learned from this meeting — and maybe start planning for next December, to try to address issues that have come up in this discussion.
0:20:08 If no one else will — I mean, some of the things I mentioned earlier would be very valuable, such as distant speech recognition. In fact, just being able to recognise that the speaker is too far away, let alone correctly recognise what they are saying, would be useful. Anything that relates to detecting that the speaker is in a sub-optimal condition would be useful.
0:20:47 Okay. Ten or fifteen years ago, when I started in speech, there was a lot of work on multimodality; it seems to be totally dead — I heard the word once or twice today. Is that something that universities could work on, or is it something that you guys will simply drive down with thousands of hours of annotated or unannotated data as well, so that we shouldn't even bother to look at it again?
0:21:13 Multimodality using robots, or video material?
0:21:20 I mean, we have an application that has a video feed constantly on our user, and I think it would be useful for us to be able to make use of that kind of data to improve speech, or any number of other types of input from our users. That being said, we have devices now that have a camera aimed at the user all the time; I don't know that that was necessarily true fifteen years ago. Now we carry cameras and microphones around in our pockets constantly. So from my perspective it would be lovely for the universities to solve the problem for me — I would just take a nice black box, plug it in, and get twenty percent better success at everything. At the same time, that means saying you have thousands of hours of data, which you know we won't have. Also, you have tens or a hundred grad students; I don't, so...

Well, maybe not right here, but I know there are a lot of grad students at CMU.

We'll enslave them for you.
0:22:24 I just wanted to say that I think Microsoft has done a very good job with the Kinect, right, where you can capture gestures. I found that really interesting because, you know, in a home environment maybe you can even compensate for what the recognizer misses. So I personally think it is interesting, but I would like to hear what you have to say.

It is also my point that the Kinect is connected, so it is a device that can easily be used for data collection — it captures voice and the human body, gestures and the like — so this research is very important.
0:23:10 A quick question for the corporate people... [largely inaudible].

Yes — so for our language model training we use a lot of sources, as I mentioned. One of the sources we use is also the transcriptions from the recognizer, after some filtering. If you do some sort of interpolation — standard interpolation techniques — and you look at which data source contributes the most to the quality of the language model, the unsupervised data source contributes a lot. So we will use it.
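(Editor's illustration, not from the panel: one standard way to do the source weighting just described — EM re-estimation of linear-interpolation weights on held-out text, which shows how much each source contributes. The component language models and their prob method are hypothetical placeholders.)

```python
def em_interpolation_weights(heldout_sentences, source_lms, n_iters=20):
    """source_lms: list of LMs, each exposing prob(word, history) -> float."""
    k = len(source_lms)
    weights = [1.0 / k] * k
    for _ in range(n_iters):
        expected = [0.0] * k
        for sent in heldout_sentences:
            history = []
            for word in sent:
                comp = [w * lm.prob(word, history) for w, lm in zip(weights, source_lms)]
                total = sum(comp) or 1e-300
                for i in range(k):
                    expected[i] += comp[i] / total      # posterior responsibility of source i
                history.append(word)
        norm = sum(expected)
        weights = [e / norm for e in expected]          # re-estimate mixture weights
    return weights  # a large weight means that source contributes a lot on held-out data
```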
0:24:00 Another question here: for training, do you also incorporate feedback from additional information sources?

Okay, yes — we have access to other, what I would call side information: for example whether the user clicked on the result, meaning they accepted the hypothesis we provided, or whether the user stays in a conversation, things like that. Actually, this whole thing was a surprise to us: initially we looked at this kind of data and we figured it was going to be great, because we would be able to sample from regions of the confidence distribution where the confidence is lower and compensate, because the user click is basically telling us we did something right. But we haven't seen any improvement. It turns out that, at least so far, confidence scoring, posterior estimates and things like that work pretty well, so it has been a bit of a disappointment to us that these extra signals don't seem to add much.
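(Editor's illustration, not from the panel: a minimal sketch of the sampling experiment described above — harvesting extra training utterances from the lower-confidence region only when an implicit user signal, such as a click on the returned result, suggests the hypothesis was accepted. The log-entry fields are hypothetical.)

```python
def select_with_click_signal(log_entries, low=0.3, high=0.7):
    """log_entries: dicts with keys 'audio', 'hyp', 'confidence', 'user_clicked'."""
    selected = []
    for entry in log_entries:
        mid_confidence = low <= entry["confidence"] <= high
        if mid_confidence and entry["user_clicked"]:
            # The click is weak evidence the hypothesis was right, so this keeps
            # data that a confidence-only filter would have thrown away.
            selected.append((entry["audio"], entry["hyp"]))
    return selected
```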
0:25:15 Thank you. If there are no more questions at the moment, let me maybe return to what you were talking about before, where the i-vectors were mentioned. Actually, what I have seen recently: somebody working with us from Google came with an interesting problem — he wanted to train a neural network on i-vectors, and since he could extract i-vectors from thousands or millions of recordings, he could use a completely different technique, and eventually he was successful on short durations. That is something we would possibly also be interested in if those i-vectors were available; we could eventually be interested in running something on such data, because at the end of the day the only thing we care about is that the next ASRU will again be in some nice sunny place and we need to write a paper for it. So perhaps the companies could be more proactive in this sense: maybe you see an interesting problem, so maybe you could think of how to generate something that you can actually share with us, something that has real value for us in the sense that we could train our systems on it — generating these kinds of challenges, like "here are i-vectors, go and play with them however you want", because that is something we are interested in. In fact, we would not even know that such a problem exists at Google — we could guess, but we would not know how short the segments are or what kind of data you are interested in running language identification on. And I guess a similar situation exists for, say, natural language understanding: you have some sparsity problems, and you could possibly extract some information from the data and share it with us. Maybe people are not working on such problems because, again, we don't have the data. So maybe we should think of some project that Google would even be willing to pay for, but maybe people don't even think of such a project, because they never had the initial data to play with and to find out that there is actually an interesting problem there.
0:27:37 Anybody else, anything you would like to add?

I think what my colleagues are saying is that it is a matter of mindset. Let me give an example from my side — not my mindset, but the mindset of a corporate legal department that says "this is dangerous" and does not make a cost-benefit analysis; that is really important. So maybe I should give an example. I'm at Johns Hopkins, and while I think we are a little bit known for the speech and language groups, we are actually known for the hospital and the medical school. There are gobs and gobs of medical data there which are extremely valuable, and any time a large medical dataset is collected, the people who work on it always look for ways to make it available. In other words, the tendency is not to say "the lawyers will say no, so let's not bother about it"; they clearly had to figure out how to anonymise it, de-identify it, or whatever they call it. They have this attitude of saying: we can get good things out of this data, but maybe someone out there in the world will get something more out of it, so let's see how we can make it available. And just like in the case of the speaker ID and language ID datasets, it turned out that, given the state of the art, it might be enough to give people i-vectors. I have seen other examples of this: there is a lot of genetic data, assays and things like that, where you take it, de-identify it, and then give it out. So if you start thinking that way and start pushing back — because these lawyers, their first answer will always be no — right, so don't take no for an answer, and just try to explore what will pass legal muster, because it is really in the interest of the community to expose students to these kinds of datasets and problems; that is again where the innovative next breakthroughs are going to come from. So I think you should actually commit yourselves to saying "let's try". And take, for example, Google in particular: there is a big commitment to open source, and that didn't come about easily — I mean, you remember the days when companies would copyright everything before any code could go out. But that changed, and in the same way I think we should actively push these lawyers and say this is the way it needs to go.
0:29:53 I think there is another aspect. I definitely see your point, and at some level I agree. There is the legal aspect, there is the privacy aspect, and then there is the trouble that comes with the perception that "oh, they are collecting data" — all these privacy stories. There is a public relations aspect that has to be managed very carefully, because it only takes one journalist saying "Google is collecting data and sharing it with everybody"... I remember, some years ago — well, I can't remember quite what they did — somebody released some chat data, and something happened: somebody found out something about a woman, and it was a huge PR disaster. Things like that make these large-scale releases hard. So it is difficult; I have to be honest, it is very difficult to get past all these barriers. And then the other thing you have to deal with is executives who look at data as a competitive advantage. So it is possible — it has been done in the past, like when we released the n-gram corpus — but it requires a lot of work, and money, and someone willing to spend it. So, it is difficult. I do know success stories. I don't think many people know this, but when the work on Kaldi started, he [Dan Povey] was at Microsoft, and Microsoft's initial reaction was "we can keep it all in house"; I believe he fought really hard, and credit goes to Jeff for making sure that Kaldi stayed open source. So there are examples where we have succeeded; we should try.
0:31:59 I agree with that. I would really love for people to work on child speech, and we have a dataset that we have been collecting that we would love to be able to release. The problem we have — beside the legal one — is, you know, we are a twenty-person company. If we have a problem like that, the company will just be gone; we are going to get crushed — we have very little margin left, and if someone sues us because we stored their kid's voice, who knows what happens; we would be hurt completely. And I think, from a cost-benefit analysis, that risk is just too big to take for a company of our size. But that doesn't mean we would not love to have the bright minds in this room and around the world working on children's speech. We think that is a wonderful problem with interesting and unique issues that are not present in adult speech, especially the conversational aspects that you generally don't see very much of. We would love to be able to do it. De-identification is challenging, because the regulation in the US says that a child's voice is in itself personally identifiable, so there is no way to de-identify it and still have audio. That is the challenge.
0:33:27 ...and a large amount of data to drive the research. I think this should start with an agency like NSF or DARPA: they should create the next Babel, or something along those lines, modelled almost on information search using speech as the main interface. They should generate the data rather than looking to Google or Microsoft.

That won't happen. The thing is that you have to push the envelope. I'll give another example: the Google and Microsoft n-gram corpora showed that you can harvest trillions of web pages, and they turned out to be very useful. So in other words, let's start by finding point solutions, and hopefully, in the limit, the lawyers will eventually get the message that these kinds of things are okay. But I think we really should take an exception and say: can we help this problem by giving out this particular thing? Maybe that is the way to go.
0:34:24 So I will say that where there is a will, there is a way, and corporations like Google and Microsoft really are hiding behind the lawyers. I have a very specific case, which is in our program to read documents. We had LDC generate data for us, and that was good, but we knew that there would be other phenomena that would happen in the field. Those happened to be in a huge collection from nineteen ninety-three that had actually been cleared and released, but somehow somebody in the government decided that it really could not be released, and we classified the data and put it away. However, through a lot of pains, mostly mine and my staff's, we managed to get that data re-released, on the condition — and that cost a bit of money — that somebody would go through all the released data and simply remove all the PII, the personal information. Once that was done, we had an incredibly valuable corpus to work with. So it may well be possible for Google, Microsoft, Amazon, Facebook to go through some expense, make sure that the data is cleansed, and then release it to the world. I give them the challenge to try to do that.
0:36:08 I just thought of a suggestion that might help with this, which would be: let it come from the user. Let's say we allow the user to opt in and click a checkbox that says "whenever I use Google voice search, I actually want this data to be shared with the research community" — in the same way you can decide whether you want to be an organ donor, right? And the thing is, the new generations are much more eager to basically share everything. I'm sure that even if only one percent of the users were happy to let that data be used for any purpose, that would already be, you know, millions of hours. So maybe it is not that far-fetched, and then there are no issues, as more and more people become quote-unquote transparent — if you have read "The Circle", for example. It would be an easy way to just have this data available, and in fact it could even be kind of a statement: "I am donating this speech; I actually want it to go to the whole research community."
0:37:25 I'd like to... sorry. If I can, maybe I can pose a challenge for Microsoft and Google: would you consider bringing in some summer internship students to go through exactly this type of data, set it up, and work on a piece of it that could be shared with the community? Because even if someone opts in and checks that box, there can still be sensitive information in there that they were not thinking about when they actually did it. So if there were some way to have a kind of litmus test of what constitutes something beyond what would be publicly available — I'm just trying to identify the space, and if something strays out of it, remove it. So would you consider supporting a couple of summer internships to go build that for the community?
0:38:31 I would expect that of a small startup... I don't know. I mean, this is not something I alone can decide — you think I have a lot of power; I don't. I can bring it up, but, you know, I have low expectations. Beyond that, this is a lot of work.
0:39:07 With all this talk about data, back to what you had mentioned, Pedro: the fifty languages or so you have collected, one week at a time. I presume there is some sort of network of contractors out there actually doing the crowd-sourcing and providing some of the language expertise. Could you say something about that?
0:39:29 So, when we started with the languages, we basically made a conscious decision not to outsource the whole effort to external companies, because we realised it was easier and faster for us to do it ourselves. So we built an organisation to do a lot of the data collection and the linguistic annotation. It is a combination: the core staff is like five people full time, and then there are a lot of contractors; we bring up linguistic teams for three to six months. We have all the tool infrastructure, so they can work remotely, and a lot of the work for our staff is managing this organisation, because at any time there are something like a hundred and fifty full-timers, and it is only contractors doing the linguistic annotations. So we consciously made the decision to do it internally, to have control of the whole thing. For small annotations that we need quickly, we use our own teams, so we have a linguist and the annotators; and when we require large volumes of annotation, we use vendors — a lot of vendors, not just one, mostly to keep a little bit of competitive pressure — and we force them to use our tools. The advantage of doing that is that if they use our tools, the annotations come into our web-based tools and immediately feed into our system and our process.
0:41:24 But at least at that level, it sounds like you are applying a reasonable amount of annotation and quality control, and your process is not all that different from what Mary describes with the Babel program. I mean, is that reasonable to say?

I mean, a lot of this stuff is for testing sets, right? So it is not necessarily training corpora; it is mostly testing sets, because at this scale of languages that is already a lot of data: every quarter you transcribe thirty thousand utterances per language, and then you focus on three or four domains. For the top languages we are talking, I don't know, on the order of a million utterances per month being transcribed just for testing purposes. Then lexicons are something where, as I said, we probably need a bit more work to automate. But the thing also is, from the point of view of quality, there are things you can do with money, and there are things you can do by investing in a lot of algorithms. And, you know — okay, I don't want it to sound wrong — we are more limited in engineers and speech scientists than in money. So it is easier — no, seriously — it is easier for us to spend money and get data transcribed than to hire a lot of people, sometimes. So that is the way it is.
0:43:12 This conversation keeps coming back to "let's get a lot of data and let's build a better ASR unit". One of the problems — and I saw this in the past, when we first had lots of computing power — is that people get corrupted by all this data and keep working within the same paradigms; at best you get a slight paradigm shift, and nobody bothers to sit down, think, and come up with new methods of dealing with it. And the entire black hole of semantics will not be solved no matter how much data you collect.
0:44:00 So I suggest that the LDC deletes all the databases we have at the moment and we start from scratch — we should start thinking about what kind of data we should actually start collecting now, because I think the data that we have at the moment would be boring; it would be the same thing again.
0:44:23 So I have one question. The biggest part of this community, I think, is the graduate students — or at least a big part of it — and I see that the work is heavily driven by what is happening in the industry; it moves very fast and it keeps changing. We have a very good panel here, I think, to tell us what we should and shouldn't work on — what university programs, steps, or data you could recommend for us to get up to speed with what is going on. That is my first question. And the second question is more for Pedro — your presentation was very good; I just wanted to ask how you scaled up from the university to what it is that you are doing now. So those are two questions, thanks.
0:45:31 Let me take the first one. Actually, going back to the data: maybe we should change the way we think — we keep expecting companies to do stuff for us. This is a large community, and we can collect the type of data that we need ourselves by crowd-sourcing with the people here. If you look at Interspeech, I guess it is on the order of thousands of people in this community, so one can develop an application to get that data — I would trust someone here, for instance, with my personal data. So that is one layer: rather than asking who is going to give us the data, can we generate the data? And going back to the question: as I said, I think there is a disconnect. Where the companies are going, the data is the most important thing — it is not really the machine learning or the techniques you are using. They also own the devices used to access the data; they own the hardware, they own the software, they own the data, and they want to control how you access that data. Speech is the natural user interface, one of those modalities, and they want to control speech — that is why you see Apple, Google, Amazon, Microsoft and other companies investing heavily in this area. That is an area I would like to have the students working on, and there are challenges there. And there is another gap, between the search community and the language understanding and speech community; the new direction actually falls in between them — large-scale language understanding. Those are the areas I would intend to focus on.
0:47:36 I would say it is very much statistical — the relation between speech and text — because we have the whole domain of text processing and data mining of a sort, so we need to get data from it; it does not need to be annotated by people, but the analysis of the data, and the analysis of the correlations in the data — we can extract a lot from speech that way; there is a huge possibility for this kind of analysis. This is, I think, a very important topic, this big data analysis, and the systems for it are here and you can use them.
0:48:24 There was the other half of the question.

Okay, the other half of the question was about how to scale from university to business. I would say the simple answer is: go outside and ask the users who really need what we are able to do — "do you use this?" If you go to companies that work with speech data, the data immediately tells you the target and the difficulties that need to be solved. And these users, these companies, have money, so if you are able to save them some money, or bring them customers, they have the money to pay for it.

I guess the question was originally — Pedro — how Google managed to scale up from the university research.

Google... the expertise, right now — I think everybody came from industry; the seed of this speech team is industry people: IBM, AT&T Labs, SpeechWorks.
0:49:54 Can I speak? I just had a couple of thoughts about some of the various things going on. First, I can agree that the Kinect has been a great resource for people doing multimodal research in universities — it is really a nice piece of hardware that is easy to use for things like gestures; people in our lab and other places I know are using it, along with the publicly available speech recognizers. On the issue of the data: I don't think anything is ever going to happen with the companies that are collecting the data, for the reasons that have been described. All through the years — even Bell Labs, when they had all the data — it wasn't shared with the community. Sometimes these things come out later in time through the LDC, but for the various reasons that Pedro and others described — privacy issues and potential competitive issues — it is not going to happen; they will still take students, and the students work on the data as interns. Having said that, and given the techniques they are using, it is not impossible to collect data ourselves. There are efforts to collect data for different languages; you can go out yourself and make apps and have people read speech; there are mechanisms to crowd-source annotation if you really want to do that. The community could do that — we have deployed apps — and you are not going to collect data on the same scale, but you can certainly, as people said, find a way; you can make it happen. So I don't think we should look to the big companies to feed us the crumbs we can work on; if something is really important, we can go out as a community and make it happen.

Another thing, talking about what research people outside the companies should be doing, or what students should be looking at: Joe mentioned the analogy of looking for the keys under the spotlight. Well, publicly available corpora — sure, they are a spotlight, and people tend to work on those problems; and the problems the companies are working on also tend to be spotlights, if you think about it. But there are a lot of hard problems out there — Joe mentioned semantics — and there are plenty of others that maybe are not commercially viable but are really hard and interesting problems, and I think solving them would come back and benefit the more conventional things. So people shouldn't just look at what is out there right now as what they should be working on, but think about what people are not working on that are interesting, hard problems. That's my two cents.
0:52:50 I would also like to ask a question. It does seem to me that industrial research is really development — it tends to be near-term — and universities should be doing basic research, and possibly things that could feed into development-type work. I personally think universities and industry have to find a way to partner, in order to make sure that the research stays relevant, but without losing the basic research that has to go on at the university level. And the question is — I think there is a tension there; data is one aspect of it, and data certainly does drive problems: people will go and participate in an open evaluation because of the data. The question I have is: what do you see as the ideal partnership between your companies and universities? Because ideally it shouldn't just be a matter of recruiting; there has to be a reason why you want to come to these conferences. You have the potential to shape the future students, the future PhD students, in a wide variety of countries, and it does seem like something along those lines is an important thing to do. So that is the comment, but I would also like to hear a little bit about your thoughts on what the ideal partnership might be.
0:54:27 I think there has to be an incentive. There are enough problems. We had a sizeable team working in the product group, and if you are not in research you are not really setting your agenda in terms of the time schedule: you have certain deliverables; you have great ideas, but they are just not the priority because of the next deadline. So a summer intern is actually a lifeline for us: we have these great problems, we just don't have time to work on them, and we have the summers where that works. But that is not really the solution. The solution is — the problems are all there, and academia could take them on; it is just: what is the incentive on the university side that will engage them in working on these problems? To me, that is what is missing.
0:55:25 I would also say that there has been a shift in research. When I first started, long-term research meant about fifteen years; today, long-term research is three years, and that is a real problem. And to answer your question, Mary: I'm not sure that industry should drive this. I think if the hard problems are attacked and possibly solved, eventually they will find their way into products; if you wait for industry to do that research, most likely the hard problems will never get done.
0:56:07 I just wanted to say that a lot of this is happening already, right? I mean, industry sponsors things like the Johns Hopkins summer workshops — in the true sense: employees go there on the company's salary. We sponsor conferences, and students through summer internship programs, and that is actually an indirect way of influence: I think many ideas get initiated because of the student programs — they work with somebody, they say "hey, I like this", and in the end it might get expanded. There are university grants that most companies, once they reach a certain size, use to direct the research toward what they care about. So I am not sure there is anything extra to be done, and then of course there is the personal connection, right — the fact that I am friends with somebody definitely works. And coming here: this particular conference is small enough that I can actually see the posters, but at larger conferences, I guess, for me the value is to catch up with people in academia, see what they are doing, and have a dinner and drink a beer — a more informal way of interacting — and sometimes tell them "if you submit a proposal around this, we would be interested". So there are many indirect avenues of influence; I don't think we need to formalise it so much. There have been exceptions where a whole research lab or centre was created at a university, sponsored by a company. Typically these grants are small — seventy thousand dollars, fifty thousand at most — but there have been cases where half a million or a million dollars was given to a university to seed a new centre. So sometimes that happens, but that again is way above my little piece of the organisation; that kind of money comes from somewhere much higher up.
0:58:38 So I guess we are out of the time that was reserved for this panel discussion. We should remember the idea that at the next one, maybe, there should be a special discount for the people who are willing to record a conversation, so that we can collect the data — and of course, maybe there should also be a special discount for the people who have this conversation at the end of the banquet, which would make it a more difficult condition; I guess we should now all go and practise for that. So let me thank all the speakers again.