0:00:15 | so let me go back all the way to the first day: what is it that you didn't ask, whatever it is that you didn't say, or that maybe you only half said. |
---|
0:00:31 | now is a good opportunity, because it happens very often: questions come up, i don't know how fast, but then you come home and say, oh, i wish i had said this. so, alex, definitely, yes. it is one thing to say... |
---|
0:00:49 | sorry for saying something, but... |
---|
0:00:55 | well, to close the circle: basically, at the beginning of the meeting we learned a lot, well, i learned a lot, about what happens in people's brains and so forth. |
---|
0:01:11 | i think that those systems prove that we can understand language; well, i thought that's my take on it. |
---|
0:01:23 | but here at the end you're basically telling us that all that stuff we heard right after that talk is probably the wrong idea. |
---|
0:01:36 | so, whatever you buy into: why do we need to learn something from those systems, and how can we do that? |
---|
0:01:45 | so you have two choices. one choice you have is to take the biological system, models and all, as an existence proof. |
---|
0:01:58 | the other is to think about the computation and build a model that way. the models don't have to be the same; there are certainly many paths. |
---|
0:02:07 | and i think learning, for instance, is a very bright avenue; i pay some attention to that. speech owes a lot, a lot, to the auditory system, to the extent that we understand it, and we understand part of it. |
---|
0:02:23 | so i think there are two avenues of information to mine: one is physiology, the other is computation. |
---|
0:02:33 | but still, the models also suggest what kind of evidence we can take advantage of. |
---|
0:02:41 | so i have a couple of comments on this question too. you know, regarding monday and tuesday, the low-resource talks, and we had some zero-resource talks and that sort of thing: it turns out, when you actually start removing supervision from the system, the things that actually allow you to discover units of speech automatically are not the same features that we use for supervised processing, and not the same models that we use for supervised processing. |
---|
0:03:10 | and so somehow, it is the case that, i think, a lot of people might not be interested in that extreme of research, because it might not always be practical from the point of view of what you can sell and so forth. but i think that style of work, where you're forced to connect yourself to something like, for example, what emmanuel was talking about with human language acquisition, and make something consistent between those things, can send you to new classes of models and new representations that you're forced into, and that, i think, could eventually be fed back into the supervised case. |
---|
0:03:49 | i'm glad that you also went back to the early days, like monday and tuesday, which were more full of optimism, and not to thursday, where we all got a bit of a cold shower. |
---|
0:04:01 | i'd like to remind you that indeed, i think, the community is diving into new types of models, for better or worse, because of course whenever you start some new paradigm and everybody suddenly chimes in, you may also get discouraged. but these nonlinear systems, these so-called neural networks or whatever, are very good at letting you construct all kinds of architectures, highly parallel architectures. |
---|
0:04:32 | we have to think about new models: maximum likelihood is gone, right, and so on, and i think there is plenty of work to do there. if i may speak for myself, i'm a big believer in highly parallel systems, where there are many views of speech being provided, |
---|
0:04:54 | and then the big issue is how do you pick the most appropriate one, the one which might be appropriate for the situation. so adaptation not by adapting the parameters of the model, but by picking the right processing stream, very much along those lines. i was quite impressed |
---|
0:05:14 | by what chris was telling us: that when he added a lot of noise, of course many neurons were no longer good, but the ones that were good were still very good. so essentially, speaking for myself, my view is that the system should be highly parallel, |
---|
0:05:33 | trained on whatever data are available, but not as one global model: many parallel models, possibly different and independent models, and then the big issue is how you pick a good one. so this is one direction i'm thinking about; i don't know what other people think about it. |
---|
0:05:52 | but i think that there is a whole new area of research, and a whole possibility for new paradigms, coming. i mean, that's what we've seen over the past few years with the reinvention, or rediscovery, of alternatives to gmm models. |
---|
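The stream-selection idea above — many parallel models, pick the one that suits the situation — can be sketched minimally. This is only an illustrative toy, not anything a panelist built; choosing the stream whose frame posteriors have the lowest entropy is one assumed confidence heuristic among many.

```python
import math

def entropy(posterior):
    """Shannon entropy of a posterior distribution (lower = more confident)."""
    return -sum(p * math.log(p) for p in posterior if p > 0)

def pick_stream(stream_posteriors):
    """Return the index of the parallel stream whose posterior is most
    confident, i.e. has the lowest entropy."""
    return min(range(len(stream_posteriors)),
               key=lambda i: entropy(stream_posteriors[i]))

# Three hypothetical parallel processing streams scoring the same frame:
streams = [
    [0.40, 0.30, 0.30],   # noisy, uncertain stream
    [0.90, 0.05, 0.05],   # clean, confident stream
    [0.50, 0.25, 0.25],
]
print(pick_stream(streams))  # selects stream 1, the confident one
```

In a real system the selector itself would have to be trained, since low entropy does not guarantee correctness; the sketch only shows the "pick the best stream per situation" bookkeeping.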
0:06:14 | i didn't mean to speak; i mean, i just want to give you some space for thinking about what you want to say or what you want to ask. |
---|
0:06:33 | so i would just like to ask a question about the possible eventual death of the field, at some point in the future. |
---|
0:06:42 | it has happened before. i mean, i'm not old enough to have seen this, but for example for coding it happened: after a strong technology transfer it became a much more established research field. it didn't die, but it shrank terribly, right? |
---|
0:06:58 | and this will happen one day with automatic speech recognition: we will have some stable methods, and then there won't be that many things to research. this is going to happen once standards are applied. |
---|
0:07:08 | and i was wondering, how much time do we have? because we are already seeing a very strong trend, and there's a lot of investment by all the major technology players in the market. |
---|
0:07:21 | so are we close to really solving it? and i don't mean solving semantic context, that's another question. but are we close to setting some standards, and then it's done? because then what would we do research on? how close are we? ten years, twenty years? because that would be about my career, right? maybe that's a side effect. |
---|
0:07:48 | i've lived, yes, through that... i think people would say that's good. |
---|
0:08:00 | this is a very special question for your funding sources. we can all hope that there is going to be more to do, and that we will stick with it. i tell my students, when they come, that if they're getting into speech recognition they are safe for life; that was my experience. |
---|
0:08:24 | somehow i think comparing speech coding to speech recognition just doesn't fly at all. i mean, speech coding, unless you're going to try for that utopia of three hundred bits per second, which then requires synthesis coding, there's just no comparison. |
---|
0:08:48 | it was very straightforward, and eventually, yes, standards were set and the field... i think the same could be said about the coding of pictures: it is very trivial to code pictures. we have mp3, mpeg-4; it's all done. |
---|
0:09:10 | picture understanding, which is very much like what we are doing, is a different thing, still sort of open. |
---|
0:09:19 | i do think that the field is very far from that. but i think the field will kill itself if it assumes that it has the solutions, and then continues to plough through, just working the solutions that we have right now. |
---|
0:09:41 | all done. so one other thing that i would like to see happen, rather than sitting around and talking about what's wrong with the field, is possibly to construct certain experiments that could point to what's going on. |
---|
0:10:07 | just for example, when steve was talking before, i was thinking: so you have a mismatch in acoustics and you have a mismatch in language. try to fix one without the other, and see what the result is, where it falls. |
---|
0:10:29 | so i think that's a wonderful point. i want to remind people that john pierce was advising us to design clear experiments with clear answers, so that the science of speech can grow steadily, step by step, rather than on the rapture of computers and unproven theories. |
---|
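The "fix one mismatch without the other" experiment can be made concrete: score the four matched/mismatched combinations and compare the marginal damage of each factor. The WER numbers below are invented placeholders; only the analysis pattern is the point.

```python
# Hypothetical 2x2 factorial results: (acoustics, lm) -> word error rate.
# These numbers are made up; only the bookkeeping matters.
wer_grid = {
    ("matched",    "matched"):    0.10,
    ("matched",    "mismatched"): 0.18,
    ("mismatched", "matched"):    0.25,
    ("mismatched", "mismatched"): 0.40,
}

def marginal_effect(grid, factor):
    """Average WER increase when `factor` ('acoustics' or 'lm') goes from
    matched to mismatched, holding the other factor fixed."""
    deltas = []
    for other in ("matched", "mismatched"):
        if factor == "acoustics":
            deltas.append(grid[("mismatched", other)] - grid[("matched", other)])
        else:
            deltas.append(grid[(other, "mismatched")] - grid[(other, "matched")])
    return sum(deltas) / len(deltas)

print(round(marginal_effect(wer_grid, "acoustics"), 3))  # 0.185
print(round(marginal_effect(wer_grid, "lm"), 3))         # 0.115
```

With real numbers in the grid, this is exactly the kind of isolate-one-variable experiment being advocated: a prediction ("acoustic mismatch hurts more") that can come out wrong.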
0:10:52 | i have maybe a couple of observations. we talk about neural nets right now as an improvement, and i'm sure it's obviously an improvement, but it actually goes in the opposite direction of what we're all advising ourselves to do. that is, it does nothing about any independence assumption; it's just building a better gmm, which is the place where you said there wasn't a problem. |
---|
0:11:18 | it's not modeling dependence, except to the extent that we model longer feature sequences, which we tried to do with the gmms also. |
---|
0:11:28 | in terms of when we will solve it: well, obviously not in five years, but that doesn't mean never. |
---|
0:11:38 | so it would be nice if we could come up with the right model; obviously that would be the best answer. but i'm not sure that... speech coding and image coding, i don't believe they were solved by coming up with the right answer. i think they were solved by coming up with good-enough answers that wouldn't have been practical twenty-five years ago, because the computing was not enough to implement those solutions, but it is now. |
---|
0:12:09 | and so those fairly simple, fairly brute-force, expensive methods are now practical and work just well enough. |
---|
0:12:19 | so i think speech recognition could go the same way. it doesn't have to, you know: if someone very smart picks the right answer, that's great. but if you look at how much we've improved over, say, the last twenty-five to fifty years, there's been a big improvement, say in twenty-five years, |
---|
0:12:43 | and if you imagine the improvement from twenty-five years ago to now, maybe two more times, and this grows exponentially, then fifty years from now i think we could say with almost absolute certainty that speech recognition will be completely solved to all intents and purposes. that is, it'll work for all the things you want to do, it'll work very well, it'll be fast, it'll be cheap, and there will be no more research in it, |
---|
0:13:13 | because you will have computers with, i don't know what the right term is, ten to the ninth memory, and computation, you know, ten to the fifteenth computation, and you'll have modeled all those differences by brute force. it still would never work to train on one thing and then test on another, but you won't have to: you will have trained on everything, you will have trained on samples of everything, so that it just works. |
---|
0:13:49 | so the doom and gloom doesn't have to work out that way; it would just be nicer to find a more elegant solution sooner. |
---|
0:13:55 | there is also a positive side there, just for fun: there are probably a few more data people in this room, but actually a good point: there are ten to the ninth or so neurons in auditory cortex, so those ten to the ninths must be tuned. |
---|
0:14:12 | so ten to the ninth is one way of solving the problem, and maybe it is the right way to go. |
---|
0:14:19 | i think there is another aspect that's missing, which is that we are looking at speech recognition as just, you know, an acoustic signal and your model of it. |
---|
0:14:32 | i think we need to bring in the context, and we are moving towards that future, where the device knows about the context, about your personality. personalisation, all these things, should be incorporated into whatever model, and that will resolve some of the ambiguities that you have if you are just looking at the acoustics. that's another, you know, feature of it. |
---|
0:15:02 | actually, i would also like to continue on what she was telling us: there isn't only one solution to speech recognition, there are many, right? i mean, just like there are many cars and many bicycles and many whatnot: we need a solution to a problem. |
---|
0:15:21 | and of course, what we keep thinking about all the time is that we will find the one solution. i think it's okay to find many other, many smaller solutions. there is no question in my mind that recognition has made enormous progress; i mean, even i use it here and there, for instance google voice, and that is already quite something. |
---|
0:15:40 | so google voice is a good example, since we have google here: the solution came to the point where it's becoming useful, just like a car is useful. do we all agree that a car is not the ideal way of moving people from one place to another? but it works, to some extent. so maybe we should also think not only about the one solution, but about many solutions. |
---|
0:16:10 | i was going to say that... and this relates to the point about data. one thing we see is that our models, language models and acoustic models, are of a particular size. |
---|
0:16:32 | and in that sense, what you said is also somewhat related: you were kind of suggesting ensembles of classifiers, and there was the suggestion of personalisation. they estimate well because... we also know that if i build a model just for you, an acoustic model just for you and a language model just for you, it really works well. |
---|
0:16:55 | and maybe it's not the most elegant solution, but given enough data and enough context and enough computational resources, it works really well. and i think you'll see a lot of work in that direction. the price you'll have to pay is that you have to let whoever is building the recognizer for you, google or microsoft or whatever, access your data. |
---|
0:17:22 | and without that, you will have to live with a speaker-independent and then a context-independent system, which might be good, but not as good as it can be. |
---|
0:17:30 | or you may also provide the means for the user to modify the technology in such a way that it works best for that given user and a given task, right? you don't necessarily have to rely on the big brother, whoever it is, working for me. thanks, but if you provide the technology |
---|
0:17:46 | in an adaptable way, just like actually most of the technology which we are using: think about the car. i mean, you know, you can drive it fast, you can drive it slow, you can drive it crazy, you can drive it safely, and it's a little bit up to you. the technology basically was provided in such a way that the user can adapt it to his needs. |
---|
0:18:03 | so i think that is one way. the other way is that we keep trying to build this big, huge model which will encompass everything. i'm more a believer in many parallel models, very much along the lines of human perception in general, because wherever you look in sensory perception, you typically find many channels, each of them looking at the problem in a different way. |
---|
0:18:34 | and of course, what is available to us is to pick the best one at any given time. but, well, perhaps, you know, i don't want to push only the particular direction which i'm thinking about. |
---|
0:18:48 | my belief is that just building one solution for everything is maybe not the best way of going about it. |
---|
0:18:59 | so i just wanted to say that the world is a dramatically different place now than it was in nineteen-whatever. |
---|
0:19:10 | and the constraints that drove the current sort of formalism, they don't exist anymore. and i think, as was said here, and i agree, that if somebody didn't know anything about the way we do this, and they started afresh and thought about it in the current context, it would be remarkable if that person came up with the formalism that we do have now. |
---|
0:19:42 | and i think that we should spend more time, i don't know what we should do, but i certainly will, thinking about how to do this in a different way, given what we have and what we know about the brain. i mean, it's remarkable how much more we know about humans now. |
---|
0:20:15 | just a comment concerning the speaker-dependent stuff: it seems, yes... but it's not really solving the problem. i mean, you can make a really very good speaker-dependent model, but then the person, i don't know, switches the microphone and you are lost again, or he's calling over some obscure digital coding which is completely clear to human beings, but because of some strange digital artifacts your whole algorithm breaks again. |
---|
0:20:41 | so this is, i think, for the people who are, i mean, trying to get business in a completely speaker-dependent environment. and i assume that for the people who are in, i don't know, an environment which is completely speaker-independent, it must be kind of the power of these systems, you know, because you have a huge amount of data which is speaker-dependent. |
---|
0:21:01 | but it's not really solving the problem; it's masking the problem. we can bring down our error rate and everything, obviously, because you can train to the speaker, but it's not really the solution that you're looking for. |
---|
0:21:12 | that was just a comment. and then also, somehow, my intuition or feeling is that i just know that if i understand what the people are talking about, it is easier for me to perform speech recognition. so it has to do something with semantics, and it has to do something with intelligence, and i don't know how to say it, but this is just a kind of intuition. |
---|
0:21:43 | i have a comment about the semantics. my perception is that in many groups, i mean many companies, not so low-resource ones, they tend to treat the recognizer as a black box, and the semantic models are built on top of it. |
---|
0:22:01 | maybe they do a little bit of counting, or maybe loose phonetic matching, just in case the recognizer makes a mistake. and that's okay to get something up and running, but i think that's a stupid mistake: the semantics and the recognition should be closer together. |
---|
0:22:24 | i have to say it's difficult to convince some of the people doing semantics, people that don't have any speech background, that things could be done differently. but i believe there would be influence back and forth. |
---|
0:22:51 | it was mentioned that someone starting fresh wouldn't start with the approach we use, and that's probably really true. |
---|
0:23:01 | one way you hear it is when someone new comes in and goes into it at once: now we apply all the adaptation, the speaker adaptation, all the compensation, the development features, now neural networks... someone has that right? it's just not going to work right out of the box, and you cannot compensate with thousands of hours of data for something that is currently broken. |
---|
0:23:37 | the renaissance of neural networks... so morgan kept using neural networks in their hybrid formalism because nobody, you know, was that interested, because all the other things were working so well, and why would anyone in their right mind change, right? |
---|
0:24:04 | but then all of a sudden, we're back, you know, we're back in this zone where people are doing it. so all i'm saying is that the lesson i take from that is: if you can get something that makes sense, and that is demonstrated to be really good on a small problem, well then, maybe that would be pretty compelling. |
---|
0:24:28 | i mean, i agree with you, though: such successes are pretty rare. you know, if i have something, am i going to say, well, we've been thinking about this for forty years, you know, exactly... |
---|
0:24:42 | we all know thirty six |
---|
0:24:44 | and maybe what we should do, rather, is design experiments where we don't just say: i will show you, on a state-of-the-art system, that my method works a little bit better, |
---|
0:24:55 | because that in itself is not really very scientific, is it? i mean, a scientific experiment is one where you isolate one problem, and you try to change the conditions and see if things go up or things go down. it is a well-designed experiment if you get worse and you predicted it would be worse, given your hypotheses. |
---|
0:25:14 | i think you know what i mean. we almost never report results like that, because our belief is that the only way to convince our peers that what you are doing is useful is to get as low a word error rate as possible, on a state-of-the-art system, with the currently accepted task, whatever it is at the moment. |
---|
0:25:38 | so, designing good experiments: again, going back, and i mean it seriously, to john pierce: design clear, definite experiments, so that science can grow step by step by step. |
---|
0:25:50 | i think that we have to learn how to do that. and since you mentioned neural networks, i want to share with you my personal experience. |
---|
0:25:58 | steve is here, and he may not even remember, but a long time ago, when he was a postdoc at icsi, he ran an experiment where he had a context-independent hmm model, for context-independent phonemes, and a neural-net model, and the neural-net model was doing twice as good as the hmm. |
---|
0:26:20 | and that convinced me. i mean, you know, we stuck with neural nets throughout the dark ages of neural nets, partially because we didn't have the expertise in hmms and lvcsr, but also partially because i truly believed in them, because that was an experiment which was very convincing to me: if i have a simple gmm model, |
---|
0:26:39 | without any context-dependency, i mean a context-independent hmm model, which was the only kind of system we knew how to build at the time, and the neural net is doing twice as good as the hmm, why wouldn't i stick with this neural-net-like model? i'm glad that we did. |
---|
0:27:00 | i don't know, steve, if you remember this experiment, but i think it actually got published, even in transactions, eventually, right? |
---|
0:27:10 | you know, one other way you can get out of a local optimum is to change the evaluation criteria, right? and i think that's in part what mary has done with the babel program: you know, you have keyword search as the task, and atwv, well, instead of word error rate. it's not always perfect. |
---|
0:27:26 | and i think another thing that people, it seems to me, really ought to be reporting, when you report a word error rate, is not just the mean word error rate but the variance across the utterances. because you can have a five percent word error rate, but if a quarter of your utterances are at, essentially, you know, eighty percent word error rate, which can happen, then, you know, that's a good way to start figuring out how to get your technology a little more reliable. |
---|
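Reporting the per-utterance spread of WER, not just the corpus mean, is easy bookkeeping. A minimal sketch (the function names are mine, not a standard tool):

```python
import statistics

def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edits turning the first i reference words into the
    # first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

def wer_report(pairs):
    """Mean, population variance, and worst case of per-utterance WER."""
    rates = [wer(ref, hyp) for ref, hyp in pairs]
    return statistics.mean(rates), statistics.pvariance(rates), max(rates)

pairs = [("the cat sat", "the cat sat"),
         ("hello there", "hollow here")]
mean_wer, var_wer, worst = wer_report(pairs)
print(mean_wer, var_wer, worst)  # 0.5 0.25 1.0
```

A high variance or a heavy tail in the per-utterance rates is exactly the "quarter of your utterances at eighty percent" failure mode that a single mean hides.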
0:27:51 | i was hoping you would have a comment. |
---|
0:27:56 | i feel... i feel obligated to talk about ancient history, since i'm getting a little older now. |
---|
0:28:05 | i remember when hmms started, and we were certainly not the first to use them; we were sort of in the middle of that previous revolution. |
---|
0:28:17 | there were two big criticisms of hmms relative to the previous method. the previous method was: just write the rules, because we all know about speech, so just say how it works. and i wrote systems like that back in the early seventies, because i was a late adopter of hmms. |
---|
0:28:37 | those systems were very simple, easy to understand, extremely fast, and needed no training data. that sounds nice, right? and they could do very well on simple problems, without training data. and the government argued, and other people argued, and sometimes we argued, that hmms were too complicated, required too much storage, too much training, too much memory, and would never be practical. |
---|
0:29:09 | well, obviously things changed, and it wasn't only computing power that was a big factor; it was also learning how to make it all more efficient. and we did a combination of all of those: not being so rigid as to say we have to do it with zero data and just what i learned in my acoustic phonetics class. we could use data; more data always helped; learning to do speaker adaptation rather than speaker-dependent models. |
---|
0:29:42 | okay, neural nets. neural nets worked on simple problems, but not on more complicated problems. |
---|
0:29:50 | and what was needed... i'd say the reason it works now is because we can now do it. you know, two or three years ago, the things that were working were requiring two months of computation, which is just, you know, unacceptable, completely unacceptable. some bold people did that, and that's great, and then they figured out how to get better computers. |
---|
0:30:12 | that all of this argues that each revolution which happens that at twenty five years |
---|
0:30:18 | cycle |
---|
0:30:20 | is the realisation that all of the intelligent things that we thought we knew |
---|
0:30:27 | can stevens would tell us what happens with formant frequencies and i learned all those |
---|
0:30:31 | things all of those were not the way to go the real understanding was not |
---|
0:30:36 | the way to go with bothered us because we'd like to think about |
---|
0:30:42 | we like to think about you know the them phonemes and things like that |
---|
0:30:48 | but we know that phonemes are abstractions |
---|
0:30:51 | we know that formants are an oversimplification |
---|
0:30:54 | everything that we learn is an oversimplification and computers are just simply more powerful than |
---|
0:31:00 | we are |
---|
0:31:02 | than anything we can write they're not more powerful than the brain but
---|
0:31:06 | they're more powerful than anything that we can write in a program
---|
0:31:10 | so i think |
---|
0:31:12 | that would argue against |
---|
0:31:16 | i'm not saying that you shouldn't keep trying to find the
---|
0:31:21 | right answer but i think history has told us that the right answer is to think
---|
0:31:26 | about more efficient ways
---|
0:31:29 | both you know computing has increased by a factor of a thousand in the
---|
0:31:33 | last twenty five years speed memory and storage and it will increase by a
---|
0:31:37 | factor of a thousand every twenty five years forever
---|
0:31:41 | and that's a big number in fifty years |
---|
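the factor-of-a-thousand claim is easy to sanity check; a minimal sketch of the arithmetic (the doubling-time conversion is my own calculation, not from the discussion):

```python
import math

# a factor of 1000 per 25 years; compounding gives 1000^2 over 50 years
def growth_factor(years, factor=1000.0, period=25.0):
    """Total growth after `years`, given `factor` growth per `period` years."""
    return factor ** (years / period)

# implied doubling time: 25 / log2(1000) years, noticeably slower than the
# eighteen-month doubling quoted for moore's law later in the discussion
doubling_time = 25.0 / math.log2(1000.0)

print(growth_factor(50))        # 1000000.0 -- "a big number in fifty years"
print(round(doubling_time, 2))  # 2.51 years per doubling
```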
0:31:46 | but at the same time we can think about algorithms that are a thousand times |
---|
0:31:51 | more efficient |
---|
0:31:52 | that has happened and it will happen
---|
0:31:57 | you know some collect lots of data other people can collect parts of
---|
0:32:01 | data i think it will happen that we will have corpora that include the speech
---|
0:32:06 | of millions of people from
---|
0:32:09 | hundreds of languages in hundreds of environments |
---|
0:32:15 | and if you just imagine let's just posit that it was
---|
0:32:21 | simple and easy to collect millions of hours from all these environments and memorise all
---|
0:32:26 | of it and learn what to do with it and compute it store it all
---|
0:32:29 | in something that fits in you know the chip that's embedded in
---|
0:32:34 | your hand or something or in your head
---|
0:32:39 | well it just works you don't know why or how it works but it
---|
0:32:42 | works |
---|
0:32:44 | so i |
---|
0:32:46 | while i have the same desire to understand |
---|
0:32:53 | intellectually what's going on i would accept almost anything as the solution
---|
0:32:58 | that eventually works
---|
0:33:04 | so i'd like to argue the other side
---|
0:33:07 | and the other side is if you look at the history of science |
---|
0:33:10 | what's happened is |
---|
0:33:11 | our truly
---|
0:33:13 | stupendous advances have come from understanding where
---|
0:33:18 | our current models don't work
---|
0:33:20 | it's not |
---|
0:33:21 | that we shouldn't try to push models |
---|
0:33:23 | but the thing that you're describing is
---|
0:33:26 | engineering
---|
0:33:27 | i'm a fan of engineering but true understanding comes from looking at the places where our
---|
0:33:33 | current models fail
---|
0:33:35 | and all of the things that we've been doing for the past twenty years are |
---|
0:33:39 | data |
---|
0:33:40 | for the next |
---|
0:33:42 | and we should be paying attention to where we fail |
---|
0:33:45 | and that's where we're gonna find the success |
---|
0:33:49 | so a |
---|
0:33:51 | want to add to it a little bit
---|
0:33:54 | it seems like this is a line of thought which we always follow
---|
0:34:00 | a |
---|
0:34:01 | the old story is if you take |
---|
0:34:04 | an infinite number of monkeys and give them |
---|
0:34:07 | infinite number of typewriters eventually one will write shakespeare
---|
0:34:11 | and i think that's what you're suggesting |
---|
0:34:14 | you have a few problems number one
---|
0:34:18 | moore's law
---|
0:34:20 | pretty much came to an end
---|
0:34:23 | and the industry is facing the same problem unless there is a dramatic
---|
0:34:28 | technological shift
---|
0:34:30 | you're not going to get
---|
0:34:33 | the kind of doubling that we've seen every eighteen months |
---|
0:34:37 | in the future |
---|
0:34:38 | basically quantum mechanics eventually gets in your way
---|
0:34:43 | the line widths are so narrow now that there are not that many atoms left
---|
0:34:48 | to allow for them to continue to shrink
---|
0:34:53 | somebody else said something about
---|
0:34:56 | well what would happen if people started
---|
0:35:00 | doing this research all over again would we find the same solution
---|
0:35:05 | i'm reading now a marvellous book called design in nature which tries to explain evolution
---|
0:35:12 | not just of humans but rivers and everything else in terms of |
---|
0:35:18 | physical laws |
---|
0:35:19 | i highly suggest reading it it's very entertaining but basically
---|
0:35:24 | and then going back to coding i think when the coding work was done
---|
0:35:28 | it really was fundamental in the sense that we understood
---|
0:35:33 | that pitch and spectrum were the essence so for example the coding that works on
---|
0:35:40 | your cell phone which is really meant to code speech if there is music in the
---|
0:35:45 | background it totally destroys it because it really is adapted to the speech signal
---|
0:35:51 | so it wasn't just a random brute force process it really depended on first lpc then
---|
0:35:59 | on coding the residual and all of that and that's why we have
---|
0:36:03 | such good coders and i think
---|
0:36:06 | a |
---|
0:36:07 | the theory behind that was of course much more trivial than it is in
---|
0:36:12 | language
---|
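the lpc-then-residual pipeline credited above can be sketched in a few lines; this is a toy illustration (the signal, model order, and energy threshold are my own choices, not any production coder):

```python
import math

def autocorr(x, lag):
    # autocorrelation of the frame at a given lag
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))

def lpc(x, order):
    """Levinson-Durbin recursion: autocorrelation -> all-pole predictor coefficients."""
    r = [autocorr(x, k) for k in range(order + 1)]
    a, err = [0.0] * (order + 1), r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff
        a_new = a[:]
        a_new[i] = k
        for j in range(1, i):
            a_new[j] = a[j] - k * a[i - j]
        a, err = a_new, err * (1.0 - k * k)
    return a[1:]

# toy "voiced" frame: a decaying resonance, loosely speech-like
frame = [math.exp(-0.01 * n) * math.sin(0.3 * n) for n in range(200)]
order = 4
coeffs = lpc(frame, order)

# residual = what is left after inverse filtering with the predictor;
# coders of the kind discussed transmit the coefficients plus this residual
residual = [frame[n] - sum(coeffs[j - 1] * frame[n - j] for j in range(1, order + 1))
            for n in range(order, len(frame))]

energy = lambda s: sum(v * v for v in s)
print(energy(residual) < 0.1 * energy(frame))  # True: the all-pole model captures most of the frame
```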
0:36:13 | so i do think that |
---|
0:36:15 | we need to continue the work that we're doing but on the other hand look
---|
0:36:21 | for some paradigm shifts that would be more than just increasing
---|
0:36:27 | the stochastic ability by introducing neural nets and
---|
0:36:34 | from where i sit a thousand miles away neural nets essentially are a generalization
---|
0:36:40 | of hmms they're both stochastic models it's just that in an hmm you have essentially a
---|
0:36:47 | single layer
---|
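the single-layer contrast can be made concrete with a toy frame classifier; everything here (shapes, weights, the four "states") is invented for illustration:

```python
import math, random

random.seed(0)

def softmax(z):
    # numerically stable softmax over a list of scores
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def affine(x, w, b):
    # one linear layer: matrix-vector product plus bias
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

x = [0.2, -1.0, 0.5]  # one acoustic frame, 3 features

# "single layer": features mapped straight to posteriors over 4 states,
# roughly the single level of modeling attributed to the hmm above
w1 = [[random.gauss(0, 1) for _ in x] for _ in range(4)]
shallow = softmax(affine(x, w1, [0.0] * 4))

# "deep": the same map with a relu hidden layer composed in between
wh = [[random.gauss(0, 1) for _ in x] for _ in range(8)]
h = [max(0.0, v) for v in affine(x, wh, [0.0] * 8)]
wo = [[random.gauss(0, 1) for _ in h] for _ in range(4)]
deep = softmax(affine(h, wo, [0.0] * 4))

# both are valid distributions over the same 4 states; only the depth differs
print(abs(sum(shallow) - 1.0) < 1e-9, abs(sum(deep) - 1.0) < 1e-9)  # True True
```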
0:37:00 | so i think the point about how much data we need to solve the
---|
0:37:03 | problem by brute force comes down also to the question of |
---|
0:37:07 | artificial intelligence right |
---|
0:37:08 | so to continue with these doomsday scenarios one even scarier is that one day we're
---|
0:37:14 | going to reach the singularity right
---|
0:37:17 | and so this process whether it has happened or is on the way at that moment
---|
0:37:21 | we're going to lose control of abstraction right machines are going to be better than
---|
0:37:25 | us at creating their own abstract structures so all this prior knowledge we want to
---|
0:37:30 | put into our models
---|
0:37:31 | is going to be our way of seeing things but machines are going to have
---|
0:37:35 | their way of seeing things
---|
0:37:36 | and when there are discussions saying
---|
0:37:39 | that we have to look at the problem and think like humans
---|
0:37:42 | i think well
---|
0:37:43 | it is already happening that machines create their own abstractions and they are
---|
0:37:47 | not intuitive to us but since they are going to do better than
---|
0:37:50 | us in the long term we might be better off just thinking you
---|
0:37:54 | know how the machine thinks about it not how i like to think
---|
0:37:57 | about this
---|
0:37:58 | how i can express the problem okay you have a generative model that is intuitive
---|
0:38:02 | to me
---|
0:38:03 | maybe it should be intuitive to the machine |
---|
0:38:05 | or to the hardware right and deep neural networks
---|
0:38:08 | to some extent |
---|
0:38:09 | okay |
---|
0:38:11 | of course we are very far away from that singularity right but when we
---|
0:38:16 | reach that point maybe we'll be better off thinking
---|
0:38:20 | and i |
---|
0:38:33 | that we are really always looking under the light and basically after fifty years of
---|
0:38:38 | artificial intelligence we essentially have developed
---|
0:38:42 | tremendous methods for optimization and classification there is very little work on inference and logic
---|
0:38:50 | so i'm very glad the field is alive and well as i can see
---|
0:38:55 | from this discussion it really reminds me of one
---|
0:39:02 | of the first asru workshops and i also remember that even
---|
0:39:07 | in my introduction
---|
0:39:08 | where people were discussing and fighting and there was always the desire to move the field further
---|
0:39:15 | and i'm very happy that i think we succeeded to a large extent in
---|
0:39:19 | this asru too so let's just keep it going i think otherwise i will
---|
0:39:24 | i will pass the microphone to honza who has
---|
0:39:28 | some
---|
0:39:29 | things to say about whether it is time for the poster room or
---|
0:39:33 | basically first i want to make one comment on this discussion i think
---|
0:39:38 | what we were discussing the data the models the adequacy of models measured by
---|
0:39:43 | wer
---|
0:39:43 | i think well it turned a little bit speech centric
---|
0:39:47 | so a little bit too selfish i find so i think we forgot about the
---|
0:39:52 | users of our technologies because i have the impression
---|
0:39:55 | that well rarely would people just ultimately use the output of asr and
---|
0:40:00 | say this is the output and there it finishes most of the time it is
---|
0:40:04 | just some intermediate product that would be further used by someone so actually
---|
0:40:08 | i like the way the previous speaker was speaking about that well for
---|
0:40:13 | some the ultimate metric would not be the wer but the click through
---|
0:40:16 | rate for call center traffic it might be the customer satisfaction so
---|
0:40:21 | they have measures for it
---|
0:40:23 | for a government agency it might be the number of caught
---|
0:40:27 | bad guys
---|
0:40:28 | and so on and so on so i think actually there is still quite some |
---|
0:40:31 | work to do in propagating these target metrics |
---|
0:40:34 | back to our field i don't know if there was sufficient work
---|
0:40:38 | on this maybe they are not really interested
---|
0:40:41 | in wer and stuff like this they just need to get
---|
0:40:46 | their work done
---|
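for reference, the wer being contrasted with those downstream metrics is just word-level edit distance normalized by reference length; a minimal sketch (the example strings are made up):

```python
def wer(reference, hypothesis):
    """Word error rate = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat"))     # 0.0
print(wer("the cat sat", "a cat sat down"))  # 2/3: one substitution, one insertion
```

a downstream consumer (search, a call center, an agency) would plug its own target metric in place of this one, which is exactly the propagation problem raised above.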
0:40:51 | okay so quickly sorry i didn't mean that to be the
---|
0:40:55 | final technical comment indeed so no
---|
0:41:01 | no comments on this
---|
0:41:02 | one
---|
0:41:04 | last
---|