0:00:13 | bush water on |
---|
0:00:15 | don't ask me to carry in one |
---|
0:00:19 | but okay no shot and so uh but and for uh |
---|
0:00:24 | three sessions |
---|
0:00:26 | um |
---|
0:00:27 | for |
---|
0:00:28 | after lunchtime |
---|
0:00:30 | uh i think we were |
---|
0:00:32 | there |
---|
0:00:32 | until about five to six o'clock |
---|
0:00:36 | and family |
---|
0:00:37 | and took pictures of uh of the poster sessions |
---|
0:00:40 | and the sessions that had |
---|
0:00:43 | a few people at the end |
---|
0:00:45 | in speech and language processing so people in speech and language |
---|
0:00:50 | decided to kind of stay to the end yeah so |
---|
0:00:52 | thank you very much for coming to it |
---|
0:00:55 | so uh uh i'll just go |
---|
0:00:56 | into it |
---|
0:00:57 | so uh i wanted first to |
---|
0:00:59 | say just a couple of words |
---|
0:01:01 | about the speech and language technical committee |
---|
0:01:04 | most of you know that a conference like icassp has more than ten |
---|
0:01:09 | technical committees |
---|
0:01:11 | and the speech and language technical committee |
---|
0:01:12 | is the largest with fifty three members |
---|
0:01:15 | uh a a for the notion of a should |
---|
0:01:18 | uh because we have a large number of papers submitted at icassp so |
---|
0:01:23 | still |
---|
0:01:24 | uh the number of submitted papers |
---|
0:01:27 | uh dropped over the last |
---|
0:01:31 | couple of years partly because |
---|
0:01:33 | we have a separate conference interspeech |
---|
0:01:36 | that focuses on speech and language processing and has been increasing in a rather significant way |
---|
0:01:42 | a |
---|
0:01:43 | uh |
---|
0:01:43 | the paper try since you can can |
---|
0:01:45 | and |
---|
0:01:46 | initial |
---|
0:01:47 | we have about a third of the papers that are accepted at icassp |
---|
0:01:52 | in the speech and language processing field and so |
---|
0:01:54 | uh that is a lot to cover |
---|
0:01:58 | uh about seven hundred submissions and around four hundred accepted papers |
---|
0:02:03 | and if we spent even five seconds on each one |
---|
0:02:06 | a session of those would take thirty minutes by itself |
---|
0:02:11 | um |
---|
0:02:12 | but we decided not to do that |
---|
0:02:14 | okay |
---|
0:02:14 | uh that's true |
---|
0:02:16 | a uh i was spent |
---|
0:02:19 | a under a giant |
---|
0:02:22 | a a but uh |
---|
0:02:24 | a a a a from a a a a has a big impact of interest |
---|
0:02:28 | was not able to attend a of the input here so |
---|
0:02:32 | a lot of uh chin folks kinda again |
---|
0:02:35 | um we encourage questions though we're going to have to kind of try to cover a lot so |
---|
0:02:40 | uh uh we have tried to make sure that we have a microphone as well so |
---|
0:02:45 | uh i think |
---|
0:02:46 | german one going for |
---|
0:02:48 | okay |
---|
0:02:50 | um i'll go very quickly so if anything |
---|
0:02:54 | i say |
---|
0:02:55 | um if anything i said is wrong or i missed anything please point it out |
---|
0:02:58 | and i you need three |
---|
0:03:00 | to Q is set |
---|
0:03:01 | just |
---|
0:03:01 | i'm i |
---|
0:03:03 | why is that |
---|
0:03:04 | uh because there were over three hundred papers it is really hard |
---|
0:03:07 | uh to go through all of them and summarize each one |
---|
0:03:10 | section if die on it up now of a see that yeah |
---|
0:03:13 | um B |
---|
0:03:16 | um this is |
---|
0:03:17 | uh thanks to a couple of people that helped with this talk to try to do a good job of it |
---|
0:03:21 | um so there is three hundred twenty five papers uh two hundred and fifty of them are on speech |
---|
0:03:26 | and seventy five of them are on language processing |
---|
0:03:29 | oh according to how the conference |
---|
0:03:31 | assigned them to different sessions in some cases it is arguable since some papers |
---|
0:03:34 | could be in both |
---|
0:03:36 | um |
---|
0:03:37 | so they are split into two parts um i will cover around one hundred of them |
---|
0:03:42 | uh the um |
---|
0:03:43 | uh the papers in language processing and uh in speech processing but not tts and |
---|
0:03:48 | asr |
---|
0:03:49 | and that will also cover speaker id uh including |
---|
0:03:54 | speaker verification and recognition and speaker |
---|
0:03:58 | diarization |
---|
0:03:59 | and uh i will also touch on speech enhancement |
---|
0:04:03 | a little bit |
---|
0:04:05 | and so first is language modeling |
---|
0:04:07 | um on the top right the table shows the number of papers in each field of |
---|
0:04:11 | uh language processing |
---|
0:04:13 | so uh a couple of things worth mentioning here for example the |
---|
0:04:18 | um model m based exponential model um the class based and the neural network |
---|
0:04:23 | language models long span models and dynamic language model adaptation |
---|
0:04:28 | um discriminative models |
---|
0:04:31 | and uh i think there are many others these are just a couple |
---|
0:04:33 | and as i was going through the papers what is common in them |
---|
0:04:37 | um the um |
---|
0:04:39 | uh there are a couple of papers on computation |
---|
0:04:42 | um uh optimization how to |
---|
0:04:45 | train a language model on large scale data uh so distributed uh training fast |
---|
0:04:51 | recurrent neural network model training and how to manage long span models |
---|
0:04:54 | and there is a common data set people work on |
---|
0:04:58 | and uh spoken document processing here the tasks are document summarization classification and speaker |
---|
0:05:05 | role identification um and the approaches are typical of machine learning |
---|
0:05:09 | um uh and there is also emotion classification and |
---|
0:05:13 | uh translation and semantic classification and so on |
---|
0:05:17 | um there are two sessions worth of papers on this topic |
---|
0:05:20 | including um understanding and voice search though um different papers probably use different terms but they are |
---|
0:05:27 | uh focused on how to use uh um intent and queries for search |
---|
0:05:32 | and then and that there is a um |
---|
0:05:34 | there are um papers using dbns and i think you will see lots of papers on |
---|
0:05:38 | dbns used for language |
---|
0:05:40 | um uh understanding and for call routing |
---|
0:05:43 | um speech translation uh covering how you can tie |
---|
0:05:48 | speech recognition and |
---|
0:05:50 | um translation together and whether uh asr word accuracy probably is not a good metric |
---|
0:05:56 | for uh um for speech translation |
---|
0:05:59 | um bilingual audio uh subtitle extraction and there are |
---|
0:06:02 | many others i think |
---|
0:06:04 | that i probably did not list here |
---|
0:06:07 | um |
---|
0:06:08 | uh paralinguistic and non linguistic features |
---|
0:06:11 | uh these are very interesting use cases |
---|
0:06:14 | you could think of what you can do for example from speech you can do emotion detection |
---|
0:06:18 | um recognizing non lexical events and so on |
---|
0:06:21 | um cognitive load classification |
---|
0:06:25 | um trying to |
---|
0:06:26 | um um guess |
---|
0:06:28 | when there is someone talking to a computer trying to guess you know how much you |
---|
0:06:31 | are thinking |
---|
0:06:32 | um |
---|
0:06:34 | and perceptual differences for instance for language learning or speech assessment things like that |
---|
0:06:39 | um you generating a a traffic trucks pressure |
---|
0:06:41 | there are papers on this kind of topic |
---|
0:06:43 | um some of them are pretty new |
---|
0:06:46 | um spoken term detection |
---|
0:06:48 | it's trying to um now um you know given a huge uh |
---|
0:06:52 | audio file or a video trying to retrieve a list of |
---|
0:06:55 | spoken utterances given a voice query a spoken term which you just speak |
---|
0:07:01 | um the approaches are uh dynamic time warping subword recognition and lattice or graph based approaches |
---|
0:07:07 | um there's a common data set for this |
---|
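The dynamic time warping mentioned here can be sketched in a few lines. A toy version (not from any of the cited papers) that aligns a spoken-query feature sequence against an utterance's feature sequence using Euclidean frame distance:

```python
# Toy dynamic time warping (DTW) for query-by-example spoken term
# detection. Assumes feature frames (e.g. MFCC vectors) are already
# extracted; frames are plain tuples of floats here.

def dtw_distance(query, utterance):
    """DTW alignment cost between two frame sequences (lower = closer)."""
    INF = float("inf")
    n, m = len(query), len(utterance)
    # cost[i][j] = best cost aligning query[:i] with utterance[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((a - b) ** 2
                    for a, b in zip(query[i - 1], utterance[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # stretch the query
                                 cost[i][j - 1],      # stretch the utterance
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Ranking utterances by this cost against the spoken query is the essence of query-by-example search; real systems add band constraints and score normalization.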
0:07:12 | and then dialogue um |
---|
0:07:13 | there are um for this there are not many |
---|
0:07:16 | for this conference there are only five papers |
---|
0:07:19 | well but i mean for dialogue um uh uh |
---|
0:07:22 | you know the trend is you know if you look back |
---|
0:07:25 | a couple of years ago probably there were not many approaches of the statistical kind |
---|
0:07:28 | now most papers focus on this statistical approach there are two |
---|
0:07:32 | sides |
---|
0:07:33 | one is you track a distribution over all the possible states |
---|
0:07:37 | the second part is when you put it all together it becomes a pomdp i think |
---|
0:07:42 | um uh there are several papers in the conference on pomdps |
---|
0:07:46 | um and so |
---|
0:07:47 | there's a list of them here specifically so we're not going to go through those |
---|
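The "distribution over all the possible states" idea is belief tracking; a minimal sketch of the Bayes update behind it, with illustrative goal names and probabilities (not taken from any paper):

```python
# Minimal belief-state update for dialog state tracking: keep a
# distribution over possible user goals and update it with each
# observation (e.g. an ASR hypothesis with a confidence score).
# The states and likelihoods below are illustrative only.

def update_belief(belief, likelihood):
    """Bayes update: posterior(s) is proportional to likelihood(obs|s) * prior(s)."""
    posterior = {s: likelihood.get(s, 0.0) * p for s, p in belief.items()}
    z = sum(posterior.values())
    if z == 0.0:
        return belief  # uninformative observation: keep the prior
    return {s: p / z for s, p in posterior.items()}

belief = {"flight": 0.5, "hotel": 0.5}   # uniform prior over user goals
obs = {"flight": 0.8, "hotel": 0.2}      # P(ASR result | goal), made up
belief = update_belief(belief, obs)      # belief["flight"] is now 0.8
```

In the POMDP view, the dialog policy then chooses actions as a function of this belief rather than of a single best state hypothesis.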
0:07:51 | um language identification um |
---|
0:07:54 | there are six uh papers in one session |
---|
0:07:58 | they try to use phonetic and prosodic features or a combination of them |
---|
0:08:02 | uh to identify the language and if you look at the approaches um how to do |
---|
0:08:07 | it it's uh you know um classification |
---|
0:08:10 | uh i can see a couple of papers on logistic regression on n grams |
---|
0:08:14 | and several of them work on the same data set |
---|
0:08:17 | all of this uh |
---|
0:08:19 | trying to guess the language being used |
---|
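A toy version of the logistic-regression-on-n-grams idea mentioned here, using character bigrams of text as a stand-in for the phone n-grams a real system would get from a recognizer (the training sentences and labels are made up):

```python
# Toy language ID: binary logistic regression over character bigram
# counts, trained with plain stochastic gradient descent.
import math
from collections import Counter

def bigrams(text):
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def train(samples, labels, epochs=200, lr=0.5):
    """samples: list of strings; labels: 1 or 0. Returns (weights, bias)."""
    feats = [bigrams(s) for s in samples]
    w, b = {}, 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = b + sum(w.get(g, 0.0) * c for g, c in f.items())
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p
            b += lr * err
            for g, c in f.items():
                w[g] = w.get(g, 0.0) + lr * err * c
    return w, b

def predict(model, text):
    w, b = model
    z = b + sum(w.get(g, 0.0) * c for g, c in bigrams(text).items())
    return 1 if z > 0 else 0

model = train(["the cat sat", "she said the", "der hund sass", "sie sagte der"],
              [1, 1, 0, 0])   # 1 = english, 0 = german (toy data)
```

A real system would use n-gram counts over phone sequences from a phone recognizer, one classifier per language pair or a multiclass model.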
0:08:21 | um lexicon modeling is trying to use machine learning to automatically generate the pronunciation from |
---|
0:08:28 | the given word |
---|
0:08:29 | um and there are a couple of new learning uh |
---|
0:08:33 | uh approaches introduced in that |
---|
0:08:35 | um multilingual and multichannel processing |
---|
0:08:38 | the task here is that you know you have mixed language input and how do you |
---|
0:08:42 | uh do the asr um and uh index and search and um what matching you can set up |
---|
0:08:48 | the approaches are very diverse so i did not list them here |
---|
0:08:51 | um |
---|
0:08:52 | speech analysis this is a field i really um don't know much about so i uh |
---|
0:08:55 | um |
---|
0:08:57 | just tried to cover the topics that were there like emotion detection |
---|
0:09:00 | um |
---|
0:09:01 | i you know on sing this level you kind to |
---|
0:09:03 | um |
---|
0:09:05 | it |
---|
0:09:05 | for example when emotion changes you can see the relationship between emotion and f zero range |
---|
0:09:10 | um |
---|
0:09:10 | uh emotion you know including detecting anger and so on and so forth |
---|
0:09:14 | and duration modeling and uh log f zero contours |
---|
0:09:18 | um um and pitch frequency estimation and so on |
---|
0:09:22 | um |
---|
0:09:23 | the approaches i see you know a couple of things probably um not new but there are the common classes of |
---|
0:09:28 | papers |
---|
0:09:29 | um for example uh the phase locked loops |
---|
0:09:33 | and there is a common data set on that |
---|
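The pitch (F0) estimation mentioned here is classically done by autocorrelation; a pure-Python toy (not the phase-locked-loop method from the papers) that recovers the F0 of a synthetic voiced frame:

```python
# Autocorrelation F0 estimation: pick the lag that maximizes the
# autocorrelation of one voiced frame, then convert lag to frequency.
import math

def estimate_f0(frame, sample_rate, f0_min=50.0, f0_max=500.0):
    """Return the F0 in Hz whose period best matches the frame."""
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return sample_rate / best_lag

# a 200 Hz sine sampled at 8 kHz; the estimator should recover ~200 Hz
sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
```

This simple picker is exactly what breaks down when music is mixed in, which is why robust pitch tracking keeps reappearing as a topic.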
0:09:35 | um |
---|
0:09:36 | as i said this is a field i don't know much about so if the session chairs are around or would |
---|
0:09:40 | know this better please comment on |
---|
0:09:42 | what's what's missing here that i didn't cover |
---|
0:09:45 | uh speech enhancement |
---|
0:09:46 | um |
---|
0:09:48 | um you know the task here is trying to |
---|
0:09:50 | separate speech versus non speech and noise |
---|
0:09:53 | um |
---|
0:09:55 | a there is |
---|
0:09:56 | okay |
---|
0:09:57 | um you can see it in the slides but uh there is i i think |
---|
0:10:00 | compared to previous conferences a bit more emphasis on music noise |
---|
0:10:05 | um |
---|
0:10:06 | there are many approaches here um |
---|
0:10:09 | you know |
---|
0:10:09 | some have hidden markov models in them uh |
---|
0:10:12 | and the well known approaches like uh wiener filtering and so on |
---|
0:10:17 | um i have a long list here i think i don't have time to go through them all |
---|
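The Wiener filtering mentioned above amounts to a per-frequency gain computed from estimated speech and noise power; a sketch with illustrative spectra (the numbers are made up):

```python
# Textbook Wiener-filter gain per frequency bin: gain = S / (S + N),
# where S and N are estimated speech and noise power spectra. Applying
# the gain to the noisy spectrum attenuates noise-dominated bins.

def wiener_gain(speech_power, noise_power):
    """Per-bin Wiener gain from estimated speech/noise power spectra."""
    return [s / (s + n) if (s + n) > 0 else 0.0
            for s, n in zip(speech_power, noise_power)]

def apply_gain(noisy_spectrum, gain):
    return [x * g for x, g in zip(noisy_spectrum, gain)]

speech_pow = [9.0, 4.0, 1.0]   # illustrative speech power per bin
noise_pow  = [1.0, 4.0, 9.0]   # illustrative noise power per bin
gain = wiener_gain(speech_pow, noise_pow)   # about [0.9, 0.5, 0.1]
```

The hard part in practice is estimating the speech and noise power spectra, which is where the HMM-based and model-based methods come in.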
0:10:22 | and uh |
---|
0:10:23 | speaker verification |
---|
0:10:24 | um overall there are forty eight papers on this topic including |
---|
0:10:29 | speaker diarization um and i think that is more than the previous two conferences probably one reason |
---|
0:10:36 | um i don't know may be the relevance to the nist speaker recognition evaluation |
---|
0:10:40 | um |
---|
0:10:42 | as i browsed through |
---|
0:10:44 | the papers there are |
---|
0:10:46 | a couple of things just |
---|
0:10:47 | just highlights i think |
---|
0:10:49 | it is very very hard to summarize |
---|
0:10:51 | um the i-vector space and uh probabilistic lda |
---|
0:10:54 | and uh the evaluation papers from nist are there |
---|
0:10:57 | and there is work on fusion that is if you have several uh speaker recognition systems |
---|
0:11:04 | how do you fuse the results |
---|
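The fusion question can be illustrated with the simplest scheme, a weighted sum of per-system scores; in practice the weights are trained on a development set (e.g. by logistic regression calibration), but here they are fixed for illustration:

```python
# Linear score fusion for combining several speaker recognition
# systems: the fused score for one trial is a weighted sum of the
# per-system scores, compared to a decision threshold.

def fuse(scores, weights, bias=0.0):
    """Weighted-sum fusion of per-system scores for one trial."""
    return bias + sum(w * s for w, s in zip(weights, scores))

# Three systems score the same trial; system 1 is trusted most.
trial_scores = [2.0, 0.5, -1.0]
weights = [0.6, 0.3, 0.1]
fused = fuse(trial_scores, weights)   # 1.25
decision = fused > 0.0                # accept if above the threshold
```

Because the component systems make partly independent errors, even this simple combination usually beats the best single system.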
0:11:05 | um |
---|
0:11:08 | okay the second one here speaker diarization is figuring out who |
---|
0:11:11 | spoke when in an audio stream or a meeting |
---|
0:11:14 | um |
---|
0:11:15 | yeah |
---|
0:11:16 | uh just a few things which i can summarize this into |
---|
0:11:17 | uh top down and bottom up clustering |
---|
0:11:20 | um |
---|
0:11:20 | how to uh use prosodic features or acoustic features and there is a bayesian approach |
---|
0:11:25 | um there is the information bottleneck based approach a couple of papers on that |
---|
0:11:29 | and new advances |
---|
0:11:30 | being seen in this field |
---|
0:11:32 | um robust asr i think in this uh |
---|
0:11:35 | um |
---|
0:11:36 | there are lots of papers here i'm sure i will miss something |
---|
0:11:40 | um so i put them into several categories the first one uh signal processing |
---|
0:11:44 | and in signal processing i think i see compressed sensing |
---|
0:11:46 | uh you can use compressed sensing on parts of the asr front end too |
---|
0:11:50 | um non negative matrix factorization |
---|
0:11:53 | um how to use uh transforms of the spectrum |
---|
0:11:57 | and there are other approaches i have a long list here |
---|
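The non-negative matrix factorization mentioned here is usually fit with the standard multiplicative updates; a small pure-Python sketch (illustrative, not from any specific paper), factoring a matrix V into nonnegative W and H so that V is approximately W times H:

```python
# NMF with the classic multiplicative updates, as used for decomposing
# a magnitude spectrogram V (bins x frames) into spectral bases W and
# activations H.
import random

def nmf(V, rank, iters=200, eps=1e-9):
    n, m = len(V), len(V[0])
    rnd = random.Random(0)          # fixed seed for reproducibility
    W = [[rnd.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rnd.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        WH = [[sum(W[i][k] * H[k][j] for k in range(rank))
               for j in range(m)] for i in range(n)]
        # H <- H * (W^T V) / (W^T W H)
        for k in range(rank):
            for j in range(m):
                num = sum(W[i][k] * V[i][j] for i in range(n))
                den = sum(W[i][k] * WH[i][j] for i in range(n)) + eps
                H[k][j] *= num / den
        WH = [[sum(W[i][k] * H[k][j] for k in range(rank))
               for j in range(m)] for i in range(n)]
        # W <- W * (V H^T) / (W H H^T)
        for i in range(n):
            for k in range(rank):
                num = sum(V[i][j] * H[k][j] for j in range(m))
                den = sum(WH[i][j] * H[k][j] for j in range(m)) + eps
                W[i][k] *= num / den
    return W, H
```

For enhancement or separation, some bases are associated with speech and others with noise, and the speech part is reconstructed from its bases and activations alone.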
0:12:00 | and |
---|
0:12:01 | and features so how to you know there are lots of feature extraction approaches uh tandem based and so on |
---|
0:12:06 | um |
---|
0:12:07 | there are say uh a couple of papers on how to use neural networks to generate the tandem |
---|
0:12:11 | features |
---|
0:12:12 | um |
---|
0:12:13 | logistically smart mapping |
---|
0:12:15 | and noise robust feature normalization |
---|
0:12:17 | uh and there are different models |
---|
0:12:19 | um |
---|
0:12:20 | um it is |
---|
0:12:22 | quite a diverse |
---|
0:12:23 | collection |
---|
0:12:24 | so i don't think we can cover it all |
---|
0:12:26 | um |
---|
0:12:27 | maybe after this i can put the slides somewhere if anyone wants to take a look |
---|
0:12:31 | um |
---|
0:12:32 | given at a wall and you know worked |
---|
0:12:41 | a i'll try to cover |
---|
0:12:42 | can everyone hear me |
---|
0:12:44 | so i'd like to cover all of the uh papers that were generally included in large vocabulary speech recognition and |
---|
0:12:51 | acoustic modeling and adaptation techniques |
---|
0:12:53 | um and there were a lot of them as you can see |
---|
0:12:56 | on the asr side |
---|
0:12:58 | and so we tried to split it in a manner that matched well with the sessions so let's first start |
---|
0:13:03 | with adaptation |
---|
0:13:04 | the problem here is basically to say how well can you adapt your existing models |
---|
0:13:09 | to a specific speaker or environment |
---|
0:13:12 | and the most recent trend we've been seeing is how can you enforce sparsity or structure on the transforms |
---|
0:13:18 | we learn |
---|
0:13:19 | and how can you do a better optimization |
---|
0:13:21 | now in general the ideas that have been floating it on in this field include discriminative transforms |
---|
0:13:27 | how can you find something that will learn rapidly or rapidly adapt with minimal amounts of data |
---|
0:13:33 | and so now you see these things being adapted to more real world tasks such as uh voice search |
---|
0:13:40 | and you're starting to see some impact from these techniques on this new kind of data |
---|
0:13:45 | and we did see some work now on rapid adaptation for uh like i said real world |
---|
0:13:50 | tasks |
---|
0:13:51 | and how you can include um |
---|
0:13:53 | convex optimization methods in situations where your objective function is not convex anymore |
---|
0:13:58 | if you want to read more about uh these the relevant sessions are listed here |
---|
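The simplest concrete instance of such a transform is a per-dimension scale and bias that maps a new speaker's feature statistics onto the training statistics, a degenerate diagonal special case of the linear transforms discussed (the statistics below are illustrative):

```python
# Toy feature-space speaker adaptation: a per-dimension affine transform
# (scale and bias) estimated from a few adaptation frames so the
# speaker's feature mean/variance match the training data's.
import math

def fit_diag_transform(speaker_frames, train_mean, train_std):
    """Estimate scale/bias per dimension from adaptation frames."""
    dim, n = len(train_mean), len(speaker_frames)
    mean = [sum(f[d] for f in speaker_frames) / n for d in range(dim)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in speaker_frames) / n)
           or 1.0 for d in range(dim)]        # guard against zero variance
    scale = [train_std[d] / std[d] for d in range(dim)]
    bias = [train_mean[d] - scale[d] * mean[d] for d in range(dim)]
    return scale, bias

def adapt(frame, scale, bias):
    """Apply the transform to one feature frame."""
    return [scale[d] * x + bias[d] for d, x in enumerate(frame)]
```

The full-matrix versions (e.g. fMLLR-style transforms) estimate a dense linear map instead, which is exactly where the sparsity and structure constraints mentioned above come in.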
0:14:06 | that was not a |
---|
0:14:06 | good idea job |
---|
0:14:18 | we have a small problem |
---|
0:14:20 | and so uh acoustic modeling now acoustic modeling was split across many many sessions |
---|
0:14:26 | uh basically all talking about statistical modeling of speech signals |
---|
0:14:30 | uh the more recent trends have been along the lines of how can i use machine learning techniques |
---|
0:14:36 | in large vocabulary speech recognition we all know they work on certain classes of problems like vision and handwriting recognition |
---|
0:14:43 | uh uh which are |
---|
0:14:45 | really difficult but have relatively small data sets so we're now looking to see how we |
---|
0:14:49 | can apply uh these techniques to speech problems |
---|
0:14:53 | and along with that comes the task of speeding up these learning algorithms to deal with large quantities of |
---|
0:14:58 | data |
---|
0:14:58 | and this year we saw more applications to real world uh tasks |
---|
0:15:02 | and including uh large evaluations |
---|
0:15:05 | yeah on this slide here are some of the ideas we saw most of you are familiar with |
---|
0:15:09 | these things |
---|
0:15:10 | um |
---|
0:15:12 | i'll highlight some of the key components here uh we saw some papers on capturing long |
---|
0:15:16 | span dependencies |
---|
0:15:18 | uh critically more use of these posteriors either in an hmm framework or in other forms |
---|
0:15:23 | of decoding |
---|
0:15:24 | uh how can you use the posteriors from these classifiers intelligently maybe using them in |
---|
0:15:30 | deep belief nets or maybe using them directly in the hmm framework |
---|
0:15:34 | uh how can you intelligently pick acoustic units |
---|
0:15:37 | whether that is for english or any other language |
---|
0:15:41 | and do you have enough data now to pick these acoustic units which we didn't quite have before |
---|
0:15:45 | um also we have seen some papers that use language id accent and dialect identification incorporating them to improve |
---|
0:15:53 | speech recognition accuracy |
---|
0:15:54 | so you see a bunch of areas people are working on |
---|
0:15:57 | um we also saw some recent interesting work on uh loss functions and boosting methodologies that improve the |
---|
0:16:04 | quality of the classifiers of the learners in acoustic models |
---|
0:16:08 | uh this particular area was covered in the session titled modeling for asr |
---|
0:16:22 | uh moving on uh there were two sessions which covered acoustic modeling and similar topics and statistical methods |
---|
0:16:28 | as well |
---|
0:16:28 | and these do fall under the category of general asr type problems |
---|
0:16:32 | um |
---|
0:16:33 | there were some more ideas here which include complex models |
---|
0:16:37 | which include long span uh both language modeling and acoustic modeling techniques |
---|
0:16:41 | uh we see some applications of uh multiple stream combinations |
---|
0:16:46 | um there are ideas using these dnns as some sort of a front end there |
---|
0:16:50 | and thinking about how to model these posteriors |
---|
0:16:53 | uh a few of the papers were uh derived from the johns hopkins workshop which |
---|
0:16:58 | is held every summer focused on |
---|
0:17:00 | how you can use some of these posteriors in some sort of a segmental framework |
---|
0:17:05 | uh more recently if you see what the trend is |
---|
0:17:08 | uh we see a lot of |
---|
0:17:10 | novel uh sparse representations and exemplar based methods |
---|
0:17:14 | how you can capture higher-order statistics using deep belief networks |
---|
0:17:18 | um you have point process models uh capturing spectro-temporal patterns |
---|
0:17:23 | uh so we are seeing a wide range of novelty here in this field |
---|
0:17:31 | uh continuing on acoustic modeling which also included discriminative techniques for asr |
---|
0:17:37 | uh the focus was mostly on how can i use discriminative training for both acoustic models |
---|
0:17:42 | as well as for adaptation |
---|
0:17:44 | uh we saw some papers on training full covariance models |
---|
0:17:47 | uh we also saw if you break it down into specifics some feature selection choices |
---|
0:17:53 | better uh regularization of your model parameters that was interesting |
---|
0:17:56 | and people also presented different kinds of training criteria do you use an objective function that models say |
---|
0:18:02 | word error rate or do you use an objective function that models something else related uh |
---|
0:18:07 | to the likelihood or the error in some other fashion |
---|
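For reference, the two flavors of objective contrasted here can be written in their standard textbook forms (not specific to any ICASSP paper): MMI maximizes the posterior of the reference transcription, while MPE maximizes the expected phone accuracy of competing hypotheses.

```latex
% MMI: maximize the posterior of the reference transcription W_u
% given acoustics O_u, against all competing hypotheses W
\mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{u} \log
    \frac{p_\theta(O_u \mid W_u)\, P(W_u)}
         {\sum_{W} p_\theta(O_u \mid W)\, P(W)}

% MPE: expected phone accuracy, where A(W, W_u) counts correct phones
\mathcal{F}_{\mathrm{MPE}}(\theta)
  = \sum_{u}
    \frac{\sum_{W} p_\theta(O_u \mid W)\, P(W)\, A(W, W_u)}
         {\sum_{W} p_\theta(O_u \mid W)\, P(W)}
```

The first is the likelihood-related objective the speaker mentions; the second is the error-related one, since A(W, W_u) ties the criterion directly to phone (and hence word) errors.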
0:18:16 | um the last session that i'll cover on asr was uh titled large vocabulary speech recognition |
---|
0:18:22 | uh the focus there was mostly on building large systems uh large systems for the gale evaluation in different |
---|
0:18:28 | languages |
---|
0:18:29 | and there were also a few systems that were built on real world tasks |
---|
0:18:33 | and |
---|
0:18:34 | some of the key ideas here are how can you exploit large quantities of unlabeled data and that falls into the class |
---|
0:18:40 | of unsupervised training |
---|
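The unsupervised training loop described here (decode unlabeled data, keep confident hypotheses as pseudo-labels, retrain) can be sketched with a toy nearest-centroid stand-in for the acoustic model:

```python
# Self-training sketch: label unlabeled points with the current model,
# keep only confident ones, and refit. A 1-D nearest-centroid classifier
# stands in for the acoustic model; "confidence" is the margin between
# the distances to the two class centroids.

def centroid(points):
    return sum(points) / len(points)

def confidence(x, c0, c1):
    """Margin between the distances to the two class centroids."""
    return abs(abs(x - c0) - abs(x - c1))

def self_train(labeled, unlabeled, rounds=3, threshold=1.0):
    data0 = [x for x, y in labeled if y == 0]
    data1 = [x for x, y in labeled if y == 1]
    for _ in range(rounds):
        c0, c1 = centroid(data0), centroid(data1)
        confident = [x for x in unlabeled
                     if confidence(x, c0, c1) >= threshold]
        for x in confident:   # add pseudo-labeled points to training data
            (data0 if abs(x - c0) < abs(x - c1) else data1).append(x)
        unlabeled = [x for x in unlabeled if x not in confident]
    return centroid(data0), centroid(data1)
```

Real systems do the same thing at scale: decode untranscribed audio, filter by confidence scores, and retrain the acoustic (and language) models on the enlarged set.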
0:18:41 | uh do you use better methods for lattice based training |
---|
0:18:45 | uh we also saw what are the best performing techniques and algorithms for building acoustic and language models |
---|
0:18:51 | and typically in tasks like mandarin and arabic which were part of the gale evaluation |
---|
0:18:57 | oh also system combination strategies played an important role |
---|
0:19:01 | uh we also saw methods to do unit selection |
---|
0:19:04 | particularly in languages like uh mandarin and polish we saw some methods to improve the quality of transcripts when |
---|
0:19:10 | you don't have uh manually transcribed data |
---|
0:19:13 | how you can improve the performance of your acoustic models during training by |
---|
0:19:18 | getting better transcripts |
---|
0:19:20 | uh there was also work on uh decoding schemes to better optimize memory consumption and to make things |
---|
0:19:27 | go faster |
---|
0:19:28 | and we saw a large presence of deep belief networks all over the place |
---|
0:19:38 | which still amazes me somewhat |
---|
0:19:39 | and we saw lots of papers on acoustic modeling also |
---|
0:19:43 | um |
---|
0:19:44 | this you can break down into a couple of areas |
---|
0:19:47 | one which includes uh extended features for hmms in addition to traditional mfccs and plps |
---|
0:19:53 | and the other is the modeling paradigms themselves we saw a lot of work there starting from phone recognition to |
---|
0:19:59 | lvcsr |
---|
0:20:00 | uh a few things to point out we saw energy based features |
---|
0:20:05 | a lot on articulatory trajectories and how can you uh include nonstationary features for different |
---|
0:20:11 | languages |
---|
0:20:13 | uh we saw some efficient parameter estimation that captures phonetic variability |
---|
0:20:18 | i am not capturing everything in every session but these are sort of uh to get you motivated to |
---|
0:20:23 | look at general trends and ideas and bring in |
---|
0:20:26 | ideas from other fields that could perhaps help acoustic modeling |
---|
0:20:30 | uh we did see a lot of uh linear models for covariance modeling |
---|
0:20:34 | and particularly this time we saw some work on uh overlapped speech detection |
---|
0:20:39 | and non audible murmur detection which is useful in uh |
---|
0:20:43 | situations like monitoring in the public domain |
---|
0:20:46 | uh the relevant sessions are acoustic modeling one and two |
---|
0:20:54 | um |
---|
0:20:55 | the first session uh speech synthesis |
---|
0:20:57 | so this is just a very brief summary of speech synthesis |
---|
0:21:00 | uh we saw a focus on well two categories in synthesis hmm based and concatenative uh unit selection |
---|
0:21:07 | based tts |
---|
0:21:08 | a bunch of the work on hmm based synthesis focused mainly on the underlying parameterization and reconstruction |
---|
0:21:15 | and that included uh work on duration modeling |
---|
0:21:19 | how you can incorporate this technology in embedded systems |
---|
0:21:22 | uh also the impact of machine translation meaning the number of errors the translation system makes and the fluency |
---|
0:21:29 | of the output the impact that has on speech synthesis |
---|
0:21:33 | um parameter tying and uh parameter estimation for hmms this was also there |
---|
0:21:38 | uh in the other section we saw work on uh prosody prediction how you can do better prosody |
---|
0:21:44 | prediction how you can do better uh annotation of pitch accents |
---|
0:21:49 | uh uh we also saw new constraints being introduced for unit selection in concatenative tts systems |
---|
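Unit selection itself is a dynamic-programming search minimizing target cost plus concatenation (join) cost over candidate units; a toy sketch with made-up cost functions:

```python
# Viterbi-style unit selection for concatenative TTS: pick one candidate
# unit per target position minimizing target cost + join cost.

def select_units(candidates, target_cost, join_cost):
    """candidates: one list of candidate units per target position."""
    # best[i][k]: cheapest total cost ending at candidate k of position i
    best = [[target_cost(0, u) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [best[i - 1][j] + join_cost(prev, u)
                     for j, prev in enumerate(candidates[i - 1])]
            j = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[j] + target_cost(i, u))
            ptr.append(j)
        best.append(row)
        back.append(ptr)
    # backtrack the cheapest path
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = [k]
    for i in range(len(candidates) - 1, 0, -1):
        k = back[i][k]
        path.append(k)
    path.reverse()
    return [candidates[i][k] for i, k in enumerate(path)]
```

The new constraints mentioned above typically enter as extra terms or hard restrictions inside these two cost functions.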
0:21:56 | and all the relevant sessions are listed here but there were also a few posters |
---|
0:22:01 | in the machine learning section and speech and audio applications that cover synthesis |
---|
0:22:05 | so that's basically a broad overview i have for asr and synthesis |
---|
0:22:14 | i know uh |
---|
0:22:15 | you've now seen over three hundred papers in thirty minutes |
---|
0:22:18 | uh no doubt |
---|
0:22:19 | it maybe feels like a fire hose aimed at you |
---|
0:22:22 | um |
---|
0:22:23 | let's see if we could try to generate a few questions uh from folks |
---|
0:22:27 | um i will also note that uh we the speech and language tc |
---|
0:22:31 | uh we do put out a newsletter if any of you want to follow up on the papers |
---|
0:22:35 | in speech or audio |
---|
0:22:36 | in the stc as we do we try to |
---|
0:22:40 | uh you know address um |
---|
0:22:42 | uh all of the papers in the speech and language area of our group |
---|
0:22:46 | uh and in fact you can |
---|
0:22:48 | uh get the newsletter through our mailing list so |
---|
0:22:51 | you get a regular copy of |
---|
0:22:54 | the newsletter from our tc |
---|
0:22:56 | and we will include uh |
---|
0:22:57 | uh links to kind of download the slides if you'd like to get a copy of them |
---|
0:23:02 | right |
---|
0:23:03 | so can i ask you for uh any questions here |
---|
0:23:07 | the river and may have to make a on its work |
---|
0:23:17 | and a can have the speakers |
---|
0:23:21 | no |
---|
0:23:21 | to to to get a |
---|
0:23:22 | i don't get all |
---|
0:23:24 | of of a three or four years ago |
---|
0:23:27 | the speech technical committee kind of reorganized itself so |
---|
0:23:31 | uh more |
---|
0:23:32 | toward you know spoken language so it is spoken language not written text |
---|
0:23:37 | alright spoken language |
---|
0:23:39 | not text |
---|
0:23:40 | processing it moved to spoken language processing |
---|
0:23:43 | to try to sort of try those papers from your is you but they were generally going to |
---|
0:23:49 | also circle |
---|
0:23:51 | oh it's what's what's a room um |
---|
0:23:54 | uh |
---|
0:23:55 | solution is to actually things like a spring |
---|
0:23:59 | a are a put up your part of like are there more |
---|
0:24:04 | are more |
---|
0:24:05 | so if we more |
---|
0:24:06 | a set of the paper is that what is going to |
---|
0:24:09 | is your car |
---|
0:24:10 | he's coming here |
---|
0:24:11 | so i think we have uh about a hundred and ten papers in spoken language submitted |
---|
0:24:17 | uh it's been steady in the last two or three years roughly anywhere from about eighty to |
---|
0:24:21 | a little over a hundred uh with roughly you know on average around forty |
---|
0:24:25 | six to forty two percent of the papers accepted |
---|
0:24:29 | um also some of the work uh that is presented in spoken language could also go to other venues |
---|
0:24:34 | so i |
---|
0:24:35 | think it brings in more folks |
---|
0:24:37 | for um |
---|
0:24:37 | from that |
---|
0:24:38 | community so to speak |
---|
0:24:40 | um |
---|
0:24:40 | i'll just note also that uh |
---|
0:24:43 | at the speech technical committee meeting we had on wednesday |
---|
0:24:47 | um um |
---|
0:24:48 | uh uh we heard a short report from the transactions the number of papers submitted uh in spoken |
---|
0:24:53 | language has increased significantly uh there's been a huge increase in the number of submissions |
---|
0:24:59 | and uh the page count is actually growing as some of you have |
---|
0:25:03 | uh papers sitting in volume ninety nine you'll know what i mean |
---|
0:25:07 | that's kind of a or or or going uh volume and are we can kind to do more |
---|
0:25:11 | a you to a kind of a or to the people but the us work there was a lot of |
---|
0:25:15 | people kind of coming in |
---|
0:25:17 | oh i a series |
---|
0:25:21 | of request |
---|
0:25:27 | no one see me was on in bands of sorry |
---|
0:25:30 | question |
---|
0:25:32 | uh so on from |
---|
0:25:34 | we use |
---|
0:25:36 | row |
---|
0:25:37 | i |
---|
0:25:38 | for |
---|
0:25:41 | sure |
---|
0:25:42 | no |
---|
0:25:44 | i |
---|
0:25:46 | i you |
---|
0:25:48 | so are was so i could so |
---|
0:25:51 | or do some channels as i'm not sure if you want to use one |
---|
0:25:53 | so |
---|
0:25:54 | the first question was from the speech side uh |
---|
0:25:57 | i think when people are looking at uh there's a lot more work now you know on real data |
---|
0:26:03 | uh and so beyond broadcast news uh working on real voice search |
---|
0:26:08 | on youtube videos and audio bits spread over the web |
---|
0:26:12 | um there's a lot more uh you know interplay between music and speech |
---|
0:26:16 | um |
---|
0:26:17 | there was we've seen uh people looking at speaker id and language id |
---|
0:26:21 | uh in multiple languages |
---|
0:26:23 | uh |
---|
0:26:24 | with people singing |
---|
0:26:25 | uh and so forth beyond what we knew in the past |
---|
0:26:28 | but we're also seeing uh |
---|
0:26:31 | uh voice morphing or transformation here |
---|
0:26:33 | um |
---|
0:26:34 | uh so not in this conference but spoken of in pretty loose circles uh |
---|
0:26:38 | someone took a music video of a pop artist |
---|
0:26:41 | uh in english and morphed it to spanish |
---|
0:26:44 | um |
---|
0:26:45 | and it would sound uh flawless in the uh grammar |
---|
0:26:48 | really |
---|
0:26:49 | where they were performing in english |
---|
0:26:51 | uh and didn't know spanish but you couldn't tell |
---|
0:26:54 | it was really good |
---|
0:26:55 | so i i think you saying a lot of movement now |
---|
0:26:58 | a some of the tools of there exist for speech recognition |
---|
0:27:01 | a a speaker or do you'd are resolution and so forth |
---|
0:27:04 | being drawn toward challenges in music, because there's a lot of more realistic data
---|
0:27:08 | uh, that folks are getting access to
---|
0:27:11 | uh, and having music in
---|
0:27:12 | the signal has become a big challenge
---|
0:27:15 | uh, actually pitch tracking, on the speech analysis side
---|
0:27:18 | pitch tracking where there's music is a really tough thing to work out
---|
0:27:22 | and, uh, there are some folks that are, or have been, working on that
---|
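The difficulty described here is easy to reproduce: a classic autocorrelation pitch tracker assumes one dominant periodicity per frame, and background music violates that assumption. The following is a minimal sketch of that classic method, not anything the panel presented; the frame size, voicing threshold, and test tones are my own illustrative choices.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Single F0 estimate (Hz) for one frame via autocorrelation; 0.0 if unvoiced."""
    frame = frame - np.mean(frame)
    # Full autocorrelation, keeping non-negative lags 0..N-1.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag search range for fmin..fmax
    if hi >= len(ac) or ac[0] <= 0:
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    if ac[lag] / ac[0] < 0.3:  # crude voicing check: periodic energy must dominate
        return 0.0
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr           # one 40 ms frame
voice = np.sin(2 * np.pi * 120 * t)          # stand-in for voiced speech at 120 Hz
music = 0.8 * np.sin(2 * np.pi * 330 * t)    # a competing musical tone

print(estimate_f0(voice, sr))          # close to 120 Hz on the clean frame
# With music mixed in, secondary autocorrelation peaks grow; depending on the
# interfering tone and level, the estimate can shift or the frame can look unvoiced.
print(estimate_f0(voice + music, sr))
```

With heavier or polyphonic accompaniment the single-peak assumption breaks down entirely, which is why music-robust pitch tracking remains an open problem.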
0:27:27 | one quick comment: i think we do see a lot of people starting to move to music, um
---|
0:27:32 | at the interface
---|
0:27:35 | a couple of years ago one of the big challenges was the competing speaker problem
---|
0:27:40 | you have one person talking over another person
---|
0:27:43 | now it's
---|
0:27:44 | music overlapping with someone else, and being able to suppress that to try to do the recognition
---|
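The suppression idea mentioned here can be sketched with textbook spectral subtraction: estimate the interferer's average magnitude spectrum from an interference-only stretch, subtract it frame by frame, and resynthesize with the mixture's phase. Everything below (the FFT sizes, the steady sinusoidal "music", and the assumption that a music-only segment is available) is my illustrative setup, not the panelists' method.

```python
import numpy as np

def spectral_subtract(mixture, interference_sample, n_fft=512, hop=256):
    """Toy spectral subtraction: remove an average interference magnitude per frame."""
    win = np.hanning(n_fft)
    # Average magnitude spectrum of the interference-only stretch.
    frames = [interference_sample[i:i + n_fft] * win
              for i in range(0, len(interference_sample) - n_fft, hop)]
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)

    out = np.zeros(len(mixture))
    for i in range(0, len(mixture) - n_fft, hop):
        spec = np.fft.rfft(mixture[i:i + n_fft] * win)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)   # floor at zero
        clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n_fft)
        out[i:i + n_fft] += clean                          # overlap-add
    return out

sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 150 * t)        # stand-in for speech
music = 0.7 * np.sin(2 * np.pi * 440 * t)   # stand-in for steady background music
cleaned = spectral_subtract(speech + music, music)

# After subtraction, the 440 Hz music peak should sit far below the 150 Hz peak.
spec = np.abs(np.fft.rfft(cleaned))
```

Real music is nonstationary, so this fixed noise estimate fails in practice; that gap is exactly what makes the overlap problem a research topic rather than a solved preprocessing step.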
0:27:52 | another question?
---|
0:27:55 | i just caught a glance; i think that's, uh, one over there
---|
0:27:58 | uh, i just want to, uh, add some comments to the previous
---|
0:28:02 | comments, about this view of the signal processing of speech, and also
---|
0:28:06 | recognition and, uh, synthesis
---|
0:28:08 | to me
---|
0:28:09 | personally, actually, i view myself as a, uh, hard-core signal processing person
---|
0:28:14 | so to me it's all signal processing
---|
0:28:16 | so regardless of whether it is speech or
---|
0:28:19 | music
---|
0:28:20 | actually, at microsoft research asia, i do have quite a few colleagues, such as
---|
0:28:25 | a professor
---|
0:28:27 | from, uh, tokyo university
---|
0:28:29 | is |
---|
0:28:30 | who has been, uh, working in both domains
---|
0:28:34 | and we treat music, either instrumental music or
---|
0:28:38 | vocal
---|
0:28:40 | yeah, as
---|
0:28:41 | uh, a whole area of interesting applications and materials
---|
0:28:46 | uh
---|
0:28:47 | what we do, uh, is
---|
0:28:48 | speech synthesis
---|
0:28:50 | we use, uh, TTS as
---|
0:28:52 | the core technology
---|
0:28:54 | particularly these days; there's, uh, a push there
---|
0:28:57 | and so, not only just bridging the gap between the traditional
---|
0:29:01 | uh, you know
---|
0:29:03 | concatenation- or unit-selection-based synthesis
---|
0:29:06 | but right now the HMM-based synthesis
---|
0:29:09 | uh, or what some people call hybrid synthesis; actually, my opinion is that it's really just
---|
0:29:16 | the whole statistical- and sample-based, uh, rendering,
---|
0:29:24 | a holistic approach to the whole synthesis, or rendering,
---|
0:29:28 | process
---|
0:29:29 | so, uh
---|
0:29:31 | we use, uh, TTS
---|
0:29:33 | to do singing
---|
0:29:35 | with only that knowledge, and we just try to see
---|
0:29:40 | given the
---|
0:29:41 | recorded speech material alone
---|
0:29:44 | can we sing as well
---|
0:29:45 | yes, uh, we have done that, and not only us; in asia
---|
0:29:49 | i saw quite a few researchers
---|
0:29:53 | really working, uh, diligently in that direction
---|
0:29:56 | but polyphonic singing pitch tracking
---|
0:29:59 | which is, uh, really
---|
0:30:02 | a problem, uh, harder than anything we've seen
---|
0:30:05 | and it's definitely posing a big technical challenge
---|
0:30:10 | and of interest to
---|
0:30:13 | the signal processing,
---|
0:30:14 | speech, or music researchers
---|
0:30:17 | and on the analysis
---|
0:30:20 | side, that is recognition; uh, again, um, probably it's just that
---|
0:30:28 | it used to be
---|
0:30:30 | just, say, recognition or transcription
---|
0:30:32 | and the next step is, uh, understanding
---|
0:30:35 | and there's a, uh, counterpart of that for speech synthesis
---|
0:30:39 | uh |
---|
0:30:39 | and maybe an advertisement as well
---|
0:30:42 | so, uh
---|
0:30:44 | but, uh, as a speech synthesis researcher, because beyond
---|
0:30:50 | the whole understanding
---|
0:30:52 | to close the speech chain
---|
0:30:54 | we do need a good model of
---|
0:30:57 | expressive
---|
0:30:58 | speech synthesis, too
---|
0:31:00 | so, uh
---|
0:31:01 | to put it in a quick summary: i personally do see
---|
0:31:05 | that the boundary between music
---|
0:31:08 | and speech is dissolving
---|
0:31:09 | and that the common approaches, statistical modeling
---|
0:31:14 | and the sample-based
---|
0:31:17 | uh, rendering
---|
0:31:20 | are really just merging with each other in kind of a seamless manner
---|
0:31:27 | thanks, frank
---|
0:31:28 | but, uh, when you think about recognition
---|
0:31:31 | maybe a couple of years ago the general perception was that, since you can buy commercially available
---|
0:31:35 | speech recognition products in the field
---|
0:31:38 | some people perceived that it's solved
---|
0:31:40 | uh, but in fact there are huge challenges, i think
---|
0:31:44 | due to,
---|
0:31:47 | uh, much more realistic data in the field, making, uh, recognition much more challenging to do
---|
0:31:52 | and frank's comments on the synthesis part are
---|
0:31:55 | right on target; when you look at the general user population
---|
0:31:58 | and the use of dialogue systems
---|
0:32:00 | uh, studies have shown that
---|
0:32:02 | uh, the perception of how good the dialogue system is
---|
0:32:06 | is, to a large extent, related to the quality of the synthesized voice that you're interacting with
---|
0:32:11 | uh |
---|
0:32:12 | even behind errors, where, uh, the recognition error rate
---|
0:32:15 | uh, makes it hard to kind of recover; a lot of work recently
---|
0:32:20 | uh, has looked at robust processing approaches
---|
0:32:24 | other questions or comments?
---|
0:32:28 | we've got, uh, thirty minutes per
---|
0:32:30 | paper, and we had three papers here, so we should have time for some more, of course
---|
0:32:35 | so, to make a plug, uh, for the, uh, ASRU workshop
---|
0:32:41 | this year it's, uh, somewhere that is maybe, uh, a bit more sunny
---|
0:32:46 | than it is, uh, here
---|
0:32:49 | uh, the weather is great there, and it's, uh
---|
0:32:52 | a great opportunity to come, uh
---|
0:32:54 | follow up on some of the topics that you've seen at this conference
---|
0:32:58 | and everyone gets a little
---|
0:33:00 | uh, sort of, uh, garland of flowers
---|
0:33:03 | other comments or questions?
---|
0:33:07 | well, let me, uh, take a moment to make the last pitch: if you're interested in being involved
---|
0:33:12 | in the speech and language technical committee, please
---|
0:33:14 | uh, contact one of the committee members; there are, uh, over fifty members
---|
0:33:18 | uh, if you go to the web, uh, for our newsletter
---|
0:33:21 | um, you'll find there are a number of topics covered; also, if you're advertising for jobs or
---|
0:33:26 | uh, trying to recruit folks, there's an online
---|
0:33:30 | uh, job posting area there as well
---|
0:33:33 | i don't know if you've seen
---|
0:33:34 | the latest
---|
0:33:34 | newsletter; i put a little, uh, piece in it on
---|
0:33:37 | uh, what represents a, uh, grand challenge for the speech and language field, and i think
---|
0:33:42 | uh, there's been a lot of talk
---|
0:33:44 | in terms of energy and health care
---|
0:33:46 | uh, as grand challenges, but
---|
0:33:48 | speech and language belong there too
---|
0:33:50 | uh, one of the most
---|
0:33:52 | important aspects when you look at society and interacting with folks
---|
0:33:56 | is speech-to-speech translation; some of the big advancements in this area
---|
0:34:00 | uh, will allow people to communicate more efficiently and reduce barriers between people, so
---|
0:34:05 | speech and language are very important and should represent, uh, one of the grand challenges as well
---|
0:34:11 | if there are no more comments, uh, we'll close the session; thank you all
---|