0:00:13 [inaudible]
0:00:19 Okay. So, for the three sessions after lunchtime, I think we were here until about five to six o'clock, and we took pictures of the poster sessions and of the sessions that still had people at the end. People in speech and language processing are dedicated enough to stay to the end, so thank you very much for coming.
0:00:55 So I'll just go ahead. First, just a couple of words about the speech and language technical committee. Most of you know that one cannot run a conference like ICASSP without technical committees for each area; the speech and language technical committee alone now has fifty-three members, because we have a large number of papers to review, so it's made to be a large committee.
0:01:24 On the number of papers: the speech and language track has grown over the last couple of years, partly because we now have a separate technical committee that focuses on language processing, and it has been increasing in a rather significant way. Roughly a third of the papers accepted at ICASSP are in the speech and language processing field, and so what this session tries to cover is about seven hundred submissions and over three hundred accepted papers. If we spent even a few seconds on each one, a session of just those titles would run to thirty minutes by itself — so we decided not to do that.
0:02:11 Okay. I should also say that the colleague who put a large part of this together, and who has had a big impact in this area, was not able to attend and give the overview here, so a lot of folks chipped in. We encourage questions, although we're going to have to try to cover a lot; we'll try to make sure that we get you a microphone. So — do you want to go first? Okay.
0:02:50 I'll go through this very quickly, so if anything I say is wrong, or I miss anything, please point it out — I'll need your help. Why is that? Because there were over three hundred papers, and it's really hard to go through all of them and summarize each section, so I'll rely on the session chairs to fill in what I miss.
0:03:13 First, thank you to the people that helped put this talk together — they did a good job. So there are three hundred and twenty-five papers: two hundred and fifty of them are on speech, and seventy-five are on language processing, according to how the conference assigned them to the different sessions. In some cases that is arguable — some papers could be in both.
0:03:36 They are split into two parts, and I will cover about a hundred or so of them — the ones in language processing and in speech processing, but not TTS and ASR. That will include speaker ID — speaker verification and recognition — and speaker diarization, and it touches speech enhancement as well, and a little bit more.
0:04:05 So, first, language modeling. On the top right, the table shows the number of papers in each field of language processing. A couple of things worth mentioning here: for example the Model M exponential model, class-based and neural-network language models, long-span models, dynamic language model adaptation, discriminative models — and many others; these are just a couple. As I went through the papers, what's common is work on computation and optimization: how to train a language model on large-scale data — distributed training, fast neural-network language model training — and how to manage long-span dependencies. And there is a common data set people work on.
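As a concrete (toy) illustration of the language-model training mentioned above — obviously nothing like the large-scale, distributed setups in the papers — here is a minimal add-k-smoothed bigram model with a perplexity function. The function names and the smoothing constant are my own choices, not from any of the papers:

```python
import math
from collections import Counter

def train_bigram_lm(sentences, k=0.5):
    """Add-k-smoothed bigram LM; a toy stand-in for large-scale LM training."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(toks)
        unigrams.update(toks[:-1])            # history counts
        bigrams.update(zip(toks, toks[1:]))
    V = len(vocab)
    def logprob(prev, word):
        return math.log((bigrams[(prev, word)] + k) / (unigrams[prev] + k * V))
    return logprob

def perplexity(logprob, sentences):
    """exp of the average negative log-probability per predicted token."""
    total, n = 0.0, 0
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(toks, toks[1:]):
            total += logprob(prev, word)
            n += 1
    return math.exp(-total / n)
```

Lower perplexity on held-out text is the usual way such models are compared.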
0:04:58 Spoken document processing: here the tasks are document summarization, classification, and speaker role identification, and the approaches are typical of machine learning — classification methods — plus translation and semantic classification and so on. Two sessions grouped the papers on this topic, including voice search; different papers probably use different terms, but the idea is figuring out how to use the query — and the intended query — for search.
0:05:32 And there are papers using DBNs — I think you will see lots of papers on DBNs applied to language. There is also understanding for call routing.
0:05:43 Speech translation: papers covering how you can tie speech recognition and translation together; whether word accuracy — which is probably not a good metric — should be used for speech translation; bilingual audio subtitle extraction; and many others I probably did not list.
0:06:07 Paralinguistic and linguistic features: these are very interesting. Think of the tasks here as what you can do from speech beyond the words: emotion detection, recognizing lexical stress, and cognitive-load classification — trying to guess, when someone is talking to a computer, how much they are thinking. Also perceptual differences — for instance for language learning, or speech assessment systems — and related generation tasks. There is a range of work on these topics, some of it quite new.
0:06:46 Spoken term detection: the task is, given a huge audio or video file, to return a list of the spoken utterances containing a query term, which you can just speak as a voice query. The approaches are dynamic time warping, sub-word recognition, and lattice- or graph-based approaches, and there is a common data set for this.
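A minimal sketch of the dynamic-time-warping matching used by several of the spoken-term-detection papers; the real systems work on acoustic features such as posteriorgrams, whereas this toy version just takes lists of feature vectors:

```python
def dtw_distance(query, segment):
    """Classic dynamic time warping between two feature sequences
    (each a list of equal-length feature vectors); returns a
    length-normalised alignment cost."""
    INF = float("inf")
    n, m = len(query), len(segment)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], segment[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a query frame
                                 cost[i][j - 1],      # skip a segment frame
                                 cost[i - 1][j - 1])  # match
    return cost[n][m] / (n + m)
```

In a detection system you would slide this over candidate regions and rank them by distance.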
0:07:12 Dialogue: we don't have many papers at this conference — only about five — but I do want to mention the trend. If you look back a couple of years ago, there were not many papers using the statistical approach; now most papers focus on it. There are two parts: first, you track a distribution over all the possible dialogue states; second, you learn a policy from the input — it becomes a POMDP. There are several papers at the conference on this problem. There is also a session specifically on evaluation, so we won't go through those.
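The "track a distribution over all the possible states" idea can be illustrated with a single Bayesian belief update; the goal names and likelihood numbers below are made up:

```python
def update_belief(belief, obs_likelihood):
    """One Bayesian belief update over dialogue states:
    posterior ∝ observation likelihood × prior."""
    posterior = {s: obs_likelihood.get(s, 1e-6) * p for s, p in belief.items()}
    z = sum(posterior.values())
    return {s: p / z for s, p in posterior.items()}

# Hypothetical example: two candidate user goals, one favoured
# by what the speech recognizer just heard.
belief = update_belief({"goal_a": 0.5, "goal_b": 0.5},
                       {"goal_a": 0.9, "goal_b": 0.1})
```

A POMDP dialogue manager repeats this update after every user turn and chooses actions against the whole distribution rather than a single best state.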
0:07:51 Language identification: there are six papers in one session. They try to use phonetic and prosodic features, and combinations of them, to identify the language. If you look at the approaches to the classification, I saw a couple of papers on logistic regression over n-grams, and there is a set of groups working on the same data — all trying to guess the language being used.
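A toy version of the logistic-regression-on-n-grams approach: character trigram counts fed to a tiny hand-rolled binary logistic regression. The training sentences are invented, and real systems use far richer features (phonetic, prosodic) and more than two languages:

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    text = f"  {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def train_logreg(examples, labels, epochs=200, lr=0.5):
    """SGD for binary logistic regression on sparse count features."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for feats, y in zip(examples, labels):
            z = b + sum(w.get(f, 0.0) * c for f, c in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))
            g = y - p                      # gradient of the log-likelihood
            b += lr * g
            for f, c in feats.items():
                w[f] = w.get(f, 0.0) + lr * g * c
    return w, b

def predict(model, text):
    w, b = model
    z = b + sum(w.get(f, 0.0) * c for f, c in char_ngrams(text).items())
    return 1 if z > 0 else 0               # 1 = English, 0 = Spanish (toy labels)
```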
0:08:21 Lexicon modeling is trying to use machine learning to automatically generate the pronunciation of a given word, and there are a couple of interesting approaches introduced here.
0:08:35 Multilingual and multichannel processing: the task is that you have mixed-language or multi-channel input — how do you do ASR, indexing and search, and whatever matching you can set up? The approaches are very diverse, so I did not list them here.
0:08:52 Speech analysis is a field I personally don't know much about, so I'll just try to cover the topics. There were papers on emotion detection — at the signal level, for example the relationship between emotion and F0 range, detecting anger and so forth — on duration modeling, and on F0 and pitch-frequency estimation, and so on. Among the approaches I saw a couple of things, probably not new but classic for these papers — for example singularity detection and phase-locked loops — and there is a common data set for that. As I said, I don't know this area well, so if the session chairs are here, they would know better than me what's missing that I didn't cover.
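Since F0 estimation came up: here is the classic autocorrelation pitch estimator in a few lines — a deliberately naive sketch, not any specific paper's method — checked against a synthetic 200 Hz tone:

```python
import math

def estimate_f0(signal, sr, fmin=50.0, fmax=500.0):
    """Toy autocorrelation pitch estimator: pick the lag with the
    highest normalised autocorrelation in the plausible F0 range."""
    n = len(signal)
    energy = sum(x * x for x in signal)
    lo, hi = int(sr / fmax), int(sr / fmin)
    best_lag, best_r = lo, -1.0
    for lag in range(lo, min(hi, n - 1) + 1):
        r = sum(signal[i] * signal[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_r, best_lag = r, lag
    return sr / best_lag

# synthetic test tone: 200 Hz sine at 8 kHz sampling rate
sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(400)]
```

Where music is mixed in (as discussed later in the Q&A), this naive peak-picking fails badly, which is exactly why robust pitch tracking remains an open problem.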
0:09:45 Speech enhancement: the task here is basically trying to separate speech from non-speech and noise. You can see the slides, but one thing I can add is that, compared to previous conferences, there is a lot more attention to music noise. There are many approaches here — hidden-Markov-model-based methods, classic approaches like Wiener filtering — and I have a long list that I don't have time to go through in full.
0:10:23 Speaker verification: overall there are forty-eight papers on this topic, including speaker diarization — I think more than at previous conferences; probably one reason is the relevance of the NIST speaker recognition evaluation. Going through the papers, here are just a couple of highlights, because it is very hard to summarize: the i-vector space, probabilistic LDA, and the evaluation papers from NIST. There is also work on fusion: if you run several speaker recognition systems, how do you fuse the results?
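As an illustration of the score fusion just mentioned — and only an illustration, since published systems typically *train* the fusion weights, e.g. with logistic regression — here is per-system z-normalisation followed by a fixed weighted sum; the weights are invented:

```python
import statistics

def znorm(scores):
    """Zero-mean, unit-variance normalisation of one system's trial scores."""
    mu, sd = statistics.mean(scores), statistics.pstdev(scores)
    sd = sd or 1.0            # guard against a constant-score system
    return [(s - mu) / sd for s in scores]

def fuse(system_scores, weights):
    """system_scores: one list of per-trial scores per system.
    Returns one fused score per trial."""
    normed = [znorm(s) for s in system_scores]
    n_trials = len(system_scores[0])
    return [sum(w * sys[i] for w, sys in zip(weights, normed))
            for i in range(n_trials)]
```

Normalising first matters because different systems emit scores on very different scales.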
0:11:08 The second topic here, speaker diarization, is figuring out who spoke when in an audio stream — a meeting, say. The approaches split into top-down and bottom-up clustering; there is work on how to exploit features — acoustic features and beyond — with some brand-new approaches; and there is the information-bottleneck-based approach, with a couple of groups working on it. There are new advances in this field, and lots of papers, so I'm sure I will miss something.
0:11:41 For the ASR front end I put the papers into several categories. The first one is signal processing: for example compressed sensing — you can use compressed sensing in parts of the ASR front end — non-negative matrix factorization, and how to apply transforms to the spectrum. Then there are other approaches; I have a long list here.
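A bare-bones sketch of the non-negative matrix factorization mentioned above, using the standard Lee–Seung multiplicative updates on plain Python lists; real front ends factorize magnitude spectrograms, not a tiny toy matrix like the one in the test:

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def nmf(V, r, iters=500, seed=0):
    """Multiplicative-update NMF: V (m x n, nonnegative) ~= W (m x r) @ H (r x n).
    The updates keep W and H nonnegative and monotonically reduce the
    Euclidean reconstruction error."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(r)]
    eps = 1e-9
    for _ in range(iters):
        WH = matmul(W, H)
        for a in range(r):                        # H <- H * (W^T V) / (W^T W H)
            for j in range(n):
                num = sum(W[i][a] * V[i][j] for i in range(m))
                den = sum(W[i][a] * WH[i][j] for i in range(m)) + eps
                H[a][j] *= num / den
        WH = matmul(W, H)
        for i in range(m):                        # W <- W * (V H^T) / (W H H^T)
            for a in range(r):
                num = sum(V[i][j] * H[a][j] for j in range(n))
                den = sum(WH[i][j] * H[a][j] for j in range(n)) + eps
                W[i][a] *= num / den
    return W, H
```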
0:12:01 Then features: there are lots of features — excitation-based and so on. There are a couple of papers on how to use neural networks to generate tandem features, on feature mapping, and on noise-robust feature normalization, and there are different models as well. It is quite a diverse collection, so I don't think we can cover it here; maybe afterwards I can put the slides somewhere, if anyone wants to take a look. With that, I'll hand it over.
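Of the normalization schemes touched on above, per-utterance cepstral mean and variance normalization is the simplest; a minimal sketch (each row is a frame, each column one cepstral dimension):

```python
import statistics

def cmvn(frames):
    """Per-utterance cepstral mean/variance normalisation: each
    feature dimension is shifted to zero mean and scaled to unit
    variance, which removes fixed channel effects."""
    dims = list(zip(*frames))
    mus = [statistics.mean(d) for d in dims]
    sds = [statistics.pstdev(d) or 1.0 for d in dims]  # guard constant dims
    return [[(x - mu) / sd for x, mu, sd in zip(f, mus, sds)]
            for f in frames]
```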
0:12:41 I'll try to cover — can everyone hear me? — I'd like to cover all of the papers that were generally included in large-vocabulary speech recognition, acoustic modeling, and adaptation techniques. Certainly there were a lot of them, as you can see on the ASR side, so we tried to split them in a manner that matched well with the sessions. Let's start with adaptation.
0:13:04 The problem here is basically: how well can you adapt your existing models to a specific speaker or environment? The most recent trend we have been seeing is how you can enforce sparsity or structure on the transforms we learn, and how you can do better optimization. In general, the ideas that have been floating around in this field include discriminative transforms — how to find something that will learn rapidly, or rapidly adapt with minimal amounts of data. And now you see these things being applied to more real-world tasks, and you're starting to see some impact from these techniques on this new kind of data. We did see some papers on rapid adaptation for, as I said, real-world tasks, and on how you can use convex optimization methods in situations where your objective function is not convex anymore. If you want to read more about these, I've listed the relevant sessions here.
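One classical instance of the adaptation problem described here is MAP adaptation of Gaussian means — shown below as a toy interpolation between a speaker-independent mean and the adaptation-data mean. The prior weight `tau` is a made-up value, and this is the textbook scheme, not the sparse-transform work the talk refers to:

```python
def map_adapt_mean(prior_mean, adapt_frames, tau=10.0):
    """MAP adaptation of a Gaussian mean: interpolate the prior
    (speaker-independent) mean with the adaptation-data mean,
    weighting the data by its frame count n against the prior
    weight tau. With little data the prior dominates; with lots
    of data the estimate moves to the speaker's own mean."""
    n = len(adapt_frames)
    dim = len(prior_mean)
    data_mean = [sum(f[d] for f in adapt_frames) / n for d in range(dim)]
    return [(tau * pm + n * dm) / (tau + n)
            for pm, dm in zip(prior_mean, data_mean)]
```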
0:14:06 That was not a good idea. ... We have a small problem — one moment.
0:14:20 So, acoustic modeling. Acoustic modeling was split across many, many sessions, basically all talking about statistical modeling of speech signals. The more recent trends have been along the lines of how we can use machine-learning techniques in large-vocabulary speech recognition. We all know they work on certain classes of problems, like vision and handwriting recognition, which are really difficult but have relatively small data sets; so we're now looking at how we can apply these techniques to speech problems. Along with that comes the task of speeding up these learning algorithms to deal with large quantities of data. And again here we saw more applications to real-world tasks and evaluations.
0:15:05 This slide shows some of the ideas we saw; most of you are familiar with these things, so I'll hit some of the key components. We saw some papers on capturing long-span dependencies; considerably more use of posteriors, whether cleanly in an HMM framework or in other forms of decoding — how you can use the posteriors from classifiers intelligently, maybe by using them in deep belief nets, or using them directly in the HMM framework. Also how you can intelligently pick acoustic units, whether for English or any other language — and we now have enough data to pick these acoustic units, which we didn't quite have before. We have also seen some papers that use language, accent, and dialect identification, incorporating them to improve speech recognition accuracy. So there's a bunch of areas people are working on. And we saw some recent interesting work on loss functions and boosting methodologies that improve the quality of the classifiers — the learners — in acoustic models. These appeared in the sessions titled "modeling for ASR".
0:16:22 Moving on: there were two more sessions which covered acoustic modeling under a mix of topics and statistical methods, and these do fall under the category of general ASR problems. There were some more ideas there, including complex models, and long-span language modeling and acoustic modeling techniques. We saw some applications of CRFs and multiple-stream combinations; there's the thought of using posteriors as some sort of intermediate feature and thinking about how to model those posteriors. A few papers were derived from the Johns Hopkins workshop, which is held every summer, focused on how you can use some of these posteriors in some sort of segmental framework. More recently, in terms of the trend, we see a lot of sparse representations and exemplar-based methods; capturing higher-order statistics using deep belief networks; point process models; and spectro-temporal patterns. So we are seeing a wide range of novelty in this field.
0:17:31 Continuing on acoustic modeling, which also included discriminative techniques for ASR: the focus was mostly on how to use discriminative training both for acoustic models and for adaptation. We saw some papers on training full-covariance models, and, breaking it down further, some feature-selection choices and better regularization of model parameters, which was interesting. People also presented different kinds of training criteria: do you use an objective function that models word error rate, or an objective function that models something else related to the likelihood or the error, computed in some other fashion?
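An example of an objective aimed at word error rate rather than likelihood is minimum-Bayes-risk decoding over an n-best list: pick the hypothesis with the lowest expected edit distance under the posterior. A toy sketch — the n-best scores in the test are invented:

```python
import math

def edit_distance(a, b):
    """Word-level Levenshtein distance with a single rolling row."""
    dp = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,            # deletion
                                     dp[j - 1] + 1,        # insertion
                                     prev + (wa != wb))    # substitution
    return dp[len(b)]

def mbr_decode(nbest):
    """nbest: list of (hypothesis_words, log_score).
    Returns the hypothesis minimising expected word edits against
    the posterior over the list."""
    z = max(s for _, s in nbest)
    probs = [math.exp(s - z) for _, s in nbest]
    total = sum(probs)
    probs = [p / total for p in probs]
    def risk(hyp):
        return sum(p * edit_distance(hyp, other)
                   for p, (other, _) in zip(probs, nbest))
    return min((h for h, _ in nbest), key=risk)
```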
0:18:16 The last session that I'll cover on ASR was titled large-vocabulary speech recognition. The focus there was mostly on building large systems — large systems for the GALE evaluation in different languages — and there were also a few systems built on real-world tasks. Some of the key ideas: how you can exploit large quantities of unlabeled data in the class of unsupervised training, and better methods for lattice-based training. We also saw the best-performing techniques and algorithms for building acoustic and language models, typically in tasks like Mandarin and Arabic, which were part of the GALE evaluation. System combination strategies also played an important role. We also saw methods to do unit selection, particularly in languages like Mandarin and Polish, and methods to improve the quality of transcripts when you don't have manually transcribed data — how you can improve the performance of your acoustic models during training by getting better transcripts. There was also a lot of work on decoding schemes, to better optimize memory consumption and to make things go faster. And we saw a large presence of deep belief networks all over the place.
0:19:39 We also saw lots of papers on acoustic features. You can break these down into a couple of areas: one includes extended features for HMMs, in addition to the traditional MFCCs and PLPs; the other is the modeling paradigms themselves, where we saw a lot of work, ranging from phone recognition up to LVCSR. A few things to point out: we saw energy-based features, articulatory trajectories, how you can include nonstationary features, and tone and pitch for tonal languages. We saw some efficient parameter estimation that captures phonetic variability. I am not capturing everything in every session, but these are meant to get you motivated to look at the general trends and ideas, and to bring in ideas from other fields that could perhaps help acoustic modeling. We did see a lot of linear models for covariance modeling, and particularly this time we saw some work on overlapped speech detection and non-audible murmur detection, which is useful in situations such as monitoring in the public domain. The relevant sessions are acoustic modeling one and two.
0:20:54 The last part is on speech synthesis — just a very brief summary. We saw a focus on the two main categories of synthesis: HMM-based, and concatenative unit-selection-based TTS. A bunch of the work on HMM-based synthesis focused mainly on the underlying parameterization and reconstruction, and that included work on excitation and duration modeling, on how you can incorporate this technology in embedded systems, and on the impact of machine translation — meaning the number of errors the translation system makes and the fluency of its output — on speech synthesis. Better tying and estimation of parameters for HMMs was also there. In the other set of sessions we saw work on prosody prediction — how you can do better prosody prediction, and better annotation of pitch accents. We also saw new constraints being introduced for unit selection in concatenative TTS systems. All the relevant sessions are listed here, but there were also a few posters in the machine learning session and in speech and audio applications that covered synthesis. So that's basically the broad overview I have for ASR and synthesis.
0:22:14 All right — you have now seen over three hundred papers in thirty minutes; it may feel like a fire hose aimed at you. Before we close, we could try to take a few questions from folks. I'll also note that, within the speech and language technical committee, we do put out a newsletter. If any of you publish papers in the speech and language area, we try to reach out; we aim to address all of the papers in the speech and language area in our group, and get back to you through the newsletter and mailing lists. So you can get a regular copy of the newsletter, and we will include links where you can download these slides if you'd like a copy. Right — so let me open it up: are there any questions here?
0:23:07 With the reverb in here we may have to make the microphones work — can we get a microphone to the speakers?
0:23:21 [Audience question, largely inaudible — about how spoken language papers are divided from text processing.]
0:23:24 Three or four years ago, the speech technical committee kind of reorganized itself. The scope is spoken language, not pure text — spoken language understanding rather than text processing — so we try to sort those papers: if they were generally pure text, they would go to other venues.
0:23:51 [Audience question, largely inaudible — asking whether the number of spoken language papers is growing, and where they are coming from.]
0:24:11 So I think we have about a hundred and ten papers in spoken language; it has been climbing over the last two or three years — we were at roughly eighty-two, and now a little over a hundred — with an acceptance rate of roughly forty-two to forty-six percent on average. Of course, some of the work that is presented in spoken language could also go to other venues, but it brings in more folks from that community, so to speak.
0:24:40 I'll just mention also that, at the speech technical committee meeting we had on Wednesday, we got a short briefing from the Transactions: the number of papers submitted in spoken language has increased significantly — there's been a huge increase in submissions — and the page count is actually growing as well; those of you who have seen volume ninety-nine will know what I mean. So the area is growing in volume, and credit goes to the people that do the review work — there are a lot of people pitching in.
0:25:17 [Audience question, largely inaudible — about the relationship between speech and music processing.]
0:25:48 So — I'm not sure if you want to use the microphone — the question was about music and speech. From the speech side:
0:25:57 I think when people are looking at real data now, there's a lot more of it: we started with broadcast news, then worked on real-world voice search, and now there are videos and audio clips of all sorts on the web, so there's a lot more interplay between music and speech. We used to have people looking at speaker ID and language ID in multiple languages; now we see it with people singing, and so forth, beyond what we've done in the past. We're also seeing morphing and transformation work. At a previous conference — not this one — someone took a music video of a pop artist singing in English and morphed it into Spanish, and it sounded flawless, with the right grammar; the artist sang in English and didn't know Spanish, but you couldn't tell — it was really good. So I think you're seeing a lot of movement now: some of the tools that exist for speech recognition, speaker ID, diarization and so forth are being drawn toward challenges in music, because a lot of the more realistic data that folks are getting access to has music in it — music has become a big challenge. Actually, pitch tracking, on the speech analysis side: pitch tracking where there's music is a really tough thing to work out, and there are some folks that have been working on that. One quick comment: a couple of years back, one of the big challenges was the competing-speaker problem — one person talking over another person. In other words, now it's music competing with someone speaking, and you try to suppress that to do the recognition.
0:27:52 Question number two? — I just got glasses, I think that's fair. I'm just adding some comments to the previous comments. All of this — the signal processing of speech, and also recognition and synthesis: personally, I view myself as a hard-core signal processing person, so to me it is signal processing regardless of whether the signal is speech or music. Actually, at Microsoft Research Asia I have quite a few colleagues, including a professor colleague, working on both domains, and we treat music — either instrumental music or vocal — as one of our interesting applications. For example, we do singing synthesis: we use a TTS, knowing there is demand for it.
0:28:57 So we're not only bridging the gap between the traditional concatenation, or unit-selection-based, synthesis and the HMM-based synthesis — some people call that hybrid synthesis, but in my opinion it's really just the whole statistical and sample-based rendering: a holistic approach to the whole synthesis, or rendering, process. We use a TTS to do singing synthesis — beyond the language knowledge, we just try to say: given the required speech material, can we sing as well? Yes, we have done that, and I saw quite a few researchers really working, gradually, in that direction. But polyphonic pitch tracking — I've been there, and it's rather hairy — is definitely posing a key technical challenge, and one of interest to signal processing, speech, and music researchers alike.
0:30:20 On the analysis side — analysis rather than recognition — again, probably just as a matter of common usage: what used to be just recognition, the next step is understanding. And — maybe this is an advertisement — as a speech synthesis researcher I'd say that, to close the speech chain, the whole understanding loop, we do need good, smooth, expressive speech synthesis too. So, a quick summary: I personally don't see a boundary between music and speech, and with the move to common statistical modeling and sample-based rendering, everything is really just merging with each other into a kind of seamless model.
0:31:28 Thanks. When you think about recognition: maybe a couple of years ago the general perception was that, since you can buy commercial speech recognition products in the field, the problem is solved. But in fact there are huge challenges: as you move to much more realistic data in the field, recognition becomes much more challenging to do. And Frank's comments on the synthesis part are right on target. When you look at the general population and the use of dialogue systems, studies have shown that the perception of how good the dialogue system is, is to a large extent related to the quality of the synthesized voice you're interacting with; hidden behind that are the recognition errors, which are otherwise hard to recover from, so a lot of groups recently are looking at this as a whole processing approach. Other questions or comments?
0:32:28 We have gone over our thirty minutes, but we had three hundred papers here, so we should take some more questions. Let me also make a quick pitch for the ASRU workshop: it's coming this December, in a sunny spot — the weather is great there — and it's a great opportunity to come and follow up on some of the topics that you've seen at this conference. And everyone gets a little garland of flowers on arrival. Other comments or questions?
0:33:07 Well then, let me make the last pitch: if you are interested in being involved in the speech and language technical committee, please contact one of the committee members — there are over fifty members. If you go to the web page or the newsletter you'll find a number of topics there, and if you are advertising for jobs or trying to recruit folks, there is an online jobs posting as well.
0:33:34 Maybe later I'll put in a little chunk on what represents a grand challenge for the speech and language field. There has been a lot of talk about grand challenges in terms of energy and health care; I'd argue speech and language belongs there too. One of the most important aspects, when you look at society and people interacting, is speech-to-speech translation: some of the big advancements in this area will allow people to communicate more efficiently and reduce the barriers between people. So speech and language is very important and should represent one of the grand challenges as well.
0:34:11 If there are no more comments, we will close the session — thank you very much.