0:00:06 | Hi everybody. I am presenting work on the use of GSV-SVM for speaker diarization and tracking. This work was initiated by a colleague who could not come to the conference, and was carried out in collaboration with several colleagues.
0:00:31 | After the presentation of the task and the data, I will describe the two tasks that are explored, acoustic speaker diarization and speaker tracking, along with the systems used and the results obtained, before the conclusion and perspectives.
0:00:50 | First, about the two tasks that we consider. Acoustic speaker diarization is a "who spoke when" task, carried out by segmentation and clustering, as was already described in the previous talks. We consider it as a pre-processing step for automatic speech recognition and for enriched transcription.
0:01:13 | In this situation we have no a priori information on the number of speakers or on the speakers' voices, and we consider here only acoustic-driven approaches, although other approaches make linguistic use of the transcription. We are also interested in speaker tracking, where we want to detect the regions of a spoken document that are uttered by a given speaker. In this situation we have a list of the speakers to detect, and we are provided with training data for these speakers.
0:01:45 | In our configuration, we consider the speaker tracking task as the combination of acoustic speaker diarization plus a speaker verification module.
0:01:56 | Our main motivation in this work was to include in our system the SVM techniques that are known to have become very successful in speaker recognition. We started with the GSV-SVM, based on GMM supervectors, since it was easy to develop in our framework. Other features can also be used in an SVM system for speaker recognition, such as MLLR, CMLLR, or lattice-MLLR transforms, which are also efficient to combine with the GMM supervectors.
0:02:40 | I also want to say a word about the context of this work, the program that motivated it: a French research and innovation program that aims to improve automatic audiovisual document structuring and indexing.
0:03:01 | For this work we wanted to focus specifically on speaker diarization and tracking for broadcast data. That is why we stay with a scheme of offline diarization: we are working on broadcast data that are recorded and then published on the web, on the radio, or on TV, which also allows an easy integration of SVM-based processing.
0:03:34 | We worked on the data of the French ESTER 2 evaluation. I hope that these data will soon be officially available to the wider community; they have already been distributed to the participants of this evaluation in 2008.
0:03:50 | There are about one hundred target speakers, for which we are provided with training data consisting of French-speaking radio shows from different sources, French stations but also other francophone ones.
0:04:15 | For the impostor data, we used the ESTER 1 data, which is the previous evaluation data set, with about four hundred impostors.
0:04:28 | The development data consist of twenty radio shows, for a total of about six hours, and the evaluation data are roughly the same amount, twenty-six radio shows for about seven hours.
0:04:41 | I also provide some statistics on the number of speakers, the speaking time, and the segment length, for the development and evaluation data sets.
0:04:58 | On the development set we have between nine and twenty-five speakers per show, and on the evaluation set the number of speakers per show lies in roughly the same range.
0:05:16 | The speaking time per speaker also varies a lot, with a mean of sixty-five seconds, ranging from half a second to more than ten minutes; it is a bit higher on the evaluation set, but the standard deviation is very high, so this is just to give a rough idea. The segments last on average sixteen to seventeen seconds, also ranging from a fraction of a second to several minutes.
0:05:56 | I will now describe the acoustic speaker diarization system, which I will only quickly recap here: it builds on previous work that was developed by colleagues, and later modified by myself and others, for an earlier ESTER evaluation campaign.
0:06:16 | Basically, the diarization works as follows. The initial segmentation uses a front end with the standard MFCC features found in ASR systems, and the speech activity detection relies on Viterbi decoding with separate GMMs for speech, music, and noise.
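As a rough illustration of this speech activity detection step, the following sketch (not the authors' code) runs Viterbi decoding over per-frame scores from one GMM per class; the sklearn models, the class set, and the self-transition probability are assumptions made for the example.

```python
# Minimal sketch of GMM-based speech activity detection with Viterbi smoothing.
# Assumes pre-trained sklearn GaussianMixture models and an MFCC matrix
# `frames` of shape (n_frames, n_coeffs); names are illustrative only.
import numpy as np

def viterbi_sad(frames, gmms, stay_logprob=np.log(0.999)):
    """Label each frame with the most likely class sequence.

    gmms: dict mapping class name ("speech", "music", "noise") to a fitted
    sklearn GaussianMixture. A high self-transition probability smooths the
    decision over time.
    """
    classes = list(gmms)
    n_states = len(classes)
    # Per-frame log-likelihood under each class GMM.
    loglik = np.stack([gmms[c].score_samples(frames) for c in classes], axis=1)
    switch_logprob = np.log((1.0 - np.exp(stay_logprob)) / max(n_states - 1, 1))
    trans = np.full((n_states, n_states), switch_logprob)
    np.fill_diagonal(trans, stay_logprob)

    delta = loglik[0].copy()                      # best path score ending in each state
    back = np.zeros((len(frames), n_states), dtype=int)
    for t in range(1, len(frames)):
        scores = delta[:, None] + trans           # (from, to)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + loglik[t]

    path = [int(delta.argmax())]
    for t in range(len(frames) - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [classes[s] for s in reversed(path)]
```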
0:06:43 | On the speech segments, a further segmentation into smaller segments is performed, using two adjacent sliding windows over the signal and a local Gaussian divergence measure to place the segment boundaries; Gaussians are estimated on the signal within each window to obtain this initial segmentation.
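A minimal sketch of this sliding-window boundary detection, assuming single diagonal-covariance Gaussians per window and an illustrative window size; the original system's exact divergence formula and thresholds are not reproduced.

```python
# Illustrative boundary detection with two adjacent sliding windows and a
# Gaussian divergence measure. `features` is an (n_frames, n_coeffs) MFCC array.
import numpy as np

def gaussian_divergence(x, y, eps=1e-6):
    """Divergence between two diagonal Gaussians fitted on windows x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    sd_x, sd_y = x.std(axis=0) + eps, y.std(axis=0) + eps
    # Large when the window means differ relative to the spread.
    return float(np.sum((mu_x - mu_y) ** 2 / (sd_x * sd_y)))

def divergence_curve(features, win=250):
    """Divergence between the two adjacent windows meeting at every frame."""
    curve = np.zeros(len(features))
    for t in range(win, len(features) - win):
        left, right = features[t - win:t], features[t:t + win]
        curve[t] = gaussian_divergence(left, right)
    return curve  # local maxima above a threshold become segment boundaries
```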
0:07:11 | Starting from this initial segmentation, we have a first step of agglomerative hierarchical clustering using the classical BIC criterion, with full covariance matrices on the static features. The only specific point is the penalty term, which is a local BIC penalty: it takes into account only the amount of data of the two clusters being compared, and not that of the whole recording or of the full feature set.
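The merge criterion can be sketched as below, with the "local" BIC penalty depending only on the amount of data in the two clusters under comparison, as described above; the lam tuning parameter and the helper names are illustrative.

```python
# Minimal sketch (not the authors' code) of the delta-BIC merge criterion with a
# local penalty, using full-covariance Gaussians per cluster.
import numpy as np

def delta_bic(x, y, lam=1.0):
    """Return delta-BIC for merging clusters x and y (negative favours merging)."""
    d = x.shape[1]
    n_x, n_y, n = len(x), len(y), len(x) + len(y)
    logdet = lambda data: np.linalg.slogdet(np.cov(data, rowvar=False))[1]
    # Log-likelihood gain of keeping the two clusters separate...
    gain = 0.5 * (n * logdet(np.vstack([x, y])) - n_x * logdet(x) - n_y * logdet(y))
    # ...against a penalty that depends only on the local amount of data n.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty  # at each step, merge the pair with the lowest delta-BIC
```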
0:07:48 | We then feed the output of the BIC clustering into a second clustering step that uses speaker-ID-style models, with slightly different features (feature warping is applied) and MAP adaptation of a UBM. This clustering relies on a cross log-likelihood ratio between the two clusters.
0:08:21 | So what we did was rather simple: we took the GSV-SVM system and integrated it into the diarization system in place of the last SID clustering stage.
0:08:36 | As a quick recap, the GSV-SVM is based on GMM supervectors: the supervector consists of the stacked means of the MAP-adapted GMM.
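For readers unfamiliar with GMM supervectors, a hedged sketch of the idea: MAP-adapt the UBM means on a cluster's frames and stack them into one vector that feeds the SVM. The relevance factor and the sklearn-based UBM are assumptions for illustration.

```python
# Building a GMM supervector for the GSV-SVM (mean-only MAP adaptation).
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm: GaussianMixture, frames: np.ndarray, relevance=16.0):
    """Mean-only MAP adaptation of a UBM on one cluster's frames."""
    post = ubm.predict_proba(frames)              # (n_frames, n_components)
    n_c = post.sum(axis=0)                        # soft counts per component
    first = post.T @ frames                       # first-order statistics
    ml_means = first / np.maximum(n_c[:, None], 1e-10)
    alpha = (n_c / (n_c + relevance))[:, None]    # adaptation coefficients
    return alpha * ml_means + (1.0 - alpha) * ubm.means_

def supervector(ubm: GaussianMixture, frames: np.ndarray) -> np.ndarray:
    """Stack the adapted means into one supervector, the SVM input feature."""
    return map_adapt_means(ubm, frames).reshape(-1)
```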
0:08:57 | It is well established that combining diarization systems can improve on the individual systems. There are several ways of doing this combination: piping one system into the other, which is the kind of thing we already do in our system; merging different systems; or using a cluster voting technique.
0:09:19 | We did a version of score-level fusion, which means that during the clustering process we compute a weighted average of the GSV-SVM and GMM-UBM scores, with a weight that is optimized on the development data.
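The score-level fusion amounts to a simple weighted average; a sketch, where the weight grid and the development-set scoring callback are illustrative assumptions:

```python
# Weighted score fusion between the GSV-SVM and GMM-UBM scores, with the
# weight chosen on the development data.
import numpy as np

def fuse(svm_score: float, gmm_score: float, w: float) -> float:
    """Fused score: w = 0 gives the GMM-UBM alone, w = 1 the GSV-SVM alone."""
    return w * svm_score + (1.0 - w) * gmm_score

def pick_weight(dev_der, weights=np.linspace(0.0, 1.0, 11)):
    """dev_der: callable mapping a weight to the DER obtained on the dev set."""
    ders = [dev_der(w) for w in weights]
    return float(weights[int(np.argmin(ders))])   # weight minimising the dev DER
```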
0:09:45 | The performance measure is the diarization error rate, which was already described in previous talks, so I will not go too much into it. I will just say that we also report purity and coverage figures, where purity is the proportion of time of the dominant reference speaker within a cluster and coverage is the converse ratio, which can provide a better insight into the speaker errors. We used the NIST tool for scoring, following the ESTER 2 evaluation plan.
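A possible frame-level reading of the purity and coverage figures is sketched below; the exact ESTER/NIST scoring details (collars, speaker mapping) are not reproduced here.

```python
# Cluster purity and coverage from frame-level labels. `hyp` and `ref` are
# equal-length arrays of hypothesis cluster ids and reference speaker ids.
import numpy as np

def purity_and_coverage(hyp: np.ndarray, ref: np.ndarray):
    """Purity: share of each cluster owned by its dominant reference speaker.
    Coverage: share of each reference speaker captured by its dominant cluster."""
    def dominant_share(groups, labels):
        total, matched = 0, 0
        for g in np.unique(groups):
            inside = labels[groups == g]
            matched += np.bincount(np.unique(inside, return_inverse=True)[1]).max()
            total += len(inside)
        return matched / total
    return dominant_share(hyp, ref), dominant_share(ref, hyp)
```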
0:10:23 | Here is a figure of the diarization error rate as a function of the combination weight: at the left end we have the performance of the GMM-UBM system alone, at the right end the GSV-SVM system alone, and along the x axis the different combination weights in between. The green curve is for the evaluation set and the red curve for the development set.
0:10:55 | What you can see is that the GSV-SVM performs better than the GMM-UBM, so we get a somewhat lower diarization error rate with the GSV-SVM, and the combination is very successful here.
0:11:11 | More in detail, what we get is about a ten percent relative improvement, from roughly eleven point one to ten point one, going from the best performing single system to the GMM-UBM plus SVM combination on the development set. On the evaluation set we also see a gain, going down from roughly nine point six to eight point three.
0:11:42 | This was for the acoustic speaker diarization system; now some words about speaker tracking. As I said, we see it as a combination of the acoustic speaker diarization system, on the left, with a speaker verification system. There are three possible points at which the verification can be applied: on the initial segments produced by the segmentation, on the clusters output by the BIC clustering, or on the clusters output by the SID clustering step. In each case the segments are compared to the speaker models and labeled accordingly.
0:12:30 | A few words on the systems used for the tracking. We use GMM-UBM and GSV-SVM systems that have the same properties as the systems already presented for diarization. For the verification we choose the target model with the highest likelihood ratio.
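The verification decision for tracking can be sketched as picking the target model with the highest likelihood ratio and comparing it to a threshold; the model objects and the llr helper below are hypothetical names, not the authors' API.

```python
# Score a segment against every target model, keep the best one, then accept or
# reject it against a threshold (None means "unknown speaker").
def track_segment(segment_features, target_models, ubm, threshold, llr):
    """llr(model, ubm, features) -> log-likelihood ratio of target vs UBM."""
    scores = {name: llr(model, ubm, segment_features)
              for name, model in target_models.items()}
    best = max(scores, key=scores.get)            # target with the highest ratio
    return best if scores[best] >= threshold else None
```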
0:12:58 | For the verification phase, the GSV-SVM follows the same architecture, with the constraint that we select the impostor and target data with the gender and channel matching the current condition. We also perform a weighted average of the scores across the two systems, that is, a score-level system fusion.
0:13:33 | The performance measures for the tracking task are those defined in the ESTER 2 evaluation campaign: recall and precision, an F-measure combining recall and precision computed in a time-weighted manner, and also a speaker-weighted F-measure, obtained by averaging the F-measure over the target speakers.
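A sketch of the two F-measures mentioned here, one computed from global (time-weighted) recall and precision and one averaged per target speaker; the ESTER 2 scoring specifics are not reproduced.

```python
# Time-weighted and speaker-weighted F-measures from recall and precision.
def f_measure(recall: float, precision: float) -> float:
    return 0.0 if recall + precision == 0 else 2 * recall * precision / (recall + precision)

def time_weighted_f(correct_time: float, ref_time: float, hyp_time: float) -> float:
    recall = correct_time / ref_time          # share of reference speech found
    precision = correct_time / hyp_time       # share of detected speech correct
    return f_measure(recall, precision)

def speaker_weighted_f(per_speaker):
    """per_speaker: list of (recall, precision) pairs, one per target speaker."""
    return sum(f_measure(r, p) for r, p in per_speaker) / len(per_speaker)
```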
0:14:03 | We have some DET curves that were simulated by scoring short segments of the evaluation data, for all the different possible system configurations; here they are shown on the evaluation data.
0:14:25 | The plots compare the GMM-UBM and the GSV-SVM systems. The red, green, and blue curves show the different versions of the GMM-UBM system, with verification applied either at the output of the initial segmentation, in blue, at the output of the BIC stage, in red, or at the output of the SID stage, in green. It appears that there are not that many differences between them, with a slightly better performance when using the output of the final stage, the SID stage. The GSV-SVM, shown here only on the output of the SID clustering step, performs much better than the GMM-UBM.
0:15:17 | Here are some figures. The paper provides the recall, precision, and F-measure figures, as well as the F-measure averaged by speaker. I will mainly focus on the global F-measure of the results that we obtained on the development and on the evaluation data.
0:15:39 | What we observe on the development set is that applying the verification after the SID clustering step provides better performance than the other configurations, and comparable performance to the GSV-SVM system, and the combination improves upon both systems.
0:16:07 | On the evaluation data set, the GSV-SVM performs much better than the GMM-UBM system, and in this case the combination slightly outperforms the GSV-SVM and is clearly better than the GMM-UBM.
0:16:29 | Well, that was, I would say, a simple experimental framework for integrating the GSV-SVM into speaker diarization and tracking. The GSV-SVM provides comparable or better performance than the existing standard GMM-UBM system that we had, and the score-level fusion was quite satisfactory.
0:17:00 | There are some caveats, for example in the impostor set, which is not very well balanced with respect to gender and channel; there are some very small subsets, for example for the narrow-band data we have very few impostors for the experiments. And of course we want to go further with other SVM features, like MLLR, CMLLR, and lattice MLLR, and also with the very interesting directions that were presented in the previous talk.
0:17:34 | Thank you for your attention.
0:17:47 | Yes, a question: you are using delta and double-delta features, and some other posters found that you could gain by limiting the deltas. So, did you try limiting the deltas?
0:18:07 | Yes. For example, on the first stage, the initial segmentation, we use the deltas and the delta-deltas; on the BIC stage we use full covariance matrices on the static features only, with no derivatives; and on the second stage we use only the deltas, not the delta-deltas. One of the rationales is to try to have different feature representations, combining different aspects, different flavours. I am not sure that it is optimal this way, because we did not test all configurations; it was one way of doing it. But I agree that it is not clearly convincing that the deltas always bring something in this respect.
0:19:11 | To add to what was said: we observed that when the data are very clean, for example recordings that we made in an acoustic room, the deltas did give some gain, but on less clean data you will hardly ever see any gain from the deltas.
0:19:34 | Thank you.
0:19:51 | Could you go back to slide fourteen? Sorry, forty, slide forty. Okay, yes.
0:19:58 | Can you explain why you got different results on the development and on the evaluation data? Was there a difference between the databases?
0:20:06 | The databases were not recorded at the same period, and they have a slightly different balance between the sources of data: some of the data come from French stations and some from other francophone stations, and this balance is slightly different between the development and the evaluation sets, which I think explains some of the results. Also, for the diarization system, even if the fusion weight is not exactly optimal, the differences remain small, so it is not that much of an issue.
0:20:54 | And then, when you do speaker tracking, you sometimes have a speaker who speaks a lot, so his model is well trained, and sometimes a speaker who only speaks a little, so you have a poorer model for that speaker. How do you set the threshold in this case?
0:21:11 | For the speaker tracking it is a verification: the score is compared to a threshold. There is a normalization of the score by the length of the test segment, but for the threshold there is no normalization according to the amount of data available for the speaker. I agree that it is something that needs to be addressed.
0:21:51 | [inaudible]