0:00:15 | my name is Pavel Matejka and I'll be talking about |
---|
0:00:20 | neural network bottleneck features for language identification |
---|
0:00:24 | I did this work during my postdoc at BBN |
---|
0:00:29 | and first I will talk about the DARPA RATS program, on which I |
---|
0:00:35 | tested it, so it's a noisy condition |
---|
0:00:39 | then I will talk about the neural network bottleneck features, and then the |
---|
0:00:44 | application to language identification |
---|
0:00:48 | so the DARPA RATS program; I think it was already introduced, so I would |
---|
0:00:53 | like to give you some taste of the RATS; unfortunately |
---|
0:00:58 | there are not enough rats to taste for all of you in the arena here, so |
---|
0:01:01 | I prepared some audio samples instead |
---|
0:01:09 | [noisy audio samples are played] |
---|
0:01:24 | so you get some impression of how noisy it is |
---|
0:01:29 | so the bottleneck features |
---|
0:01:31 | so, bottleneck features refers to |
---|
0:01:34 | a neural network topology where one hidden layer has a significantly lower |
---|
0:01:40 | dimension than the surrounding layers |
---|
0:01:42 | in my case the dimension was eighty for the bottleneck and fifteen hundred for the surrounding |
---|
0:01:48 | layers |
---|
0:01:49 | what it actually does is a kind of compression |
---|
0:01:53 | it compresses the information so that we can use it in some other way than |
---|
0:01:57 | just in the neural network |
---|
0:01:59 | it comes from speech recognition |
---|
0:02:02 | where these features are usually used alone or in conjunction |
---|
0:02:08 | with the baseline features, which would typically be MFCCs |
---|
0:02:13 | what I actually used is the stacked bottleneck |
---|
0:02:16 | where I have two neural networks in cascade |
---|
0:02:21 | both with bottlenecks |
---|
0:02:22 | and the second neural network takes its input from the first net, from |
---|
0:02:30 | the bottleneck, stacked in time: five frames with a five-frame shift |
---|
0:02:36 | this was shown by the BUT guys to be very good for speech |
---|
0:02:41 | recognition, so I used it as it was; they tried different numbers of frames, |
---|
0:02:47 | different shifts |
---|
0:02:48 | and so on, so you can trust them on this |
---|
0:02:52 | here is the topology of the bottleneck feature extraction: for the first net I used |
---|
0:02:57 | frequency domain linear prediction coefficients plus fundamental frequency |
---|
0:03:04 | as input; actually if we use a block of log mel filterbank outputs it gives about |
---|
0:03:08 | the same results |
---|
0:03:10 | then I have fifteen hundred, fifteen hundred, eighty in the bottleneck, |
---|
0:03:15 | fifteen hundred, and the targets |
---|
0:03:17 | the number of targets: for me the targets were states of context-dependent |
---|
0:03:22 | clustered quinphones |
---|
0:03:24 | usually, like the BUT guys do, people use triphones; I used quinphones |
---|
0:03:29 | because BBN is using quinphones |
---|
0:03:34 | the second net has about the same topology, just the input is different: |
---|
0:03:41 | I have five frames |
---|
0:03:43 | stacked in time, so it's five times eighty, which is four hundred, but |
---|
0:03:49 | otherwise it's exactly the same |
---|
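To make the cascade concrete, here is a minimal numpy sketch of the stacked bottleneck extraction just described. The layer sizes, the linear bottleneck, and the five-frames-with-a-five-frame-shift stacking come from the talk; the 45-dimensional input, the sigmoid hidden layers, and the random weights are placeholders for the trained nets.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(dim_in, dim_out):
    return rng.normal(scale=0.01, size=(dim_in, dim_out)), np.zeros(dim_out)

def forward_to_bottleneck(x, net):
    """Run an MLP up to its bottleneck: sigmoid hidden layers (an
    assumption), then the linear bottleneck mentioned in the Q&A."""
    *hidden, bottleneck = net
    for W, b in hidden:
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    W, b = bottleneck
    return x @ W + b

# first net: input -> 1500 -> 1500 -> 80 (bottleneck); the layers after
# the bottleneck and the quinphone-state output layer are omitted
net1 = [layer(45, 1500), layer(1500, 1500), layer(1500, 80)]
# second net: 5 x 80 = 400 stacked bottleneck frames -> 1500 -> 1500 -> 80
net2 = [layer(400, 1500), layer(1500, 1500), layer(1500, 80)]

def stack_context(bn, shift=5):
    """Stack bottleneck frames t-10, t-5, t, t+5, t+10 for every frame t."""
    pad = np.pad(bn, ((2 * shift, 2 * shift), (0, 0)), mode="edge")
    return np.array([np.concatenate([pad[t + k * shift] for k in range(5)])
                     for t in range(len(bn))])

frames = rng.normal(size=(200, 45))          # placeholder input features
bn1 = forward_to_bottleneck(frames, net1)    # (200, 80)
sbn = forward_to_bottleneck(stack_context(bn1), net2)
print(sbn.shape)                             # (200, 80) stacked bottleneck feats
```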
0:03:53 | for RATS we have two languages which were transcribed, which are Farsi and |
---|
0:03:58 | Levantine Arabic; you can see the number of hours the nets were trained on |
---|
0:04:03 | and the number of targets |
---|
0:04:06 | used for each system |
---|
0:04:10 | so let's go to |
---|
0:04:13 | language recognition |
---|
0:04:14 | so the language identification task: RATS has five target languages plus an |
---|
0:04:20 | out-of-set class |
---|
0:04:21 | different durations, and as you heard it's quite noisy, so I will just leave it |
---|
0:04:26 | at that |
---|
0:04:28 | here is my baseline system description |
---|
0:04:33 | I used PLPs, nine PLP coefficients, with short-time |
---|
0:04:37 | Gaussianization; usually you don't see a benefit from this for language ID, but for |
---|
0:04:42 | these noisy |
---|
0:04:43 | conditions it actually helps |
---|
0:04:46 | we take a block of eleven frames, stack them together, and project |
---|
0:04:51 | them to sixty dimensions with HLDA |
---|
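A minimal sketch of this stacking and projection, assuming the 60-dimensional transform was already estimated elsewhere (the HLDA estimation itself needs class labels and an ML optimization, so a random matrix stands in for it here):

```python
import numpy as np

rng = np.random.default_rng(0)
plp = rng.normal(size=(500, 9))        # placeholder Gaussianized PLP frames
A = rng.normal(size=(99, 60))          # stand-in for the trained HLDA transform

def stack_and_project(feats, context=5):
    """Stack 11 consecutive frames (t-5..t+5) and project 99 -> 60 dims."""
    pad = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    stacked = np.array([pad[t:t + 2 * context + 1].ravel()
                        for t in range(len(feats))])
    return stacked @ A

print(stack_and_project(plp).shape)    # (500, 60)
```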
0:04:54 | and as you |
---|
0:04:58 | can see in the next slide, I tried different coefficients |
---|
0:05:04 | to compare |
---|
0:05:06 | you'll see the results in the next slide; I used a UBM with one thousand |
---|
0:05:10 | twenty-four Gaussians |
---|
0:05:12 | the i-vector was four hundred dimensional, and the final classifier was a neural network; we found that |
---|
0:05:18 | for this kind of task it was the best |
---|
0:05:22 | but you have to do some tricks, which are described in the paper |
---|
0:05:27 | so here is the slide with the first results, the baseline |
---|
0:05:31 | results, and there are four different feature extractions; we focused on the three |
---|
0:05:38 | second and ten second conditions, because the 120 second condition was so good that it didn't |
---|
0:05:43 | make sense to look at, and the thirty second was also good after the fusion |
---|
0:05:47 | so we mainly focus on these two conditions |
---|
0:05:50 | as you see, the MHEC coefficients from UT Dallas are the best for the |
---|
0:05:55 | ten second condition, and the PLPs are the best for the three second condition |
---|
0:05:59 | then there were the BUT MFCC features which we were using for NIST |
---|
0:06:06 | evaluations |
---|
0:06:06 | and these are the features which were the best in the two thousand |
---|
0:06:12 | thirteen RATS evaluation for us |
---|
0:06:15 | so these are the baseline features, the conventional acoustic features |
---|
0:06:21 | so before presenting the results with the bottleneck features, let me talk |
---|
0:06:26 | about the prior work |
---|
0:06:28 | mainly |
---|
0:06:30 | they used |
---|
0:06:32 | context-independent phonemes |
---|
0:06:34 | which makes quite a lot of difference, as we will see later |
---|
0:06:37 | and so in two thousand thirteen, in the RATS evaluation, Jeff Ma from BBN |
---|
0:06:43 | actually used |
---|
0:06:45 | context-independent phonemes, actually clustered ones, on Levantine Arabic; the dimensionality was thirty-nine; he |
---|
0:06:51 | took the log of these posteriors and simply stacked them onto the block |
---|
0:06:56 | of the PLPs from the baseline, and then all of this was projected back |
---|
0:07:01 | to sixty dimensions with HLDA |
---|
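A hedged sketch of that feature-level fusion: one frame of log phone posteriors appended to the stacked PLP block, then a projection back to 60 dimensions. The posterior matrix and the transform are random stand-ins for the trained components:

```python
import numpy as np

rng = np.random.default_rng(0)
plp_block = rng.normal(size=(500, 99))                    # 11 stacked PLP frames
log_post = np.log(rng.dirichlet(np.ones(39), size=500))   # 39 clustered CI phones

fused = np.hstack([plp_block, log_post])                  # 99 + 39 = 138 dims
A = rng.normal(size=(138, 60))                            # placeholder HLDA
print((fused @ A).shape)                                  # (500, 60)
```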
0:07:03 | and he got pretty good results; it's like |
---|
0:07:06 | feature-level fusion |
---|
0:07:11 | another idea is from Mireia Diez; she is doing so-called phone log-likelihood ratio (PLLR) features |
---|
0:07:19 | what she does is take the posteriors, take the log, and then compute the likelihood |
---|
0:07:23 | ratio between them |
---|
0:07:25 | usually she appends deltas, sometimes uses PCA to reduce the dimensionality, and |
---|
0:07:31 | then she fuses it with the PLPs at the feature level |
---|
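A small sketch of PLLR extraction under its usual definition (the log odds of each phone posterior against the average of the competing ones); np.gradient is only a crude stand-in for the usual delta computation:

```python
import numpy as np

def pllr(posteriors, eps=1e-10):
    """posteriors: (frames, n_phones), rows summing to one."""
    p = np.clip(posteriors, eps, 1.0 - eps)
    n = p.shape[1]
    return np.log(p) - np.log((1.0 - p) / (n - 1))

post = np.random.default_rng(0).dirichlet(np.ones(40), size=100)  # fake frames
feats = pllr(post)
feats = np.hstack([feats, np.gradient(feats, axis=0)])  # append deltas
print(feats.shape)                                      # (100, 80)
```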
0:07:35 | before Christmas she was at BUT, and she was working on RATS |
---|
0:07:39 | as well, so we could compare these features |
---|
0:07:43 | and actually these features were also better than the baseline features, and |
---|
0:07:51 | they are better than the phonotactic system, because they also built a |
---|
0:07:55 | conventional phonotactic system on this data, and these features were much better; |
---|
0:07:59 | the conventional phonotactic system didn't make it to the fusion |
---|
0:08:02 | but these features did |
---|
0:08:07 | during the review process one of the reviewers told us that there was |
---|
0:08:13 | a very similar work which was submitted to IEEE Electronics Letters at the |
---|
0:08:18 | end of two thousand thirteen |
---|
0:08:19 | it was by Mr. Song, and he applied it on clean data, on |
---|
0:08:24 | the NIST two thousand nine data |
---|
0:08:28 | then during a presentation in two thousand fourteen, I guess |
---|
0:08:34 | (actually it's not in the paper, just in the presentation) |
---|
0:08:37 | Lopez-Moreno from Google presented bottleneck features |
---|
0:08:44 | but his neural network, the DNN, is actually trained to produce the |
---|
0:08:52 | posterior probabilities of the target languages, not of phonemes |
---|
0:08:55 | so it might open a new field of data-driven features |
---|
0:09:03 | so let's go to results |
---|
0:09:05 | so here are again the four baseline features; then if I take |
---|
0:09:10 | the log posteriors, just the log posteriors, which come out of the |
---|
0:09:15 | neural network, taking just one frame this time, meaning just one frame |
---|
0:09:21 | and just build the i-vector system on them, then you can see that it is |
---|
0:09:27 | better than any of the baselines above |
---|
0:09:30 | so then what I did: I took eleven frames, a block of these |
---|
0:09:34 | posteriors |
---|
0:09:35 | and |
---|
0:09:37 | stacked them together and projected them with HLDA to sixty |
---|
0:09:41 | and you can see that it's |
---|
0:09:44 | quite a bit better than just one frame, so it means that the context is very important |
---|
0:09:49 | and then this is what Jeff Ma did: the baseline features plus one |
---|
0:09:56 | frame of the log posteriors |
---|
0:10:00 | projected with HLDA to sixty dimensions |
---|
0:10:03 | and you can see that this is also good, but it's already |
---|
0:10:06 | like a fusion of two systems |
---|
0:10:10 | so how do the bottleneck features do, then |
---|
0:10:13 | so again this is just one frame |
---|
0:10:16 | I also tried more, but it didn't help for me |
---|
0:10:21 | so, one frame of bottleneck features; the dimensionality is eighty |
---|
0:10:25 | and here I take the bottleneck from the |
---|
0:10:29 | first neural network |
---|
0:10:31 | and this is the stacked bottleneck, the bottleneck from the second neural |
---|
0:10:35 | network; so you can see that both these systems are quite a bit better than any |
---|
0:10:40 | of the baselines, and it actually makes sense to do the stacked |
---|
0:10:46 | bottleneck architecture, because you get something |
---|
0:10:48 | something out of it |
---|
0:10:50 | why am I taking just one frame? it might be that, in the case |
---|
0:10:55 | of the stacked bottleneck features, since I'm doing the stacking |
---|
0:10:58 | between the nets, the context is already |
---|
0:11:02 | there |
---|
0:11:05 | so then I have some analysis slides |
---|
0:11:10 | and the first thing was obviously to try to tune the bottleneck size |
---|
0:11:16 | for speech recognition they usually use eighty, so |
---|
0:11:20 | I took eighty as the baseline and then tried to vary the bottleneck |
---|
0:11:26 | size, but as you see |
---|
0:11:29 | eighty was the best |
---|
0:11:31 | if you go to sixty or beyond |
---|
0:11:34 | it starts to degrade in both directions, so I stuck with eighty because it was |
---|
0:11:37 | the baseline anyway |
---|
0:11:40 | the other thing I was interested in was what |
---|
0:11:45 | the targets for the neural network should be |
---|
0:11:48 | so we did it with the context-dependent phonemes |
---|
0:11:53 | but how does it do with the context-independent ones |
---|
0:11:56 | it's much easier to train the system with context-independent phonemes |
---|
0:12:02 | than with context-dependent ones, because we do not need to build an LVCSR system, the |
---|
0:12:05 | training of the neural network is much faster, and so on |
---|
0:12:09 | but if you look at the results |
---|
0:12:12 | the clear winner is the context-dependent phones |
---|
0:12:17 | I think it's because of the finer modeling of the phonetic structure in |
---|
0:12:22 | this feature space |
---|
0:12:27 | then |
---|
0:12:31 | as I said at the beginning, we have two |
---|
0:12:34 | languages that we have transcriptions for |
---|
0:12:38 | Farsi and Levantine Arabic |
---|
0:12:40 | and so I |
---|
0:12:42 | produced two sets of features, one trained on Farsi, one on Levantine Arabic |
---|
0:12:46 | you can see that they perform about the same |
---|
0:12:51 | actually, as you can see on the final slide |
---|
0:12:56 | for the evaluation we needed to choose just one, which is the |
---|
0:13:00 | Levantine one, because it's just |
---|
0:13:03 | slightly better |
---|
0:13:06 | you don't see it in the test results, but in |
---|
0:13:10 | training, Farsi has many more targets, many more |
---|
0:13:17 | context-dependent phones, so the training took more time; so also for training |
---|
0:13:22 | convenience the Levantine one is better |
---|
0:13:26 | then, in two thousand thirteen, what Jeff did for the RATS evaluation |
---|
0:13:35 | was a kind of fusion of several systems |
---|
0:13:40 | so-called language-dependent systems |
---|
0:13:42 | and it is explained in this picture |
---|
0:13:46 | so what is the language dependency? usually we have just a UBM |
---|
0:13:52 | and an i-vector extractor, both trained on the same data, which is usually all the |
---|
0:13:57 | data we have |
---|
0:13:58 | so what we did is to train the UBM on one language only, let's say |
---|
0:14:04 | just on Dari, or Farsi, or Pashto, or Urdu |
---|
0:14:09 | and so on for the other languages; the i-vector extractor was then trained on all of them |
---|
0:14:15 | and then at the end we took just a simple average of the |
---|
0:14:18 | scores; we didn't want to train a fusion because it's more parameters, |
---|
0:14:22 | and the fusion was then trained with the other systems |
---|
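A minimal sketch of that combination, with random matrices standing in for the scores of the six per-language systems (five target languages plus the out-of-set class):

```python
import numpy as np

rng = np.random.default_rng(0)
n_utts, n_classes, n_systems = 50, 6, 6

# stand-ins for the score matrices of the language-dependent systems
scores = [rng.normal(size=(n_utts, n_classes)) for _ in range(n_systems)]

avg = np.mean(scores, axis=0)      # plain average, no fusion weights to train
print(avg.argmax(axis=1)[:10])     # per-utterance decisions
```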
0:14:27 | so |
---|
0:14:28 | personally I don't like this structure that much, because the complexity of the system grows |
---|
0:14:34 | quite a lot, but I think it takes advantage of the different |
---|
0:14:38 | alignments of the UBMs; so how does it look in the results |
---|
0:14:44 | so the first line is |
---|
0:14:47 | the baseline, where we train the UBM |
---|
0:14:50 | on all languages |
---|
0:14:52 | with just one UBM |
---|
0:14:53 | then the next six lines |
---|
0:14:56 | are the separate systems where we train the UBM only on the particular |
---|
0:15:00 | language |
---|
0:15:02 | so if you look at the results, none of these beats the baseline, which |
---|
0:15:06 | is kind of expected |
---|
0:15:07 | but when you take the average of these scores and score with that |
---|
0:15:11 | you can see that there is a very nice benefit of doing this: it's |
---|
0:15:15 | about fifteen percent for the three second and twenty-five percent for the ten second condition |
---|
0:15:19 | and |
---|
0:15:21 | we also did the same thing on the channel level: in |
---|
0:15:25 | RATS there are eight |
---|
0:15:27 | (it should be nine) |
---|
0:15:28 | channels, A through H plus the source channel, so I did the same thing |
---|
0:15:31 | on the channel level |
---|
0:15:34 | it performs about the same |
---|
0:15:35 | and then I took the average of all of them |
---|
0:15:38 | it's also about the same, so |
---|
0:15:40 | there is some separation that |
---|
0:15:44 | up to some point improves things |
---|
0:15:47 | it would be good to try what was mentioned earlier, the |
---|
0:15:51 | DNN alignment, which might be a similar kind of different alignment |
---|
0:15:57 | to look at as well |
---|
0:16:01 | so let's look at the final slide, with the fusion |
---|
0:16:05 | the first line is the PLP system |
---|
0:16:08 | then |
---|
0:16:10 | then I have a fusion of three systems, which is the stacked bottleneck systems |
---|
0:16:15 | for Farsi and Levantine, and then the feature-level fusion with the acoustic system |
---|
0:16:21 | and you can see that there is about a thirty percent improvement |
---|
0:16:27 | then |
---|
0:16:28 | the same comparison if we made all the systems language-dependent: so |
---|
0:16:33 | we saw something like a thirty or twenty-five percent improvement from the |
---|
0:16:38 | language-dependent systems, and here we can see that the fusion still |
---|
0:16:44 | gains the same as if we do not use the language dependency, which is |
---|
0:16:49 | very nice |
---|
0:16:53 | about thirty percent from the fusion over the single best system, whether you do |
---|
0:16:57 | the language dependency or not |
---|
0:17:00 | then one of the reviewers suggested to do something with the posteriors directly; maybe it was a |
---|
0:17:06 | reviewer from SRI |
---|
0:17:11 | and also, after the RATS evaluation, I actually exchanged emails with Mitch, and he |
---|
0:17:18 | suggested trying that too; so what we had is this blue |
---|
0:17:22 | stream |
---|
0:17:23 | and what I |
---|
0:17:26 | did was very easy for me to try: I didn't |
---|
0:17:30 | take the bottleneck here, I just used the entire network, took the |
---|
0:17:35 | posteriors which were here, fed them directly to another MLP to produce the scores |
---|
0:17:40 | and then I could fuse it |
---|
0:17:43 | so you can see that |
---|
0:17:45 | actually, for me |
---|
0:17:47 | the posterior system was worse than the stacked bottleneck with i-vectors |
---|
0:17:54 | but yesterday I compared the results with Mitch, and actually their |
---|
0:18:00 | system |
---|
0:18:01 | the CNN posterior system, is a little bit better than my system here |
---|
0:18:07 | we talked a little bit; it might be because the CNN is behaving |
---|
0:18:10 | much better than the DNN in noisy conditions |
---|
0:18:13 | which we would need to try; but the fusion of these |
---|
0:18:18 | two approaches is very nice |
---|
0:18:21 | the conclusion: the bottleneck features provide a very nice gain |
---|
0:18:29 | they compete very nicely with the conventional phonotactic system which we did |
---|
0:18:33 | before; actually, they are much better |
---|
0:18:36 | and as I said before |
---|
0:18:39 | for the RATS evaluation this year we also had phonotactic systems, and none |
---|
0:18:45 | of them made it to the final fusion |
---|
0:18:48 | and there are much bigger gains for longer audio files |
---|
0:18:55 | and as I said, what Lopez-Moreno presented |
---|
0:19:00 | the net trained directly for the task, like the bottleneck |
---|
0:19:05 | but on the direct task, where the targets of the net are the |
---|
0:19:09 | languages in this case |
---|
0:19:11 | it might open a new space for data-driven feature extraction |
---|
0:19:18 | thank you |
---|
0:19:26 | thank you, Pavel; do we have any questions |
---|
0:19:31 | how did you train the neural networks for this task |
---|
0:19:35 | so for this task I used the BBN training tool; it's stochastic gradient descent |
---|
0:19:40 | on GPUs, and each net was trained for about three days; I have two |
---|
0:19:46 | nets, so it was about a week to train them |
---|
0:19:55 | the activation function, what was it |
---|
0:20:01 | I don't remember the activation function in the hidden layers |
---|
0:20:06 | but I know that for the bottleneck it is the linear one, so |
---|
0:20:09 | there was a linear activation function for the bottleneck; it was also shown for |
---|
0:20:13 | speech recognition that this provides better |
---|
0:20:16 | results; actually it's in the paper |
---|
0:20:21 | or maybe I deleted it, sorry |
---|
0:20:25 | the same question that Mitch asked: can you tell what was used to train |
---|
0:20:29 | your ASR, your DNN |
---|
0:20:31 | the same as... |
---|
0:20:32 | ...all the channels? |
---|
0:20:34 | yes, all channels, and |
---|
0:20:37 | the DNN for the bottleneck |
---|
0:20:41 | features was trained on the keyword spotting data, so it's different data from the UBM |
---|
0:20:46 | and the rest |
---|
0:20:48 | okay, so you also had different datasets in there; so one of the questions is |
---|
0:20:53 | how much... |
---|
0:20:55 | what is your sense of the sensitivity of the DNNs; it |
---|
0:20:59 | seems like the idea is: start with a good ASR system and label |
---|
0:21:03 | your data, then train the DNN; so the question is |
---|
0:21:06 | maybe people in other places have had experiences |
---|
0:21:09 | what do people think the sensitivity is to starting off with a very good |
---|
0:21:13 | alignment |
---|
0:21:14 | before you start to train the DNN |
---|
0:21:17 | do you have a sense of that, anything, maybe not from this work, but |
---|
0:21:20 | otherwise |
---|
0:21:22 | that's a hard one; so |
---|
0:21:26 | what was shown here is that you really need to build the LVCSR and it needs to be |
---|
0:21:31 | good |
---|
0:21:32 | what I like is what Lopez-Moreno is doing, where you actually do |
---|
0:21:37 | not need the ASR system at all, you just put it aside; you can |
---|
0:21:42 | use the language ID data directly, training the neural network on |
---|
0:21:46 | the language posteriors, so you use the same data as the normal system |
---|
0:21:52 | I played with that a little bit as well; I did a kind of bottleneck |
---|
0:21:54 | version of it; because if you do what he was doing, actually at the |
---|
0:21:59 | workshop |
---|
0:22:00 | he trained the net, the DNN, to produce |
---|
0:22:05 | the languages as the targets; then, because you |
---|
0:22:10 | have |
---|
0:22:11 | the posterior probability of each language at every frame, you need to do some averaging over time |
---|
0:22:15 | so what he did was just take the average |
---|
0:22:18 | which is |
---|
0:22:20 | good for three seconds but not good for ten seconds |
---|
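A tiny sketch of that pooling, assuming a DNN that already emits per-frame language posteriors; the utterance score is just the mean log posterior over time:

```python
import numpy as np

def utterance_scores(frame_posteriors, eps=1e-10):
    """frame_posteriors: (frames, n_languages); average the log posteriors."""
    return np.log(np.clip(frame_posteriors, eps, 1.0)).mean(axis=0)

post = np.random.default_rng(0).dirichlet(np.ones(5), size=300)  # fake frames
print(utterance_scores(post).argmax())   # index of the winning language
```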
0:22:23 | so what I did then: I took exactly these posteriors as the features |
---|
0:22:29 | or the output of the layer before as the features, and then it helps, because |
---|
0:22:33 | then you just build i-vectors on them |
---|
0:22:36 | so it helps to do something more than averaging; and for that I would have a much smaller |
---|
0:22:40 | system for the i-vector part |
---|
0:22:44 | so that might be |
---|
0:22:47 | an argument for not doing the LVCSR |
---|
0:22:55 | just to follow on, in response to Doug's question: the keyword spotting data |
---|
0:22:59 | was transmitted at a different time than the language ID data; one thing I observed in |
---|
0:23:05 | the speaker ID data was that a retransmission at another time varies with things |
---|
0:23:11 | like the atmosphere and the transmission effects |
---|
0:23:14 | so the channel is varying over time; so in one regard, this |
---|
0:23:17 | keyword spotting data kind of has different channels than the language ID data |
---|
0:23:22 | even though it's theoretically the same equipment doing the sending, a different effect |
---|
0:23:27 | is coming through; so it's nice to see that it's still working despite that difference |
---|
0:23:32 | a similar question: for instance, on the clean SRE data we're seeing a difference |
---|
0:23:37 | or a problem, trying to classify microphone trials when most of your training, |
---|
0:23:45 | if we take your network, is telephone speech |
---|
0:23:49 | one of your last statements was that the bottleneck features are |
---|
0:23:53 | great even in noisy conditions, but of course you've got very matched data here; do |
---|
0:23:58 | you have any theories on how the bottleneck features might fare in mismatched conditions? I |
---|
0:24:04 | ask because our system |
---|
0:24:06 | appears sensitive to it; I wonder if the bottlenecks might be a little more resistant, just |
---|
0:24:10 | because of the compression factor |
---|
0:24:12 | I think it would depend on the training data for the DNN |
---|
0:24:16 | okay, so what we did for this, for instance |
---|
0:24:18 | together with the speech recognition people |
---|
0:24:22 | we had just the clean data for training the DNN, so we said |
---|
0:24:26 | okay, what will we do if the test data is noisy? so we |
---|
0:24:30 | just took thirty percent of the training data and artificially added noise |
---|
0:24:35 | and that helped a lot |
---|
0:24:37 | so then the DNN saw some of the noisy conditions |
---|
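A sketch of that recipe, with random arrays standing in for the clean audio and the noise source; thirty percent of the utterances get additive noise at a random SNR:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(signal, noise, snr_db):
    """Mix noise into signal at the requested signal-to-noise ratio."""
    noise = np.resize(noise, signal.shape)
    gain = np.sqrt((signal**2).mean() / ((noise**2).mean() * 10 ** (snr_db / 10)))
    return signal + gain * noise

utts = [rng.normal(size=16000) for _ in range(10)]   # placeholder 1 s utterances
noise = rng.normal(size=160000)                      # placeholder noise recording

for i in rng.choice(len(utts), size=int(0.3 * len(utts)), replace=False):
    utts[i] = add_noise(utts[i], noise, snr_db=rng.uniform(5, 20))
```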
0:24:46 | one more related question |
---|
0:24:50 | if you have to handle very many languages |
---|
0:24:53 | could you imagine having one universal recognizer system for the DNN, or do you |
---|
0:24:58 | think you'd have to have many |
---|
0:25:01 | I think that people would need to build at least a few DNNs |
---|
0:25:04 | because I think, Mitch, you said that you tried something like |
---|
0:25:07 | the Farsi and Levantine ones and then |
---|
0:25:10 | the universal one, right |
---|
0:25:14 | so you might comment more on whether it was better than the two separate ones or |
---|
0:25:18 | the fusion of these two |
---|
0:25:23 | so we had someone in our lab construct a multilingual dictionary between these two languages |
---|
0:25:27 | that was the best |
---|
0:25:28 | of the three systems that we tried, but we also found the fusion of all |
---|
0:25:33 | three was best; in fact our primary system was the fusion of the |
---|
0:25:36 | CNN systems, but also three CNN i-vector systems for the different |
---|
0:25:43 | languages, but all with one language ID feature |
---|
0:25:47 | if you remember the distinction, between the DNN side and |
---|
0:25:51 | the language ID side: we just maintained |
---|
0:25:56 | one language ID feature while the CNN changed, and that was |
---|
0:26:00 | a very good fusion |
---|
0:26:02 | in terms of the SRE or LRE data, we found that having |
---|
0:26:06 | the multiple languages |
---|
0:26:10 | if you get good coverage across the different phones, that's going to help it |
---|
0:26:13 | converge |
---|
0:26:16 | then I think that you would need a few systems |
---|
0:26:19 | not many, maybe three or four |
---|
0:26:22 | and it would be better than having one universal one |
---|
0:26:30 | okay |
---|