0:00:15 | okay so we ran into some difficulty with this so these are |
---|
0:00:19 | the slides i sent in for them |
---|
0:00:23 | it's a talk about neural networks primarily recurrent neural networks |
---|
0:00:27 | for text dependent speaker verification |
---|
0:00:31 | this is on paper at least a very natural fit between a model and a |
---|
0:00:36 | problem |
---|
0:00:37 | and it's something that google has got to work very successfully |
---|
0:00:43 | so we tried it unfortunately we came to the conclusion that we were a |
---|
0:00:49 | couple of orders of magnitude short in the amount of background data we had |
---|
0:00:55 | so |
---|
0:00:57 | i'll just run through what we did and explain why it didn't work |
---|
0:01:04 | i would recommend that you read this paper i intended it to be read |
---|
0:01:09 | as a survey article |
---|
0:01:11 | and i think it's worth reading on those grounds |
---|
0:01:15 | but i'm not going to spend the whole period talking about this particular problem |
---|
0:01:21 | i'd like to explain what our plans are for getting these neural networks to |
---|
0:01:28 | work i'm talking specifically about |
---|
0:01:30 | speaker discriminant neural networks |
---|
0:01:33 | getting them to work in text |
---|
0:01:35 | independent speaker recognition |
---|
0:01:40 | gautam's thesis project will be specifically on getting convolutional neural networks to work |
---|
0:01:47 | and i personally am particularly interested in |
---|
0:01:52 | what is the right back end architecture |
---|
0:01:54 | for this type of problem |
---|
0:01:58 | so what i plan to do given that i don't |
---|
0:02:02 | have any results to present is spend maybe five or ten minutes |
---|
0:02:06 | talking about |
---|
0:02:08 | why this is a difficult problem but why the difficulties are not insuperable |
---|
0:02:14 | and |
---|
0:02:16 | if possible i'd like to explain what we're |
---|
0:02:19 | hoping to do by way of a |
---|
0:02:23 | system |
---|
0:02:25 | for the nist evaluation based on speaker discriminant neural networks |
---|
0:02:31 | all this in the hope of provoking a discussion i would be particularly interested in |
---|
0:02:36 | hearing |
---|
0:02:38 | from any of the other people who might be trying to do something similar |
---|
0:02:43 | okay so |
---|
0:02:45 | so the problem on this task was to use neural networks |
---|
0:02:51 | to extract utterance level features |
---|
0:02:55 | which could be used to characterize speakers |
---|
0:02:59 | in the context of a classical text dependent speaker recognition task where you have a |
---|
0:03:04 | fixed |
---|
0:03:05 | pass phrase and the phonetic variability is partially nailed down |
---|
0:03:11 | the easiest |
---|
0:03:12 | way to do this is using an ordinary feed forward deep neural network |
---|
0:03:19 | but we were particularly interested in trying to get this to work with recurrent neural |
---|
0:03:23 | networks |
---|
0:03:25 | largely inspired by |
---|
0:03:27 | recent work in machine translation which |
---|
0:03:30 | i'll describe briefly |
---|
0:03:33 | so |
---|
0:03:36 | so here's the problem i'll just mention at the outset that we were specifically interested |
---|
0:03:42 | in the case of getting this to work with a modest amount of background data |
---|
0:03:47 | most of us working in |
---|
0:03:49 | text dependent speaker recognition are confronted by a very hard constraint if we're lucky we |
---|
0:03:55 | will be able to get data from |
---|
0:03:58 | one hundred speakers |
---|
0:04:00 | whereas if you read the google paper you will see that they have |
---|
0:04:04 | literally tens of millions of recordings |
---|
0:04:08 | all instances of the same phrase |
---|
0:04:10 | okay |
---|
0:04:13 | so |
---|
0:04:16 | well what you would do in designing a deep neural network for this purpose is you |
---|
0:04:21 | would just feed a three hundred millisecond |
---|
0:04:25 | window into a classical feed forward neural network |
---|
0:04:30 | with a softmax on the outputs where you have one output for each |
---|
0:04:37 | speaker in your development population and train it up with a classical cross entropy criterion |
---|
0:04:45 | you would then get utterance level features simply by averaging the outputs |
---|
0:04:52 | over all frames this was implemented successfully by google they called it the |
---|
0:04:57 | d vector approach |
---|
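(A minimal numpy sketch of the d-vector recipe just described. The single hidden layer, the layer sizes, and the random weights are stand-in assumptions, and training with the cross-entropy criterion is omitted; this is an illustration of the idea, not Google's implementation.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in sizes: a stacked ~300 ms window of acoustic features and
# a softmax output with one class per development speaker.
window_dim, hidden_dim, n_speakers = 300, 256, 100

W1 = rng.normal(0, 0.01, (hidden_dim, window_dim))   # hidden layer weights
W2 = rng.normal(0, 0.01, (n_speakers, hidden_dim))   # softmax layer weights

def frame_posteriors(window):
    """One window -> softmax posterior over the development speakers."""
    h = np.maximum(W1 @ window, 0.0)                 # ReLU hidden layer
    logits = W2 @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Utterance-level feature: average the per-frame outputs over all frames.
windows = rng.normal(size=(50, window_dim))          # 50 sliding windows
d_vector = np.mean([frame_posteriors(w) for w in windows], axis=0)
```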
0:05:01 | and |
---|
0:05:03 | it works fairly well on our task as well although it's not competitive with a plain |
---|
0:05:09 | gmm ubm |
---|
0:05:12 | so this is just the |
---|
0:05:15 | classical feed forward architecture i don't think it needs any further comment |
---|
0:05:23 | what was i think most remarkable about the |
---|
0:05:28 | rnn architecture which i'll |
---|
0:05:31 | describe next |
---|
0:05:34 | is that google managed to get this to work as an end-to-end |
---|
0:05:39 | speaker recognition system not merely |
---|
0:05:42 | a feature extractor |
---|
0:05:44 | but one which could make a binary decision concerning a trial as to |
---|
0:05:48 | whether it's a |
---|
0:05:50 | a target trial or a non-target trial |
---|
0:05:53 | this has been seen as a sort of pot of gold at the end of |
---|
0:05:57 | the rainbow in our field for a very long time |
---|
0:06:00 | now |
---|
0:06:01 | people have been able to get it to work with i-vectors |
---|
0:06:07 | but a direct approach to that problem has generally been resistant to our |
---|
0:06:14 | best efforts google got it to work with their rnn system |
---|
0:06:20 | so you see that they used an awful lot of data that figure of |
---|
0:06:24 | twenty two million recordings is not a misprint |
---|
0:06:30 | so the rnn architecture the diagrams in the slides refer just to |
---|
0:06:39 | the |
---|
0:06:39 | classical memory module not the lstm a memory module where |
---|
0:06:47 | in addition to an input vector at each time step you also have a hidden |
---|
0:06:52 | layer that encodes the past history |
---|
0:06:55 | and what the neural network does at each time step is append the input to |
---|
0:07:00 | the |
---|
0:07:01 | hidden activation |
---|
0:07:04 | then squash the dimension back down to the dimension of the hidden activation |
---|
0:07:10 | and feed the result into a nonlinearity so you |
---|
0:07:13 | keep on updating a memory of the history of the utterance and that's |
---|
0:07:21 | a very natural sort of model |
---|
0:07:24 | for data with a left-to-right structure as in classical text dependent speaker recognition |
---|
0:07:31 | or even machine translation |
---|
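(A minimal numpy sketch of the recurrent update just described: concatenate the input with the previous hidden activation, project back down to the hidden dimension, and squash through a nonlinearity. The sizes and the tanh squashing are illustrative assumptions.)

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, hid_dim = 20, 64

# One weight matrix acting on [input; previous hidden state].
W = rng.normal(0, 0.1, (hid_dim, in_dim + hid_dim))
b = np.zeros(hid_dim)

def rnn_step(x_t, h_prev):
    """Append the input to the hidden activation, squash the dimension
    back down, and pass the result through a nonlinearity."""
    return np.tanh(W @ np.concatenate([x_t, h_prev]) + b)

h = np.zeros(hid_dim)
for x_t in rng.normal(size=(100, in_dim)):  # one left-to-right pass
    h = rnn_step(x_t, h)                    # h is the memory of the history
```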
0:07:33 | and there was a |
---|
0:07:35 | paper about this |
---|
0:07:37 | okay so this is the classical rnn architecture |
---|
0:07:42 | there was an extraordinary paper on machine translation published in two thousand and fourteen |
---|
0:07:48 | which showed that it was possible to train a neural network for the |
---|
0:07:54 | french to english translation problem |
---|
0:07:57 | using an rnn architecture with a very special feature namely |
---|
0:08:04 | that there was a single softmax |
---|
0:08:07 | okay in what they called the encoder the encoder read french language sentences |
---|
0:08:14 | and |
---|
0:08:15 | the |
---|
0:08:16 | it was trained in such a way that the hidden activation at the last time step |
---|
0:08:21 | was capable of memorising the entire french sentence |
---|
0:08:29 | so that all the information you needed in order to do machine |
---|
0:08:34 | translation from french to english was summarized in the hidden activation at the last word |
---|
0:08:41 | of the sentence |
---|
0:08:44 | to get this to work they had to use four layers of lstm units |
---|
0:08:49 | it wasn't easy but they were able to get |
---|
0:08:54 | state-of-the-art results on a machine translation task |
---|
0:08:57 | with sentences of about thirty words obviously that must eventually break down |
---|
0:09:04 | you can't memorise sentences of indefinite duration this way just because the memory has |
---|
0:09:13 | a finite capacity |
---|
0:09:15 | but google figured that if it works for machine translation it's definitely going to work |
---|
0:09:20 | in |
---|
0:09:22 | text dependent speaker recognition it will be possible to |
---|
0:09:26 | memorise a speaker's utterance of a fixed pass phrase |
---|
0:09:33 | so |
---|
0:09:35 | there are various ways this basic approach has been improved on |
---|
0:09:42 | an obvious thing to do instead of |
---|
0:09:46 | using the activation of the last time step to memorise an utterance would be to |
---|
0:09:51 | average the activations of all time steps |
---|
0:09:54 | but once again you would be taking the average activation and feeding it into a |
---|
0:09:58 | single softmax to do the memorising it's not one softmax per frame |
---|
0:10:07 | there was a bit of controversy as you can imagine in the machine translation field |
---|
0:10:11 | as to whether this really was the right way to memorise entire sentences and |
---|
0:10:17 | that led to a flurry of activity on something called |
---|
0:10:22 | attention modeling |
---|
0:10:24 | okay where |
---|
0:10:25 | i mean the argument was that if you're going to translate from french to english |
---|
0:10:30 | then in the course of the english translation as you proceed word by word you |
---|
0:10:35 | want to direct your attention to the appropriate place in the french utterance |
---|
0:10:41 | and that correspondence is not necessarily going to be monotonic because word ordering can change |
---|
0:10:48 | as you go from one language to the other |
---|
0:10:51 | and there was a model developed along these lines |
---|
0:10:59 | which i think |
---|
0:11:01 | claims to be the state-of-the-art in |
---|
0:11:07 | automatic machine translation |
---|
0:11:09 | and what google set out to do was to |
---|
0:11:15 | take that idea and use this sort of attention mechanism to weight the |
---|
0:11:23 | individual frames |
---|
0:11:25 | in the utterance to learn an optimal |
---|
0:11:28 | summary of a speaker's production of the pass phrase |
---|
0:11:36 | and that was the thing that actually worked best for them |
---|
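(A sketch of attention pooling over frames, as opposed to keeping only the last hidden activation: a softmax over per-frame relevance scores gives the weights for the summary. In a real system the attention vector `v` is learned jointly with the rest of the network; here it is just a random stand-in.)

```python
import numpy as np

rng = np.random.default_rng(0)
hid_dim = 64
H = rng.normal(size=(100, hid_dim))  # hidden activations, one per time step

v = rng.normal(size=hid_dim)         # stand-in for a learned attention vector

scores = H @ v                       # one relevance score per frame
weights = np.exp(scores - scores.max())
weights /= weights.sum()             # softmax over the time steps

summary = weights @ H                # weighted summary of the utterance
```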
0:11:40 | so this describes the task a fairly classical text dependent speaker recognition task |
---|
0:11:48 | the language was german the data was provided for us by voicetrust |
---|
0:11:57 | the results with the dnn well although the standard tricks worked |
---|
0:12:04 | as advertised they were you know |
---|
0:12:10 | the so-called relu units rectified linear units dropout and so |
---|
0:12:15 | on each of them gave an incremental improvement in performance but |
---|
0:12:20 | we weren't able to match the performance of a gmm ubm |
---|
0:12:25 | and of course the same thing happened with rnns the intelligent summaries |
---|
0:12:34 | of the data helped but the results were ultimately disappointing |
---|
0:12:39 | and the reason |
---|
0:12:41 | it was quite clear what the reason was |
---|
0:12:44 | with just one hundred development speakers we were going to |
---|
0:12:49 | hopelessly overfit to the data so |
---|
0:12:53 | these methods are not going to work unless we have very large |
---|
0:12:58 | amounts of data |
---|
0:13:03 | very large amounts of data may be on the way i was |
---|
0:13:08 | talking |
---|
0:13:08 | just this morning to someone who said that there might be the possibility of |
---|
0:13:12 | getting such data |
---|
0:13:15 | where this sort of thing could be seriously considered as a viable plausible |
---|
0:13:24 | solution but it's clear that gautam isn't going to get a thesis by waiting until that |
---|
0:13:30 | is solved |
---|
0:13:31 | he's |
---|
0:13:32 | been bitten by the |
---|
0:13:36 | by the neural network bug so his task will be to try to get |
---|
0:13:40 | convolutional neural networks working |
---|
0:13:45 | convolutional neural networks trained to discriminate between speakers working as feature extractors |
---|
0:13:52 | for text independent speaker recognition |
---|
0:13:55 | so |
---|
0:13:57 | what i would like to do is just |
---|
0:13:59 | talk about what our plans are for that |
---|
0:14:07 | what i thought i would do was first of all explain why this |
---|
0:14:11 | is a difficult problem |
---|
0:14:13 | okay why |
---|
0:14:15 | we cannot expect out of the box solutions |
---|
0:14:20 | already existing in the neural network literature to work for us |
---|
0:14:25 | and why nonetheless it's not an insuperably difficult problem and we ought to be |
---|
0:14:29 | able to do something about it |
---|
0:14:31 | we're presently committed |
---|
0:14:33 | to getting this to work |
---|
0:14:36 | we are going to submit some sort of system for the nist evaluation |
---|
0:14:42 | but i think it's going to take a bit longer to actually iron |
---|
0:14:47 | all the kinks out of this |
---|
0:14:50 | so |
---|
0:14:51 | it seems to me that |
---|
0:14:55 | in approaching this problem there are two fundamental questions that we need to be |
---|
0:14:59 | able to answer and how we answer them is probably going to dictate |
---|
0:15:06 | what direction we actually take |
---|
0:15:11 | there's the question about the backend which i'm particularly interested in |
---|
0:15:15 | but that's actually of secondary importance |
---|
0:15:20 | so the first question as i see it is if we look at the success in fields |
---|
0:15:26 | like face recognition |
---|
0:15:29 | where we have |
---|
0:15:31 | a very similar biometric pattern recognition problem i'm thinking in particular of deep face |
---|
0:15:38 | why is it that this has worked so spectacularly for them but we still haven't |
---|
0:15:43 | been able to get it to work |
---|
0:15:44 | that's one question |
---|
0:15:47 | a second question would be |
---|
0:15:51 | if we look at the current state-of-the-art in text independent speaker recognition |
---|
0:15:57 | because that's where we have a |
---|
0:16:02 | neural network trained to discriminate between senones |
---|
0:16:06 | collecting baum-welch statistics for |
---|
0:16:10 | an i-vector extractor that is a cascade |
---|
0:16:12 | why is it |
---|
0:16:14 | that if we simply train a neural network to discriminate between speakers |
---|
0:16:21 | on the nist data we haven't been able to |
---|
0:16:25 | train that architecture |
---|
0:16:28 | okay to get it to work satisfactorily |
---|
0:16:30 | in speaker recognition |
---|
0:16:34 | to my knowledge |
---|
0:16:36 | several people have tried this but haven't yet obtained even a publishable result |
---|
0:16:42 | okay i may be wrong about this i'd be happy to be shown that i'm wrong about |
---|
0:16:47 | this but i believe that this is where things stand at present |
---|
0:16:53 | so if we if we look at the |
---|
0:16:57 | at the deep face architecture |
---|
0:17:01 | so what these guys did at facebook they had a population of four thousand development |
---|
0:17:06 | subjects one thousand images per |
---|
0:17:11 | subject okay |
---|
0:17:13 | one thousand images per subject they |
---|
0:17:16 | trained a convolutional neural network to |
---|
0:17:20 | discriminate |
---|
0:17:22 | between the subjects in the development population |
---|
0:17:26 | and used that as a feature extractor at run time just feeding the output |
---|
0:17:33 | into a cosine distance classifier |
---|
0:17:35 | their output was a few thousand dimensions but |
---|
0:17:38 | google later showed that you could do this with a hundred and twenty-eight dimensions which is the |
---|
0:17:44 | same order of magnitude that we have found |
---|
0:17:47 | to be appropriate for characterizing speakers in |
---|
0:17:52 | text independent speaker recognition |
---|
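(For concreteness, cosine-distance scoring of two embeddings might look like the following sketch; the 128-dimensional size is an assumption in line with the figure quoted above, and the random vectors are placeholders for real embeddings.)

```python
import numpy as np

def cosine_score(emb_enrol, emb_test):
    """Trial score for two utterance (or face) embeddings."""
    a = emb_enrol / np.linalg.norm(emb_enrol)
    b = emb_test / np.linalg.norm(emb_test)
    return float(a @ b)  # accept the trial if this exceeds a threshold

rng = np.random.default_rng(0)
print(cosine_score(rng.normal(size=128), rng.normal(size=128)))
```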
0:17:55 | of course the fact that they have one thousand instances per subject obviously does |
---|
0:18:00 | make life a lot easier |
---|
0:18:02 | than |
---|
0:18:04 | it is for us where we have maybe ten on average |
---|
0:18:09 | but some people have raised a sort of more fundamental concern |
---|
0:18:13 | in our case we're not really trying to extract features from something that's |
---|
0:18:19 | analogous to static images |
---|
0:18:23 | because of the time dimension we're confronted with the fact that not only |
---|
0:18:29 | are we dealing with utterances of variable duration rather than a fixed dimension but |
---|
0:18:34 | the |
---|
0:18:37 | order of phonetic events is something that is a nuisance for us |
---|
0:18:43 | okay we need to get a representation that's |
---|
0:18:47 | invariant under permutations with respect to the |
---|
0:18:51 | order of phonetic events |
---|
0:18:54 | i think |
---|
0:18:55 | a convolutional neural network should be able to solve both of these |
---|
0:18:59 | problems in principle |
---|
0:19:02 | because it will produce a representation that's invariant under permutations in the time dimension |
---|
0:19:07 | and in principle it will be able to handle |
---|
0:19:11 | utterances of variable duration |
---|
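(A toy numpy sketch of why a convolutional network with global pooling over time yields a fixed-size representation for any duration, and one that is insensitive to the ordering of well-separated segments. The filter width and the average pool are illustrative choices, not a prescription.)

```python
import numpy as np

rng = np.random.default_rng(0)
n_filters, feat_dim, width = 32, 40, 5
filters = rng.normal(0, 0.1, (n_filters, width, feat_dim))

def cnn_embed(utterance):
    """Convolve along time with ReLU, then average-pool over all of time."""
    T = utterance.shape[0]
    acts = np.empty((T - width + 1, n_filters))
    for t in range(T - width + 1):
        patch = utterance[t:t + width]  # (width, feat_dim) slice of frames
        acts[t] = np.maximum((filters * patch).sum(axis=(1, 2)), 0.0)
    return acts.mean(axis=0)            # global pooling: fixed-size output

print(cnn_embed(rng.normal(size=(300, feat_dim))).shape)  # (32,)
print(cnn_embed(rng.normal(size=(812, feat_dim))).shape)  # (32,) as well
```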
0:19:16 | if you look at automatic segmentation in image processing you'll see that they do use convolutional |
---|
0:19:21 | neural networks with images of variable |
---|
0:19:25 | size |
---|
0:19:28 | so i don't think it's hopeless but this would be my answer to the question okay |
---|
0:19:34 | why |
---|
0:19:35 | do |
---|
0:19:37 | senone discriminant neural networks work but not speaker discriminant neural networks it's because i think |
---|
0:19:42 | trying to discriminate between speakers on very short time scales is going to be a very |
---|
0:19:48 | hard problem |
---|
0:19:49 | i think we should just stay away from that |
---|
0:19:51 | for the time being and the reason is very simple |
---|
0:19:54 | but the |
---|
0:19:58 | primary |
---|
0:20:00 | variability in the signal at short time scales is necessarily phonetic variability |
---|
0:20:06 | not speaker variability |
---|
0:20:08 | if it were not mostly phonetic variability then |
---|
0:20:13 | speech recognition would not be possible |
---|
0:20:17 | okay so what happens then if we take the same architecture |
---|
0:20:22 | as is used in senone discriminant neural networks with a ten millisecond frame advance and a three |
---|
0:20:29 | hundred millisecond window |
---|
0:20:31 | then we're just going to get swamped by the problem of phonetic variability |
---|
0:20:36 | so |
---|
0:20:38 | it's actually quite easy okay to get neural networks working as feature extractors |
---|
0:20:45 | if you use whole utterances as the input i mean just encode the utterance as |
---|
0:20:50 | an i-vector and you will get a bottleneck feature that |
---|
0:20:53 | does a very good job of discriminating between speakers |
---|
0:20:56 | so |
---|
0:20:57 | if you feed in whole utterances the problem is solvable but it is |
---|
0:21:02 | actually too easy to be interesting and you're not going to get away from i-vectors |
---|
0:21:06 | if you go down to ten milliseconds i think you're just going to get killed |
---|
0:21:09 | by the problem of phonetic variability and |
---|
0:21:13 | the sweet spot for the short term i think should be something like ten seconds |
---|
0:21:17 | okay that's what works in |
---|
0:21:19 | language recognition |
---|
0:21:21 | and you'll see actually several papers in these proceedings |
---|
0:21:27 | that show that neural networks are good at extracting features in language recognition |
---|
0:21:33 | if you give them utterances of three seconds or ten seconds whatever |
---|
0:21:39 | but i would say that particular problem of |
---|
0:21:43 | getting down to short time scales is one that we should eventually be able to |
---|
0:21:47 | solve and we should have a go at it |
---|
0:21:50 | okay i think if you want to |
---|
0:21:53 | use |
---|
0:21:55 | neural networks as feature extractors not merely for speaker recognition but also |
---|
0:22:00 | for speaker diarization then you are going to have to confront the problem |
---|
0:22:04 | okay you can't have a window of more than |
---|
0:22:08 | say five hundred milliseconds in speaker diarization or you're going to miss speaker turns okay |
---|
0:22:15 | so |
---|
0:22:16 | we are eventually going to have to confront the problem of how to normalize for |
---|
0:22:21 | phonetic variability |
---|
0:22:24 | in utterances of short duration if we're to train |
---|
0:22:28 | neural networks to discriminate between speakers |
---|
0:22:32 | i'll just mention the |
---|
0:22:35 | paper that |
---|
0:22:37 | themos will be presenting that attempts to deal with that problem with factor analysis |
---|
0:22:41 | methods |
---|
0:22:44 | in the very last session i think it will be |
---|
0:22:49 | the idea would be |
---|
0:22:52 | i think this is going to work eventually okay we should |
---|
0:22:56 | think of phonetic content as a |
---|
0:23:01 | short term |
---|
0:23:03 | channel effect |
---|
0:23:05 | okay when i say short term i mean maybe five |
---|
0:23:10 | frames or ten frames in the normal |
---|
0:23:15 | way we think about channels this would be sort of |
---|
0:23:18 | hopeless okay we can model channel effects that we presume to be |
---|
0:23:24 | persistent over entire utterances but not at the level of say ten milliseconds however we |
---|
0:23:33 | do have the benefit of supervision |
---|
0:23:37 | which could be supplied by something like a senone discriminant neural network that tells |
---|
0:23:42 | you at each time step what the |
---|
0:23:46 | probable phonetic content |
---|
0:23:48 | is |
---|
0:23:49 | so it is actually possible to model phonetic content as |
---|
0:23:55 | a short lived channel effect and you can do that using factor analysis methods |
---|
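(A hedged way to write down the kind of model being described; this is illustrative notation, not necessarily the exact model in that paper:

$$\mathbf{x}_t = \mathbf{m} + \mathbf{V}\mathbf{y} + \mathbf{U}\mathbf{z}_{b(t)} + \boldsymbol{\epsilon}_t$$

where $\mathbf{y}$ is a speaker factor tied across the whole utterance, $\mathbf{z}_{b(t)}$ is a "channel" factor tied only across a short block $b(t)$ of five or ten frames, representing the short-lived phonetic effect, and the senone posteriors from an ASR network supervise the estimation by indicating the probable phonetic content at each time step.)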
0:24:01 | and that is the topic of themos's presentation it's just a first experiment |
---|
0:24:06 | but i think that particular problem is going to be |
---|
0:24:11 | the solution of that problem is going to be a key element |
---|
0:24:15 | in |
---|
0:24:17 | getting |
---|
0:24:19 | neural networks to discriminate between speakers at short time scales |
---|
0:24:25 | okay so that's all i have to say about that so |
---|
0:24:52 | okay so i think that you said that you want to use a dnn |
---|
0:24:58 | to learn the speaker variability how are you thinking about how |
---|
0:25:03 | you would do that are you thinking about the softmax over the target speakers |
---|
0:25:08 | or you know for example i can tell you what we are interested in working on |
---|
0:25:12 | which is trying to learn the cosine similarity between speakers so we have |
---|
0:25:18 | a siamese network |
---|
0:25:19 | trying to mimic saying this is the same speaker or a different speaker |
---|
0:25:24 | by learning some cosine similarity and trying to push the clusters further apart |
---|
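(A minimal sketch of the kind of pairwise cosine objective described in the question; the margin value and the exact pairing scheme are assumptions about that setup rather than a quotation of it.)

```python
import numpy as np

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pair_loss(emb1, emb2, same_speaker, margin=0.5):
    """Pull same-speaker pairs toward cosine 1; push different-speaker
    pairs below a margin, instead of training a per-speaker softmax."""
    c = cosine(emb1, emb2)
    return (1.0 - c) if same_speaker else max(0.0, c - margin)

rng = np.random.default_rng(0)
e1, e2 = rng.normal(size=64), rng.normal(size=64)
print(pair_loss(e1, e2, True), pair_loss(e1, e2, False))
```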
0:25:30 | well my view about this and this is just an opinion okay is that |
---|
0:25:37 | i believe that in order to get neural networks to work in speaker |
---|
0:25:44 | recognition in the long run we are going to have to combine them with a |
---|
0:25:48 | generative model okay |
---|
0:25:51 | the way i see it working is that |
---|
0:25:56 | analogously to the deep face architecture we can hope to get neural networks working |
---|
0:26:03 | as feature extractors that would be trained to discriminate between speakers in the development set |
---|
0:26:09 | but used as feature extractors |
---|
0:26:13 | at runtime |
---|
0:26:14 | i would expect |
---|
0:26:16 | that |
---|
0:26:17 | we would have these neural networks outputting things |
---|
0:26:21 | okay at regular intervals as you go through an utterance |
---|
0:26:25 | and the problem |
---|
0:26:27 | i believe that the interesting problem |
---|
0:26:30 | is how to design a backend |
---|
0:26:33 | to deal with that |
---|
0:26:36 | okay it might in fact involve modeling counts which will be |
---|
0:26:44 | the topic of your presentation |
---|
0:26:47 | although i believe |
---|
0:26:49 | there are other models which are just waiting to be used |
---|
0:26:53 | here i'm thinking particularly of latent dirichlet allocation |
---|
0:26:57 | which is the |
---|
0:26:59 | analogue for |
---|
0:27:01 | count data of eigenvoices |
---|
0:27:05 | for continuous data |
---|
0:27:08 | and |
---|
0:27:12 | one of the things you can do is you can |
---|
0:27:17 | build an i-vector extractor using latent dirichlet allocation for count data |
---|
0:27:22 | and if you can do eigenvoices you can also do |
---|
0:27:26 | an analogue of plda |
---|
0:27:29 | it'll behave very differently from the plda we know |
---|
0:27:33 | because it won't have gaussian assumptions |
---|
0:27:35 | it won't even have the assumption of statistical independence between speaker effects and channel effects |
---|
0:27:42 | that gives you a whole lot of liberty |
---|
0:27:44 | okay you can actually find a basis for the data by |
---|
0:27:49 | training the lda with unlabeled data you can do that with latent dirichlet allocation |
---|
0:27:56 | so there's actually a very big |
---|
0:27:58 | opportunity here waiting to be exploited |
---|
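(A sketch of the LDA-as-i-vector-extractor idea using scikit-learn's LatentDirichletAllocation; the synthetic count matrix and the sizes, 512 events and 50 components, are made up for illustration. Note that the fit is unsupervised, so no speaker labels are needed.)

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Hypothetical per-utterance occupation counts over 512 events
# (e.g. mixture components or senones): the count-data analogue
# of zero-order Baum-Welch statistics.
counts = rng.poisson(2.0, size=(200, 512))

lda = LatentDirichletAllocation(n_components=50, random_state=0)
lda.fit(counts)                   # unsupervised, no speaker labels needed
utt_repr = lda.transform(counts)  # 50-dim i-vector-like representation
```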
0:28:02 | the only question is do we want to go in |
---|
0:28:06 | the direction of training a softmax or do we want to go in the direction of your representation |
---|
0:28:11 | i think personally and this is just an opinion |
---|
0:28:16 | personally i believe |
---|
0:28:18 | that |
---|
0:28:20 | neural networks |
---|
0:28:22 | okay are not up to our task okay |
---|
0:28:28 | we could never hope to do |
---|
0:28:30 | training on unlabeled data |
---|
0:28:33 | with just a neural network a neural network cannot discriminate between speakers it |
---|
0:28:37 | hasn't listened to |
---|
0:28:39 | so i think they will need to be complemented by a backend which is waiting |
---|
0:28:47 | to be developed |
---|
0:28:48 | not the backend that we have at present |
---|
0:28:54 | okay |
---|