0:00:06 Alright everybody. So, we seem to be starting the break already, before the wine tasting, but I can continue now. I want to talk about speaker linking, and what you can actually expect from it, before we go for the wine tasting.
0:00:28 A few things: I have, no doubt, a high-prize question, I have a graph, for the mathematicians I have a formula, and also a picture. And finally, maybe or maybe not, depending on how well I do, a joke.
0:00:49 So let's start with the outline. By the way, if you're not interested in this subject, you can keep yourself busy with detecting OOVs, a specific use case.
0:01:05 Alright. So, I was reading a book, I have it at home, and I haven't finished it yet, but it tells about how people in World War Two, well, the pitch in this case is that we're eavesdropping on the communication of the bad guys, from the English perspective. They were listening to the Morse code signals, and the codes were encrypted, but still they were able to deduce some kind of information, namely the person behind the Morse code apparatus. I believe in Morse code terminology this is called the fist: the way the operator's fist goes up and down. So even though they didn't know the identity of the people, they were able to link one broadcast, maybe at one particular instance in time, from one particular direction or whatever, to another one later. And from that they could deduce movements of troops. So even though the messages themselves were encrypted, they were still able to extract some information. That gives you an idea of what this could be useful for.
0:02:23 Another example of linking, or clustering as you might call it: a specific implementation is actually on the web, done by a big web-based software firm, where you can do this with photographs, with faces in fact. And it works pretty well. I have about a hundred pictures, and even though the clustering itself isn't very good in terms of actual performance figures, you get a cluster, in this case of high purity already, and you click the odd one away, type two or three letters of the person's name, which it of course knows from your email database, and then you have made a new cluster, you get the next person, et cetera. So it works very well in an interactive setting, even though the clustering performance itself in these particular cases is pretty bad.
0:03:27 Now just a short intermission. The way I see clustering, it's actually kind of old-fashioned: we're doing some kind of identification, right? We group people by their voices and make hard decisions about this. It's a sort of identification, and we don't like identification; it has the problem of the priors. As Niko said, if you want to do proper identification you need priors, and if you don't know them, what are we going to do?
0:03:54 Just a little test for you. We all work with equal error rate, even in language recognition. So suppose you have a system with a certain equal error rate, five percent, in a two-class detection task. Now you're going to apply this system as a speaker identification system, and you do your identification by taking the segment, computing a score against one model, computing a score against another, and, with equal priors, choosing the model with the maximum likelihood score. You don't do anything clever, no discriminative training between the two speakers. So the question is: what is your identification error rate going to be? One percent, five percent, or ten percent? It's a question you can think about during this talk, if you don't want to watch the slides.
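A minimal simulation of that quiz, assuming unit-variance Gaussian target and non-target score distributions calibrated to a five percent EER; the Gaussian assumption is mine, not from the talk:

```python
import numpy as np
from scipy.stats import norm

# Two-class detector with 5% EER: for unit-variance Gaussian scores this
# corresponds to a mean separation of d = 2 * ppf(1 - EER).
eer = 0.05
d = 2 * norm.ppf(1 - eer)
rng = np.random.default_rng(0)

# Identification trial: the segment belongs to speaker A, and we score it
# against the true model (mean d) and one wrong model (mean 0).
n = 100_000
score_true = rng.normal(d, 1.0, n)
score_wrong = rng.normal(0.0, 1.0, n)

# Equal priors, pick the maximum score; it's an error when the wrong model wins.
print(f"identification error: {np.mean(score_wrong > score_true):.3f}")
# Analytically: the score difference has variance 2, so the error is
# 1 - Phi(d / sqrt(2)), about 0.01, i.e. one percent for a 5% EER system.
```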
0:04:49 So, speaker linking. This term was actually used, well, inspired by George Doddington, who is a source of lots of inspiration on the question of what kinds of problems should be solved in speaker recognition, and it came up in a somewhat dismissive kind of answer: George can do these kinds of things. Note that speaker linking is a batch problem.
0:05:20 So, whatever the application, I am interested in a large set of speech segments. If I want, say, diarization within a single show, I think that's fantastic, but I would actually like to do diarization over all the television shows of an entire year, or whatever: large-scale problems. And again it's kind of a clustering; you want to link those speakers. I think the large-scale aspect is a problem, and that's what I want to show, or want to investigate. So it's a bit of an exploratory presentation.
0:06:06 Alright, Nick already said previously that this is related to all kinds of other things. Speaker clustering, of course, is basically the same problem, but we're focusing now on large-scale problems. Speaker partitioning is probably a much nicer way of doing things, but you need prior distributions over all partitionings, and it probably doesn't work at large scale. It also has relations with lots of other things. First of all diarization: in diarization you also need the segmentation, of course, and that happens here too, but you typically apply diarization within a single recording, and like I said, I'd like to make the links between recordings as well. There is also a relation with the multi-session training conditions in the speaker recognition evaluations, where first of all you have diarization as an additional task, and you know that there is exactly one common speaker, a common link, between all the training segments that you have, so there is more prior information. Speaker tracking is related too; there, I think, the problem is that you are given a model for a particular speaker and then you have to find that speaker in a large collection. And finally it is of course related to clustering in general, with the difference that in many clustering problems it is not really clear what the classes are. If you look at topic clustering, what makes a topic a topic? It can be many things. Here, with speakers, of course we know the truth.
0:07:59 Now a very quick overview of the types of clustering algorithms, taking clustering as a solution to this problem. There is the divisive, top-down way, and you might see the way we train our GMMs as a way of doing this: you start with a single cluster and keep splitting off clusters that are more similar to each other. Or you can do bottom-up, sorry, agglomerative clustering. This is typically what we do in diarization, although in diarization the top-down approach is also used. In the bottom-up case you start with individual segments and you try to cluster them together until you say: this is enough, I have found my classes now. I'm sure that there are many more clustering algorithms that are actually better than these kinds, but for now I'll concentrate on agglomerative clustering.
0:08:55 One of the things that bothers me about this clustering is that it doesn't scale with time. If you take the simplest agglomerative clustering idea, you start with all segments, you find the best matching pair and cluster them together, and then you do it again. The total complexity is then of the order of n to the third power. And if you then want intermediate updates in some kind of online situation, say you have recorded shows over a whole year and you get an extra show with extra speaker segments that you want to put in, then again you have an extra order of complexity. And of course you also keep getting new data every day, so that makes you doubt whether this can be done in an incremental context.
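To make the bookkeeping concrete, a minimal sketch of that naive offline loop; the score matrix, the single-linkage choice and the stopping threshold are placeholders, not the system from the experiment. Each merge rescans all cluster pairs, which is where the n-cubed total comes from:

```python
import numpy as np

def agglomerative(scores: np.ndarray, threshold: float) -> list[set[int]]:
    """Naive offline agglomerative clustering on a precomputed score matrix.

    scores[i, j] is a symmetric speaker-similarity score between segments
    i and j; single linkage is used between clusters for simplicity.
    """
    clusters = [{i} for i in range(len(scores))]   # one cluster per segment
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):             # scan all cluster pairs
            for b in range(a + 1, len(clusters)):
                s = max(scores[i, j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:                       # stop when no pair links well
            break
        a, b = pair
        clusters[a] |= clusters[b]                 # merge the best pair
        del clusters[b]
    return clusters
```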
0:09:56 What's the next thing? Oh yeah: that was if you do agglomerative clustering offline, so you collect all your data and then you say, I'm going to do the clustering in a very careful manner. You can also do it online, saying: here is one segment, and the next segment either goes into an existing cluster or it makes a new cluster. That's a lot simpler, so the incremental complexity is now of the order of the number of clusters found so far. For the divisive scheme I don't know exactly, but I think it's similar.
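And a sketch of that online variant, under the same placeholder scoring: each new segment is compared only against the clusters found so far, so the incremental cost is linear in their number.

```python
def online_cluster(segments, score, threshold):
    """Greedy online clustering.

    score(segment, cluster) is any segment-to-cluster similarity, for
    instance the maximum score against the cluster's members.
    """
    clusters: list[list] = []
    for seg in segments:
        best = max(clusters, key=lambda c: score(seg, c), default=None)
        if best is not None and score(seg, best) >= threshold:
            best.append(seg)        # assign to the best existing cluster
        else:
            clusters.append([seg])  # or open a new one
    return clusters
```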
0:10:30 Some aspects of this clustering: you can decide either to retrain your models during the clustering process or not. There are some advantages if you do: you have more data and therefore better models, but errors might also propagate. Another question is whether you are going to use the data in your speaker comparison metric to do some form of normalization, for instance for the general acoustics that you are getting in; or whether you want to normalize scores; or, maybe even better, whether you want to train discriminatively. If you train your clusters discriminatively, believing that the clusters are very good, then you probably get much better speaker separation. This is something we are used to doing in speaker detection, for good reason, and I think if you are really serious about this clustering you might consider doing these things, but I also think they are not trivial to do.
0:11:31 Another aspect would be: are we indeed going to make decisions? Are we going to make hard clusters, gluing speaker segments together or not, or are we going to do it in some kind of soft way, which is more along the lines of the speaker partitioning with priors? It might be better to do it the soft way. I think you can compare it to the way you attribute your data to the mixtures in a GMM: that is also done in a soft way, and that works better than doing it the hard way. So this is something to consider as well.
0:12:12 Alright, another aspect of speaker clustering: how do I evaluate how well I'm doing? For speaker detection we have gone very far in defining good evaluation measures; the problem is well understood. What are we going to do for clustering? People who do clustering usually have some form of single evaluation measure, and I don't know which ones are the best, but the ones that I like are the impurities. Some prefer purities, but we like to look at errors, so I'd go with the impurity. Basically, when you have your clusters at the end of the clustering process, you want to know how homogeneous each cluster is, and the simplest way of looking at this is: which speaker occurs most, and what fraction does that speaker cover? The impurity measures how many segments of different speakers there are, compared to the most frequent one. If you want to express this mathematically, the way I defined it looks rather complicated, but I couldn't get it any simpler. The interesting thing is that this is the cluster purity you see in the general clustering literature, but from speaker detection we know there's always the other side. Cluster impurity is comparable to minimizing false alarms, and there's always the downside, the misses. So we should also define something like speaker impurity, which is the same definition but with respect to the reference speaker. You don't often see both, but I think you should just compute both and see how they trade off in your final clustering. The reason is that it is trivial to reach a cluster impurity of zero, so seemingly perfect clustering, by just making a single cluster for every segment. So that alone is not enough; you need the other part.
0:14:25uh there's also other measures which which are more probabilistic of nature's around and looking only at the most
0:14:31we can
0:14:31see frequently occurring
0:14:33speaker in your cluster you can actually look at the
0:14:36at the whole distribution
0:14:37so you get some kind of that
0:14:38entropy
0:14:39measure for your cluster
0:14:42you can average of the roll cluster are weighted by
0:14:45the
0:14:45the number of segments in each cluster
0:14:47and again
0:14:48um
0:14:50not
0:14:50on this slide
0:14:52not only the cluster entropy you can define
0:14:54can also define
0:14:55okay
0:14:56speaker entropy
0:14:57sure
0:14:57again look at both
0:14:59these measures
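The entropy variant, in the same sketch style, again my own formulation rather than the talk's exact definition:

```python
import math
from collections import Counter

def weighted_entropy(groups: list[list[str]]) -> float:
    """Per-group label entropy, averaged with segment-count weights."""
    total = sum(len(g) for g in groups)
    h = 0.0
    for g in groups:
        probs = [k / len(g) for k in Counter(g).values()]
        h += len(g) / total * -sum(p * math.log2(p) for p in probs)
    return h

print(weighted_entropy([["alice", "alice", "bob"], ["bob", "bob"]]))  # ~0.55
```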
0:15:01 Then we come to the experimental section. It's a small experiment, actually carried out a while ago; it seems pretty ancient in terms of speaker recognition development. It must have been two years ago, and at that time we had a state-of-the-art system. We still have the same system, but it's no longer state of the art. Anyway, it's a GMM-SVM system with T-norm that performed pretty well on the 2006 evaluation set, and the experiment was done while we were preparing for the 2008 evaluation. That's why we worked with that data: at the time we didn't have the truth data for 2008 yet. So, using the 2006 data, I simply used all the test segments, some thirty-seven hundred test segments, male and female. You might say that in official speaker detection you should not have cross-gender trials, because those trials tend to be non-target trials and that's kind of not fair. But here we're not really doing speaker detection, we're doing clustering, so if gender gives you some information to cluster on, maybe it's fair to use it. And moreover, our system at the time was completely gender independent; there was not a single gender-conditioned component in there.
0:16:31 Five minutes.
0:16:32 So, two versions of agglomerative clustering: one online, taking one segment at a time and making decisions, and one offline, clustering the whole batch at once. This is the result: you see speaker impurity versus cluster impurity for both types of agglomerative clustering. Well, you can define something like an equal impurity point. I put the curves on DET axes, for people who can't live without DET axes, and that actually works very well, because the curves come out more or less straight. There's no reason why they should be straight, not one that I understand, but it works very well. And you see that these two different approaches, where the online version is much simpler than the offline version, perform more or less the same.
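For reference, putting such a trade-off curve on DET-style axes just means warping both impurities through the probit, the inverse normal CDF; a sketch with made-up numbers:

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Hypothetical (cluster impurity, speaker impurity) pairs from a threshold sweep.
cluster_imp = np.array([0.01, 0.02, 0.05, 0.10, 0.20])
speaker_imp = np.array([0.30, 0.20, 0.10, 0.05, 0.03])

# DET-style axes: plot probit(impurity) rather than the impurity itself.
plt.plot(norm.ppf(cluster_imp), norm.ppf(speaker_imp), marker="o")
plt.xlabel("cluster impurity (probit scale)")
plt.ylabel("speaker impurity (probit scale)")
plt.show()
```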
0:17:31 Another interesting thing, or maybe not: in terms of the T-normed score thresholds that you put into the clustering algorithm for stopping, these are quite different for the two algorithms, so they are evidently looking at different things, even though they work equally well.
0:17:51 And this is the last subject, the scalability of this whole process, because I mean to think in large numbers: thousands of segments, although for 2006 I could not take more, at least I didn't take more, than the thirty-seven hundred segments that were there. So here I am looking at what the equal impurity is as a function of the number of segments, on a log scale, and you see, well, some people would call this graceful degradation; I think that's a fantastic phrase once you learn to use it. It degrades gracefully with the number of segments, or with the number of speakers; that has to do with the way the segments were chosen in these evaluations, because you can also express it as the number of speakers, and then on a linear axis you get exactly the same graph. So there seems to be a fixed relation between the number of segments and the number of speakers if you just randomly leave out segments in order to reduce the problem, which is what I did: going down from the full problem, I just randomly left out segments. So again you see the same kind of graceful degradation, but for this speaker recognition system there is some number of segments where we would reach an equal impurity of fifty percent, if this trend continues, and we shouldn't go beyond that. So I think if you define the problem of speaker clustering, or speaker linking, you have a problem with scalability, in terms of the number of speakers, or the number of segments, or whatever you want to look at. From that perspective I think it's an interesting problem. Why it gets harder has to do with the fact that, I suppose, identification gets harder with more classes.
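The extrapolation argument as a sketch, with invented numbers: fit equal impurity against the log of the collection size and solve for the size where the fit reaches fifty percent.

```python
import numpy as np

# Hypothetical equal-impurity readings at growing collection sizes.
n_segments = np.array([100, 300, 1000, 3700])
equal_imp = np.array([0.05, 0.08, 0.12, 0.17])

# The plot suggests a straight line against log10(n); fit and extrapolate.
a, b = np.polyfit(np.log10(n_segments), equal_imp, 1)
n_half = 10 ** ((0.50 - b) / a)
print(f"equal impurity would hit 50% around {n_half:.0f} segments")
```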
0:19:57 Okay. That said, in different fields there is actually something called the CMC, the cumulative match characteristic, as an analysis tool. I don't remember exactly what it is, but it measures something like how well your target object ranks in the N-best list of candidates returned. So it's a way of looking at identification, and in identification you look at how performance goes with the number of classes, you might say. That has been analyzed in the literature of other fields already.
0:20:44 Good. And of course the really nice thing would be, once we have defined our evaluation measure and a good test, a proper test that we understand and that works at all scales, that you can look at different algorithms, because the algorithms I used here are pretty trivial, and I'm sure that you can use global algorithms, which consider everything at the same time, and perform much better. And of course there's also a question I didn't state: I started with the score matrix, I just scored everything against everything, which was pretty bold at the time. If I think about it now, everybody scores everything against everything, but at the time it was, well, a nice try at least. But can we do better than that? Could we use the alternative speech segments that we've already seen, or are receiving, in the collection, for either normalization or discriminative training? That's another question, and one I'd still like to investigate, given time.
0:21:58 I've run over time, is that right? Great.
0:22:02 You have five minutes, you're fine.
0:22:07 Well, I don't have any more slides, so that was kind of a nice place to stop.
0:22:17 Thanks, David; that's what my signalling was about. We have time for some comments or questions.
0:22:27 Yes, about this: it seems quite upsetting, as you mentioned; it seems like things start to break down when the problems get too large. And in your conclusion you conjectured that maybe you could do something a bit more clever, considering the scores?
0:22:53 Yes, I think I have exactly that problem. Oh yeah, sure: in my method, for example, and also in the variational Bayes method, which Patrick suggested to be tried at the following workshop, you effectively are looking at the data that you have, but in an unsupervised way. Yes, so that might train... We'll have to look at that and see if we can...
0:23:54 Yeah, I've got two questions. One: you were defining speaker linking; could you clarify how, in the end, it is different from speaker clustering, apart from the scale thing?
0:24:09 It's the same problem. It's just the way I see it: linking is more like a task, and clustering more like a way of doing it; I think they are otherwise identical. And the reason why we call it linking is that we're also busy with large-scale diarization, and there you have two steps. First, within, say, your meeting or within your broadcast, you do segmentation and clustering kinds of things, diarization, and then you try to link the different clusters between meetings or between broadcasts. In order to separate those two things we call that second step linking rather than clustering, because otherwise we'd have clustering here and clustering there. Maybe there's a little less uncertainty on the speech segments for one speaker.
0:25:03 The second question: you mentioned an online system, and I'm puzzled by it. To me it looks like a top-down clustering, or... you'd think that the online one would perform worse, possibly.
0:25:29 Right, this online clustering: I'm not sure whether it's agglomerative. Well, it is in a sense: one segment at a time, you try to fit it somewhere in your clusters. But maybe not, more formally. Yeah.
0:25:50 So, speaking of things like fast linking, partitioning, things like that: it gets even more involved; there's speaker clustering and language clustering happening in there, and papers looking at it. But the point we ran into at one point, exactly about the clustering, is that in general, in tasks like these, people try to come up with measures of cluster performance, likewise for diarization, you know, these horribly complicated measures and all that. I think the thing to keep in mind is that these measures are hard to relate to what you're actually doing. Take diarization error rate, right, we use that: does driving diarization error down relate to better diarization, or clustering? All these other things, I view them as being all right; it's not that they're not worth working on, but when we get a single measure of performance and say, let's optimize this, it always opens the question: now what do I do with it? Diarization has this problem. Look at speech recognition people: when they want to do diarization or clustering on audio for adapting a speech recognizer, what they want for doing adaptation is nothing like what we would call really good diarization. So I think, as we go into these things over the next three days and come back to diarization, the partitioning, the linking, and talk about it, at some point we're going to have to be a little careful about creating these numbers and saying: oh, I got a number X, better than Y. Because people are going to ask: what's the application? What does it help you do, and why? It's not clear to me at what point we start actually linking these measures to something we're doing at the end.
0:27:53 So, the experiment on linking, then: you ran this as a batch, on the test data?
0:28:03 Yeah, I agree; this is just one application, and I think the focus here was more to look at what happens if things scale up. The nice thing about speaker detection is that you don't have to worry about that: as things scale up you just get better estimates of how well you're doing. But in theory the performance, the cost function, whatever it should be, stays more or less the same; you should see a stable thing here. And this is not stable, it's shifting, so you might say: okay, you're doing the wrong thing. On the other hand, trying to cluster a whole population of speakers might be a useful thing.
0:28:49 So, there's an example in this case, right? Okay: you get these clusters out; what does someone do with them?
0:29:02 Yeah, I'd put it in the same boat, although, I mean, I could hand it over and say: here's a CD.
0:29:15 I mean, yeah, this does open things up in general, for partitioning and the other things that are going on: you get these things out, and in some sense, what am I doing with that stuff? I get a cluster, a thousand clusters, and on average, here it is, so many percent pure. But if someone says: I know what I want to do, I'm searching for somebody in my data... So, for example, along the trajectory of these evaluations, we started a track on clustering, and some of the things we were working on were diarization, although lately we've pulled away from doing diarization for its own sake. Instead the task is detection: we're going to give you data, and we want to see whether, in that context, doing diarization helps you purify your data, to enroll and test models. We want to see how well you do linking these two together, and try to see how that correlates with diarization error rate in that detection task. There was some correlation, but it seems very loose. That's one thing. And here, I think in general, when people put up this task against another, talking about linking: you could say, if I drop from twenty percent to ten percent, did I get twice as good in my application? Does it matter? Am I measuring in centimeters when things are miles apart? I just don't have a good guess here. As error rates go down you do better, but where is good enough?
0:30:45 A quick comment on this: your task is very close to a task proposed in the past by colleagues and myself; it was exactly the same task. To explain the interest of the task, the use case we should think of is raw data, like TV broadcasts. Say you have a few days of recordings, you run speaker diarization on each recording, and after that you want to know who comes back across recordings. And this is a real use case if you work with a national media organization: they don't recompute everything each time they need to do an indexing task, and the mix, according to me, would be different each time. The second constraint is that you have to implement the indexing incrementally: you can't, each time you receive a new file, come back and redo all the computing. In this case you have a strong difference between diarization, which you run on one recording once or a few times, and speaker linking, where you could have hundreds of thousands of hours of video. That was my last comment.
0:32:17 Ah, okay, I also just want to add something on this, about why we brought up this kind of problem, and also about the evaluation measure, I submit. I didn't discuss this in my presentation, but in the paper I show how you can express all the usual tasks this way: the partitioning problem, training on multiple segments, unsupervised adaptation, and so on. So by generalizing we can learn more about all the normal tasks that we're doing. So if you can solve this problem, you can solve everything else.
0:33:11 Sorry... yes, if your segmentation is given. And it's not just that; the last thing is the evaluation metric: you could use it for a sort of practical purpose, namely to numerically optimize, discriminatively. This is probably what I'm going to be doing the next three weeks at the workshop, and the measures here are probably what I'm going to use.
0:33:50 Okay, so whatever the application, it's keeping you going. So that, yeah, limits it. Thanks.