0:00:16 Good morning, everybody. Well, in my contribution here I will mostly focus on giving an overview of some guidelines that have been developed in the last two years concerning, directly or indirectly, automatic speaker recognition systems, or semi-automatic speaker recognition with human intervention, mainly in the feature extraction. And the message is that, before doing something with speaker recognition in court, in Europe at least, we should read these guidelines, because they have been generated after a process of consensus within the community, so I think they are relevant. That's the message if you want to do something here; and if you are not from Europe, I think they at least deserve a read, to know what's going on in our environment.
0:01:12 Well, the first one is the so-called ENFSI Guideline for Evaluative Reporting in Forensic Science, which most of you probably already know. It was released in two thousand fifteen; I'll talk about it later. The second one is... (this isn't working; something's wrong, I don't know what's going on).
0:01:38 The second one is a guideline that we have developed in collaboration with the NFI, by consensus, with recommendations on the validation of likelihood ratio methods for forensic evidence evaluation. And the third one is a guideline that has been released... (Something's wrong with the computer.) 0:02:38 So much for the Windows system. Okay.
0:02:43 The third one, then, is a set of recent methodological guidelines on best practice for forensic semi-automatic and automatic speaker recognition, also developed by ENFSI, the European Network of Forensic Science Institutes, in particular by the forensic speech analysis working group. Concerning availability: all three of them are available; the second one is already published in Forensic Science International, and the first and the third are in the repository of documents at ENFSI. Some critical recommendations of the first guideline are about expressing conclusions in court in general, not only in speaker recognition but in forensic science in general; they are recommendations for all forensic science fields.
0:03:25 There are some critical recommendations in that guideline that I want to especially stress. The first one is that the expression of conclusions must be probabilistic. Moreover, it is recommended in the guideline to transform the probabilistic statement into the form of a likelihood ratio, in terms of formal equivalence. And what is absolutely stressed is that absolute statements should be avoided, like identification or exclusion: categorical statements.
0:03:59 The second one is that, when one has to define the hypotheses in the case (say, same guy or different guy; this voice comes from this guy, or this speech segment comes from another person with these characteristics), one has to consider at least one alternative. There can be many of them, but at least one. And a clear definition of the alternative is also mandatory, because the definition of the alternative defines what data we have to handle in order to compute this weight of evidence.
0:04:34 The third one is that findings must be evaluated given each of the hypotheses, and that leads us to, somehow, a kind of likelihood for each hypothesis; in the two-hypothesis case, we arrive at a likelihood ratio.
0:04:51 The fourth one is that the conclusions are expressed in terms of support for the hypotheses, instead of probability of the hypotheses. This "support to the hypothesis" way of putting it is quite an easy way to avoid some fallacies in the reasoning, and it forces us to express the support, the weight of the evidence, in terms of a likelihood ratio rather than a posterior probability ratio. So "support" is an important word if you want to avoid that kind of fallacy.
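As a reminder of the logic behind this recommendation (this is just standard Bayes' theorem in odds form, not anything specific to the guideline): the expert's likelihood ratio is the factor that updates the court's prior odds into posterior odds, which is why reporting posterior probabilities would silently usurp the prior:

\[ \underbrace{\frac{P(H_1 \mid E)}{P(H_2 \mid E)}}_{\text{posterior odds (court)}} \;=\; \underbrace{\frac{P(E \mid H_1)}{P(E \mid H_2)}}_{\text{likelihood ratio (expert)}} \;\times\; \underbrace{\frac{P(H_1)}{P(H_2)}}_{\text{prior odds (court)}} \]

Saying the findings "support" one hypothesis refers only to the middle factor being greater (or less) than one.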
0:05:22 The last one is that data-driven approaches should be the final goal; but in the meantime there are many practitioners that cannot move in the short term to data-driven approaches, so the guideline also considers the use of subjective judgement, subjective probabilities and so on. But the data-driven approach is recommended as a kind of long-term goal.
0:05:49 There is also an example in speaker recognition. It is not an example of what speaker recognition should be, because that generated some controversy within the ENFSI forensic speech and audio analysis group; there are many ways of doing speaker recognition. It is an example of an automatic case, generated by people who used automatic speaker recognition for it, but it's not exclusive: just a template, a given example of how to do this in one particular scenario, with one particular way of expressing conclusions about the speaker.
0:06:23 Well, the second guideline is the validation guideline that we have been developing with people at the NFI. This guideline is aimed at recommending that everybody in forensic science who is using likelihood ratios move towards objective evaluation procedures, which is typically not the case in many forensic science fields. Here in speaker recognition we do use them, definitely: in this conference everybody uses an experimental environment to validate their methods. But there are two questions here. First, if you're not used to that, how to do it, which comes down to performance measurement: how performance measures should be interpreted, and which performance measures are relevant. And the second one is: okay, I have validated my system with a performance measure, so how to put that into play in order to make a technique able to go to court; some recommendations regarding laboratory accreditation, laboratory quality procedures and so on.
0:07:35 The guideline is very particular, very concrete, but I'm not going to go into many details. The point is just determining whether an implementation of a likelihood ratio method is able to be used in court, and everything should be documented. We are also in the process of escalating this guideline towards ISO standardization for forensic biometrics; there are meetings on this, and some of the people here are collaborating, from standards bodies and/or laboratories related to ISO.
0:08:04 We propose in a table some relevant characteristics. The table is not intended for you to read here, but you can see, somehow, Cllr, EER: things that we are used to here, so we contributed these into the general forensic field. But the performance measures are not limited to these ones; it's just a proposal, so the guideline is supposed to be open in that sense, so that everybody can contribute more performance measures. These are the minimum requirements that we understand the validation process should contain regarding performance measures. There is also a strong stress, and most of my colleagues will talk about it, on the use of relevant forensic data. For a laboratory it's okay, using a NIST evaluation is nice, but at the end we should follow up with a critical performance measurement in forensic casework conditions, which is an extremely tricky issue, and my colleagues will talk about it later.
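To make the two measures just named concrete, here is a minimal sketch of how EER and Cllr can be computed from validation scores; this is an illustration of the standard definitions, not code from the guideline:

```python
import numpy as np

def eer(ss_scores, ds_scores):
    """Equal error rate: threshold where miss rate ~ false-alarm rate."""
    thresholds = np.sort(np.concatenate([ss_scores, ds_scores]))
    miss = np.array([(ss_scores < t).mean() for t in thresholds])
    fa = np.array([(ds_scores >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(miss - fa))
    return (miss[i] + fa[i]) / 2

def cllr(llr_ss, llr_ds):
    """Log-likelihood-ratio cost; inputs are natural-log LRs from validation trials."""
    cost_ss = np.mean(np.log2(1 + np.exp(-np.asarray(llr_ss))))
    cost_ds = np.mean(np.log2(1 + np.exp(np.asarray(llr_ds))))
    return 0.5 * (cost_ss + cost_ds)
```

Unlike EER, Cllr is sensitive to calibration of the likelihood ratios, which is one reason both kinds of measure appear in such tables.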
0:09:09 Finally, the ENFSI guideline for forensic semi-automatic and automatic speaker recognition, which was led by Andrzej Drygajlo within the forensic speech and audio analysis working group. This guideline is compatible with the ENFSI guideline for evaluative reporting, and it is also compatible with the validation guideline that we have been talking about before. It also addresses many other issues, like the most used technologies and methods, and which state-of-the-art methods are reliable; the most used features, the features that are typically used here and in different approaches, and which are more reliable; audio preprocessing; and what the implications are if you put a human being in the process as well. So it's a technical guideline, and it has been developed within the forensic speech and audio working group; many of us here have been contributing to it, so it's a guideline that presents a high degree of agreement today, I mean. Okay, that was my contribution.
0:10:15 Thank you, Daniel. So we have one minute for one small, quick question; in any case we'll have more time when all the talks are finished. Any question for Daniel on the guidelines? ... Then we continue. Yes... so our next speaker is going to continue with his presentation. Okay... (setting up the window)... how do you make it full screen?
0:11:18 Good morning, everybody. I'm Jonas, from Sweden. I work for a company, Voxalys, and also at the University of Gothenburg, currently. I'm going to talk a little bit about accreditation for forensic speaker comparison, which we are undergoing. So, the company: we have performed casework for around eleven years, in Sweden, Norway and the US, approximately fourteen hundred cases, almost all of them Swedish cases. There are three people employed, all of us part time, all employed by the university as well, and we are the subcontractor of the Swedish National Forensic Centre; basically we handle more or less all the cases in Sweden.
0:12:17 A small outline, just to give you some short overview: I'll quickly talk about the applied methods and mention them, and then I'm going to talk about the evaluations for accreditation, where Daniel's stuff comes in; then, very briefly, what a forensic conclusion in Sweden looks like, and quite a few questions to put up there.

0:12:43 So, before explaining the three parts very briefly: there is of course a screening process, and NFC screening means... that has developed over the years; these days, basically, around fifty percent of the cases are screened out, and that happens at NFC these days. Before, there used to be a lot more screening in-house for us, but not anymore, because it's cheaper for them. And then there's always a second screening done at our place as well.
0:13:21 And then the analysis starts from one station, while the others we always keep open, so that we can actually add samples during the analysis, even if we have already started the analysis job.
0:13:34 The first part of the analysis is the linguistic-phonetic perceptual analysis. These days, in some cases, it can also begin with a blind test, depending on how many people are involved. The linguistic part is, you know, going through different steps of perceptual evaluation, and we try to keep it in some kind of Bayesian manner. So how do we treat cognitive bias? To keep it very brief: you go through it once and you bias yourself for the one hypothesis, and then you go through it again and you bias yourself for the other one. Two people are always doing this, and in most cases a third person does the blind test. Now that we are three, more or less, how a given case is handled depends at some level also on cost, and on how much work we can put into the case.
0:14:27 The second part is the acoustic measurements that we still do as part of the standard protocol. One is articulation rate, basically syllables produced per second; then fundamental frequency measures, a few of them, in a graph; and then the long-term formant analysis, which nowadays is handled more or less automatically, and also put into an i-vector system.
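As a toy illustration of measures of this kind (my sketch, not the lab's actual protocol), both reduce to simple statistics once a syllable count or an F0 track has been obtained:

```python
import numpy as np

def articulation_rate(n_syllables, phonation_time_s):
    """Syllables per second of actual phonation, pauses excluded."""
    return n_syllables / phonation_time_s

def long_term_f0_stats(f0_track_hz):
    """Summary statistics of a fundamental-frequency track (NaN = unvoiced frame)."""
    voiced = f0_track_hz[~np.isnan(f0_track_hz)]
    return {"mean_hz": voiced.mean(),
            "median_hz": np.median(voiced),
            "sd_hz": voiced.std(),
            "base_hz": np.percentile(voiced, 5)}  # a common 'baseline' measure

print(articulation_rate(42, 12.5))  # e.g. 42 syllables in 12.5 s -> 3.36 syll/s
```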
0:14:54 And the third part is of course the automatic systems. Currently there are two systems active, we're evaluating one system, and one system is under research; four systems altogether.
0:15:08 Guidelines, when it comes to the evaluation for accreditation: we've been fiddling around in the dark, basically, not knowing what to do exactly, and I think that's why we very much appreciate the work that has been done by ENFSI and the NFI, maybe especially since we're on a tight schedule; our next deadline for accreditation is close. So when the guidelines arrived, when I was at a meeting five months ago with the NFI and heard about this work by Didier, you know, and Rudolf, those guidelines became really important for us, for how to treat the validation of automatic systems. That doesn't solve everything, of course, and you can discuss it very much, but at least there are some guidelines now that we can follow, and we know basically what to do, for the accreditation at least, and then you hear people discussing.
0:16:02 Some of these are just examples of the plots that they suggest in the guidelines, what it should look like, and, you know, you can get the figures for each of those plots. And these are some examples of the problems you can start running into. This is from David van der Vloed from the NFI: he created heat maps for the results, in this case Cllr means, but also for equal error rate and so on, for different tests that he ran on a huge telephone database, so you can more or less see what happens when the training samples are more than one, and when the test sample gets shorter and shorter: what happens in the evaluation process.
0:16:46 But if you consider all those plots and all those figures you put into an accreditation process, you realise it's going to be quite many pages. This one is also very brief, you don't have to read all of it, but consider how many validations there are. Very quickly: during these eleven years we've done over a hundred evaluations, and if you consider all the different conditions and so on (different durations, microphone, distant microphone, mobile recordings, with and without face covering, in and outside a car, indoor, outdoor, different languages, different compression), we've done more or less all of those, with different datasets and some simulations. You can imagine what a large document that would be, documenting all those evaluations for the accreditation process.

0:17:38 The perceptual-phonetic analysis also has to be evaluated. Currently... well, it's been a difficulty for us, because we have been only two before, and we know each other pretty well, to some extent at least, so we've been trying to evaluate each other back and forth over the years. Now we have a third person: she goes through, basically, training and testing, because even though you have a PhD (in speech pathology, in her case) and a great ear, you still have to evaluate everything, and you're not really used to doing forensic analysis on telephone material. So she had to go through a training phase, a testing phase, and then blind evaluations. We started that on a small scale, of course, because it is extremely time consuming: the last one, with twenty-three speakers, took around three days to perform the analyses.
0:18:30 Just quickly showing you what the National Forensic Centre verbal scale looks like: a nine-point ordinal scale of conclusions, for two hypotheses, from level plus four to level minus four, with zero in the middle. Level plus four is something like "the results are extremely much more probable if the main hypothesis is true compared to the alternative"; minus two is "the results are more probable if the alternative hypothesis is true compared to the main one". Behind each level there is a standard likelihood ratio.
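The idea of a standard likelihood ratio behind each level can be sketched as a simple lookup; the band edges below are invented placeholders, not NFC's actual values:

```python
import bisect

EDGES = [-6, -4, -2, -0.5, 0.5, 2, 4, 6]        # hypothetical log10 LR band edges
LEVELS = [-4, -3, -2, -1, 0, +1, +2, +3, +4]    # nine-point ordinal scale

def conclusion_level(log10_lr):
    """Map a log10 likelihood ratio onto the nine-point ordinal scale."""
    return LEVELS[bisect.bisect_right(EDGES, log10_lr)]

print(conclusion_level(3.1))   # -> +2 under these placeholder bands
```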
0:19:08 Important to remember: even if you do all these evaluations, and you put together this probably thousand-page document for accreditation, every case is unique. How much can you actually infer from all the evaluations you've done to each and every case? That is not easy at all, even though it looks really good in the evaluation. There's a lot of stuff to think about still: even though you go through the accreditation process and you get the stamp of approval, it's not like the evaluation stops.

0:19:45 And, just as a general point to put out there as well: we want, or need, to have a transparent report. We still don't know exactly what that requires; that's something that we need to discuss much more. And who has to be able to understand this report? Is it the actual jury or the judge, or actually another expert? Probably the latter, and that's how it basically is. I think that's it, pretty quick. Thank you.
0:20:16 Excellent. I mean, we have time for a couple of questions. Niko? [Question off-microphone.] Well, it's just two examples, because if I output all of them the slide would look crazy, so I just took plus and minus two to give you an example; I could have taken the minus four.
0:20:53 I suppose this is probably more of a comment that we'll get to later, but based on the preview so far, seeing the first two talks: one concern I have (nothing wrong with the talks) is that the big elephant in the room is all the data, right? There's a lot of talk about guidelines and accreditation, and everybody keeps saying the data is the problem, but it keeps kind of getting put off. With guidelines and accreditation, now it's going to look like it's more official, and what's disconcerting is that there is not really a discussion of how we're actually ever going to get our hands around the data issue.
0:21:37 You want me to answer that? Well, what I can tell you is that there is a lot of data, but of course we cannot cover all these conditions with that amount of data. To me, it's also the sensitivity of the data: I can tell you there's a lot of data, but I can't really tell you how it was collected or what data it is, because it's all kept behind secrecy to too large an extent, and that's also a problem, especially in Sweden, when you publish things. A lot of the evaluations that we've done over the years we cannot publish. I hope that is actually going to change now, but it's a huge problem. If I publish something, I have to be able to give the data to another researcher if he asks for it, or he can come to the institution and actually use the data, for the falsifiability thing. But if you can't do that, you can't really publish anything, and that's been a difficulty. But now, probably, we can maybe do that anyway, because the organization has changed recently. We'll see.
0:22:59 Thank you. Let's go to our next speaker.

0:23:28 I'll talk a little bit about some aspects of our work at the BKA. At the BKA we have done speaker recognition since the seventies; in the early days it was done automatically, but the technology wasn't really ready, and so the method used was the auditory and acoustic-phonetic method, starting from the eighties. Since about two thousand five we have used both: this auditory-acoustic method, combined with automatic speaker recognition.
0:23:59 Just a few slides: you heard from Daniel about these guidelines for semi-automatic and automatic speaker recognition; I'm just repeating two of the aspects. One is that the outcome of an automatic or semi-automatic method is the likelihood ratio, so it's all about systems that output likelihood ratios. Another important aspect, which Daniel mentioned as well, is that validation of a likelihood ratio method has to be performed with speech samples that are typical, representative of the speech material forensic practitioners are confronted with in their everyday work. So it has to be forensically relevant validation.

0:24:47 These guidelines are accessible; even here, on the conference website, you might have noticed there is a link: it gets you to the ENFSI website, and there are four documents on there, so it's one of those documents on the ENFSI website.
0:25:07 Now, since we have those guidelines, we have to sort of practice what we preach, so we have to get busy collecting the forensic data, forensically relevant data. We started doing this a while ago; one of those activities was published at Odyssey two thousand twelve, and our activities are ongoing. Another one is our collaboration with the NFI: they have a good corpus, the NFI-FRITS corpus, which was documented at Odyssey in two thousand fourteen, and we have a special license to work with them on this corpus, of course with many restrictions and so forth.
0:25:56 Also, in terms of what kind of data we have: the best coverage is for matching conditions involving telephone-intercept data. What's more difficult is mismatched conditions. One type of condition we frequently have is comparing terrorist videos (people making announcements to the public, disguising their faces, encouraging people to come to their training camps and stuff like that) with telephone-intercepted recordings, where these guys call home and there's an interception, a captured telephone session. So this is a mismatch in terms of technology, but also in speech style: someone reading something, making an announcement, or having learned it by heart, is different from a natural telephone conversation. So we do have some of it, but it's more difficult to collect that data.
0:27:00 Another challenge is language: we have casework in several languages and we want to cover them, and we do collect data for different languages, but there is a limit to what we can do. So, as a parallel strategy, we also investigate the effects, both the size and the type of the effect, if there is mismatch in terms of the data we have. One type of situation: if we have a testing corpus but we don't have the right reference population for it, and we have to use a reference population from another language, what is the effect in terms of shifting the likelihood ratios when we use the incorrect reference population? It's not a big effect, so these kinds of effects are to some extent predictable. That's what we try to capture, because language is a big issue: we can't just use one language; it's several languages we want to cover.
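A toy numeric sketch of that reference-population effect (entirely synthetic distributions, not BKA data): the same case score evaluated against two different reference-population score models gives shifted likelihood ratios, and measuring that shift is exactly the kind of validation experiment described above:

```python
import numpy as np
from scipy.stats import norm

score = 2.0                              # comparison score for the case pair
same = norm(loc=2.5, scale=1.0)          # same-speaker score model

ref_correct = norm(loc=-1.0, scale=1.0)  # reference population, right language
ref_other = norm(loc=-0.4, scale=1.2)    # reference population, other language

lr_correct = same.pdf(score) / ref_correct.pdf(score)
lr_other = same.pdf(score) / ref_other.pdf(score)
print(np.log10(lr_correct), np.log10(lr_other))  # ~1.9 vs ~0.9: the shift to quantify
```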
0:28:02 Those were the more practical problems; now to the more conceptual problems and issues. One is combining different kinds of evidence. There is quantifiable evidence, likelihood ratios coming from automatic or semi-automatic systems; that's what the guideline addresses. But there is also qualitative evidence, coming from the auditory-phonetic and acoustic-phonetic method. We use both kinds of evidence; some practitioners, in answer to this, just work with quantifiable evidence, others work with both kinds, and the question is how to combine the two. Since not everything is quantifiable, if we use both methods, eventually we have to make some strength-of-evidence statements that are not entirely quantitative. In the end, one component is qualitative, and the quality of the end result has to be qualitative, because you cannot calculate or weigh it all; there is some qualitative aspect. So that's a standing problem.
0:29:14 Not unsolvable, of course, but it is an issue for institutes that use both likelihood-ratio-producing methods and qualitative methods. The other one, the most painful problem probably, is this one here, about interfacing with the courts. You can do your audio work well or not so well, but in some cases you have to go to court and interface with people from the court, and they have a different mindset and different expectations and so forth. The situation that we have in Germany is that the courts in Germany still expect posterior statements. They expect things like what you have in this table: "the identity (or non-identity) cannot be assessed", or "is probable", "highly probable", "can be assumed with near certainty"; this is the sort of thing they are used to and still expect. Now, of course, there are discussions and everything, but there is a sort of psychological inertia against switching to a Bayesian framework.

0:30:21 The ideal, the idea of the Bayesian framework, is that the speech expert supplies likelihood ratios, other forensic experts do the same, and then the court applies priors and calculates posteriors from the prior odds and all the likelihood ratios coming from the experts. That would be the ideal scenario. There is still resistance against implementing it; in the Netherlands and Sweden you are much further along, and in Germany we are not. I don't know, especially for point three, what the state of affairs is on that one, but this is a topic for discussion. These are the interfaces, the expectations coming from the court about the sort of things they want, and so forth. That's basically it; I've finished my talk.
0:31:27 Good, thank you very much. We have time for a couple of questions.

0:31:42 Could you just say something about how you actually, at the moment, go about combining the quantitative and the qualitative data? Is there some explicit statement about how you do that, and how you integrate any kind of relationship between those two types of evidence?

0:32:03 What we tend to do with the automatic evidence... here, for example, this is a plot coming from the guidelines, and what we have is that we have... [the remainder of the answer is inaudible]
0:34:11 The resistance against the Bayesian paradigm: could vocabulary contribute to it? Are there German words for, say, likelihood ratio, or prior odds and posterior odds?

0:34:38 I think... even using German you would have to explain the concepts and everything. No, I think it's not a language issue, not really; it's probably more a gradual process in the courts.

0:34:55 The reason I'm asking is that in my home language, Afrikaans, we don't really have words for these. We have a word for probability, "waarskynlikheid", but it covers likelihood as well; there's not even a distinction between likelihood and probability, it's just the one word. And I've got no idea how to say "a posteriori probability"; I don't know how to say it. So that's the vocabulary issue.
0:35:34 My comment: there are two words that crop up again and again which contribute, at least partially, to this problem of interfacing with the legal profession. One of them is "support", and the other one is the use of "speaker recognition". Now, if you keep on talking about speaker recognition, it's not surprising that the courts think you are doing speaker recognition, right? And you are not doing speaker recognition: you are giving them a likelihood ratio, and speaker recognition comes with a posterior. I think it's okay for us, we understand that, but of course the legal profession is something different: if you keep on talking about forensic speaker recognition, then it's not surprising that they misunderstand what it is we do.

0:36:29 And secondly, one of the things that really gets my back up is this word "support", as in "the likelihood ratio supports the hypothesis". It doesn't. The meaning of the likelihood ratio is that the hypothesis emerges with the posterior when you take the prior odds into account. It can be reversed: a likelihood ratio of a thousand can be reversed by the prior. On its own it has no meaning. That's the problem I see in talking about the likelihood ratio as support for the prosecution hypothesis: the trouble is the "support" language. I know that's what people use; I think it's a very bad choice.
0:37:34 [Reply:] What's the same thing, then... What you are describing is trying to say something about the posterior in the absence of the prior. I'm not saying there aren't plenty of other words, but "support" seems to present itself as the standard expression. Again, we can discuss this later, but I think the standard wording implicitly states that there is no consideration of prior information: "support" is not a previous opinion; you use it in conjunction with the hypotheses, and "support for the hypothesis" means the results are more likely if that hypothesis is true.

0:38:25 I understand the sentiment over the whole thing, but if you say "my likelihood ratio gives support to the prosecution hypothesis over the defence hypothesis"... I mean, that is the wording that's been used; I understand the problem. What I would like to stress is that it is not the likelihood ratio that supports: it is the findings that support one hypothesis over the other, with a weight of evidence which is quantified in a likelihood ratio; the findings are more likely under one hypothesis than under the other.
0:39:09 Okay, so the next speaker.
0:39:32 Good morning. The title for my talk today is "Opening the black box" for forensic automatic speaker recognition, and this talk was prepared by my colleague and myself. We're from Oxford Wave Research, which is an audio, not speech, R&D company based out of Oxford, and our area of expertise is that we develop systems for automatic speaker recognition, speaker diarization and audio fingerprinting. We've been working in this field for quite a while; our products are used by law enforcement in the UK and other agencies in the UK, the EU and the Middle East, and include the Met Police in the UK and the NFI, among others.
0:40:26 The topic I'd like to address goes with some of the comments that have come up already, and it is the fact that automatic speaker recognition is a black box. This is a comment that one of our colleagues made at one of our conferences, and it stuck with me, and I think a lot of this work needs to be directed at addressing the fact that automatic speaker recognition methodology is a black box.

0:40:58 Over the last few days we've been treated to a variety of new algorithms, new techniques, and myriad variations and modifications of different algorithms. It isn't any surprise that these mathematically complex methods are a black box to laypeople, the juries, the judges and the lawyers, and, to a certain extent, even to the forensic experts who are using them. Now, as we've seen, recent advances have come with a large number of variables; there was a comment earlier about it all being about the data: training and evaluation data, feature, modelling and parameter choices. In an evaluation you have fifteen systems, with variations where the parameters have been placed one way or the other and tested, and the focus has been on getting incremental improvements on these large databases. And I'd like to note: the variability in these databases has been designed, or controlled.
0:42:05 Now, how does this sit within the context of opening up this black box? If you've got real forensic casework, with case recordings to analyse, how do you use these methods, and how do you address this?
0:42:22 Let's look at the ENFSI guidelines for some support. Now, the ENFSI guidelines talk about any expert method addressing balance, transparency, robustness and logic. Some of these we've already addressed, so I won't go into all of them. The things that stick out: balance, for example, that you have competing hypotheses, or propositions, and the evidence is considered with respect to these hypotheses and propositions, given, of course, the prior background. Then there is logic, and the fact that, you know, you don't want to transpose the conditional: you evaluate the evidence given the hypotheses, instead of evaluating the hypotheses. And robustness, which is slightly different from the sort of speaker-engineering robustness we usually talk about: here it is how well the method holds up to scrutiny, how well it holds up to cross-examination; will the actual techniques that are used hold up? And, I think quite importantly, something you don't get in any black box: transparency.
0:43:38 So how well would the forensic expert be able to explain the methods, and explain the data that goes into the system that they are using? Now let's take a very simple, straightforward (it's unexpected, perhaps, to use "i-vector" and "simple" in the same sentence) automatic pipeline, where you are training the UBM. You've got a whole lot of data that you can put into training the UBM; you choose another set of data for training the total variability space; and then, if you are using LDA or PLDA, you can use yet another speaker set, and I know that is used a lot in these systems. And this is all before you get to testing and training and validation, or equal error rates and so on. So before you've even got started, you've got data decisions, multiple data decisions: about the UBM training, about the TV matrix, about the LDA and the PLDA. And this is before considering things like what the relevant population is, the likelihood ratio method and so on that is embedded within the system.
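To underline how many separate data decisions such a pipeline hides, here is a schematic sketch (synthetic data, with PCA standing in for total-variability training, so illustrative only, not any vendor's implementation):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Three independently chosen background sets -- each one is a data decision:
ubm_data = rng.normal(size=(2000, 20))      # 1. features for UBM training
tv_data = rng.normal(size=(1000, 20))       # 2. features for the variability subspace
lda_data = rng.normal(size=(600, 20))       # 3. speaker-labelled set for LDA/PLDA
lda_speakers = rng.integers(0, 30, size=600)

ubm = GaussianMixture(n_components=64, covariance_type="diag").fit(ubm_data)
subspace = PCA(n_components=10).fit(tv_data)          # stand-in for the T matrix
lda = LinearDiscriminantAnalysis(n_components=9).fit(
    subspace.transform(lda_data), lda_speakers)
# None of these three choices is visible in the single score the system reports.
```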
0:44:47 Going back to Doug's comment about it revolving around data: systems that are developed with these kinds of background data have to be explicit about their effects on the likelihood ratio; at the very least there needs to be transparency about those effects, about how these likelihood ratios are calibrated. That's one part of the problem, the sort of automatic black box, if you will, that somebody could help open up.
0:45:19 Now, in the UK, most of the forensic speaker recognition casework is performed by forensic phoneticians, and they have a lot of experience and knowledge: they understand the material and the language, they understand the idiosyncrasies of the speech, and they understand the legal requirements of the evidence. And they want to include these automatic methods, but do all automatic systems give them this? And how do you then connect this automatic score that you've got with the knowledge that you have about the fact that this speaker says something that is very particular to a region or a place? How do you put these things together?
0:46:03 Okay, assuming you even wanted to make your analysis more objective, using likelihood ratios and evaluating your system performance, how do you go about doing this? What generally happens, or happened, was that the two were pitted against each other: you had the traditional forensic-phonetics-based approach, looking at formants and voice quality and linguistic characteristics, and then you had the automatic approach, which looked at the spectrum and, you know, treated it as a signal processing problem. And these were set against each other; sometimes we don't even sit together at conferences. So that kind of needs to go. There is a common logical platform that is beginning to be accepted, which is the Bayesian likelihood ratio, and it's nice because you can have these multiple methods and approaches, and they can pull together in the same direction.
0:47:08 I've been working on this problem for quite some years, together with a lot of colleagues who work on forensic casework, and I really think the black box is quite an important problem. It creates a situation where the forensic expert has, say, four systems that they haven't elaborately calibrated, these four systems for example, and they don't wind up able to look inside the automatic system. To go back to the earlier point about every case being unique: the expert should be able to set system parameters and to use new data at every step of the speaker recognition process. And in some sense this doesn't just go for, you know, commercial systems: the expert should not be limited to these prepackaged, preprocessed, manufacturer-provided models and configurations, and they should be able to train the system specifically for the problem domain.
0:48:08 And it was in this context that we looked at one approach, and this is by no means the only way of doing things, where we put together an automatic system that was built with an open-box architecture, if you will. One that gives you flexibility in the features that you put in: you can use automatic spectral features like MFCCs and so on, which is important, but you can also use traditional forensic parameters like formants, and (debatable, but there) you can use user-provided features. It still allows the strength of these mathematical modelling techniques, like i-vector PLDA and GMM-UBM, and you can use them within the context of these chosen features.
0:49:07 And by doing this, you are able to introduce new data at all stages in the i-vector pipeline or the GMM-UBM pipeline, and, to a certain extent, adapt the system to the conditions of the case. Now, you may ask: does this make this big black box transparent? No, it doesn't; it is as complicated as it is. But what it tries to do is open it up, as to what goes into it and what data goes into it, and it allows for validation that is more meaningful in the context of the case.
0:49:55 Thanks, Anil. So there is time for only one quick question; in any case, the next speaker can set up meanwhile. Anyone? Very quick, then... and then the question itself.
0:50:12 I'm on the other side, so I'm biased, sorry for that, but this is a very interesting topic, the black box thing and so on, and I think (my opinion, of course) transparency, yes: when a forensic expert goes to court and defends something, he needs to understand what's going on, what type of technology he is using, what type of algorithms he is using, and whether that is suited to your specific case. Yes, it's obvious; that's the main problem in forensics: every case is different, and you need to have some flexibility for that. But be careful with that, because if you create a system where you can tune everything, then you make unsolvable the problem we were discussing before: if you want a system that is validated, and at the same time you can change everything every time, that's going to be a problem, because then you are going to need to validate the system for every single case. That, for me, creates a big problem, and it will grow with time, because you need to change data, and sometimes it is not easy to change data in the form of audio files and so on. If every single system, in every single case, needs different parameters and different data, it also makes it more difficult to be transparent. So I think we need to find a place where we balance both things: the transparency and openness of the system on the one hand, but also fixing some sort of specific things in the system, just to make the validation of the system feasible. That's what I wanted to say.
0:52:54 Okay, thank you, Anil. In any case we can continue this interesting discussion after the last speaker; he actually addresses some of these points, on the other hand, in the evaluation challenge, so you can also continue with him.
0:53:15 Okay, I'm going to tell you about, introduce to you, a multi-laboratory evaluation of forensic voice comparison that is being organised by myself and my former PhD student Ewald Enzinger. So, I think we've already talked about the need for evaluation of forensic evidence; this goes across all branches of forensic science. There have been calls since the nineteen-sixties for forensic voice comparison to be evaluated under realistic casework conditions, but, I think, just by what everybody here has said, this still goes widely unheeded.
0:53:58 So our contribution to this is to run this forensic evaluation, which we're calling forensic_eval_01. It's designed to be open to operational forensic laboratories, and we especially want them to take part; it's also going to be open to researchers. We're providing training and testing data representing the conditions of one forensic case: data based on a relevant population for the case, based on the speaking styles for this particular case, and also the particular recording conditions for this case. And we are going to have the papers reporting on the evaluation of each system published in a virtual special issue of Speech Communication. The call-for-papers system is not quite set up, but I'm hoping it'll be done maybe by the end of this week or next week. The information that's already available you can find by going to my website, and you can get started if you want to start.
0:55:09 So there's an introductory paper which is already available (a draft of it, at least, is already available), and it includes a description of the data, and it includes the rules for the evaluation. Each paper that's evaluating a system needs to describe the system in sufficient detail that it could potentially be replicated, and we're thinking about the level of: it could be replicated by forensic practitioners who have the requisite skills and knowledge and facilities. We're not putting a tight deadline on this: people working in operational forensic laboratories are very busy, and their priority is to actually do casework, so we're giving a two-year time period within which people can evaluate systems and submit.
0:55:57 So, a disclaimer: casework conditions vary substantially from case to case. Basically, I'm of the opinion that, essentially, at this stage you do have to evaluate your system on a case-by-case basis, because the conditions are so variable from case to case. And what that means is that, whatever results one gets out of taking part in this evaluation, one should not assume that those are generalisable to other cases, unless one can make a case that yes, this other case is very similar to the conditions in the forensic_eval_01 case.
0:56:38 So, a little bit about the data; it's based on a real case. The offender recording is of a telephone call made to a financial institution's call centre; this picture is just something I pulled off the internet. It's a landline recording at the call centre, and it has babble and typing background noise. It's saved in a compressed format, because of course they want to reduce the amount of storage they need. It's forty-six seconds long, and it is clearly an adult male Australian-English speaker. As for the suspect recording: we should be able to get a nice high-quality suspect recording, yes? Okay, right... no, I have a pointer over there. This is the actual room where the suspect recording was made: you can see it has nice hard walls, and I think the person taking the picture is in the opposite corner of the room; imagine what the reverberation is like. And you see this here? A nice fan. And the microphone is in this box.
0:57:41 So there are problems with the suspect recording as well, but that's pretty typical of the sorts of problems that we experience in real forensic casework. So the data that we're providing comes from a database we collected; the whole database is actually available, but this set is extracted from that database. We've got male Australian-English speakers; we have multiple non-contemporaneous recordings of each speaker; we have multiple speaking tasks per recording session. We've got high-quality audio: we actually had to record the speakers from the relevant population, and we had to record the relevant speaking styles; but then what we've done is we've taken the audio and simulated the technical recording conditions that I just mentioned, passing it through those signal-transmission conditions.

0:58:33 So we have training data from a hundred and five speakers. If you're used to NIST SREs, that sounds ridiculously low, but I think availability of relevant data is a major problem in forensic voice comparison, and that's actually quite a lot of data compared to what people can usually manage to get. And the test data come from a total of sixty-one speakers.
0:59:01 So, I have time to show you some preliminary results based on the data from forensic_eval_01. These are results that Ewald and I actually produced; this is not part of the special issue in Speech Communication, it's something that we did previously, which has already been submitted, but it's on almost exactly the same data.

0:59:29 In this example we're looking at an i-vector system: MFCCs, UBM, T matrix, LDA, PLDA, and then a score-to-likelihood-ratio conversion at the end using logistic regression. And we trained two different versions of this system. One is using generic data: the first training level is not using the data I just talked about; it's using a whole bunch of NIST SRE data, about an order of magnitude more speakers and two orders of magnitude more recordings. We used the generic data for everything up to the score, training all the models that get to the score, and then we used the case-specific data for training the model that goes from the score to the likelihood ratio, the logistic regression model at the end. That's a fairly typical way of doing things, because you do all the hard training up front. Then we built another system where we used case-specific data all the way through: we trained the models that get to the scores using case-specific data, and then we trained the score-to-likelihood-ratio models using case-specific data.
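The score-to-likelihood-ratio step just described is, in sketch form, a one-dimensional logistic regression trained on same- and different-speaker scores; the scores below are synthetic, and with equal class sizes the fitted log-odds read directly as a log likelihood ratio:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
ss = rng.normal(2.0, 1.0, 400)      # hypothetical same-speaker calibration scores
ds = rng.normal(-1.0, 1.0, 400)     # hypothetical different-speaker calibration scores

X = np.concatenate([ss, ds]).reshape(-1, 1)
y = np.concatenate([np.ones(400), np.zeros(400)])
cal = LogisticRegression().fit(X, y)

def log10_lr(score):
    """Calibrated log10 likelihood ratio for a new comparison score."""
    z = cal.intercept_[0] + cal.coef_[0, 0] * score  # log odds; flat prior -> log LR
    return z / np.log(10)

print(log10_lr(1.5))
```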
1:00:40 And here are some results. In terms of Cllr (if you don't know it, Cllr is a measure of the accuracy of the likelihood ratios): the case-specific one, the system trained using case-specific data all the way through, performed much better than the one using generic data to get to the score and then case-specific data for the score-to-likelihood-ratio conversion. And if you like Tippett plots, here are Tippett plots: there is the generic-data system, and here is the case-specific one; and if you understand Tippett plots, that is a huge difference.
1:01:32some results from him and his kindly allowed us to show the results here he
1:01:37was testing that works this different user options and bat fox a one user is
1:01:42a one option is a reference population
1:01:44we put in either or data from all the hockey put in data from all
1:01:48hundred five speakers or you like that but select a subset of thirty and he
1:01:53tried using no impostor data already tried using impostor data from all hundred five train
1:01:58speakers
1:02:00we here are the results us summarize if you use
1:02:05data from all the speakers instead of having better luck select a subset you get
1:02:08better performance
1:02:10if you use impostors versus don't use impostors using about this gives you better performance
1:02:15so that the combination that gets you the best performance at the two
1:02:20and if you like to but there's a tippet plot one thing that's clear to
1:02:23notice is when you only using the thirty speakers selected by that works there's a
1:02:28clear by us here which is then maybe a bias there but it is less
1:02:32it's less clear
1:02:35okay scale cask
1:02:43 Thank you. So we have just time for one question before we move into the final phase of open questions on all the presentations; remember, the session ends at nine forty-five, so there's less than ten minutes. So if we could begin with some questions for Geoff, that would be great.
1:03:20 [Question:] If the data was totally appropriate, would it be viable to do a comparison of the two systems that you put up, based on your evaluation?
1:03:38 I was prepared for the question. Here's the best of... so this, the red one, is the best of the Batvox systems, and the blue one is the best of the i-vector systems that we did. The blue one is better in terms of Cllr, and there's the difference in terms of the Tippett plots as well. And I think, going back to the versions of our systems: I think the big difference is where we used case-relevant data all the way through, whereas the other was using a lot of generic data to get to the score level. And I think Batvox works better than our system that used generic data at the beginning, but I think the one we built works better than Batvox, because we used case-relevant data all the way through.
1:04:52 [Question:] What's the difference in the likelihood ratios for the case data? That's the crucial thing. Sorry? What was the outcome? So you've compared two systems, but I would like to know what the difference is in the likelihood ratios that the systems gave you for the actual comparison, for the actual case.
1:05:26 Well, we haven't tested that. When we did the actual case, we chose one system; we used one system. So, for doing the casework, we chose one system, we validated the performance of that one system, and we didn't go out and try a whole bunch of other systems on the actual case. Right, because doing casework is not a research activity; we're not trying to choose the best one. And also the problem that comes up is, okay: you might say we chose three or four different systems and then we picked the one that worked the best; we would then be over-training. We'd be over-training on the test set; we'd have optimized to the test set rather than to the previously unseen actual suspect and offender recordings. And then there's also the problem of, you know, okay: you've presented three different systems, which one should we believe?
1:06:28 Precisely; that's what I'd ask if I was the defence counsel. Yes. And it's not that I would have expected you to have done that, but suppose one of the systems gives you a log LR of, say, minus five, and the other one gives you a log LR of four, or twenty?
1:06:50 Right. So certainly... what we do in our practice is: we pick the system we're going to use, we optimize it to the conditions of the case, and then we freeze the system. We then test the system using test data, and with that we don't go back and change the system again; that's it, that's how well the system works. And the last thing we do is run the actual suspect and offender recordings. So we don't go, "gee, I got an answer; gee, let's see, I got a relatively low likelihood ratio; who's paying me? The prosecution. They want a high one, so I'll go back and fiddle with the system and I can get a better answer." We keep a strict chronological order to avoid any of that.
1:07:37 And nobody suggested that we would be doing anything like that. Yes, I understand that, but we're talking about different systems; I'm not worried about the freezing of the system, but at the moment we are comparing systems; that's what this was about. Well, the results there were comparing systems, but across a whole bunch of test trials, so it's averaged over a whole bunch of test trials. Based on this comparison of the two different systems, you might decide that you wanted to use one of them; you might decide you wanted to use the best-performing system, and in a future case you would maybe decide to choose one of those systems. But if the conditions of that case are different, I would then test the performance of the system under the conditions of that new case. I might have decided which system to use on the basis of this one, but I'm not taking this case as the validation for a case whose conditions are very different.
1:08:50 I know you're having an interesting discussion, but I guess my question goes to both Michael and Geoff at some point. Okay: as you go through your casework, most judges are not experts in, say, speech or speaker verification. So if you're working with, for example, Tippett plots, do you present those in court proceedings, and if so, how do the defence and prosecuting attorneys actually react? My question is about those sorts of plots, or about how you present results in general.
1:09:23 Yes. In one case in recent years we did include the Tippett plot, together with the case-specific results that were decided beforehand. So we do explain everything and try to make it easy and so forth. We are not shielding the court from those results; we're giving them the results and then trying to explain, as well as possible, how this is used.
1:10:01 Okay.
1:10:03 Yes, it's all stuff that we put in our reports, of course; we see it as the validation of the system. Typically I send the report to the lawyer, and then they start phoning me and asking questions: what does this mean, what does this mean? And I have a system now: okay, I'll come to your office and we'll spend a day together. I will go through the basics with you so that you've got a sufficient level of understanding, and then the next day you can ask specific questions about this particular case and this particular report. So we start by doing the very basic "what's a likelihood ratio", and sometime by mid-afternoon we get to the testing level and to explaining what something like a Tippett plot means.
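Since the Tippett plot keeps coming up here: it is just a pair of cumulative curves of the log likelihood ratios from known same-speaker and known different-speaker validation trials. A minimal sketch of the idea in Python, with invented toy scores rather than any lab's actual reporting code (plotting conventions vary between labs):

```python
import numpy as np
import matplotlib.pyplot as plt

def tippett_plot(llr_same, llr_diff):
    """Draw a Tippett plot: cumulative proportions of log10 LRs for
    same-speaker and different-speaker validation trials."""
    grid = np.linspace(min(llr_same.min(), llr_diff.min()),
                       max(llr_same.max(), llr_diff.max()), 500)
    # Proportion of same-speaker trials with log10 LR at or above each value
    p_same = [(llr_same >= t).mean() for t in grid]
    # Proportion of different-speaker trials with log10 LR at or above each value
    p_diff = [(llr_diff >= t).mean() for t in grid]
    plt.plot(grid, p_same, label="same-speaker trials")
    plt.plot(grid, p_diff, label="different-speaker trials")
    plt.axvline(0.0, linestyle=":", color="gray")  # log10 LR = 0: neutral evidence
    plt.xlabel("log10 likelihood ratio")
    plt.ylabel("cumulative proportion of trials")
    plt.legend()
    plt.show()

# Invented validation scores, for illustration only
rng = np.random.default_rng(0)
tippett_plot(rng.normal(2.0, 1.0, 1000), rng.normal(-2.0, 1.0, 1000))
```

The further apart the two curves are, the better the discrimination; the LR obtained in the actual case can then be marked against both curves.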
1:10:53 And then you get to court, and the court seems to be designed to prevent this transfer of information from the expert to the trier of fact. Because, you know, if you were going to train somebody, what would you do? You might send them something to read beforehand, you give them a little lecture, you get them to ask questions, and you ask them confirmation questions to see whether they understand. But in court, the lawyer asks you the questions and you answer only those questions, and the jury isn't allowed to ask questions.
1:11:28 Getting the trier of fact to understand this is a serious problem, and I don't think we have good solutions for it.
1:11:52 Thank you. I just have a suggestion, which is to stop seeing the likelihood ratio as a single number. The likelihood ratio is not a number, it's a ratio, and it's very important to be able to present the two parts of the ratio: the similarity and the typicality. It's really important that you do so, because when you are changing the reference population, it could be very interesting for the court to make the link between the similarity and typicality parts and your decision about the reference population.
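To make the suggestion concrete: the numerator of the likelihood ratio measures similarity (how probable the evidence is under the same-speaker model) and the denominator measures typicality (how probable it is in the relevant reference population). A toy sketch under a one-dimensional Gaussian score model; every distribution and number here is invented for illustration:

```python
from scipy.stats import norm

evidence_score = 1.8  # invented score from the case comparison

# Similarity: likelihood of the evidence under the same-speaker model
similarity = norm(loc=2.0, scale=0.5).pdf(evidence_score)

# Typicality: likelihood of the evidence in the reference population
typicality = norm(loc=0.0, scale=1.0).pdf(evidence_score)

lr = similarity / typicality
print(f"similarity={similarity:.4f}  typicality={typicality:.4f}  LR={lr:.1f}")
```

Reporting the two terms separately shows how a change of reference population moves the typicality term, and hence the LR, which is exactly the link to the court that the commenter is asking for.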
1:12:41 Some of the newer software will also show you, at the very end, how the likelihood ratio is calculated: from where the evidence intersects with the suspect distribution and with the different-speaker distribution coming from the reference population. So you see the two distributions used in the case, and at that point you can see how the likelihood ratio is calculated.
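That kind of display is straightforward to reproduce: plot the same-speaker and reference-population score distributions, mark where the evidence score falls, and the LR is the ratio of the two densities at that point. A hedged sketch with invented Gaussians, not the output of any particular software package:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

same = norm(loc=2.0, scale=0.8)   # invented same-speaker (suspect) model
diff = norm(loc=-1.0, scale=1.2)  # invented reference-population model
evidence = 1.2                    # invented score from the case comparison

x = np.linspace(-5.0, 6.0, 500)
plt.plot(x, same.pdf(x), label="same-speaker distribution")
plt.plot(x, diff.pdf(x), label="different-speaker distribution")
plt.axvline(evidence, linestyle="--", color="black", label="evidence score")
plt.xlabel("score")
plt.ylabel("density")
plt.legend()
plt.show()

# The LR is the ratio of the two curve heights where the evidence intersects them
print("LR =", same.pdf(evidence) / diff.pdf(evidence))
```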
1:13:18 The question is then, and I mean this is an important point, whether the court can request that these plots are added to the report or not; but at least one can see how it is calculated.
1:13:34 So I guess there are two pieces going on here. One thing Geoff actually ended up presenting was the underlying accuracy of the system, right, the performance of the system. And then we have the whole thing about the likelihood ratio, that number that comes out, that you would present to the trier of fact, which we all think is, or seems to be, the way to go.
1:14:02 One issue I have with the likelihood ratio, when we talk about it being a number, is that there is no real ground-truth likelihood ratio, right? In reality, the only ground-truth likelihood ratios that we could even calibrate ourselves to are infinity and zero. It's either true or not true. In between those two poles we start saying that we have actually evaluated a likelihood ratio of six point three, but we never actually evaluate, we never estimate, the likelihood ratio relative to any ground-truth likelihood ratio, because the ground-truth likelihood ratio lives at the poles. We only evaluate it through the posteriors; is the LLR a true posterior?
1:14:51 So I guess my question, for the people who go to court, is: what do you say the ground truth is? How do you say what it means to be between the two poles? A ground-truth likelihood ratio: what is that?
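For concreteness, the only place an LR ever "meets" a ground-truth label is after it has been combined with a prior into a posterior, via Bayes' rule in odds form. The numbers below are purely illustrative:

```python
# Posterior odds = likelihood ratio * prior odds (Bayes' rule in odds form).
# Only the final label (same speaker or not) is ever observable as ground truth.
lr = 6.3             # reported likelihood ratio (illustrative)
prior_odds = 1 / 10  # whatever prior odds the trier of fact holds (illustrative)

posterior_odds = lr * prior_odds
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"posterior probability of same speaker = {posterior_prob:.3f}")
```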
1:15:09 Thank you. Maybe I can take that one. For me, and this is a personal opinion, the answer is the calibration of the likelihood ratio. So, definitely, the only ground truth is the final label: which of the two propositions is true. What we have tried to do in this validation guideline, and it comes from work that has been done precisely here in speaker recognition, is to say: okay, one likelihood ratio is better than another if it supports the right decision, and the decision does have a ground truth, the final label.
1:15:48 And then there's the issue of calibration. Calibration helps you to make better decisions, because if your likelihood ratios are well calibrated, then when you plug them into Bayesian decisions, the expected cost is usually reduced. That's one property of calibration. The other is that calibration gives you a kind of tuning machinery that generates a heavier or lighter weight of evidence depending on your discriminating power.
1:16:21 So systems with very good accuracy should in general give higher likelihood ratios in good conditions, and systems with worse accuracy should give weaker likelihood ratios. Those are the two properties of calibration. On the one hand you improve your decisions, which is the final accuracy measure you're looking for; on the other hand you have this kind of limiting entity that is telling you: okay, you are not discriminating well, so the likelihood ratios you give should be moderate.
1:16:53 That's the rationale behind the performance measure that we have proposed.
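The performance measure alluded to here is presumably the log-likelihood-ratio cost, Cllr, which penalizes both poor discrimination and poor calibration: an LR that strongly supports the wrong proposition costs more than a moderate one. A minimal sketch, with invented natural-log LRs:

```python
import numpy as np

def cllr(llr_same, llr_diff):
    """Log-likelihood-ratio cost: llr_same and llr_diff are natural-log LRs
    from same-speaker and different-speaker validation trials; the penalty
    terms are measured in bits."""
    penalty_same = np.mean(np.log2(1 + np.exp(-llr_same)))  # LRs should be large
    penalty_diff = np.mean(np.log2(1 + np.exp(llr_diff)))   # LRs should be small
    return 0.5 * (penalty_same + penalty_diff)

# Invented validation LLRs, for illustration only
rng = np.random.default_rng(1)
print("Cllr =", cllr(rng.normal(2.0, 1.0, 1000), rng.normal(-2.0, 1.0, 1000)))
```

A system that always outputs LR = 1 (no information) scores Cllr = 1; a well-calibrated, discriminating system approaches 0; a badly calibrated one can exceed 1.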
1:17:05 I mean, I know this feels a little philosophical, but it also just seems that, for everything we want to say when presenting this likelihood ratio, talking about scales, right, you know, bands on it, at the end of the day what we're really talking about is a decision, which has a prior. Even when you calibrate, everything is still done through priors that are there. You may say you've integrated them out, but we go through all of this, and the reality is that, at the end of the day, for a six point three you can't say that, in ground truth, my estimated likelihood ratio of six point three was really close to the true likelihood ratio, except at the poles. What gets you to the hard decision is the prior.
1:17:48 So I think, in a way, and I think that's what Geoff was getting at too, you're really just trying to tell people: of all the times I used this, this is how often it said "same" when it truly was the same, and this is how often it said "same" when it was not, for this quality and similarity. I'm just wondering, in the sense of breaking this down: are we making it more complicated when trying to describe this issue to the court? Are we getting too complicated by overlaying it with so many issues, training ourselves to get away from any of the priors, versus just trying to give a simple answer?
1:18:28 Like, in this one forensic setup, one guy just had a visual way of doing it. You put down the dots: here are all the dots from when I ran it and it was the same person, here are the dots from when it was not the same, and here is the dot from when I ran the case data through, and you can visually see where it sits relative to the others. It's true that those are the two distributions, but in some sense it's almost just saying: here's what I got when I knew the truth and they were the same, here's what I got when they were not, and here's where this one sits. You look at it and decide whether you think it's closer to the one or the other, without, right, overlaying so many issues on boiling it down to a single number.
1:19:07 But, I mean, that display is using the same equation. One of the things that, in my opinion, there is no way to avoid is the likelihood ratio: the likelihood ratio is a rational kind of support, and hopefully a transparent way of doing this. And there's another issue, which is competences. The likelihood ratio is there mainly because of competences: the final decision has to be made by someone, and that person, whom we call the fact finder, asks for some information. So how can the person who has the information that the fact finder does not have communicate his opinion about that piece of information, so that the fact finder can integrate it with the whole? That's the main issue behind the likelihood-ratio framework.
1:19:50 The decision could be made by anyone, but there is this separation of competences. So the formalities are there for a reason, because leaving everything without that formalism leads to things that are considered illogical, because then the decisions are effectively made in the reports. That's one issue.
1:20:07 On the issue of simplicity, of complexity, I fully agree with you. I think that things have to be made much simpler. I was talking about this with Joe before the panel yesterday: if you've got a chemical analysis, there's one person who expresses an opinion about one comparison of two pieces of glass, using scanning electron microscopy with energy-dispersive X-rays and so on, and nobody is trying to display what's going on inside the microscope, or whatever, with the energies present, right? What can be said is that there is agreement in the community, there are standards regulating the use of those procedures, and there is some kind of measured error rate that comes along with the standard.
1:21:02 So, in my opinion, that is the way to go. Giving a lot of information to judges is something that can be counterproductive, in my view, so the balance between transparency and not biasing the communication is important. I think your argument is in some way about this issue, and it is a very important issue for me: keeping things simple is the starting point if you want to put a new method forward.
1:21:32 Can I just add: I think there are lots of details that we can talk about later, but I think we have to present something which we believe is logically correct first, and only then worry about how to communicate it. It's not appropriate to present something which we believe is logically incorrect but which is easy to present. And with the exact example you were giving, I think that's one where, when the jury looks at it, they will immediately jump to "it was him". They will jump to an identification, and so I think that's a problem.
1:22:13 Okay, we have to move on; maybe this one.
1:22:16 Yes.
1:22:17 Jason?
1:22:23 I just want to comment on the point proposed a moment ago. I agree, and I think we should be honest: when we experts are writing an evaluation report, it's not for the judge. It's only for someone at the same level, if you like, experts, which could be ones on the defence side, for example, to examine your information and to give some input to the lawyer.
1:22:53 When you are in front of the court, the only important thing is how you present your opinion. It's based only on what you are saying there; it has nothing to do with the likelihood ratio itself. You could state your likelihood ratio, but, you know, nobody there knows the matter better than you, or even as well as you.
1:23:16 So we have to be clear: in the scientific part of the report, people should be able to find enough information in order to criticize the work; and in court, the information is given orally, and defended, by the expert.
1:23:39 Well, I have discussed this with many forensic scientists, and we always agree that transparency is important: everything has to be transparently reported, and so on. But talking about explanations in court, the balance has to be taken into account. For example, in DNA analysis in the nineties, they started to use probabilities for reporting, and it was a huge mess for ten or twenty years; errors and interpretation fallacies were common.
1:24:17 That experience tells us that there has to be a balance in reporting. Transparency is an important part, but when someone comes to court to explain their report, it is probably better to keep to the simplest rigorous thing rather than going into complicated things.
1:24:33 For me, for example, having a performance graph with a lot of detail can be okay for us, but when you show that to a lawyer, probably the information that he takes from that graph is not what you're trying to express. So the problem is the level of detail with which we are transparent: too much detail can give the person listening a different message than the one the person speaking intended. So the balance has to be there. I'm not saying how to do things, but the balance has to be there. And I fully agree with you: we have to be transparent. Transparency and the level of detail are both things that have to be considered.
1:25:22 Can I, do you want me to...
1:25:26 Can I add on to that? Just one minute.
1:25:30 Should I? Okay.
1:25:34 I think what you're saying is really important: you can never sort of leave behind the responsibility for what you're actually expressing to the court. There is something somewhat subjective in whatever you do, you know, even in Geoff's weight of evidence, and there's a danger in that. I read something in the theory of science about something called "physics envy", and that very much appears when you move into a different paradigm. The justice system is actually a completely different paradigm, where argumentation is actually the thing that they are doing. It's not engineering or science in the way we are used to it. So you do a lot of analyses, but when you end up in court it's a lot of argumentation, and you express an opinion based on all the analyses you made.
1:26:30 So this is a big point with physics envy: you don't leave the responsibility to just the number, "I have this system, and this is the score, and you do whatever you want with it", because the communication is equally important, and so are the uncertainties in it, I think.
1:26:47 There is, for example, our system with the nine-point scale, where there are some likelihood-ratio bands behind the labels. It's not really that important, but it's also historical, and everyone is used to this kind of system. And of course DNA is much stronger as evidence: they much more often have a plus four, whereas in our cases we're almost never above a plus two, for example, and you have to express the kind of strength that you can actually get to.
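Verbal scales of this kind are typically just band lookups on the log10 LR. A hedged sketch: the band edges below are invented placeholders, not the actual bands behind the panelist's nine-point scale:

```python
import math

# Hypothetical band edges on the log10 LR; real scales differ between labs.
BAND_EDGES = [-4, -3, -2, -1, 0, 1, 2, 3, 4]

def nine_point_label(lr):
    """Map a likelihood ratio onto a nine-point scale by its log10 band."""
    loglr = math.log10(lr)
    band = sum(1 for edge in BAND_EDGES if loglr >= edge) - 5  # range -5..4
    band = max(-4, min(4, band))                               # clamp to -4..+4
    return f"{band:+d}" if band else "0"

print(nine_point_label(250.0))  # log10 = 2.4 -> "+2" under these placeholder bands
```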
1:27:13 In all of it, in a lot of parts of it, no matter whether you use an automatic system or base it on phonetic analysis, there is going to be some subjective part. I mean, even with the things that you produce with an automatic system: you know, you have chosen the data, so there is some subjectiveness to all of it. So I think it's really important to remember, and I think someone has written a really good article on this in the theory of science, on physics envy, because if you show all these numbers and all these graphs and nobody really understands them in court, I promise you...
1:27:44 ...the defence lawyer will say something like: "Okay, so you actually adjusted your system to the case then, Geoff, did you do that?" And then probably, in the end, at some point, when you've been in court for twelve hours in the chair, he forces you to say "I did that", and then he's going to say "Okay, so it's subjective", and then you're done.
1:28:05 So you have to really think about how you express things in court: try to stick to your opinion and what you based it on. But remember the physics envy, I think it's really important. Otherwise they see a number and a score and they will just go: "Oh, he's really smart, this guy", and that's it.
1:28:25 So, okay, good. Thank you so much, and I think we want a round of applause for all the panelists.