Good morning, ladies and gentlemen, welcome to the third day of the Odyssey Workshop.

Out of fifty-one papers, twenty-seven have been presented over the last two days, and we have another twenty-four to go, if I'm doing the calculation right.

Yesterday the papers were mainly on i-vectors, so we can say yesterday was the i-vector day. Today, except for one paper, there are two major sessions: one is language recognition evaluation, and then features for speaker recognition.

My name is Ambikairajah, I'm from the University of New South Wales in Sydney, Australia.

I have the pleasure of introducing to you our plenary speaker for today, Doctor Alvin Martin.

Alvin will speak about the NIST speaker recognition evaluation plan for two thousand twelve and

beyond.

He has coordinated the NIST series of evaluations since nineteen ninety-six in the areas of speaker recognition and language and dialect recognition. His evaluation work has involved the collection, selection and preprocessing of data, writing the evaluation plans, evaluating the results, coordinating the workshops, and many more tasks.

He served as a mathematician in the Multimodal Information Group at NIST from nineteen ninety-one to two thousand eleven.

Alvin holds a Ph.D. in mathematics from Yale University. Please join me in welcoming Doctor Alvin Martin.

Okay, thank you! Thank you for that introduction, and thank you for the invitation to give this talk. I'm here to talk about the speaker evaluations and, as you know, I have left NIST, though I remain associated with NIST for this workshop. However, I am here independently, so I'm responsible for everything I say and no one else is; the opinions are all my own. I don't think I'm subject to any restrictions, but I'll keep an eye on the clock.

Okay, I'll stay close to this outline of the topics I hope to cover. I'm going to talk about some early history, the things that preceded the evaluations, and then about the current series of evaluations and the things that happened during its early years, giving a kind of history of the evaluations and of past Odysseys. I should note my debt to Doug Reynolds, who gave a similar talk on these matters four years ago in Stellenbosch; I will update one of the slides that he presented there. I'm going to say some things from the point of view of an evaluation organiser about evaluation organisation, something about performance factors to look at, something about metrics, which we've already talked about at this workshop, and something about measuring progress over time. And when we talk about the future, I'll cover the SRE twelve evaluation process currently going on, which will take place at the end of this year, and then say a bit about what might happen after this year.

The early history: the things I would mention. One thing in the background of the speaker recognition evaluations is the success of the speech recognition evaluations back in the eighties and early nineties. NIST was very much involved in those, and they showed the benefits of independent evaluation on common data sets. I'll show a slide of that in a minute.

I will mention the collection of various early corpora that were appropriate for speaker recognition: TIMIT, KING and YOHO, but most especially Switchboard. It was a multi-purpose corpus collected around nineteen ninety-one, and one of the purposes they had in mind was speaker recognition, so it collected conversations from a large number of speakers such that you have multiple conversations for each speaker. Its success led to the later collection of Switchboard-2 and similar collections. In fact, in the aftermath of Switchboard, the Linguistic Data Consortium was created in nineteen ninety-two with the purpose of supporting further speech and also text collections in the United States. And then on to the first Odyssey, although it wasn't called Odyssey: it was Martigny in nineteen ninety-four, followed by several others. I will show pictures and make a few remarks on those. And there were early NIST evaluations. We date the current series of speaker evaluations from nineteen ninety-six, but there were evaluations in ninety-two and ninety-five. There was a DARPA program evaluation at several sites in ninety-two, and in ninety-five there was a preliminary evaluation that used Switchboard-1 data with six sites. But in these earlier evaluations the emphasis was rather on speaker identification, on closed-set rather than the open-set recognition that we've come to know in the series of evaluations.

So here's this favourite slide on speech recognition, the benchmark test history. The word error rate is on a logarithmic scale, starting from nineteen eighty-eight, and this shows the best system performance for various evaluations and various conditions in successive years, or the years when evaluations were held. The point, of course, is the big fall in error rates when multiple sites participated on common corpora; with roughly fixed conditions, we could see progress being evident, especially in the early series. This evaluation cycle of research, data collection, evaluation and showing progress gave inspiration to other evaluations, and in particular the speaker evaluations.

Okay, so now let's take a walk down memory lane. The first workshop of this series was Martigny in nineteen ninety-four. It was called the Workshop on Automatic Speaker Recognition, Identification and Verification, and it was the very first of this series. It was reasonably well attended, but not as well as this one. There were various presentations using many different corpora and many different performance measures, and it was very difficult to make meaningful comparisons. I present here one of the papers of interest from the NIST evaluation point of view: a paper on public databases for speaker recognition and verification was given there.

The other of the early ones was Avignon, nineteen ninety-eight. Speaker Recognition and its Commercial and Forensic Applications is what it was called, also known as RLA2C from the French title. One observation about the talks there is that TIMIT was a preferred corpus, but for many it was too clean, too easy a corpus; I remember Doug commenting that he didn't want to listen anymore to papers describing results on TIMIT. It was also characterized by sometimes bitter debate over forensics and how good a job forensic experts could do at speaker recognition. There were several NIST speaker evaluation related papers, actually three of them, that were combined into a paper in Speech Communication. Of the three presentations, perhaps the most memorable was the one by George Doddington, who told us all how to do speaker recognition evaluation. This was a talk that laid out various principles, and most of those principles have been kept and followed in our evaluation series; it included a discussion of the one golden rule of thirty.

Crete, two thousand one. In two thousand one it took the official name, A Speaker Odyssey: The Speaker Recognition Workshop; that was the first official Odyssey. It was characterized by more emphasis on evaluation: there was an evaluation track, which NIST was involved with. One of the presentations, the NIST presentation, which I think I gave, covered the history of the NIST evaluations up to that point, and I will actually show a slide from there later on. Another key presentation was one by several people from the Department of Defense: Phonetic, Idiolectal and Acoustic Speaker Recognition. These were ideas that were being pursued at the time and that were influencing the course of research at that point. George had a lot to do with that; he had the paper on idiolectal techniques as well.

Toledo in two thousand four, I think, was really where Odyssey came of age. It was well attended; I think it probably remains the most highly attended of the Odysseys. It was the first Odyssey at which we had the NIST SRE workshop held in conjunction at the same location; that was to be repeated in Puerto Rico in two thousand six and in Brno in two thousand ten. It was also the first Odyssey to include language recognition as well. It had two notable keynotes on forensic recognition, a topic debated earlier in Avignon; these were two excellent, well received talks. And since then, Odyssey has been established as a biennial event held every two years. There was also a presentation, which Mark Przybocki and I gave, called The Speaker Recognition Evaluation Chronicles; it was to be reprised about two years later in Puerto Rico. So, Odyssey has marched on.

Two thousand six was in Puerto Rico; I found, incredibly, a picture of it. Two thousand eight was Stellenbosch, hosted by Niko. Twenty ten, two years ago, we were in Brno; this is the logo, designed by Honza's children. And now we're here in Singapore, and I think before we finish this workshop we will hear about plans for Odyssey in twenty fourteen.

Okay! Let's move on to talk about organisation, to think about evaluation from the point of view of the organisation responsible for organising the evaluations. The questions are which task we are to do and what the key principles are; I'll note some of the milestones of the different evaluations and talk about participation.

So which speaker recognition problem? These are research evaluations, but what is the application environment in mind? Well, we know what we have done, but it wasn't necessarily obvious before we started. It could have been access control, the important commercial application; that might have formed the model. That raises the question of text-independent or text-dependent: for that kind of problem I think we would do text-dependent, and in access control the prior probability of a target tends to be high. There are forensic applications that could theoretically have been the model, or there is speaker spotting, which of course is the way we went. Inherently in speaker spotting the prior probability of a target is low, and it's text-independent.

Well, in ninety-six, and we'll look at the ninety-six evaluation plan, it was decided that the NIST evaluations would concentrate on speaker spotting, emphasising the low false alarm area of the performance curve. Some of the principles have been that speaker spotting is our primary task,

that we are research oriented, application inspired but aimed at research; NIST traditionally, with some exceptions, doesn't do product testing, you do the evaluations to advance the technology. We established the principle that we're going to pool across target speakers: people had to produce scores that work independently of the target speaker, rather than having a performance curve for every speaker and then just averaging performance curves. And we emphasized the low false alarm rate region. Both scores and decisions were required, and in that context, as Niko suggested and as George is going to talk about tomorrow, calibration matters; it is part of the problem to address.

Some basics. Our evaluations are open to all willing participants, to anyone who, you know, follows the rules: you get the data, run all the trials and come to the workshop. We are research oriented and have tried to discourage commercialised competition; we don't want people saying in advertisements that they did the best in the NIST evaluation. Each evaluation is governed by an evaluation plan that specifies all the rules and all the details of the evaluation; we'll look at one. Each evaluation is followed by a workshop. These workshops are limited to participants plus interested government organizations, and every site or team that participates is expected to be represented; at them we talk meaningfully about the evaluated systems. The evaluation datasets are subsequently published, made publicly available by the LDC. That remains the aim, and it remains the case: the SRE oh-eight data is currently available, so in particular, sites getting started in research are able to obtain it. Typically we'd like to have available publicly not the most recent eval but the next most recent eval, in this case that's oh-eight. Probably next year SRE ten will be made available, and hopefully LRE oh-nine, to mention a language eval, will soon become available.

Okay, we have this web page for the speaker evals, with a list of past speaker evals; for each year you can click and get the information on the evaluation for that year, starting in nineteen ninety-seven. For some reason the nineteen ninety-six evaluation plan had been lost, but I asked Craig to search for it and he found it, so I hope that will get put up.

So what went into the evaluation plan, the first evaluation plan of the current series? We said the emphasis would be on issues of handset variation and test segment duration, and the traditional goals were stated: to drive the technology forward, to measure the state of the art, and to find the most promising approaches. The task has been to detect a hypothesized speaker in a segment of conversational telephone speech; that's been expanded, of course, in recent years. Interestingly, and you may be surprised to see this, the research objective was: given an overall ten percent miss rate for target speakers, minimize the overall false alarm rate.

That is actually what we said in ninety-six. It is not what we emphasized in the years since, until this past year, when in the BEST evaluation it was made the official metric. Craig is going to talk about the BEST evaluation tomorrow, so in that sense we've come full circle.

But the plan also mentions that performance is expressed in terms of the detection cost function, and researchers then minimize the DCF. It also specified a research objective that I would naturally emphasize, though I don't think we've achieved it: uniform performance across all target speakers. There have been some investigations about classes of speakers, often attributed to Doddington, on different types of speakers and their different levels of difficulty.

So again, the task is: given a target speaker and a test segment of speech, decide whether the hypothesis that the segment comes from that speaker is true or false. We measured performance in two related ways: detection performance from the decisions, and detection performance characterized by an ROC. The word then was ROC.

Here is the DCF formula we're all familiar with. We have the parameters: the cost of a miss, which was originally set to ten, the cost of a false alarm, set to one, and the prior probability of a target, set to point zero one. In those days we also computed the DCF for a range of target priors, an idea we in a sense return to in the current evaluation.
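For reference, the detection cost function as it appears in the published SRE evaluation plans, with the parameter values just quoted, is essentially:

```latex
% Detection cost function with the early SRE parameter values:
% C_miss = 10, C_fa = 1, P_target = 0.01
\[
C_{\mathrm{det}} =
  C_{\mathrm{miss}} \cdot P_{\mathrm{miss}\mid\mathrm{target}} \cdot P_{\mathrm{target}}
  + C_{\mathrm{fa}} \cdot P_{\mathrm{fa}\mid\mathrm{nontarget}} \cdot (1 - P_{\mathrm{target}})
\]
```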

Here we said that our ROC will be constructed by pooling the decision scores; these scores will then be sorted and plotted on PROC plots. PROCs are ROCs plotted on normal probability plots. So that, in nineteen ninety-six, was the term for what we now all refer to as DET plots.

We talked about various conditions, results by duration and so on, and the task required explicit decisions. And the scores of multiple target speakers are pooled before plotting the PROCs, which requires score normalization across speakers. That was the key emphasis that was new in the ninety-six evaluation.
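For reference, a minimal sketch (not a NIST scoring tool; the helper name is invented here) of what pooling means in practice: all trial scores from all target speakers go into one list and a single global threshold is swept, which is why the scores must be comparable across speakers.

```python
import numpy as np

def pooled_error_rates(scores, labels):
    """Miss and false alarm rates at each threshold, after pooling the
    trial scores of all target speakers into a single list.

    scores: detection scores, one per trial (all speakers together)
    labels: booleans, True for target trials, False for non-target trials
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores)                 # highest scores first
    labels = labels[order]

    n_target = labels.sum()
    n_nontarget = labels.size - n_target

    # Lowering the threshold accepts trials one at a time from the top.
    false_alarms = np.cumsum(~labels) / n_nontarget
    misses = 1.0 - np.cumsum(labels) / n_target
    return misses, false_alarms
```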

PROC was the term previously; now we use the term DET curve, following the nineteen ninety-seven Eurospeech paper that introduced the term DET curve, for detection error tradeoff. I think George had a role in choosing that name; George was one person involved, and another, whom you may know, is Tom Crystal. They encouraged the use of this kind of curve, which linearizes the performance curves assuming normal score distributions.

And I was surprised to find that there's a Wikipedia page for DET plots; this is the page, showing the linearizing effect.
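The linearization mentioned here is just a change of axes: the miss and false alarm probabilities are mapped through the inverse of the standard normal CDF (the probit function) before plotting. A minimal sketch, reusing the hypothetical pooled_error_rates helper above:

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

def det_plot(misses, false_alarms):
    """Plot a DET curve: error probabilities on a normal deviate scale,
    which turns curves from normally distributed scores into straight lines."""
    eps = 1e-6  # keep probabilities away from 0 and 1, where the probit diverges
    x = norm.ppf(np.clip(false_alarms, eps, 1 - eps))
    y = norm.ppf(np.clip(misses, eps, 1 - eps))
    plt.plot(x, y)
    plt.xlabel("False alarm probability (normal deviate scale)")
    plt.ylabel("Miss probability (normal deviate scale)")
    plt.show()
```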

Okay, now let's talk about milestones. These are the ones I settled on; others may choose different ones. As I said, we had earlier evaluations in ninety-two and ninety-five, and the first in the series was in ninety-six.

Two thousand was the first year we had a language other than English: we used the AHUMADA Spanish data along with other data. Two thousand one, rather late given that we were in the United States, was the first evaluation with cellular phone data. In two thousand one we also started providing ASR transcripts, errorful transcripts. We had a kind of limited forensic evaluation using a small FBI database in two thousand two. Also in two thousand two there was the SuperSID workshop, one of the Johns Hopkins summer workshop projects; it followed the SRE and helped to advance the technology, as did other Baltimore workshops that followed up on speaker recognition, in which many people here participated. Two thousand five was the first with multiple languages and bilingual speakers in the eval, and also the first with microphone recordings of telephone calls, and therefore included some cross-channel trials. Interview data, as with the Mixer corpora, came in two thousand eight and was used again in two thousand ten. Two thousand ten involved the new DCF, the cost function stressing even lower false alarm rates; a little more about that later. Also, among the lots of things coming out in recent years, we have been collecting high and low vocal effort data and some data that looks at aging. Two thousand ten also featured HASR, the human assisted speaker recognition evaluation, a small test set that invited systems involving humans as well as automatic systems. Twenty eleven was BEST; we had a broad range of test conditions, including added noise and reverb, and Craig will be telling you about that tomorrow. Twenty twelve is going to involve target speakers defined beforehand.

Participation. Participation has grown from the beginning to the number of fifty-eight. These numbers are all a little fuzzy in terms of what's a site and what's a team, but these are the ones that Doug used a few years ago, and I've updated them: fifty-eight in twenty ten. Doug, at MIT, provided... I think we're not doing physical notebooks anymore, but when we did, he provided the cover pictures for the notebooks. One thing to note, for understandable reasons I guess, is the big increase in participation after two thousand one. And the point I should make is that handling scores of participating sites becomes a management problem: it's a lot more work doing an evaluation with fifty-eight participants than with one dozen participants. And it's not just handling the scores of the participants, that is, the trial scores of all these participants; it's handling scores of participants.

So this is one of Doug's cover slides, from two thousand four, showing the logos of all the sites, and in the centre a DET curve for the condition of primary interest, the common condition. And here are the systems from two thousand six. Thanks to Doug for those efforts.

So here it is, the graph. Ninety-two and ninety-five were outside the series and had a limited number of participants. Twenty eleven was the BEST evaluation, which was also limited to a very few participants. Otherwise you can see the trend, particularly the growth after two thousand one, up to the fifty-eight in twenty ten. For the twenty twelve evaluation, registration is open, it has been open over the summer; the last count I had was thirty-eight, and I expect that's going to grow.

So this is a slide from the two thousand one presentation at Odyssey that described the evaluations up to that point. In the center are the numbers of target speakers and trials: the first, the ninety-six evaluation on Switchboard-1, had forty speakers who each had really a lot of conversations, and one of the trends in the later evals was toward more speakers, up to eight hundred by two thousand. In each case we defined a primary condition, whether we based it on the number of handsets in training or whether we emphasized same versus different phone number trials; and we were looking at the issue of electret versus carbon button handsets, which was a big one in the days of landline phones. So this specifies the primary conditions and evaluation features for those early evaluations.

Here is an attempt, without putting in numbers, to update some of that for the evaluations after two thousand one. We ended up calling the primary condition a common condition, one that everyone had to do and that was the basis for the official charts before we evaluated all the other conditions. When we introduced different languages, the common condition involved English only, and particular kinds of handsets, and so on. And on the right you see some of the other features that came in anew.

Cellular data was added; multilingual data came in in two thousand five; in two thousand six we had some microphone tests; and then things only got more complicated in the most recent evaluations. In terms of common conditions, in two thousand eight we had eight common conditions, in two thousand ten we had nine, and two thousand twelve has five. So in oh-eight we contrasted English and bilingual speakers, and contrasted interview and conversational telephone speech. In two thousand ten we were contrasting telephone channels, interview and conversational speech, and high, low and normal vocal effort. In two thousand twelve we get interview tests without noise, with added noise, or repeated with added noise, and conversational telephone tests collected in a noisy environment. Two thousand eight and ten involved interviews collected over multiple microphone channels. Two thousand ten, of course, added high and low vocal effort, and aging with the Greybeard corpus; two thousand ten also introduced HASR. Two thousand twelve adds the target speakers specified in advance.

So, something about performance factors. I'll try not to say too much about this, but in terms of what we've looked at over the years, we've tried to look at demographic factors like sex: in general, with some exceptions, performance has been a bit better on male speakers than on female. Early on we looked at age, and George more recently has done a study of age in a recent evaluation; he may say something about that tomorrow. Education we haven't looked at too much. One very interesting thing in the early evaluations was to look at the mean pitch of people's test segments and training data: to split the non-target trials between those where the mean pitches were similar and those where they were not close, and, even more interesting, to look at the target trials where the mean pitch of the same person was or was not similar across the two recordings.

Speaking style: conversational telephone versus interview in particular; a lot of data has been collected on that. Vocal effort, more recently, with questions about how to define vocal effort and how to collect it. Aging, with the Greybeard corpus, has been limited; collecting such data over time is difficult. These are the intrinsic factors, the ones related to the speaker.

The other category, extrinsic factors, relates to the collection, by microphone or telephone channel. For the telephone channel there is landline, cellular, and VOIP, which is something to work on; in earlier times it was carbon versus electret telephone handset types. There are various microphones in the recent evaluations, with matched and mismatched microphones, the placement of the microphone relative to the speaker, and background noise and room reverberation; Craig will talk about that tomorrow when he covers BEST.

And finally, parametric factors: the duration of training and test, and also the number of training segments or sessions. Evaluations that had eight sessions of training for telephone speech showed that this can greatly improve performance. We carried along for many years ten seconds as the short duration condition, but there has also been an increase in duration, and especially in twenty twelve we're going to have lots of sessions and lots of duration in training; I think the emphasis now is perhaps more on seeing the effects of multiple sessions and more data. English, of course, has been the predominant language, but several of the evaluations included a variety of other languages, and one of the hopes is that performance will be as good in every language as in English. We have suspected that the reason overall performance has been better in English is the greater regularity and quantity of the data available in English. Cross-language trials are a separate challenge.

Okay, the metrics. I'll mention equal error rate: it is with us, it's part of our lives. I've tried to discourage it, but it is easy to understand, and in some ways it needs the least amount of data. But, you know, it doesn't deal with calibration issues, and basically the operating point of equal error rate is not the operating point of applications,

where the prior probability of a target may be high or may be low, but is not really equal. The decision cost function has been our mainstay, our bread and butter; we'll hear more about that. CLLR has been championed by Niko; we talked about it on Monday. And we've talked about just looking at the false alarm rate at a fixed miss rate, which we returned to in BEST. So, you all know the decision cost function: it's a sum weighted by the specified parameters.

First we normalize it by the cost of a system with no intelligence, one that simply always decides yes or always decides no, whichever is cheaper, so that such a trivial system scores one. The parameters mentioned in ninety-six were the parameters from ninety-six to two thousand eight. In twenty ten we changed them for the core and extended test conditions: the cost of a miss is one, the cost of a false alarm is one, and the target prior is point zero zero one.
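Written out, the normalization just described takes the form used in the published evaluation plans: the raw cost is divided by the cost of the better of the two trivial systems (always reject or always accept):

```latex
\[
C_{\mathrm{norm}} =
  \frac{C_{\mathrm{miss}}\, P_{\mathrm{miss}}\, P_{\mathrm{target}}
        + C_{\mathrm{fa}}\, P_{\mathrm{fa}}\, (1 - P_{\mathrm{target}})}
       {\min\!\bigl(C_{\mathrm{miss}}\, P_{\mathrm{target}},\;
                    C_{\mathrm{fa}}\, (1 - P_{\mathrm{target}})\bigr)}
\]
```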

That was a driving force toward the low false alarm region, and a lot of people were upset; there was scepticism about whether they could build systems for that operating point. I think the outcome has been relatively satisfactory; I think people feel that they developed good systems for it.

Niko talked about CLLR, and he noted that George suggested limiting CLLR to the low false alarm region; CLLR covers a broad range of operating points. The false alarm rate at a fixed miss rate, as we said, has its roots in ninety-six but is in use again in twenty twelve. It's practical for applications: it may be viewed as the cost of listening to false alarms. For some conditions where systems are really good, you can't measure much at a ten percent miss rate, and a one percent miss rate may be more appropriate.
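As a small illustration of that metric (again a sketch, not an official scoring tool), the false alarm rate at a fixed miss rate can be read off the pooled error rate arrays from the earlier sketch:

```python
def fa_at_fixed_miss(misses, false_alarms, target_miss=0.10):
    """False alarm rate at the first operating point where the miss rate
    drops to or below the chosen value (e.g. ten percent).
    Assumes misses is decreasing, as returned by pooled_error_rates."""
    for miss, fa in zip(misses, false_alarms):
        if miss <= target_miss:
            return fa
    return 1.0  # the miss rate never reaches the chosen value
```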

Measuring progress: how do we do that? It's always difficult to ensure test set comparability: if you're collecting data the same way as before, is it really an equal test? Well, we encourage participants in the evaluations to run their prior systems, both old and new systems, on the new data, which gives us some measure. But even more, there's been a problem with changing technology: in ninety-six landline phones predominated and we dealt with carbon and electret; now the world is largely cellular and we need to explore VOIP, the new channel. So the technology keeps changing, and with progress we keep making the test harder; we always want to add new evaluation conditions, new bells and whistles.

More channel types, more speaking styles, more languages; the size of the evaluation data grows. In two thousand eleven we explored externally added noise and reverb, and the noise will continue this year. So Doug attempted, in two thousand eight, to look at this, to follow roughly fixed conditions over the course of years and look at the best system.

And here is an updated version of his slide showing, for more or less fixed conditions, the logarithm of the DCF, I believe, and how things went. The numbers go up to two thousand six; with the added data on the right, two thousand eight showed some continued progress on various test conditions. Then in twenty ten we threw in the new measure, and that really messes things up: the numbers went up, but they're not directly comparable. This is the current version of our history slide tracking progress.

So let's turn to the future: SRE twelve. The target speakers, for the most part, are specified in advance. They are speakers from recent past evaluations, I think something on the order of two thousand potential target speakers. So sites can know about these targets: they have all the data and they can develop their systems to take advantage of that. All prior speech is available for training. There will be some new target speakers, with training data provided at evaluation time; that's one check on the effect of providing the targets in advance. The test segments will also include non-target speakers.

That is the big change for twenty twelve. Also, new interview speech will be provided, as was mentioned yesterday, in sixteen-bit linear PCM. Some of the test phone calls are going to be collected specifically in noisy environments, and moreover we're going to have artificial noise, added noise, as was done in BEST, on some test segments: another challenge for this community. But will this be an effectively easier task

because we define the targets in advance? It makes the task partially closed-set: in a trial you are allowed to know not only about the one target but about the two thousand other targets. Will that make a difference? We have open workshops where the participants debate these things, and last December this got debated: how much will this change the systems? Will it make the problem too easy?

We could have conditions where people are asked to assume that each test segment comes from one of the targets, making things fully closed-set, or to assume no information about targets other than the one in the actual trial. Clearly speaker identification has been done in the past, so if people do this, their results will provide a basis for comparison. This is what is to be investigated, to be seen, in SRE twelve. In terms of metrics, log-likelihood ratios are now required, and since we're doing that, no hard decisions are asked for.

In terms of the primary metric, we could just use the DCF of twenty ten, but Niko pointed out that you're not really required to calibrate your log-likelihood ratios if you're only using them at one operating point. So, to require calibration and stability, we're actually going to have two DCFs and take the average of them. Also, CLLR is an alternative, and there is the CLLR-M10 that Niko referred to, which limits CLLR to the trials in the high miss rate region.

So, the formula for the DCF: we have three parameters, but they work out to a single parameter beta. And the cost function is the simple average of DCF one and DCF two, with costs of one, where the target priors are either point zero zero one, as in twenty ten, or point zero one. That will be the official metric.
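A sketch of how that reads written out, in the single-parameter form the speaker alludes to, with both costs set to one; the two priors are the ones just mentioned:

```latex
% Single-parameter form of the normalized cost (valid when
% C_miss * P_target <= C_fa * (1 - P_target)), and the SRE12 primary
% metric as the average of the costs at the two target priors
\[
C_{\mathrm{norm}}(\beta) = P_{\mathrm{miss}} + \beta\, P_{\mathrm{fa}},
\qquad
\beta = \frac{C_{\mathrm{fa}}}{C_{\mathrm{miss}}}\cdot
        \frac{1 - P_{\mathrm{target}}}{P_{\mathrm{target}}}
\]
\[
C_{\mathrm{primary}} = \tfrac{1}{2}\bigl(
    C_{\mathrm{norm}}(\beta_1) + C_{\mathrm{norm}}(\beta_2)\bigr),
\qquad
\beta_1,\ \beta_2 \ \text{from}\ P_{\mathrm{target}} = 0.001,\ 0.01
\]
```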

And finally, what does the future hold? That, of course, none of us knows, but the outcome of twenty twelve will determine whether this idea of prespecified targets is an effective one that doesn't make the problem too easy; now we're going to see. Artificially added noise will be included this year, and added noise and reverb may be part of the future.

HASR will be repeated: HASR ten had two test sets, of fifteen trials or a hundred and fifty; HASR twelve will have twenty or two hundred. Anyone, those with forensic interests especially, but anyone interested in involving human-assisted systems, is invited to participate in HASR twelve; I would like to get more participation this year. The evaluations just keep getting bigger: fifty or more participating sites, and data volumes now getting up to terabytes.

It's not so much the test data we provide this year in twenty twelve, because most of the prior data is already out there, but the numbers of test segments are in the hundreds of thousands and the number of trials is going to be in the millions, tens of millions, even hundreds of millions for the optional full sets of trials. So you'll likely see the schedule moving to an every-three-years one, but the details really need to be worked out a lot more. I don't know, but I think that's where things are heading, and that's where I'll finish.

Discussion.

(Question from the audience, partly inaudible, about the test segments for a speaker and what systems may know about the speakers.)

I didn't say they follow a normal curve.

Right, so LDC has an agreement with the sponsors who support the LRE and SRE evaluations that we will keep the most recent evaluation set blind, hold it back from publication in the general LDC catalog, until the new data set has been created. So part of the timing of the publication of those eval sets is the requirement to have a new blind set, which is the current evaluation set. We can raise that issue with the sponsors and give them the feedback we are getting.

Right, so the SRE twelve eval set is just being finished now; as soon as that's finalised, SRE ten will be put into the queue for publication. It's a sort of rolling cycle.

You'll have to ask the sponsors about that; I can't speak to their motivation, only that there is a contractual obligation to delay the publication, as discussed.

Right, well, LDC is also balancing the needs of the consortium as a whole, and so we are staging publications in the catalog, balancing a number of factors; the speaker recognition and language recognition communities are among the communities that we support. I hear your concern; we can certainly raise this issue with the sponsors and see if there's anything we can provide. But at this point I think this is the strategy that we're following.