Thank you very much for watching this video presentation.

My name is …, and I am from ….

Today I will present our work on feature compensation for short-utterance spoken language identification.

I will organize this presentation as follows.

First, we introduce the short-utterance language identification task. Then I will describe the neural-network-based embedding technique, the x-vector extractor, and show how the x-vectors are used for the LID task. After that, the feature compensation learning will be introduced. Then I will show our experimental setup and results, and finally give the summary and conclusions.

Language identification techniques are typically used as a pre-processing stage in multilingual speech recognition and translation systems. For real-time speech processing systems, improving the performance on the short-utterance LID task is important, because it can help to reduce the real-time factor of the whole system.

One of the state-of-the-art methods is the i-vector based method, which has been shown to be very effective in a number of studies.

Recently, most researchers have moved to neural-network-based approaches, because LID is essentially a classification task, and therefore a neural network model can be directly used for classification. The x-vector embedding has shown good performance: it was initially proposed for the speaker verification task, and in recent studies it was also successfully applied to the LID task.

In this work, we focus on the x-vector based method.

The x-vector is a neural-network-based embedding representation. Note that it has been applied to many tasks, such as speaker recognition and, more recently, language identification.

The network for extracting x-vectors consists of three modules: a frame-level feature extractor, a statistics pooling layer, and utterance-level representation layers.

The frame-level feature extractor outputs frame-level representations of the utterance from a sequence of input acoustic features. For this module, a time-delay neural network (TDNN) or a convolutional neural network (CNN) is used.

Then, the statistics pooling layer converts the variable-length frame-level features into a fixed-dimensional vector by using the mean and the standard deviation.
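The pooling step just described can be sketched in a few lines of Python. This is a minimal NumPy illustration with made-up dimensions, not the actual implementation used in this work:

```python
import numpy as np

def statistics_pooling(frames: np.ndarray) -> np.ndarray:
    """Map variable-length frame-level features (T x D) to a fixed
    2*D-dimensional utterance-level vector by concatenating the
    per-dimension mean and standard deviation over the T frames."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])

# Utterances of different lengths map to vectors of the same size.
rng = np.random.default_rng(0)
short_utt = rng.standard_normal((50, 4))   # 50 frames, 4-dim features
long_utt = rng.standard_normal((300, 4))   # 300 frames, 4-dim features
v_short = statistics_pooling(short_utt)
v_long = statistics_pooling(long_utt)
```

This fixed output size is what allows the downstream fully connected layers to operate on utterances of any duration.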

Finally, fully connected layers are used to process the utterance-level representation, and a final softmax output layer gives the posterior probabilities of the classes.

The x-vector was mostly used for the speaker verification task. In the verification task, the x-vector extractor works as a front end that is used to extract fixed-dimensional embeddings, and a back end such as PLDA or cosine similarity is used to compare them.
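As an illustration of this front-end/back-end idea, the toy NumPy sketch below scores a test embedding against per-language average embeddings with cosine similarity. The languages, dimensions, and data here are made up for illustration; in a real system a PLDA back end could replace the scoring function:

```python
import numpy as np

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two fixed-dimensional embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy enrollment: average the embeddings of each language's utterances.
rng = np.random.default_rng(1)
centers = {"eng": rng.standard_normal(8), "cmn": rng.standard_normal(8)}
models = {
    lang: np.mean([c + 0.1 * rng.standard_normal(8) for _ in range(5)], axis=0)
    for lang, c in centers.items()
}

# A test embedding drawn near the "eng" center is scored against each model.
test_emb = centers["eng"] + 0.1 * rng.standard_normal(8)
scores = {lang: cosine_score(test_emb, m) for lang, m in models.items()}
predicted = max(scores, key=scores.get)
```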

For the LID task, the front-end/back-end approach can also be used. Compared with that, jointly trained logistic regression has become more widely used. Since LID is a closed-set classification task, we can also directly use the network outputs for classification.

This work focuses on the short-utterance LID task. As the testing utterances become shorter, the performance also decreases. This degradation is mainly because the limited phonetic content causes a large variation in the representations of short utterances.

To reduce the variation of short utterances, a normalization method using the corresponding long utterances was already investigated for i-vectors. Since the neural-network-based x-vector extractor plays a role similar to the i-vector extractor, we think that a similar idea can be applied to improve the accuracy of the x-vector network.

The feature compensation is done by reducing the distance between the representations of long- and short-duration inputs. Here, x_s is the representation of the short utterance, and x_l is the representation of the corresponding long utterance in the x-vector space. The objective can be written as minimizing the squared distance || x_l − f(x_s) ||², where f is the compensation mapping.

For training, we first train the x-vector network by using the long-duration inputs. Then the short inputs are used to train the compensation mapping function.
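To make this training step concrete, here is a small NumPy sketch. The paired embeddings are synthetic, and the mapping f is a hypothetical linear model standing in for the neural compensation network; it is fitted by gradient descent on the squared-distance loss described above:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimension

# Synthetic paired data: long-utterance embeddings x_l and noisy
# short-utterance embeddings x_s that should be mapped back onto x_l.
x_long = rng.standard_normal((200, D))
x_short = x_long + 0.5 * rng.standard_normal((200, D))

# Linear compensation mapping f(x_s) = x_s @ W.T + b, trained to
# minimize the mean of ||x_l - f(x_s)||^2 over the training pairs.
W = np.eye(D)
b = np.zeros(D)
lr = 0.05
for _ in range(300):
    err = (x_short @ W.T + b) - x_long          # f(x_s) - x_l
    W -= lr * err.T @ x_short / len(x_short)    # gradient of the MSE loss
    b -= lr * err.mean(axis=0)

mse_before = np.mean(np.sum((x_short - x_long) ** 2, axis=1))
mse_after = np.mean(np.sum((x_short @ W.T + b - x_long) ** 2, axis=1))
```

After training, the mapped short embeddings sit measurably closer to their long-utterance targets than the raw ones do.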

Considering the difference between the long and the short utterances, a short utterance contains only very limited phonetic information. Therefore, to improve the performance on short utterances, both the high-level language information and the local phonetic information are important. We suppose that the mean components of the x-vector capture the language information, while the variance components describe the information related to the local phonetic content.

Based on this consideration, we propose to normalize only the mean components of the x-vector toward the representation of the long utterance, while retaining the variance components in order to keep the frame-level phonetic information, which provides discriminative features for language identification.
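The mean-only compensation can be expressed as a loss that constrains only the first half of the pooled vector. This NumPy sketch assumes the embedding concatenates D mean components followed by D standard-deviation components, with toy data:

```python
import numpy as np

def compensation_loss(pred_short: np.ndarray, target_long: np.ndarray,
                      mean_only: bool = True) -> float:
    """Squared-distance compensation loss between the compensated short
    embedding and the long-utterance target. With mean_only=True, only
    the mean half is constrained; the std half (carrying frame-level
    phonetic information) is left free."""
    d = pred_short.shape[-1] // 2
    diff = pred_short - target_long
    if mean_only:
        diff = diff[..., :d]  # keep only the mean components
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

rng = np.random.default_rng(2)
pred = rng.standard_normal((10, 8))    # batch of compensated short x-vectors
target = rng.standard_normal((10, 8))  # corresponding long-utterance targets
loss_full = compensation_loss(pred, target, mean_only=False)
loss_mean = compensation_loss(pred, target, mean_only=True)
```

Because the std components contribute nothing to the mean-only loss, gradient-based training exerts no pull on them, which is exactly the intended behavior.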

The loss of the proposed method is computed on the mean components only, with the representation of the long utterance obtained by the neural network used as the compensation target. In proposed method 1, we use the x-vector network to supply the representation, and in proposed method 2 we use a ResNet with global average pooling to obtain the representation.

We evaluated the proposed method on the NIST Language Recognition Evaluation 2017 (LRE17) set. As the training data, we used the data provided for LRE17, including the development data and the telephone data.

For the test set, we used the standard NIST evaluation set. In addition to the original test utterances, we also prepared shortened test segments (for example, 1.5 seconds) to evaluate the short-utterance condition.

We used 60-dimensional acoustic features, and the average detection cost Cavg was used as the evaluation metric.

For the baseline systems, we configured a ResNet system and an x-vector system. The ResNet system uses a full ResNet as the network, with global average pooling followed by fully connected layers for the output. The x-vector system uses a TDNN as the frame-level feature extractor.

For the training examples, the long examples have durations between five and ten seconds, while the short utterances are cut down to around two seconds.

This slide shows the results of the baseline systems, evaluated with both long and short test utterances.

As can be seen, the x-vector system is more effective on long-duration utterances, while on short utterances the ResNet system gives the better performance. Moreover, because of the duration mismatch, the models trained with long samples performed worse on the short test data than the models trained with short samples.

This table shows the results with the feature compensation methods. Here, the baseline is the x-vector network trained with the short examples. The results of the mean-and-variance compensation learning and the two proposed methods are listed in the table. For the evaluation, we compare the baseline, the mean-and-variance compensation, and the proposed methods.

From the results, we can see that the compensation using both the mean and the variance could improve the performance, but not on all utterance durations, yielding only limited gains. In contrast, the compensation using the mean only significantly improved the performance and achieved the best results.

To conclude: in this work, we investigated an improvement of the neural-network-based embedding technique, the x-vector, for the short-utterance LID task. We compared the feature compensation using both the mean and the variance with the proposed compensation using the mean only. The mean-only compensation is expected to capture the high-level, abstract language information, while leaving the variance components free, because they preserve the frame-level phonetic information. The results show that the proposed method is more effective for the short-utterance LID task.

That's all. Thank you for your attention.