0:00:13thank you
0:00:14so i make a bit like and uh
0:00:16and i will present our paper called converting 5.1 audio recordings to B-format for directional
0:00:22audio coding reproduction
0:00:25right long a let's score
0:00:27so here is the outline of the presentation so i would
0:00:31like to begin with an introduction to the topic
0:00:34and then give you some background information about this and then the actual topic
0:00:39and i will end with some conclusions
0:00:43so yeah
0:00:44several parametric spatial audio coding methods
0:00:47for example MPEG Surround or parametric stereo
0:00:51exist
0:00:52but
0:00:53and they are really good but they assume a certain kind of reproduction system for example stereo
0:00:57or 5.1
0:00:59so it would be perhaps nice if we could have some kind of
0:01:03audio format that is independent of the
0:01:05system and it would also reproduce the sound using any
0:01:09reproduction system for example any loudspeaker layout or for example headphones
0:01:16so what we have been developing in our university is this thing called directional audio coding
0:01:21or DirAC
0:01:23and
0:01:24this thing uses B-format as input
0:01:27and you can use any loudspeaker layout
0:01:30for reproduction and also headphones
0:01:32so we think that it would be perhaps a suitable candidate for this kind of generic
0:01:38audio format
0:01:40and in this study
0:01:41i will
0:01:42think about
0:01:43formats like for example 5.1 which is really common nowadays in
0:01:47multichannel reproduction so i was
0:01:49uh
0:01:50studying and have been studying
0:01:53whether we could convert these 5.1 signals to B-format
0:01:57and reproduce them with directional audio coding without any artifacts
0:02:03and then
0:02:03in this use case for example
0:02:06we try to
0:02:07take 5.1 audio and the standard reproduction of that sound
0:02:12and get the same thing
0:02:13through this conversion and the directional audio coding reproduction
0:02:20so
0:02:21first i will tell you something about this
0:02:23DirAC technology so the basic idea is
0:02:27that we don't try to reproduce the sound field physically correctly
0:02:31instead we try to reproduce it in a way that it would be perceptually correct so how we perceive the
0:02:37sound would be the same
0:02:38as in the original space
0:02:40and the
0:02:41possible use cases for this technology would be for example teleconferencing
0:02:46and also spatial sound reproduction
0:02:50and as i said
0:02:51the input to this
0:02:53uh
0:02:53processing is a B-format
0:02:55stream
0:02:56so it contains the omnidirectional signal W
0:03:00and the dipoles X Y and Z
0:03:02which are
0:03:03directed along the
0:03:05Cartesian coordinate axes
0:03:10so what we
0:03:12do is that we process the sound in the time-frequency domain
0:03:16and why we do this is that we try to mimic the temporal and also the frequency resolution of
0:03:22hearing
0:03:23so we try to work with the same resolution that our
0:03:27hearing system works with
0:03:30and
0:03:31next let's think about what we should analyze from the sound
0:03:35so we do the analysis for each time-frequency tile to get
0:03:39things with a
0:03:41uh
0:03:42matching resolution
0:03:44and uh
0:03:45the first thing we do is we analyze the direction of the sound so simply is it coming from
0:03:49left right up down
0:03:51and so on
0:03:52and then we have another parameter which is called diffuseness
0:03:57which
0:03:59means
0:04:00is the sound coming only from a certain direction
0:04:04or is it completely diffuse coming from all directions like for example reverberation would be
0:04:09or is it something in between
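The two analysis parameters described above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: it assumes horizontal-only B-format in which W carries the conventional 1/sqrt(2) gain, uses the commonly cited intensity-and-energy form of the DirAC diffuseness estimate, and the function name `analyze_tile` is hypothetical.

```python
import math

def analyze_tile(w, x, y):
    """Estimate azimuth (degrees) and diffuseness for one time-frequency tile.

    w, x, y are equal-length lists of complex STFT coefficients from a short
    run of frames in one frequency band. Assumption: W has a 1/sqrt(2) gain
    relative to the dipoles X and Y.
    """
    n = len(w)
    # Short-time average of Re{conj(W) * dipole}. With this encoding
    # convention the vector points towards the source.
    ix = sum((wi.conjugate() * xi).real for wi, xi in zip(w, x)) / n
    iy = sum((wi.conjugate() * yi).real for wi, yi in zip(w, y)) / n
    # Short-time average of the energy estimate.
    energy = sum(abs(wi) ** 2 + 0.5 * (abs(xi) ** 2 + abs(yi) ** 2)
                 for wi, xi, yi in zip(w, x, y)) / n
    azimuth = math.degrees(math.atan2(iy, ix))
    if energy == 0.0:
        return azimuth, 0.0  # silent tile: no meaningful estimate
    diffuseness = 1.0 - math.sqrt(2.0) * math.hypot(ix, iy) / energy
    # Clamp to [0, 1] to absorb rounding error.
    return azimuth, min(1.0, max(0.0, diffuseness))
```

For a single plane wave the averaged vector is as long as the energy allows, so the diffuseness comes out near zero; when contributions from opposite directions cancel in the average, it approaches one.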
0:04:14and here is the synthesis part of the processing and the whole block diagram you can see here
0:04:20so
0:04:22as you can see here
0:04:23we start with this time-frequency decomposition for example using a short-time Fourier transform
0:04:29and then we do the processing separately for each frequency band
0:04:33so we get the diffuseness and direction analysis
0:04:36results from here
0:04:38and what we do in the synthesis is that we create these kind of virtual microphones of
0:04:41course which can be for example
0:04:44uh cardioids pointing towards the loudspeakers so this is really similar to first-order Ambisonics at this
0:04:50stage
0:04:51but the trick that we do is that we divide the sound into two different
0:04:55streams the non-diffuse stream
0:04:57and the diffuse stream
0:05:00and we reproduce them in different ways
0:05:02and for the non-diffuse stream we assume that it
0:05:05basically consists of the direct sound so we try to
0:05:08reproduce it as a point-like source using for example amplitude panning or in the 3-D case
0:05:13vector base
0:05:14amplitude panning
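As an illustration of the amplitude panning mentioned for the non-diffuse stream, here is a minimal two-dimensional vector base amplitude panning sketch for one loudspeaker pair. It follows the textbook formulation (solve for the gains that reconstruct the source direction from the two loudspeaker direction vectors, then normalize the power). The function name `vbap_2d` is hypothetical, and choosing which pair encloses the source is left out.

```python
import math

def vbap_2d(source_deg, first_deg, second_deg):
    """Return power-normalized gains (g1, g2) for a loudspeaker pair.

    Solves p = g1 * l1 + g2 * l2, where p, l1 and l2 are unit vectors
    towards the source and the two loudspeakers.
    """
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    l1x, l1y = math.cos(math.radians(first_deg)), math.sin(math.radians(first_deg))
    l2x, l2y = math.cos(math.radians(second_deg)), math.sin(math.radians(second_deg))
    det = l1x * l2y - l2x * l1y  # zero if the loudspeakers are collinear
    g1 = (px * l2y - l2x * py) / det  # Cramer's rule
    g2 = (l1x * py - px * l1y) / det
    norm = math.hypot(g1, g2)  # keep g1^2 + g2^2 = 1
    return g1 / norm, g2 / norm
```

A source exactly at one loudspeaker gets all the gain there, and a source halfway between the pair gets equal gains. A negative gain indicates that the source lies outside the arc spanned by the pair.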
0:05:16and
0:05:17for the diffuse stream we assume that it's surrounding so for example reverberation so what we want
0:05:22to get is a kind of
0:05:24enveloping perception so what we use is for example decorrelation to get it
0:05:29surrounding
0:05:30and in the end we sum these two streams and also all the frequency bands and we get the
0:05:34uh
0:05:35loudspeaker signals
0:05:39and
0:05:40using this method which has been studied for a couple of years you can get
0:05:44quite nice audio quality for real recordings with a B-format microphone
0:05:50so let's go next to the actual topic what we were studying in this
0:05:56paper
0:05:57so how to really convert these kind of
0:06:005.1 audio recordings to B-format so at first it seems really simple
0:06:06what we simply can do is to perform this kind of virtual recording
0:06:10so we position the loudspeakers according to the ITU standard and then we just assume that we are in an
0:06:17anechoic space
0:06:18and we perform a kind of virtual recording with the B-format microphone
0:06:22it's really simple we can use simple cosine and sine functions for this
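The cosine and sine encoding just described could look roughly like the sketch below. The azimuths are the nominal ITU-R BS.775 positions (surrounds taken at 110 degrees), W is given the conventional 1/sqrt(2) gain, and the LFE channel is simply left out. All three choices, and the name `encode_b_format`, are assumptions of this sketch and are not stated in the talk.

```python
import math

# Nominal azimuths in degrees, positive to the left (assumed values).
ITU_5_1_AZIMUTHS = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}

def encode_b_format(channels, azimuths=ITU_5_1_AZIMUTHS):
    """Virtual anechoic B-format recording of loudspeaker signals.

    channels maps a channel name to a list of samples of equal length.
    Returns (W, X, Y); Z is zero for a horizontal layout and is omitted.
    """
    length = len(next(iter(channels.values())))
    w = [0.0] * length
    x = [0.0] * length
    y = [0.0] * length
    for name, samples in channels.items():
        theta = math.radians(azimuths[name])
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for n, s in enumerate(samples):
            w[n] += s / math.sqrt(2.0)  # omnidirectional signal
            x[n] += s * cos_t           # front-back dipole
            y[n] += s * sin_t           # left-right dipole
    return w, x, y
```

Passing a different `azimuths` mapping gives the same virtual recording for any other loudspeaker placement.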
0:06:28but
0:06:29however there were some problems that we found
0:06:32so if you think about how
0:06:34humans
0:06:35perceive this kind of reproduction if you have some
0:06:37diffuse or reverberant sound in the 5.1 case
0:06:41we perceive it as diffuse and surrounding
0:06:44but if you think about how
0:06:46the DirAC
0:06:47analysis
0:06:48works in this case
0:06:50as i said before
0:06:52the diffuseness is one when we have
0:06:54sound coming evenly from all directions and also this is
0:06:58i think
0:07:00also true if you think about what a diffuse field is
0:07:03but in this case we have most of the loudspeakers in the front
0:07:07so we don't have equal energy coming from all directions and as
0:07:11a result we get a too low diffuseness value
0:07:15and what this causes is that we get a bias of the sound towards the center
0:07:20loudspeaker
0:07:21and it's not perceived to be completely diffuse
0:07:24and surrounding the listener
0:07:26so this is something of course that we don't want
0:07:30then we got the second idea how to fix this problem which is that we position the loudspeakers evenly around the
0:07:37B-format microphone and
0:07:39this
0:07:39was
0:07:40working quite nicely in that now you get the diffuse sound coming from all directions evenly
0:07:46and you get the correct diffuseness as well
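A small worked example of why the layout matters for the analysed diffuseness. It assumes every loudspeaker plays uncorrelated noise of equal power, a virtual anechoic recording with W scaled by 1/sqrt(2), and the commonly cited intensity-and-energy form of the diffuseness estimate. Under those assumptions the cross terms average out and the estimate reduces to one minus the length of the mean loudspeaker direction vector. The 72 degree spacing is simply what "evenly around" implies for five channels; it and the 110 degree surround angle are assumed values.

```python
import math

def diffuse_field_diffuseness(azimuths_deg):
    """Diffuseness expected when all loudspeakers play uncorrelated noise
    of equal power (virtual anechoic B-format recording assumed)."""
    n = len(azimuths_deg)
    sum_x = sum(math.cos(math.radians(a)) for a in azimuths_deg)
    sum_y = sum(math.sin(math.radians(a)) for a in azimuths_deg)
    # For uncorrelated signals the averaged intensity is proportional to
    # the plain vector sum of the loudspeaker directions.
    return 1.0 - math.hypot(sum_x, sum_y) / n

standard = [0.0, 30.0, -30.0, 110.0, -110.0]  # nominal ITU placement
even = [0.0, 72.0, -72.0, 144.0, -144.0]      # evenly spaced placement

print(round(diffuse_field_diffuseness(standard), 2))  # 0.59
print(round(diffuse_field_diffuseness(even), 2))      # 1.0
```

With the standard placement the direction vectors do not cancel because most loudspeakers are in front, so fully diffuse material is analysed as only partly diffuse. With the even placement they cancel and the estimate reaches one.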
0:07:50and also you get quite nice audio
0:07:52quality but the problem is of course that
0:07:55now the loudspeakers are
0:07:57in the wrong places because if you think about it you should have a
0:08:01sound source at for example minus thirty degrees but now we have that sound source here
0:08:06so we get a kind of bias in the reproduction
0:08:09which we of course can
0:08:11compensate for on the reproduction side if we assume that we have loudspeakers at those
0:08:15directions
0:08:17but we might have also real B-format recordings
0:08:20which we might perhaps want to mix with these 5.1 recordings and in this case you
0:08:25would have wrong directions in one or the other
0:08:28case
0:08:29so we need something more
0:08:32elaborate to get this
0:08:33to function all of the time
0:08:36and then we have a
0:08:37i will present this
0:08:38method that we propose
0:08:41so what we do is that for the directional part of the
0:08:44sound for example
0:08:46sources or the direct sound of the
0:08:49uh
0:08:50sound
0:08:51we use the standard layout so for that part we get the directions right and that is
0:08:56of course important in that case
0:08:58and if we have some reverberant sound then we use the even layout
0:09:02to get the diffuseness correct
0:09:05and this is really
0:09:06nice in a way because both the directions and the diffuseness are correct when they are the most
0:09:11important
0:09:13so let's look at the block diagram
0:09:15so
0:09:16we do the analysis
0:09:18and
0:09:19processing in the frequency domain so we start with this time-frequency decomposition
0:09:24and what we do is we compute these
0:09:27virtual B-format signals using the even layout and the standard loudspeaker layout
0:09:35and what we do is we simply mix these two streams according to the diffuseness
0:09:40of the sound
0:09:41of the sound and we do that
0:09:43analysis using the even loudspeaker layout
0:09:46and we can use a
0:09:48cross-fade between these two streams
0:09:52to get the mixed
0:09:53the
0:09:55B-format stream
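The cross-fade between the two virtual B-format streams might be sketched like this for one time-frequency tile. The linear weights are an assumption made for illustration; the talk does not give the actual cross-fade law, and the function name `crossfade_b_format` is hypothetical.

```python
def crossfade_b_format(standard, even, diffuseness):
    """Mix two B-format coefficient tuples for one time-frequency tile.

    standard and even are (W, X, Y) tuples computed with the standard and
    the even loudspeaker layout. diffuseness is the value analysed from the
    even-layout stream; it is clamped to [0, 1] before use.
    """
    psi = min(1.0, max(0.0, diffuseness))
    # Assumed linear cross-fade: directional tiles follow the standard
    # layout, diffuse tiles follow the even layout.
    return tuple((1.0 - psi) * s + psi * e for s, e in zip(standard, even))
```

With a diffuseness of zero the tile is taken entirely from the standard-layout stream, which keeps source directions right. With a diffuseness of one it is taken entirely from the even-layout stream, which keeps the diffuse sound evenly surrounding.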
0:09:57and we have seen that this would be quite close to
0:09:59what it should be then
0:10:00next what we do is we analyze the diffuseness again
0:10:04to see if this is correct now
0:10:07and we
0:10:07see from here that
0:10:10if we think about this
0:10:12line which is the diffuseness computed with the
0:10:15even loudspeaker layout we get
0:10:17correct diffuseness
0:10:18whereas with the standard layout we would get
0:10:21too low diffuseness
0:10:22so we can see that if we mix
0:10:24these two
0:10:26streams
0:10:26we get
0:10:28correct diffuseness if it's really high or really low but in between we get a kind
0:10:32of a little bit of underestimation
0:10:35so we need to modify it a bit
0:10:39so let's look at this diffuseness equation once more to see
0:10:42how it could be modified to get a correct diffuseness
0:10:46so we can see here that this is the
0:10:48pressure and this is the particle velocity
0:10:51so we can see that if these are quite equal
0:10:55the whole term here is
0:10:56quite high and you get that the diffuseness is really low
0:11:00and so
0:11:01if the
0:11:02particle velocity
0:11:04well basically the dipole signals of the B-format are lower we get a lower value here
0:11:09and on the contrary if these two
0:11:12are
0:11:14really
0:11:14different we can get a
0:11:16really low uh
0:11:17value for this term so we get high diffuseness
0:11:21so we can simply modify the diffuseness by
0:11:24adjusting the ratio between the omnidirectional signal and the dipole signals of the B-format
0:11:29how this is actually done
0:11:31you need a bunch of equations which i will not give you here but you can read them from the paper
0:11:35if you are interested
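The effect of changing the ratio between the omnidirectional and dipole signals can be illustrated with a short sketch. It is not the paper's correction rule, which is not given in the talk. It only shows, under the commonly cited diffuseness estimate with W scaled by 1/sqrt(2), that attenuating the dipoles raises the analysed diffuseness. The names `diffuseness` and `scale_dipoles` are hypothetical.

```python
import math

def diffuseness(w, x, y):
    """Diffuseness estimate from complex B-format coefficients of one tile."""
    n = len(w)
    ix = sum((a.conjugate() * b).real for a, b in zip(w, x)) / n
    iy = sum((a.conjugate() * b).real for a, b in zip(w, y)) / n
    energy = sum(abs(a) ** 2 + 0.5 * (abs(b) ** 2 + abs(c) ** 2)
                 for a, b, c in zip(w, x, y)) / n
    return 1.0 - math.sqrt(2.0) * math.hypot(ix, iy) / energy

def scale_dipoles(x, y, gain):
    """Apply one real gain to both dipole signals, leaving W untouched."""
    return [gain * v for v in x], [gain * v for v in y]
```

For a single plane wave with dipole gain g the estimate works out to 1 - 2g / (1 + g^2), so g = 1 gives zero and g = 0.5 gives 0.2. This is the sense in which adjusting the ratio lets the diffuseness be moved up.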
0:11:38so let's go back to this
0:11:39block diagram so we can see that we analyze the diffuseness once more and we get
0:11:44this kind of
0:11:45uh coefficients for adjusting the
0:11:47diffuseness
0:11:49and in the next stage we simply
0:11:51multiply with them the omnidirectional signal and the dipole
0:11:55signals so we get the final
0:11:58B-format stream
0:12:01so we can use it with DirAC to reproduce the sound and we also
0:12:06can mix it with other real B-format recordings for example
0:12:11yeah we didn't do
0:12:12formal listening tests yet but at least according to informal tests we didn't
0:12:17we didn't find it to cause any degradation of the audio quality
0:12:25so
0:12:25as a conclusion
0:12:27directional audio coding or DirAC
0:12:29is a perceptually motivated method to reproduce spatial sound
0:12:34and it has been used before mainly with B-format microphones as input
0:12:39and so in this paper the idea was that could we use also for example 5.1 as an
0:12:44input
0:12:45to this
0:12:46uh reproduction method
0:12:48in order to have for example this kind of generic audio format
0:12:51and we
0:12:53showed first two simple matrixing-based methods
0:12:57but they had some problems both of them
0:13:00and in the proposed method
0:13:02we analyse the virtual sound field generated by the 5.1 audio
0:13:07and we do the analysis in the time-frequency domain
0:13:11uh
0:13:13and by using this we were able to do this kind of conversion
0:13:17and we did not find it to
0:13:19uh degrade the audio quality
0:13:22and also we
0:13:23can uh
0:13:24mix the resulting B-format signals with other B-format
0:13:28signals
0:13:30yeah some acknowledgements and
0:13:32i would like to thank you
0:13:41yeah
0:13:46i
0:13:55i
0:14:00hmmm
0:14:00i
0:14:01i
0:14:05i
0:14:06more
0:14:08yeah
0:14:08yeah
0:14:09i
0:14:11this
0:14:12in physical terms they are not at all the same but perceptually they are the same at least according to
0:14:17our informal tests so if you look at the B-format it's completely different but it sounds the
0:14:21same and that's the idea that it sounds the same
0:14:26uh_huh hmmm
0:14:28oh
0:14:30well what we want to do with this is that for example we could also reproduce using
0:14:34for example 7.1 or any
0:14:36loudspeaker layout or also headphones so that's why we want to get also the B-format
0:14:41uh from these 5.1 recordings
0:14:48right
0:14:49i
0:14:51oh
0:14:53oh
0:14:54no that
0:14:55directional audio coding and
0:14:57also
0:14:58and also the reverse and labeled
0:15:00on a a bit twice
0:15:02same so is it's only for sets choice say
0:15:10i
0:15:13well
0:15:14oh
0:15:14i
0:15:16oh
0:15:17is
0:15:19yeah i
0:15:20one
0:15:21i
0:15:22wow
0:15:23oh
0:15:24i
0:15:24you
0:15:25uh sweet spot of sweet spot
0:15:27is used for
0:15:29no i i
0:15:30it's of a goalie this it does make a big difference at least we didn't for a of any brought
0:15:35class but of course if you go away from the of three spot you are more easily would you to
0:15:39get a a a plus but we mainly due the since text
0:15:42this
0:15:42the sweet
0:15:44but there was really noise
0:15:51uh_huh uh_huh uh_huh
0:15:55hi
0:15:56i
0:15:57okay
0:15:59and
0:16:00this
0:16:01oh
0:16:02i
0:16:04i
0:16:04hmmm
0:16:06no uh we really did i don't
0:16:09see
0:16:10of course we could
0:16:11because we don't wanna at any room at that basically because we want go present but
0:16:16yeah of course
0:16:18yeah
0:16:19i don't know how do we really
0:16:20get it in it but yeah
0:16:27i
0:16:37uh_huh
0:16:40i did it in
0:16:41room which is a a point to this decide to use that that so it's list
0:16:45what there
0:16:46white nice the i we need some really brief test also in nine a will and that we didn't find
0:16:51a but was also that but
0:16:53the main main part of the test was done in really we
0:17:00okay
0:17:02know
0:17:03i
0:17:04yeah