0:00:16Thank you very much.
0:00:17As you can also see, I'm not the first author of this paper,
0:00:22but unfortunately the first author cannot be here
0:00:26today, because his family grew by a second daughter two weeks ago.
0:00:33I'm working at the International Audio Laboratories Erlangen, which is a
0:00:37joint institution
0:00:39of the University of Erlangen-Nuremberg
0:00:42and the Fraunhofer Institute for Integrated Circuits.
0:00:48The motivation for the work I'm going to present here
0:00:51is that in music production you often rely on
0:00:55mixing pre-recorded material, samples,
0:00:59and you also frequently need to adapt these samples to a
0:01:04different musical context than
0:01:07the context they were recorded in.
0:01:09So in some cases you might need a key mode conversion;
0:01:13this means major to minor or vice versa.
0:01:16An algorithm enabling this task
0:01:22has been presented
0:01:23at previous conferences;
0:01:26it is called MODVOC, the modulation vocoder.
0:01:30It is somewhat suited to this task,
0:01:33but we also found out that there were
0:01:36special enhancements necessary
0:01:38in order to address the
0:01:40special requirements of this application.
0:01:46So I first want to give a short overview of the MODVOC analysis,
0:01:52which is single-pass and works
0:01:55in block-wise processing,
0:01:57as shown in the block diagram here.
0:02:01It first does a
0:02:02signal-adaptive band-pass filtering
0:02:05which is aligned with spectral centres of gravity.
0:02:09This means we first do a DFT analysis,
0:02:17and from the DFT spectra
0:02:19the centres of gravity
0:02:22in perceptually adapted bands are
0:02:25determined. The bands are adjusted to the signal, so this decomposition is flexible.
0:02:31From these centre frequencies,
0:02:34and around these centre frequencies, we construct band-pass filters.
0:02:39This is done in the frequency domain,
0:02:41and with an inverse DFT
0:02:45we get back, for each band-pass signal,
0:02:47a time-domain signal.
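The band decomposition described here can be sketched as follows. This is only an illustration under stated assumptions, not the MODVOC implementation: the band edges are fixed by hand instead of being adapted perceptually, the filter is a plain rectangular mask on the DFT bins, and all names (`centre_of_gravity`, `bandpass_via_dft`, the test block) are hypothetical.

```python
import numpy as np

def centre_of_gravity(power, freqs, lo, hi):
    """Power-weighted mean frequency of the DFT bins in [lo, hi)."""
    sel = (freqs >= lo) & (freqs < hi)
    return float(np.sum(freqs[sel] * power[sel]) / np.sum(power[sel]))

def bandpass_via_dft(block, fs, lo, hi):
    """Zero all DFT bins outside [lo, hi) and go back to the time domain."""
    spectrum = np.fft.rfft(block)
    freqs = np.fft.rfftfreq(len(block), d=1.0 / fs)
    mask = (freqs >= lo) & (freqs < hi)  # assumed rectangular filter shape
    return np.fft.irfft(spectrum * mask, n=len(block))

fs = 8000.0
n = 1024
t = np.arange(n) / fs
# Hypothetical test block: two partials placed exactly on DFT bins.
f1, f2 = 64 * fs / n, 192 * fs / n  # 500 Hz and 1500 Hz
block = np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)

freqs = np.fft.rfftfreq(n, d=1.0 / fs)
power = np.abs(np.fft.rfft(block)) ** 2
cog = centre_of_gravity(power, freqs, 300.0, 700.0)  # carrier estimate for this band
band = bandpass_via_dft(block, fs, 300.0, 700.0)     # time-domain band-pass signal
```

With the partial sitting exactly on a bin, the centre of gravity of the 300 to 700 Hz band coincides with the partial frequency, and the band-pass signal contains only that partial.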
0:02:49This time-domain
0:02:52band-pass signal
0:02:53is then analysed into AM and FM.
0:02:57So basically you have the carrier frequency, which corresponds to the centre of gravity of this spectral frequency region,
0:03:05the FM signal, which gives the
0:03:08instantaneous frequency offset
0:03:11relative to this carrier frequency,
0:03:14and you get the instantaneous magnitude, the amplitude, in the AM component.
0:03:20And then you can process the signal in this modulation domain.
0:03:24For example, you can change the carrier frequencies
0:03:27and still maintain the fine temporal structure
0:03:31by keeping the AM and the FM.
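A minimal sketch of such an AM/FM analysis and a carrier change, assuming the decomposition is done through the analytic signal of each band-pass signal. The talk does not say how the AM and FM are actually extracted, so this choice, the function names, and the test signal are all assumptions.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal computed via the DFT (even-length input assumed)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    weights[n // 2] = 1.0
    weights[1:n // 2] = 2.0  # keep positive frequencies, drop negative ones
    return np.fft.ifft(np.fft.fft(x) * weights)

def am_fm_analysis(band, fs, carrier_hz):
    """Return AM (envelope), FM (offset from the carrier in Hz) and start phase."""
    z = analytic_signal(band)
    am = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)
    return am, inst_freq - carrier_hz, phase[0]

def am_fm_synthesis(am, fm, fs, carrier_hz, phase0=0.0):
    """Rebuild a band from AM and FM around a (possibly changed) carrier."""
    inst_freq = carrier_hz + fm
    phase = phase0 + 2 * np.pi * np.concatenate(([0.0], np.cumsum(inst_freq))) / fs
    return am * np.cos(phase)

fs, n = 8000.0, 1024
t = np.arange(n) / fs
# Hypothetical band: a 500 Hz carrier with a slow amplitude modulation.
envelope = 1.0 + 0.3 * np.cos(2 * np.pi * (8 * fs / n) * t)
band = envelope * np.cos(2 * np.pi * 500.0 * t)

am, fm, phase0 = am_fm_analysis(band, fs, carrier_hz=500.0)
same = am_fm_synthesis(am, fm, fs, carrier_hz=500.0, phase0=phase0)
shifted = am_fm_synthesis(am, fm, fs, carrier_hz=625.0, phase0=phase0)
```

Here `same` reproduces the input band, while `shifted` carries the identical AM and FM on a new carrier, which is the kind of modification the modulation domain is meant to allow.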
0:03:35In the synthesis
0:03:36you have to combine the AM and FM components with the possibly modified
0:03:42carrier frequency.
0:03:43You have to
0:03:44somehow bond the different
0:03:46components from one block to the next block, because the processing is block-wise, as I said before,
0:03:55and you do an
0:03:56overlap-add
0:03:57processing of the AM and the FM,
0:04:00or instantaneous frequency,
0:04:02signals in order to get continuous
0:04:05parameters.
0:04:06Then you do
0:04:07the synthesis
0:04:08and add up
0:04:10all the signals from the different bands you had
0:04:13decomposed the signal into before.
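The overlap-add step for the per-block parameter tracks might look like the sketch below. The window type (periodic Hann) and the 50 percent overlap are assumptions chosen because these windows sum to one; the talk does not specify either, and `overlap_add` is a hypothetical helper.

```python
import numpy as np

def overlap_add(blocks, hop):
    """Cross-fade equally long parameter blocks into one continuous track."""
    n = len(blocks[0])
    # Periodic Hann window: shifted copies sum to one when hop == n // 2.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    out = np.zeros(hop * (len(blocks) - 1) + n)
    for i, block in enumerate(blocks):
        out[i * hop:i * hop + n] += window * block
    return out

n, hop = 8, 4
# Hypothetical AM tracks of two consecutive blocks with different levels.
am_blocks = [np.full(n, 1.0), np.full(n, 3.0)]
am_track = overlap_add(am_blocks, hop)
```

In the overlapping region the track moves smoothly from the level of the first block to the level of the second, so the parameter stays continuous across the block boundary.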
0:04:17So this is the basic structure of the modulation vocoder.
0:04:20But due to the structure with the relatively
0:04:23long blocks in the DFT analysis,
0:04:26you still miss some of the
0:04:30signal characteristics with this processing,
0:04:35and this is
0:04:36one of the parts we address with the enhancements.
0:04:40The first of these enhancements is the so-called envelope shaping.
0:04:44This means that
0:04:45temporal envelopes within
0:04:47the DFT blocks
0:04:50might get lost or distorted,
0:04:54because of dispersion and because you can
0:05:00lose the phase
0:05:01relations between the different tones,
0:05:05and this could cause temporal smearing of transients.
0:05:08In this case it's better to use
0:05:11an explicit
0:05:12temporal envelope,
0:05:14and you get access to the parameters
0:05:16of this temporal envelope
0:05:18by doing an LPC analysis in the frequency domain,
0:05:21because correlation in the frequency domain
0:05:24corresponds to multiplication in the time domain.
0:05:27This means that with the coefficients
0:05:30you get from an LPC analysis
0:05:33along the frequency axis,
0:05:34you get parameters
0:05:35you can use for obtaining
0:05:38a time function,
0:05:40you could say a time response,
0:05:42which you can
0:05:43then, at the end,
0:05:45apply to get back the temporal envelope.
0:05:49This is what is done
0:05:50with these
0:05:51red blocks;
0:05:52these are the
0:05:53enhancements for the
0:05:55envelopes.
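The idea of running LPC along the frequency axis to obtain a temporal envelope can be sketched like this. It is an illustration under assumptions, not the method used in the system: it uses the one-sided DFT of a block, the autocorrelation method, a direct solve of the normal equations instead of a Levinson-Durbin recursion, and an arbitrary order of 8. The function name and the click test signal are hypothetical.

```python
import numpy as np

def temporal_envelope_lpc(block, order=8):
    """All-pole estimate of the squared temporal envelope of one block."""
    n = len(block)
    spectrum = np.fft.rfft(block)
    k = len(spectrum)
    # Autocorrelation of the spectrum along the frequency axis, lags 0..order.
    r = np.array([np.vdot(spectrum[:k - lag], spectrum[lag:])
                  for lag in range(order + 1)])
    idx = np.arange(order)
    lag = idx[:, None] - idx[None, :]
    # Hermitian Toeplitz matrix of the normal equations: r[-d] = conj(r[d]).
    toeplitz = np.where(lag >= 0, r[np.abs(lag)], np.conj(r[np.abs(lag)]))
    coeffs = np.linalg.solve(toeplitz, -r[1:])
    predictor = np.concatenate(([1.0], coeffs))
    # Evaluating the predictor along the "time angle" gives the time response.
    response = n * np.fft.ifft(predictor, n)
    return 1.0 / np.abs(response) ** 2

n, onset = 1024, 300
click = np.zeros(n)
click[onset] = 1.0  # hypothetical transient: a single click inside the block
envelope = temporal_envelope_lpc(click, order=8)
peak = int(np.argmax(envelope))
```

For this click the estimated envelope peaks at the position of the transient, which is the information that a long DFT block on its own tends to smear. Applying such an envelope to the synthesised band signal is the envelope shaping step.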
0:05:59Another
0:06:01enhancement,
0:06:02which is necessary
0:06:04once you start modifying spectral components,
0:06:08is
0:06:09that you have to take into account
0:06:10that
0:06:12musical sounds normally consist of a fundamental and a lot of harmonics, the overtones,
0:06:18and you should keep this in mind when you modify frequencies.
0:06:24The overtones are quasi-harmonic on the linear frequency scale,
0:06:31which means they are normally integer multiples of the fundamental frequency, or near-integer multiples.
0:06:38On the other hand, musical intervals are based on a logarithmic scale.
0:06:43So now it's
0:06:45a question,
0:06:46when you modify frequencies, in which way you should modify them.
0:06:51Of course we want to modify them in the best way for
0:06:56what we intend to do, for example for the transposition.
0:07:00We have to consider
0:07:02this,
0:07:03because if it's an overtone of one fundamental
0:07:07frequency, it has to be modified in accordance with the fundamental and not according to the musical scale,
0:07:13as it would be if it were a single tone
0:07:17on another
0:07:19part of the scale.
0:07:23So this leads to
0:07:25some kind of ambiguity when you get one tone and just
0:07:29look at it on its own.
0:07:31That's why we need some additional interpretation
0:07:35to find out whether it's a
0:07:37fundamental frequency
0:07:39or whether it's an overtone, a harmonic component of
0:07:42a more complex sound structure.
0:07:47This is just an example
0:07:48of how tones of the scale
0:07:52can match the
0:07:54harmonics.
0:07:56Just one example to pick out
0:08:00could be the number five, which is
0:08:02five times the
0:08:03fundamental frequency of one tone alone;
0:08:06it could also be
0:08:08another tone, which is a major third
0:08:10apart.
0:08:12In this diagram the octaves are not taken into account,
0:08:17so you can also have
0:08:19some ambiguities with
0:08:21the second and also the fourth
0:08:24harmonic,
0:08:25which would then be just multiples of
0:08:27octaves, and so on.
0:08:28So that's why you get a
0:08:30kind of ambiguity between overtones
0:08:33and
0:08:34musical scales.
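The ambiguity can be checked with a little arithmetic. The sketch below assumes twelve-tone equal temperament for the musical scale (the talk only says the scale is logarithmic) and uses a hypothetical helper `nearest_semitones`.

```python
import math

def nearest_semitones(ratio):
    """Nearest equal-tempered interval for a frequency ratio, plus deviation in cents."""
    exact = 12 * math.log2(ratio)
    nearest = round(exact)
    return nearest, 100 * (exact - nearest)

# Interval of each harmonic above its fundamental, with octaves folded away.
for harmonic in range(1, 7):
    semitones, cents = nearest_semitones(harmonic)
    print(harmonic, semitones, semitones % 12, round(cents, 1))
```

The second and fourth harmonics land exactly on octaves, and the fifth harmonic lies about 14 cents below two octaves plus a major third, so a component seen in isolation could be either that harmonic or a separate note a major third up.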
0:08:37And that's why this
0:08:38second enhancement has been added to the MODVOC,
0:08:42which is called harmonic locking.
0:08:45As I said before, the estimated fundamentals
0:08:49have to be mapped directly,
0:08:51and then you have to decide for the other components:
0:08:55if it's an
0:08:57overtone,
0:08:58then it has to be locked to the
0:09:01transposition of its fundamental.
0:09:04In the processing here
0:09:06you decide for every tone whether it's
0:09:09locked
0:09:10to another
0:09:11frequency or whether it
0:09:12has to be transposed on its own.
0:09:14By this switch
0:09:16you select either the transposition
0:09:18by the MIDI-note-based mapping, which is done for the fundamental frequencies,
0:09:23or the component is
0:09:25transposed according to its fundamental.
0:09:28If it's non-locked, as shown by the
0:09:31indication here, it is mapped on its own;
0:09:35if it's locked, it has to be locked to the fundamental frequency and its mapping.
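A toy sketch of this locking decision, under several assumptions that are not from the talk: the key mode conversion is C major to C natural minor, the lowest non-locked component always counts as a fundamental, a component is treated as an overtone if it lies within 30 cents of an integer multiple, and all names (`map_note`, `transpose_with_locking`, `MAJOR_TO_MINOR`) are hypothetical.

```python
import math

# Assumed mapping: major to natural minor lowers the 3rd, 6th and 7th degree
# (pitch classes counted in semitones above the tonic).
MAJOR_TO_MINOR = {4: 3, 9: 8, 11: 10}

def map_note(freq_hz, tonic_midi=60):
    """MIDI-note-based mapping of a single frequency (tonic assumed to be C)."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    pitch_class = (midi - tonic_midi) % 12
    shift = MAJOR_TO_MINOR.get(pitch_class, pitch_class) - pitch_class
    return freq_hz * 2 ** (shift / 12)

def transpose_with_locking(carriers_hz, tolerance_cents=30.0, max_harmonic=10):
    """Map fundamentals by note; move overtones by the ratio of their fundamental."""
    result = []
    fundamentals = []  # (original, mapped) pairs of the non-locked components
    for freq in sorted(carriers_hz):
        locked_to = None
        for f0, f0_mapped in fundamentals:
            harmonic = round(freq / f0)
            if 2 <= harmonic <= max_harmonic:
                cents = abs(1200 * math.log2(freq / (harmonic * f0)))
                if cents < tolerance_cents:
                    locked_to = (f0, f0_mapped)
                    break
        if locked_to:
            # Locked: follow the transposition of the fundamental.
            result.append(freq * locked_to[1] / locked_to[0])
        else:
            # Non-locked: transpose on its own via the note mapping.
            mapped = map_note(freq)
            fundamentals.append((freq, mapped))
            result.append(mapped)
    return result

e4 = 329.63  # E4, the major third in C major
carriers = [e4, 2 * e4, 5 * e4]
transposed = transpose_with_locking(carriers)
```

In this example the fundamental E4 is lowered by a semitone and its second and fifth harmonics follow it, so the tone stays harmonic. Mapped on its own, the fifth harmonic would be read as a note that already belongs to the minor scale and would stay where it is, which is the mismatch that locking is meant to avoid.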
0:09:42Now we come to the listening test
0:09:45methodology.
0:09:47It's quite a
0:09:48difficult task if you do
0:09:50this kind of transposition,
0:09:52so we selected
0:09:55MIDI samples,
0:09:56which we first had in the original domain,
0:09:59and we did a
0:10:00MIDI transposition to obtain
0:10:03files which we could then put into the test.
0:10:06So this is the transposed
0:10:10reference signal, which is produced via MIDI
0:10:13and then converted to a wave file.
0:10:16On the other hand, we took the original wave file
0:10:19and processed it
0:10:21with the transposition, and then we can compare the two.
0:10:25We have
0:10:26different versions:
0:10:28three versions of the MODVOC, and one reference
0:10:32transposition
0:10:33system
0:10:35is also present.
0:10:37There is one commercial system available, which is the Direct Note Access (DNA) in the Melodyne editor by
0:10:43Celemony,
0:10:45and this has been available since autumn
0:10:472009.
0:10:49It also allows
0:10:50selective editing of polyphonic music,
0:10:53but it performs a multi-pass analysis,
0:10:56and it does an automatic decomposition into notes and uses
0:11:00heuristic classification rules.
0:11:03But it can also be used to perform this
0:11:06key mode conversion,
0:11:07and so that's why we also tried to compare our
0:11:11approach with this one.
0:11:15These are the items we used.
0:11:18From the Mutopia project we used some different signals
0:11:23and different MIDI files,
0:11:24as I said before;
0:11:26they are shown here.
0:11:28We tried to get some variety of more complex
0:11:31orchestral music
0:11:33and some more solo instrument
0:11:36parts,
0:11:37so quite a mixture of
0:11:39complexity of
0:11:41content.
0:11:44These were the results of a
0:11:46so-called MUSHRA test; I don't want to go too much into detail.
0:11:50In this test we have,
0:11:52as usual, a hidden reference,
0:11:55which is denoted by one.
0:11:57We have a
0:11:59low-quality reference, which is just a
0:12:02low-pass filtered signal, denoted by number two.
0:12:06And we have the MODVOC: the original MODVOC
0:12:09is number three, the MODVOC with harmonic locking is four,
0:12:14and the MODVOC with harmonic locking and
0:12:17envelope shaping
0:12:19is five. Six is the DNA,
0:12:24the system we compared to.
0:12:26But first we want to see how
0:12:28our enhancements work individually.
0:12:31For this one example we see a difference between four and five; this means the addition of
0:12:37envelope shaping.
0:12:39We see that for the guitar,
0:12:41the guitar sounds much clearer
0:12:43and so is
0:12:44somewhat preferred by
0:12:46the listeners.
0:12:48And
0:12:49here we have the difference between
0:12:53the original MODVOC and the MODVOC with harmonic locking,
0:12:58which
0:12:59delivered better results for the piano signal.
0:13:03We also see that in most of the cases
0:13:07the DNA
0:13:08performed better.
0:13:14I can make a first summary of these slides here:
0:13:17the harmonic locking really improves the timbre,
0:13:20the envelope shaping also improves the transient
0:13:23parts,
0:13:25but the DNA was rated better for five
0:13:27out of seven items.
0:13:29The rating could cover different aspects
0:13:33of
0:13:34the sound change which was performed here,
0:13:36like unnatural-sounding artifacts, melody or chord transposition errors,
0:13:41timbre preservation, or pitch.
0:13:44And many listeners reported a trend towards transposition
0:13:49errors
0:13:51in the DNA
0:13:52and
0:13:53timbre problems with the MODVOC.
0:13:56So we made an additional test, which was a preference test
0:14:01on these main quality aspects, to find out whether this is really the case.
0:14:07For this
0:14:08we had twelve expert listeners
0:14:11with both a technical and a musical background,
0:14:13and we now took the extended MODVOC
0:14:16and compared it to the DNA.
0:14:20We also found out in the first test
0:14:22that an unknown melody, which is a
0:14:24transposed version of the original MIDI,
0:14:29is somehow hard
0:14:30to grade for people, so we did it the other way around: we did the transposition in MIDI
0:14:36and then tried to
0:14:37transpose it back to the original score
0:14:41with our algorithms,
0:14:44for four signals,
0:14:45which are shown here: again orchestra, some mixture, and piano.
0:14:50Now we put these
0:14:52files into the preference test.
0:14:56And the outcome was
0:14:59quite clear, in the sense
0:15:00of what people had
0:15:01reported before:
0:15:03there was quite a preference for
0:15:07the MODVOC in melody transposition, which is shown here; the MODVOC is always on the left side.
0:15:14These are the results for the music transposition,
0:15:19and here are the results for timbre,
0:15:22which
0:15:23show the clear preference for the DNA.
0:15:26I can play an example.
0:15:30I can play all the files
0:15:33in short versions, in the order they are shown here.
0:15:36First the original.
0:15:46[audio examples played]
0:16:23I think there are some problems in the
0:16:25music transposition in the DNA,
0:16:28as far as that can be judged under these listening conditions.
0:16:34So another example is the piano; as we still have time, I will also play
0:16:41this one here.
0:16:45[audio examples played]
0:18:06Okay, so just a short summary.
0:18:10We have shown the MODVOC for selective transposition
0:18:13of pitch,
0:18:15which is capable of real-time processing,
0:18:18which can reproduce
0:18:19transients,
0:18:21and which also improves the timbre by harmonic locking.
0:18:24It is
0:18:26preferred over the commercial system
0:18:28in terms of transposition of the melody, but the DNA is
0:18:33preferred
0:18:34in timbre preservation.
0:18:37So in general,
0:18:39both of the systems were in the range from fair to good, so there is room for improvement,
0:18:45but they are already
0:18:46somewhat usable
0:18:48systems. Thank you.
0:18:57Any questions?
0:19:00One question I had: were these trained listeners, golden ears, or were they...
0:19:05For the preference test, they were people who also had
0:19:10some music background; that is
0:19:12quite important for
0:19:14this timbre grading, let's say.
0:19:18But they weren't signal processors, or specifically golden ears? No.
0:19:24Any other questions?
0:19:27One harder question:
0:19:29what would you like to do if you had all the signal processing power and all the smarts you could
0:19:33have?
0:19:33What would you like to do to
0:19:35solve the problem?
0:19:36I think
0:19:38it can be
0:19:40made a bit more complicated if
0:19:42you imagine that you have tones which are
0:19:45a mixture of
0:19:46maybe harmonics and fundamental frequencies and so on,
0:19:50or different harmonics of different tones which match on the grid.
0:19:54Then of course the decomposition is much more complicated,
0:19:58and of course for this you would need quite a lot more computation. So I
0:20:04think this would be one of the ways a further improvement could be achieved.
0:20:11Anything else?
0:20:14Thank you.
0:20:15Okay, can you use the microphone?
0:20:19On your bullet point up there about reproduction of transients improved by LPC-based envelope shaping, could you comment
0:20:26on what that is? Yes, we use the LPC parameters obtained in the frequency
0:20:30domain and apply them as a time envelope in the time domain.
0:20:35This is what I showed with the red blocks in the
0:20:38overview diagram.
0:20:43Thank you.