0:00:16 | thank you very much |
---|
0:00:17 | and as you can also see here, i'm not the first author of this paper |
---|
0:00:22 | but in our case i must say that, unfortunately, the first author cannot be here |
---|
0:00:26 | today because his family grew by a second daughter two weeks ago, so he can't |
---|
0:00:31 | be here |
---|
0:00:33 | um |
---|
0:00:33 | and i'm working at the International Audio Laboratories Erlangen, which is a |
---|
0:00:37 | joint institution |
---|
0:00:39 | of |
---|
0:00:39 | the University of Erlangen-Nuremberg |
---|
0:00:42 | and the Fraunhofer Institute for Integrated Circuits |
---|
0:00:48 | the motivation for the work i'm going to present here |
---|
0:00:51 | is that in music production you often rely on |
---|
0:00:55 | mixing prerecorded material, |
---|
0:00:58 | samples |
---|
0:00:59 | and you also frequently need to adapt these samples to |
---|
0:01:04 | a different musical context |
---|
0:01:06 | than |
---|
0:01:07 | the context they were recorded in |
---|
0:01:09 | so in some cases you might need a key mode conversion |
---|
0:01:13 | this means major to minor or vice versa |
---|
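
To make the idea of a key mode conversion concrete, here is a minimal sketch on the symbolic (MIDI note) level. It assumes a conversion from a major key to its parallel natural minor, where the third, sixth and seventh scale degrees are lowered by one semitone. The choice of natural minor, the table `MAJOR_TO_MINOR_SHIFT` and the function name `major_to_minor` are illustrative assumptions, not something stated in the talk.

```python
# Hypothetical sketch: major -> parallel natural minor on MIDI note numbers.
# Assumption: only the major third (4), major sixth (9) and major seventh (11)
# above the tonic are lowered by one semitone; all other notes stay unchanged.
MAJOR_TO_MINOR_SHIFT = {4: -1, 9: -1, 11: -1}


def major_to_minor(midi_note: int, tonic_pc: int = 0) -> int:
    """Map one MIDI note; tonic_pc is the pitch class of the tonic (0 = C)."""
    degree = (midi_note - tonic_pc) % 12  # semitones above the tonic
    return midi_note + MAJOR_TO_MINOR_SHIFT.get(degree, 0)


# C major triad (C4, E4, G4) becomes a C minor triad (C4, E-flat4, G4).
c_major_triad = [60, 64, 67]
c_minor_triad = [major_to_minor(n) for n in c_major_triad]
```

The reverse direction (minor to major) would use the opposite shifts on the minor third, sixth and seventh.
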
0:01:16 | and |
---|
0:01:17 | an |
---|
0:01:18 | algorithm for |
---|
0:01:20 | enabling this task |
---|
0:01:22 | has been presented |
---|
0:01:23 | at |
---|
0:01:24 | previous conferences |
---|
0:01:26 | this is called the MODVOC, the modulation vocoder |
---|
0:01:30 | it's somewhat suited to this task |
---|
0:01:33 | but we also found out that there are |
---|
0:01:36 | special enhancements necessary |
---|
0:01:38 | in order to address |
---|
0:01:40 | special requirements for this application |
---|
0:01:46 | so i first want to give a short overview of this MODVOC |
---|
0:01:51 | architecture |
---|
0:01:52 | which performs a single-pass analysis |
---|
0:01:55 | in block-wise processing |
---|
0:01:57 | which is shown in the block diagram here |
---|
0:02:01 | it does |
---|
0:02:01 | first |
---|
0:02:02 | a signal-adaptive band-pass filtering |
---|
0:02:05 | which is aligned with the spectral centres of gravity |
---|
0:02:09 | this means we first do a DFT analysis |
---|
0:02:15 | yeah |
---|
0:02:15 | a DFT analysis |
---|
0:02:17 | and from the DFT spectra |
---|
0:02:19 | the centres of |
---|
0:02:21 | gravity |
---|
0:02:22 | in perceptually adapted bands are then |
---|
0:02:25 | determined, and the band edges are adjusted, so this decomposition is flexible |
---|
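
As a rough illustration of what a spectral centre of gravity is, the following sketch computes a power-weighted mean frequency over the DFT bins that fall inside one band. The power weighting, the fixed band edges and the function name `centre_of_gravity` are assumptions made for this example; the talk does not spell out the exact estimator or how the band edges are adapted.

```python
# Hypothetical sketch: centre of gravity of one band of a magnitude spectrum.
# Assumption: bins are weighted by their power (magnitude squared).
def centre_of_gravity(freqs_hz, magnitudes, lo_hz, hi_hz):
    """Power-weighted mean frequency of the bins with lo_hz <= f < hi_hz."""
    weighted_sum = 0.0
    total_power = 0.0
    for f, m in zip(freqs_hz, magnitudes):
        if lo_hz <= f < hi_hz:
            power = m * m
            weighted_sum += f * power
            total_power += power
    if total_power == 0.0:
        # Empty or silent band: fall back to the band midpoint.
        return 0.5 * (lo_hz + hi_hz)
    return weighted_sum / total_power


# A symmetric spectral peak around 440 Hz has its centre of gravity at 440 Hz.
cog = centre_of_gravity([400.0, 440.0, 480.0], [1.0, 2.0, 1.0], 380.0, 500.0)
```
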
0:02:31 | so from these centres, |
---|
0:02:32 | centre frequencies, |
---|
0:02:34 | and around the centre frequencies we construct bandpass filters |
---|
0:02:38 | and |
---|
0:02:39 | the filtering is |
---|
0:02:40 | done in the frequency domain |
---|
0:02:41 | and with an inverse |
---|
0:02:43 | uh |
---|
0:02:44 | DFT we |
---|
0:02:45 | get back, for each bandpass signal, |
---|
0:02:47 | to a time domain signal |
---|
0:02:49 | and this time domain signal, |
---|
0:02:52 | bandpass signal, |
---|
0:02:53 | is then analysed for AM and FM |
---|
0:02:56 | and that's this |
---|
0:02:57 | so basically you have the carrier frequency, which corresponds to the centre of gravity of this spectral frequency region |
---|
0:03:04 | and |
---|
0:03:05 | the FM signal, which gives the |
---|
0:03:08 | instantaneous frequency offset |
---|
0:03:11 | relative to this carrier frequency |
---|
0:03:14 | and you get the instantaneous magnitude, the amplitude, in the AM component |
---|
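
The AM/FM analysis of one bandpass signal can be sketched as follows, under the assumption that the bandpass signal is already available in analytic (complex) form, for example after a Hilbert transform. The AM component is the magnitude, and the FM component is the instantaneous frequency minus the carrier frequency. The function name `am_fm` and the test tone are illustrative and not taken from the talk.

```python
import cmath
import math


# Hypothetical sketch: AM/FM decomposition of one analytic bandpass signal.
# Assumption: `analytic` is a list of complex samples (analytic signal).
def am_fm(analytic, fs_hz, carrier_hz):
    """Return (am, fm): envelope per sample and frequency offset in Hz
    relative to carrier_hz, the latter from sample-to-sample phase steps."""
    am = [abs(z) for z in analytic]
    fm = []
    for prev, cur in zip(analytic, analytic[1:]):
        phase_step = cmath.phase(cur * prev.conjugate())  # in (-pi, pi]
        inst_freq_hz = phase_step * fs_hz / (2.0 * math.pi)
        fm.append(inst_freq_hz - carrier_hz)
    return am, fm


# A steady 445 Hz tone of amplitude 0.5, analysed against a 440 Hz carrier,
# gives a constant AM of 0.5 and a constant FM offset of 5 Hz.
fs = 8000.0
tone = [0.5 * cmath.exp(2j * math.pi * 445.0 * n / fs) for n in range(64)]
am, fm = am_fm(tone, fs, carrier_hz=440.0)
```

Changing `carrier_hz` while keeping `am` and `fm` is the kind of modification in the modulation domain that the talk describes next.
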
0:03:20 | and then you can process the signal in this modulation domain |
---|
0:03:24 | for example you can change the carrier frequencies |
---|
0:03:27 | and still maintain the fine temporal structure |
---|
0:03:31 | by keeping the AM and the FM |
---|
0:03:34 | as it is |
---|
0:03:35 | um |
---|
0:03:35 | in the synthesis |
---|
0:03:36 | you have to combine the FM component with the possibly modified |
---|
0:03:42 | carrier frequency |
---|
0:03:43 | you have to |
---|
0:03:44 | somehow bond the different |
---|
0:03:46 | components from one block to the next block, because it's a temporally block-wise processing, as said before |
---|
0:03:51 | um |
---|
0:03:53 | or just and yeah |
---|
0:03:54 | and |
---|
0:03:55 | you do |
---|
0:03:56 | an overlap-add |
---|
0:03:57 | processing of the AM and the FM |
---|
0:04:00 | or instantaneous frequency |
---|
0:04:02 | signals in order to get continuous |
---|
0:04:05 | parameters |
---|
0:04:06 | and then you do |
---|
0:04:07 | the synthesis |
---|
0:04:08 | and add up |
---|
0:04:10 | all the signals from the different bands you had |
---|
0:04:13 | decomposed the signal into before |
---|
0:04:17 | so this is the basic structure of the modulation |
---|
0:04:19 | vocoder |
---|
0:04:20 | but due to the structure with the relatively |
---|
0:04:23 | long blocks in the DFT analysis |
---|
0:04:26 | you still miss some of the |
---|
0:04:30 | signal |
---|
0:04:31 | uh |
---|
0:04:32 | characteristics by this processing |
---|
0:04:35 | and this is |
---|
0:04:36 | one of the parts we address with the enhancements |
---|
0:04:40 | and |
---|
0:04:40 | the first of these enhancements is the so-called envelope shaping |
---|
0:04:44 | this means |
---|
0:04:45 | temporal envelopes within |
---|
0:04:47 | the |
---|
0:04:49 | DFT blocks |
---|
0:04:50 | might get lost or distorted |
---|
0:04:54 | because they can get |
---|
0:04:57 | dispersed, and you can |
---|
0:05:00 | lose the phase |
---|
0:05:01 | relations between the different tones |
---|
0:05:04 | and |
---|
0:05:05 | this could cause a temporal smearing of transients |
---|
0:05:08 | and in this case it's better to use |
---|
0:05:11 | an explicit |
---|
0:05:12 | temporal envelope |
---|
0:05:14 | and you get access to the parameters |
---|
0:05:16 | of this temporal envelope |
---|
0:05:18 | by doing an LPC analysis in the frequency domain |
---|
0:05:21 | because correlation in the frequency domain |
---|
0:05:24 | corresponds to multiplication in the time domain |
---|
0:05:27 | this means with the coefficients |
---|
0:05:30 | you get from the LPC analysis |
---|
0:05:33 | along the frequency axis |
---|
0:05:34 | you get parameters |
---|
0:05:35 | you can use for getting |
---|
0:05:38 | a |
---|
0:05:39 | time function, |
---|
0:05:40 | you could say a time response, |
---|
0:05:42 | that you can |
---|
0:05:43 | then, at the end, |
---|
0:05:45 | apply to get back the temporal envelope |
---|
0:05:49 | this is what is done |
---|
0:05:50 | with these |
---|
0:05:51 | red blocks |
---|
0:05:52 | these are the |
---|
0:05:53 | enhancements for the |
---|
0:05:55 | envelopes |
---|
0:05:59 | um |
---|
0:06:00 | another |
---|
0:06:01 | enhancement, |
---|
0:06:02 | the enhancement which is necessary |
---|
0:06:04 | once you start modifying spectral components, |
---|
0:06:08 | is |
---|
0:06:09 | that you have to take into account |
---|
0:06:10 | that |
---|
0:06:11 | um |
---|
0:06:12 | musical sounds normally consist of a fundamental and a lot of harmonics, the overtones |
---|
0:06:18 | and you should keep this in mind when you modify frequencies |
---|
0:06:24 | so the overtones are quasi-harmonic on the linear frequency scale |
---|
0:06:31 | they are normally integer multiples of the fundamental frequency, or approximately integer multiples |
---|
0:06:38 | on the other hand, musical intervals are based on a logarithmic scale |
---|
0:06:43 | and now it's |
---|
0:06:45 | a question, |
---|
0:06:46 | when you modify frequencies, in which way you should modify them |
---|
0:06:51 | and of course we want to modify them in the best way for |
---|
0:06:56 | what we intend to do, for example for the transposition |
---|
0:07:00 | and we have to consider |
---|
0:07:02 | this |
---|
0:07:03 | because if it's an overtone of one fundamental |
---|
0:07:07 | frequency, you have to modify it in accordance with the fundamental and not according to the musical scale, |
---|
0:07:13 | as you would if it were a single tone |
---|
0:07:17 | on another |
---|
0:07:19 | part of the scale |
---|
0:07:23 | so this leads to |
---|
0:07:25 | some kind of ambiguity when you get one tone and just |
---|
0:07:29 | um |
---|
0:07:29 | look at it on its own |
---|
0:07:31 | so that's why we have to get some additional interpretation |
---|
0:07:35 | to find out whether it's a |
---|
0:07:37 | fundamental frequency |
---|
0:07:39 | or if it's an overtone, a harmonic component of |
---|
0:07:42 | a more complex sound structure |
---|
0:07:47 | this is just an example |
---|
0:07:48 | of how intervals of the scale |
---|
0:07:52 | can match the |
---|
0:07:54 | harmonics |
---|
0:07:56 | and just one example to pick out |
---|
0:08:00 | could be the number five, which is |
---|
0:08:02 | five times the |
---|
0:08:03 | fundamental frequency of one tone |
---|
0:08:06 | it could also be |
---|
0:08:08 | another tone which is a major third |
---|
0:08:10 | apart |
---|
0:08:11 | um |
---|
0:08:12 | in this diagram the octaves are not taken into account |
---|
0:08:17 | so you can have |
---|
0:08:18 | um |
---|
0:08:19 | some ambiguities between |
---|
0:08:21 | the |
---|
0:08:21 | second and also the fourth |
---|
0:08:24 | harmonic |
---|
0:08:25 | which would then be just multiples of |
---|
0:08:27 | octaves, and so on |
---|
0:08:28 | so that's why you get |
---|
0:08:30 | a kind of ambiguity between overtones |
---|
0:08:33 | and |
---|
0:08:34 | musical scales |
---|
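
The ambiguity between overtones and scale tones can be checked with a little arithmetic. The sketch below converts a frequency ratio to cents (1200 cents per octave) and folds whole octaves away, as the diagram in the talk does. The comparison against an equal-tempered major third of 400 cents is an assumption about the tuning system, and the names `cents` and `fold_octaves` are illustrative.

```python
import math


def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(ratio)


def fold_octaves(interval_cents: float) -> float:
    """Ignore whole octaves, keeping the interval within one octave."""
    return interval_cents % 1200.0


# The 5th harmonic lies two octaves plus roughly 386 cents above the
# fundamental, close to an equal-tempered major third (400 cents).
fifth_harmonic = fold_octaves(cents(5.0))
major_third_et = 400.0
mismatch = major_third_et - fifth_harmonic

# The 2nd and 4th harmonics are pure octaves and fold to 0 cents.
second_harmonic = fold_octaves(cents(2.0))
fourth_harmonic = fold_octaves(cents(4.0))
```

A spectral component found near a major third (plus octaves) above some tone could therefore be either a separate note or that tone's fifth harmonic, which is the ambiguity the speaker describes.
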
0:08:37 | and that's why this |
---|
0:08:38 | second enhancement has been added to the MODVOC |
---|
0:08:41 | which is |
---|
0:08:42 | called harmonic locking |
---|
0:08:45 | so, as i said before, the estimated fundamentals |
---|
0:08:49 | have to be mapped directly |
---|
0:08:51 | and then you have to decide for all the components |
---|
0:08:55 | if it's an |
---|
0:08:57 | um |
---|
0:08:57 | overtone |
---|
0:08:58 | then it has to be locked to the |
---|
0:09:01 | transposition of its fundamental |
---|
0:09:04 | this is the processing here |
---|
0:09:06 | you decide for any tone if it's |
---|
0:09:09 | locked |
---|
0:09:10 | to another |
---|
0:09:11 | frequency or if it |
---|
0:09:12 | has to be transposed on its own |
---|
0:09:14 | and by this switch |
---|
0:09:16 | it gets either the transposition |
---|
0:09:18 | of the MIDI-note-based mapping, which is done for the fundamental frequencies, |
---|
0:09:23 | or it is |
---|
0:09:24 | um |
---|
0:09:25 | transposed according to its fundamental |
---|
0:09:28 | if it |
---|
0:09:29 | if it's locked, as shown by the |
---|
0:09:31 | indication here |
---|
0:09:33 | if it's not |
---|
0:09:34 | non-locked |
---|
0:09:35 | then it is locked and has to be locked to the fundamental frequency and its mapping |
---|
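
The switch between the two transposition paths can be sketched as follows. This is only an illustration under stated assumptions: the MIDI-note-based mapping is taken to be a major to natural minor conversion (third, sixth and seventh lowered by a semitone), the lock decision is passed in by the caller rather than estimated, and the names `map_fundamental` and `transpose_component` are hypothetical. How the real system decides whether a component is locked is not covered here.

```python
import math


def hz_to_midi(f_hz: float) -> float:
    """Fractional MIDI note number, with A4 = 440 Hz = note 69."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def map_fundamental(f_hz: float, tonic_pc: int = 0) -> float:
    """Assumed MIDI-note-based mapping (major -> natural minor)."""
    note = round(hz_to_midi(f_hz))          # nearest MIDI note
    degree = (note - tonic_pc) % 12         # semitones above the tonic
    shift = -1 if degree in (4, 9, 11) else 0
    return f_hz * 2.0 ** (shift / 12.0)


def transpose_component(f_hz, locked_to_hz=None, tonic_pc=0):
    """Non-locked components are mapped on their own; locked components
    follow the transposition ratio of the fundamental they are locked to."""
    if locked_to_hz is None:
        return map_fundamental(f_hz, tonic_pc)
    ratio = map_fundamental(locked_to_hz, tonic_pc) / locked_to_hz
    return f_hz * ratio


# C4 in C major stays where it is. Its 5th harmonic lies near E6, which the
# note-based mapping alone would lower by a semitone; locked to C4 it stays.
c4 = 261.63
h5 = 5.0 * c4
unlocked = transpose_component(h5)
locked = transpose_component(h5, locked_to_hz=c4)
```

Without locking, the fifth harmonic of an unchanged C would be detuned away from its fundamental, which is the timbre problem harmonic locking is meant to avoid.
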
0:09:42 | now we come to the listening test |
---|
0:09:45 | methodology |
---|
0:09:47 | it's a |
---|
0:09:48 | difficult task if you do |
---|
0:09:50 | this kind of transposition |
---|
0:09:52 | so we selected |
---|
0:09:55 | MIDI samples |
---|
0:09:56 | which we first had in the original domain |
---|
0:09:59 | and we did |
---|
0:10:00 | a MIDI transposition to obtain |
---|
0:10:03 | files which we could then put into the test |
---|
0:10:06 | so this is the transposed |
---|
0:10:09 | um |
---|
0:10:10 | reference signal, which is done via MIDI |
---|
0:10:13 | and then transferred to a wave file |
---|
0:10:16 | and on the other hand we get the original wave file |
---|
0:10:19 | and we processed it |
---|
0:10:21 | with the transposition, and then we can compare the two |
---|
0:10:25 | and we have |
---|
0:10:26 | different versions, |
---|
0:10:28 | three versions of the MODVOC and one reference |
---|
0:10:32 | transposition |
---|
0:10:33 | system |
---|
0:10:34 | which is |
---|
0:10:35 | also presented |
---|
0:10:36 | here |
---|
0:10:37 | there's one commercial system available, which is the Direct Note Access in the Melodyne editor by |
---|
0:10:43 | Celemony |
---|
0:10:45 | and this has been available since autumn |
---|
0:10:47 | two thousand and nine |
---|
0:10:49 | and it also allows |
---|
0:10:50 | selective editing of polyphonic music |
---|
0:10:53 | but it performs a multi-pass analysis |
---|
0:10:56 | and it does an automatic decomposition into notes and |
---|
0:11:00 | heuristic classification rules |
---|
0:11:03 | but it can also be used to perform this |
---|
0:11:06 | key mode conversion |
---|
0:11:07 | and so that's why we also tried to compare our |
---|
0:11:11 | approach with this one |
---|
0:11:15 | these are the items we used |
---|
0:11:18 | um problem with to P a project we use some different signals |
---|
0:11:23 | and |
---|
0:11:23 | different midi files |
---|
0:11:24 | as i said before, |
---|
0:11:26 | they are shown here |
---|
0:11:27 | and here we |
---|
0:11:28 | tried to get some variety of more complex |
---|
0:11:31 | orchestral music |
---|
0:11:33 | and some more solo instrument |
---|
0:11:36 | parts |
---|
0:11:37 | so quite a mixture of |
---|
0:11:39 | complexity of |
---|
0:11:41 | content |
---|
0:11:44 | these were the results of the |
---|
0:11:46 | so-called MUSHRA test; i don't want to go too much into detail |
---|
0:11:50 | in this test we have |
---|
0:11:52 | normally a hidden reference |
---|
0:11:54 | which is |
---|
0:11:55 | um |
---|
0:11:56 | denoted by one |
---|
0:11:57 | we have |
---|
0:11:59 | uh |
---|
0:11:59 | a low-quality reference, which is just a |
---|
0:12:02 | low-pass filtered signal, which is denoted by number two |
---|
0:12:06 | and we have the MODVOC; the original MODVOC |
---|
0:12:09 | is number three, MODVOC with the harmonic locking is four |
---|
0:12:14 | and MODVOC with the harmonic locking and the |
---|
0:12:17 | envelope shaping |
---|
0:12:19 | is |
---|
0:12:19 | five, and six is the DNA, |
---|
0:12:24 | the system we compared to |
---|
0:12:26 | but first we want to see how |
---|
0:12:28 | our enhancements work, and we see |
---|
0:12:31 | for this one example a difference between four and five, this means the addition of |
---|
0:12:37 | envelope shaping |
---|
0:12:39 | we see that for the guitar |
---|
0:12:41 | the guitar onsets are much clearer |
---|
0:12:43 | and so |
---|
0:12:44 | somewhat preferred by |
---|
0:12:46 | the listeners |
---|
0:12:48 | and |
---|
0:12:48 | um |
---|
0:12:49 | here we have a difference between |
---|
0:12:53 | the original MODVOC and the MODVOC with harmonic locking |
---|
0:12:58 | which |
---|
0:12:59 | delivered better results for the piano signal |
---|
0:13:03 | we also see that in most of the cases |
---|
0:13:07 | the DNA |
---|
0:13:08 | performed better |
---|
0:13:11 | and |
---|
0:13:13 | um |
---|
0:13:14 | i can first summarize these slides here: |
---|
0:13:17 | the harmonic locking really improved the timbre |
---|
0:13:20 | the envelope shaping also improved the transient |
---|
0:13:23 | parts |
---|
0:13:25 | but DNA was rated better for five |
---|
0:13:27 | out of seven items |
---|
0:13:29 | and the rating could cover different aspects |
---|
0:13:33 | of |
---|
0:13:34 | this sound change which was performed here |
---|
0:13:36 | like unnatural-sounding artifacts, or melody or chord transposition errors |
---|
0:13:41 | or timbre preservation or pitch |
---|
0:13:44 | and the listeners mainly reported a trend towards transposition |
---|
0:13:49 | errors |
---|
0:13:50 | um |
---|
0:13:51 | in the DNA |
---|
0:13:52 | and |
---|
0:13:53 | timbre problems for the MODVOC |
---|
0:13:56 | so we made an additional test which was the formant preference test |
---|
0:14:01 | on these main quality aspects, to find out more about whether this is really the case |
---|
0:14:07 | for this |
---|
0:14:08 | we had twelve expert listeners |
---|
0:14:11 | meaning both a technical and a musical background |
---|
0:14:13 | and we took now the extended MODVOC |
---|
0:14:16 | and compared it to the DNA |
---|
0:14:19 | and |
---|
0:14:20 | we also found out in the first test |
---|
0:14:22 | that an unknown melody, which is a |
---|
0:14:24 | transposed version of the original MIDI, |
---|
0:14:28 | is |
---|
0:14:29 | somehow hard to |
---|
0:14:30 | grade for people, so we did it the other way around: we did the transposition with MIDI |
---|
0:14:36 | and then tried to |
---|
0:14:37 | transpose it back to the original score |
---|
0:14:41 | um with a right for with our egg |
---|
0:14:44 | for four signals |
---|
0:14:45 | which are shown here, also orchestra and some mixture and piano |
---|
0:14:50 | and |
---|
0:14:50 | now we put these |
---|
0:14:52 | files in the preference test |
---|
0:14:56 | and the outcome was |
---|
0:14:59 | quite clear in the sense |
---|
0:15:00 | that, as the people |
---|
0:15:01 | reported before, |
---|
0:15:03 | there was quite a clear preference for |
---|
0:15:07 | the melody transposition for the MODVOC, which is shown here; MODVOC is always at the left side |
---|
0:15:14 | and these are the results for the music transposition |
---|
0:15:19 | and here are the results for timbre |
---|
0:15:22 | which |
---|
0:15:23 | show the clear preference for the DNA |
---|
0:15:26 | i can play an example |
---|
0:15:30 | i can play all the files |
---|
0:15:31 | together |
---|
0:15:33 | in short versions, in the order as they are shown here |
---|
0:15:36 | first the original |
---|
0:15:46 | [audio examples played] |
---|
0:16:23 | i think there are some problems in the |
---|
0:16:25 | music transposition of the DNA |
---|
0:16:28 | even under these listening conditions here |
---|
0:16:34 | so another example, this is the piano; as we still have time i'll also play |
---|
0:16:40 | um |
---|
0:16:41 | this one here |
---|
0:16:45 | [audio examples played] |
---|
0:18:06 | okay, so just a short summary |
---|
0:18:09 | um |
---|
0:18:10 | we have now enhanced the MODVOC for selective |
---|
0:18:13 | transposition of pitch |
---|
0:18:15 | which is capable of real-time processing |
---|
0:18:18 | and which can reproduce |
---|
0:18:19 | transients |
---|
0:18:20 | and |
---|
0:18:21 | also improves the timbre by harmonic locking |
---|
0:18:24 | and it's |
---|
0:18:25 | um |
---|
0:18:26 | preferred over the commercial system |
---|
0:18:28 | in terms of transposition of the melody, but the DNA is |
---|
0:18:33 | preferred |
---|
0:18:34 | in timbre preservation |
---|
0:18:37 | so, maybe in general, |
---|
0:18:39 | the |
---|
0:18:39 | both of the systems were in the range from fair to good, so there's room for improvement |
---|
0:18:45 | but they are already |
---|
0:18:46 | somewhat useful |
---|
0:18:48 | systems, thank you |
---|
0:18:57 | any questions |
---|
0:19:00 | one question i had: were these trained listeners, golden ears, or were they |
---|
0:19:05 | for the preference test they were people who also had |
---|
0:19:10 | some music background, which is |
---|
0:19:12 | quite important for |
---|
0:19:14 | this timbre grading, let's say |
---|
0:19:18 | but they weren't signal processors, or not specifically golden ears, no |
---|
0:19:24 | any questions |
---|
0:19:27 | one harder question |
---|
0:19:29 | what would you like to do if you had all the signal processing power and all the smarts you could |
---|
0:19:33 | want, |
---|
0:19:33 | what would you like to do to |
---|
0:19:35 | solve the problem |
---|
0:19:36 | i think |
---|
0:19:38 | it could be |
---|
0:19:39 | that it |
---|
0:19:40 | can be made a bit more complicated, if you |
---|
0:19:42 | can imagine that you have tones which are |
---|
0:19:45 | a mixture of |
---|
0:19:46 | maybe harmonics and fundamental frequencies and so on |
---|
0:19:50 | of different harmonics of different tones which match on the grid |
---|
0:19:54 | so then of course the decomposition is much more complicated |
---|
0:19:58 | and of course for this you would need quite a lot more computation, so i |
---|
0:20:04 | think this would be one of the ways a further improvement could be achieved |
---|
0:20:09 | a because the see |
---|
0:20:11 | anything else |
---|
0:20:14 | thank |
---|
0:20:15 | okay, can you use the microphone |
---|
0:20:19 | on your bullet point up there about reproduction of transients improved by LPC-based envelope shaping, could you comment |
---|
0:20:26 | on what that is? yeah, we use the LPC parameters we obtained in the frequency |
---|
0:20:30 | domain and apply this as a time envelope in the time domain |
---|
0:20:35 | this is what i showed with the red blocks in the |
---|
0:20:38 | overview diagram |
---|
0:20:43 | thank you |
---|