0:00:15okay the um the next yeah are shown
0:00:19and and you mentioned a structure
0:00:23i one
0:00:25you to long way
0:00:29monica
0:00:30i i being here
0:00:33so i'm to the it was to just to the work and we present a joint work
0:00:36as is prime the will of most wouldn't a of
0:00:39how much whose lose you didn't know was don't you
0:00:42you couldn't be here and if a used to but a written by them
0:00:48so i don't as a very that was means
0:00:51so i'll try and begin with a fairly broad introduction
0:00:53i do want to apologise in advance
0:00:55for not being able to cover all the details
0:00:57that
0:00:58process
0:00:59but something which i think is more important is if i
0:01:01leave you with an understanding of at least what we'd like to do and why it's important
0:01:05rather than to try and
0:01:06cover all the details
0:01:07is it it is to do with you
0:01:10we also had a as a to P is as to be run this has just recently of your
0:01:15you B C but my i
0:01:17provide
0:01:19i'll begin with the motivation, why this is an important problem, and talk about RNA
0:01:23non-coding RNA in particular
0:01:25and RNA structure
0:01:26then i'll talk about what is done for structure
0:01:28prediction
0:01:29both for single and multiple sequences
0:01:32a technique is a what easy but method
0:01:34and that's with analogy with that what it comes send
0:01:36so i'll present TurboFold
0:01:39at a high level
0:01:40and show how this is an iterative probabilistic method for decoding of multiple RNA homologs
0:01:45and has a very strong analogy with turbo decoding in digital communications, both in the way you look at
0:01:50the problem
0:01:51and the tools we saw all and sort of the stage
0:01:54all literature also this point at which we don't that
0:01:58it's quite
0:01:59and be in the business
0:02:01i'll present experimental results in this process
0:02:03and present how TurboFold performs as compared to other algorithms
0:02:06which are from the same framework
0:02:08and finally end with some conclusions
0:02:09right
0:02:11so
0:02:12i think everybody's familiar with DNA and the double helix
0:02:15this is the famous discovery by Watson and Crick
0:02:18what good
0:02:19they discovered that DNA is found in this double helical structure
0:02:24where you have
0:02:25complementary base pairs
0:02:28pairing with each other across the helix
0:02:31and
0:02:32RNA
0:02:33is
0:02:34very similar to DNA
0:02:35except that
0:02:37thymine
0:02:38is replaced by uracil
0:02:40or what i was as
0:02:41do you think about audrey
0:02:43as a leading a in one Q
0:02:44these
0:02:45you imaging
0:02:46is a
0:02:47strong a what i wants
0:02:49is that what they can look at the mall
0:02:51as compared to
0:02:52these point which are are stored row
0:02:54to to to to look at a mall
0:02:56and the strength is exponential
0:02:59in the free energy change
0:03:00so these bonds here
0:03:02are much harder to break than the ones over there
0:03:06i
0:03:07so
0:03:08this
0:03:08is what is referred to as the primary structure and usually it is laid out going from the five prime to
0:03:13three prime end
0:03:14so you list this
0:03:15primary structure
0:03:16as a sequence of nucleotides
0:03:18much like DNA in the human genome is a sequence of nucleotides
0:03:21you will have a similar primary structure for RNA as a sequence of bases
0:03:25along the molecule
0:03:28unlike DNA, what happens in RNA is that it is common for this molecule to
0:03:31fold on itself
0:03:33so you typically have a
0:03:35single molecule rather than two complementary molecules
0:03:38forming a variety of
0:03:39structures
0:03:41for the longest amount of time people believed
0:03:43that
0:03:44RNA served just
0:03:45one function
0:03:46which was being a transient copy of the information in DNA
0:03:49this is the central dogma
0:03:52so in the cell, in the
0:03:53nucleus
0:03:54DNA gets
0:03:56transcribed
0:03:57into
0:03:58messenger RNA which comes out into the cytoplasm
0:04:02and then it drives
0:04:03protein synthesis
0:04:04in the ribosome, which is the factory for producing proteins
0:04:08and all of that you know
0:04:11so the belief was that information flowed in a unidirectional fashion from DNA to RNA to
0:04:16protein
0:04:18more recently
0:04:20people have realised that RNA plays a very active role
0:04:23in
0:04:24but
0:04:26people have realised that there are these
0:04:28additional types of RNAs
0:04:30and the characterization of these RNAs is by what they do not do
0:04:33not by what they do
0:04:34so they do not code for proteins
0:04:36they are referred to as non-coding RNAs
0:04:38because they are providing a function
0:04:40without really being translated into protein, so they are non-coding
0:04:44and that in these numbers being really discover
0:04:48or do not but right of the C O a what a different point in time for a variety of
0:04:52these
0:04:53and this is showing but
0:04:54sort of a can you please bass
0:04:56can happen
0:04:57one of these is we must your on it this
0:05:01it catalyzes its own splicing
0:05:04which is cutting out of a segment
0:05:06all the concentrate on an a
0:05:08from this to produce actual a money
0:05:10it operate in it and down plates
0:05:12we use probably and the that
0:05:15reactions actions in the cell
0:05:17so there's a variety of these functions that RNAs perform
0:05:21in these rules
0:05:22unlike the role that you're familiar with in protein synthesis
0:05:26where what the RNA is coding is what is important
0:05:29in these second roles for RNA
0:05:31it is the structure which determines
0:05:33function
0:05:35and that's true for almost all molecules
0:05:38including proteins
0:05:39and that's the reason why this problem
0:05:42is a grand challenge problem in science today
0:05:46experimental determination of structure
0:05:48is quite challenging
0:05:50it is done with X-ray
0:05:51crystallography
0:05:52and this is difficult to do because you have to purify a sample and crystallize it
0:05:56and then there are questions finally of whether those conditions actually represent physiological conditions
0:06:00in the body
0:06:01in which the molecule actually operates
0:06:04so what we'll be interested in is computational estimation of
0:06:07RNA secondary structure
0:06:09and if you can do this
0:06:11the kind of things you can answer
0:06:12are what is the function of these non-coding RNAs
0:06:15because once you know structure
0:06:17you have a way of inferring what should be the function
0:06:20you also have a ease of understanding this you know
0:06:22i do you know so it is a whole sequence based
0:06:25what one on is
0:06:27for the reasons that a lot lately to the structure is what
0:06:30the function
0:06:31so that we need to work so that you will have more eight sequences were from the same
0:06:35have the same structure of on the same function
0:06:37so you would like to be able to figure out which of these are in different out
0:06:41rather than comparing
0:06:43based on sequence to sequence you'd like to compare based on
0:06:46structure
0:06:47finally a as the standing close you got like to users
0:06:50a a it's right i
0:06:51that's the the quality
0:06:53of such a prediction
0:06:54to be able to
0:06:55some sized it's
0:06:56rather than
0:06:57just test
0:06:58a right but if you know
0:07:02one thing which was
0:07:03spherical in at feast for i and six are any collection structure prediction as compared to prove
0:07:09and that are any have a
0:07:11i mentioned to you that this primary structure consists
0:07:14of a linear chain molecule which is laid out from the five prime to the three prime end and this
0:07:18just rolled away way of for but was a fitting the space
0:07:21is it just the in monte Q
0:07:23this
0:07:23folds on itself through the formation of these complementary base pairs
0:07:28this is i'm not as to that of mind that in the time i
0:07:31we N betting with side side
0:07:33in the N a
0:07:35a that in this case time as the base that you're so so you have a you
0:07:38in addition you also have a
0:07:40do you pairs
0:07:41and R
0:07:43is also possible
0:07:45so this same one you you see from here by frame to prime
0:07:48is laid out over here
0:07:49and i don't
0:07:50i
0:07:51or for me and the sign to see what you can see that it's going around round i'm coming well
0:07:55the be an on coming back
0:07:58this is referred to as the primary structure which is the sequence
0:08:01and this is determined by sequencing, i mean this is what high throughput sequencing does
0:08:06what you're interested in then is predicting the secondary structure, once you've predicted this the tertiary
0:08:10structure
0:08:11prediction also becomes
0:08:13easier because you already know the interactions that are there
0:08:17and and i think is this
0:08:18progression of interactions
0:08:20is that's simply strong
0:08:21already mentioned this is very strong ones here a one i one
0:08:25and over here that wants all much speaker
0:08:28a the trash that this little are given you because
0:08:31so there's progression of prediction
0:08:34more prediction provision of formation of structure
0:08:37also guys the mechanisms by but you pretty
0:08:39so our goal in this work will be the prediction of secondary structure
0:08:45this is referred to commonly as folding of an RNA molecule
0:08:49and
0:08:49RNA has a much greater variety of structure than DNA
0:08:53here is an example of an RNA molecule
0:08:55this is
0:08:56RNase P
0:08:58and you will see that what you have
0:08:59are these various motifs
0:09:01which are made up of helices
0:09:03and loops
0:09:04do the two
0:09:05types
0:09:06to is as bad or just to describe them
0:09:09so this, although it is drawn flat as a ladder over here, actually is a helix
0:09:13and the way that a wasn't you'd
0:09:15structure
0:09:16and then you have these these two
0:09:18and i was applied for data
0:09:20what set of base pairings a lotta
0:09:22given the C
0:09:26now
0:09:27like
0:09:28you stop with just was and dynamics
0:09:30will be used only in the
0:09:32dominant
0:09:34in the room
0:09:35you can have a variety of different structures
0:09:38the probability of a given structure
0:09:40is proportional to
0:09:41this one quantity
0:09:42the Boltzmann factor
0:09:44which is
0:09:45exponentially decreasing
0:09:47with the free energy change
0:09:48you have a i have free energy
0:09:50but have just like a structure energy that that you want
0:09:54so the most likely structure becomes the one which minimizes free energy
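The relation between free energy change and structure probability described above can be sketched in a few lines of Python. This is only an illustration: the three free energy values are hypothetical, and the temperature (37 °C) and gas constant are assumptions, not numbers given in the talk.

```python
import math

# Assumed constants: gas constant in kcal/(mol*K) times 37 degrees C in kelvin.
RT = 0.0019872 * 310.15

def boltzmann_probabilities(free_energies):
    """Map free energy changes (kcal/mol) to equilibrium probabilities.

    Each structure gets a weight exp(-dG / RT); dividing by the sum of the
    weights (the partition function) turns the weights into probabilities.
    """
    weights = [math.exp(-dg / RT) for dg in free_energies]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical free energy changes for three candidate structures.
dgs = [-12.0, -10.5, -3.0]
probs = boltzmann_probabilities(dgs)

# The most probable structure is the one with the lowest free energy.
most_likely = max(range(len(dgs)), key=lambda k: probs[k])
```

Because the weight is exponential in the free energy change, a difference of a few kcal/mol already makes the lowest-energy structure dominate the ensemble.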
0:09:58and accordingly techniques for prediction of secondary structure
0:10:01work by
0:10:02coming up with models
0:10:03which predict this free energy for a structure
0:10:05one of the most efficient models tends to be one which is called the nearest neighbor model
0:10:09it looks at
0:10:11base pairing interactions with respect to the one nearest neighbouring base pair
0:10:15and has a way of computing the total free energy change
0:10:18in terms of these
0:10:19base pairings
0:10:20right
0:10:22as work also done
0:10:23i didn't to protest just to a and not in as lab
0:10:26who was a chemistry
0:10:27not for speeding and a work but this
0:10:29a model which is down use but i mean i
0:10:33so one can imagine various algorithms now for predicting secondary structure
0:10:38by trying to minimize free energy
0:10:39and that's something which has been done prior to our work
0:10:43using dynamic programming
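As a rough illustration of how a dynamic program can search over all secondary structures, here is a minimal Python sketch. It deliberately swaps in a much simpler objective than the one in the talk: it maximizes the number of canonical base pairs (a Nussinov-style recursion) rather than minimizing nearest neighbor free energy. The function names and the minimum loop length of 3 are assumptions made for the example.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def can_pair(a, b):
    """True if the two bases can form a canonical or G-U wobble pair."""
    return (a, b) in CANONICAL

def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq.

    best[i][j] holds the optimum for the subsequence seq[i..j]. Either base j
    is unpaired, or it pairs with some base k, which splits the problem into
    the part left of k and the part enclosed by the pair (k, j).
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: base j is unpaired
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

For example, `max_base_pairs("GGGAAAUCC")` finds the three nested pairs of a single hairpin. A free energy minimizer has the same overall shape but uses energy terms for stacked pairs and loops instead of a count of pairs.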
0:10:45the
0:10:46what i want to remark here is this method is very much analogous to the Viterbi
0:10:50algorithm
0:10:52it figures out what is the minimum free energy structure
0:10:54among the set of possible ones, the dynamic program you do is analogous to Viterbi
0:10:58for those of you who work on estimation and decoding you also know that
0:11:03there is a counterpart algorithm
0:11:05the BCJR algorithm
0:11:07which does this in a soft sense
0:11:09and there is also an analog of it in this setting which is referred to in the chemistry
0:11:13literature as the partition function
0:11:15which computes the probability
0:11:17of
0:11:18a base at location i pairing with a base at location j
0:11:21and i won't talk about how this is done but
0:11:24it is also computed by using a dynamic program
0:11:29so there are two techniques, one is a hard decision which you can
0:11:32do, which is
0:11:33finding the structure which minimizes free energy
0:11:36the other is a prediction of base pairing probabilities
0:11:39in the sequence
0:11:42i don't sell
0:11:44what is the connection the
0:11:48double for the
0:11:49yeah but i
0:11:50a the same day
0:11:52and then try for white
0:11:53a joint decoding
0:11:55you can't do joint decoding exactly because
0:11:57it's computationally expensive
0:11:59so you do
0:12:00approximate joint decoding by using iterative decoding and passing probabilistic information from one sequence to the other
0:12:06well
0:12:07in nature it turns out
0:12:08the same structure
0:12:11which performs the same function
0:12:13is encoded as different
0:12:14sequences
0:12:15and
0:12:16that's the connection
0:12:17to turbo decoding
0:12:18so over here we showing of what is the R any
0:12:22across different organisms
0:12:24and that of and and this to do with here
0:12:26and what you would see is that the structure is the same, and when i say same i mean
0:12:30in a topological sense rather than a very exact sense
0:12:34some sir
0:12:34a some that about
0:12:35to use
0:12:36but if you look at these closely you will see that there are
0:12:40bases which are modified, for instance this G-C pair here is changed to an A-U below
0:12:45it becomes obvious that as long as you make mutations
0:12:48in a compensating fashion
0:12:49each time you change a G to an A you change the corresponding C to a U
0:12:53you can still maintain the base pairing interaction and maintain the integrity of the secondary structure
0:12:58so the structure can still be formed and the organism is most likely to be viable
0:13:02and will therefore be seen in nature
0:13:04so multiple encodings
0:13:06of the same structure are provided to us
0:13:09by nature through its process of
0:13:11compensating mutations
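The idea of compensating mutations can be checked mechanically: a helix survives as long as every paired column can still form a base pair in every homolog. The short Python sketch below illustrates this. The sequences, the helix positions, and the function name are hypothetical, and the sequences are assumed to be already aligned without gaps.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def pairs_conserved(aligned_seqs, pairs):
    """True if every column pair (i, j) can base pair in every sequence."""
    return all((s[i], s[j]) in CANONICAL for s in aligned_seqs for i, j in pairs)

# One hairpin shared by two hypothetical homologs. In the second sequence the
# G-C pair at columns (1, 8) has become an A-U pair: a compensating mutation.
helix = [(0, 9), (1, 8), (2, 7)]
homologs = ["GGCAAAAGCC", "GACAAAAGUC"]
compensated = pairs_conserved(homologs, helix)

# Changing only one side of the pair (G to A without C to U) breaks the helix.
broken = pairs_conserved(["GGCAAAAGCC", "GACAAAAGCC"], helix)
```

Here `compensated` is true and `broken` is false, which is the covariation signal that multi-sequence methods exploit.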
0:13:12and our goal is to try and predict secondary structures by using these multiple homologs together
0:13:18just as in turbo decoding
0:13:20you want to use them collectively to decode
0:13:23and
0:13:23you can now look at a similarity to in them you can see the same you see the responding regions
0:13:28and
0:13:29also in addition you have the information from the alignment of these two C
0:13:33a a of what alignment
0:13:35but you also have
0:13:36information about alignment from this
0:13:38and so the goal of joint structure prediction and alignment
0:13:42is to come up with a prediction of these structures
0:13:45as well as a conforming alignment
0:13:48and obviously constraints from one impose constraints on what we can do with the other
0:13:54so this is
0:13:54in some sense the framework, our goal is to take a number of input sequences
0:13:58and construct a predictor that will predict
0:14:01structures of these
0:14:02and also output
0:14:03an optional alignment
0:14:07that's so that you phone like this all so that a programming out there is a mapping by just cycle
0:14:12this
0:14:13and again the similarity with turbo coding
0:14:16is very
0:14:17telling
0:14:17this is exponential in complexity
0:14:19in the number of sequences
0:14:21you can the joint
0:14:22according
0:14:23two sequences and double
0:14:25you can give indications and the decoding
0:14:27the complex exponential and the interleaving that
0:14:30so this is something which is not feasible
0:14:32even for two sequences
0:14:34this is something you cannot do without think one
0:14:38so
0:14:38our goal is to try and come up with a probabilistic technique
0:14:42that does this
0:14:43by iteratively computing
0:14:46single sequence folding-like probabilities
0:14:48and updating these as you go from iteration to iteration
0:14:51in much the same way as turbo decoding
0:14:54for
0:14:54i
0:14:57so rather than talk about this in detail i will
0:14:59present this
0:15:00in
0:15:01pictorial form
0:15:03so the way you can look at this in pictorial form
0:15:05is that you have these two sequences
0:15:08which have this structure that can be shown in this
0:15:11lower triangular matrix over here
0:15:13which is showing what are the base pairing interactions
0:15:15so this base
0:15:16at this location
0:15:18pairs with the base at this location
0:15:20in the screen in this very that we're here
0:15:22and so on so these
0:15:23he is of or at least traces of lines as is you here
0:15:27corresponding to sequence two you have a corresponding
0:15:31set of
0:15:32base pairs shown over here
0:15:34and then there is the alignment which is between the two sequences
0:15:38if you just try to predict
0:15:40the best possible secondary structures
0:15:42you can run a dynamic program and figure out
0:15:45what is the bar
0:15:47for
0:15:49alignment
0:15:49and what are the bearing interactions that way
0:15:51maximise
0:15:52you free energy chi
0:15:57in order to do this in the turbo framework
0:15:59we cannot live with hard decisions
0:16:01so the first thing that you do
0:16:03is to actually represent this
0:16:04in
0:16:05a soft framework where the information is probabilistic
0:16:07so the base pairings become
0:16:10probabilities of base pairing
0:16:11and the alignments become probabilities of alignment
0:16:15and then if you sequences as the figure you see
0:16:21at this point
0:16:22you realise that
0:16:24if there is a very likely
0:16:26base pair in the first sequence
0:16:27and it's highly likely
0:16:29that
0:16:30the five prime end of that base pair is aligned with a given five prime nucleotide of the
0:16:33second sequence and the three prime end of that base pair is aligned with a three prime nucleotide of the second sequence
0:16:38it's providing you information
0:16:40about the folding of the second sequence
0:16:42and this is the information that you get a a bit of an alice
0:16:47so
0:16:47we can easily infer
0:16:50a priori probabilities
0:16:51of base pairing
0:16:52for the second sequence
0:16:54by using the information
0:16:55of base pairing in one sequence along with the alignment probabilities
0:16:59and because everything is probabilistic
0:17:01all the information is soft
0:17:03and this is something we can now incorporate
0:17:05in the folding of the second sequence
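The inference step just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact update from the talk: it assumes the information induced on the second sequence is the sum, over all base pairs of the first sequence, of the pairing probability multiplied by the two alignment probabilities of its ends. The function name and the toy matrices are hypothetical.

```python
def induced_pairing_info(pair_prob_1, align_prob):
    """Map base pairing probabilities of sequence 1 onto sequence 2.

    pair_prob_1[i][j]: probability that bases i and j of sequence 1 pair.
    align_prob[i][k]:  probability that base i of sequence 1 is aligned
                       with base k of sequence 2.
    Returns info[k][l], the support for a pair (k, l) in sequence 2.
    """
    n1 = len(pair_prob_1)
    n2 = len(align_prob[0])
    info = [[0.0] * n2 for _ in range(n2)]
    for i in range(n1):
        for j in range(n1):
            p = pair_prob_1[i][j]
            if p == 0.0:
                continue
            for k in range(n2):
                a_ik = align_prob[i][k]
                if a_ik == 0.0:
                    continue
                for l in range(n2):
                    # Pair (i, j) supports (k, l) when i aligns to k and j to l.
                    info[k][l] += p * a_ik * align_prob[j][l]
    return info

# Toy case: sequence 1 (length 4) pairs bases 0 and 3 with probability 0.9,
# and base i of sequence 1 aligns with base i + 1 of sequence 2 (length 5).
pair_prob_1 = [[0.0] * 4 for _ in range(4)]
pair_prob_1[0][3] = pair_prob_1[3][0] = 0.9
align_prob = [[0.0] * 5 for _ in range(4)]
for i in range(4):
    align_prob[i][i + 1] = 1.0
info_2 = induced_pairing_info(pair_prob_1, align_prob)
```

In this toy case all of the support lands on the pair (1, 4) of the second sequence. With soft alignment probabilities the support is spread over several candidate pairs instead.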
0:17:08if you have three
0:17:09sequences
0:17:10the process is not much different
0:17:12you can use
0:17:13this information from two sequences
0:17:16and
0:17:16infer what are the probabilities of base pairing
0:17:19for the third sequence
0:17:21in the same way, and you can combine these, and there is a weighting scheme that we come up with
0:17:27so here is essentially how the
0:17:29scheme works
0:17:29if you're trying to predict what is the
0:17:32extrinsic information, the information provided for the folding of a given sequence X M
0:17:37by other sequences
0:17:39you use information from all the other sequences, the corresponding alignment probability matrices
0:17:43infer these
0:17:44weight them appropriately
0:17:46combine them to come up with the
0:17:48extrinsic information for pairing of a given sequence
0:17:50this extrinsic information can be incorporated into the framework in much the same way as is done in
0:17:55turbo coding
0:17:56it has an interpretation
0:17:57as
0:17:58an a posteriori probability
0:18:00in the stuff that you have a
0:18:02in the what decoding coding also
0:18:03i well it a lot of to drop the you of see and or or or a good also has
0:18:07the structure of to to the updating the base being property
0:18:13oh is the summary of this
0:18:15and then want to a it this you can find pretty how would the high prediction
0:18:18how
0:18:21oh present this
0:18:22before i make that presentation i want to point out the computational complexity
0:18:26is also
0:18:27similar to turbo decoding
0:18:29we keep the complexity comparable to single sequence folding
0:18:32while trying to get the benefits
0:18:34of joint sequence folding
0:18:35the joint sequence folding would be exponential in the number of sequences, this is
0:18:39you do the for two D and to the part okay
0:18:42a whereas as a complexity is you square
0:18:46so we can uh well it is
0:18:48a look at how these performed
0:18:50and
0:18:51i will give you the results quickly
0:18:53so
0:18:53we evaluate this over a benchmark dataset with known structures
0:18:58and we evaluate these by looking at sensitivity versus PPV
0:19:01sensitivity is the fraction of actual base pairs that are predicted correctly
0:19:05PPV is the fraction of predicted base pairs that are correct
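The two evaluation measures can be written down directly. The Python sketch below assumes a structure is represented as a set of (i, j) base pair index tuples and that a predicted pair counts as correct only on an exact match with a known pair. The example pairs are hypothetical.

```python
def sensitivity_ppv(predicted, known):
    """Return (sensitivity, PPV) for two collections of base pairs.

    sensitivity: fraction of known pairs that were predicted.
    PPV:         fraction of predicted pairs that are in the known structure.
    """
    predicted, known = set(predicted), set(known)
    correct = len(predicted & known)
    sensitivity = correct / len(known) if known else 0.0
    ppv = correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv

# Hypothetical example: 4 known pairs, 3 predicted, 2 of them correct.
known_pairs = {(0, 9), (1, 8), (2, 7), (3, 6)}
predicted_pairs = {(0, 9), (1, 8), (4, 5)}
sens, ppv = sensitivity_ppv(predicted_pairs, known_pairs)
```

A method that predicts very few pairs can score a high PPV with low sensitivity, and the reverse holds for one that predicts many, which is why the two are reported together.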
0:19:07so that
0:19:08so where you want to be
0:19:10is in the upper right corner over there
0:19:13and here is
0:19:14TurboFold
0:19:15for three sequences, TurboFold for ten sequences
0:19:18and
0:19:19or need techniques this is log on any which is a technique would just probably stick information
0:19:24on a highly four
0:19:26and single sequence folding
0:19:28so the message here is that by using this information in an iterative
0:19:32fashion
0:19:33you do significantly better than these other techniques
0:19:36time i'm also be disk
0:19:37if a better than these on an L for is much faster but then you can always give the wrong
0:19:42on so
0:19:42but
0:19:45to conclude
0:19:45i presented TurboFold, a multi-sequence
0:19:48structure prediction technique
0:19:49which has strong analogies with turbo decoding and is motivated by this analogy
0:19:54and provides accuracy that is close to or higher than other methods
0:19:58while having complexity
0:19:59similar to single sequence folding
0:20:02i this collection
0:20:03coding T
0:20:04and for the
0:20:05to
0:20:31yeah
0:20:38so Kevin Weeks is the one who works on SHAPE-based techniques
0:20:43we are collaborating with him, trying to see how we can incorporate SHAPE
0:20:46data in addition to the data that we are incorporating
0:20:49there's a very strong analogy in the way the SHAPE data can also be incorporated
0:20:53it's also was to get property which you can be
0:20:57in do the forming of sequence
0:20:59traditionally that has been a a sequence you can single sequence for
0:21:02you working on trying to see how you and are in that the much as C
0:21:07that's right
0:21:08i will
0:21:10and more recently also like to see how we can apply this to a I V N S I B