hi everyone i'm anish
and this is mike and we're going to talk about predictive text input
and first of all thank you all for attending our talk
this is our first time at guadec
and i was talking to several people about input methods and nobody was really
interested in this stuff so i guess you are the people who are interested and
that is really good for us
so first of all i would like to thank the gnome team because they are the
first one that for that
the there S all the audience as well
who are really interested in using gnome in all the languages
and maybe last year we integrate that i would see the norm and that was
you most were listing but we had some this solar discussion i don't it around
ignore mailing list and things but honestly for us
is it by testing which put how have one in the back stop
and i would really like to thank john than the T S
and we for the work
and now let's start the talk
i'm going to talk about
what input methods are
then why
predictive text input matters and a bit of the theoretical part behind
it and then
the projects we are currently working on so that you really get to know
more about
the predictive text stuff and
that's the agenda for today
and if you have any questions at any time please feel free to interrupt us
so that we can answer them at that point so i'll be happy to
take the questions as well
so let's start
what are input methods i added this slide
because most of you may not know what input methods are because
most people here are using
the spanish keyboard or the english keyboard or some similar keyboard
so i thought it would be a really good idea to have a
slide like this
so
there are two types of input methods
roughly
one is character-based input methods and the other is sentence-based input
methods
so character-based input methods are basically for indic languages and
korean or vietnamese and we call them transliteration-based input methods
why we call them transliteration based is because we have a conversion between the
ascii characters or latin characters and the characters of
the other languages so that is why we call them character-based input methods and for
chinese and japanese we call them
sentence-based input methods because in those languages you don't
have a
space in between the words so it's really complex to build such input methods
if you see what a japanese
sentence looks like
it looks like this
this one
that is the whole sentence
and it is nothing but our names in japanese
honestly i really don't know much about japanese but mike does so he has
typed those characters and if you look there are no spaces in between the characters
naturally there is no space in between words in chinese or
japanese text so it becomes really hard
to type japanese and chinese on the computer
because we have only i guess thirty-odd keys in general
for the alphabet so to type characters other than the
english or latin characters is a really difficult job
and if right now i use
the computer in my mother tongue
and you look at the current state of
input methods
on linux
after typing something you feel like this
it makes this kind of face
why for example
say i want to type "gnome" in my own language on the desktop
ideally it should take only a few keystrokes but apparently
it takes around nine keystrokes and that makes me mad why do i need
to type nine keystrokes for a word which i could type easily with an english or
latin keyboard
so predictive text is one of the ways we are trying to solve that
problem so that you
have to type less
you get some suggestions and
maybe this slide will make this face
happy
and
the need for such
predictive input methods i added this slide
only today because i was listening to the keynote by
a date and let's more when that actually arms
i mean four buttons now but he has shown he had shown the you with
the next
that that's okay
and
he showed some statistics about brazil so i thought why not show something
about india because we have a population of about one point two one billion out of
which seventy four percent
can read and write in their own language and out of which
five to six percent of the whole population can understand english
i explicitly added this because i've been travelling in europe for the last seven days and
i met several people who have the misconception about india that everyone in india
can understand english and that is really false
out of this population only five to six percent of
the total population can understand english
and the others among the one point two billion
also want to use technology they use the operating
system or mobile devices and if you don't provide better prediction
kind of things they are not going to use
the software for example
in the last year some of the mobile companies officially sold more than
two million android devices and why it's so popular in india is because on
android you get a lot of free apps as well as good input methods
apparently in this room as well we use all kinds of input methods on mobile
or android devices
and if you look at the statistics we have twenty two officially
recognised languages and a number of scripts
and
if you look at the rest of the world there are so many
good languages and users and we should provide good input methods to them so
that it will help to preserve the languages
and another point is that we also have touch input now on mobiles
or tablets
and maybe for that we need predictive input methods as well
and another thing for example if you know one language and you are really
good at typing in that language but apparently most of us speak more than one
language we know one language really well but we don't really know the
other language that well and typing in such languages is really hard
for example if you go to china people there are really good at chinese
but if you tell them to type in english it becomes hard
because they know the language but they are not
really good at that particular language
so that is the need for such input methods
and
let's talk about how we can implement such things in practice because getting
these predictions
is really hard because we have a huge number of words in a language
and how can you predict the next one
because you really don't know what i'm going to say next
so there are several techniques we use such as
statistical techniques and probability to predict the next word
so
we model it as a language model
so a language model is nothing but this
we just
consider the problem
in any given language what is the probability that one word follows the
word before it
for example right now i'm speaking about predictive text so you
can guess that my next word or my next sentences would be something regarding the
language model
so similarly predictive text
or autocomplete or any input method does the same thing
then we have the simple language model in which you take the number
of occurrences of a word
and divide it by the total number of words in the corpus so that you get
the probability because some words tend to occur together
for example "i am going" so whenever i say "i" then the probable
next word would be "am"
of course i'm not saying it will be exactly that word
but it is just probable
so
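The "i" followed by "am" idea can be sketched in a few lines of Python. This is only a minimal illustration: the toy text is invented here and is not from the slides, and a real input method would count over far more data.

```python
from collections import Counter

# Toy text, invented for this illustration (not from the talk's slides).
text = "i am going home . i am happy . i was going out"
words = text.split()

# Count how often each word follows the word "i".
followers = Counter(
    nxt for prev, nxt in zip(words, words[1:]) if prev == "i"
)
total = sum(followers.values())

# Relative frequency = estimated probability of each next word after "i".
probabilities = {word: count / total for word, count in followers.items()}

# The most frequent follower is the most probable next word.
best_guess = followers.most_common(1)[0][0]
```

Here "am" follows "i" twice and "was" once, so "am" is the more probable suggestion, though not a certain one.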
if you know a little about mathematics well ideally i don't want to go in that
direction because it's kind of boring and you would not like it much
so a mathematician i guess in the nineteen sixties or seventies had
proposed a really good
theory which is like
if you know the history
and
the history stays the same then you can calculate the future
so since then it has been used in machine learning techniques
but
you can only estimate the next word you can only predict the next word
and that probability is kind of eighty percent you can't be a hundred percent
right
because we are humans and with the human mind we really don't
know what
we would do next
so that makes it really hard for text prediction
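The idea that a limited history is enough to estimate what comes next is usually written as the Markov assumption (the talk later refers to a "markov model"). As a sketch in standard notation, where the symbols $w_i$ for the i-th word and $n$ for the model order are introduced here and are not from the slides:

```latex
P(w_i \mid w_1, \dots, w_{i-1}) \;\approx\; P(w_i \mid w_{i-n+1}, \dots, w_{i-1})
```

For $n = 2$ only the single previous word is used, and for $n = 3$ the previous two words.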
so the probability of a word depends on the probability of the n previous
words that is the basic thing that has been used in
text prediction so we calculate unigrams bigrams and trigrams a unigram is
nothing but a single word a bigram is nothing but a set of two words
and a trigram is nothing but a set of three words so for example "gnome is awesome"
so "gnome" is a unigram
"gnome is"
is a kind of bigram and "gnome is awesome" is a trigram
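A short Python sketch of cutting a phrase into unigrams, bigrams and trigrams. The phrase "gnome is awesome" is what the example on the slide appears to be (an assumption based on the recording), and the helper name `ngrams` is made up for this illustration.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "gnome is awesome".split()

unigrams = ngrams(words, 1)   # single words
bigrams = ngrams(words, 2)    # pairs of neighbouring words
trigrams = ngrams(words, 3)   # triples of neighbouring words
```

A three-word phrase gives three unigrams, two bigrams and one trigram.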
so
you can calculate such probabilities on a huge corpus say we have
n words in a given sentence so we try to calculate the
unigrams bigrams
or trigrams and
depending on that we try to predict the next word
for example
so for example say you have three sentences
"guadec is awesome" "gnome is awesome" and "gnome shell is awesome" so there are
three different sentences
and start and stop at the beginning and the end
are what you can consider special symbols
so that you can tell that a sentence has started and that a sentence has
finished
so in this example the vocabulary in your document
or in your corpus here
would be "start" "guadec" "gnome" "is" "awesome" "stop" and "shell"
and if you want to calculate the unigram model for example a
probability from this corpus you might want to consider the probability
of the word "guadec" so it's one by sixteen how come it is one by
sixteen because "guadec" is used once in the whole corpus
and the number of words in the corpus counting start and stop is sixteen so the probability is one divided
by sixteen
similarly the probability of "is" is
three divided by sixteen and if you apply the same logic here you
get the
unigram model so similarly if you want to apply the same logic to
the bigram model as i said a bigram is a set of two words
so it's the count of "start gnome"
divided by the count of "start" that means the number of times "gnome" is
used
at the start of a sentence in this whole corpus divided by the number of sentences starting with the start
symbol so it's two by three so if you apply the same logic to
the whole sentence
for example
the probability of "guadec is awesome" with start and stop
you want to apply the same logic to the
words and then you will get the probability of "guadec" given start into the probability
of "is" given "guadec" and so on to the end
so it's kind of multiplied
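The arithmetic of this example can be checked with a small Python sketch. The three sentences are reconstructed from the recording (they give exactly sixteen tokens once start and stop symbols are added), and the function names and the `<s>` / `</s>` markers are assumptions made for the illustration.

```python
from collections import Counter
from fractions import Fraction

START, STOP = "<s>", "</s>"

# Corpus as reconstructed from the talk; treat it as an assumption.
corpus = ["guadec is awesome", "gnome is awesome", "gnome shell is awesome"]
sentences = [[START] + s.split() + [STOP] for s in corpus]

unigram_counts = Counter(w for s in sentences for w in s)
bigram_counts = Counter(pair for s in sentences for pair in zip(s, s[1:]))
total_tokens = sum(unigram_counts.values())


def p_unigram(word):
    """count(word) / number of tokens in the corpus."""
    return Fraction(unigram_counts[word], total_tokens)


def p_bigram(word, prev):
    """count(prev word) / count(prev); prev must occur in the corpus."""
    return Fraction(bigram_counts[(prev, word)], unigram_counts[prev])


def p_sentence(text):
    """Multiply the bigram probabilities along the whole sentence."""
    tokens = [START] + text.split() + [STOP]
    prob = Fraction(1)
    for prev, word in zip(tokens, tokens[1:]):
        prob *= p_bigram(word, prev)
    return prob
```

With this corpus `p_unigram("guadec")` is 1/16, `p_unigram("is")` is 3/16 and `p_bigram("gnome", START)` is 2/3, matching the numbers in the talk. A bigram that never occurs gets probability zero, which is the problem with unknown sentences.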
so that's all about the theoretical part and then there are again
things like
what if you get unknown sentences
and how to normalise for such sentences but i really don't want to get
into that complexity
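The talk does not say which technique is used for unseen word pairs. One common textbook option, shown here purely as an assumption, is add-one (Laplace) smoothing, which gives every possible pair a small non-zero probability. The corpus is the same reconstructed three-sentence example and the name `p_smoothed` is made up.

```python
from collections import Counter

# Same reconstructed example corpus; an assumption, not the slide itself.
corpus = ["guadec is awesome", "gnome is awesome", "gnome shell is awesome"]
sentences = [["<s>"] + s.split() + ["</s>"] for s in corpus]

unigram_counts = Counter(w for s in sentences for w in s)
bigram_counts = Counter(pair for s in sentences for pair in zip(s, s[1:]))
vocabulary_size = len(unigram_counts)  # distinct tokens, including <s> and </s>


def p_smoothed(word, prev):
    """Add-one estimate: (count(prev word) + 1) / (count(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (
        unigram_counts[prev] + vocabulary_size
    )
```

The pair "guadec shell" never occurs in the corpus, yet `p_smoothed("shell", "guadec")` is small but not zero, so a sentence containing it no longer gets probability zero.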
so let's talk about the projects we are working on so one of the projects
is ibus typing booster that we are working on so at this
point of time
i didn't get to them i couldn't talk i would be
i posted melissa
so i tried to
demonstrated it's
so it
should
so
i guess okay so we implemented something like that as ibus typing booster
and it supports most languages which
can be easily transliterated so as anish already said it doesn't support
chinese and japanese because
an extra more complicated step for the conversion to chinese characters is necessary but
practically all other languages where after transliteration the input is already finished are supported
and also those where direct keyboard input is already enough
and it uses the well-known input methods from m17n
so users who know these don't need to get used to new stuff
and the hope is to improve typing speed a lot by getting very good predictions
while typing so that you only have to select the right word
and
most of the prediction comes from what the user types it learns from
the user input
and one can speed it up by
giving it some typical text of what the user usually types to reduce the time
needed for learning
and to explain the details the prediction is based on the previous two words
in the database and if no suitable word
can be found in the database it falls back to hunspell dictionaries and
shows predictions from the hunspell dictionaries and it also uses hunspell for correcting minor spelling
errors
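The lookup order described here (previous two words in the database first, dictionary as a fallback) can be sketched with Python's standard `sqlite3` module. This is not the project's actual code or schema: the table layout, the function names and the tiny `DICTIONARY` list standing in for a Hunspell word list are all hypothetical, and spelling correction is left out.

```python
import sqlite3

# Hypothetical stand-in for a Hunspell word list.
DICTIONARY = ["hello", "help", "helmet", "world"]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE phrases (p2 TEXT, p1 TEXT, word TEXT, freq INTEGER,"
    " PRIMARY KEY (p2, p1, word))"
)


def learn(text):
    """Record every word together with the two words typed before it."""
    words = ["", ""] + text.lower().split()
    for p2, p1, word in zip(words, words[1:], words[2:]):
        cur = db.execute(
            "UPDATE phrases SET freq = freq + 1"
            " WHERE p2 = ? AND p1 = ? AND word = ?",
            (p2, p1, word),
        )
        if cur.rowcount == 0:  # first time this triple is seen
            db.execute("INSERT INTO phrases VALUES (?, ?, ?, 1)", (p2, p1, word))


def predict(p2, p1, typed, limit=5):
    """Suggest completions of `typed` given the two previous words."""
    rows = db.execute(
        "SELECT word FROM phrases WHERE p2 = ? AND p1 = ? AND word LIKE ?"
        " ORDER BY freq DESC, word LIMIT ?",
        (p2, p1, typed + "%", limit),
    ).fetchall()
    if rows:
        return [row[0] for row in rows]
    # Nothing learned for this context: fall back to the dictionary.
    return [w for w in DICTIONARY if w.startswith(typed)][:limit]
```

After `learn("see you at lunch")` twice and `learn("see you at last")` once, `predict("you", "at", "l")` ranks "lunch" above "last", while `predict("", "", "hel")` finds nothing learned and falls back to the dictionary words.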
and currently the front end is implemented in python and the
database uses sqlite
and i
will show a little bit how it works
so
so i'm using the german ibus typing booster
first of all i delete everything which has been learned so far
to demonstrate that
S G and it
so if i type some german text
you see the second time i typed that
word
i could just select it after typing one letter and you see it already predicts
the next word based on the previous context
actually i mistyped the last word
because i made a typing mistake and so the first suggestion is wrong
and if i want to delete this from the database i can select it not with
"1" but with control "1" and so now this suggestion is gone from the
database
and to speed up this learning process
i can feed it some
text file
i can select learn from text file
so for example i have
some book which the system has read
and now if i
look at some text in this book i can easily input
the same text again with very little typing
you see that i'm using the german typing booster but actually what i typed here is english
so it doesn't really matter for the learning what language you are using
you can mix the languages freely
and
currently we still have different engines for every language but i want to merge these
so that we need fewer engines
to support the same languages
without such a large number of engines
it's to something else like for a nice model to
so you can also use the same system for practically any language
i don't know what this means that comes out here
and
for korean you see that the
first character of each suggestion
is in hangul actually so
i've typed only one jamo and the first character of the suggested words has this as its
first jamo
in korean
okay that's all i think for the demonstration
and
cool
well
i think
so the current problem with ibus typing booster
is
that
you can't use the same code for other engines
or if you want to use the same code it's really tedious so we have
started one more project
and
it's a text prediction library which is written in vala so that
you can use it in other projects as well
it's nothing but the library
handles all the key events and the client just has to subscribe
to the text predictions so that once you have subscribed you'll get the predictions as they come
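The subscribe pattern described here can be sketched as follows. The real library is said to be written in Vala and its API is not shown in the talk, so the class `PredictionService` and its methods are entirely hypothetical and are written in Python only to illustrate the division of work between library and client.

```python
from collections import Counter, defaultdict


class PredictionService:
    """Hypothetical sketch of a subscribe-style prediction service."""

    def __init__(self):
        self._subscribers = []
        self._followers = defaultdict(Counter)  # previous word -> next-word counts
        self._previous = ""
        self._current = ""

    def subscribe(self, callback):
        """Register a function that receives each new list of predictions."""
        self._subscribers.append(callback)

    def learn(self, text):
        """Count which word follows which in the given text."""
        words = text.split()
        for prev, nxt in zip([""] + words, words):
            self._followers[prev][nxt] += 1

    def key_event(self, char):
        """The service, not the client, tracks what has been typed so far."""
        if char == " ":
            self._previous, self._current = self._current, ""
        else:
            self._current += char
        ranked = self._followers[self._previous].most_common()
        predictions = [w for w, _ in ranked if w.startswith(self._current)]
        for callback in self._subscribers:
            callback(predictions)
```

A client such as an on-screen keyboard would only call `subscribe` once and forward key presses; it never has to implement the prediction logic itself.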
and
and next we honestly need your help
we need help in testing
and suggestions for improvements or new features because you guys are the
users and if you have some suggestions
we are happy to implement those kinds of things
and again the hunspell dictionaries which we are using now i honestly don't think
anybody maintains many of those hunspell dictionaries
maybe five to six years ago somebody created them
and that was it for some of the hunspell dictionaries and we would like to improve those
and also
the creation of corpora
that is the thing which is really needed for us
and
honestly it's really hard to get a corpus for these
predictions
so in future we might want to add some grammatical analysis as well so for
that a corpus might be interesting but at the moment we are doing only this markov model
stuff and having a big corpus doesn't actually help that much if you feed it
a huge text like all of wikipedia for english the prediction based
on the simple markov model for the next word is something like one out of two
hundred fifty or one out of five hundred which isn't very good so it works
well at the moment only if the
text it learns from is what the user actually uses so
normal users don't write in a complicated style like oscar
wilde people tend to write something about going for lunch or something like this
and what they usually type is much more repetitive and
really learning from the user input is for the markov model much more helpful than
feeding it a big corpus
and
and maybe that's it thank you
so
the work on predictive input that you demonstrated has it also helped
kde users
well i didn't get any feedback from kde users so far actually we
get very little feedback so i asked for testers i asked
some of my colleagues to test it and got some nice suggestions for improvements that
i implemented but there wasn't that much feedback and i can't remember anybody
from kde so whether it works on kde i don't know
so obviously i think this is useful in this context but it could really make a
difference in terms of on-screen keyboards
so i'm wondering if you have given a thought to how
we can take this and apply it when somebody is
using an on-screen keyboard of course we have the more general issue of how we integrate input
methods with on-screen keyboards but i was curious what thoughts you had
currently it doesn't work with on-screen keyboards but we want to make
it work in future this is one thing we are thinking about and this is also one of the
reasons why anish wants to put it into a library because then it will
be easier to use from an on-screen keyboard than with the current implementation
and for on-screen keyboards i think it makes much more sense actually for myself
when i type german or english i'm typing too fast so usually for me it's
easier to just finish typing the word instead of looking and selecting but anish
said that many people in india are not comfortable with the way the transliteration
works and have a hard time figuring it out and so for people who use a computer for
the first time in india it's very helpful if they get some suggestions after typing
only a few letters similar to people on a touch screen
who have difficulties typing
i guess that makes me wonder have you thought about whether this
should be enabled by default in some languages so that if you
choose an indian input language it should just work like this by default
yes
we are planning that but before enabling it we need
to fix a couple of bugs for example when you try to integrate it
as the default input method
you need to fix some things for example
if you're typing something in google
you wouldn't want two sets of suggestions
google itself displays some suggestions and then they
don't function well
the lookup table gets in the way of the google suggestions so they overlap each
other
so it's
tedious to switch it off and on all the time
and we need that elsewhere too for example if you want to
type something in some other kinds of fields
then in that case as well you wouldn't require suggestions
so we need to do more there
i mean in gtk there is maybe a way to control that in some
way now so there are these input hints that you can apply to text entry
fields you can say you know this is a calculator i want only
numeric stuff in this field or you can say inhibit the
on-screen keyboard which you could then maybe take to imply okay i don't want
word prediction here so maybe we can extend that technique and apply that to other
toolkits and things
we have no good idea for something like the google
search field at the moment because sometimes of course if you type in the
browser you want it if you write email you use it also for spell checking or whatever and how
to find out that the user is typing into the google search field
i don't know how to do that at the moment
i think that it may do the right thing on android i'm not
completely sure but i think there might be input hints
in html so we just have to sort of expose them
through to get them to the right place
you mean they are in the html
of that page maybe
well we should look at that and see
okay so any other questions thank you very much
okay