Friday, March 12, 2010

Syntax in SMT

Statistical Machine Translation (SMT) is widely used for European languages and Chinese. The idea behind it is pretty simple: how can a human decode or translate an unknown language? Suppose you give a person who is not familiar with a language a word-aligned parallel corpus of many thousands of sentences, or a huge dictionary. How would that person understand or learn the language? Similarly, you can give a parallel corpus to a machine, and the machine will learn the word alignments and use them. A machine can read and learn from a huge parallel corpus that would take a human years to understand. But the question arises: who is better, the human or the machine? And does a human learn only the lexical mapping? I still remember the last class of my natural language processing course, when the head of my department, Prof. Rajeev Sangal, asked the students for their last doubts. One of my dear friends, GVS Reddy, asked: why do we build these parse trees? Why does a machine need parse trees? Is this how humans usually learn a language; do humans also build parse trees in their minds? Prof. Rajeev Sangal answered in favor of that idea. Now I know there is a whole branch of cognitive parsing that deals with this area, and one of my friends, Phani Gadde, is working in it.
Machine translation researchers came to the same realization, and a new branch of statistical machine translation emerged, known as syntax-based machine translation. Just as a human learns a bilingual grammar along with word mappings, syntax-based machine translation uses a bilingual synchronous grammar together with word mappings. Most syntax-based systems use a synchronous context-free grammar. There are different variants of syntax-based machine translation: 1) string-to-tree, 2) tree-to-tree, and 3) tree-to-string. Most syntax-based systems perform reordering, insertion and deletion of words on the parse tree. The benefit of syntax-based machine translation is that it supports long-distance reordering. Most (phrase-based) statistical machine translation systems penalize long-distance reordering, but in syntax-based SMT reordering is done at the nodes of the parse tree, so a whole subtree can move in a single step no matter how many words it spans. That is why it handles long-distance reordering well.
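To make the node-level reordering idea concrete, here is a minimal sketch in Python. The `Node` class, the `REORDER_RULES` table and the `transfer` function are hypothetical names invented for illustration; the rules are toy, hand-written synchronous rules that only reorder children (they do not translate words), loosely imitating an English-to-SOV (verb-final, postpositional) language. They are not taken from any real grammar or system, where rules are learned and weighted from aligned data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# A parse-tree node: internal nodes have children, leaves carry a word.
@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    word: str = ""


def leaf(label: str, word: str) -> Node:
    """Build a leaf node holding a single word (hypothetical helper)."""
    return Node(label, word=word)


# Hypothetical synchronous rules: (parent label, source child labels) ->
# the order in which those children appear on the target side.
# E.g. VP -> V NP on the source side becomes VP -> NP V on the target side.
REORDER_RULES: dict[tuple[str, tuple[str, ...]], tuple[int, ...]] = {
    ("S", ("NP", "VP")): (0, 1),
    ("VP", ("V", "NP")): (1, 0),   # verb-final
    ("PP", ("P", "NP")): (1, 0),   # preposition becomes postposition
}


def transfer(node: Node) -> list[str]:
    """Return the target-side word order by applying a rule at every node.

    Nodes with no matching rule keep their source child order.
    """
    if not node.children:
        return [node.word]
    key = (node.label, tuple(child.label for child in node.children))
    order = REORDER_RULES.get(key, tuple(range(len(node.children))))
    words: list[str] = []
    for i in order:
        words.extend(transfer(node.children[i]))
    return words


# "John reads the old book about the war"
obj = Node("NP", [
    leaf("DT", "the"), leaf("JJ", "old"), leaf("NN", "book"),
    Node("PP", [leaf("P", "about"),
                Node("NP", [leaf("DT", "the"), leaf("NN", "war")])]),
])
sentence = Node("S", [
    Node("NP", [leaf("NNP", "John")]),
    Node("VP", [leaf("V", "reads"), obj]),
])

print(" ".join(transfer(sentence)))  # prints "John the old book the war about reads"
```

The single VP rule moves "reads" past the entire six-word object in one step; making the object longer does not change the cost of that move. A phrase-based decoder would instead have to jump over the whole object and pay a distortion penalty that grows with the jump distance.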
