
Wednesday, March 31, 2010

Natural Language Generation and Information Extraction: Sisters by chance

Many natural language processing applications use Natural Language Generation (NLG) and Information Extraction (IE) together.
  1. Natural Language Generation is used to generate a natural language sentence from a given bag of words.
  2. Information Extraction is used to extract information from a given text to fill up a template. Many people confuse data extraction with information extraction. A typical information extraction problem is the extraction of information from news. For example, one may want to fill up a score card automatically from cricket commentary text. In this problem, the score card is the information extraction template: it has a definite structure, with fields like batsman name, runs scored, number of sixes, and number of fours (a toy sketch of this follows below).
  • Information Extraction as a tagging problem:- Information extraction can be seen as a tagging problem where the tags are the fields of the information extraction template. The fields mostly consist of named entities, which is why most information extraction systems (Stanford NER and GATE ANNIE) are extensions of named entity recognition systems.
  • Information Extraction as relation extraction:- Information extraction is not simply named entity recognition; it also extracts the relations which exist between the entities. The relations can be implicit or explicit. Anaphora resolution comes under this category, as it recovers the relation between a reference and an entity. The relations can be ontology-based or domain specific.
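As a toy illustration of template filling, here is a minimal sketch over cricket commentary. The commentary format, the regular expression, and the field names are all assumptions made up for this example; a real system would use a trained named entity tagger rather than one hand-written pattern.

import re

# Hypothetical commentary line: "Tendulkar hits a four, moves to 36 runs"
SCORE = re.compile(r"(?P<batsman>[A-Z]\w+) hits a (?P<shot>four|six), moves to (?P<runs>\d+) runs")

def fill_score_card(commentary):
    card = {}   # one score card entry (template) per batsman
    for line in commentary:
        m = SCORE.search(line)
        if not m:
            continue
        entry = card.setdefault(m.group("batsman"), {"runs": 0, "fours": 0, "sixes": 0})
        entry["runs"] = int(m.group("runs"))   # running total stated in the commentary
        entry[m.group("shot") + "s"] += 1      # count the fours and sixes
    return card

print(fill_score_card(["Tendulkar hits a four, moves to 36 runs"]))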
Some of the applications where NLG and IE work together are:
  1. Abstractive Summarization:- Summarization is of basically two types: extractive summarization and abstractive summarization. Extractive summarization extracts the important sentences and paragraphs from the text, whereas abstractive summarization generates the summary. Abstractive summarization uses information extraction to fill up a template and then uses natural language generation to generate the summary.
  2. Question answering systems:- Many question answering systems use IE and NLG together. A question answering system can have the following modules:
  • Question understanding module
  • Information retrieval system to collect text
  • Information extraction system to extract the possible candidate answers
  • Generation of the answer from the template
For example, general question types are who, whom, when, and so on. You can extract the matching information and give back the answer.
  3. Spoken dialog manager:- You may have seen spoken dialog managers in many customer care services: they understand your question and answer interactively. You can divide a spoken dialog manager into three parts: 1) automatic speech recognition, 2) an interactive question answering system, and 3) speech synthesis.
Since it has an interactive question answering system in its pipeline, it uses information extraction and natural language generation together.
  4. Machine Translation:- Machine translation is one application where a natural language generation system can be used on its own. Machine translation has two basic parts, structural translation and lexical translation: the structural translation module converts source language word order into target language word order. Most machine translation systems apply lexical translation after structural translation, using bilingual parameters and rules for the structural step. However, if one does structural translation after lexical translation, then one can use natural language generation for the structural translation, as sketched below.
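As a toy illustration of structural translation, the sketch below reorders an English-style SVO clause into Hindi-style SOV order before any lexical translation. The pre-chunked (subject, verb, object) input is an assumption; a real system would get these constituents from a parser.

# Hypothetical pre-chunked English clause: (subject, verb, object)
clause = ("Ram", "eats", "an apple")

def svo_to_sov(subject, verb, obj):
    # Structural translation only: reorder the constituents, leave the words untouched
    return (subject, obj, verb)

print(" ".join(svo_to_sov(*clause)))   # Ram an apple eats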

    Monday, March 29, 2010

    Introduction

    I am Vineet Yadav from IIIT Hyderabad, currently pursuing a Master's degree (M.Tech in Computational Linguistics) in NLP. I worked at the Language Technology Research Center, which is one of the largest NLP research groups in India. This is my personal and technical blog. I am currently working at Serendio, a text mining company.

    Saturday, March 27, 2010

    Machine Translation: Divide and Rule

    Divide and rule is not a new paradigm. The divide and rule technique was famously used by the British, who ruled India and much of the rest of the world in Queen Victoria's time. Later the technique entered computer science, where it is known as divide and conquer: the problem is first divided into smaller tasks, each task is processed, and the results are combined. The same approach is used in natural language processing, breaking stories down into paragraphs, paragraphs into sentences, sentences into clauses, clauses into phrases, and phrases into words. Each sub-part acts as one unit, and one can divide text into these units and combine the results. There are some restrictions as well. For example, a context free grammar is capable of generating sentences of unbounded length, but this capacity is never exercised, since humans can't understand very long sentences. While understanding text, humans also break it down into smaller units.
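    Here is a minimal sketch of this layered division, using naive blank-line and punctuation splitting; a real pipeline would use a proper sentence splitter and tokenizer.

import re

text = "Ram saw a boy. The boy had a telescope.\n\nRam waved at him."

for para in text.split("\n\n"):                            # story -> paragraphs
    for sent in re.split(r"(?<=[.!?])\s+", para.strip()):  # paragraph -> sentences
        print(sent.split())                                # sentence -> words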
    In machine translation as well, researchers divide the text into different units, process each unit, and combine the translated results. I will talk about each of these approaches one by one.

    1. Statistical Machine Translation (IBM models):- Statistical machine translation looks at a source language sentence as a sequence of words. It runs the EM (Expectation Maximization) algorithm over a sentence-aligned parallel corpus to learn word alignments and model parameters. The model penalizes reordering, so it doesn't work well for language pairs belonging to distant families.
    2. Phrase Based Machine Translation:- The phrase based machine translation system (Koehn et al., 2003; Och et al., 1999) works at the phrase level: it learns pairs of source language and target language phrases. The 'phrases' here are contiguous word sequences rather than necessarily linguistic constituents, so the system can also translate non-compositional expressions.
    3. Hierarchical Phrase Based Machine Translation:- The hierarchical phrase based system (Chiang, 2005) is the same as the above, except that the phrases it learns are hierarchical: a phrase may contain sub-phrases as gaps.
    4. Syntax Based Machine Translation:- The syntax based machine translation system (Yamada and Knight, 2001) is basically a parse tree to string translation system. The source sentence is parsed (e.g., with the CKY algorithm), and then reordering, insertion, and deletion are done at the node level. Each node of the phrase structure parse tree can be viewed as a hierarchical phrase.
    5. Dependency Treelet Translation:- Dependency treelet translation (Quirk et al., 2005) is Microsoft's approach to machine translation. The approach is called dependency treelet translation because, in contrast to standard phrase based MT systems that learn phrase pairs, Quirk et al. (2005) learn treelet pairs. They use a source-side dependency parser together with word-aligned source and target sentences, project the source dependency structure onto the target, and learn treelet translation pairs between source and target, extracting the treelet translations by maximum likelihood estimation. The advantage is that they can also learn non-contiguous phrases.
    6. Chunk Based Machine Translation:- Chunk based machine translation (Watanabe et al., 2003) performs translation over chunked sentences. Chunks and phrases are similar, except that chunks lack the recursive nature of phrases; in other words, a chunk does not contain another chunk inside it.

    Thursday, March 25, 2010

    The locus of word sense disambiguation

    Word sense disambiguation plays an important part in machine translation, and it matters more for language pairs whose languages belong to different language families.
    For example, word sense disambiguation matters more in English-Hindi machine translation than in Hindi-Punjabi MT.
    The social and historical aspects of a language also play a role: people use words in the same senses across languages which belong to the same region or the same language family.
    For example, for Hindi-Urdu the word sense disambiguation problem hardly arises, as you can get the Urdu sentence from the Hindi string by transliteration alone.
    Ambiguity is present in language in two forms.
    1. Lexical Ambiguity:- A word can have more than one sense; e.g. 'bank' can be a river bank, and it can also be a financial bank. Which sense applies depends on the context in which the word is used. But note that the word 'bank' is a noun in both senses, so word sense disambiguation works better when it only has to resolve senses within the same category; in other words, run WSD on a POS-tagged sentence. Word sense disambiguation is used mainly in two kinds of problems: machine translation, and topic or concept expansion. Machine translation has POS tagging, chunking, and other processes in its pipeline, and the WSD module sits above them, so the ideal setup I described is handled naturally there. The second kind of problem covers topic expansion, summarization, and key phrase extraction; for these, methods such as Latent Dirichlet Allocation, Latent Semantic Analysis, and Latent Semantic Indexing are used, relying on lexical chains for context expansion, i.e. on lexical information alone for disambiguation. Here I will talk about WSD in machine translation (a minimal POS-restricted sense lookup is sketched after this list).
    2. Structural Ambiguity:- Structural ambiguity exists because of the nature of language: the sentence structure itself is responsible for the ambiguity. For example, take 'Ram is looking at the girl with the telescope'. It is possible that Ram is looking at the girl through a telescope; it is also possible that the girl is the one holding the telescope. Whenever structural ambiguity exists in a sentence, there is more than one possible parse tree.
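    Here is the minimal POS-restricted sense lookup promised above, using NLTK's WordNet interface (assuming NLTK and its WordNet data are installed). A real WSD module would go on to score these candidate senses against the context; the point here is only that fixing the POS already shrinks the candidate set.

from nltk.corpus import wordnet as wn

# Restricting the lookup to noun senses removes the verb readings of 'bank'
# (e.g. "the plane banked sharply") from the candidate set.
for synset in wn.synsets("bank", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())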
    Now I will talk about the locus of the ambiguity problem in machine translation. How the word sense disambiguation problem should be solved depends on where the ambiguity sits in the language pair.
    1. Source language or target language:- Is the word sense ambiguity part of the source language, or part of the target language? Ambiguity can be present in both languages. If it is part of the source language, then word sense disambiguation is done with a monolingual source language dictionary before lexical translation; otherwise, it should be done after lexical translation, with a bilingual dictionary. But this depends on the language pair in use: if the languages are closely related, you will find only one type of word sense disambiguation problem. One can mark source-side senses against target-side senses in parallel sentences, and observe how the ambiguity varies between the language pair and what the default sense is.
    2. Writer, reader or text:- Ambiguity can lie in the writer's usage, since every writer has a different controlled vocabulary. Ambiguity can also lie with the reader who is doing the understanding: the reader may not be giving full attention and may miss something, or may have less, or different, knowledge of the topic. In the old days researchers held that the meaning is inside the text only, but that is not the whole story; the meaning can vary with writer and reader. If the word sense disambiguation problem is writer specific, then use the writer's monolingual text or a writer-specific monolingual dictionary; you can exploit these with an active learning algorithm, a semi-supervised machine learning approach that lets you use plain text alongside a tagged corpus. If the problem is reader specific, then you can use the reader's favorite topics, bookmarked texts, or reading logs. A similar problem is personalized search, where we use the user's logs.
    3. Media:- The media also affect people: people generally follow them and sometimes start using non-conventional meanings. The media here can be print or cinema. I will give examples of movies which gave non-conventional senses to words. Two years back a movie came out called Dostana, which means friendship. The plot turns on a lie that two of the lead actors are homosexual, and people started commenting 'Dostana' on one another's pictures on social networks, although Dostana does not mean a homosexual relationship. Later another movie, Love Aaj Kal, used 'mango people' as a fancy term for 'ordinary people', playing on the English-Hindi pair mango => AAM (noun), ordinary => AAM (adjective); so some people started calling ordinary people 'mango people'. The latest movie, the biggest hit comedy in Indian cinema, is 3 Idiots. The film is not about mad or crazy people; it is a story about people who follow their hearts. The movie changed the definition of 'idiots', and people liked being called idiots. Social networking websites are full of such non-conventional senses.

    Sunday, March 21, 2010

    Singularity in Natural language Processing

    Singularity, in the sense of unification, is a very old phenomenon in science and physics. For example, electromagnetism shows this unifying behavior between electricity and magnetism. Similarly, physicists since Albert Einstein have tried, most recently with string theory, to combine all four fundamental forces (gravitational, electromagnetic, weak and strong interactions) and explain them with one formula. This unification brings simplicity and independence to a system. Physics and natural language are not that different, as researchers look at both of them in terms of mathematics: you can treat collocation units and word groups as mass, the attraction between words as the semantics revolving around them, and physical distance as the analogue of word and sentence distance, and even of syntax.
    Coming to the point: we have any number of natural language tools which extract or tag different types of information. How intelligently can we use as many of them as possible together? The whole system architecture sometimes depends on the order of execution and on which resources and tools one is going to use. Can't we use them in parallel? Can't we design the perfect data structure which holds every type of information and has the fundamental operations update, insert, and delete? A data structure which can hold information at the word level, word group level, parse tree level, and sentence level; which can hold information produced by basic NLP tools like taggers and morph analyzers up to complex ones like information extraction; and onto which we can map different resources like WordNet, FrameNet, PropBank and others.
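    Here is a toy sketch of such a data structure, under assumptions of my own: annotations are keyed by layer name and character span. Real frameworks such as GATE implement far richer versions of this idea.

class AnnotationStore:
    # Holds annotations from many tools over one text, keyed by (start, end) span.
    def __init__(self, text):
        self.text = text
        self.layers = {}   # layer name -> {(start, end): value}

    def insert(self, layer, start, end, value):
        self.layers.setdefault(layer, {})[(start, end)] = value

    def update(self, layer, start, end, value):
        self.layers[layer][(start, end)] = value

    def delete(self, layer, start, end):
        del self.layers[layer][(start, end)]

    def at(self, start, end):
        # All annotations, from any tool or resource, covering exactly this span
        return {name: spans[(start, end)]
                for name, spans in self.layers.items() if (start, end) in spans}

store = AnnotationStore("Ram sleeps")
store.insert("pos", 0, 3, "NNP")      # from a tagger
store.insert("ner", 0, 3, "PERSON")   # from an information extraction tool
print(store.at(0, 3))                 # {'pos': 'NNP', 'ner': 'PERSON'}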

    Friday, March 19, 2010

    The Context Free Grammar and Machine Translation


    My friends who worked with me might know all these fancy terms like machine translation; friends who don't can refer to my previous posts. Here I will talk about synchronous context free grammar, chart parsing, and machine translation; it is an extension of my previous post 'syntax in SMT'. Some of my computer science friends may also know about context free grammars and chart parsing: they might have seen chart parsing in compiler design and context free grammars in theory of computation. A context free grammar can be looked at as a set of simple conversion rules.
    A context free grammar for infix expressions looks like
    A -> A+A | A-A | A*A | A/A | a | b | c


    where A is a non-terminal and a, b, c, +, -, *, / are terminal symbols.
    The parse tree of the expression 'a-b+c' under this context free grammar is shown below.

    The process of getting a parse tree from the expression ('a-b+c') is known as parsing. There are many parsing techniques, but they fall into basically two types of approaches:
    1. Top-down parsing
    2. Bottom-up parsing
    But how is a context free grammar useful for machine translation? The above context free grammar is for infix expressions. Suppose I want to convert an infix string to a postfix string using context free grammars. The postfix grammar will look like
    A -> AA+ | AA- | AA* | AA/ | a | b | c
    We should combine the infix and postfix grammars so that they can be used for translating infix to postfix.
    The combined grammar is known as a synchronous context free grammar and looks like
    A -> A+A;AA+ | A-A;AA- | A*A;AA* | A/A;AA/ | a;a | b;b | c;c
    where the symbols before ';' represent the infix side and the symbols after ';' represent the postfix side.
    The translation process is shown below.

    In this translation, the rules are used for reordering only: first A+A is converted to AA+ using the rule A->A+A;AA+, and then the second rule A->A-A;AA- is applied. But machine translation also supports insertion and deletion, which are done at the node level only.
    We get the target postfix string 'ab-c+'.
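    Here is a minimal sketch of this infix-to-postfix translation. Instead of full chart parsing it walks the token string left to right, which amounts to always grouping to the left, exactly as in the example above; operator precedence is deliberately ignored.

# Synchronous rules used: A -> A op A ; A A op, and A -> a ; a for terminals

def infix_to_postfix(expr):
    tokens = list(expr)        # single-letter operands: a, b, c
    out = tokens[0]            # terminal rule a;a maps the first operand to itself
    for i in range(1, len(tokens), 2):
        op, operand = tokens[i], tokens[i + 1]
        out = out + operand + op   # apply A -> A op A ; A A op (reordering only)
    return out

print(infix_to_postfix("a-b+c"))   # ab-c+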

    Friday, March 12, 2010

    syntax in SMT

    Statistical machine translation is widely used for European languages and Chinese. The idea behind statistical machine translation is pretty simple: how would a human decode or translate an unknown language? Suppose you give a large parallel corpus of thousands of aligned sentences, or a huge dictionary, to a person who is not familiar with the language: how will that person understand or learn it? Similarly, you can give a parallel corpus to a machine, and the machine will learn the word alignments and process it. The machine can read and learn from a huge parallel corpus which would take a human years to digest. But the question arises: who is better, human or machine, and does a human learn only the lexical mapping? I still remember the last class of the natural language processing course, when the head of my department, Prof. Rajeev Sangal, asked the students for their last doubts. One of my dear friends, GVS Reddy, asked: why do we build these parse trees? Why does a machine need parse trees? Is this the usual way humans learn language; do humans also build parse trees in their minds? Prof. Rajeev Sangal answered in favor of the question. Now I know there is a whole branch of cognitive parsing which deals with this area, and one of my friends, Phani Gadde, is working in it.
    The same thing was realized by machine translation researchers, and a new branch of statistical machine translation emerged, known as syntax based machine translation. Just as humans learn a bilingual grammar along with the word mapping, syntax based machine translation uses a bilingual synchronous grammar together with a word mapping; most systems use a synchronous context free grammar. There are different variations of syntax based machine translation: 1) string to parse tree, 2) parse tree to parse tree, 3) parse tree to string. Most syntax based systems do reordering, insertion, and deletion of words in the parse tree. The benefit of syntax based machine translation is that it supports long distance reordering: most statistical machine translation systems penalize long distance reordering, but in syntax based SMT the reordering is done at the node level of the parse tree, so long distance reordering is supported.

    Tuesday, March 9, 2010

    What is the rule

    Every natural language processing application evolved from a rule based system, whether it is machine translation, parsing, classification, or any other field. Later some of the rule based applications were replaced by statistical ones, but rule based systems are still preferred for languages which suffer from low resources. So to understand rule based applications, we should first understand what a rule is.
    A rule is a binary-class condition which has three Cs: Constraints, Class, and Consequences; you can also call the consequences actions or reactions. Rules can be integrated into a rule based application in two forms.

    1) Embedded or hard-coded rules:- Rules can be embedded as 'if..elif..else' statements in programs, where each 'if' and 'elif' condition encodes the constraints of a rule, the body under it gives the class and consequences, and the 'else' statement is the default rule. But nowadays an 'if..else' cascade looks very odd, so it is better to keep the rules in the form of a dictionary or hash, where the key is the class and the values are the constraints.
    For example:
def classify(cat1, cat2):
    if cat1 == 'A1' and cat2 == 'A2':
        return 'class1'
    elif cat1 == 'B1' and cat2 == 'B2':
        return 'class2'
    # ... more hard-coded rules ...
    else:
        return 'classDefault'

    Rules buried in a piece of code look really weird and lose readability. Instead, you can define the rules separately and call them whenever needed.

# rules: class -> constraints
rules = {'class1': {'rcat1': 'A1', 'rcat2': 'A2'},
         'class2': {'rcat1': 'B1', 'rcat2': 'B2'}}   # ... more classes ...

# calling the rules
def classify(cat1, cat2):
    for each in rules:
        if cat1 == rules[each]['rcat1'] and cat2 == rules[each]['rcat2']:
            return each
    return 'classDefault'

    This piece of code maintains readability, and it provides independence, as the user can modify the rules on their own.
    2) Rule file:- The second way of integrating rules with a system is to keep a separate rule file. This gives linguists the flexibility to write the rules and computer scientists the flexibility to integrate them. But before that, the linguists and computer scientists should agree on a common, definite rule template.
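    Here is a small sketch of the rule-file idea, with a made-up one-rule-per-line format: the class, then comma-separated attribute=value constraints. The format itself is an assumption; in practice it is whatever template the linguists and the computer scientists agree on.

# rules.txt (hypothetical format):
#   class1: cat1=A1, cat2=A2
#   class2: cat1=B1, cat2=B2

def load_rules(path):
    rules = {}
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith('#'):
                continue
            cls, constraints = line.split(':', 1)
            rules[cls.strip()] = dict(pair.strip().split('=')
                                      for pair in constraints.split(','))
    return rules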

    Rules in different scenarios:- Rules or grammars can be used in different applications. I will discuss some of them in this section.
    1) Classification:- This is the primary function of a rule based system. In this type of problem the consequence or action part of the rule is missing; the rules contain only constraints and a class. One rule performs binary classification (true or false) for a given class. You can use several rules together so that they perform multi-class classification. You can also order the rules so that one rule triggers another; in other words, one rule acts as a constraint for another rule.
    2) Extraction:- Pattern based rules can be used for extraction. Remember that every rule has constraints, a class, and consequences; these rules use patterns as constraints. One rule has one or more classes associated with it, and different classes represent different types of entities. Each class can be part of the pattern, or the whole pattern can be classified under a class. For example, take the NER rule PERSON_NAME Public School => ORGANIZATION_NAME. In this rule PERSON_NAME is present inside the pattern, but the whole extracted pattern is classified as ORGANIZATION_NAME (a minimal sketch of this rule appears after this list).
    3) Parsing:- Rules are nowadays used in natural language parsing. There are two variations of parsing: 1) phrase structure parsing and 2) dependency structure parsing. Phrase structure parsing is more popular for fixed word order languages and uses a context free grammar or a tree adjoining grammar. An example context free grammar rule is
    VP => V NP
    where the left hand side (VP) can be treated as the class of the rule, the right hand side (V NP) as the constraint of the rule, and the parsing step as the action or consequence. Dependency structure parsing is the other variation; one can convert a phrase structure tree into a dependency structure tree. One can also include various linguistic cues inside the rules for dependency parsing. One example of such a system is the LDM (Linguistic Discourse Model), which is used in discourse parsing.
    4) Machine Translation:- Rule based systems are also used in machine translation for language pairs which don't have enough resources. The rules used in such systems are known as a transfer grammar.
    For example, for the English-Hindi language pair, a simple transfer grammar rule is
    NP VP NP => NP NP VP
    since English is an SVO (subject verb object) language and Hindi is an SOV (subject object verb) language: the above transfer grammar converts the SVO structure into an SOV structure.
    In this transfer grammar, the left hand side (NP VP NP) is the constraint, and the right hand side (NP NP VP) represents the reordering of phrases, which acts as the action or consequence of the rule. The class can be present or absent.
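    Here is the minimal sketch of the extraction rule promised above, with a naive capitalized-word regex standing in for the PERSON_NAME recognizer; a real system would take PERSON_NAME spans from an NER module instead.

import re

# Constraint: a (naively recognized) person name followed by "Public School";
# class assigned to the whole match: ORGANIZATION_NAME
ORG_RULE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)* Public School")

text = "He studied at Sardar Patel Public School in Delhi."
for match in ORG_RULE.finditer(text):
    print(match.group(), "=> ORGANIZATION_NAME")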

    Now I will talk about the advantages and disadvantages of rule based systems.
    1) Advantages:- A rule based system does not require many resources. You can have different classes of rules, such as robust rules, which have high precision and low recall. The confidence level of a robust rule is high, and you can use robust rules to improve a statistical system.
    2) Disadvantages:- Every application needs a different type of rules; there is no master rule set that fits all applications. For example, you can use the same machine learning algorithm across different domains, but this is hard for a rule based system. There can also be misunderstandings between the linguists and the computer scientists on a rule based project: sometimes the linguists don't understand the application and frame rules that are too general or too specific, and sometimes the computer scientists don't trust the linguists.

    Thursday, March 4, 2010

    From Stop-words to Grammatical words

    Stop words are the words which occur most frequently in documents and carry little meaning on their own. 'Function words are words that have little lexical meaning or have ambiguous meaning, but instead serve to express grammatical relationships with other words within a sentence, or specify the attitude or mood of the speaker' (this definition of function words is taken from Wikipedia). Stop words are similar to function words. Function words are called grammatical words by linguists and stop words by information retrieval people: linguists believe they are important participants in the sentence, while information retrieval and access people usually discard them, since they don't carry much information or lexical meaning. They are also known as closed class words, since one can't keep adding new function words. Indian languages being morphologically rich, function words are more important in them than in English; in Indian languages, function words should be treated more like grammatical words than like stop words.
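    To make the information retrieval treatment concrete, here is a minimal stop-word filtering sketch using NLTK's English stop word list (assuming NLTK and its stopwords corpus are installed):

from nltk.corpus import stopwords

stops = set(stopwords.words('english'))
tokens = "the cat sat on the mat".split()
print([w for w in tokens if w not in stops])   # ['cat', 'sat', 'mat']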
    The function words fall into the following categories:
    1) Articles
    2) Pronouns
    3) Adpositions
    4) Conjunctions
    5) Auxiliary verbs
    6) Pro-sentences
    I would like to discuss a few myths and facts related to stop words.
    1) Are stop words language specific?:- Stop word lists are generally available for English and related European languages. So can we translate the stop word list for a language which does not have one? I think we can translate between some closely related language pairs, but not between all languages. For example, Indian languages are morphologically rich, and in some of them stop words act like inflections and become part of other words. Some categories of stop words do not exist in certain languages, and some languages may have new categories of stop words: for example, in Indian languages articles don't exist; definite pronouns and numbers are used in place of articles. But since the majority of stop words carry over, we can translate a stop word list from one language to another.
    2) Are stop words domain specific?:- Some people confuse domain specific keywords with stop words. Domain specific keywords are words which occur frequently in domain specific text but have negligible occurrence in normal text; stop words are words which occur frequently in all types of documents. So in my view stop words are domain independent. If one does use domain dependent stop words, one should make sure no information is being lost, since domain specific keywords are important for the domain, and neglecting them may lose some information.
    3) Are stop words source-type dependent?:- In my view the source can be taken as the writer of the corpus or the type of the corpus; the type of corpus can be email, news, and so on. Sometimes stop words are source-type dependent, since different source types have different vocabularies: for example, chat and email vocabulary is different from normal text vocabulary.
    4) Are stop words unambiguous?:- This may be true for English, but for morphologically rich languages like Indian languages, stop words are generally ambiguous. Take the example of postpositions: some postpositions have more than four or five senses, and on average a postposition has more than one sense.