Vineet NLP Blog: The locus of word sense disambiguation

Word sense disambiguation (WSD) plays an important part in Machine Translation (MT), and it matters most when the two languages belong to different language families.
For example, WSD is less important in Hindi-Punjabi MT than in English-Hindi MT, because Hindi and Punjabi are closely related.
The social and historical aspects of a language also play a role: people tend to use a word in the same sense across languages that belong to the same region or the same language family.
For example, in Hindi-Urdu the problem of word sense disambiguation doesn't exist, as you can get an Urdu sentence from a Hindi string by using transliteration alone.
Ambiguity is present in language in two forms.

Lexical Ambiguity:- A word can have more than one sense, e.g. 'bank' can be a river bank or a financial bank, depending on the context in which the word is used. Note, however, that 'bank' is a noun in both senses. WSD is therefore more effective when it is used to choose between senses of the same part-of-speech category; in other words, run WSD on a POS-tagged sentence. Word sense disambiguation is used mainly in two types of problem: Machine Translation, and topic or concept expansion. Machine Translation runs POS tagging, chunking and other processes in a pipeline, and the WSD module sits on top of them, so it is already handled in the ideal way I just described. The second type of problem covers topic expansion, summarization and key phrase extraction. For these, Latent Dirichlet Allocation, Latent Semantic Analysis and Latent Semantic Indexing are used. They use lexical chains for context expansion. These approaches use only lexical information for disambiguation. In this post I will talk about WSD in Machine Translation.
Structural Ambiguity:- Structural ambiguity exists because of the nature of language: the sentence structure itself is responsible for the ambiguity. For example, take 'Ram is looking at the boy with the telescope'. In this sentence, it is possible that Ram is using the telescope to look at the boy. It is also possible that the boy has the telescope. Whenever structural ambiguity exists in a sentence, there is more than one possible parse tree.
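
The "more than one parse tree" point can be checked mechanically. Below is a minimal sketch that counts parse trees with a CKY-style chart. The grammar, the lexicon and the simplified sentence ("Ram watches the boy with the telescope") are assumptions made up for this illustration, not part of any real MT system.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb: Ram uses the telescope
    ("NP", "NP", "PP"),  # PP attaches to the noun: the boy has the telescope
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Toy lexicon: word -> possible categories (also hypothetical).
LEXICON = {
    "Ram": ["NP"],
    "watches": ["V"],
    "the": ["Det"],
    "boy": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(tokens, root="S"):
    """Return the number of distinct parse trees rooted in `root`."""
    n = len(tokens)
    # chart[(i, j)][label] = number of trees with that label over tokens[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, token in enumerate(tokens):
        for label in LEXICON.get(token, []):
            chart[(i, i + 1)][label] += 1
    # Fill wider spans from narrower ones.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][root]


if __name__ == "__main__":
    print(count_parses("Ram watches the boy with the telescope".split()))
```

With this toy grammar the telescope sentence gets two trees, one for each attachment of 'with the telescope', while 'Ram watches the boy' gets only one.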

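For the lexical case above, here is a minimal sketch of running WSD on a POS-tagged word: candidate senses are first filtered by the POS tag, and the remaining senses are ranked by word overlap between their gloss and the context (a simplified Lesk-style heuristic). The sense inventory, the glosses, the stopword list and the function name `disambiguate` are all assumptions invented for this example.

```python
# Hypothetical sense inventory; a real system would use a dictionary or WordNet.
SENSES = {
    "bank": [
        {"id": "bank/river", "pos": "NOUN",
         "gloss": "sloping land beside a river or lake"},
        {"id": "bank/finance", "pos": "NOUN",
         "gloss": "financial institution that accepts deposits and lends money"},
        {"id": "bank/tilt", "pos": "VERB",
         "gloss": "tilt an aircraft sideways while turning"},
    ],
}

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in",
             "on", "he", "they", "while", "beside"}


def disambiguate(word, pos, context_tokens):
    """Pick a sense id for `word`, considering only senses with the given POS."""
    # Step 1: the POS tag removes senses of other categories.
    candidates = [s for s in SENSES.get(word, []) if s["pos"] == pos]
    if not candidates:
        return None
    # Step 2: rank the remaining senses by gloss/context overlap.
    context = {t.lower() for t in context_tokens} - STOPWORDS - {word}

    def overlap(sense):
        gloss_words = set(sense["gloss"].split()) - STOPWORDS
        return len(context & gloss_words)

    # On a tie, max() keeps the first listed sense.
    return max(candidates, key=overlap)["id"]


if __name__ == "__main__":
    sentence = "he deposited money in the bank".split()
    print(disambiguate("bank", "NOUN", sentence))
```

Because the verb sense is removed by the POS filter before any scoring happens, the overlap step only has to separate the two noun senses, which is the situation the 'bank' example describes.
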
Now I will talk about the locus of the problem of ambiguity in Machine Translation. The solution to the word sense disambiguation problem depends on where the ambiguity is present in the language pair.

Source language or target language:- Is word sense disambiguation part of the source language or part of the target language? Ambiguity can be present in both languages. If it is part of the source language, then word sense disambiguation is done using a monolingual source-language dictionary before lexical translation. Otherwise, it should be done after lexical translation, using a bilingual dictionary. Which case applies depends on the language pair. For example, if the languages are closely related then you will find only one type of word sense disambiguation problem. One can mark each source sense with its target sense in parallel sentences, and then look at how ambiguity varies across the language pair and which sense is the default.
Writer, reader or text:- Ambiguity can be present in the writer's usage, since every writer has a different controlled vocabulary. Ambiguity can also be present in the reader who is trying to understand the text: the reader may not be giving it full attention and may miss something, or may have less or different knowledge of the topic. In the old days, researchers considered that meaning lies inside the text only. But that is not the only case: the meaning can vary with the writer and the reader. If the word sense disambiguation problem is writer-specific, then you can use monolingual text by that writer or a monolingual writer-specific dictionary, for example with an active learning algorithm. Active learning is a semi-supervised machine learning approach that lets you use plain text together with a tagged corpus. If the problem is reader-specific, then you can use the reader's favorite topics, bookmarked texts or reading logs. A similar type of problem is personalized search, where we use the user logs.
Media:- Media also affects people, as people generally follow it and sometimes start using non-conventional meanings. Media can be print media or cinema. I will give examples of some movies which gave non-conventional senses to words. Two years back a movie called Dostana came out; the word means friendship. The movie revolves around a lie that two of the lead actors in the movie are homosexual. People started commenting 'Dostana' on one another's pictures and posts on social networks, although Dostana doesn't mean a homosexual relationship. Later came the movie Love Aaj Kal, which used 'mango people' as a fancy term for 'ordinary people'. This works through the English-Hindi language pair: mango => AAM (noun), ordinary => AAM (adjective). So some people started calling ordinary people 'mango people'. The latest example is 3 Idiots, the biggest hit comedy in Indian cinema. The film is not about mad or crazy people; it is a story about people who follow their hearts. The movie changed the definition of 'idiots', and people liked being called idiots. Social networking websites are full of such non-conventional senses.
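
Going back to the source-or-target point above, the idea of marking source senses with target senses in parallel sentences and reading off the default sense can be sketched as a simple tally. The word pairs and counts below are invented toy data (romanized Hindi), and `sense_profile` is a hypothetical helper name, so treat this as an illustration of the bookkeeping only.

```python
from collections import Counter, defaultdict

# Hypothetical hand-marked alignments from parallel sentences:
# (source word, target word the translator chose). Toy data only.
ALIGNED_PAIRS = [
    ("bank", "kinara"),  # river bank
    ("bank", "baink"),   # financial bank
    ("bank", "baink"),
    ("water", "paani"),
    ("water", "paani"),
]


def sense_profile(aligned_pairs):
    """Summarise, per source word, how its translations are distributed."""
    counts = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        counts[source_word][target_word] += 1

    profile = {}
    for source_word, target_counts in counts.items():
        # The most frequent target is taken as the default sense.
        default, _ = target_counts.most_common(1)[0]
        profile[source_word] = {
            "default": default,
            # More than one target means the word needs disambiguation
            # for this particular language pair.
            "ambiguous": len(target_counts) > 1,
            "counts": dict(target_counts),
        }
    return profile


if __name__ == "__main__":
    for word, info in sense_profile(ALIGNED_PAIRS).items():
        print(word, info)
```

A word that always maps to a single target word needs no WSD for that language pair, which is one way to see why closely related languages show less of the problem.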

Thursday, March 25, 2010
