Natural Language Processing

[Image: Artificial Intelligence][1]

Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language corpora.[2]

The big ideas in AI

Natural Language Processing

Terms

Text Normalization: Normalizing text means converting it to a more convenient, standard form.[3]
Tokenization: A part of text normalization. Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.[4]
Lemmatization: A part of text normalization. The task of determining that two words have the same root, despite their surface differences.[5]
Stemming: A simpler version of lemmatization in which we mainly just strip suffixes from the end of the word.[6]
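To make the distinction between these terms concrete, here is a minimal, illustrative Python sketch. The regular expression, the suffix list, and the small LEMMAS lookup table are simplified assumptions standing in for the rules and dictionary resources (such as WordNet) that a real text-normalization pipeline would use; none of it comes from the sources cited above.

import re

# Toy lemma table standing in for a real dictionary resource such as WordNet
# (purely illustrative; the entries are assumptions, not from the cited texts).
LEMMAS = {"am": "be", "are": "be", "is": "be", "mice": "mouse", "better": "good"}

def tokenize(text):
    # Chop a character sequence into tokens, throwing away punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    # Naive suffix stripping: remove a known suffix if enough of the word remains.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def lemmatize(token):
    # Map a surface form to its dictionary root where one is known.
    return LEMMAS.get(token, token)

text = "The mice are running; the cats walked."
tokens = tokenize(text)
print("tokens:", tokens)
print("stems: ", [stem(t) for t in tokens])
print("lemmas:", [lemmatize(t) for t in tokens])

On this sample the stemmer simply strips suffixes (running becomes runn, walked becomes walk), while the lemmatizer maps surface forms to their dictionary roots (mice becomes mouse, are becomes be), which is the distinction drawn in the list above.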

Standards

References

1. http://www.flaticon.com/
4. https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html
5. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
6. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf

[[Category:Artificial Intelligence]]