Natural Language Processing: Difference between revisions
Mr. MacKenty (talk | contribs) (→Terms)
Revision as of 19:37, 6 September 2017
Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages and, in particular, with programming computers to process large natural language corpora effectively.[2]
The big ideas in AI
Terms
Term | Definition |
---|---|
Text Normalization | Normalizing text means converting it to a more convenient, standard form.[3] |
Tokenization | A part of text normalization. Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation.[4] |
Lemmatization | A part of text normalization. The task of determining that two words have the same root, despite their surface differences.[5] |
Stemming | Stemming refers to a simpler version of lemmatization in which we mainly just strip suffixes from the end of the word.[6] |
Sentence segmentation | Breaking up a text into individual sentences, using cues like periods or exclamation points.[7] |
Corpus | In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (nowadays usually electronically stored and processed).[8] |
Utterance | In spoken language analysis, an utterance is the smallest unit of speech.[9] |
Disfluency | A speech disfluency (also spelled dysfluency) is any of various breaks, irregularities, or non-lexical vocables that occur within the flow of otherwise fluent speech. Disfluencies include false starts (words and sentences that are cut off mid-utterance); phrases that are restarted or repeated; repeated syllables; fillers (grunts or non-lexical utterances such as "huh", "uh", "erm", "um", "well", "so", and "like"); and repaired utterances, in which speakers correct their own slips of the tongue or mispronunciations before anyone else gets a chance to. These examples are from English; similar disfluencies occur in different forms in other languages.[10] |
Fillers or filled pauses | A type of disfluency. Words like "uh" and "um" are examples of fillers.[11] |
Lemma | A lemma is a set of lexical forms having the same stem, the same major part-of-speech, and the same word sense.[12] |
Word Form | The word form is the full inflected or derived form of the word.[13] |
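
Several of the terms above (sentence segmentation, tokenization, stemming, and fillers) can be illustrated with a short Python sketch. This is only a rough illustration under simplifying assumptions, not how production NLP tools work: the function names, the regular expressions, the suffix list, and the filler list are all hypothetical choices made for this example and do not come from the cited sources.

```python
import re

# All names and patterns below are illustrative assumptions.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")      # whitespace after ., ! or ?
TOKEN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")  # words, optional apostrophe part
SUFFIXES = ("ing", "ed", "es", "s")               # checked in this order
FILLERS = {"uh", "um", "erm", "huh"}              # a small sample of filled pauses


def segment_sentences(text):
    """Sentence segmentation: split on whitespace that follows '.', '!' or '?'."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Tokenization: lowercase the text and keep word-like pieces, dropping punctuation."""
    return TOKEN.findall(sentence.lower())


def remove_fillers(tokens):
    """Drop tokens that appear in the (assumed) filler list."""
    return [t for t in tokens if t not in FILLERS]


def naive_stem(token):
    """Stemming: strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


if __name__ == "__main__":
    text = "The cats were running. Um, they jumped over boxes!"
    for sentence in segment_sentences(text):
        tokens = remove_fillers(tokenize(sentence))
        print(tokens, [naive_stem(t) for t in tokens])
```

With this sketch, the word forms "cats" and "boxes" reduce to "cat" and "box", but "running" becomes "runn" rather than "run". That gap is the difference between stemming (stripping suffixes) and lemmatization (finding the actual root), which needs a dictionary or morphological analysis.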
Standards
References
1. http://www.flaticon.com/
2. https://en.wikipedia.org/wiki/Natural_language_processing
3. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
4. https://nlp.stanford.edu/IR-book/html/htmledition/tokenization-1.html
5. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
6. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
7. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
8. https://en.wikipedia.org/wiki/Text_corpus
9. https://en.wikipedia.org/wiki/Utterance
10. https://en.wikipedia.org/wiki/Speech_disfluency
11. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
12. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
13. https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
[[Category:Artificial Intelligence]]