Popular models that learn word representations ignore the morphology of words by assigning a distinct vector to each word. The article proposes a new approach based on the skipgram model in which each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and a word is represented as the sum of these n-gram representations.
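As a rough illustration of the idea, here is a minimal Python sketch, not the paper's implementation. The function names `char_ngrams` and `word_vector` are hypothetical; the `<` and `>` boundary symbols and the 3-to-6 character range follow the paper's setup, while `ngram_vectors` is a stand-in for the model's learned n-gram embedding table.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams, adding '<' and '>' boundary symbols."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]

def word_vector(word, ngram_vectors, dim=100):
    """Represent a word as the sum of its character n-gram vectors.

    `ngram_vectors` is a hypothetical dict standing in for the model's
    learned embedding table."""
    vec = np.zeros(dim)
    for gram in char_ngrams(word):
        # N-grams missing from the table contribute nothing in this sketch.
        vec += ngram_vectors.get(gram, np.zeros(dim))
    return vec
```

Because vectors are shared across n-grams, a representation can be assembled even for words that never appeared in the training data.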
Distributed Representations of Words and Phrases and their Compositionality
An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. The article presents a simple method for finding phrases in text and shows that learning good vector representations for millions of phrases is possible.
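The phrase-finding method scores adjacent word pairs with the data-driven criterion score(wi, wj) = (count(wi wj) − δ) / (count(wi) × count(wj)), where the discount δ prevents phrases from being formed out of very infrequent words. Below is a minimal Python sketch of that scoring; the δ and threshold values are illustrative assumptions, not recommendations from the paper.

```python
from collections import Counter

def find_phrases(tokens, delta=5, threshold=1e-4):
    """Score adjacent word pairs with
        score(wi, wj) = (count(wi wj) - delta) / (count(wi) * count(wj))
    and keep pairs scoring above a threshold.

    delta discounts rare pairs; delta and threshold are illustrative."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {pair for pair, count in bigrams.items()
            if (count - delta) / (unigrams[pair[0]] * unigrams[pair[1]])
            > threshold}

def merge_phrases(tokens, phrases):
    """Rewrite the token stream, joining detected pairs with '_'."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

In the paper, this pass is repeated over the data with decreasing thresholds, allowing phrases of more than two words to form.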