Attention Is All You Need

The article proposes the Transformer, a simple new network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. The model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. The Transformer also generalizes well to other tasks.
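
The building block of the architecture is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / √d_k)V. Below is a minimal NumPy sketch of a single attention call; it omits the multi-head projections, masking, and the rest of the encoder-decoder stack described in the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V                              # weighted sum of value vectors

# Toy example: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```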


Enriching Word Vectors with Subword Information

Popular models that learn word representations ignore the morphology of words by assigning a distinct vector to each word. The article proposes a new approach based on the skipgram model, in which each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations.
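
As an illustration, here is a minimal Python sketch of the subword decomposition. The paper uses character n-grams of length 3 to 6 plus the whole word, with < and > as boundary symbols; the toy random vectors below merely stand in for representations that would normally be learned with the skipgram objective:

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """All character n-grams of a word, wrapped in boundary symbols < and >."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

rng = np.random.default_rng(0)
ngram_vecs = {}  # toy lookup table; real vectors are learned via skipgram

def word_vector(word, dim=10):
    """Word vector = sum of its n-gram vectors, plus the whole-word token."""
    grams = char_ngrams(word) + [f"<{word}>"]
    for g in grams:
        ngram_vecs.setdefault(g, rng.normal(size=dim))
    return sum(ngram_vecs[g] for g in grams)

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("where").shape)  # (10,)
```

Because out-of-vocabulary words still decompose into known n-grams, this scheme can produce vectors for words never seen during training.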
