The article proposes a new, simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. The model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. The Transformer also generalizes well to other tasks.
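The core operation the Transformer is built from is scaled dot-product attention. A minimal sketch in NumPy, with illustrative shapes (not the paper's exact multi-head implementation):

```python
# Minimal sketch of scaled dot-product attention:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # row-wise softmax (subtract the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output is a weighted sum of values

Q = np.random.rand(3, 4)  # 3 queries, d_k = 4
K = np.random.rand(5, 4)  # 5 keys
V = np.random.rand(5, 4)  # 5 values
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per query
```

Each output row is a convex combination of the value rows, with weights set by query-key similarity.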
Learning to Compare: Relation Network for Few-Shot Learning
A general framework for few-shot learning. The method, called the Relation Network (RN), is trained end-to-end from scratch. Besides improving performance on few-shot learning, the framework extends easily to zero-shot learning.
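The idea can be sketched as follows: embed a support example and a query example, concatenate the embeddings, and let a small learned "relation module" output a similarity score. The weights and layer shapes below are toy illustrations, not the paper's architecture:

```python
# Sketch of the Relation Network idea: a learned relation module
# scores how well a query matches a support example.
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W):
    """Toy embedding module: one linear layer followed by ReLU."""
    return np.maximum(0, W @ x)

def relation_score(support, query, W_embed, W_rel):
    """Concatenate the two embeddings and map them to a score in [0, 1]."""
    pair = np.concatenate([embed(support, W_embed), embed(query, W_embed)])
    return 1.0 / (1.0 + np.exp(-(W_rel @ pair)))  # sigmoid

# Toy weights; in the real method both modules are learned end-to-end.
W_embed = rng.standard_normal((8, 16))
W_rel = rng.standard_normal(16)
s = relation_score(rng.standard_normal(16), rng.standard_normal(16),
                   W_embed, W_rel)
print(0.0 <= s <= 1.0)  # True
```

At test time, a query is assigned the class of the support example (or class prototype) with the highest relation score.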
Enriching Word Vectors with Subword Information
Popular models that learn word representations ignore the morphology of words by assigning a distinct vector to each word. The article proposes a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations.
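A minimal sketch of the subword scheme: extract the character n-grams of a word (with boundary markers `<` and `>`) and sum their vectors. The toy random vectors stand in for the learned ones, and the real model's hashing and vocabulary details are omitted:

```python
# Sketch of the bag-of-character-n-grams word representation.
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """All character n-grams of '<word>', including boundary symbols."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

dim = 8
rng = np.random.default_rng(42)
vectors = {}  # n-gram -> vector, filled lazily with toy random values

def word_vector(word):
    grams = char_ngrams(word)
    for g in grams:
        vectors.setdefault(g, rng.standard_normal(dim))
    return sum(vectors[g] for g in grams)  # word = sum of n-gram vectors

print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(word_vector("where").shape)  # (8,)
```

Because vectors are shared across all words containing an n-gram, rare and unseen words still get meaningful representations from their subword pieces.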