Reading the OpenAI paper on Unsupervised Multitask Learners

On Feb 14 the team at OpenAI, the famous AI startup backed by Elon Musk and Peter Thiel, published information about their recent breakthrough. It instantly electrified the press and received significant coverage from Business Insider, The Guardian, The Verge, and others. The headlines were alarming. But what is actually behind this “revolutionary AI”, and how much is there to fear? [footnote] Just look at the headlines! Business Insider: “An Elon Musk-backed AI firm is keeping a text generating tool under wraps amid fears it’s too dangerous,” or The Guardian: “AI can write just like me. Brace for the robot apocalypse.”[/footnote]

What spins the story even more is the fact that, for now, OpenAI won’t release the full source code of their machine learning model. As stated by Jack Clark, policy director at OpenAI, the team wants to encourage academics and the public to have a conversation about the potential harms of this technology before it becomes widely available, as it could be used for automated trolling that influences online debate, allowing existing trolls to scale their efforts overnight.

Readings from the source for the diligent reader

A good starting point is to understand what has actually been published. Let’s find out what a diligent yet not purely scientific reader could learn about the so-called GPT-2 model from OpenAI by reading the source paper.

Here’s a breakdown of the piece for the sake of simplicity, allowing for high-level understanding:

It goes without saying that researchers are looking to find a path toward general artificial intelligence. The piece in question is a step on that path.

“Current ML (machine learning) systems need hundreds to thousands of examples to induce functions which generalize well.”

It’s a well-known limitation of existing machine learning models. Not only are they data-hungry, but they are also domain-specific. However, some scientists have started to suggest 1 that task-specific architectures are no longer necessary, as the so-called self-attention block-based architecture makes it possible to escape both the narrow scope of use and the tedious preparation of data that limits progress. That’s big news!

What is “self-attention”? A mechanism used in a model conceived by Google Brain specialists 2 that the OpenAI team reused with some modifications. It is a common practice in science to seek progress by building on the existing findings of others.
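
For the curious reader, here is a minimal sketch of the idea behind scaled dot-product self-attention, written in plain NumPy. It is an illustration only: the real GPT-2 uses masked, multi-head attention inside a much larger Transformer, and all names below are mine, not the paper's.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only).

    x:             (seq_len, d_model) matrix of token embeddings
    w_q, w_k, w_v: (d_model, d_head) learned projection matrices
    """
    q = x @ w_q                                   # queries: what each token is looking for
    k = x @ w_k                                   # keys: what each token offers
    v = x @ w_v                                   # values: the content that gets mixed
    scores = q @ k.T / np.sqrt(k.shape[-1])       # similarity of every token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ v                            # each output is a weighted blend of values

# Toy usage: 4 tokens with 8-dimensional embeddings and an 8-dimensional head
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)     # (4, 8)
```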

But the real breakthrough comes with the suggestion that a general (domain-independent) system should be able to use language to learn without supervision and, based on the acquired knowledge, perform different tasks. In more specific words: researchers suggest that systems could use the syntax of human language for their own conditioning, as it provides a flexible way to specify tasks, inputs, and outputs as a sequence of symbols. The team expressed the claim:

“Our speculation is that a language model with sufficient capacity will begin to learn to infer and perform the tasks demonstrated in natural language sequences in order to better predict them, regardless of their method of procurement.”

The work is therefore a quest to answer a question: can a neural network learn by itself using human language and, as a result, perform a wide range of tasks?

In order to learn from a language, the OpenAI model interprets word constructs as byte chunks. The team reports it as a “practical middle ground between character and word level language modeling.” The team used three variations of the model, which differ in capacity: from 345 million parameters 3 all the way up to 1,542 million.
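
To give a feel for what “byte chunks” means, here is a toy sketch of byte-level byte-pair encoding, the general technique the quote refers to. This is my own simplification, not OpenAI’s actual tokenizer: text is first turned into raw UTF-8 bytes, and the most frequent adjacent pair is then repeatedly merged into a single new symbol.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent pairs and return the most frequent one (or None)."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def toy_byte_bpe(text, num_merges=3):
    """Tiny illustration of byte-level BPE: start from raw bytes,
    then repeatedly merge the most frequent adjacent pair into one symbol."""
    tokens = list(text.encode("utf-8"))          # character level -> raw bytes
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
                merged.append(pair)              # treat the pair as one new symbol
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

print(toy_byte_bpe("low lower lowest"))
```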

All models were trained using a dataset consisting of 40GB of raw text from over 8 million documents, all scraped from web pages linked on the social platform Reddit. The team kept only those texts which could be assumed to have decent quality, using links from posts marked with at least three karma points.
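
As a hypothetical illustration of that quality filter (the field names and data below are made up, not the team’s actual pipeline), the idea is simply to keep links whose posts earned enough karma:

```python
# Hypothetical sketch of a WebText-style filter: keep only links
# whose submissions earned at least 3 karma points.
posts = [
    {"url": "https://example.com/a", "karma": 5},
    {"url": "https://example.com/b", "karma": 1},
    {"url": "https://example.com/c", "karma": 3},
]

KARMA_THRESHOLD = 3  # proxy for "a human found this link interesting"
kept = [p["url"] for p in posts if p["karma"] >= KARMA_THRESHOLD]
print(kept)  # ['https://example.com/a', 'https://example.com/c']
```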

What has been demonstrated?

A Machine Learning system trained using language should, in practice, provide answers to questions as a result of effective symbol processing. The hope is, of course, that successful systems will demonstrate increased comprehension of a question’s context, even independently of the language the system was trained on. In order to measure how well a system is doing, researchers have, over time, come up with a set of tests that make it possible to examine the answers.

This makes results quantifiable, allowing both system-to-system comparisons and a point of reference against human performance. What’s interesting is that the OpenAI model’s results in text generation are better than previous attempts and sometimes even close to the results expected from humans.

Scores of GPT-2

One of the tests, designed by the scientists, measures how well the system finds an omitted word in a sentence. GPT-2 performed great on this so-called ‘Children’s Book Test,’ with a 93.3% hit rate for common nouns and 89.1% for named entities.
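
Conceptually, a cloze test like this can be scored by asking the language model which candidate word makes the completed sentence most probable. The sketch below is not the paper’s evaluation code; it assumes a hypothetical `sentence_log_prob` function standing in for the model itself.

```python
def pick_missing_word(sentence_with_blank, candidates, sentence_log_prob):
    """Choose the candidate that yields the most probable full sentence.

    sentence_with_blank: e.g. "The cat chased the ___ across the yard."
    candidates:          the words offered by the test, e.g. ["mouse", "piano", "cloud"]
    sentence_log_prob:   hypothetical stand-in for the language model's scoring function
    """
    scored = [
        (sentence_log_prob(sentence_with_blank.replace("___", word)), word)
        for word in candidates
    ]
    return max(scored)[1]

def hit_rate(predictions, answers):
    """Accuracy: the fraction of examples where the prediction matches the answer."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)
```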

The other test was designed to measure understanding of difficult context, where at least 50 tokens of context are needed for humans to successfully predict the final word of an unfinished sentence. The test is called ‘LAMBADA’ 4. Here GPT-2 increased the machine accuracy from the previous record of 19% to a whopping 52.6%.

Researchers also tested reading comprehension by using a set of documents from seven different domains and asking questions related to the content of the documents. It’s worth noting that highly specialized systems already exist that are on par with humans in this isolated task. However, systems designed and trained to excel at only a single task cannot really be compared to the more general GPT-2, trained without supervision. From that perspective, its score of 55 F1 5 is roughly halfway to that of humans and specialized systems.
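
The F1 figure quoted here is, roughly, a token-overlap score between the model’s answer and a reference answer. A minimal sketch of that calculation, assuming simple whitespace tokenization, looks like this:

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 as commonly used in reading-comprehension benchmarks.

    precision = shared tokens / tokens in the prediction
    recall    = shared tokens / tokens in the reference
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("in the garden", "in the small garden"))  # ~0.857
```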

The score was obtained by conditioning GPT-2 on the specific documents, so the result should be seen as the outcome of an isolated test. A more general measure came from testing what answers the model provides when asked factoid-style questions. Here the success rate was only 4.1%, ten times worse than systems designed for this task alone.

One curious finding relates to how well the system performs translation between languages. Given a training dataset that was almost entirely English, with only some French mixed in, the OpenAI model performed well in French-English translation, achieving 11.5 BLEU points 6. In comparison, the best unsupervised algorithm achieves 33.5 BLEU.
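
For readers who want to see how such a score is computed in practice, here is a minimal sketch using NLTK’s BLEU implementation (one common off-the-shelf option, not necessarily the exact setup used in the paper).

```python
# pip install nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat sat on the mat".split()          # human translation
hypothesis = "the cat is sitting on the mat".split()  # machine output

# Smoothing avoids zero scores when a higher-order n-gram has no match.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
print(round(score * 100, 1))  # BLEU on the 0-100 scale used in the article
```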

The OpenAI team also reports a score of 70.70% for common sense reasoning, tested using the Winograd Schema Challenge. These results should be taken with a pinch of salt, as the task was to identify the antecedent of an ambiguous pronoun in a statement, over a limited set of 273 text samples.

This only demonstrates how high our expectations are in relation to what is actually tested. It also shows how the actual results of machine learning models are often misinterpreted. OpenAI’s GPT-2 clearly demonstrates progress on the path of symbol-processing systems, with results closer to those of our own cognition. But it’s a long way to what newspapers already make us fear with their bloated headlines.

Read more about the history of symbol processing and the hope of making computers think in this article.


  1. See Radford et al., 2018 and Devlin et al., 2018.

  2. See “Attention Is All You Need” by Ashish Vaswani et al. and “Improving Language Understanding by Generative Pre-Training” by Alec Radford et al. 

  3. A parameter is a learnable weight in one of the building blocks of the network.

  4. LAMBADA stands for LAnguage Modeling Broadened to Account for Discourse Aspects

  5. F1 is a measure of a test’s accuracy. It considers both precision (how many of the returned answers are relevant) and recall (how many of the relevant answers are returned). High precision means that an algorithm returned substantially more relevant results than irrelevant ones, while high recall means that an algorithm returned most of the relevant results.

  6. BLEU (Bilingual Evaluation Understudy) measures the correspondence between a machine’s output and that of a human on a scale from 0 to 100. The closer a machine translation is to a professional human translation, the better it is.

Why Business Software is Pushing for Playfulness

The entire development of a higher culture is basically a transformation of work into play — as Alan Watts famously puts it, “It’s about bringing delight into everything that is done.” How stark a contrast if we apply this line of reasoning to business software! Yet, at least one area of professional endeavor exists that actually requires a capacity for playfulness, spontaneity and joy. It is the creative process of coming up with ideas.

“We use simple signs to communicate the message, but when we discover how beautiful it is to draw a message — the work becomes play.”

Most of us abandon visual communication at an early age and exchange it for numbers and written language. We expect visuals in architecture or engineering, but we don’t make any connection between visual language and critical thinking, problem solving, or innovation. The tools that support visual literacy span different categories, like notepads, scratchpads, or interactive whiteboards, with each category seen as more applicable to education than to business. And it makes sense, observing how education serves both the verbal and visual needs of language acquisition. Around the age of 8, focus on the verbal begins to outweigh the visual. Sunni Brown, in her book “The Doodle Revolution,” suggests that our innate capacity for visual language is lost in this process to the supposedly “real” tools dominated by words and numbers. Taught symbols replace the capability to generate visual language. Yet we still cling to doodling or sketching when necessary.

Patterns of System Thinking visualized

In reality, most disciplines in business and management call for visual skills. Take the skill of sharing a vision, the purpose of which is to portray the future and, as a result, foster genuine commitment. When looking forward, we often use illustrations, pictures and graphs. After all, the very word ‘vision’ implies the act of seeing. The opposite process takes place in legal or compliance departments, which look to ensure obedience rather than inspiration. As a result, all regulations and zero-tolerance rules are shared as clean, sanitized text documents.

Creating and sharing ideas using visuals is also helpful in other business areas, such as working with the ingrained assumptions and generalizations called “mental models,” which represent how we understand the world. These are often already represented by pictures, diagrams or flowcharts. As good as tools for creating diagrams are, they fail to support the exploration of more complex human actions. In effect, sketching on a napkin prevails over its digital counterparts.

There is no software category that combines the modes of operation needed for a visual exchange of ideas. When providing software support, we’re stretched between communicators, whiteboards, video editors and knowledge repositories. The best we can do as of yet is to label the area with the tag words ‘divergent thinking,’ using the term coined by J.P. Guilford, an early researcher of creativity.

A missing software category

The American psychologist Joy Paul Guilford set out to create a model of the intellect in the fifties, and in doing so also created a model of creativity. He came out of the process with one hundred and eighty different mental tasks, and among them identified twenty-four vital components of creativity, which he labeled ‘divergent thinking.’ These thought processes allow one to explore many possible solutions in a spontaneous, free-flowing, non-linear manner. They are often used in conjunction with their cognitive counterpart, convergent thinking, which employs logical steps to arrive at one “correct” solution. Divergent and convergent thinking are often used together as two parts of a whole.

This can be seen in the ‘double diamond’ process introduced by the Design Council, which maps the divergent and convergent stages of using different modes of thinking. What is common to them is that divergent thought initiates and opens new possibilities, so that logic and scrutiny can then be applied to narrow and refine the scope. The objective of the ‘divergent’ stage is to keep one’s perspective wide, to allow for a broad range of ideas and influences. LEGO refers to this stage of the process as ‘Exploring,’ Microsoft calls it ‘Understand,’ while Starbucks has coined the term ‘Concept Heights.’ None of these companies used a catalog of Computer-Aided Divergent Thinking software to choose their tools!

The software supporting this mode of operation is misrepresented. The current best options for the business application of software-supported tinkering have more to do with video conferencing and knowledge repositories than with a digital alternative to the napkin. Yet those very tools currently support team learning, a process where we’re meant to “think together” and discover insights not attainable individually. It seems clear that the process is less about group conversation or categorizing articles. It’s more about providing a means to suspend assumptions and enter into a genuine process of expressing ideas, by visual or any other means.

Why the predominance of the narrow instead of the open-ended?

The reason why software largely supports convergent instead of divergent thinking goes back to the mathematical rigidity that forms the foundation of all digital tools. Any logic-driven accumulation of rules worked well; anything loose, fuzzy, and outside the logic or the reality of math was out. This lack of space for ambiguity meant that, initially, software could only be produced to support processes composed of known parts, rather than those yet to be defined. Although it’s hard to model unconstrained concepts using a rigid stack of technologies, it is not impossible! Brave attempts since the fifties have paved the way to our current set of machine learning technologies, which can successfully escape the rigidity of the classical algorithmic foundation. They do that by using solutions designed to address problems where no clear direction is provided. And if we stretch our imagination, as some believe, we may eventually arrive at a less messy artificial reasoner capable of making order out of our mess. But even if we do, how would you want this hypothetical artificial reasoner to provide its clues and suggestions to you? As a single logical narrative? Or would you prefer to take suggestions from your digital overlords as lists of options, for the sake of better understanding?

It seems there’s no escape from the divergent, and that leads us back to tools for the exploration and communication of ideas, no matter where those ideas originate. The good news is that clues for business exist in the education sector.

What can Education teach Business?

Strong support for visual literacy is provided before school begins: those skills are learned before our first exposure to written text, as they are understood to benefit children’s comprehension and cognitive abilities. Drawing is not only used as a teaching device; it also provides a means of orchestrating a conversation with yourself. Putting thoughts down allows students to step outside of themselves and tap into the visual system to see and understand relations. We thus expand our thinking by distributing it between conception and perception, engaging both simultaneously. We draw not to transcribe ideas from our heads but to organize them in search of greater understanding. And understanding is what education is all about. The use of the visual is further supported by the deployment of devices and tablets equipped with whiteboard software. We experience this first-hand by providing one such solution, the Explain Everything platform. The same trend has recently been supported by Microsoft with their interactive whiteboard for education. Visual processes that support differentiation and drawing connections are inherently helpful beyond education as well.

Divergent Thinking in Business

The areas supported by software in business revolve mostly around planning and execution. The end result of both processes is to narrow the available options, and the more the options narrow, the more convergent modes of thinking are required. Good decisions, however, require multiple inputs. Many workshops and nearly all business meetings with new prospects provide an opportunity to explore or learn; due to their nature, divergent thinking is the way to proceed, and multiple views, not just a single one, are more helpful. The perception of a business opportunity, or a singular solution to a complex problem, can be constructed from a multitude of views. Yet to postpone judgement and gather different opinions requires the open-ended, the joyful, the divergent. Anchoring to a single viewpoint would interrupt the dynamic of relations between participants.

This process reunites thinking, the playful and the visual into one mode of operation that is divergent by nature. The process is supported by notepads, mind maps, whiteboards and communicators. We already use tools that go beyond text and incorporate not only emoticons but also other novel means of expression. The innovation already happening in business software is fragmented for anyone who expects a complete package to support divergent thinking. However, as Robert Green puts it, creativity is in the mud, the dirt, and the grime of daily life. When we introduced our whiteboard to business to support the creatives, we followed the need, not a finished list of expectations. The variety of these needs is what could define a new software category. My guess is that future divergent-thinking aids will sit at a crossroads of collaborative communicators, interactive whiteboards and repositories designed to hold visually expressed knowledge, provided both by audio and video. This might influence or merge with core resource planning systems and become a layer of communication enhancing their adoption and usefulness. What we experienced when introducing an EDU-originated interactive whiteboard to enterprises is that an unconstrained collaborative setting can also be fun for business-oriented customers. The first thing most participants did after recognizing their lack of constraints was to play. The second thing they noticed is that it’s easier to create meaning when the barriers of expression are taken down.

The early adopters in business are the ones responsible for internal training, sales enablement or customer success, where the need to improve communication is most obvious. The need for new software support is already recognized by professionals who understand innovative methodologies well. Meeting some of them might already feel like a bit of future shock. Take Pierre-Denis Autric, Culture Hacker at CapGemini, who entered a meeting supplied with a set of iPads and a simple goal: to immerse participants in the digital experience of the collaborative whiteboard instead of using flipcharts.

Using joy as a means to increase efficiency seems the right way for playfulness to be acknowledged as an important factor in the enterprise. After all, it is a way to make our work more rewarding. When Alan Watts spoke of the transformation of work into play, he surely had no software in mind. Yet software is precisely where his vision can be achieved. It has already transformed entertainment. The successful transformation of business, however, could have a far more powerful impact, for several reasons. Not only could the perception of business change but, thanks to the more exuberant nature of meetings, more meaning can be generated. By allowing more visual means of expression, we might find more creatives in our teams who have perhaps been muted by the textual, who know how to draw a message well and, by that method, connect with a fresh dose of authenticity. And finally, if what we do makes more sense, it will, in the long run, benefit the organizations we work for. Perception, in the end, is not mere decoration but an integral and fundamental partner in making meaning.