In the course of writing this article, I received occasional creative input from my nine-month-old daughter babbling in her playpen. She squeals at a curious cat that has ventured too close, but quickly loses interest and returns to her work of using her tiny fingers to pry up the foam alphabet letters that soften the floor of her play enclosure in case of falls. Such tumbles are becoming less frequent by the day as the small movements of her tiny muscles ever more expertly reinforce her ability to remain upright. As a daily hobby in parallel with her own language development, I have decided to start learning Greek under the watchful eye of a certain green digital owl. This has led me to consider the state of machine learning in the field of natural language processing, and how deep learning models in this field are using reinforcement learning strategies to score and improve the quality of what they produce.
Natural language processing (NLP) is concerned with the automated understanding of human speech and writing, as well as the generation of meaningful language in turn. It is hard to imagine a person who hasn’t yet had a surprising, frustrating, or surprisingly frustrating encounter with Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, or Google’s voice assistant. Anyone who remembers diagramming sentences in grade school grammar can attest that human languages have many tricks for breaking their own general rules and require a lot of time and examples to master. Many of the cutting-edge technologies in NLP are powered by a type of machine learning called deep learning.
Deep learning models are best described, in brief, as networks of layers in which the importance of specific connections between elements is learned from the data used to train the model to accomplish some task. Models are usually characterized by the number of layers they have (their depth) in addition to the number of parameters they must finesse to achieve good accuracy. A model to distinguish pictures of dogs from cats might base its decision on whether the pupil of the subject’s eye is slitted (almost certainly a cat) or round (all dogs, though it could be a cat under some circumstances), and this decision would map back, in a convoluted fashion, to the weights given to different nodes in the underlying network. However, not all decision criteria are easily human-interpretable, which makes it difficult to detect potential ethical issues like racial bias. Deep learning in speech is big business as well, with Microsoft recently acquiring the AI speech tech company Nuance, best known for its Dragon speech-to-text application, for $19.7 billion. One of the hallmark examples of deep learning research progress in NLP is OpenAI’s GPT-3, which follows up on the hype of GPT-2 (once described by its originators as “too dangerous to release”) by increasing the number of parameters and the depth of the earlier model. In many key ways, GPT-3 is just the GPT-2 paradigm trained on a much larger data set. Importantly, GPT-3 is considered by some to be a general solution for NLP tasks it has not yet encountered, given enough data, but training models of this size is computationally demanding and expensive. Additionally, human understanding of the model’s decision-making process remains an area of active research.
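To make the cat-versus-dog idea above concrete, here is a minimal sketch of a two-layer network in Python. The features, weight values, and function names are all hypothetical choices made for illustration; in a real deep learning model the weights would be learned from labeled training data rather than set by hand.

```python
import numpy as np

def relu(x):
    # Common "activation" between layers: negative signals are zeroed out.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes the final score into (0, 1), read here as "probability of cat".
    return 1 / (1 + np.exp(-x))

# Hypothetical input features: [pupil_slittedness, snout_length],
# each scaled so 1.0 = very slitted pupil / long snout.
W1 = np.array([[ 2.0, -1.0],
               [-1.0,  2.0]])  # first-layer weights (hand-picked for the demo)
W2 = np.array([ 3.0, -3.0])    # second-layer weights

def cat_score(features):
    """Feed features through two weighted layers; closer to 1 means 'cat'."""
    hidden = relu(W1 @ features)
    return sigmoid(W2 @ hidden)

slitted_pupil = np.array([1.0, 0.1])  # slitted pupil, short snout
round_pupil   = np.array([0.1, 1.0])  # round pupil, long snout

print(cat_score(slitted_pupil) > 0.5)  # True: classified as a cat
print(cat_score(round_pupil) < 0.5)    # True: classified as a dog
```

The “deep” in deep learning simply means stacking many more such layers, with millions or billions of weights instead of the six shown here.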
Reinforcement learning is a machine learning paradigm that seeks to maximize some reward function or score as the learner works through a new task. This stands in contrast to alternatives like supervised learning, in which the model is given curated examples that meet some criteria (pictures labeled “cat” or “dog” in the example above), or unsupervised learning, in which the model uses unlabeled data to group similar examples by their features for later interpretation. A common analogy for reinforcement learning is the process of tweaking the corrective motions required to balance an object, like an upright pencil, in the palm of your hand. AlphaGo Zero, a variant of Google DeepMind’s digital Go player AlphaGo, uses reinforcement learning, and similar models have been shown to do well at learning to play a selection of old Atari video games. OpenAI has recently used GPT-3 alongside reinforcement learning to deliver high-quality automated text summaries.
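The trial-and-error flavor of reinforcement learning can be sketched with one of its simplest forms, an epsilon-greedy “bandit” agent. Everything here is illustrative: the three actions and their hidden reward probabilities are made up, and no labeled examples are provided; the agent learns only from the scores it receives.

```python
import random

random.seed(42)

# Hidden from the agent: how often each of three actions pays off.
TRUE_REWARD_PROB = [0.2, 0.5, 0.8]

def pull(action):
    """Environment step: reward of 1 with the action's hidden probability."""
    return 1 if random.random() < TRUE_REWARD_PROB[action] else 0

estimates = [0.0] * 3  # the agent's running estimate of each action's value
counts = [0] * 3
epsilon = 0.1          # fraction of the time spent exploring at random

for step in range(5000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore a random action
    else:
        action = estimates.index(max(estimates))  # exploit the best-known one
    reward = pull(action)
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

best = estimates.index(max(estimates))
print(best)  # the agent should settle on action 2, the highest-paying one
```

Balancing exploration against exploitation is the same tension a game-playing system like AlphaGo Zero faces, just on an enormously larger scale.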
The implications of these advances in NLP for society are readily apparent. For example, this technology could help researchers wrangle the overwhelmingly large body of scientific literature to address the “reproducibility crisis,” by flagging studies that represent weak links and might merit further scrutiny or reanalysis. There are also examples in the fields of drug discovery and drug repurposing. Of course, there are dark sides to this new technology as well, including the future capability to conduct deliberate disinformation campaigns on social media to drive political divisions. These problems have already appeared in images and video in the form of “deep fakes” powered by similar deep learning technology, which fool our finely tuned human visual systems into thinking we have seen a political figure or celebrity say or do something compromising or graphic that previously would have required an expensive team of digital special-effects artists to achieve. You may have argued (though perhaps only in your head) with someone spouting lies about the COVID-19 pandemic who was actually a simple chat-bot powered by machine learning algorithms from an earlier era. Indeed, humans have fallen for easily created digital fakes of textual communication for decades. ELIZA, one of the earliest chatbots, alarmed its creator Joseph Weizenbaum in the 1960s when his secretary would ask for privacy to have a discussion with the “digital psychotherapist” over a simple text interface. It is easy to imagine the potential upheaval that chatbots interacting maliciously within social media bubbles could cause. However, I prefer to end on a more positive note, imagining a future in which my daughter has just finished the latest book in her favorite series and finds herself one click away from automatically generating a story she will enjoy while simultaneously appreciating it as computational artifice.
Peer edited by Jamshaid Shahir