Learning Language by Exploration: Agent develops language skills through simulated exploration tasks

Machine learning models typically learn language by training on tasks like predicting the next word in a given text. Researchers trained a model to pick up language in a less direct, more human-like way.

What’s new: A team at Stanford led by Evan Zheran Liu built a reinforcement learning agent that picked up language indirectly as it learned to navigate a simulated environment that provides text clues.

Key insight: Reinforcement learning agents learn by discovering actions that maximize rewards. If the training environment provides text that explains how to achieve the highest reward, an agent will benefit by learning to interpret written language. That is, learning to comprehend written instructions will correlate with success in maximizing rewards.

How it works: The authors built a series of simulated two-dimensional environments using Minigrid, a reinforcement learning library of grid-world environments. Using the DREAM reinforcement learning algorithm, they trained the agent to find a particular room.

  • The authors designed a two-dimensional layout of rooms connected by hallways. The layout included 12 rooms, each painted in one of 12 colors that were assigned randomly. A fixed location held text instructions for finding the blue room. 
  • The authors created many variations of the layout by reassigning the colors and updating the text instructions for finding the blue room. The instructions were either direct (for instance, “the second office in the third row”) or relative (“right of the first office in the second row”). 
  • The agent received a reward when it found the blue room and a penalty for each time step (a toy sketch of this reward structure follows this list). At each time step, it observed a subset of the office environment (a 7-by-7 grid of cells in its direct line of sight) and could take one of several actions (turn left or right, move forward, or open or close a door). When it reached the location that held the instructions, it received an image of the text. It continued to explore for a set time or until it found the blue room. 
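
To make the setup concrete, here is a minimal sketch, not the authors' code, of an environment with the same reward structure, written against the Gymnasium API. The class name ClueGridEnv, the reward magnitudes, the episode cap, and the flattened one-row layout are simplifying assumptions of ours; the real environment is a Minigrid grid world with rooms, hallways, and an image of instruction text rather than a room index.

```python
# Minimal sketch, not the authors' implementation. Assumed: ClueGridEnv, the
# reward magnitudes, and the one-row layout; the paper's environment is a
# Minigrid grid world, and the clue appears as an image of text.
import gymnasium as gym
from gymnasium import spaces

NUM_ROOMS = 12        # 12 rooms, colors reassigned at random each episode
STEP_PENALTY = -0.01  # the paper penalizes every time step; magnitude assumed
SUCCESS_REWARD = 1.0  # reward for reaching the blue room; magnitude assumed
MAX_STEPS = 100       # episodes end after a set time; value assumed


class ClueGridEnv(gym.Env):
    """Toy stand-in: rooms lie in a row, and a fixed cell holds a clue
    identifying which room is blue."""

    def __init__(self):
        super().__init__()
        # Observation: the agent's position plus the clue if the agent is on
        # the clue cell (the value NUM_ROOMS means "no clue visible").
        self.observation_space = spaces.Dict({
            "position": spaces.Discrete(NUM_ROOMS),
            "clue": spaces.Discrete(NUM_ROOMS + 1),
        })
        self.action_space = spaces.Discrete(2)  # 0 = move left, 1 = move right

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.blue_room = int(self.np_random.integers(NUM_ROOMS))  # reassigned per episode
        self.clue_pos = 0                  # the clue always sits in the same place
        self.pos = NUM_ROOMS // 2
        self.steps = 0
        return self._obs(), {}

    def _obs(self):
        clue = self.blue_room if self.pos == self.clue_pos else NUM_ROOMS
        return {"position": self.pos, "clue": clue}

    def step(self, action):
        self.steps += 1
        self.pos = max(0, min(NUM_ROOMS - 1, self.pos + (1 if action == 1 else -1)))
        found = self.pos == self.blue_room
        reward = SUCCESS_REWARD if found else STEP_PENALTY
        return self._obs(), reward, found, self.steps >= MAX_STEPS, {}
```

An agent that maximizes return across many such episodes has an incentive to read the clue, because wandering between rooms at random costs extra time steps. That is the pressure the authors exploit to make language comprehension emerge.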

Results: The authors tested the agent’s ability to generalize to text it had not encountered in training.

  • They trained the agent on layouts that excluded instructions describing the blue room as the “third office in the second row” and tested it on layouts that included those words. The agent found the blue room every time without checking every room. 
  • They also tested the agent in layouts whose hallways were twice as long as those in the training set. Again, it always found the blue room. 
  • To determine whether the agent understood individual words in the instructions, the authors collected its embeddings of many instructions and trained a single-layer LSTM to reconstruct the instructions from those embeddings (a sketch of this probe appears below). The LSTM achieved a perplexity (a measure of how well it predicted the next word of instructions outside its training data; lower is better) of 1.1, while a randomly initialized network of the same architecture achieved 4.65, an indication that the agent did, indeed, learn to read individual words.
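
Here is a minimal sketch of that probing idea, assuming the instruction embeddings are fixed vectors collected from the trained agent and the instructions are tokenized into integer ids. The class and variable names (EmbeddingDecoder, init_proj, and so on) and the dimensions are our assumptions, not the paper’s code.

```python
# Sketch only: probe whether an embedding encodes the instruction's words by
# training a single-layer LSTM to predict the instruction tokens from it, then
# measuring perplexity on held-out instructions. Names and shapes are assumed.
import torch
import torch.nn as nn


class EmbeddingDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, hidden_dim)
        self.init_proj = nn.Linear(embed_dim, hidden_dim)  # agent embedding -> initial state
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, agent_embedding, tokens):
        # Condition the LSTM on the agent's instruction embedding, then predict
        # each token from the ones before it (teacher forcing).
        h0 = torch.tanh(self.init_proj(agent_embedding)).unsqueeze(0)  # [1, batch, hidden]
        c0 = torch.zeros_like(h0)
        inputs = self.token_embed(tokens[:, :-1])                      # [batch, len-1, hidden]
        hidden, _ = self.lstm(inputs, (h0, c0))
        return self.out(hidden)                                        # logits for tokens[:, 1:]


def perplexity(logits, targets):
    # Perplexity is exp(mean cross-entropy); lower means the decoder recovers
    # held-out instructions more easily, i.e., the embedding encodes the words.
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
    return torch.exp(loss).item()
```

A single-layer LSTM is a deliberately weak probe: if it can still recover the words of unseen instructions, most of that information must already be present in the agent’s embedding rather than supplied by the probe itself.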

Yes, but: The choice of reinforcement-learning algorithm was crucial. When the authors replaced DREAM with either RL2 or VariBAD, the agent did not learn language. Instead, it learned to check all the doors.

Why it matters: The discovery that reinforcement-learning agents can learn language without explicit training opens avenues for training language models that use objectives different from traditional text completion. 

We’re thinking: The authors focused on simple language (instructions limited to a few words and a very small vocabulary) that described a single domain (navigating hallways and rooms). There's a long road ahead, but this work could be the start of a more grounded approach to language learning in AI.
