Memory in Language Model-Enabled Agents

Created by Yuwei Sun


Language models are emerging as planners and world models for agents in virtual environments. This post examines the memory mechanisms that support LLM-based decision-making and environmental understanding in simulated worlds.

Memory of Language Models

The simplest form of memory for a language model is its context window, which bounds the number of tokens the model can attend to. If the conversation history grows beyond this token limit, earlier messages are truncated, potentially leading to a loss of context. Scaling up the input context length has been explored in LLM studies; for example, GPT-3 increased the input length from 1k tokens in GPT-2 to 2k tokens. However, this approach typically results in computation-intensive training, constrained by the quadratic computational complexity of self-attention.
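As a rough illustration of this truncation behavior, here is a minimal sketch of a sliding-window conversation buffer. The `count_tokens` helper is a stand-in; a real system would use the model's own tokenizer.

```python
from dataclasses import dataclass, field

def count_tokens(text: str) -> int:
    # Stand-in token counter; a real system would use the model's tokenizer.
    return len(text.split())

@dataclass
class TruncatingBuffer:
    """Keeps only the most recent messages that fit within the context window."""
    max_tokens: int = 2048  # e.g., GPT-3's context length
    messages: list[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Drop the oldest messages until the history fits again;
        # this is where earlier context is lost.
        while sum(count_tokens(m) for m in self.messages) > self.max_tokens:
            self.messages.pop(0)

buffer = TruncatingBuffer(max_tokens=50)
for turn in ["user: hi", "assistant: hello", "user: summarize our chat so far"]:
    buffer.add(turn)
```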


Language Model-Enabled Agents

MemoryBank: Enhancing Large Language Models with Long-Term Memory [1]

MemoryBank enables a language model to recall relevant memories, continually evolve through memory updates, and adapt to a user's personality over time by summarizing information from previous interactions.

Fig.1 - Overview of the MemoryBank [1].
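A minimal sketch of the idea (not the authors' implementation): store dated interaction summaries, recall the most relevant ones, and let unused memories fade over time in the spirit of MemoryBank's Ebbinghaus-inspired forgetting curve. The `embed` function is a stand-in for a real embedding model.

```python
import math
import time

def embed(text: str) -> list[float]:
    # Stand-in embedding; a real system would call an embedding model.
    return [float(ord(c)) for c in text[:16]]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

class MemoryBankSketch:
    def __init__(self):
        self.records = []  # each record: text, embedding, timestamp, strength

    def store(self, summary: str) -> None:
        self.records.append({
            "text": summary,
            "embedding": embed(summary),
            "timestamp": time.time(),
            "strength": 1.0,
        })

    def retention(self, record: dict, now: float) -> float:
        # Exponential decay in the spirit of a forgetting curve:
        # memories fade unless they are reinforced by being recalled.
        elapsed_days = (now - record["timestamp"]) / 86400
        return math.exp(-elapsed_days / record["strength"])

    def recall(self, query: str, k: int = 3) -> list[str]:
        now = time.time()
        q = embed(query)
        ranked = sorted(
            self.records,
            key=lambda r: cosine(q, r["embedding"]) * self.retention(r, now),
            reverse=True,
        )
        for r in ranked[:k]:
            r["strength"] += 1.0  # recalled memories are reinforced and fade more slowly
        return [r["text"] for r in ranked[:k]]
```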

Augmenting Language Models with Long-Term Memory [2]

Language Models Augmented with Long-Term Memory (LONGMEM) enables LLMs to memorize long histories. It adopts a decoupled network architecture, with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network serving as a memory retriever and reader.

Fig.2 - Overview of the memory caching and retrieval flow of LONGMEM. The long text sequence is split into fixed-length segments; each segment is forwarded through the large language model, and the attention key and value vectors of the $m$-th layer are cached into the long-term memory bank. For future inputs, the top-$k$ attention key-value pairs in the long-term memory are retrieved via attention query-key based retrieval and fused into language modeling [2].

There are three key components: the frozen backbone LLM, the SideNet, and the Cache Memory Bank.
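A toy sketch of the caching-and-retrieval flow under simplifying assumptions (a single attention head and small dimensions; not the paper's implementation): keys and values from past segments are cached, and for each new query the top-$k$ cached pairs are retrieved by query-key similarity before being fused into attention.

```python
import numpy as np

class CachedMemoryBank:
    """Toy key-value memory: cache past attention keys/values, retrieve top-k per query."""

    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def cache_segment(self, keys: np.ndarray, values: np.ndarray) -> None:
        # In LONGMEM, the cached keys/values come from one layer of the frozen backbone.
        self.keys = np.vstack([self.keys, keys])
        self.values = np.vstack([self.values, values])

    def retrieve(self, queries: np.ndarray, k: int = 4):
        # Attention query-key retrieval: score every cached key against each query.
        scores = queries @ self.keys.T                   # (n_query, n_cached)
        top_idx = np.argsort(-scores, axis=1)[:, :k]     # indices of the top-k keys
        return self.keys[top_idx], self.values[top_idx]  # (n_query, k, dim)

dim = 8
bank = CachedMemoryBank(dim)
bank.cache_segment(np.random.randn(16, dim), np.random.randn(16, dim))
retrieved_k, retrieved_v = bank.retrieve(np.random.randn(4, dim), k=4)
# The retrieved pairs would then be fused into the side-network's attention as extra context.
```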


Reflexion: Language Agents with Verbal Reinforcement Learning [3]

Reflexion uses verbal reinforcement to help agents learn from prior failings. Reflexion converts binary or scalar feedback from the environment into verbal feedback in the form of a textual summary, which is then added as additional context for the LLM agent in the next episode.

Fig.3 - Diagram of Reflexion [3].
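A minimal sketch of the Reflexion loop, with stand-in `run_episode`, `evaluate`, and `reflect` functions in place of the actor, the evaluator, and the self-reflection model:

```python
def run_episode(task: str, reflections: list[str]) -> str:
    # Stand-in actor: a real agent would prompt an LLM with the task plus past reflections.
    context = " | ".join(reflections)
    return f"attempt at '{task}' given lessons: [{context}]"

def evaluate(trajectory: str, trial: int) -> bool:
    # Stand-in evaluator: a real one maps environment feedback to success or failure.
    return trial >= 2  # pretend the agent succeeds on the third try

def reflect(trajectory: str) -> str:
    # Stand-in self-reflection: a real one asks an LLM to summarize what went wrong.
    return f"Previous attempt failed: {trajectory}. Try a different plan."

def reflexion_loop(task: str, max_trials: int = 3) -> list[str]:
    reflections: list[str] = []              # long-term verbal memory across episodes
    for trial in range(max_trials):
        trajectory = run_episode(task, reflections)
        if evaluate(trajectory, trial):      # binary/scalar feedback from the environment
            break
        # Convert the failure into verbal feedback carried into the next episode's context.
        reflections.append(reflect(trajectory))
    return reflections

print(reflexion_loop("open the locked cabinet"))
```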

Generative agents: Interactive simulacra of human behavior [4]

This work simulates 25 agents over two days in a sandbox environment. The agents interact with the world through their actions and communicate with each other in natural language, and social dynamics unfold among them.

Fig.4 - The simulated sandbox environment [4].

Success in this simulation necessitates an approach that can retrieve relevant events and interactions over an extended period, reflect on those memories to generalize and draw higher-level inferences, and apply that reasoning to formulate plans and reactions that make sense both in the current moment and in the longer-term trajectory of the agent's behavior.

To construct an agent, a memory stream is employed to record, in natural language, a comprehensive list of the agent's experiences. Based on the agent's perceptions, the architecture retrieves relevant memories and uses them to determine subsequent actions. These retrieved memories also contribute to the formation of longer-term plans and the generation of higher-level reflections, both of which are incorporated into the memory stream for future reference.

Fig.5 - The long-term memory system of the agent [4].
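As a rough illustration of the data structure (the field names are mine, not the paper's), each record in the memory stream stores a natural-language description along with the timestamps and importance score used later for retrieval:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MemoryRecord:
    description: str          # natural-language description of the experience
    created_at: datetime
    last_accessed: datetime
    importance: float         # e.g., the 1-10 poignancy score described below

@dataclass
class MemoryStream:
    records: list[MemoryRecord] = field(default_factory=list)

    def add(self, description: str, importance: float) -> MemoryRecord:
        now = datetime.now()
        record = MemoryRecord(description, created_at=now,
                              last_accessed=now, importance=importance)
        self.records.append(record)
        return record

stream = MemoryStream()
stream.add("buying groceries at The Willows Market and Pharmacy", importance=2.0)
stream.add("got accepted to college", importance=10.0)
```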

Memory retrieval

There are many possible implementations of a retrieval function, depending on what is important for the agent to consider when deciding how to act. One effective way to score the importance of a memory is to directly ask the language model to output an integer score:

On the scale of 1 to 10, where 1 is purely mundane (e.g., brushing teeth, making bed) and 10 is extremely poignant (e.g., a break up, college acceptance), rate the likely poignancy of the following piece of memory.

Memory: buying groceries at The Willows Market and Pharmacy

Rating: <fill in>
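In the paper, this importance score is combined with recency and relevance when ranking memories for retrieval. Continuing the memory stream sketch above, here is a rough illustration of such a weighted retrieval score; the decay factor, the equal weights, and the word-overlap relevance measure are illustrative stand-ins, not the paper's exact choices.

```python
from datetime import datetime

def recency_score(record: MemoryRecord, now: datetime, decay: float = 0.995) -> float:
    # Exponential decay per hour since the memory was last accessed.
    hours = (now - record.last_accessed).total_seconds() / 3600
    return decay ** hours

def relevance_score(record: MemoryRecord, query: str) -> float:
    # Stand-in for embedding similarity between the query and the memory description.
    shared = set(query.lower().split()) & set(record.description.lower().split())
    return len(shared) / max(len(query.split()), 1)

def retrieve(stream: MemoryStream, query: str, now: datetime, k: int = 3) -> list[MemoryRecord]:
    def score(r: MemoryRecord) -> float:
        # Equal weights here; the relative weighting is a design choice.
        return recency_score(r, now) + r.importance / 10 + relevance_score(r, query)
    top = sorted(stream.records, key=score, reverse=True)[:k]
    for r in top:
        r.last_accessed = now  # accessing a memory refreshes its recency
    return top

relevant = retrieve(stream, "what did I buy at the market?", datetime.now())
```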


Reflection

Reflections are higher-level, more abstract thoughts generated by the agent. They are included alongside other observations during retrieval. Reflections are generated periodically, roughly two or three times a day: the language model is prompted to extract insights from recent memories and to cite the particular records that served as evidence for each insight.
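Continuing the same sketch, here is a rough illustration of the reflection step, with a stand-in `ask_llm` function in place of a real language model call:

```python
def ask_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; a real system would query a chat-completions API here.
    return "the agent values long-term planning (because of 1, 3)"

def generate_reflections(stream: MemoryStream, n_recent: int = 100) -> list[str]:
    # Number the most recent records so the model can cite them as evidence.
    recent = stream.records[-n_recent:]
    numbered = "\n".join(f"{i}. {r.description}" for i, r in enumerate(recent, start=1))
    prompt = (
        f"Statements about the agent:\n{numbered}\n"
        "What high-level insights can you infer from the statements above? "
        "For each insight, cite the statement numbers that serve as evidence "
        "(format: insight (because of 1, 5, 3))."
    )
    insights = ask_llm(prompt).splitlines()
    for insight in insights:
        # Reflections are stored back into the memory stream as high-importance records.
        stream.add(insight, importance=8.0)
    return insights
```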

Findings

First, synthesizing an increasingly large set of memories posed a challenge not only in retrieving the most relevant pieces of information but also in determining the appropriate place to execute an action, given the increasing number of locations that the agent learned about. As a result, some agents chose less typical locations for their actions, potentially making their behavior less believable over time.

Second, erratic behaviors arose from misclassification of what is considered proper behavior, especially when the physical norms of certain locations, which are hard to convey in natural language, did not percolate to the agents. Instruction tuning also seemed to make the agents overly cooperative with one another.





[1] Wanjun Zhong, Lianghong Guo, Qiqi Gao, et al. MemoryBank: Enhancing large language models with long-term memory. arXiv:2305.10250, 2023.

[2] Weizhi Wang, Li Dong, Hao Cheng, et al. Augmenting language models with long-term memory. arXiv:2306.07174, 2023.

[3] Noah Shinn, Beck Labash, and Ashwin Gopinath. Reflexion: An autonomous agent with dynamic memory and self-reflection. arXiv:2303.11366, 2023.

[4] Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, et al. Generative agents: Interactive simulacra of human behavior. In UIST, pages 2:1–2:22. ACM, 2023.