The simplest form of memory for language models is the context window itself, which can store only a limited number of tokens. If the conversation history grows beyond the model's token limit, earlier messages must be truncated, potentially losing context. Scaling up the input context length has been explored in LLM research; for example, GPT-3 increased the input length from the 1k tokens of GPT-2 to 2k tokens. However, this approach leads to computation-intensive training, constrained by the quadratic complexity of self-attention.
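To make the truncation behavior concrete, here is a minimal sketch (not taken from any cited system) of a token-budgeted history that keeps only the most recent messages fitting the budget; the whitespace-based `count_tokens` is a crude stand-in for the model's real tokenizer.

```python
# Minimal sketch of token-limited conversation memory: when the history
# exceeds the budget, the oldest messages are dropped first.

def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def truncate_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages whose total size fits the budget."""
    kept, total = [], 0
    for msg in reversed(messages):           # walk newest to oldest
        cost = count_tokens(msg["content"])
        if total + cost > max_tokens:
            break                            # older context is lost here
        kept.append(msg)
        total += cost
    return list(reversed(kept))              # restore chronological order

history = [
    {"role": "user", "content": "My name is Ada and I love chess."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
    {"role": "user", "content": "What openings should I study?"},
]
print(truncate_history(history, max_tokens=12))  # first message is dropped
```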
MemoryBank [1] enables models to recall relevant memories, continually evolve through continuous memory updates, and adapt to a user's personality over time by summarizing information from previous interactions.
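As a rough illustration of this retrieve-and-update idea, the sketch below stores memories with a time-decaying strength that is reinforced whenever a memory is recalled. The word-overlap similarity and exponential decay here are illustrative stand-ins for MemoryBank's dense retriever and forgetting-curve-inspired memory updates.

```python
import math, time

class MemoryBank:
    """Toy memory store: similarity-based recall with time-decaying strength."""

    def __init__(self, half_life_s: float = 86400.0):
        self.entries = []            # each entry: [text, last_recall_time, strength]
        self.half_life_s = half_life_s

    def add(self, text: str):
        self.entries.append([text, time.time(), 1.0])

    def _retention(self, entry) -> float:
        # Exponential decay of memory strength since the last recall.
        elapsed = time.time() - entry[1]
        return entry[2] * math.exp(-elapsed * math.log(2) / self.half_life_s)

    def recall(self, query: str, k: int = 3) -> list[str]:
        q = set(query.lower().split())
        def score(e):
            overlap = len(q & set(e[0].lower().split())) / (len(q) or 1)
            return overlap * self._retention(e)
        top = sorted(self.entries, key=score, reverse=True)[:k]
        for e in top:                # recalling a memory reinforces it
            e[1], e[2] = time.time(), e[2] + 0.5
        return [e[0] for e in top]

bank = MemoryBank()
bank.add("User enjoys hiking on weekends.")
bank.add("User dislikes spicy food.")
print(bank.recall("what does the user like to do on weekends?"))
```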
Language Models Augmented with Long-Term Memory (LONGMEM) [2] enables LLMs to memorize long histories. It adopts a decoupled architecture in which the original backbone LLM is frozen and serves as the memory encoder, while an adaptive residual side-network acts as the memory retriever and reader.
There are three key components: the frozen backbone LLM, the SideNet, and the Cache Memory Bank.
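The sketch below illustrates the cached-memory attention pattern schematically: key-value pairs from earlier chunks (as would be produced by the frozen backbone) sit in a cache, the top-k keys most similar to the current query are retrieved, and attention runs jointly over local and retrieved entries. The dimensions, dot-product retriever, and single-head fusion are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 16
mem_keys = rng.normal(size=(1000, d))    # cached keys from past chunks
mem_vals = rng.normal(size=(1000, d))    # cached values from past chunks
loc_keys = rng.normal(size=(8, d))       # keys of the current local window
loc_vals = rng.normal(size=(8, d))
query = rng.normal(size=d)

k = 4
top = np.argsort(mem_keys @ query)[-k:]      # retrieve top-k cached memories
keys = np.vstack([loc_keys, mem_keys[top]])  # fuse local + retrieved keys
vals = np.vstack([loc_vals, mem_vals[top]])
attn = softmax(keys @ query / np.sqrt(d))    # joint attention over both
output = attn @ vals
print(output.shape)                          # (16,)
```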
Reflexion [3] uses verbal reinforcement to help agents learn from prior failings. It converts binary or scalar feedback from the environment into verbal feedback in the form of a textual summary, which is then added as additional context for the LLM agent in the next episode.
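A minimal sketch of this episode loop, with `call_llm` and `run_task` as hypothetical placeholders for the model and the environment:

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for an actual LLM call

def run_task(attempt: str) -> float:
    raise NotImplementedError  # stand-in for the environment; returns a score

def reflexion_loop(task: str, max_episodes: int = 3) -> str:
    reflections: list[str] = []
    for _ in range(max_episodes):
        context = "\n".join(reflections)
        attempt = call_llm(f"{context}\nTask: {task}\nAttempt:")
        score = run_task(attempt)          # binary/scalar environment feedback
        if score >= 1.0:
            return attempt                 # success: stop early
        # Convert the scalar signal into verbal feedback for the next episode.
        reflections.append(call_llm(
            f"The attempt below failed (score={score}).\n"
            f"Attempt: {attempt}\n"
            "In a few sentences, explain what went wrong and what to try next:"
        ))
    return attempt
```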
The generative-agents work [4] simulates 25 agents for two days in a sandbox environment. The agents interact with the world through their actions and communicate with each other in natural language, and social dynamics unfold among the multiple agents.
Success in this simulation necessitates an approach that can retrieve relevant events and interactions over an extended period, reflect on those memories to generalize and draw higher-level inferences, and apply that reasoning to formulate plans and reactions that make sense both in the current moment and in the longer-term trajectory of the agent's behavior.
To construct an agent, a memory stream records, in natural language, a comprehensive list of the agent's experiences. Based on the agent's perceptions, the architecture retrieves relevant memories and uses them to determine subsequent actions. These retrieved memories also contribute to the formation of longer-term plans and the generation of higher-level reflections, both of which are entered into the memory stream for future reference.
There are many possible implementations of a retrieval function, depending on what is important for the agent to consider when deciding how to act. One effective approach is to directly ask the language model to output an integer importance score, as in the following example prompt:
On the scale of 1 to 10, where 1 is purely mundane (e.g., brushing teeth, making bed) and 10 is extremely poignant (e.g., a break up, college acceptance), rate the likely poignancy of the following piece of memory.
Memory: buying groceries at The Willows Market and Pharmacy
Rating: <fill in>
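Combining such an LLM-rated importance score with recency and relevance yields the full retrieval score. The sketch below sums the three normalized components; the 0.995 hourly recency decay follows the paper, while the word-overlap relevance is an illustrative stand-in for embedding similarity.

```python
import time

def recency(last_access_ts: float, now: float, decay: float = 0.995) -> float:
    # Exponential decay per hour since the memory was last accessed.
    hours = (now - last_access_ts) / 3600.0
    return decay ** hours

def relevance(query: str, memory_text: str) -> float:
    # Jaccard word overlap as a stand-in for embedding cosine similarity.
    q, m = set(query.lower().split()), set(memory_text.lower().split())
    return len(q & m) / (len(q | m) or 1)

def retrieval_score(memory: dict, query: str, now: float) -> float:
    return (recency(memory["last_access"], now)
            + memory["importance"] / 10.0     # LLM rating normalized to [0, 1]
            + relevance(query, memory["text"]))

memories = [
    {"text": "bought groceries at The Willows Market", "importance": 2,
     "last_access": time.time() - 7200},
    {"text": "got accepted to college", "importance": 10,
     "last_access": time.time() - 86400},
]
now = time.time()
top = max(memories, key=lambda m: retrieval_score(m, "exciting life events", now))
print(top["text"])  # the high-importance memory wins despite being older
```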
Reflections are higher-level, more abstract thoughts generated by the agent; they are included alongside other observations during retrieval. Reflections are generated periodically, roughly two or three times a day: the language model is prompted to extract insights and to cite the particular memory records that served as evidence for each insight.
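The sketch below shows one way to trigger and record such reflections; the importance threshold, window size, and fixed rating for the new reflection are illustrative assumptions, and `call_llm` is again a placeholder.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for an actual LLM call

def maybe_reflect(memory_stream: list[dict], threshold: float = 150.0):
    # Reflect only once enough salient events have accumulated recently.
    recent = memory_stream[-100:]
    if sum(m["importance"] for m in recent) < threshold:
        return
    numbered = "\n".join(f"{i}. {m['text']}" for i, m in enumerate(recent, 1))
    insight = call_llm(
        "Statements:\n" + numbered +
        "\nWhat high-level insight can you infer from the statements above? "
        "Cite the statement numbers that serve as evidence."
    )
    # The reflection re-enters the stream and can itself be retrieved later.
    memory_stream.append(
        {"text": insight, "importance": 8, "kind": "reflection"}  # illustrative rating
    )
```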
First, synthesizing an increasingly large set of memories posed a challenge not only in retrieving the most relevant pieces of information but also in determining the appropriate place to carry out an action, given the growing number of locations the agent learned about. As a result, some agents chose less typical locations for their actions, potentially making their behavior less believable over time.
Second, erratic behaviors arose from misclassification of what counts as proper behavior, especially when the physical norms of certain locations, which are hard to convey in natural language, did not percolate to the agents. Instruction tuning also seemed to make the agents overly cooperative with one another.
[1] Wanjun Zhong, Lianghong Guo, Qiqi Gao, et al. MemoryBank: Enhancing large language models with long-term memory. arXiv:2305.10250, 2023.
[2] Weizhi Wang, Li Dong, Hao Cheng, et al. Augmenting language models with long-term memory. arXiv:2306.07174, 2023.
[3] Noah Shinn, Beck Labash, and Ashwin Gopinath. Reflexion: An autonomous agent with dynamic memory and self-reflection. arXiv:2303.11366, 2023.
[4] Joon Sung Park, Joseph C. O’Brien, Carrie Jun Cai, et al. Generative agents: Interactive simulacra of human behavior. In UIST, pages 2:1–2:22. ACM, 2023.