The Modular Mind Blog

Created by Yuwei Sun


Memory in Language Model-Enabled Agents

January 06, 2024

Language models are emerging as planners and world models for agents in virtual environments. This post delves into the capabilities of LLMs for decision-making and environmental understanding within simulated worlds.
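
The control flow such a setup implies can be sketched in a few lines. The sketch below assumes a hypothetical `call_llm` function and environment interface (`reset`/`step`); both are illustrative stand-ins, not a specific library's API.

```python
# Minimal sketch of an LLM-as-planner agent loop.
# `call_llm` and the `env` interface are hypothetical stand-ins.

def call_llm(prompt: str) -> str:
    """Placeholder for a language model call (API or local model)."""
    raise NotImplementedError

def run_agent(goal: str, env, max_steps: int = 10) -> None:
    history = []  # past (observation, action) pairs kept as textual memory
    obs = env.reset()
    for _ in range(max_steps):
        # The LLM acts as planner: it reads the goal, memory, and current
        # observation, and proposes the next action in plain text.
        prompt = (
            f"Goal: {goal}\n"
            f"History: {history}\n"
            f"Observation: {obs}\n"
            "Next action:"
        )
        action = call_llm(prompt).strip()
        obs, done = env.step(action)  # assumed to return (observation, done)
        history.append((obs, action))
        if done:
            break
```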


Memory: Natural and Artificial

October 01, 2023

Intelligent agents rely on two complementary learning systems: the neocortex for structured knowledge and the hippocampus for rapid learning from experience. The hippocampus acts as an intermediary, preserving new memories without disrupting consolidated neocortical knowledge. This post provides a literature review on natural memory to support our recent study on the Associative Transformer.
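
In machine learning, this division of labor is often mirrored by experience replay: a small, fast-written buffer plays the hippocampal role, and its samples are interleaved into the training of a slow learner. A minimal sketch (class and method names are illustrative):

```python
import random

# Hippocampus-like fast store: new experiences are written immediately,
# then replayed in small batches so the slow learner (the "neocortex",
# e.g. a neural network) is updated without overwriting old knowledge.
class ReplayBuffer:
    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self.items = []

    def write(self, experience) -> None:
        if len(self.items) >= self.capacity:
            self.items.pop(0)          # evict the oldest memory
        self.items.append(experience)

    def replay(self, batch_size: int = 32):
        # Interleaved replay: mix old and new experiences in each batch.
        return random.sample(self.items, min(batch_size, len(self.items)))
```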


Global Workspace Theory and System 2 AI - Part Ⅱ

July 21, 2023

Global Workspace Theory (GWT) proposes that the brain operates as a network of specialized modules, with a central "global workspace" where selected information is integrated and broadcast. This post examines the alignment between GWT and Transformer models, with a specific focus on the attention mechanism.
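
As an illustration of the analogy, the sketch below uses cross-attention from a small number of workspace slots to many module outputs, so the slots act as a limited-capacity bottleneck. This is a toy construction, not the post's exact model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def workspace_broadcast(slots, module_outputs):
    """Cross-attention from a few workspace slots to many module outputs.

    slots:          (k, d) -- a small, fixed number of workspace slots
    module_outputs: (n, d) -- outputs of many specialist modules (n >> k)
    The k-slot bottleneck plays the role of the limited-capacity workspace.
    """
    d = slots.shape[-1]
    scores = slots @ module_outputs.T / np.sqrt(d)   # (k, n) relevance
    weights = softmax(scores, axis=-1)               # competition for access
    return weights @ module_outputs                  # integrated content (k, d)

rng = np.random.default_rng(0)
content = workspace_broadcast(rng.normal(size=(4, 16)),   # 4 slots
                              rng.normal(size=(64, 16)))  # 64 module outputs
```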


On the Versatility of Transformers

May 12, 2023

Transformers are a type of neural network architecture that can model long-range dependencies and process sequences in parallel on GPUs. They lack strong inductive biases, which makes them flexible but means they require large amounts of training data to generalize well.
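
A minimal numpy sketch of single-head self-attention, the operation that relates every position to every other in one matrix product and therefore runs over the whole sequence in parallel:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over a sequence x of shape (n, d).

    Every position attends to every other position via one (n, n) matrix
    product, so the whole sequence is processed in parallel rather than
    step by step as in an RNN.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])            # pairwise scores (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ v                                 # mixed values (n, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 16))                          # sequence of 10 tokens
wq, wk, wv = (rng.normal(size=(16, 16)) * 0.1 for _ in range(3))
out = self_attention(x, wq, wk, wv)
```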


Neural Network Approaches to the Binding Problem

March 14, 2023

Contemporary neural networks struggle with flexible and dynamic information binding and tend to learn surface statistics instead of underlying concepts. Attractor networks, such as Hopfield networks, offer one approach to the representational dynamics of binding: multiple stable equilibria correspond to different decompositions of a scene.
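
A minimal sketch of a classic binary Hopfield network, where Hebbian storage turns patterns into attractors and a corrupted input settles into the nearest stored pattern:

```python
import numpy as np

def train_hopfield(patterns):
    """Hebbian storage. patterns: (p, n) array of +/-1 vectors."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)               # no self-connections
    return w

def recall(w, state, steps=20):
    """Synchronous updates until the state settles into an attractor."""
    for _ in range(steps):
        new_state = np.sign(w @ state)
        new_state[new_state == 0] = 1
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state

# Two orthogonal patterns; flipping one bit still recovers the original.
p = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
              [1, 1, -1, -1, 1, 1, -1, -1]])
w = train_hopfield(p)
noisy = p[0].copy()
noisy[0] = -1                              # corrupt the first pattern
assert np.array_equal(recall(w, noisy), p[0])
```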


Continual Learning, Compositionality, and Modularity - Part Ⅱ

January 10, 2023

Modular Neural Networks (MNNs) introduce sparsity in the connections between neural network layers. Because an MNN is divided into smaller, more manageable parts, it can be trained more efficiently and with fewer resources than a traditional, monolithic neural network.
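
A toy sketch of what module-level sparsity can look like: the dense layer is replaced by independent modules plus a gate that routes each input to one of them (a top-1, mixture-of-experts-style construction; the names are illustrative):

```python
import numpy as np

# The dense layer is split into independent modules; a gate routes each
# input to a single module, so only a sparse subset of connections is
# active per example.
class ModularLayer:
    def __init__(self, d_in, d_out, n_modules, rng):
        self.modules = [rng.normal(size=(d_in, d_out)) * 0.1
                        for _ in range(n_modules)]
        self.gate = rng.normal(size=(d_in, n_modules)) * 0.1

    def forward(self, x):
        idx = int(np.argmax(x @ self.gate))   # hard top-1 routing
        return np.tanh(x @ self.modules[idx])

rng = np.random.default_rng(0)
layer = ModularLayer(d_in=8, d_out=4, n_modules=3, rng=rng)
y = layer.forward(rng.normal(size=8))
```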


Continual Learning, Compositionality, and Modularity - Part Ⅰ

December 22, 2022

Continual learning refers to the ability of an AI system to learn and adapt to new information and experiences over time without forgetting previous knowledge. To achieve lifelong learning in AI systems, compositionality and modularity in neural networks have been intensively studied as ways to overcome catastrophic forgetting and improve data efficiency.
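
One intensively studied remedy for catastrophic forgetting is a quadratic penalty such as elastic weight consolidation (EWC), which anchors parameters that were important for earlier tasks. A minimal sketch of the penalty term:

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """EWC regularizer: lam/2 * sum_i F_i * (theta_i - theta_old_i)^2.

    `fisher` estimates how important each parameter was for previous
    tasks; large-F parameters are pulled back toward their old values,
    which counteracts catastrophic forgetting.
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

# Training on a new task then minimizes: new_task_loss + ewc_penalty(...)
```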


Global Workspace Theory and System 2 AI - Part Ⅰ

September 01, 2022

Global workspace theory posits that in the human brain, multiple neural network modules cooperate and compete to solve problems via a shared feature space for common knowledge, called the global workspace (GW). Conscious attention selects which module's content is gated through, and that content remains briefly available in working memory.
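
A toy winner-take-all sketch of this competition, in which the most salient module's content enters the workspace and is broadcast back to all modules (purely illustrative, complementing the attention-based view in Part Ⅱ):

```python
import numpy as np

def workspace_step(proposals, saliences):
    """proposals: (n_modules, d) module contents; saliences: (n_modules,)."""
    winner = int(np.argmax(saliences))        # competition for conscious access
    workspace = proposals[winner]             # winning content enters the GW
    broadcast = np.tile(workspace, (len(proposals), 1))
    return workspace, broadcast               # every module receives the content

rng = np.random.default_rng(0)
ws, bc = workspace_step(rng.normal(size=(8, 16)), rng.normal(size=8))
```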


Visual Question Answering

July 10, 2022

Visual Question Answering (VQA) is a common problem in multimodal machine learning. Given an image and a question about its contents, a VQA model is trained to answer the question correctly. Answering these questions requires an understanding of vision, language, and commonsense knowledge. Several approaches, such as attention mechanisms, have been employed in VQA.
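
A minimal sketch of question-guided attention over image regions, a common pattern in VQA models (the shapes and the fusion scheme are simplified for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def vqa_forward(region_feats, q_emb, w_answer):
    """region_feats: (r, d) image-region features; q_emb: (d,) question."""
    att = softmax(region_feats @ q_emb)        # question attends to regions
    visual = att @ region_feats                # attended visual feature (d,)
    fused = visual * q_emb                     # simple multiplicative fusion
    return softmax(fused @ w_answer)           # distribution over answers

rng = np.random.default_rng(0)
probs = vqa_forward(rng.normal(size=(36, 64)),   # 36 region features
                    rng.normal(size=64),         # question embedding
                    rng.normal(size=(64, 10)))   # 10 candidate answers
```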


Domain Shift and Transfer Learning

May 31, 2022

One of the most challenging problems in decentralized AI is improving the generality of a global model trained on client data from different domains. Client datasets usually serve the same classification task but exhibit distinct sample features due to each client's data collection conditions.
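
A standard aggregation step in this setting is federated averaging (FedAvg), where the server combines client updates weighted by their data sizes. A minimal sketch:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Average client parameters, weighted by each client's sample count.

    client_weights: list of parameter arrays, one per client
    client_sizes:   number of training samples held by each client
    """
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

global_model = fed_avg(
    client_weights=[np.array([0.2, 1.0]), np.array([0.6, 0.8])],
    client_sizes=[100, 300],
)  # -> array([0.5, 0.85])
```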


Self-Supervised Learning and Multimodal Learning

April 12, 2022

Information in the real world usually comes in different modalities. To search for visual or audio content on the web, we can train a model on any available collection of web data and index that media using learned multimodal embeddings.
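
A minimal sketch of the retrieval side: media and queries are embedded into one space and ranked by cosine similarity (the indexing setup behind CLIP-style multimodal search; all names are illustrative):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def search(query_emb, index_embs, top_k=3):
    """query_emb: (d,) text embedding; index_embs: (n, d) media embeddings.

    Cosine similarity reduces to a dot product after normalization.
    """
    sims = normalize(index_embs) @ normalize(query_emb)
    return np.argsort(-sims)[:top_k]           # indices of the best matches

rng = np.random.default_rng(0)
hits = search(rng.normal(size=32), rng.normal(size=(1000, 32)))
```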