The global workspace theory [1] posits that, in the human brain, multiple neural network models cooperate and compete in solving problems via a shared feature space for common knowledge sharing, called the global workspace (GW).
The GW is closely connected to the notions of System 1 and System 2 [2]. System 1 cognition encompasses automatic, over-learned processes that engage relatively local, specialized networks. System 2 cognition, in contrast, engages effortful cognitive processes that rely on more distributed processing and on flexible interactions between these specialized neural systems; it is thought to reject, alter, or override the impressions, intuitions, feelings, and reactive tendencies issued by System 1, but it can also fully endorse and select from the bottom-up inputs System 1 provides. Conscious attention selects which module's content is gated through this bottleneck and remains briefly available in working memory [3].
Within the global workspace framing, metadata about individual neural networks, such as measured performance and learned representations, can be used to learn, select, or combine different learning algorithms to solve a new task efficiently. The knowledge or representations learned by the different networks are then leveraged for reasoning and planning. This research area studies how a meta-agent can solve novel tasks by observing and leveraging the world models built by these individual networks: common sense is not just a collection of facts but a collection of models of the world.
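As a concrete illustration, the sketch below shows how a meta-agent might use only such metadata (here, each local model's accuracy on a small labelled probe set from the new task) to select the best local model or to combine their predictions. The setup, names, and the accuracy-weighted ensemble are illustrative assumptions, not a method from the cited works.

```python
# A minimal sketch (not from the cited works) of a meta-agent that uses
# metadata about local models -- here, their measured accuracy on a small
# probe set from a new task -- to select or combine them.
import numpy as np

rng = np.random.default_rng(0)

# Pretend we have three local "world models"; each is just a linear scorer here.
local_models = [rng.normal(size=(4, 3)) for _ in range(3)]  # 4 features -> 3 classes

def predict(weights, x):
    """Class probabilities from one local model (softmax over a linear map)."""
    logits = x @ weights
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Metadata: accuracy of each local model on a small labelled probe set drawn
# from the new task (the meta-agent never sees the local training data).
x_probe = rng.normal(size=(32, 4))
y_probe = rng.integers(0, 3, size=32)
probe_scores = np.array([
    (predict(w, x_probe).argmax(-1) == y_probe).mean() for w in local_models
])

# Selection: pick the best local model for the new task ...
best = int(np.argmax(probe_scores))

# ... or combination: weight each model's prediction by its probe accuracy.
def soft_combine(x):
    weights = probe_scores / probe_scores.sum()
    return sum(w * predict(m, x) for w, m in zip(weights, local_models))

print("selected model:", best, "ensemble prediction:", soft_combine(x_probe[:1]).argmax(-1))
```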
The proliferation of AI applications is reshaping the contours of the future knowledge graph of neural networks. Decentralized NNs study the transfer of knowledge from individual neural networks trained on separate local tasks to a global model. In a learning system comprising many replica neural networks with similar architectures and functions, the goal is to learn a global model that generalizes to unseen tasks without large-scale training [4]; the replica models are often called local models. In particular, two practical problems in Decentralized NNs are being intensively studied: learning with non-independent and identically distributed (non-iid) data and learning with multi-domain data.
Notably, the non-iid setting refers to data samples across local models that are not drawn from the same distribution, which hinders knowledge transfer between local models. To tackle this problem, we proposed Segmented-Federated Learning (Segmented-FL) [5], which employs periodic evaluation of local model performance and segments the local models into learning groups, bringing together networks that train over similar data distributions. For each group, Segmented-FL then trains a separate global model by transferring knowledge from the local models in that group. The global model can only passively observe the local models' performance and has no access to the local data. Segmented-FL achieves better performance on non-iid data than traditional federated learning [6].
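A minimal sketch of this grouping-then-averaging idea is given below, assuming the server sees only per-model performance vectors: it segments local models with similar observed performance (here via a small k-means routine) and runs federated averaging [6] within each group. The grouping rule and all names are simplifications, not the exact Segmented-FL procedure of [5].

```python
# Sketch: segment local models by observed performance, then average per group.
import numpy as np

def fed_avg(weights_list):
    """Plain federated averaging of parameter vectors [6]."""
    return np.mean(np.stack(weights_list), axis=0)

def segment(perf, n_groups=2, iters=20, seed=0):
    """Group local models by similarity of their observed performance vectors."""
    rng = np.random.default_rng(seed)
    centers = perf[rng.choice(len(perf), n_groups, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((perf[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([perf[assign == g].mean(0) if np.any(assign == g)
                            else centers[g] for g in range(n_groups)])
    return assign

# Toy setup: 6 local models (parameter vectors) and their performance measured
# on a few evaluation slices (the only signal the server is allowed to see).
rng = np.random.default_rng(1)
local_weights = [rng.normal(size=8) for _ in range(6)]
local_perf = np.array([[0.90, 0.20], [0.88, 0.25], [0.91, 0.18],   # one data regime
                       [0.30, 0.85], [0.35, 0.80], [0.28, 0.90]])  # another regime

groups = segment(local_perf, n_groups=2)
global_models = {g: fed_avg([w for w, a in zip(local_weights, groups) if a == g])
                 for g in set(groups.tolist())}
print("group assignment:", groups, "-> one global model per group")
```

In a real deployment the performance vectors would come from the periodic evaluations reported by the local trainers, and the segmentation would be recomputed every few rounds.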
On the other hand, the multi-domain setting refers to data samples across local models that come from different domains with domain-specific features. For example, an autonomous vehicle learning to drive in a new city might leverage the driving knowledge that other vehicles have learned in other cities. Since different cities have different street views and weather conditions, it is difficult to directly learn a new model from the knowledge of models trained on such multi-domain data. This problem is closely related to multi-source domain adaptation, which studies how the distribution shift in domain-specific features introduces negative transfer and degrades a model's generality to unseen tasks. A detailed blog on Domain Shift and Transfer Learning.
Hierarchical neural networks consist of multiple neural networks connected in the form of an acyclic graph. As noted above, the global workspace theory (GWT) [1] describes multiple neural network models cooperating and competing to solve problems via a shared feature space for common knowledge sharing. Building on the GWT, the consciousness prior theory [7] posits sparse factor graphs over a space of high-level semantic variables and simple mappings between those variables. The hierarchy of neural networks usually comprises two learning frameworks, i.e., fast learning and slow learning [2]. The fast learning framework comprises the different individual modules, while the slow learning framework acts more like an attention mechanism for long-term planning. In particular, Homogeneous Learning [8] introduced a self-attention mechanism in which a local model is selected as the meta for each training round, and it leverages reinforcement learning to recursively update a globally shared learning policy. The meta observes the states of the local models and of its surrounding environment, and computes the expected rewards for taking different actions based on those observations. As mentioned in [9], with a model of external reality and of an agent's possible actions, the agent can try out various alternatives and, using knowledge of past events, conclude which action is best. The goal is to learn an optimized learning policy such that the Decentralized NNs system can quickly solve a problem by planning and by leveraging the knowledge of different local models more efficiently. The results showed that learning such a policy reduced the total training time for a given classification task by 50.8%.
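The sketch below illustrates the core idea of learning a node-selection policy with reinforcement learning, in the spirit of Homogeneous Learning [8]. The state (the last selected node), the reward (the negative simulated duration of a training round), and the tabular Q-learning update are deliberate simplifications of the paper's formulation, used only to show the shape of the loop.

```python
# Sketch: learn which node to select as the meta each round via Q-learning.
import numpy as np

n_nodes = 4
rng = np.random.default_rng(0)
# Simulated cost of a training round when node j is selected after node i.
round_time = rng.uniform(1.0, 5.0, size=(n_nodes, n_nodes))

q = np.zeros((n_nodes, n_nodes))        # Q[state, action]: expected return
alpha, gamma, eps = 0.1, 0.9, 0.2       # learning rate, discount, exploration

state = 0
for episode in range(2000):
    # epsilon-greedy choice of the next meta node
    action = rng.integers(n_nodes) if rng.random() < eps else int(q[state].argmax())
    reward = -round_time[state, action]             # faster rounds -> higher reward
    q[state, action] += alpha * (reward + gamma * q[action].max() - q[state, action])
    state = action

# The learned policy picks, from each state, the node expected to minimise
# the remaining training time of the decentralized system.
print("greedy node-selection policy:", q.argmax(axis=1))
```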
Information in the real world usually comes in different modalities. Degeneracy [10] in neural structure refers to the property that any single function can be carried out by more than one configuration of neural signals, and that different neural clusters participate in several different functions. Intelligent systems build models of the world with different modalities, where spatial concepts are generated via different modality models. Cross-modal learning in multimodal problems such as Visual Question Answering (VQA) can be tackled with approaches such as self-supervised learning [11]. For instance, UniCon [12] leverages contrastive learning over different model components to align the modality representations, encouraging similarity between relevant component outputs while discouraging the irrelevant ones. In this way, the framework learns better-refined cross-modal representations for unseen VQA tasks based on the knowledge learned from the different VQA tasks of the local models. A detailed blog on Self-Supervised Learning and Multimodal Learning.
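The following sketch shows a contrastive alignment loss of the kind this approach builds on: paired (relevant) visual and textual component outputs are pulled together while unpaired (irrelevant) ones are pushed apart. The symmetric InfoNCE-style form and all tensor names are assumptions for illustration, not the exact UniCon objective [12].

```python
# Sketch: contrastive alignment of visual and textual component outputs.
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_alignment_loss(vision_repr, text_repr, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities of a batch of pairs."""
    v = l2_normalize(vision_repr)            # (batch, dim) visual component outputs
    t = l2_normalize(text_repr)              # (batch, dim) question/text outputs
    logits = v @ t.T / temperature           # pairwise similarities
    labels = np.arange(len(v))               # matching pairs lie on the diagonal

    def cross_entropy(logits, labels):
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    # Align vision -> text and text -> vision symmetrically.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
v, t = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print("alignment loss on random features:", contrastive_alignment_loss(v, t))
```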
[1] Baars, B. J. 1988. A Cognitive Theory of Consciousness. Cambridge University Press.
[2] Kahneman, D. 2011. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux.
[3] 2021. How does hemispheric specialization contribute to human-defining cognition? Neuron, 109(13): 2075–2090.
[4] Sun, Y.; Ochiai, H.; and Esaki, H. 2021. Decentralized Deep Learning for Multi-Access Edge Computing: A Survey on Communication Efficiency and Trustworthiness. In IEEE Transactions on Artificial Intelligence.
[5] Sun, Y.; Ochiai, H.; and Esaki, H. 2020. Intrusion Detection with Segmented Federated Learning for Large-Scale Multiple LANs. In IJCNN.
[6] McMahan, B.; Moore, E.; Ramage, D.; and et al. 2017. Communication-Efficient Learning of Deep Networks from Decentralized Data. In AISTATS.
[7] Bengio, Y. 2017. The Consciousness Prior. In arXiv.
[8] Sun, Y.; and Ochiai, H. 2022. Homogeneous Learning: Self-Attention Decentralized Deep Learning. In IEEE Access, volume 10, 7695–7703.
[9] Craik, K. 1967. The Nature of Explanation. In CUP Archive.
[10] Smith, L. B.; and Gasser, M. 2005. The Development of Embodied Cognition: Six Lessons from Babies.
[11] Radford, A.; Kim, J. W.; Hallacy, C.; et al. 2021. Learning Transferable Visual Models from Natural Language Supervision. In ICML.
[12] Sun, Y.; and Ochiai, H. 2022. UniCon: Unidirectional Split Learning with Contrastive Loss for Visual Question Answering. In arXiv preprint.
[13] Vaswani, A.; Shazeer, N.; Parmar, N.; et al. 2017. Attention Is All You Need. In NeurIPS.
[14] Mittal, S.; Lamb, A.; Goyal, A.; et al. 2020. Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules. In ICML.
[15] Goyal, A.; Lamb, A.; Hoffmann, J.; et al. 2021. Recurrent Independent Mechanisms. In ICLR.
[16] Goyal, A.; Didolkar, A. R.; Lamb, A.; et al. 2022. Coordination Among Neural Modules Through a Shared Global Workspace. In ICLR.
[17] Liu, D.; Lamb, A.; Kawaguchi, K.; et al. 2021. Discrete-Valued Neural Communication. In NeurIPS.