Cognitively-Inspired Foundation Model Pre-training; Enlightenment or Extinction? The Open and Closed Source Divide in Artificial Intelligence Research

Gladstone, Alexi, School of Engineering and Applied Science, University of Virginia
Seabrook, Bryn, EN-Engineering and Society, University of Virginia
Elliott, Travis, EN-Engineering and Society, University of Virginia
Li, Jundong, EN-Elec & Comp Engr Dept, University of Virginia

The field of Artificial Intelligence (AI) has experienced tremendous growth in recent years. Technologies such as ChatGPT and Midjourney have been used by millions of people and have shaped the way organizations work. Simultaneously, several consulting companies have forecast that AI will automate many jobs within the next few years. These developments have raised questions from the public about the future of society under ever-increasing AI capabilities. Therefore, over the last year my goal has been to research the future landscape of AI from two perspectives: the Science, Technology, and Society (STS) perspective and the technical perspective. This desire to understand the future of AI, in the context of AI research, has served as a central theme in both my Technical Capstone and my STS Research Paper. Specifically, my STS Research Paper has focused on understanding a divide in the nature of AI research, which helps illuminate how AI research will be conducted. Simultaneously, my Technical Capstone has investigated a future paradigm for training highly capable AI models, providing insight into what future AI may be capable of doing.

Capstone Project Summary:
Note: The following paragraph is adapted from the Technical Capstone Report of Alexi Gladstone which is not currently available publicly and thus unavailable to be cited.

One of the primary techniques used to pre-train foundation models is predicting, in the output space, the next element of an autoregressive sequence. However, this differs from human cognition in three respects. First, when humans make predictions about the future, those predictions shape internal cognitive processes; traditional autoregressive models have no analogous mechanism. Second, humans naturally evaluate the strength of their predictions about future states, a capability that traditionally trained autoregressive models also lack. Third, humans naturally devote a variable amount of time to making predictions about the future, which traditionally trained autoregressive models cannot do internally. These differences between human cognition and current autoregressive models motivated the exploration of an architecture and training paradigm that could address these fundamental limitations. Consequently, we trained an Energy-Based Model with a Transformer architecture to predict the compatibility of a candidate prediction with the existing autoregressive sequence. Training models in this manner gives them all three capabilities of human cognition discussed above, providing a promising new path toward training highly capable AI.
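To make the energy-based framing concrete, the following is a minimal toy sketch (an illustration of the general idea only, not the capstone's actual model or training procedure): instead of directly emitting the next element, an energy function scores the compatibility of each candidate continuation with the existing sequence, and the lowest-energy candidate is selected. Here the "model" is just fixed random embeddings; the vocabulary, embedding size, and energy function are all hypothetical choices for illustration.

```python
import numpy as np

# Toy stand-in for learned representations: fixed random embeddings.
rng = np.random.default_rng(0)
VOCAB = ["the", "cat", "sat", "mat"]
EMB = {w: rng.normal(size=8) for w in VOCAB}

def energy(context, candidate):
    """Energy of appending `candidate` to `context` (lower = more compatible)."""
    ctx = np.mean([EMB[w] for w in context], axis=0)
    return float(-ctx @ EMB[candidate])  # negative similarity as a toy energy

def predict_next(context):
    """Frame prediction as evaluation: score every candidate against the
    context and return the one with the lowest energy. The energy value
    itself doubles as a confidence signal, and harder predictions could in
    principle be given more compute (e.g. more optimization steps)."""
    scored = {c: energy(context, c) for c in VOCAB}
    best = min(scored, key=scored.get)
    return best, scored[best]

word, e = predict_next(["the", "cat"])
```

The key design point this sketch illustrates is that prediction becomes an explicit compatibility check over candidates rather than a single forward emission, which is what makes graded confidence and variable-compute prediction possible in this paradigm.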

STS Research Paper Summary:
The capabilities of Artificial Intelligence (AI) have taken the world by storm, and technologies such as ChatGPT have become household names. For a long time, AI was purely a focus of academic research, with little to no commercial value due to its limited performance. This has recently changed with the surprising new capabilities of modern models, sending the world into a frenzy over the commercial opportunities, the ongoing legal battles for control, and AI's potential long-term impact on human labor. Central to several of these events is a divide over how AI research should be conducted, known as the “Open and Closed Source Divide in Artificial Intelligence.” On one side of this divide are people who believe AI research should continue in an open manner, following the trend set by much of the world's most popular software. On the other side are people worried about the safety of highly intelligent AI models; this perceived danger is used as motivation to discontinue open research, with the goal of ensuring that future super-intelligent AI will have a net-positive impact on society. Evident in this argument, and embedded within many of these discussions, is a view of intelligent AI as an inevitable future technology that will be developed regardless of what humans do. As such, technological determinism is used to analyze the divide, providing insight into the nature of AI development and its potential long-term impact on science, technology, and society.

Concluding Reflection:
Pursuing the technical capstone project and the STS research paper simultaneously has provided a duality of perspectives from which to analyze the future of AI research. On one hand, the technical project enabled me to delve deeply into the current capabilities of AI, giving me a fine-grained technical perspective on what current technologies can do now and what they may be capable of in the future. Concurrently, the STS research paper has given me a more macroscopic perspective on AI research in general, helping me understand the trajectory of the AI research landscape. Together, these perspectives have provided me with a holistic understanding of AI research that I will use to guide my future career in AI.

BS (Bachelor of Science)
Artificial Intelligence, Open Source, Divide, Research

School of Engineering and Applied Science

Bachelor of Science in Computer Science

Technical Advisor: Jundong Li

STS Advisors: Bryn Seabrook, Travis Elliott

Technical Team Members: Ganesh Nanduru, Mofijul Islam, Aman Chadha, Jundong Li, Tariq Iqbal

Issued Date: