“While many of the iconic foundation models at the time of writing are language models, the term language model is simply too narrow for our purpose: as we describe, the scope of foundation models goes well beyond language.”

Rishi Bommasani et al., in the paper On the Opportunities and Risks of Foundation Models

With the advent of ChatGPT and large language models (LLMs), interest in generative artificial intelligence (Gen AI), or “AI 2.0”, has grown rapidly. This surge of curiosity about generative AI in medicine and healthcare has brought ongoing discussion, debate, and understandable confusion about two related AI concepts: foundation models and multimodal generative AI.

Foundation Models


These models, also known as general purpose artificial intelligence (GPAI) models, are large-scale generative AI models, often with billions of parameters, that are pre-trained with self-supervised learning on vast and diverse unlabeled datasets; they can then be adapted to a wide range of downstream applications (such as text synthesis, image manipulation, audio generation, or video creation). It should be noted that these foundation models typically still require some supervised fine-tuning for a specific task. Researchers at the Stanford Center for Research on Foundation Models (CRFM), an interdisciplinary initiative from the Stanford Institute for Human-Centered Artificial Intelligence (HAI), coined the term in 2021.

The term foundation designates that these models serve as a general purpose base that can be adapted, through fine-tuning, to more specialized applications. Foundation models rely on transfer learning, so that patterns learned on one task can be carried over to another. The popular ChatGPT is built on a large language model (LLM) that falls into this category of foundation models. In short, foundation models are characterized by generality and adaptability and, some argue, represent the beginning of artificial general intelligence (AGI). It should be noted here that generative adversarial networks (GANs), the AI model introduced by Ian Goodfellow and colleagues in 2014 in which two neural networks compete with each other, are generative AI tools that are neither new nor built on top of foundation models.
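For readers who want to see the idea of transfer learning in concrete terms, here is a minimal sketch in plain Python. It is a toy illustration under stated assumptions, not how any real foundation model is built: the "pre-trained" weights, the six data points, and the function names (encode, fine_tune, and so on) are all hypothetical. The frozen encoder stands in for the foundation model, and only a small task-specific "head" is trained on a handful of labeled examples.

```python
import copy
import math

# Hypothetical stand-in for pre-trained weights. A real foundation model
# would have billions of parameters learned by self-supervision.
PRETRAINED_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]


def encode(x, weights=PRETRAINED_WEIGHTS):
    """Frozen encoder: turn a raw input into features, plus a constant bias term."""
    features = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
    return features + [1.0]


def predict(head, features):
    """Task-specific head: logistic regression on top of the frozen features."""
    score = sum(h * f for h, f in zip(head, features))
    return 1.0 / (1.0 + math.exp(-score))


def mean_log_loss(head, feature_rows, labels):
    """Average logistic loss of the head over a small labeled dataset."""
    total = 0.0
    for features, y in zip(feature_rows, labels):
        p = predict(head, features)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def fine_tune(feature_rows, labels, steps=200, lr=0.5):
    """Supervised fine-tuning: gradient descent updates the head only."""
    head = [0.0] * len(feature_rows[0])
    for _ in range(steps):
        grad = [0.0] * len(head)
        for features, y in zip(feature_rows, labels):
            error = predict(head, features) - y
            for j, f in enumerate(features):
                grad[j] += error * f / len(labels)
        head = [h - lr * g for h, g in zip(head, grad)]
    return head


# Tiny, made-up labeled dataset for the downstream task.
inputs = [(1.0, 1.0), (2.0, 0.5), (0.5, 1.5), (-1.0, -1.0), (-2.0, -0.5), (-0.5, -1.5)]
labels = [1, 1, 1, 0, 0, 0]

weights_before = copy.deepcopy(PRETRAINED_WEIGHTS)
feature_rows = [encode(x) for x in inputs]  # the encoder is reused, not retrained

untrained_head = [0.0] * len(feature_rows[0])
loss_before = mean_log_loss(untrained_head, feature_rows, labels)
head = fine_tune(feature_rows, labels)
loss_after = mean_log_loss(head, feature_rows, labels)

print("loss before fine-tuning:", round(loss_before, 4))
print("loss after fine-tuning:", round(loss_after, 4))
print("encoder unchanged:", PRETRAINED_WEIGHTS == weights_before)
```

The design choice to notice is that the encoder weights are never touched during fine-tuning. Only the five numbers in the head are learned from labeled data, which is why adapting a foundation model to a new task can need far fewer labeled examples than training a model from scratch.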


Examples of foundation models include OpenAI’s GPT-4 (used by Microsoft’s Bing Chat and OpenAI’s ChatGPT) and DALL-E 2, Google’s BERT (bidirectional encoder representations from transformers), Anthropic’s Claude, and Midjourney’s Midjourney 5.1.

Foundation Models in Healthcare

The use of foundation models in healthcare holds great promise, as these models are closer to how clinicians practice medicine and healthcare on a day-to-day basis. Thus far, only unimodal large language models, such as ChatGPT and more biomedically focused LLMs trained on EHR data, have seen significant deployment in healthcare, with mixed performance. In the near future, multiple data sources, such as electronic health records (EHR) including CPT/ICD codes, laboratory data, and unstructured notes, as well as medical images, can serve as the input for these foundation models. These combined data sources can lead to outputs (chart summarization, image analysis, risk stratification, etc.) that would be very valuable for the clinician, without the tedious process of labeling (as the pre-training process is self-supervised).

The foundation model in healthcare is also valuable for its robust adaptability to new clinical situations, so its generalizability should improve on that of prior AI models. It is also hoped that this innovative combination of self-supervised learning and wide adaptability can result in significant cost savings and reduced manpower in creating and maintaining these AI models. Finally, a continuing synergy between AI and clinicians can be built into foundation models by embedding a natural language processing interface that can be enriched with domain expertise.

These insights on foundation models and multimodal AI will be discussed at the in-person AI-Med Global Summit 2024, scheduled for May 29-31, 2024 in Orlando, Florida. Book your place now!

See you there!