IBM Demonstrates Groundbreaking Artificial Intelligence Research Using Foundational Models And Generative AI

AI has already demonstrated its power to revolutionize industries and accelerate scientific investigation. One area of AI research that has made stunning advances is foundation models and generative AI, which enable computers to generate original content based on input data. This technology has been used to create everything from music and art to fake news reports.

OpenAI recently showcased the impressive capabilities of artificial intelligence by offering free access to ChatGPT, a state-of-the-art generative transformer model. The move generated widespread media attention and excitement among users, highlighting the massive potential of AI. This demonstration came just three months after the release of ChatGPT to the public.

Faced with the disruptive impact of OpenAI’s GPT-3 model, Google and Microsoft were compelled to reveal AI integration plans for their respective search engines. The demonstration of AI’s practical and powerful capabilities by OpenAI will no doubt raise the public’s expectations and demand for more advanced AI products in the future. OpenAI’s move sparked one of the quickest and most significant disruptions in an industry segment that has ever been witnessed.

It is universally acknowledged that human life is of paramount importance. In this article, we shed light on the life-saving potential of AI by examining its practical applications in the creation of new antibiotics and other scientific AI tools. Innovative use of foundation models and generative AI can increase revenues, optimize processes, and streamline the creation and accumulation of knowledge. However, it also has the potential to save millions of lives around the world. This discussion aims to increase visibility for AI’s potential to save lives and to highlight the need to expand its development and deployment in these areas.

From simple algorithms to breakthrough advances

Artificial intelligence (AI) had rather simple beginnings in the 1950s. It tackled simple algorithms and mathematical models designed for specific tasks. Much later, in the 1990s, AI research underwent a major shift towards machine learning algorithms that enabled computers to improve their performance by analyzing patterns in data and transferring that knowledge to new applications. This shift gave rise to numerous breakthroughs in the field, including the development of deep learning algorithms that revolutionized areas such as computer vision and natural language processing (NLP). These advances have led to even more new achievements and further expanded the potential of AI.

Today, AI researchers continue to push the boundaries by developing new algorithms and models that can tackle increasingly complex tasks. AI models continue to grow in size and evolve at an unprecedented pace, producing responses that are more human-like and performing a wider range of tasks. Breakthroughs and new applications are still emerging in areas such as natural language processing (NLP), computer vision, and robotics. Despite its limitations and challenges, AI has proven to be a transformative force across a wide array of industries and fields, including healthcare, finance, transportation, and education.

Cutting-edge AI research by an IBM Master Inventor

IBM has one of the largest and most well-funded AI research programs in the world, and I recently had the opportunity to discuss its program with Dr. Payel Das, a principal research staff member and manager at IBM Research who is also an IBM Master Inventor.

Dr. Das has served as an adjunct associate professor in the department of Applied Physics and Applied Mathematics (APAM) at Columbia University. She is currently serving as an advisory board member of AMS at Stony Brook University. Dr. Das received her B.S. from Presidency College in Kolkata, India, and her M.S. from the Indian Institute of Technology in Chennai, India. She was awarded a Ph.D. in theoretical biophysics from Rice University in Houston, Texas. Dr. Das has coauthored more than 40 peer-reviewed publications. She has also received awards from Harvard Belfer Center TAPP 2021 and IEEE open source 2022. She also has a number of IBM awards including the IBM Outstanding Technical Achievement Award (the highest technical award at IBM), two IBM Research Division Awards, one IBM Eminence and Excellence Award, and two IBM Invention Achievement Awards.

As a member of the Trustworthy AI department and the generative AI lead within IBM Research, Dr. Das is currently focused on creating new algorithms, methods, and tools for building generative AI systems from foundation models.

Her team is also working on using synthetic data to make the AI models more trustworthy and to ensure fairness and robustness in downstream AI applications.

The power of synthetic data and how it advances AI

In our data-driven era, synthetic data has become an indispensable tool for testing and training AI models. This computer-generated information is cost-effective to produce, comes with automatic labeling, and avoids many of the ethical, logistical, and privacy challenges associated with training deep learning models on real-world data.

Synthetic data is critical for business applications as it offers solutions when real data is scarce or inadequate. One of the key advantages of synthetic data is its ability to be generated in vast quantities, making it ideal for training AI models. Furthermore, synthetic data can be designed to encompass a diverse range of variations and examples, leading to better generalization and usability of the model. These attributes make synthetic data an indispensable tool in the advancement of AI and its real-world applications.

It is crucial that the generated synthetic data adheres to user-defined controls to ensure it serves its intended purpose and minimizes potential risks. The specific controls required vary depending on the intended application and desired outcome. Ensuring that synthetic data aligns with these controls is essential to guarantee its effectiveness and safety in real-world applications.
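To make the idea of user-defined controls concrete, here is a minimal sketch of a synthetic data generator that produces labeled records and then verifies them against the controls that were requested. Everything in it is an assumption made for illustration: the record fields (`age`, `label`), the two controls (a target fraction of positive labels and an allowed age range), and the function names are hypothetical and are not drawn from IBM's tools.

```python
import random

def generate_synthetic_records(n, positive_fraction, age_range, seed=0):
    """Generate n labeled toy records that follow two user-defined controls."""
    rng = random.Random(seed)
    # Control 1: fix the share of positive labels up front (automatic labeling).
    n_positive = round(n * positive_fraction)
    labels = [1] * n_positive + [0] * (n - n_positive)
    rng.shuffle(labels)
    # Control 2: keep the hypothetical "age" field inside the allowed range.
    low, high = age_range
    return [{"age": rng.randint(low, high), "label": label} for label in labels]

def check_controls(records, positive_fraction, age_range, tolerance=0.01):
    """Return True if the records respect both controls."""
    low, high = age_range
    observed = sum(r["label"] for r in records) / len(records)
    in_range = all(low <= r["age"] <= high for r in records)
    return abs(observed - positive_fraction) <= tolerance and in_range

records = generate_synthetic_records(1000, positive_fraction=0.5, age_range=(18, 90))
print(check_controls(records, positive_fraction=0.5, age_range=(18, 90)))
```

Checking the output after generation, rather than trusting the generator, mirrors the point above: the controls only matter if the data can be shown to satisfy them.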

Transforming the future with universal representation models

Transformers have been widely adopted for many different applications and have proven highly effective for processing complex data such as natural language. (Image: IBM)

The first AI models utilized feedforward neural networks, which were effective in modeling non-sequential data. However, they were not equipped to handle sequential data. To overcome this limitation, recurrent neural networks (RNNs) were developed in the 1990s, but it wasn’t until around 2010 that they saw widespread implementation.

This breakthrough in technology expanded the capabilities of AI to process sequential data and paved the way for further advancements in the field. Then another type of AI model, called a transformer, radically improved AI capabilities.

The transformer made its first appearance in a 2017 Google research paper that proposed a new type of neural network architecture. Transformers incorporated self-attention mechanisms that allowed models to focus on the relevant parts of an input and make more accurate predictions.

The self-attention mechanism is a defining feature that sets transformers apart from other encoder-decoder architectures. This mechanism proves especially beneficial in natural language processing as it enables the model to grasp the relationships between words in a sentence and recognize long-term dependencies. The transformer accomplishes this by assigning weights to each element in the sequence, based on its relevance to the task. This way, the model can prioritize the most crucial parts of the input, resulting in more context-aware and informed predictions or decisions. The integration of self-attention mechanisms has greatly advanced the capabilities of AI models in natural language processing.
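The weighting idea can be sketched in a few lines of Python. This is a simplified illustration of scaled dot-product self-attention, not a full transformer layer: it assumes the query, key, and value vectors are the token vectors themselves, whereas real transformers apply learned projection matrices and multiple attention heads. The three two-dimensional "token" vectors are made up for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Relevance of every element in the sequence to the current one.
        scores = [dot(query, key) / math.sqrt(d) for key in vectors]
        weights = softmax(scores)
        # Each output is a weighted average of all the vectors.
        output = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical embeddings: the first two tokens are similar, the third is not.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

In the printed weights, the first token gives more weight to the similar second token than to the dissimilar third one, which is the "focus on the most relevant parts of the input" behavior described above.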

According to Dr. Das, in recent years, there has been a shift away from RNNs as the primary architecture for natural language processing tasks. RNNs can be difficult to train and can suffer from vanishing gradient problems, which can make it challenging to learn long-term dependencies in language data. By contrast, transformers have been shown to be more effective in achieving state-of-the-art results on a variety of natural language processing tasks.
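The vanishing gradient problem can be seen in a toy calculation. The sketch below assumes a one-dimensional recurrent network, h_t = tanh(w * h_{t-1} + x), with made-up values for the weight, the input, and the starting state. It tracks how strongly the final state depends on the initial state by multiplying the per-step derivatives together, as the chain rule requires.

```python
import math

def gradient_through_time(weight, steps, h0=0.5, x=0.1):
    """Return d h_T / d h_0 for a scalar RNN h_t = tanh(weight * h_{t-1} + x)."""
    h = h0
    grad = 1.0
    for _ in range(steps):
        h = math.tanh(weight * h + x)
        # Chain rule: d h_t / d h_{t-1} = weight * (1 - tanh(...)^2)
        grad *= weight * (1.0 - h * h)
    return grad

for steps in (1, 10, 50):
    print(steps, gradient_through_time(weight=0.9, steps=steps))
```

Because every per-step factor here is smaller than 1, the product shrinks toward zero as the sequence gets longer, so early inputs have almost no influence on what the network learns. Self-attention avoids this chain of multiplications by letting each position look at every other position directly.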

Unlocking the power of foundational models

Models that are trained using large-scale data and self-supervision techniques can produce a universal representation that is not specific to any particular task. This representation can then be utilized in various other applications with little to no further adjustment.

These models are referred to as “foundation models,” a term coined by Stanford University researchers in a 2021 paper. Many of today’s foundation models adopt the transformer architecture and have proven versatile across a broad range of natural language processing (NLP) tasks. This is due to their pre-training on vast datasets, which results in powerful machine learning models ready for deployment. The use of foundation models has greatly impacted and improved the field of NLP.

Dr. Das and the IBM research team have been involved in a significant amount of AI research with foundation models and generative AI.


The above graphic shows how a foundation model can be used to build models for different fields by using text as the input data. Such models may or may not use a transformer architecture. On the left side of the graphic, a large language model is shown, which progressively maps letters to words to sentences and finally to language.

The illustration on the right side of the graphic depicts a chemistry transformer model, which connects atoms to molecules and to chemistry. The same concept could be applied to build foundation models for biology or other related fields by representing biological or chemical molecules as text.
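As a small illustration of what "molecules as text" can look like, the sketch below splits a molecule written in SMILES notation into atom and bond tokens, much as a language model splits a sentence into words. SMILES is one common text encoding for molecules; the article's graphic does not say which encoding is used, so treat that choice as an assumption. The regular expression is a simplified tokenizer written for this example, not a complete SMILES grammar.

```python
import re

# Bracket atoms, two-letter halogens, two-digit ring labels, then single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]")

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens, rejecting characters we do not recognize."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens

print(tokenize_smiles("CC(=O)O"))   # acetic acid
print(tokenize_smiles("ClCCBr"))    # 1-bromo-2-chloroethane
```

Once a molecule is a sequence of tokens like this, the same "atoms to molecules to chemistry" progression shown in the graphic can be learned with the machinery used for letters, words, and sentences.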

It’s crucial to note that the transformer architecture is adaptable to a diverse array of fields, as long as the input data can be expressed in textual form. This versatility makes the transformer architecture a valuable tool for creating machine learning models in many domains.

Pushing the boundaries of creativity with generative AI

Generative models have the ability to create new and unique images, audio, or text for a variety of applications. These models have also enabled AI systems to become more effective at processing complex data and have opened up new possibilities for using AI in a wide range of applications.

Foundation models serve as a strong basis for creating generative models due to their ability to handle and learn from vast amounts of data. By adjusting the parameters of these models to focus on a specific task, like generating images or text, new generative AI models can be created that produce unique content within specific fields.

Image generated by DALL-E 2 (OpenAI)

As an illustration, if the objective is to develop a generative AI model for art, a pre-trained foundational model would first be trained on a vast collection of art images. After successful training, it could then be used to produce novel and original pieces of art. Above is a sample of art created by an AI program named DALL-E 2 in response to a prompt requesting it to generate a painted portrait of a human face, as perceived by AI.

Overcoming the small data challenge in generative AI

Implications of foundation models go well beyond NLP

“When we first started working on generative AI,” Dr. Das said, “it occurred to us that one of our problems was learning from small data for any domain-specific or any industry-specific application.”

Generative AI models require large amounts of data to accurately learn and generate new, similar data. When working with small data sets, the performance and usefulness of these models can be limited. Dr. Das recognizes this challenge and understands that techniques like transfer learning and data augmentation can help improve their performance in these situations.

Despite the challenges that small data sets pose for generative AI models, businesses hold a vast amount of unlabeled data for each of the domains in the above graphic. This data provides an opportunity to train custom foundational models that can solve problems previously thought unsolvable. This aligns with IBM Research’s focus on exploring new AI capabilities through generative AI and pushing the boundaries of AI science.

Broad generative AI research


IBM has made significant contributions in each of the domains represented in the image. Its work is so extensive that it is difficult to cover all of its achievements in a single article.

Synthesizing antimicrobials with generative AI


AI has the potential to revolutionize various fields and speed up scientific progress. As an example, Dr. Das and her research team have leveraged AI to develop innovative antimicrobials to fight against lethal antibiotic-resistant bacteria.

The Fight Against Superbugs

Antibiotics were first used to treat serious infections in the 1940s. Since then, antibiotics have saved millions of lives and transformed modern medicine. Yet the CDC estimates that about 47 million antibiotic treatments are prescribed each year for infections that don’t need antibiotics.

The overuse of antibiotics is a critical problem because it contributes to the development of antibiotic-resistant infections caused by common bacteria like E. coli and Staphylococcus, as well as more dangerous and rare bacteria such as MRSA. These resistant infections are challenging to treat and can result in serious consequences like sepsis, organ malfunction, and death.

When traditional antibiotics are no longer able to effectively kill bacteria, it becomes much more difficult or even impossible to treat and control infections. These antibiotic-resistant bacteria, commonly called superbugs, can spread quickly and cause serious infections, particularly in hospitals and other healthcare settings. Superbugs can also be found in the environment, in food, and on surfaces, and they can be transmitted from person to person.

It is a serious global health problem. Drug-resistant diseases kill 700,000 people annually around the world; by 2050, that number is expected to rise to 10 million deaths per year.

How bacteria outsmart antibiotics

Bacteria transform into superbugs through the activation of innate defense strategies that render antibiotics ineffective. These defense mechanisms can involve physical, chemical, or biological processes that safeguard the germs and enable them to escape or counteract threats to their existence. Such processes may produce enzymes that inactivate antibiotics, alter the bacterial cell wall so that the organism is less responsive to the drugs, or allow the bacteria to obtain genetic information from other bacteria that possess inherent immunity to antibiotics.

Streamlining drug development with AI

The conventional method of creating a new antimicrobial drug is a lengthy and expensive undertaking, frequently taking many years and a hefty sum of money before it can be commercially available. But recent advancements in Artificial Intelligence (AI) are revolutionizing the drug discovery and development process.

By utilizing AI’s ability to generate and evaluate numerous possible drug candidates, researchers can swiftly pinpoint the most promising options and concentrate their efforts on them. This streamlines the drug development process, cutting down the time and cost involved and leading to the production of more efficient antimicrobial drugs at a quicker pace.

Overview and 48-day timeline of the research team’s proposed AI-driven approach for accelerated antimicrobial design. Source: IBM Research, “Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics,” Nature Biomed. Eng., March 2021.

In collaboration with other organizations, Dr. Das and her team at IBM conducted a study to find innovative solutions to the problem of antimicrobial resistance. The study used AI to synthesize and evaluate 20 unique antimicrobial peptide designs, chosen from a pool of 90,000 sequences.

The AI models were specifically designed to combat antibiotic resistance, incorporating controls for broad-spectrum efficacy and low toxicity, and slowing down the emergence of resistance. This approach aimed to create effective solutions that not only fight against resistant bacteria but also minimize the risk of harmful side effects and prevent further resistance from developing.

The team tested these designs against a diverse range of gram-negative and gram-positive bacteria, which led to the identification of six successful drug candidates. The toxicity of these candidates was further evaluated in both a mouse model and a test tube.
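The generate-then-screen funnel described above can be illustrated with a toy example. To be clear, this is not IBM's method: the published work used deep generative models and molecular dynamics, while the sketch below generates random peptide sequences and filters them with two crude, hand-written heuristics (net charge and hydrophobic fraction) that merely stand in for learned efficacy and toxicity controls. The thresholds, pool size, and amino acid groupings are all assumptions chosen for the example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POSITIVE = set("KR")           # lysine and arginine are positively charged at neutral pH
NEGATIVE = set("DE")           # aspartate and glutamate are negatively charged
HYDROPHOBIC = set("AVILMFW")   # simplified grouping, chosen for this toy example

def random_peptide(rng, length=15):
    """Stand-in for a generative model: a uniformly random sequence."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

def net_charge(seq):
    return sum(a in POSITIVE for a in seq) - sum(a in NEGATIVE for a in seq)

def hydrophobic_fraction(seq):
    return sum(a in HYDROPHOBIC for a in seq) / len(seq)

def passes_controls(seq, min_charge=2, max_hydrophobic=0.5):
    """Hypothetical controls standing in for efficacy and toxicity predictors."""
    return net_charge(seq) >= min_charge and hydrophobic_fraction(seq) <= max_hydrophobic

def screen(pool_size=5000, keep=20, seed=0):
    """Generate a pool, drop candidates that fail the controls, keep the top few."""
    rng = random.Random(seed)
    pool = [random_peptide(rng) for _ in range(pool_size)]
    survivors = [s for s in pool if passes_controls(s)]
    survivors.sort(key=net_charge, reverse=True)
    return pool, survivors[:keep]

pool, shortlist = screen()
print(len(pool), "generated,", len(shortlist), "shortlisted for testing")
```

The shape is the point: a large generated pool, automated controls that discard most of it, and a short list small enough to synthesize and test in a lab.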

AI-powered success 

Dr. Das expressed excitement about the success of the design, pointing out that it embodies many of the sought-after characteristics expected in the next generation of drug candidates. The accompanying illustration outlines the plan and estimated duration of using AI to speed up the antimicrobial design process, which can be accomplished in just one and a half months, significantly quicker than the conventional method that takes several years.

The use of AI in accelerating the discovery of new antimicrobial drugs has proven to be a game-changer, offering clear benefits such as faster speed and reduced expenses. Moreover, AI models offer a more streamlined approach by directing the attention of researchers to the most promising leads. Additionally, generative AI enables scientists to design innovative drug compounds that boast unique features and elevated efficacy compared to existing drugs.

The researchers at IBM have harnessed the power of generative AI to streamline the development of new antimicrobial drugs. Additionally, they have used AI to create valuable tools, such as MolFormer and MolGPT, for predicting the properties of chemical molecules, which plays a crucial role in fields including drug discovery and material design.

Wrapping up

Generative AI has captured the attention of various industries, including music, art, healthcare, and pharmaceuticals, as one of the most exciting advancements in AI in recent times. Despite its limitations and challenges, AI continues to demonstrate its potential to revolutionize different fields.

Its ability to swiftly create and test life-saving medicines for antibiotic-resistant bacteria and other pathogens is a testament to its significance and promise.

With the recent buzz surrounding OpenAI’s GPT-3 trial and the subsequent developments by Google and Microsoft, it’s likely we will see a surge in AI-powered products in the coming year, and I expect further disruptions to occur. Some may be trivial, but the hope is that many will feature meaningful integrations of AI that will be beneficial to the markets.

Analyst Notes:

  1. While some may question the absence of a discussion on the combination of facial recognition and AI, it is important to note that facial recognition technology and GPT models are separate AI technologies with distinct functions and methods. IBM, which was once a leader in human face data, has chosen not to work in the field of facial recognition due to the controversial political and privacy issues surrounding it. However, IBM is still actively involved in other AI modalities such as language processing, image recognition, graphics analysis, speech recognition, and various combinations in multimodal AI applications.
  2. A final remark on the market response to the release of GPT-3: It was noteworthy that Microsoft appeared well-prepared when the GPT-3 news broke, whereas Google seemed caught off guard and was forced to hold an emergency meeting with its founders to come up with a plan. In contrast, Microsoft had already planned how it was going to integrate AI into its operations. There is a significant disparity in search revenue between the two companies, with Microsoft earning a total of $22 billion in 2022 search revenue, while Google had $59 billion in the last quarter of 2022. It is surprising that Google was not more prepared to defend against the potential impact of GPT-3, considering the model’s search-applicable capabilities and the obvious threat it posed to one-third of Google’s total revenue.
  3. DALL-E 2, mentioned in the article, is a cutting-edge deep learning model that generates digital images based on natural language input. It is based on a version of OpenAI’s GPT-3.
  4. For more information about IBM’s other AI research, you might be interested in my previous articles:

IBM CodeNet: Artificial Intelligence That Can Program Computers And Solve A $100 Billion Legacy Code Problem

IBM’s AutoAI Has The Smarts To Make Data Scientists A Lot More Productive – But What’s Scary Is That It’s Getting A Whole Lot Smarter