
See How AI Generates Images from Text

Generative AI algorithms use probability to create visuals from noise

Person surrounded by black boxes. Each black box has a glowing screen with a similar base image projected on it. A few versions of the image are crisp. One includes static.

Matthew Twombly




Last year the Internet got its first taste of image-generating artificial intelligence. Suddenly, technology that had once been offered only to specialists was available to anyone with a web connection. The enthusiasm shows no signs of abating, and AI-generated images have won a major photography competition, created the title credits of a television series and tricked people into believing the pope stepped out in a fashionable puffer coat. Yet critics have noted how training the algorithms on existing works could potentially infringe on copyright, and using them could put artists' jobs in jeopardy. Generative AI also risks supercharging fake news: the pope coat was fun, but a generated photograph supposedly showing an attack on the Pentagon briefly inspired a dip in the stock market.

How did programs such as DALL-E 2, Midjourney and Stable Diffusion get to be so good all at once? Although AI has been in development for decades, the most popular of today's image generators use a technique called a diffusion model, which is relatively new on the AI scene. Here's how it works:

Credit: Matthew Twombly (graphic), Amanda Hobbs (research)

Sophie Bushwick is tech editor at Scientific American. She runs the daily technology news coverage for the website, writes about everything from artificial intelligence to jumping robots for both digital and print publication, records YouTube and TikTok videos and hosts the podcast Tech, Quickly. Bushwick also makes frequent appearances on radio shows such as Science Friday and television networks, including CBS, MSNBC and National Geographic. She has more than a decade of experience as a science journalist based in New York City and previously worked at outlets such as Popular Science, Discover and Gizmodo. Follow Bushwick on X (formerly Twitter) @sophiebushwick.


Matthew Twombly is a freelance illustrator and infographic designer. His work can be viewed at www.matthewtwombly.com.


Amanda Hobbs is a freelance researcher, writer and visual content editor specializing in storytelling via art and information graphics. Her work can be viewed at www.athcreative.com.

This article was originally published with the title “How AI Generates Images from Text” in Scientific American Magazine Vol. 329 No. 3, p. 66
doi:10.1038/scientificamerican1023-66