March 25, 2025
Product Release
Unlocking useful and valuable image generation with a natively multimodal model capable of precise, accurate, photorealistic outputs.
At OpenAI, we have long believed image generation should be a primary capability of our language models. That’s why we’ve built our most advanced image generator yet into GPT‑4o. The result—image generation that is not only beautiful, but useful.
A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a tshirt wiith a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection. The text reads: (left) Suppose we directly model Pros: Cons: (Right) On the bottom right of the board, she draws a diagram:
"Transfer between Modalities:
p(text, pixels, sound) [equation]
with one big autoregressive transformer.
* image generation augmented with vast world knowledge
* next-level text rendering
* native in-context learning
* unified post-training stack
* varying bit-rate across modalities
* compute not adaptive"
"Fixes:
* model compressed representations
* compose autoregressive prior with a powerful decoder"
"tokens -> [transformer] -> [diffusion] -> pixels"

Best of 8
selfie view of the photographer, as she turns around to high five him

Best of 8
magnetic poetry on a fridge in a mid century home:
Line 1: "A picture"
Line 2: "is worth"
Line 3: "a thousand words,"
Line 4: "but sometimes"
Large gap
Line 5: "in the right place"
Line 6: "can elevate"
Line 7: "its meaning."
The man is holding the words "a few" in his right hand and "words" in his left.

Best of 5
Make an image of a four‑panel strip, with some padding around the border:
A little snail is at the counter of a flashy car showroom. The salesman has leaned way over the desk to even see him.
Close‑up on the snail looking very serious. He says, “I want your fastest sports car… and I want you to paint big letter ‘S’s on the doors, the hood and the roof.”
The salesman is scratching his head. “Um… we can do that, but why the S’s?”
Smash cut to a red blur roaring down the highway. The sports car is covered in giant S’s. People on the sidewalk are pointing and laughing: “WOW! LOOK AT THAT S‑CAR GO!”

Best of ~2
an infographic explaining newton's prism experiment in great detail

Best of 3
now generate a POV of a person drawing this diagram in their notebook, at a round cafe table in washington square park

Best of 2
now show the same scene with a smug young Isaac Newton sitting at the table, with a prism, demonstrating the experiment, without the notebook in view

Best of 4
Useful image generation
From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze—not just to decorate. Today's generative models can conjure surreal, breathtaking scenes, but struggle with the workhorse imagery people use to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience.
GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o’s inherent knowledge base and chat context—including transforming uploaded images or using them as visual inspiration. These capabilities make it easier to create exactly the image you envision, helping you communicate more effectively through visuals and advancing image generation into a practical tool with precision and power.
Improved capabilities
We trained our models on the joint distribution of online images and text, learning not just how images relate to language, but how they relate to each other. Together with aggressive post-training, this yields a model with surprising visual fluency, capable of generating images that are useful, consistent, and context-aware.
Text rendering
A picture is worth a thousand words, but sometimes generating a few words in the right place can elevate the meaning of an image. 4o’s ability to blend precise symbols with imagery turns image generation into a tool for visual communication.
Create a photorealistic image of two witches in their 20s (one ash balayage, one with long wavy auburn hair) reading a street sign. Context: Characters: Composition from background to foreground:
a city street in a random street in Williamsburg, NY with a pole covered entirely by numerous detailed street signs (e.g., street sweeping hours, parking permits required, vehicle classifications, towing rules), including few ridiculous signs at the middle: (paraphrase it to make these legitimate street signs)"Broom Parking for Witches Not Permitted in Zone C" and "Magic Carpet Loading and Unloading Only (15-Minute Limit)" and "Reindeer Parking by Permit Only (Dec 24–25)\n Violators will be placed on Naughty List." The signpost is on the right of a street. Do not repeat signs. Signs must be realistic.
one witch is holding a broom and the other has a rolled-up magic carpet. They are in the foreground, back slightly turned towards the camera and head slightly tilted as they scrutinize the signs.
streets + parked cars + buildings -> street sign -> witches. Characters must be closest to the camera taking the shot

Best of ~8
I'm opening a traditional concept restaurant in Marin called Haein. It focuses on Korean food cooked with organic, farm-fresh ingredients, with a rotating menu based on what's seasonal. I want you to design an image - a menu incorporating the following menu items - lean into the traditional/rustic style while keeping it feeling upscale and sleek. Please also include illustrations of each dish in an elegant, peter rabbit style. Make sure all the text is rendered correctly, with a white background.
(Top)
Doenjang Jjigae (Fermented Soybean Stew) – $18 House-made doenjang with local mushrooms, tofu, and seasonal vegetables served with rice.
Galbi Jjim (Braised Short Ribs) – $34 Slow-braised local grass-fed beef ribs with pear and black garlic glaze, seasonal root vegetables, and jujube.
Grilled Seasonal Fish – Market Price ($22-$30) Whole or fillet of local, sustainable fish grilled over charcoal, served with perilla leaf ssam and house-made sauces.
Bibimbap – $19 Heirloom rice with a rotating selection of farm-fresh vegetables, house-fermented gochujang, and pasture-raised egg.
Bossam (Heritage Pork Wraps) – $28 Slow-cooked pork belly with napa cabbage wraps, oyster kimchi, perilla, and seasonal condiments.
(Bottom) Dessert & Drinks Seasonal Makgeolli (Rice Wine) – $12/glass
Rotating flavors based on seasonal fruits and flowers (persimmon, citrus, elderflower, etc.).
Hoddeok (Korean Sweet Pancake) – $9 Pan-fried cinnamon-stuffed pancake with black sesame ice cream.

Best of ~2
photo of a delightful wedding invitation on a tasteful wooden desk. The card is hefty, with eggshell textures, and beautiful embossings, with elegant decorations abstractly representing the couple tastefully integrated into the designs. Iconography is used, but sparingly and in a minimalist way. perfect typesetting.
"You are cordially invited
to the long-awaited union of
Image
and
Text
After years of flirting and collaboration
they are finally becoming One.
Together at last, in GPT‑4o,
they now speak the same language —
where a whisper becomes a masterpiece,
and a prompt becomes a picture.
Please join us in celebrating
this magical multimodal matrimony
where imagination knows no bounds.
Date: March 25, 2025
Location: chatgpt.com
Dress Code: Pixels or Prose
With love,
OpenAI"
perfect typesetting.

Best of ~10
Multi-turn generation
Because image generation is now native to GPT‑4o, you can refine images through natural conversation. GPT‑4o can build upon images and text in chat context, ensuring consistency throughout. For example, if you’re designing a video game character, the character’s appearance remains coherent across multiple iterations as you refine and experiment.

Give this cat a detective hat and a monocle

Best of 1
turn this into a triple A video games made with a 4k game engine and add some User interface as overlay from a mystery RPG where we can see a health bar and a minimap at the top as well as spells at the bottom with consistent and iconography

Best of 1
update to a landscape image 16:9 ratio, add more spells in the UI, and unzoom the visual so that we see the cat in a third person view walking through a steampunk manhattan creating beautiful contrast and lighting like in the best triple A game, with cool-toned colors

Best of 2
create the interface when the player opens the menu and we see the cat's character profile with his equipment and another page showing active quests (and it should make sense in relationship with the universe worldbuilding we are describing in the image)

Best of 8
credit creator: Manuel Sainsily
concrete poem on luxury eggshell textured card At OpenAI, we have long believed image generation should be a primary capability of our language models. That’s why we’ve built our most advanced image generator yet into GPT‑4o. The result - image generation that is not only beautiful, but useful. From the first cave paintings to modern infographics, humans have used visual imagery to communicate, persuade, and analyze - not just to decorate. Today’s generative models can conjure breathtaking vistas and surreal scenarios, but still struggle with the workhorse imagery that underlies how most visual data is used to share and create information. From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience. With this new capability, ChatGPT advances image generation towards being a practical tool with precision and power.

Best of 8
show this card, but in a designers room. card close to the camera

Best of 8
can you make me a cute minimalist racoon eating a strawberry sticker? use a thick white border and transparent background

try a different minimalist style and a gray racoon

awww, can you add a chew mark to the strawberry and maybe some red mess around the mouth

Instruction following
GPT‑4o’s image generation follows detailed prompts closely. While other systems struggle with ~5-8 objects, GPT‑4o can handle up to 10-20 different objects. The tighter binding of objects to their traits and relations allows for better control.
A square image containing a 4 row by 4 column grid containing 16 objects on a white background. Go from left to right, top to bottom. Here's the list:
1. a blue star
2. red triangle
3. green square
4. pink circle
5. orange hourglass
6. purple infinity sign
7. black and white polka dot bowtie
8. tiedye "42"
9. an orange cat wearing a black baseball cap
10. a map with a treasure chest
11. a pair of googly eyes
12. a thumbs up emoji
13. a pair of scissors
14. a blue and white giraffe
15. the word "OpenAI" written in cursive
16. a rainbow-colored lightning bolt

Best of 5
Times Square in New York City in the afternoon, with no people, vehicles, or illuminated billboards.

Best of ~1
shibuya crossing with no people, vehicles, or illuminated billboards.

Best of ~1
show me a wine glass with only the tiniest drop of red wine in it.

Best of ~1
We need evidence there is a currently present invisible elephant. Consider what an elephant is and does in the environment, then show us that, perhaps mid-process - but the elephant itself is not shown at all

credit creator: Eskcanta
a whiteboard that says the following equations:
E = mc^2
sqrt(9) = 3
(-b +/- sqrt(b^2 - 4ac)) / 2a

Best of ~1
In-context learning
GPT‑4o can analyze and learn from user-uploaded images, seamlessly integrating their details into its context to inform image generation.

- draw a design for a vehicle with triangular wheels, using these images as reference.
- label the front wheel, the back wheel, and at the of the diagram say (in small caps)
- TRIANGLE WHEELED VEHICLE. English Patent. 2025. OPENAI.

Best of ~16
now put this in a photo taken in new york city.

Best of ~16
an photorealistic image of a blue chainsaw

Best of 1
make an ad for this chainsaw, of a grandma carving turkey at thanksgiving dinner table. add a tag line

Best of 4

turn this scene into a photo. shot on a dlsr

Best of ~8

turn this into a photo

Best of ~4
World knowledge
Native image generation enables 4o to link its knowledge between text and images, resulting in a model that feels smarter and more efficient.
Code Example (Three.js)
HTML
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>OpenAI Banner</title>
  <style>
    body { margin: 0; overflow: hidden; }
    canvas { display: block; }
  </style>
</head>
<body>
  <script type="module">
    import * as THREE from 'https://cdn.jsdelivr.net/npm/three@0.160.0/build/three.module.js';
    import { OrbitControls } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/controls/OrbitControls.js';
    import { FontLoader } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/loaders/FontLoader.js';
    import { TextGeometry } from 'https://cdn.jsdelivr.net/npm/three@0.160.0/examples/jsm/geometries/TextGeometry.js';

    const scene = new THREE.Scene();
    const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 1000);
    const renderer = new THREE.WebGLRenderer({ antialias: true });
    renderer.setSize(window.innerWidth, window.innerHeight);
    document.body.appendChild(renderer.domElement);

    // Lighting
    const light = new THREE.AmbientLight(0xffffff, 1);
    scene.add(light);

    const dirLight = new THREE.DirectionalLight(0xffffff, 1);
    dirLight.position.set(0, 5, 10);
    scene.add(dirLight);

    // Camera position
    camera.position.z = 20;

    // Controls
    const controls = new OrbitControls(camera, renderer.domElement);

    // Banner background
    const bannerGeometry = new THREE.PlaneGeometry(20, 10);
    const bannerMaterial = new THREE.MeshStandardMaterial({ color: 0x1a1a1a });
    const banner = new THREE.Mesh(bannerGeometry, bannerMaterial);
    scene.add(banner);

    // OpenAI Logo texture (placeholder)
    const loader = new THREE.TextureLoader();
    loader.load('https://upload.wikimedia.org/wikipedia/commons/4/4d/OpenAI_Logo.svg', texture => {
      const logoGeometry = new THREE.PlaneGeometry(4, 4);
      const logoMaterial = new THREE.MeshBasicMaterial({ map: texture, transparent: true });
      const logo = new THREE.Mesh(logoGeometry, logoMaterial);
      logo.position.set(-5, 0, 0.1); // Slightly in front of the banner
      scene.add(logo);
    });

    // Load font and add text
    const fontLoader = new FontLoader();
    fontLoader.load('https://threejs.org/examples/fonts/helvetiker_regular.typeface.json', font => {
      const textGeometry = new TextGeometry("I am 4-o", {
        font: font,
        size: 1,
        height: 0.2,
        curveSegments: 12,
        bevelEnabled: true,
        bevelThickness: 0.02,
        bevelSize: 0.02,
        bevelOffset: 0,
        bevelSegments: 5
      });

      textGeometry.center();

      const textMaterial = new THREE.MeshStandardMaterial({ color: 0x00ffcc });
      const textMesh = new THREE.Mesh(textGeometry, textMaterial);
      textMesh.position.set(5, -0.5, 0.1); // Opposite side of logo
      scene.add(textMesh);
    });

    // Resize handler
    window.addEventListener('resize', () => {
      camera.aspect = window.innerWidth / window.innerHeight;
      camera.updateProjectionMatrix();
      renderer.setSize(window.innerWidth, window.innerHeight);
    });

    // Render loop
    function animate() {
      requestAnimationFrame(animate);
      controls.update();
      renderer.render(scene, camera);
    }

    animate();
  </script>
</body>
</html>
make an image of what this means to you

Make me a professionally shot photorealistic diagram of the top selling cocktails in my bar with recipes labeled on each drink.
put the recipes on handwritten cards in front of each drink.
the cards are brown, and the text is black.
background is white
Title is "4 most popular cocktails"

Best of 1
make a visual infographic describing why SF is so foggy

Best of 3
create an educational poster of different types of whales in an effervescent watercolor style. make the background pure white.

Best of 3
make a very colorful risograph on how to make matcha

Best of 3
Photorealism and style
Training on images reflecting a vast variety of image styles allows the model to create or transform images convincingly.

Limitations
Our model isn’t perfect. We’re aware of multiple limitations at the moment which we will work to address through model improvements after the initial launch.

We’ve noticed that GPT‑4o can occasionally crop longer images, like posters, too tightly, especially near the bottom.

As with our text models, image generation can also make up information, especially when prompts provide little context.

When generating images that rely on its knowledge base, it may struggle to accurately render more than 10-20 distinct concepts at once, such as a full periodic table.


The model sometimes struggles with rendering non-Latin languages, and the characters can be inaccurate or hallucinated, especially as complexity increases.

We’ve noticed that requests to edit specific portions of a generated image, such as typos, are not always effective and may also alter other parts of the image in ways that were not requested or introduce more errors. We’re currently working on introducing increased editing precision to the model.
We’re aware of a bug where the model struggles to maintain consistency when editing faces from user uploads, but we expect this to be fixed within the week.

The model is known to struggle when asked to render detailed information at a very small size.
Safety
In line with our Model Spec, we aim to maximize creative freedom by supporting valuable use cases like game development, historical exploration, and education—while maintaining strong safety standards. At the same time, it remains as important as ever to block requests that violate those standards. Below are evaluations of additional risk areas where we're working to enable safe, high-utility content and support broader creative expression for users.
Provenance via C2PA and internal reversible search
All generated images come with C2PA metadata, which will identify an image as coming from GPT‑4o, to provide transparency. We’ve also built an internal search tool that uses technical attributes of generations to help verify if content came from our model.
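As a rough illustration of what "comes with C2PA metadata" can mean at the file level, here is a minimal Python sketch that checks whether a PNG contains a `caBX` chunk, which is the chunk type the C2PA specification uses for embedding a manifest store in PNG files. This is an assumption-laden sketch: the text above does not say which file format or embedding GPT‑4o images use, the function names are hypothetical, and finding the chunk only shows that a manifest container is present. It does not validate signatures or confirm the image's origin, which requires a full C2PA validator.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> list[str]:
    """Return the chunk types of a PNG byte string, in file order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    offset = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
    while offset + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[offset:offset + 8])
        types.append(ctype.decode("ascii", errors="replace"))
        offset += 8 + length + 4
        if ctype == b"IEND":
            break
    return types


def has_c2pa_chunk(data: bytes) -> bool:
    """True if the PNG carries a 'caBX' chunk (assumed C2PA manifest container)."""
    return "caBX" in png_chunk_types(data)


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Build a PNG chunk; used here only to create synthetic test input."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them to `has_c2pa_chunk`. JPEG and other formats embed manifests differently and are not handled by this sketch.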
Blocking the bad stuff
We’re continuing to block requests for generated images that may violate our content policies, such as child sexual abuse materials and sexual deepfakes. When images of real people are in context, we have heightened restrictions regarding what kind of imagery can be created, with particularly robust safeguards around nudity and graphic violence. As with any launch, safety is never finished and is rather an ongoing area of investment. As we learn more about real-world use of this model, we’ll adjust our policies accordingly.
For more on our approach, visit the image generation addendum to the GPT‑4o system card.
Using reasoning to power safety
Similar to our deliberative alignment work, we’ve trained a reasoning LLM to work directly from human-written and interpretable safety specifications. We used this reasoning LLM during development to help us identify and address ambiguities in our policies. Together with our multimodal advancements and existing safety techniques developed for ChatGPT and Sora, this allows us to moderate both input text and output images against our policies.
Access and availability
4o image generation rolls out starting today to Plus, Pro, Team, and Free users as the default image generator in ChatGPT, with access coming soon to Enterprise and Edu. It’s also available to use in Sora. For those who hold a special place in their hearts for DALL·E, it can still be accessed through a dedicated DALL·E GPT.
Developers will soon be able to generate images with GPT‑4o via the API, with access rolling out in the next few weeks.
Creating and customizing images is as simple as chatting using GPT‑4o - just describe what you need, including any specifics like aspect ratio, exact colors using hex codes, or a transparent background. Because this model creates more detailed pictures, images take longer to render, often up to one minute.
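The specifics mentioned above (aspect ratio, hex colors, transparent background) are plain text in the prompt, so they can be assembled programmatically if you generate many similar requests. The following Python sketch shows one way to do that. The helper name `build_image_prompt` and the exact wording it produces are assumptions for illustration only. It builds a string to paste into a chat and does not call any API.

```python
import re


def build_image_prompt(description, aspect_ratio=None, hex_colors=(),
                       transparent_background=False):
    """Compose a chat prompt string with optional image specifics.

    Hypothetical helper: the phrasing is illustrative, not an official format.
    """
    parts = [description.strip()]
    if aspect_ratio is not None:
        # Expect a "W:H" ratio such as "16:9".
        if not re.fullmatch(r"\d+:\d+", aspect_ratio):
            raise ValueError(f"invalid aspect ratio: {aspect_ratio!r}")
        parts.append(f"Aspect ratio: {aspect_ratio}.")
    for color in hex_colors:
        # Expect six-digit hex codes such as "#FF6600".
        if not re.fullmatch(r"#[0-9A-Fa-f]{6}", color):
            raise ValueError(f"invalid hex color: {color!r}")
    if hex_colors:
        parts.append("Use these exact colors: " + ", ".join(hex_colors) + ".")
    if transparent_background:
        parts.append("Use a transparent background.")
    return " ".join(parts)
```

For example, `build_image_prompt("A minimalist raccoon sticker.", "1:1", ["#FF6600"], True)` produces a single prompt sentence sequence that states the ratio, the color, and the transparent background explicitly.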

Livestream replay
Author
OpenAI
Leadership
Gabriel Goh: Image Generation
Jackie Shannon: ChatGPT Product
Mengchao Zhong, Wayne Chang: ChatGPT Engineering
Rohan Sahai: Sora Product and Engineering
Brendan Quinn, Tomer Kaftan: Inference
Prafulla Dhariwal: Multimodal Organization
Research
Foundational Research
Allan Jabri, David Medina, Gabriel Goh, Kenji Hata, Lu Liu, Prafulla Dhariwal
Core Research
Aditya Ramesh, Alex Nichol, Casey Chu, Cheng Lu, Dian Ang Yap, Heewoo Jun, James Betker, Jianfeng Wang, Long Ouyang, Li Jing, Wesam Manassra
Research Contributors
Aiden Low, Brandon McKinzie, Charlie Nash, Huiwen Chang, Ishaan Gulrajani, Jamie Kiros, Ji Lin, Kshitij Gupta, Yang Song
Model Behavior
Laurentia Romaniuk
Multimodal Organization
Andrew Gibiansky, Yang Lu
Data
Data Leads
Gildas Chabot, James Park Lennon
Data
Arshi Bhatnagar, Dragos Oprica, Rohan Kshirsagar, Spencer Papay, Szi-chieh Yu, Wesam Manassra, Yilei Qian
Moderators
Hazel Byrne, Jennifer Luckenbill, Mariano López
Human Data Advisors
Long Ouyang
Scaling
Inference Leads
Brendan Quinn, Tomer Kaftan
Inference
Alyssa Huang, Jacob Menick, Nick Stathas, Ruslan Vasilev, Stanley Hsieh
Applied
ChatGPT Product Lead
Jackie Shannon
ChatGPT Engineering Leads
Mengchao Zhong, Wayne Chang
Product Design Lead
Matt Chan
Data Science
Xiaolin Hao
ChatGPT
Andrew Sima, Annie Cheng, Benjamin Goh, Boyang Niu, Dian Ang Yap, Duc Tran, Edede Oiwoh, Eric Zhang, Ethan Chang, Jeffrey Dunham, Jay Chen, Kan Wu, Karen Li, Kelly Stirman, Mengyuan Xu, Michelle Qin, Ola Okelola, Pedro Aguilar, Rocky Smith, Rohit Ramchandani, Sara Culver, Sean Fitzgerald, Vlad Fomenko, Wanning Jiang, Wesam Manassra, Xiaolin Hao, Yilei Qian
Sora
Sora Product Leads
Rohan Sahai, Wesam Manassra
Sora Product and Engineering
Boyang Niu, David Schnurr, Gilman Tolle, Joe Taylor, Joey Flynn, Mike Starr, Rajeev Nayak, Rohan Sahai, Wesam Manassra
Safety
Safety Lead
Somay Jain
Safety
Alex Beutel, Andrea Vallone, Botao Hao, Brendan Quinn, Cameron Raymond, Chong Zhang, David Robinson, Eric Wallace, Filippo Raso, Huiwen Chang, Ian Kivlichan, Irina Kofman, Keren Gu-Lemberg, Kristen Ying, Madelaine Boyd, Meghan Shah, Michael Lampe, Owen Campbell-Moore, Rohan Sahai, Rodrigo Riaza Perez, Sam Toizer, Sandhini Agarwal, Troy Peterson
Strategy
Adam Cohen, Adam Wells, Ally Bennett, Ashley Pantuliano, Carolina Paz, Claudia Fischer, Declan Grabb, Gaby Sacramone-Lutz, Lauren Jonas, Ryan Beiermeister, Shiao Lee, Tom Stasi, Tyce Walters, Ziad Reslan, Zoe Stoll
Marketing & Comms
Comms and Marketing Leads
Minnia Feng, Natalie Summers, Taya Christianson
Comms
Alex Baker-Whitcomb, Ashley Tyra, Bailey Richardson, Gaby Raila, Marselus Cayton, Scott Ethersmith, Souki Mansoor
Design & Creative
Leads
Kendra Rimbach, Veit Moeller
Design
Adam Brandon, Adam Koppel, Angela Baek, Cary Hudson, Dana Palmie, Freddie Sulit, Jeffrey Sabin Matsumoto, Leyan Lo, Matt Nichols, Thomas Degry, Vanessa Antonia Schefke, Yara Khakbaz
Special Thanks
Aditya Ramesh, Aidan Clark, Alex Beutel, Ben Newhouse, Ben Rossen, Che Chang, Greg Brockman, Hannah Wong, Ishaan Singal, Jason Kwon, Jiacheng Feng, Jiahui Yu, Joanne Jang, Johannes Heidecke, Kevin Weil, Mark Chen, Mia Glaese, Nick Turley, Raul Puri, Reiichiro Nakano, Rui Shu, Sam Altman, Shuchao Bi, Vinnie Monaco