GPT-4o Just Leveled Up Its Image Generation 🖼️
I’m so excited to try out 4o’s new image generation! I saw some cool examples on X, like memes in Ghibli style. Let’s explore some of its functionalities!
Contents:
- Text Rendering
- Character Consistency
- Upload and Restyle
- Detailed Directions
- Transparent Layers
- Infographics
- Limitations
- References
✨ Text Rendering
Let’s first explore the “Text Rendering” feature. Here’s my prompt:
I’m starting a thriller/mystery book club called “Booked for Murder”. I want you to design an image - a poster incorporating an illustration of the attached book item (The Family Experiment) in cartoon style. Lean into the harry potter style and font while keeping it feeling upscale and sleek. Create a nice background that suits the theme. Make sure all the text is rendered correctly. Include a tag “free entry”.
Join Our First
Booked for Murder
Book Club
Book Selection
(Cartoon Illustration of the Book)
Sunday 30th March 2025
15:00 - 16:00
The Book Cafe
The result:
Then I asked it to switch to a gothic style and make the overall font smaller:
I am actually impressed that it could render the text in the book so clearly as well.
✨ Character Consistency
To test the “consistency” of the multi-turn generation, I first provided the following prompt:
Create a concept art for a game character. The character is a girl with a white persian cat. The girl has a cute cat paw gloves and cat ears. The girl has a bubbly personality and is agile.
The result:
I didn’t quite like the art style, so I asked it to make her cuter and use a Pixar-like art style:
Then, I wanted to see what she would look like as a character in one of my favourite games (Overwatch):
After that, I asked it to turn the image into a 16:9 landscape and show a nice visual of a game where this character is strolling down a Japanese street:
Lastly, I asked it to create a profile interface with active quests:
Overall, I think it’s quite consistent! That said, I’ve noticed it occasionally alters the art style or facial features slightly.
✨ Upload and Restyle
I tried to restyle these photos of Valorant teams:
with the following prompt:
Create a black and white manga style of the photo. Keep the 16:9 aspect ratio.
The result:
While the overall image is accurate, certain details - such as glasses, height, or facial expressions - can sometimes be lost.
✨ Detailed Directions
I generated nine of my favourite things using the following prompt:
A square image containing a 3 row by 3 column grid containing 16 objects on a white background. Go from left to right, top to bottom. The objects should be in illustration style. Here’s the list:
- white dog
- pink heart
- cute gaming mouse and keyboard
- football
- cinnamoroll
- cherry blossom
- one piece anime
- hot matcha
- tempura
The result:
What can I say, they are all spot on!
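For grid prompts like this one, it helps if the object count stated in the prompt matches both the grid size and the list (my prompt above says 16 objects, while a 3 by 3 grid holds 9). Here is a minimal Python sketch that assembles the prompt text from a list and refuses to build it when the numbers disagree. The function name `build_grid_prompt` and its parameters are hypothetical, made up for illustration, and numbering the list items is my own assumption about what makes the order clearer; the sketch only builds a string and does not call any image API.

```python
def build_grid_prompt(items, rows, cols, style="illustration"):
    """Build a grid prompt; raise if the items do not exactly fill the grid."""
    if len(items) != rows * cols:
        raise ValueError(
            f"{rows}x{cols} grid needs {rows * cols} items, got {len(items)}"
        )
    header = (
        f"A square image containing a {rows} row by {cols} column grid "
        f"containing {len(items)} objects on a white background. "
        "Go from left to right, top to bottom. "
        f"The objects should be in {style} style. Here's the list:"
    )
    # Number the items so the intended reading order is explicit.
    lines = [f"{i}. {item}" for i, item in enumerate(items, start=1)]
    return "\n".join([header, *lines])


items = [
    "white dog",
    "pink heart",
    "cute gaming mouse and keyboard",
    "football",
    "cinnamoroll",
    "cherry blossom",
    "one piece anime",
    "hot matcha",
    "tempura",
]

prompt = build_grid_prompt(items, rows=3, cols=3)
print(prompt)
```

The resulting text can be pasted into the chat as is. Asking for a 4 by 4 grid with the same nine items raises a `ValueError` instead of producing a mismatched prompt.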
✨ Transparent Layers
I tried to generate Valorant stickers by providing these images:
and the following prompt:
Turn this into cute sticker, transparent background.
The result:
They’re so cute, omg! Most of them came out great on the first try, but a few needed a couple of runs to match my preferences.
✨ Infographics
I tried to generate a human anatomy infographic with the following prompt:
Make a detailed visual infographic of human anatomy.
The result:
Honestly, I don’t think it’s 100% accurate, but the art looks nice.
Then I went on to generate a PC parts infographic with the following prompt:
Make a detailed visual infographic of PC parts.
The result:
I feel like this one is quite detailed and accurate.
I wanted more descriptive text, so I asked it to generate an infographic explaining why sleep is important:
Create an infographic explaining why sleep is important. Use cute art style.
Lastly, I wanted to create a poster showing different types of flowers.
Create an educational poster of different types of flowers in a vibrant watercolor style.
The result:
Overall, the infographics look great and definitely have the potential to be used in schools.
⚠️ Limitations
Here are some limitations that OpenAI listed, with my own notes in parentheses:
- Cropping: it occasionally crops longer images, like posters, too tightly, especially near the bottom (something I encountered as well)
- Hallucinations: it can make up information, especially in low-context prompts (definitely still a problem!)
- High binding problems: it may struggle to accurately render more than 10-20 distinct concepts at once, such as a full periodic table (I tried to generate the Standard Model elementary particles and it struggled as well)
- Precise graphing: no description was given (I didn’t try to generate data plots, but I guess it can struggle to produce correct, consistent tick labels)
- Multilingual text rendering: it struggles with rendering non-Latin languages, and the characters can be inaccurate or hallucinated, especially with more complexity (I’ve only tried generating English text, but I can see the difficulties)
- Editing precision: requests to edit specific portions of a generated image, such as typos, are not always effective and may also alter other parts of the image in ways that were not requested or introduce more errors (YES, I felt the same - it’s so hard to edit the generated picture sometimes that I’d rather just regenerate it instead!)
- Dense information with small text: it struggles when asked to render detailed information at a very small size (I did encounter this, but it’s understandable that the model struggles when there is more text to render)
Overall, most of my prompts are rather short and simple. I guess if you’re really particular about the details, you could test how well it follows prompts by giving it longer, more specific ones.
📚 References