GPT-4o Just Leveled Up Its Image Generation 🖼️

I’m so excited to try out 4o’s new image generation! I saw some cool examples on X, like memes in Ghibli style. Let’s explore some of its features!

Contents:

Text Rendering
Character Consistency
Upload and Restyle
Detailed Directions
Transparent Layers
Infographics
Limitations
References

✨ Text Rendering

Let’s first explore the “Text Rendering” feature. Here’s my prompt:

I’m starting a thriller/mystery book club called “Booked for Murder”. I want you to design an image - a poster incorporating an illustration of the attached book item (The Family Experiment) in cartoon style. Lean into the harry potter style and font while keeping it feeling upscale and sleek. Create a nice background that suits the theme. Make sure all the text is rendered correctly. Include a tag “free entry”.

Join Our First
Booked for Murder
Book Club

Book Selection
(Cartoon Illustration of the Book)

Sunday 30th March 2025
15:00 - 16:00
The Book Cafe

The result:

Book Club Poster

Then I asked it to switch to a gothic style and make the font smaller overall:

Book Club Poster

I’m actually impressed that it rendered the text on the book so clearly as well.

✨ Character Consistency

To test the “consistency” of the multi-turn generation, I first provided the following prompt:

Create a concept art for a game character. The character is a girl with a white persian cat. The girl has a cute cat paw gloves and cat ears. The girl has a bubbly personality and is agile.

The result:

Game Character

I didn’t quite like the art style, so I asked it to make her cuter and use a Pixar-like art style:

Game Character

Then I wanted to see what she would look like as a character in one of my favourite games (Overwatch):

Game Character

After that, I asked for a 16:9 landscape image showing a nice visual of a game where this character is strolling along a Japanese street:

Game Character

Lastly, I asked it to create a profile interface with active quests:

Game Character

Overall, I think it’s quite consistent! That said, I’ve noticed it occasionally alters the art style or facial features slightly.

✨ Upload and Restyle

I tried to restyle these photos of Valorant teams:

VCT Manga

with the following prompt:

Create a black and white manga style of the photo. Keep the 16:9 aspect ratio.

The result:

VCT Manga

While the overall image is accurate, certain details - such as glasses, height, or facial expressions - can sometimes be lost.
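Since the prompt asks to keep the 16:9 aspect ratio, it can be worth checking the dimensions of the returned image too. Here is a minimal sketch in plain Python; the helper names `aspect_ratio` and `matches_ratio` and the 1% tolerance are my own assumptions, and the sample dimensions are illustrative rather than measured from the images in this post.

```python
from fractions import Fraction


def aspect_ratio(width: int, height: int) -> Fraction:
    """Reduce pixel dimensions to their simplest ratio, e.g. 1920x1080 -> 16/9."""
    return Fraction(width, height)


def matches_ratio(width, height, target=Fraction(16, 9), tolerance=0.01):
    """True if width/height is within a relative tolerance of the target ratio."""
    actual = Fraction(width, height)
    return abs(actual - target) <= tolerance * target


# Illustrative dimensions only, not taken from the generated images.
for w, h in [(1920, 1080), (1536, 1024)]:
    r = aspect_ratio(w, h)
    print(f"{w}x{h} -> {r.numerator}:{r.denominator}, 16:9? {matches_ratio(w, h)}")
```

If the check fails, the image was probably produced at a different landscape ratio, and asking again with the ratio stated explicitly may help.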

✨ Detailed Directions

I generated nine things I love using the following prompt:

A square image containing a 3 row by 3 column grid containing 16 objects on a white background. Go from left to right, top to bottom. The objects should be in illustration style. Here’s the list:

1. white dog
2. pink heart
3. cute gaming mouse and keyboard
4. football
5. cinnamoroll
6. cherry blossom
7. one piece anime
8. hot matcha
9. tempura

The result:

9 Things I Love

What can I say, they are all spot on!
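Note that the prompt above asks for a 3 by 3 grid yet mentions 16 objects. The model still produced nine items, but for longer lists it may be safer to derive the count from the list itself. A minimal sketch, assuming a hypothetical helper named `build_grid_prompt` (it is not part of any API, it only assembles the prompt text):

```python
def build_grid_prompt(items, rows, cols, style="illustration"):
    """Build a grid prompt whose stated object count always matches the list."""
    expected = rows * cols
    if len(items) != expected:
        raise ValueError(
            f"a {rows}x{cols} grid needs {expected} items, got {len(items)}"
        )
    header = (
        f"A square image containing a {rows} row by {cols} column grid "
        f"containing {expected} objects on a white background. "
        "Go from left to right, top to bottom. "
        f"The objects should be in {style} style. Here's the list:"
    )
    # Number the items so the left-to-right, top-to-bottom order is explicit.
    numbered = [f"{i}. {item}" for i, item in enumerate(items, start=1)]
    return "\n".join([header, "", *numbered])


favourites = [
    "white dog", "pink heart", "cute gaming mouse and keyboard",
    "football", "cinnamoroll", "cherry blossom",
    "one piece anime", "hot matcha", "tempura",
]
prompt = build_grid_prompt(favourites, rows=3, cols=3)
print(prompt)
```

Passing a list that does not fill the grid raises a `ValueError` before anything is sent to the model, which is cheaper than discovering the mismatch in the generated image.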

✨ Transparent Layers

I tried to generate Valorant stickers by providing these images:

VCT Stickers

and the following prompt:

Turn this into cute sticker, transparent background.

The result:

VCT Stickers

They’re so cute, omg! Most of them came out great on the first try, but a few needed a couple of runs to match my preferences.
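If you download the stickers, a quick way to confirm that a file can carry transparency at all is to inspect the PNG header. The sketch below uses only the Python standard library; the function names are my own, and it only checks for an alpha channel or a `tRNS` chunk, not whether the background pixels are actually transparent. The `tiny_png` helper exists purely to make the demo self-contained.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_supports_transparency(data: bytes) -> bool:
    """Return True if the PNG has an alpha channel or a tRNS chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR":
            # IHDR layout: width(4) height(4) bit depth(1) colour type(1) ...
            if body[9] in (4, 6):  # 4 = grey + alpha, 6 = RGBA
                return True
        elif chunk_type == b"tRNS":
            return True
        elif chunk_type in (b"IDAT", b"IEND"):
            break  # tRNS must appear before the image data
        pos += 8 + length + 4  # chunk header + body + CRC
    return False


def _chunk(chunk_type: bytes, body: bytes) -> bytes:
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


def tiny_png(color_type: int) -> bytes:
    """Build a 1x1 PNG for the demo (color_type 2 = RGB, 6 = RGBA)."""
    channels = {2: 3, 6: 4}[color_type]
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, color_type, 0, 0, 0)
    raw = b"\x00" + bytes(channels)  # filter byte + one pixel of zeros
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b""))


print(png_supports_transparency(tiny_png(6)))
print(png_supports_transparency(tiny_png(2)))
```

For a real file, read it in binary mode and pass the bytes to `png_supports_transparency`. A `False` result would suggest the sticker was saved with a solid background and is worth regenerating.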

✨ Infographics

I tried to generate a human anatomy infographic with the following prompt:

Make a detailed visual infographic of human anatomy.

The result:

Infographic

Honestly, I don’t think it’s 100% accurate, but the art looks nice.

Then I went on to generate a PC parts infographic with the following prompt:

Make a detailed visual infographic of PC parts.

The result:

Infographic

I feel like this one is quite detailed and accurate.

I wanted more descriptive text, so I asked it to generate an infographic explaining why sleep is important:

Create an infographic explaining why sleep is important. Use cute art style.

Infographic

Lastly, I wanted to create a poster showing different types of flowers.

Create an educational poster of different types of flowers in a vibrant watercolor style.

The result:

Infographic

Overall, the infographics look great and definitely have the potential to be used in schools.

⚠️ Limitations

Here are some limitations that OpenAI lists in its announcement (my own notes are in parentheses):

Cropping: it may occasionally crop longer images, like posters, too tightly, especially near the bottom (something that I encountered as well)
Hallucinations: it can make up information, especially in low-context prompts (definitely still a problem!)
High binding problems: it may struggle to accurately render more than 10-20 distinct concepts at once, such as a full periodic table (I tried to generate the Standard Model elementary particles and it struggled with that as well)
Precise graphing: no description given (I didn’t try to generate data plots, but I guess it can struggle to produce correct, consistent tick labels)
Multilingual text rendering: it struggles with rendering non-Latin languages, and the characters can be inaccurate or hallucinated, especially with more complexity (I’ve only tried generating English text, but I can see the difficulty)
Editing precision: requests to edit specific portions of a generated image, such as typos, are not always effective and may also alter other parts of the image in ways that were not requested or introduce more errors (YES, I felt the same - sometimes it’s so hard to edit the generated picture that I’d rather just regenerate it!)
Dense information with small text: it struggles when asked to render detailed information at a very small size (I did encounter this, but it’s understandable that the model struggles when there is more text to render)

Overall, most of my prompts were rather short and simple. If you’re particular about the details, you could test how well it follows instructions by giving it longer, more specific prompts.

📚 References

Introducing 4o Image Generation
