A Quick Comparison of Text-to-Image Models: Flux, Stable Diffusion 3, DALL·E 3, and Kling

August 9, 2024
text2image, ai

Last week, Black Forest Labs (founded by the original creators of Stable Diffusion) released Flux, a new state-of-the-art text-to-image model that is open-source and offers capabilities comparable to Midjourney. Curious how its quality stacks up against other models, I ran a quick one-shot generation test on the following models (prices are estimated from official pricing pages and replicate.com):

| Model Name | Company | Type | Price per Image |
| --- | --- | --- | --- |
| Flux Schnell | Black Forest Labs | Open Source | $0.003 / image |
| Flux Pro | Black Forest Labs | Closed Source (API only) | $0.055 / image |
| Stable Diffusion 3 | Stability AI | Open Source | $0.035 / image |
| DALL·E 3 | OpenAI | Closed Source | $0.040 / image |
| Kling | Kuaishou | Closed Source | $0.002 / image |
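To put the per-image prices in perspective, here is a minimal sketch (plain Python, prices taken from the table above) that compares what a batch of generations would cost on each model:

```python
# Per-image prices in USD, as estimated in the comparison table above.
PRICE_PER_IMAGE = {
    "Flux Schnell": 0.003,
    "Flux Pro": 0.055,
    "Stable Diffusion 3": 0.035,
    "DALL·E 3": 0.040,
    "Kling": 0.002,
}

def batch_cost(model: str, n_images: int) -> float:
    """Total cost in USD for generating n_images with the given model."""
    return round(PRICE_PER_IMAGE[model] * n_images, 4)

# For example, 100 test generations per model:
for model in PRICE_PER_IMAGE:
    print(f"{model}: ${batch_cost(model, 100):.2f}")
```

At 100 images, the gap is already noticeable: Flux Schnell costs $0.30 while Flux Pro costs $5.50, so the cheaper models are attractive for bulk experimentation.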

I used the following prompt to test general image generation in an artist's style:

a surreal landscape with floating islands and a giant glowing moon in the style of Hayao Miyazaki

and another prompt to test text rendering inside an image:

gateau cake spelling out the words "Takin.AI", tasty, food photography, dynamic shot

The testing results are listed below.

  • For the first prompt, I prefer the Flux Schnell and Kling results; these also happen to be the two most affordable models.
  • For the second prompt, I like the results from Flux Schnell and DALL·E 3 the most.

You can use text-to-image models such as Flux, SD3, DALL·E 3, and ControlNet with a single account on Takin.ai. Start with a free account to try the examples in this post.

Flux Schnell (fastest: only took 1.3 seconds):

Flux Pro (took about 8.1 seconds):

DALL·E 3:

SD 3:

Kling:

P.S. The featured image for this post was generated using the HiddenArt tool from Takin.ai.