The Fastest Diffuser in the West
Aug 24, 2023

Imagine this: You’re in the final round of a high-stakes international design contest. Hours left to submit, you realize that a lifelike image could elevate your project to the winning spot. But there’s a catch — you can’t find the best image. Creating impeccable quality with AIGC (AI generated content) takes multiple rounds of trial and error. And each step is not only expensive, it’s painstakingly slow. As the seconds tick down, you’re left wondering, “can I get it faster?”

This is what HippoML and Lepton AI set out to do together: to be the fastest diffuser in the west. On current A100 GPUs, each high-resolution image takes only 3 seconds to generate, not counting network overhead. Check out SDXL Playground. Whether you’re experimenting hands-on with our interactive UI or tapping into the API for seamless integration, the future of swift and sublime imaging is just a click away.


Interested in what’s under the hood? Read on to discover the magic behind the scenes.

What is SDXL?

SDXL, short for Stable Diffusion XL, is the latest open source image generation model developed by Stability AI. It focuses on delivering photorealistic outputs with intricate details and sophisticated compositions. SDXL brings image generation another large leap forward, making photorealistic, high-resolution, high-quality images much more accessible than before. It offers improved face generation, legible text within images, and the power to craft aesthetically pleasing art from succinct prompts.

[Figure: sample images, Stable Diffusion XL vs. Stable Diffusion 1.5]


New advances bring new challenges

You might think “OK, SDXL is higher quality than the old SD, but slower. I can settle for a lower resolution so it is still as fast.” Right?

Alas, while SDXL clearly produces better image quality than SD, its advantage really shines at higher resolutions. If you generate a lower-resolution image as a “compromise”, SDXL still looks cool, but you are missing out on its superiority. You really don’t want to compromise on quality.

Simply Put: the Numbers

In the rapidly evolving landscape of AIGC, a cutting-edge model is only as useful as its inference performance allows. Recognizing this, HippoML and Lepton AI set out together to speed up SDXL inference. Our joint effort delivers a 46% speedup over stock PyTorch (12.6 steps/s vs. 8.6 steps/s), so users can generate high-resolution images with SDXL without sacrificing generation quality, and iterate and explore faster.

With HippoML and Lepton AI optimizations, generating a high quality image at 1024 x 1024 resolution only takes 3 seconds (excluding network overhead, which varies depending on your network conditions).
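To make the numbers above concrete, here is a small sketch of the arithmetic behind the speedup figure, plus a rough estimate of the time spent in the denoising loop. The throughput values come from this post. The step count of 30 is purely an assumption for illustration, since the number of steps per image is not stated here, and the estimate ignores everything outside the denoising loop (text encoding, image decoding, and so on).

```python
# Throughput figures quoted in this post (denoising steps per second).
OPTIMIZED_STEPS_PER_S = 12.6  # HippoML + Lepton AI
BASELINE_STEPS_PER_S = 8.6    # stock PyTorch

# Assumption for illustration only: the post does not state the step count.
ASSUMED_STEPS = 30


def relative_speedup(optimized: float, baseline: float) -> float:
    """Fractional throughput gain of `optimized` over `baseline`."""
    return optimized / baseline - 1.0


def denoising_seconds(num_steps: int, steps_per_s: float) -> float:
    """Time spent in the denoising loop at a given throughput."""
    return num_steps / steps_per_s


speedup = relative_speedup(OPTIMIZED_STEPS_PER_S, BASELINE_STEPS_PER_S)
optimized_time = denoising_seconds(ASSUMED_STEPS, OPTIMIZED_STEPS_PER_S)
baseline_time = denoising_seconds(ASSUMED_STEPS, BASELINE_STEPS_PER_S)

print(f"speedup over stock PyTorch: {speedup:.1%}")
print(f"{ASSUMED_STEPS} steps, optimized: {optimized_time:.2f} s")
print(f"{ASSUMED_STEPS} steps, baseline:  {baseline_time:.2f} s")
```

Under that assumed step count, the optimized engine saves roughly a second per image, which adds up quickly when finding the right image takes many rounds of trial and error.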

HippoML: Unrivaled Performance, Simplified.

HippoML delivers a top-tier AI GPU inference solution, seamlessly blending development efficiency with optimal hardware utilization. HippoML’s platform converts AI models into highly efficient, portable inference engines with minimal dependencies, streamlining the process of deploying cutting-edge models into production. With our combined expertise, we’ve optimized the Stable Diffusion XL model to achieve unparalleled speed. This makes real-time applications, once deemed prohibitive in terms of latency and cost, entirely feasible.

Using HippoML’s optimized engine, SDXL inference on A100 rivals that of H100 with torch.compile, broadening accessibility and cutting costs.

Lepton AI: Build AI The Simple Way

With a resolute mission to offer an AI platform that empowers users to harness the capabilities of AI effortlessly, Lepton AI strives to make AI accessible “The Simple Way.” This entails enabling users to develop AI applications with utmost simplicity, efficiency, and scalability. The core of Lepton AI’s approach lies in its developer-centric platform, which facilitates the streamlined creation and deployment of AI applications at any scale. By embracing Lepton AI, users can seamlessly run AI applications with optimal efficiency, all in a matter of seconds.

In addition to the interactive UI provided above, developers can also easily call our fully managed API endpoint with the leptonai Python client:

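# pip install leptonai
from leptonai.client import Client

# The endpoint URL is blank here; pass the URL of the SDXL deployment.
c = Client("")
# Assumption: the endpoint is exposed as `run`; adjust to the deployment's path.
img_content = c.run(prompt="a cat launching rocket", seed=1234)
# The response is the raw image bytes, so write the file in binary mode.
with open("cat.png", "wb") as fid:
    fid.write(img_content)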

Or use any existing HTTPS request tool, such as curl, to generate realistic images:

time curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "a cat launching rocket", "seed": 1234 }' \
  2>/dev/null -o cat.png

real 0m3.268s
user 0m0.011s
sys 0m0.005s
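The same JSON POST can also be sent from Python using only the standard library. The sketch below is a minimal illustration, not the official client: `ENDPOINT_URL` is a hypothetical placeholder (the real endpoint URL is not given in the text above), `build_request` and `generate_image` are names made up for this example, and the response body is assumed to be raw image bytes, as the curl example's `-o cat.png` suggests.

```python
import json
import time
import urllib.request

# Hypothetical placeholder: substitute the URL of your SDXL endpoint.
ENDPOINT_URL = "https://example.invalid/run"


def build_request(prompt: str, seed: int, url: str = ENDPOINT_URL) -> urllib.request.Request:
    """Build the same JSON POST request that the curl example sends."""
    payload = json.dumps({"prompt": prompt, "seed": seed}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_image(prompt: str, seed: int, out_path: str, url: str = ENDPOINT_URL) -> float:
    """Send the request, save the response body, and return elapsed seconds.

    Assumes the response body is the image itself.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(build_request(prompt, seed, url)) as resp:
        image_bytes = resp.read()
    with open(out_path, "wb") as fid:
        fid.write(image_bytes)
    return time.perf_counter() - start


# Building the request needs no network access, so it is safe to inspect.
req = build_request("a cat launching rocket", 1234)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Calling `generate_image("a cat launching rocket", 1234, "cat.png", url=...)` with a real endpoint URL would then write the image to disk and report the round-trip time, comparable to the `real` value that `time` prints above.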


In the realm of AI, partnerships that combine expertise and vision can lead to groundbreaking outcomes. Our collaboration with HippoML stands as a testament to what’s possible when two innovative forces join hands. Together, we’ve harnessed the power of Stable Diffusion XL to create a service that not only propels AI model inference to new heights of speed but also redefines the standards of photorealistic image generation. As we look ahead, we’re excited to continue pushing the boundaries of AI performance and exploring new horizons of possibility.

Stay tuned for more updates on this dynamic collaboration and the revolutionary capabilities it brings to the world of artificial intelligence.
