Build AI the Easy Way
Oct 3, 2023

The need for AI has never been stronger. With ChatGPT, Stable Diffusion, LLaMA, and many awesome open source libraries and commercial products, it seems that one should be able to build an AI-enabled application in seconds.

But this growth in demand has come with a matching growth in software stack complexity. Ten years ago, “AI” was little more than experimentation code shared in notebooks, and researchers’ only complaint was the relatively small problem of installing CUDA and AI frameworks. Today, building an end-to-end AI application seems to require understanding the whole stack of modern computer engineering, from GPUs to Python to Docker to every piece of cloud computing jargon. It’s too complex.

Ideally, it should be as simple as opening Python and typing the following:

model = Remote("llama2")  # create a remote service running LLaMA
response = model.run("I love walking with my dog, and ")

That’s what we have been building for the last couple of months at Lepton AI. We are excited to announce our public beta today: a platform for AI engineers and application builders to easily create, adopt, and grow AI, with open source at its core.

AI Software should be Simple

AI is hard not because we don’t have tools and abstractions. We have them. In fact, TOO MANY of them.

If you are an algorithm developer, you’ll actually find Python quite easy to deal with. CUDA, Jupyter notebooks, no problem. But then you need to learn tools like Docker and Kubernetes, and plenty of cloud computing jargon, to convert experiment code and demos into shareable, production-ready services: the ones that your colleagues or your target users can easily call.

On the other hand, if you are an application builder, it always seems impossible to “find the right API”. You can find great models on sites like GitHub and Hugging Face, but how do you turn these models into an easy API for, say, node.js, without having to set up a full toolchain?

It IS possible. It’s just that many tools and platforms assume you are an “all-stack engineer”, and that you need to deal with, like the name suggests, the whole stack of modern computer engineering.

A prominent example is Docker. Deploying AI today usually requires one to “build a Docker image”. Building a web app on Vercel doesn’t require me to build a Docker image. Using a database in Airtable doesn’t require me to build a Docker image. In AI, I have a bunch of Python code and a few Python dependencies. Sure, Docker is great, but there should be much easier ways, right?

The AI-native build toolchain

We’ve created a pythonic toolchain specifically for AI use cases. Think of it as Function-as-a-Service or npm for AI: it maximizes user friendliness without sacrificing quality, and makes starting an AI model as easy as one or two lines of code. For example, running a GPT-2 model in Python in the cloud is as easy as:

from leptonai import Remote
model = Remote("hf:gpt2")

And you can call it like:

> print(model.run("I love walking with my dog, and "))
I love walking with my dog, and who doesn't? He added: "I think it was love that made my life what it is. It was love on the step but still got deep."

We wholeheartedly support the open nature of AI software. If you have a custom model, like the Mistral AI open source model, would like to use a great open source library like vLLM to run it, and want to create a long-running service for others to call, it’s as simple as a one-line command:

lep photon run -n mymodel -m vllm:mistralai/Mistral-7B-Instruct-v0.1 --resource-shape gpu.a10

(If you have a GPU and want to run locally, replace --resource-shape gpu.a10 with --local.)

Behind the scenes, metrics, monitoring, and autoscaling are all automatically set up to be production-ready. And for popular models like LLaMA 2, you can call them directly with our API, which is fully compatible with OpenAI’s specs and works with your favorite language, like node.js:

import OpenAI from 'openai';

const lepton = new OpenAI({
  baseURL: '<>', // your Lepton endpoint URL
  apiKey: process.env.LEPTON_API_KEY, // your Lepton API token
});

async function main() {
  const completion = await lepton.chat.completions.create({
    messages: [{ role: 'user', content: 'tell me a good story.' }],
    model: 'llama2-7b',
  });
  console.log(completion.choices[0].message.content);
}

main();
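Because the endpoint follows OpenAI’s chat completions spec, any OpenAI-compatible client works. As a sketch of what actually goes over the wire, here is the request body such a call sends (field names come from OpenAI’s spec; the model name follows the example above):

```python
import json

# Minimal chat completions request body, per OpenAI's spec:
# "model" and "messages" are the two required fields.
payload = {
    "model": "llama2-7b",
    "messages": [
        {"role": "user", "content": "tell me a good story."},
    ],
}

# This JSON string is what an OpenAI-compatible endpoint receives
# as the POST body of the chat completions request.
print(json.dumps(payload, indent=2))
```

Any HTTP client that can POST this JSON (with an Authorization header carrying your token) can talk to the service; the official openai packages are just convenient wrappers around it.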

If you are interested, here is a bit more about us.

Our story

We started as open source engineers. Yangqing built Caffe at Berkeley and went on to develop more AI frameworks, including PyTorch 1.0. JJ wrote the first lines of code of what later became known as ONNX, and led teams building modern tools like the vector database Proxima. Xiang created etcd, which is the backbone of many of the world’s largest cloud-native Kubernetes clusters.

And then we realized one thing: users want simplicity. If we can build tools that expose fewer of the infrastructure knobs and less of the jargon, that’s going to be great. If we take care of all the DevOps details so users only need to worry about the application logic, that’s going to be great.

It’s not that we are abandoning infra best practices; collectively, we’ve managed maybe millions of machines in our past jobs. We want to build great abstractions so you don’t have to experience the pain we went through over the years (trust me, there was a lot).

And we believe we are the best people to make building awesome AI applications as simple as possible.

Follow us on Twitter and LinkedIn.
