On-demand price is $0.07 / million tokens.
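As a rough illustration of the pricing arithmetic, the sketch below estimates the cost of a given token count at the quoted on-demand rate. It assumes input and output tokens are billed at the same rate, which this page does not state; the function name `estimate_cost_usd` is hypothetical.

```python
# On-demand rate quoted above, in USD per one million tokens.
PRICE_PER_MILLION_TOKENS_USD = 0.07


def estimate_cost_usd(total_tokens: int) -> float:
    """Estimate cost in USD for total_tokens tokens.

    Assumption: prompt and completion tokens are billed at the same rate.
    """
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


if __name__ == "__main__":
    # Print the estimate for a 2.5 million token workload.
    print(f"${estimate_cost_usd(2_500_000):.4f}")
```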

Then use it in the following code:

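import os
import openai

# The endpoint is OpenAI-compatible, so the official openai client works as-is.
# The API token is read from the LEPTON_API_TOKEN environment variable.
client = openai.OpenAI(
    base_url="https://llama3-8b.lepton.run/api/v1/",
    api_key=os.environ.get('LEPTON_API_TOKEN')
)

# Request a streamed chat completion of at most 128 tokens.
completion = client.chat.completions.create(
    model="llama3-8b",
    messages=[
        {"role": "user", "content": "say hello"},
    ],
    max_tokens=128,
    stream=True,
)

# Print each piece of text as it arrives.
for chunk in completion:
    # Some chunks carry no choices; skip them.
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")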
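If the full reply is needed as a single string rather than printed piece by piece, the same loop can be wrapped in a small helper. This is a minimal sketch; `collect_stream_text` is a hypothetical name, and it only assumes the chunk shape used in the loop above (`choices[0].delta.content`).

```python
from typing import Any, Iterable


def collect_stream_text(chunks: Iterable[Any]) -> str:
    """Join the text pieces of a streamed chat completion into one string.

    Chunks without choices, or with empty delta content, are skipped,
    mirroring the printing loop above.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)
```

With the client above, `text = collect_stream_text(completion)` would consume the stream and return the whole reply.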

The rate limit for Serverless Endpoints is 10 requests per minute across all models under the Basic Plan. If you need a higher rate limit with an SLA, please upgrade to the Standard Plan or use a dedicated deployment.
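One way to stay under this limit is to throttle requests on the client side. The sketch below is a simple sliding-window limiter, not part of any Lepton SDK; the class name `SlidingWindowLimiter` is hypothetical, and it assumes the server counts requests over a rolling 60-second window, which this page does not specify.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls in any rolling window of period seconds."""

    def __init__(self, max_calls=10, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._stamps = deque()   # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Window is full: wait until the oldest call leaves it.
            self._sleep(self.period - (now - self._stamps[0]))
```

Calling `limiter.acquire()` immediately before each `client.chat.completions.create(...)` call keeps a single process within 10 requests per minute. The limit applies across all models, so one limiter instance should be shared by every request the process makes.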

Lepton AI

© 2024