Then use it in the following code:

```python
import os

from leptonai.client import Client

# Read the API token from the environment instead of hard-coding it.
api_token = os.environ.get('LEPTON_API_TOKEN')
client = Client("https://sambachat.lepton.run", token=api_token)

result = client.query(
    prompt="What's the capital city of Japan? Answer in Japanese.",
    # Generation and sampling settings sent with the request.
    params={
        "do_sample": True,
        "max_tokens_to_generate": 512,
        "repetition_penalty": 1.0,
        "temperature": 0.7,
        "top_k": 50,
        "top_p": 0.95,
    },
    # Select the "japanese" expert for this request.
    select_expert="japanese",
)
print(result)
```

Under the Basic Plan, the Model APIs are rate-limited to 10 requests per minute across all models. For plan details, see the pricing page. If you need a higher rate limit, an SLA, or a dedicated deployment, please contact us.
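If you send many requests in a loop, you may want to throttle on the client side so you stay under the 10 requests per minute limit instead of running into it. The following is a minimal sketch of a sliding-window limiter, assuming plain Python with no extra dependencies. `SlidingWindowLimiter` and `acquire` are hypothetical names for illustration, not part of `leptonai`, and the limiter only counts calls made from a single process.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `period`-second window.

    Hypothetical helper, not part of leptonai. It only tracks calls made
    through this object; the server-side limit still applies.
    """

    def __init__(self, max_calls=10, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock  # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls inside the window

    def acquire(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the rolling window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest call in the window expires.
            self._sleep(self.period - (now - self._calls[0]))
```

To use it, create one `SlidingWindowLimiter(max_calls=10, period=60.0)` and call its `acquire()` method immediately before each `client.query(...)`. A sliding window is used rather than a fixed per-minute counter so that a burst at the end of one minute cannot be followed by another full burst at the start of the next.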