Changelog

2024-09-12

Voice Mode Improvements

  • Users can now replay the generated audio when using voice mode in the playground.
  • Users can now copy sample code for voice mode, available in Python, JavaScript, and cURL.
  • All LLM APIs now have voice mode enabled, including models such as Qwen72B, Llama 3.1 8B, Llama 3.1 70B, and Llama 3.1 405B. This brings voice interaction to a range of advanced language models, improving user experience and accessibility; see the sketch below.
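
For illustration only, a voice-style round trip through an OpenAI-compatible Python client could resemble the sketch below. The endpoint URL, model names, and voice are assumptions for illustration, not the exact sample code shipped in the playground.

```python
# Hypothetical sketch: generate a text reply, then synthesize it to speech
# through an OpenAI-compatible endpoint. The base_url, model names, and
# voice below are placeholders, not confirmed Lepton values.
from openai import OpenAI

client = OpenAI(
    base_url="https://llama3-1-8b.lepton.run/api/v1/",  # placeholder endpoint
    api_key="YOUR_LEPTON_TOKEN",
)

reply = client.chat.completions.create(
    model="llama3.1-8b",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
text = reply.choices[0].message.content

speech = client.audio.speech.create(
    model="tts-1",  # placeholder TTS model name
    voice="alloy",  # placeholder voice
    input=text,
)
speech.write_to_file("reply.mp3")  # save the audio for replay
```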

Deployment Version History

  • Users can now view the history of deployment versions and see what changes were made, when, and by whom. This feature helps users better understand the status and changes of the deployment over time and aids in debugging if needed.

Deployment Authentication Token Generation

  • Users can now generate a random token for deployment authentication directly while creating a new deployment.

Direct API Testing Under Deployment

  • Users can now try out the APIs directly under their deployments with improved API documentation and user interface. This enhancement allows for easier and more efficient API testing and integration within the deployment environment.

Deployment Status Enhancement

  • Users can now view the number of pending replicas in the deployment overview page. This feature provides a clearer and quicker understanding of the current state of the deployment, helping users to easily identify any pending actions or issues.
  • Users can now view the stopped and scaling states in the deployment status indicator. This improvement provides a clearer understanding of the deployment status, helping users easily monitor and manage their deployments.

Enhanced Replica Timeline with Event Visibility

  • Users can now observe events such as restarts and crashes directly in the replica timeline to understand the status and availability of their services.

Job Submission and Running History

  • Users can now view historical jobs by applying the 'Archived' flag in the job list filter.

2024-08-29

Self-managed Machines Monitoring

  • Users can now improve their GPUs' efficiency and reliability by adding their own machines to Lepton under the Machines > Self Managed page. Lepton helps monitor the status of these machines using the GPUd tool, which automatically identifies, diagnoses, and repairs GPU-related issues, minimizing downtime and maintaining high efficiency.

Viewing Termination Reasons for Replicas

  • Users can now see the termination reasons for replicas within deployment and job replica lists to understand why a replica was terminated and take corrective action as needed. Hovering over the termination text displays a tooltip with the reason for termination.

Resource Shape Display in Pod Summary

  • When creating a pod, users can now view the resource shape associated with it under the pod summary, making the pod's allocated resources clear before creation.

Create Dedicated Inference Endpoints from the Inference Page

  • Users can now create a deployment directly from the inference page with a single click. When viewing detailed model information, the new "Create Dedicated Deployment" button sets up a dedicated deployment for the chosen model.

OpenAI-compatible Whisper API now available

  • We have introduced the Whisper model to the Built with Lepton page. Users can now explore and experiment with the transcription model directly via the Built with Lepton interface. Additionally, the model is accessible through an OpenAI-compatible API, providing seamless integration for developers. Visit the Built with Lepton page to start using it.
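
As a hedged illustration, transcribing audio through the OpenAI-compatible API might look like the Python sketch below; the endpoint URL and model identifier are placeholders rather than confirmed values.

```python
# Sketch: speech-to-text via Lepton's OpenAI-compatible Whisper API.
# The base_url and model name are placeholders for illustration.
from openai import OpenAI

client = OpenAI(
    base_url="https://whisper.lepton.run/api/v1/",  # placeholder endpoint
    api_key="YOUR_LEPTON_TOKEN",
)

with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",  # placeholder model identifier
        file=audio_file,
    )

print(transcript.text)
```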

File System Usage Display

  • Users can now view their file system usage under the Storage tab. This feature provides a clear understanding of how much data has been saved in the file system.

2024-08-15

Workspace-level log collection configuration

  • Users can now configure log collection settings at the workspace level under Settings > General > Settings. This feature allows setting a default log collection preference for the entire workspace, eliminating the need for individual configuration for each job or deployment. This enhancement promotes a more streamlined workflow and consistent log settings across all tasks within the workspace.

Deployment access with workspace user token

  • Users can now use their user token to access the API of deployments they created, in addition to the workspace token. Note that by default, user B's user token will not have access to the API of a deployment user A created.
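
For illustration, calling a deployment's API with a user token might look like the sketch below; the deployment URL and request payload are placeholders, and a standard Bearer authorization header is assumed.

```python
# Minimal sketch: call a deployment's API with a user token instead of the
# workspace token. URL, payload, and header format are assumptions.
import requests

DEPLOYMENT_URL = "https://my-deployment.ws-id.lepton.run/run"  # placeholder
USER_TOKEN = "YOUR_USER_TOKEN"  # found on the settings page

resp = requests.post(
    DEPLOYMENT_URL,
    headers={"Authorization": f"Bearer {USER_TOKEN}"},
    json={"inputs": "hello"},  # payload depends on the deployment
)
resp.raise_for_status()
print(resp.json())
```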

Replica version matching with deployments

  • Replica versions on the web UI now match the versions of their deployments. Previously, deployments and replicas followed separate versioning schemes. This allows users to more easily distinguish old and new replicas while updating a deployment.

Added inference tab to dashboard

  • Users can now access the Inference tab in the dashboard to choose serverless APIs of state-of-the-art (SOTA) models. This update allows users to select models, input data, and receive inference results. Additionally, users can view available APIs and track their inference usage history, enhancing convenience and user experience.

Deployment update confirmation

  • Users will now receive a confirmation if changes to the deployment configuration will result in a rolling restart of replicas. This feature allows users to review, confirm, or cancel the update beforehand, helping to prevent unintended service interruptions and ensuring greater control over deployment processes.

Moved billing under settings

  • The Billing section has been moved under Settings along with the other workspace settings.

Edit Initial Delay in Deployment

  • Users can now edit the initial delay of a deployment after it has been created. This feature allows for adjustments to the initial delay to better suit user needs and deployment requirements.

2024-08-01

Redesigned dashboard after login for quick access to modules

  • Added an explore page for quick access to compute pages: deployments, jobs, and pods. Other modules, including Photon, Storage, Network, Observability, Billing, and Settings, are grouped under the Others section for easy navigation.

Enhanced traffic management: introducing traffic splitting and shadowing for optimized deployment

  • Network configurations now support distributing traffic across multiple deployments based on assigned weights. Users can assign any positive integer weight to each deployment, and traffic is split in proportion to these weights; for example, two deployments weighted 1 and 3 receive 25% and 75% of requests, respectively (see the sketch after this list). Within each deployment, users can set the load balancing policy to Least Request to route traffic to the replica with the fewest active requests. If left empty, the default Round Robin policy is used.

  • Introduced traffic shadowing in network configurations, enabling users to duplicate incoming traffic to a deployment for testing or monitoring purposes without impacting the primary traffic flow.
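
To make the weight semantics concrete, the Python sketch below simulates proportional splitting. It models the behavior described above with hypothetical deployment names and weights; it is not Lepton's internal implementation.

```python
# Illustration of weight-proportional traffic splitting (not Lepton's
# internal implementation). Each deployment receives traffic in
# proportion to its assigned positive integer weight.
import random
from collections import Counter

weights = {"deployment-a": 1, "deployment-b": 3}  # hypothetical weights

def route_request() -> str:
    """Pick a deployment for one incoming request, proportional to weight."""
    names, ws = zip(*weights.items())
    return random.choices(names, weights=ws, k=1)[0]

# With weights 1 and 3, deployment-a serves ~25% of requests and
# deployment-b ~75%.
counts = Counter(route_request() for _ in range(10_000))
print(counts)
```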

Customizable starting command for AI Pods

  • Users can now specify custom starting commands for AI Pods at initialization. This enhancement provides greater flexibility and control over the execution environment, enabling users to tailor their AI Pods to better fit specific workflows and use cases. The specified command requires the AI Pod's Docker image to have bash installed.

Real-time status indicators for rolling upgrades on deployment replica list page

  • Users can now track rolling upgrades on the Deployment Replica List page. Status indicators show the update progress of each running replica, providing real-time visibility into the process. This enhancement lets users efficiently monitor each replica's status and the overall deployment upgrade progress.

Display termination reasons in timeline

  • The timeline now shows the reason for termination when a deployment, job, or pod is terminated. This update provides users with critical context about why a resource was terminated, aiding in troubleshooting and corrective actions.

Configurable long-term log collection

  • Users can now configure long-term log collection to be enabled by default for pods, deployments, and jobs either at the workspace level or individually. If this field is left empty, the default settings from the workspace will be inherited and used. Note that restarting the workload will continue using the configuration set at its creation time. This feature simplifies log management by allowing broader or more granular control over log collection settings.

AI Pod supports multiple SSH keys

  • Users can now add multiple public SSH keys, separated by new lines, during the pod creation process. This improvement enhances security and collaboration by allowing multiple users or devices to access AI Pods easily.

Expanded model offerings: launch of Llama3.1-8B, Llama3.1-70B, and Llama3.1-405B in the playground

  • We have introduced the Llama3.1-8B, Llama3.1-70B, and Llama3.1-405B models to the Playground. Users can now explore and experiment with these large-scale language models directly via the Playground interface. Additionally, these models are accessible through an OpenAI-compatible API, providing seamless integration for developers; see the sketch below. Visit the Playground to start using these new models.
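
As a rough sketch, calling one of these models through the OpenAI-compatible API might look like the following; the endpoint URL and exact model identifier are placeholders.

```python
# Sketch: chat completion against a Llama 3.1 model via the
# OpenAI-compatible API. The base_url and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://llama3-1-405b.lepton.run/api/v1/",  # placeholder
    api_key="YOUR_LEPTON_TOKEN",
)

completion = client.chat.completions.create(
    model="llama3.1-405b",  # placeholder model name
    messages=[
        {"role": "user", "content": "Explain rolling upgrades in one sentence."},
    ],
)
print(completion.choices[0].message.content)
```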

2024-07-18

Redesigned navigation tabs to enhance user experience and streamline access to various features

  • Compute Tab: Now includes Photons, Deployments, Jobs, and Pods for better organization of compute-related resources.
  • Storage Tab: Groups File System, Object Storage, KV Store, and Queue under one tab to centralize storage management.
  • Network Tab: Networking Ingress is now categorized under the Network tab for improved access to networking configurations.
  • Observability Tab: Consolidates Logs, Monitoring, and Audit Logs to provide a unified observability interface.
  • Billing Tab: Billing functionalities are now accessible through a dedicated tab for easier financial management.
  • Settings Tab: Groups general information, Members, Tokens, Secrets, and Docker Registry settings for streamlined access to configuration options.

Support audit logs for workspace-level operations

  • Introduced support for audit logs for workspace-level operations. Users can now access and review detailed audit logs directly from the settings page. This feature enhances transparency and helps users track changes and activities within their workspace.

Support user-level auth tokens

  • User-level auth tokens can be used to access the workspace and perform operations on behalf of the user, such as creating deployments, jobs, and pods. User-level auth tokens can be found on the settings page.

Support for Role-Based Access Control (RBAC)

  • Lepton now supports role-based access control, allowing each user to have a role with specific, adjustable permissions.

Support toggling line and timestamp display in logs

  • Added the ability for users to toggle line and timestamp display when searching and viewing logs in the Observability tab. This enhancement improves log readability and allows users to customize their log viewing experience for more efficient troubleshooting and analysis.

Support context lookup in logs

  • Introduced context lookup: Users can now expand the context of selected lines in logs, viewing previous and subsequent lines for better clarity.

Support for launching Jupyter Lab during Pod creation

  • Users can now launch Jupyter Lab in Pods using preset images during the pod creation process.

Support for specifying the user's SSH key during Pod creation

  • Users can now specify their SSH key during Pod creation, enabling direct SSH access to the Pod.

Support for Traffic-Based Auto-Scaling Policy

  • Users can now configure auto-scaling policies using Queries Per Minute (QPM) as the metric. This allows for dynamic scaling based on the actual traffic rate, ensuring optimal resource allocation and performance during varying load conditions.
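
As a purely hypothetical sketch of how QPM target tracking behaves, the snippet below applies the classic rule of scaling replica count to traffic divided by a per-replica target. The field names are invented for illustration and are not Lepton's actual policy schema.

```python
# Hypothetical QPM-based target tracking. Field names are invented for
# illustration; configure the real policy in the Lepton dashboard.
import math

policy = {
    "metric": "QPM",                # scale on queries per minute
    "target_qpm_per_replica": 120,  # desired load per replica
    "min_replicas": 1,
    "max_replicas": 8,
}

def desired_replicas(current_qpm: float) -> int:
    """Target tracking: replicas ~= traffic / target load per replica."""
    needed = math.ceil(current_qpm / policy["target_qpm_per_replica"])
    return max(policy["min_replicas"], min(policy["max_replicas"], needed))

print(desired_replicas(500))  # -> 5 replicas, ~100 QPM each
```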

Support for creating deployments with custom Docker images and commands

  • Users can now create deployments using a custom Docker image and specify their own commands. This allows for greater flexibility and customization in deployment configurations, catering to specific application requirements.

Support for Ingress Endpoints to route traffic to multiple deployments

  • Users can now create Ingress endpoints under the Networking tab to route traffic to multiple deployments, allowing specification of traffic distribution for each deployment separately.

Support for customizing the auto top-up amount in Billing

  • Users can now set a specific amount for automatic top-ups in their billing settings. This provides greater control and flexibility over billing preferences, ensuring accounts are funded according to individual needs and reducing the risk of interruptions due to insufficient funds.

2024-06-26

Allow Job to select node groups

  • Added support for node group selection in CLI Job submissions. Users can now specify the desired node group for job execution using the --node-group flag.

Login support for enterprise email address

  • Users can now sign up for Lepton AI with enterprise (non-Gmail) email addresses.

Private Docker Registry UX Improvements

  • Enhanced the user experience for creating Private Image Registry Auth. Pre-filled values are now available for Docker Hub, AWS ECR, Azure CR, GitHub CR, and GCR.

2024-06-05

Job Fault-tolerance Support

  • Added job fault-tolerance support. Users can now specify the maximum number of retries for each job, both at the worker and job levels, enhancing reliability and streamlining execution.

2024-05-22

Log Persistence Support

  • Added support for persisting job logs. Users can now access logs from the job details page even after job completion.
  • Logs will be available for 30 days post-completion for enterprise-tier users.