lep job

Manages Lepton Jobs.

Lepton Jobs are one-off tasks that run on one or more machines. For example, you can launch a shell script that does data processing as a job, or run a distributed ML training job across multiple connected machines. See the documentation for more details.

Usage

lep job [OPTIONS] COMMAND [ARGS]...

Options

  • --help : Show this message and exit.

Commands

  • create : Creates a job.
  • get : Gets the job with the given name.
  • list : Lists all jobs in the current workspace.
  • remove : Removes the job with the given name.

lep job create

Creates a job.

For advanced usage, see the Kubernetes Job documentation: https://kubernetes.io/docs/concepts/workloads/controllers/job/.

Usage

lep job create [OPTIONS]

Options

  • -n, --name TEXT : Job name [required]
  • -f, --file TEXT : If specified, load the job spec from the file. Any argument passed explicitly on the command line updates the spec loaded from the file.
  • -ng, --node-group TEXT : Node group for the job. If not set, use on-demand resources.
  • --resource-shape TEXT : Resource shape for the job. Available types are: 'cpu.small', 'cpu.medium', 'cpu.large', 'gpu.t4', 'gpu.a10', 'gpu.a10.6xlarge', 'gpu.a100-40gb', 'gpu.2xa100-40gb', 'gpu.4xa100-40gb', 'gpu.8xa100-40gb', 'gpu.a100-80gb', 'gpu.2xa100-80gb', 'gpu.4xa100-80gb', 'gpu.8xa100-80gb', 'gpu.h100-pcie', 'gpu.h100-sxm', 'gpu.2xh100-sxm', 'gpu.4xh100-sxm', 'gpu.8xh100-sxm'.
  • -w, --num-workers INTEGER : Number of workers to use for the job. For example, when you do a distributed training job of 4 replicas, use --num-workers 4.
  • --container-image TEXT : Container image for the job. If not set, defaults to leptonai.config.BASE_IMAGE.
  • --port TEXT : Ports to expose for the job, in the format portnumber[:protocol].
  • --command TEXT : Command string to run for the job.
  • --intra-job-communication BOOLEAN : Enable intra-job communication. If --num-workers is set, this is automatically enabled.
  • -e, --env TEXT : Environment variables to pass to the job, in the format NAME=VALUE.
  • -s, --secret TEXT : Secrets to pass to the job, in the format NAME=SECRET_NAME. If the secret name is the same as the environment variable name, you can omit NAME and pass just SECRET_NAME.
  • --mount TEXT : Persistent storage to be mounted to the job, in the format STORAGE_PATH:MOUNT_PATH.
  • --completions INTEGER : (advanced feature) completion policy for the job. This is superseded by --num-workers if the latter is set.
  • --parallelism INTEGER : (advanced feature) parallelism for the job. This is superseded by --num-workers if the latter is set.
  • --ttl-seconds-after-finished INTEGER : (advanced feature) limits the lifetime of a job that has finished execution (either Completed or Failed). If not set, it defaults to 72 hours. Ref: https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
  • --help : Show this message and exit.
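
Example

The options above can be combined in a single call. The following is a minimal sketch, not an official recipe: the job name, container image, training command, environment variable, secret name, and storage paths are all hypothetical placeholders, and the `run` helper and `LEP_DRY_RUN` variable are illustrative additions rather than part of the `lep` CLI. By default the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch of a `lep job create` call for a 2-worker job.
# All values (job name, image, script, env var, secret, storage paths)
# are hypothetical placeholders, not real resources.
set -eu

# Print the command by default; set LEP_DRY_RUN=0 to actually run it.
# (This helper and variable are illustrative, not part of the lep CLI.)
run() {
  if [ "${LEP_DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_example_job() {
  # Keep the finished job for 2 hours instead of the 72-hour default.
  ttl=$((2 * 60 * 60))
  run lep job create \
    --name my-training-job \
    --resource-shape gpu.a10 \
    --num-workers 2 \
    --container-image example.com/team/train:latest \
    --command "python train.py" \
    --env EPOCHS=3 \
    --secret HF_TOKEN \
    --mount /datasets:/mnt/datasets \
    --ttl-seconds-after-finished "$ttl"
}

create_example_job
```

Because --num-workers is set here, --completions and --parallelism are left out: per the option descriptions, they would be superseded anyway. The dry-run output shows arguments without shell quoting, so treat it as a preview rather than a line to paste back into a terminal.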

lep job list

Lists all jobs in the current workspace.

Usage

lep job list [OPTIONS]

Options

  • --help : Show this message and exit.

lep job get

Gets the job with the given name.

Usage

lep job get [OPTIONS]

Options

  • -n, --name TEXT : Job name [required]
  • --help : Show this message and exit.

lep job remove

Removes the job with the given name.

Usage

lep job remove [OPTIONS]

Options

  • -n, --name TEXT : Job name
  • --help : Show this message and exit.
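
Example

The list, get, and remove commands are typically used together once a job exists. The sketch below strings them into one sequence using only the options documented above; the job name is a hypothetical placeholder, and the `run` helper and `LEP_DRY_RUN` variable are illustrative additions, not part of the `lep` CLI. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: inspect and then clean up a job by name.
# The job name passed at the bottom is a hypothetical placeholder.
set -eu

# Print the command by default; set LEP_DRY_RUN=0 to actually run it.
# (This helper and variable are illustrative, not part of the lep CLI.)
run() {
  if [ "${LEP_DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

job_lifecycle() {
  name="$1"
  run lep job list                    # all jobs in the current workspace
  run lep job get --name "$name"      # details for one job
  run lep job remove --name "$name"   # delete the job when it is no longer needed
}

job_lifecycle my-training-job
```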