lep job
Manages Lepton Jobs.
Lepton Jobs are for one-off tasks that run on one or more machines. For example, you can launch a shell script that performs a batch of data processing as a job, or run distributed ML training across multiple connected machines. See the documentation for more details.
Usage
lep job [OPTIONS] COMMAND [ARGS]...
Options
--help
: Show this message and exit.
Commands
create
: Creates a job.
get
: Gets the job with the given name.
list
: Lists all jobs in the current workspace.
remove
: Removes the job with the given name.
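Scripts that drive these subcommands (for example, from a CI pipeline) mostly need to assemble the right argument list. The following is a minimal Python sketch, assuming the lep executable is on PATH; lep_job_argv is a hypothetical helper, not part of the leptonai package, and it uses only the -n/--name option documented on this page.

```python
import shlex
from typing import List, Optional

# Subcommands listed under "Commands" for `lep job`.
SUBCOMMANDS = {"create", "get", "list", "remove"}
# Subcommands whose documented options include -n/--name.
NAMED = {"create", "get", "remove"}


def lep_job_argv(subcommand: str, name: Optional[str] = None) -> List[str]:
    """Build an argument vector for `lep job <subcommand>` (hypothetical helper)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    argv = ["lep", "job", subcommand]
    if name is not None:
        if subcommand not in NAMED:
            raise ValueError(f"`lep job {subcommand}` takes no --name option")
        argv += ["--name", name]
    elif subcommand in {"create", "get"}:
        # --name is marked [required] for create and get;
        # remove lists it without [required].
        raise ValueError(f"`lep job {subcommand}` requires --name")
    return argv


if __name__ == "__main__":
    # Print shell-ready versions of a few commands ("my-job" is a placeholder).
    for sub, job_name in [("list", None), ("get", "my-job"), ("remove", "my-job")]:
        print(shlex.join(lep_job_argv(sub, job_name)))
```

The resulting list can be passed to subprocess.run(argv, check=True) so that a non-zero exit status raises an exception; the sketch itself never invokes the CLI.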
lep job create
Creates a job.
For advanced usage, see https://kubernetes.io/docs/concepts/workloads/controllers/job/.
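Several advanced options of lep job create (--completions, --parallelism, --ttl-seconds-after-finished) share their names with fields of the Kubernetes Job spec. The Python sketch below illustrates the precedence rules documented for those options: --num-workers supersedes --completions and --parallelism and turns on intra-job communication, and the TTL falls back to 72 hours (72 * 3600 = 259200 seconds). Two points are assumptions rather than documented behavior: that --num-workers N sets both completions and parallelism to N, and that the values map one-to-one onto the Kubernetes fields completions, parallelism, and ttlSecondsAfterFinished. resolve_settings, to_k8s_job_spec, and ResolvedJobSettings are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Documented default for --ttl-seconds-after-finished: 72 hours.
DEFAULT_TTL_SECONDS = 72 * 60 * 60  # 259200 seconds


@dataclass
class ResolvedJobSettings:
    completions: Optional[int]
    parallelism: Optional[int]
    ttl_seconds_after_finished: int
    intra_job_communication: bool


def resolve_settings(
    num_workers: Optional[int] = None,
    completions: Optional[int] = None,
    parallelism: Optional[int] = None,
    ttl_seconds_after_finished: Optional[int] = None,
    intra_job_communication: bool = False,
) -> ResolvedJobSettings:
    """Apply the documented precedence rules (hypothetical helper)."""
    if num_workers is not None:
        # Documented: --num-workers supersedes --completions and --parallelism.
        # Assumption: both are set to num_workers so that all workers run at once.
        completions = parallelism = num_workers
        # Documented: intra-job communication is enabled automatically.
        intra_job_communication = True
    if ttl_seconds_after_finished is None:
        ttl_seconds_after_finished = DEFAULT_TTL_SECONDS
    return ResolvedJobSettings(
        completions, parallelism, ttl_seconds_after_finished, intra_job_communication
    )


def to_k8s_job_spec(settings: ResolvedJobSettings) -> Dict[str, int]:
    """Assumed one-to-one mapping onto Kubernetes Job spec field names."""
    spec = {"ttlSecondsAfterFinished": settings.ttl_seconds_after_finished}
    if settings.completions is not None:
        spec["completions"] = settings.completions
    if settings.parallelism is not None:
        spec["parallelism"] = settings.parallelism
    return spec


if __name__ == "__main__":
    # --completions 10 is ignored because --num-workers 4 is also given.
    settings = resolve_settings(num_workers=4, completions=10)
    print(settings)
    print(to_k8s_job_spec(settings))
```

Keeping the resolution step separate from the projection onto Kubernetes field names makes it easy to swap the mapping out if the real CLI translates these flags differently.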
Usage
lep job create [OPTIONS]
Options
-n, --name TEXT
: Job name [required]
-f, --file TEXT
: If specified, load the job spec from the file. Any explicitly passed argument updates the corresponding field of the spec loaded from the file.
-ng, --node-group TEXT
: Node group for the job. If not set, use on-demand resources.
--resource-shape TEXT
: Resource shape for the job. Available types are: 'cpu.small', 'cpu.medium', 'cpu.large', 'gpu.t4', 'gpu.a10', 'gpu.a10.6xlarge', 'gpu.a100-40gb', 'gpu.2xa100-40gb', 'gpu.4xa100-40gb', 'gpu.8xa100-40gb', 'gpu.a100-80gb', 'gpu.2xa100-80gb', 'gpu.4xa100-80gb', 'gpu.8xa100-80gb', 'gpu.h100-pcie', 'gpu.h100-sxm', 'gpu.2xh100-sxm', 'gpu.4xh100-sxm', 'gpu.8xh100-sxm'.
-w, --num-workers INTEGER
: Number of workers to use for the job. For example, for a distributed training job with 4 replicas, use --num-workers 4.
--container-image TEXT
: Container image for the job. If not set, defaults to leptonai.config.BASE_IMAGE.
--port TEXT
: Ports to expose for the job, in the format portnumber[:protocol].
--command TEXT
: Command string to run for the job.
--intra-job-communication BOOLEAN
: Enable intra-job communication. If --num-workers is set, this is automatically enabled.
-e, --env TEXT
: Environment variables to pass to the job, in the format NAME=VALUE.
-s, --secret TEXT
: Secrets to pass to the job, in the format NAME=SECRET_NAME. If the secret name is the same as the environment variable name, you can omit it and simply pass SECRET_NAME.
--mount TEXT
: Persistent storage to be mounted to the job, in the format STORAGE_PATH:MOUNT_PATH.
--completions INTEGER
: (advanced feature) Completion policy for the job. This is superseded by --num-workers if the latter is set.
--parallelism INTEGER
: (advanced feature) Parallelism for the job. This is superseded by --num-workers if the latter is set.
--ttl-seconds-after-finished INTEGER
: (advanced feature) Limits the lifetime of a job that has finished execution (either Completed or Failed). If not set, defaults to 72 hours. Ref: https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
--help
: Show this message and exit.
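The value formats documented above (NAME=VALUE for --env, NAME=SECRET_NAME or the bare SECRET_NAME shorthand for --secret, STORAGE_PATH:MOUNT_PATH for --mount, portnumber[:protocol] for --port) are easy to get wrong when a command line is generated programmatically. Below is a minimal Python sketch that assembles a lep job create argument list, assuming that --env, --secret, --mount, and --port may each be repeated to pass several values. create_argv, format_secret, and format_port are hypothetical helpers, and the job name, script, secret, and paths in the example are placeholders.

```python
import shlex
from typing import Dict, List, Optional, Sequence, Tuple


def format_secret(env_name: str, secret_name: str) -> str:
    # Documented shorthand: when the env var name equals the secret name,
    # pass just SECRET_NAME; otherwise use NAME=SECRET_NAME.
    return secret_name if env_name == secret_name else f"{env_name}={secret_name}"


def format_port(port: int, protocol: Optional[str] = None) -> str:
    # Documented format: portnumber[:protocol]
    return str(port) if protocol is None else f"{port}:{protocol}"


def create_argv(
    name: str,
    command: Optional[str] = None,
    resource_shape: Optional[str] = None,
    num_workers: Optional[int] = None,
    env: Optional[Dict[str, str]] = None,
    secrets: Optional[Dict[str, str]] = None,
    mounts: Sequence[Tuple[str, str]] = (),
    ports: Sequence[Tuple[int, Optional[str]]] = (),
) -> List[str]:
    """Hypothetical helper: assemble `lep job create` arguments."""
    argv = ["lep", "job", "create", "--name", name]
    if resource_shape is not None:
        argv += ["--resource-shape", resource_shape]
    if num_workers is not None:
        argv += ["--num-workers", str(num_workers)]
    if command is not None:
        argv += ["--command", command]
    # Assumption: these options are repeatable; each value gets its own flag.
    for key, value in (env or {}).items():
        argv += ["--env", f"{key}={value}"]
    for env_name, secret_name in (secrets or {}).items():
        argv += ["--secret", format_secret(env_name, secret_name)]
    for storage_path, mount_path in mounts:
        argv += ["--mount", f"{storage_path}:{mount_path}"]
    for port, protocol in ports:
        argv += ["--port", format_port(port, protocol)]
    return argv


if __name__ == "__main__":
    argv = create_argv(
        name="my-training-job",           # placeholder job name
        command="python train.py",        # placeholder script
        resource_shape="gpu.a10",
        num_workers=4,                    # also enables intra-job communication
        env={"EPOCHS": "10"},
        secrets={"HF_TOKEN": "HF_TOKEN"},  # same name, so the shorthand is used
        mounts=[("/shared/data", "/mnt/data")],  # placeholder paths
    )
    print(shlex.join(argv))
```

Passing the list directly to subprocess.run avoids shell quoting problems with --command strings that contain spaces; shlex.join is used only to print a copy-pasteable version.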
lep job list
Lists all jobs in the current workspace.
Usage
lep job list [OPTIONS]
Options
--help
: Show this message and exit.
lep job get
Gets the job with the given name.
Usage
lep job get [OPTIONS]
Options
-n, --name TEXT
: Job name [required]
--help
: Show this message and exit.
lep job remove
Removes the job with the given name.
Usage
lep job remove [OPTIONS]
Options
-n, --name TEXT
: Job name
--help
: Show this message and exit.