Deployment

🚀 No GPU? No problem!

You don’t need to own a GPU to run TuFT. These guides show how to stand up a TuFT server on pay-as-you-go cloud providers — rent a GPU on demand and release it when you’re done. Train from your laptop (no local GPU) by pointing the Tinker SDK at the cloud server.

TuFT is a single, standard server (tuft launch). The deploy helpers in the deploy/ directory just run that exact server on rented compute and wire up storage, ports, and secrets for you — they don’t change anything about the product. Pick the backend that fits your workflow:

Modal

Serverless, scale-to-zero. Deploys a web endpoint that scales to zero when idle, billed per second. Suited to bursty or intermittent use.

Deploy on Modal

Lambda Cloud

A plain on-demand GPU VM, billed per minute until you terminate it. No orchestration layer; the instance runs until you stop it.

Deploy on Lambda Cloud

Which one should I pick?

| If you want… | Use | Description |
| --- | --- | --- |
| To scale to zero when idle | Modal | Serverless; scale-to-zero, per-second billing, managed proxy and volumes. |
| A single dedicated GPU instance | Lambda Cloud | On-demand, billed per minute until you terminate; no orchestration layer, no preemption. |
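
To make the billing difference concrete, here is a minimal Python sketch that compares the two models for one working day. The hourly rates and usage figures are hypothetical placeholders, not real Modal or Lambda Cloud prices, and the function names are illustrative only. The sketch also simplifies by treating any warm-but-idle container time on the serverless side as part of `active_seconds`.

```python
import math


def scale_to_zero_cost(active_seconds: float, rate_per_hour: float) -> float:
    """Per-second billing: pay only for the seconds the container is up."""
    return math.ceil(active_seconds) * rate_per_hour / 3600


def dedicated_vm_cost(wall_clock_seconds: float, rate_per_hour: float) -> float:
    """Per-minute billing from launch until termination, idle time included."""
    return math.ceil(wall_clock_seconds / 60) * rate_per_hour / 60


# Hypothetical day: the VM stays up for 8 hours, but the server is only
# needed for 45 minutes in total. Both rates are made-up placeholders.
serverless = scale_to_zero_cost(45 * 60, rate_per_hour=2.00)
dedicated = dedicated_vm_cost(8 * 3600, rate_per_hour=1.50)
print(f"scale-to-zero: ${serverless:.2f}, dedicated VM: ${dedicated:.2f}")
# prints: scale-to-zero: $1.50, dedicated VM: $12.00
```

Under these assumed numbers the scale-to-zero option is cheaper even at a higher hourly rate, because the dedicated VM is billed for its idle hours. With sustained use throughout the day, the comparison tilts the other way.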

Keeping the GPU busy

A single laptop-driven training run leaves the rented GPU under-utilized: the client tokenizes data, builds batches, and waits on HTTP round-trips between GPU bursts, so the GPU sits idle for much of the run. Because TuFT is multi-tenant, one deployed server can host several concurrent jobs or users on the same GPU. Give each job or user its own key under authorized_users, and raise max_loras so that multiple LoRA adapters can train at once. Sharing this way improves utilization and splits the cost. It matters most on Lambda Cloud, which bills continuously whether or not the GPU is working, and during active Modal sessions, when the container is warm but GPU use is bursty.
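
The effect of sharing can be estimated with a rough back-of-the-envelope model, sketched below in Python. It assumes an idealized case that real workloads will not match exactly: every job keeps the GPU busy for the same fraction of the time, bursts from different jobs interleave without contention, and the total run time does not grow as jobs are added. The busy fraction, hourly rate, and function names are hypothetical.

```python
def gpu_utilization(busy_fraction_per_job: float, concurrent_jobs: int) -> float:
    """Idealized utilization: bursts interleave perfectly, capped at 100%."""
    return min(1.0, busy_fraction_per_job * concurrent_jobs)


def cost_per_job(rate_per_hour: float, hours: float, concurrent_jobs: int) -> float:
    """Split one continuously billed GPU evenly across the jobs sharing it."""
    return rate_per_hour * hours / concurrent_jobs


# Hypothetical inputs: each job uses the GPU 25% of the time,
# and the instance costs $1.50/hour for a 2-hour session.
for jobs in (1, 2, 4):
    util = gpu_utilization(0.25, jobs)
    cost = cost_per_job(rate_per_hour=1.50, hours=2.0, concurrent_jobs=jobs)
    print(f"{jobs} job(s): utilization {util:.0%}, cost per job ${cost:.2f}")
# prints:
# 1 job(s): utilization 25%, cost per job $3.00
# 2 job(s): utilization 50%, cost per job $1.50
# 4 job(s): utilization 100%, cost per job $0.75
```

In practice, contention between jobs lengthens each run somewhat, so treat these figures as an upper bound on the savings.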

Both guides walk through the same end-to-end example: configuring the server, deploying it, training a “talk like Yoda” LoRA on Qwen/Qwen3-0.6B from your laptop, and downloading the trained adapter — all reusing the runnable code in examples/personality_sft/.