TinyLlama 1.1B Chat

Ultra-lightweight chat model for low-cost prototyping on DCP.

1. What it is

TinyLlama 1.1B Chat (`TinyLlama/TinyLlama-1.1B-Chat-v1.0`) is a 1.1-billion-parameter instruction-tuned chat model.

2. What it does

It supports low-cost experimentation, lightweight assistants, and simple text tasks on smaller GPUs.

3. How it compares

  • Versus Mistral 7B / Llama 3 8B: far cheaper to run and lighter on memory, but noticeably weaker reasoning quality.
  • Versus Phi-2: similar low-cost class; choose based on your prompt style and your own eval results.

4. Best for on DCP

  • Prototyping and dev/test APIs
  • Cost-sensitive internal tools
  • Simple classification and drafting

5. Hardware requirements on DCP

  • Runtime floor in templates/routes: `>=4 GB` (prefer 8 GB for stable throughput)
  • Recommended providers: entry GPUs or any higher-tier machine
  • Template: `ollama` (or lightweight `llm-inference` jobs)
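
The template referenced above lives at `docker-templates/ollama.json`; its exact schema is platform-specific, so the fragment below is only an illustrative sketch (every field name here is an assumption, not the actual DCP template schema) showing where the model ID and the 4 GB runtime floor from this section would be expressed:

```json
{
  "template": "ollama",
  "env": {
    "MODEL_ID": "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  },
  "resources": {
    "gpu_memory_gb_min": 4,
    "gpu_memory_gb_recommended": 8
  }
}
```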

6. How to run on DCP

  1. Use template `ollama` and keep `MODEL_ID=TinyLlama/TinyLlama-1.1B-Chat-v1.0`.
  2. For job API scripts, use `job_type: "llm-inference"` and set `params.model` to `TinyLlama/TinyLlama-1.1B-Chat-v1.0`.
  3. Keep `max_tokens` low to minimize latency and cost.
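
The job API steps above can be sketched as a small payload builder. The field names `job_type` and `params.model` come from this document; the prompt, the `max_tokens` field, and the helper name are illustrative assumptions, and actually submitting the payload (e.g. via an HTTP POST to the DCP jobs route) is left out because the endpoint details are platform-specific:

```python
import json

def build_tinyllama_job(prompt: str, max_tokens: int = 128) -> dict:
    """Build a hypothetical llm-inference job payload for TinyLlama on DCP.

    Only `job_type` and `params.model` are documented fields; the rest
    are assumptions for illustration.
    """
    return {
        "job_type": "llm-inference",
        "params": {
            "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
            "prompt": prompt,
            # Keep max_tokens low: latency and cost scale with generated tokens.
            "max_tokens": max_tokens,
        },
    }

payload = build_tinyllama_job("Classify this ticket as bug or feature request.")
print(json.dumps(payload, indent=2))
```

The printed JSON would then be POSTed to the platform's jobs route (see the sources below for the actual handler).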

7. Licensing and commercial-use notes

The TinyLlama model card lists the Apache-2.0 license, which is generally commercial-friendly but carries attribution and notice obligations.

Sources:

  • https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • /home/node/dc1-platform/docker-templates/ollama.json
  • /home/node/dc1-platform/backend/src/routes/jobs.js