TinyLlama 1.1B Chat
Ultra-lightweight chat model for low-cost prototyping on DCP.
1. What it is
TinyLlama 1.1B Chat (`TinyLlama/TinyLlama-1.1B-Chat-v1.0`) is a 1.1-billion-parameter chat model that reuses the Llama 2 architecture and tokenizer, fine-tuned for instruction following.
2. What it does
It supports low-cost experimentation, lightweight assistants, and simple text tasks on smaller GPUs.
3. How it compares
- Versus Mistral 7B / Llama 3 8B: far cheaper and lighter, but noticeably weaker reasoning and instruction following.
- Versus Phi-2: same low-cost class; pick based on your prompt style and evaluation results.
4. Best for on DCP
- Prototyping and dev/test APIs
- Cost-sensitive internal tools
- Simple classification and drafting
5. Hardware requirements on DCP
- Runtime floor in templates/routes: `>=4 GB` GPU memory (prefer 8 GB for stable throughput)
- Recommended providers: entry GPUs or any higher-tier machine
- Template: `ollama` (or lightweight `llm-inference` jobs)
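As an illustration, an `ollama`-template deployment might pin the model and memory floor like this. This is a hypothetical fragment: only `MODEL_ID`, the `ollama` template name, and the memory figures come from the text above; the real schema lives in `docker-templates/ollama.json`.

```json
{
  "template": "ollama",
  "env": {
    "MODEL_ID": "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
  },
  "resources": {
    "gpu_memory_gb": 8
  }
}
```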
6. How to run on DCP
- Use template `ollama` and keep `MODEL_ID=TinyLlama/TinyLlama-1.1B-Chat-v1.0`.
- For job API scripts, use `job_type: "llm-inference"` and set `params.model` to `TinyLlama/TinyLlama-1.1B-Chat-v1.0`.
- Keep `max_tokens` low for the best latency and cost.
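Putting the fields above together, a job-API request body might be built like this minimal sketch. Only `job_type`, `params.model`, and `max_tokens` come from the text; the `prompt` field and its value are illustrative assumptions, and the endpoint/auth details depend on your DCP deployment (see `backend/src/routes/jobs.js`).

```python
import json

# Hypothetical DCP job payload. `job_type` and `params.model` follow the
# guidance above; `prompt` is an illustrative assumption.
payload = {
    "job_type": "llm-inference",
    "params": {
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "prompt": "Summarize this ticket in one sentence: ...",
        # Keep max_tokens low for latency/cost, per the note above.
        "max_tokens": 128,
    },
}

# Serialize for an HTTP POST to your deployment's jobs route.
body = json.dumps(payload)
print(body)
```

Send `body` with your usual HTTP client and auth headers; the response shape is defined by your jobs route.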
7. Licensing and commercial-use notes
TinyLlama is released under the Apache-2.0 license (per its Hugging Face model card), which generally permits commercial use provided you retain the license text and any attribution/NOTICE obligations.
Sources:
- https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
- /home/node/dc1-platform/docker-templates/ollama.json
- /home/node/dc1-platform/backend/src/routes/jobs.js