Runtime Verification Runbook

Operator checklist to verify daemon heartbeat, container runtime readiness, and end-to-end non-payment compute flow.

Scope

Use this runbook after daemon install, host maintenance, or runtime upgrades.

This checklist covers:

  1. Host GPU and container runtime readiness.
  2. Provider API auth and heartbeat visibility.
  3. End-to-end renter job path (submit -> logs -> output).

It intentionally excludes payment, escrow, and payout flows.

1) Host readiness checks

# GPU driver + device visibility
nvidia-smi

# Docker daemon status
docker ps

# Verify NVIDIA runtime is registered
docker info | grep -i runtime

Expected:

  • `nvidia-smi` shows the target GPU.
  • `docker ps` exits 0 and prints the container table, with no "Cannot connect to the Docker daemon" error.
  • `docker info` output includes `nvidia` runtime.
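The three host checks above can be wrapped into one fail-fast script. This is a minimal sketch that assumes bash. `check_host` and `has_nvidia_runtime` are illustrative names. The script uses only standard `nvidia-smi --query-gpu` and `docker info --format` flags:

```shell
#!/usr/bin/env bash
# Sketch: run the host readiness checks in order, stop at the first failure.
# check_host / has_nvidia_runtime are illustrative names, not DCP tooling.

# Succeeds if stdin mentions an "nvidia" runtime. Works on plain `docker info`
# output or on `docker info --format '{{json .Runtimes}}'` output.
has_nvidia_runtime() {
  grep -iw 'nvidia' >/dev/null
}

check_host() {
  # Prints one GPU name per line; fails if the driver cannot see a device.
  nvidia-smi --query-gpu=name --format=csv,noheader \
    || { echo "FAIL: nvidia-smi cannot see a GPU" >&2; return 1; }
  docker ps >/dev/null \
    || { echo "FAIL: docker daemon unreachable" >&2; return 1; }
  docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
    || { echo "FAIL: nvidia runtime not registered with Docker" >&2; return 1; }
  echo "host readiness: OK"
}
```

Source the file and run `check_host`. It prints the visible GPU name(s), then either `host readiness: OK` or the first failing check.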

2) Provider auth and heartbeat

export DCP_API="https://dcp.sa/api/dc1"
export PROVIDER_KEY="dcp-provider-your-key"

curl -s "$DCP_API/providers/me?key=$PROVIDER_KEY"

Validate:

  • `provider.status` is `online` (or transitions to `online` after daemon restart).
  • `provider.last_heartbeat` updates roughly every 30 seconds.
  • `provider.daemon_version` is present and expected for your rollout.
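The validation list above can be scripted with `jq`. This is a sketch that rests on assumptions the API does not confirm. It assumes `provider.last_heartbeat` is an ISO-8601 UTC string such as `2024-01-01T00:00:00Z`. It also treats 90 seconds (three missed 30-second beats) as stale. Adjust both to match your responses. `provider_health` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: evaluate a /providers/me response read from stdin.
# ASSUMPTION: last_heartbeat is ISO-8601 UTC ("...T..:..:..Z"); fractional
# seconds are stripped before parsing. Change this if the API returns epoch secs.
#   $1 = max heartbeat age in seconds (default 90 = three missed 30 s beats)
#   $2 = "now" in epoch seconds (default: current time; overridable for testing)
provider_health() {
  local max_age="${1:-90}" now="${2:-$(date +%s)}" out
  out=$(jq -r --argjson max "$max_age" --argjson now "$now" '
    .provider as $p
    | if $p.status != "online" then "FAIL: status=\($p.status)"
      elif ($p.daemon_version // "") == "" then "FAIL: daemon_version missing"
      else
        (($now - ($p.last_heartbeat | sub("\\.[0-9]+"; "") | fromdateiso8601)) as $age
         | if $age > $max then "FAIL: last heartbeat \($age)s ago"
           else "OK: online, heartbeat \($age)s ago, daemon \($p.daemon_version)"
           end)
      end') || return 2   # jq error: not JSON or unparseable timestamp
  echo "$out"
  [[ "$out" == OK:* ]]    # exit status reflects the verdict
}
```

Example: `curl -s "$DCP_API/providers/me?key=$PROVIDER_KEY" | provider_health`. To confirm the heartbeat is actually advancing, run it twice about 30 seconds apart.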

3) Non-payment renter flow smoke

export RENTER_KEY="dcp-renter-your-key"

# Account + capacity checks
curl -s "$DCP_API/renters/me?key=$RENTER_KEY"
curl -s "$DCP_API/renters/available-providers"

Submit a short compute job. First replace the `<PROVIDER_ID>`, `<DURATION_MINUTES>`, and `<MAX_DURATION_SECONDS>` placeholders with your values:

JOB_JSON=$(curl -s -X POST "$DCP_API/jobs/submit" \
  -H "Content-Type: application/json" \
  -H "x-renter-key: $RENTER_KEY" \
  -d '{
    "provider_id": <PROVIDER_ID>,
    "job_type": "llm-inference",
    "duration_minutes": <DURATION_MINUTES>,
    "max_duration_seconds": <MAX_DURATION_SECONDS>,
    "container_spec": { "image_type": "vllm-serve" },
    "params": {
      "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
      "prompt": "Runtime verification check"
    }
  }')

echo "$JOB_JSON"
JOB_ID=$(echo "$JOB_JSON" | jq -r '.job.job_id // .job_id')
echo "JOB_ID=$JOB_ID"
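The extraction above does not check the HTTP status. A rejected submit therefore still produces `JOB_ID=null`. The sketch below is a stricter check. It assumes a successful submit returns `201` with the id at `.job.job_id` or `.job_id`, which are the same paths used above. `parse_submit_response` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: pass/fail check for the submit call (hypothetical helper).
# Input on stdin: response body, a newline, then the HTTP status code,
# which is what `curl -s -w '\n%{http_code}'` prints.
parse_submit_response() {
  local raw status body job_id
  raw=$(cat)
  status=${raw##*$'\n'}     # last line: HTTP status code
  body=${raw%$'\n'*}        # everything before it: JSON body
  if [[ "$status" != "201" ]]; then
    echo "FAIL: submit returned HTTP $status: $body" >&2
    return 1
  fi
  # Same fallback as the runbook: nested .job.job_id first, then top-level .job_id.
  job_id=$(jq -r '.job.job_id // .job_id // empty' <<<"$body" 2>/dev/null) || job_id=""
  if [[ -z "$job_id" ]]; then
    echo "FAIL: HTTP 201 but no job id in body: $body" >&2
    return 1
  fi
  printf '%s\n' "$job_id"
}
```

To use it, add `-w '\n%{http_code}'` to the submit `curl` call and store the result, for example in `SUBMIT_RAW`. Then run `JOB_ID=$(printf '%s' "$SUBMIT_RAW" | parse_submit_response)`. This fails loudly instead of carrying `null` into the log and output checks.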

Watch logs and final output:

curl -s "$DCP_API/jobs/$JOB_ID/logs?key=$RENTER_KEY"
curl -s "$DCP_API/jobs/$JOB_ID/output?key=$RENTER_KEY"

Pass criteria:

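  • Submit returns HTTP `201` and a non-null `JOB_ID` (`curl -s` does not print the status code, so check it explicitly when scripting).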
  • Logs endpoint shows runtime progression (`pending` -> `running`).
  • Output endpoint reaches terminal state (`completed` or expected controlled failure).
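For unattended runs, the log and output checks can become a polling loop. The response field names and the extra terminal states below are assumptions. Only `pending`, `running`, and `completed` appear in this runbook, so verify the rest against real responses first. `job_state`, `is_terminal`, and `wait_for_job` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sketch: poll a job until it reaches a terminal state (hypothetical helpers).
# ASSUMPTIONS: the output endpoint returns JSON with the state at .job.status
# or .status, and "failed"/"cancelled" are the non-success terminal states.

# Extract the job state from a JSON response on stdin.
job_state() {
  jq -r '.job.status // .status // "unknown"'
}

# Succeeds if the given state is terminal.
is_terminal() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll every 10 s for up to $2 seconds (default 600).
# Returns 0 only if the job ends in "completed".
wait_for_job() {
  local job_id="$1" timeout="${2:-600}" interval=10 elapsed=0 state=""
  while (( elapsed < timeout )); do
    state=$(curl -s "$DCP_API/jobs/$job_id/output?key=$RENTER_KEY" | job_state) || state="unknown"
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) job=$job_id state=$state"
    if is_terminal "$state"; then
      [[ "$state" == "completed" ]] && return 0
      return 1
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
  echo "TIMEOUT after ${timeout}s (last state=$state)" >&2
  return 1
}
```

Example: `wait_for_job "$JOB_ID" 900`. A controlled failure also returns non-zero, so the operator still decides whether that outcome was expected.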

4) Operator notes

  • Prefer header auth (`x-provider-key`, `x-renter-key`) for automation; keep query-key usage to legacy GET endpoints.
  • If any check fails, capture response bodies and timestamps in incident notes before restarting services.
  • Re-run this checklist after PM2 restarts or daemon upgrades.
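The note about capturing response bodies and timestamps can be made repeatable with a small wrapper that logs each check before any restart. This is a sketch. `capture` and `INCIDENT_LOG` are hypothetical names. The wrapper masks key values in the logged command line but stores response bodies unmodified:

```shell
#!/usr/bin/env bash
# Sketch: record a check's output for incident notes before restarting anything.
# capture() and INCIDENT_LOG are hypothetical names, not part of DCP tooling.
INCIDENT_LOG="${INCIDENT_LOG:-./incident-notes.log}"

capture() {
  local ts out rc cmd
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  # Mask key values in the logged command line (?key=... and *-key: headers).
  cmd=$(printf '%s' "$*" | sed -E 's/(key=|-key: )[^&[:space:]]+/\1***/g')
  out=$("$@" 2>&1) && rc=0 || rc=$?
  {
    printf '=== %s  $ %s\n' "$ts" "$cmd"
    printf 'exit=%s\n' "$rc"
    printf '%s\n' "$out"
  } >> "$INCIDENT_LOG"
  printf '%s\n' "$out"   # still show the output to the operator
  return "$rc"
}
```

Example: `capture curl -s "$DCP_API/providers/me?key=$PROVIDER_KEY"`. Each call appends a UTC timestamp, the masked command, its exit status, and the response body to the log.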

Related docs