Operations and Production Checklist

This document captures the minimum runbook for running Pali in production-like environments.

Pre-deploy checklist

Keep configuration and data paths outside the repository, and mount them from deployment secrets/config maps.
Keep jwt_secret, provider API keys, and database credentials outside source control.
Validate startup before rollout:
pali init -config /etc/pali/pali.yaml
scripts/verify_docs_examples.sh (checks for repo-level command drift)
For non-dev environments:
auth.enabled: true
a long, random auth.jwt_secret value
TLS termination at the reverse proxy, with a strong TLS configuration
Ensure data persistence:
database.sqlite_dsn points to persistent storage, not ephemeral temp directories.
directory permissions are restricted to the service user.
Confirm tenant/graph prerequisites for current profile:
Neo4j password provided when entity_fact_backend: neo4j
embeddings provider readiness checks pass
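
The persistence items above can be checked mechanically before rollout. The following is a minimal sketch, not part of Pali itself: the function names, the list of "ephemeral" path prefixes, and the example path are all assumptions, and it expects the plain file path portion of database.sqlite_dsn rather than a full DSN string.

```shell
#!/bin/sh
# Sketch only: helper names and the ephemeral-prefix list are assumptions.

# Succeeds (exit 0) if the path sits under a commonly ephemeral location.
is_ephemeral_path() {
    case "$1" in
        /tmp/*|/var/tmp/*|/dev/shm/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the directory grants no permissions to group or other.
dir_is_private() {
    [ -d "$1" ] || return 1
    # ls -ld prints e.g. "drwx------"; characters 5-10 are the group/other bits.
    perms=$(ls -ld -- "$1" | cut -c5-10)
    [ "$perms" = "------" ]
}

# Hypothetical pre-deploy gate: pass the SQLite file path taken from database.sqlite_dsn.
check_data_path() {
    db_path=$1
    if is_ephemeral_path "$db_path"; then
        echo "FAIL: $db_path is on ephemeral storage" >&2
        return 1
    fi
    if ! dir_is_private "$(dirname -- "$db_path")"; then
        echo "FAIL: $(dirname -- "$db_path") is accessible to group/other" >&2
        return 1
    fi
    echo "OK: $db_path"
}
```

A deploy script could call, for example, check_data_path /var/lib/pali/pali.db (a hypothetical location) and abort on a non-zero exit status before running pali init.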

./bin/pali serve -config /etc/pali/pali.yaml

./bin/pali mcp serve -config /etc/pali/pali.yaml

Keep runtime supervisors enabled (systemd, Kubernetes, or a Docker restart policy) and run explicit health check loops.
Maintain a rollback image or binary snapshot for immediate restore.
Roll back by swapping in the last known-good config/binary and restarting.
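
One way to combine a health check loop with a rollback is sketched below. It is an illustration under stated assumptions rather than a documented Pali procedure: the retry and rollback_to helpers, the releases/current symlink layout, the systemd unit name, and the health URL and port are all hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names, release layout, unit name, and URL are assumptions.

# Run a probe command until it succeeds: at most $1 attempts,
# sleeping $2 seconds between attempts. Returns 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Point the "current" symlink ($2) at a known-good release directory ($1).
rollback_to() {
    release=$1
    link=$2
    [ -d "$release" ] || { echo "no such release: $release" >&2; return 1; }
    # -n replaces the symlink itself instead of descending into its target.
    ln -sfn -- "$release" "$link"
}

# Example flow (hypothetical paths, unit name, and port):
#   rollback_to /opt/pali/releases/previous /opt/pali/current
#   systemctl restart pali
#   retry 10 3 curl -fsS --max-time 2 http://127.0.0.1:8080/health
```

Keeping the probe as an arbitrary command means the same loop can wrap a curl call, a container health command, or a tenant auth check.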

Built-in rate limiting is not part of the core service; enforce it at gateway/proxy.
There is no dedicated metrics endpoint in this version. Use proxy/app logs plus external monitoring of process and dependency health.
SQLite persistence is single-node: it is simpler to operate than a highly replicated topology, but it does not scale out on its own. For heavier multi-node operational needs, scale the vector or graph stores instead.

Check process and restart status in the supervisor.
Validate network and TLS path from gateway to Pali.
Run /health and tenant auth checks.
Review startup and request logs for recent migration/provider failures.
Validate disk and backing store availability (SQLite/Qdrant/Neo4j).
If required, roll back to the last known-good deployment and preserve evidence for post-mortem.
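
Preserving evidence before a rollback can be as simple as copying logs and config into a timestamped directory. The sketch below assumes a hypothetical preserve_evidence helper and hypothetical file locations; adapt the paths to wherever the supervisor and proxy actually write their logs.

```shell
#!/bin/sh
# Sketch only: the helper name and all file locations are assumptions.

# Copy the given files or directories into a new UTC-timestamped directory
# under $1 and print that directory's path. Missing inputs are skipped
# with a warning on stderr.
preserve_evidence() {
    dest_root=$1
    shift
    dest="$dest_root/incident-$(date -u +%Y%m%dT%H%M%SZ)"
    mkdir -p -- "$dest" || return 1
    for f in "$@"; do
        if [ -e "$f" ]; then
            # -R handles directories, -p keeps timestamps and modes.
            cp -Rp -- "$f" "$dest/" || return 1
        else
            echo "skip (missing): $f" >&2
        fi
    done
    echo "$dest"
}

# Example (hypothetical paths):
#   preserve_evidence /var/tmp/pali-incidents /var/log/pali/pali.log /etc/pali/pali.yaml
```

Copied config files may contain secrets such as auth.jwt_secret, so the evidence directory should be as restricted as the originals.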