Quality of Service Monitoring
QoS is the observability layer for running Blueprints. As an operator, you decide how metrics, logs, and dashboards are exposed to your team or customers. This page outlines what QoS exports, how to configure access safely, and how on-chain metrics affect your operator status.
What Gets Exported
QoS exports Prometheus-compatible metrics by default. Grafana and Loki are optional.
| Component | Default Endpoint | Notes |
|---|---|---|
| Prometheus metrics | http://<host>:9090/metrics | Includes /health plus Prometheus v1 API routes like /api/v1/query. |
| Grafana UI | http://<host>:3000 | Only when configured or managed by QoS. |
| Loki push API | http://<host>:3100/loki/api/v1/push | Only when configured or managed by QoS. |
On-Chain Metrics and Operator Status
Blueprints can report custom numeric metrics on-chain via heartbeats. These metrics are stored in the OperatorStatusRegistry contract and visible to anyone. As an operator, you should understand how this affects you.
What Gets Reported
The Blueprint developer defines which metrics are reported. Common examples include response time, uptime percentage, job completion rate, and resource utilization. Each metric has a name and a u64 value.
Validation and Violations
Service owners can define MetricDefinition bounds for each metric (min/max values, required flag). When your operator submits a heartbeat with metrics:
- Values outside the defined range trigger a MetricViolation event
- Missing required metrics also trigger violations
- Violations are logged on-chain but do not auto-slash
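To anticipate violations before a heartbeat is submitted, the range check described above can be mirrored locally. The following is a minimal sketch, not the contract's logic: the helper name check_metric is hypothetical, and it assumes the min/max bounds are inclusive, which the Blueprint's metric definitions should confirm.

```shell
# Hypothetical helper, not part of QoS or the OperatorStatusRegistry contract.
# Assumes inclusive bounds. Values must fit in the shell's signed integer
# range, so very large u64 values (above 2^63 - 1) are out of scope here.
# Usage: check_metric VALUE MIN MAX
check_metric() {
  if [ "$1" -lt "$2" ] || [ "$1" -gt "$3" ]; then
    echo "violation"
  else
    echo "ok"
  fi
}
```

For example, check_metric 250 0 500 prints ok, while check_metric 900 0 500 prints violation.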
Slashing Risk
Violations alone do not slash your stake. However, an off-chain keeper or governance process can call reportForSlashing() based on repeated violations. To minimize risk:
- Ensure your node has stable network connectivity (missed heartbeats accumulate)
- Monitor your operator’s status via isHeartbeatCurrent(serviceId, yourAddress)
- Check if you appear in getSlashableOperators(serviceId) and resolve issues promptly
- Review the Blueprint’s metric definitions to understand what values are expected
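The monitoring step above can be scripted. As a rough sketch, the helper below interprets the result of an isHeartbeatCurrent call so that a cron job or alerting script can react to a stale heartbeat. The helper name is_true_result is hypothetical. It assumes cast prints either a raw 32-byte ABI word or the decoded literal true, depending on whether a return type is given in the signature.

```shell
# Hypothetical helper: succeeds when the given cast output represents "true".
# Accepts a raw ABI-encoded word (0x00...01) or the decoded literal "true".
is_true_result() {
  _v=${1#0x}
  # Strip leading zeros so the length of the hex word does not matter.
  while [ "${_v#0}" != "$_v" ]; do _v=${_v#0}; done
  [ "$_v" = "1" ] || [ "$1" = "true" ]
}

# Example use (requires cast and the variables used elsewhere on this page):
#   result=$(cast call "$REGISTRY" "isHeartbeatCurrent(uint64,address)" \
#     "$SERVICE_ID" "$YOUR_ADDRESS" --rpc-url "$RPC")
#   is_true_result "$result" || echo "heartbeat is stale" >&2
```

Returning an exit status rather than printing text keeps the helper easy to combine with whatever alerting mechanism you already run.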
Checking Your Status
Query the contract directly or use a block explorer:
# Using cast (foundry)
cast call $REGISTRY "isHeartbeatCurrent(uint64,address)" $SERVICE_ID $YOUR_ADDRESS --rpc-url $RPC
cast call $REGISTRY "getOperatorState(uint64,address)" $SERVICE_ID $YOUR_ADDRESS --rpc-url $RPC
cast call $REGISTRY "getMetricValue(uint64,address,string)" $SERVICE_ID $YOUR_ADDRESS "response_time_ms" --rpc-url $RPC
Managed Stack vs External Stack
Managed Stack (Docker)
If the Blueprint enables manage_servers, QoS will launch Grafana/Loki/Prometheus containers. You should:
- Ensure Docker is available on the host.
- Mount persistent volumes for Grafana and Loki (data_dir).
- Override default Grafana credentials (defaults are admin/admin and anonymous access is on).
- Open ports only on trusted networks or front them with a proxy.
External Stack (Recommended for Production)
Run your own observability stack and point QoS to it:
- Configure Prometheus to scrape http://<host>:9090/metrics.
- Set GrafanaConfig.prometheus_datasource_url to your Prometheus URL.
- If you use Loki, set LokiConfig.url to your Loki push endpoint.
This approach keeps credentials and retention policies under your control.
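For the scrape step, a Prometheus job along the following lines is usually enough. This is a sketch under assumptions: the helper name print_scrape_config and the job name blueprint-qos are placeholders, and the fragment should be merged by hand into an existing prometheus.yml rather than appended blindly.

```shell
# Hypothetical helper: prints a Prometheus scrape_configs fragment for one
# QoS host. The job name "blueprint-qos" is a placeholder, not a QoS requirement.
# Usage: print_scrape_config HOST [PORT]
print_scrape_config() {
  host="$1"
  port="${2:-9090}"   # QoS metrics port from the table at the top of this page
  cat <<EOF
scrape_configs:
  - job_name: "blueprint-qos"
    metrics_path: /metrics
    static_configs:
      - targets: ["${host}:${port}"]
EOF
}
```

Running print_scrape_config with your host name writes the fragment to standard output, so it can be reviewed before it goes into your Prometheus configuration.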
Quick Verification
# Check if QoS metrics endpoint is running
curl -s http://localhost:9090/health
# View exported metrics
curl -s http://localhost:9090/metrics | head -n 20
# Check heartbeat status on-chain
cast call $REGISTRY "isHeartbeatCurrent(uint64,address)" $SERVICE_ID $YOUR_ADDRESS --rpc-url $RPC
Environment Variables
| Variable | Default | Description |
|---|---|---|
| QOS_ENABLED | false | Enable the QoS service |
| QOS_HEARTBEAT_INTERVAL_SECS | 300 | Heartbeat interval in seconds |
| QOS_METRICS_INTERVAL_SECS | 60 | Metrics collection interval in seconds |
| QOS_DRY_RUN | true | Skip on-chain submissions (for testing) |
| BLUEPRINT_KEYSTORE_URI | — | Path to keystore for signing heartbeats |
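As an illustration, a test setup that enables QoS but keeps on-chain submissions off might export the variables as follows. The values are examples only. The keystore path is a placeholder, and the sketch assumes booleans are written as true/false, as the defaults in the table suggest.

```shell
# Variable names come from the table above; the values are illustrative.
export QOS_ENABLED=true                     # default: false
export QOS_HEARTBEAT_INTERVAL_SECS=300      # default value
export QOS_METRICS_INTERVAL_SECS=60         # default value
export QOS_DRY_RUN=true                     # keep on-chain submissions off while testing
export BLUEPRINT_KEYSTORE_URI="./keystore"  # placeholder path, replace with your own
```

Because QOS_DRY_RUN defaults to true, heartbeats are not submitted on-chain until it is explicitly turned off.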
Security Notes
- Do not expose Grafana with default credentials.
- Prefer a reverse proxy with auth and TLS.
- If you allow public dashboards, isolate them from write endpoints.
- On-chain metrics are public. Do not report sensitive data as metric values.