Quality of Service (QoS) Integration Guide

SDK source (GitHub): https://github.com/tangle-network/blueprint/tree/v2/crates/qos

This guide explains how to integrate the Blueprint SDK Quality of Service (QoS) system for observability, monitoring, and dashboards. QoS combines heartbeats, metrics, logs, and Grafana dashboards into a single service that you can run alongside any Blueprint.

QoS Summary

The Blueprint QoS system provides a complete observability stack:

  • Heartbeat Service: submits periodic liveness signals to the status registry
  • Metrics Collection: exports system and job metrics via a Prometheus-compatible endpoint
  • Custom On-Chain Metrics: reports arbitrary numeric metrics on-chain via ABI-encoded heartbeats
  • Logging: streams logs to Loki (optional)
  • Dashboards: builds Grafana dashboards (optional)
  • Server Management: can run Grafana/Loki/Prometheus containers for you

What QoS Exposes

QoS always exposes a Prometheus-compatible metrics endpoint when metrics are enabled. Grafana and Loki are optional and can be managed by QoS or connected externally.

Component | Default Endpoint | Notes
Prometheus metrics | http://<host>:9090/metrics | Includes /health plus Prometheus v1 API routes like /api/v1/query.
Grafana UI | http://<host>:3000 | Only when configured or managed by QoS.
Loki push API | http://<host>:3100/loki/api/v1/push | Only when configured or managed by QoS.
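
A quick smoke test against the metrics endpoint (a sketch assuming the reqwest crate in an async context; adjust host and port to your MetricsConfig):

let body = reqwest::get("http://127.0.0.1:9090/metrics")
    .await?
    .text()
    .await?;
// Print the first few exposition lines to confirm the exporter is live.
for line in body.lines().take(5) {
    println!("{line}");
}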

Integrating QoS with BlueprintRunner

If you use BlueprintRunner, it wires the HTTP RPC endpoint, keystore URI, and status registry address into QoS for you:

let qos_config = blueprint_qos::default_qos_config();
let heartbeat_consumer = Arc::new(MyHeartbeatConsumer); // unit struct, defined below
 
BlueprintRunner::builder(TangleEvmConfig::default(), env)
    .router(router)
    .qos_service(qos_config, Some(heartbeat_consumer))
    .run()
    .await?;

Note: BlueprintRunner::qos_service enables manage_servers(true) internally. If you want to avoid managed containers, pass a config with grafana_server: None and loki_server: None.
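
For example (a sketch reusing the field names above):

let mut qos_config = blueprint_qos::default_qos_config();
// No managed containers, even though BlueprintRunner turns manage_servers on.
qos_config.grafana_server = None;
qos_config.loki_server = None;
// Then pass qos_config to BlueprintRunner::qos_service as shown above.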

HeartbeatConsumer and Keystore Requirements

Heartbeats require a keystore with an ECDSA key. Use BLUEPRINT_KEYSTORE_URI or --keystore-path so QoS can sign heartbeats.

cargo tangle key --algo ecdsa --keystore ./keystore --name operator
export BLUEPRINT_KEYSTORE_URI="$(pwd)/keystore"

Implement the heartbeat consumer using the current trait signature:

use blueprint_qos::heartbeat::{HeartbeatConsumer, HeartbeatStatus};
use blueprint_qos::error::Result as QoSResult;
use std::future::Future;
use std::pin::Pin;
 
#[derive(Clone)]
struct MyHeartbeatConsumer;
 
impl HeartbeatConsumer for MyHeartbeatConsumer {
    fn send_heartbeat(
        &self,
        _status: &HeartbeatStatus,
    ) -> Pin<Box<dyn Future<Output = QoSResult<()>> + Send>> {
        Box::pin(async move { Ok(()) })
    }
}

Configuration Options

Default Configuration

let qos_config = blueprint_qos::default_qos_config();

This enables metrics, Loki logging, and Grafana integration. Whether containers start depends on manage_servers (BlueprintRunner forces it on; see note above).
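
Since the defaults are a plain struct, individual integrations can be switched off by clearing their Option fields (a sketch using the QoSConfig field names shown later in this guide):

let qos_config = QoSConfig {
    loki: None, // keep metrics and Grafana, but skip Loki log shipping
    ..blueprint_qos::default_qos_config()
};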

Bring Your Own Observability Stack

Point QoS at your existing Grafana/Loki/Prometheus stack by overriding the configs and keeping manage_servers off:

let qos_config = QoSConfig {
    metrics: Some(MetricsConfig {
        prometheus_server: Some(PrometheusServerConfig {
            host: "0.0.0.0".into(),
            port: 9090,
            use_docker: false,
            ..Default::default()
        }),
        ..Default::default()
    }),
    grafana: Some(GrafanaConfig {
        url: "http://grafana.internal:3000".into(),
        api_key: Some(std::env::var("GRAFANA_API_KEY")?),
        prometheus_datasource_url: Some("http://prometheus.internal:9090".into()),
        ..Default::default()
    }),
    loki: Some(LokiConfig {
        url: "http://loki.internal:3100/loki/api/v1/push".into(),
        ..Default::default()
    }),
    manage_servers: false,
    ..blueprint_qos::default_qos_config()
};

Managed Observability Stack

QoS can spin up Grafana, Loki, and Prometheus containers for you. Make sure Docker is available.

let qos_config = QoSConfig {
    manage_servers: true,
    grafana_server: Some(GrafanaServerConfig {
        admin_user: "admin".into(),
        admin_password: "change-me".into(),
        allow_anonymous: false,
        data_dir: "/var/lib/grafana".into(),
        ..Default::default()
    }),
    loki_server: Some(LokiServerConfig {
        data_dir: "/var/lib/loki".into(),
        config_path: Some("./loki-config.yaml".into()),
        ..Default::default()
    }),
    prometheus_server: Some(PrometheusServerConfig {
        host: "0.0.0.0".into(),
        port: 9090,
        use_docker: true,
        config_path: Some("./prometheus.yml".into()),
        data_path: Some("./prometheus-data".into()),
        ..Default::default()
    }),
    docker_network: Some("blueprint-observability".into()),
    docker_bind_ip: Some("0.0.0.0".into()),
    ..blueprint_qos::default_qos_config()
};

Builder Pattern

Use the builder when you want explicit wiring for heartbeats or custom datasources:

let qos_service = QoSServiceBuilder::new()
    .with_heartbeat_config(HeartbeatConfig {
        service_id,
        blueprint_id,
        interval_secs: 60,
        jitter_percent: 10,
        max_missed_heartbeats: 3,
        status_registry_address,
    })
    .with_heartbeat_consumer(Arc::new(consumer))
    .with_http_rpc_endpoint(env.http_rpc_endpoint.to_string())
    .with_keystore_uri(env.keystore_uri.clone())
    .with_status_registry_address(status_registry_address)
    .with_metrics_config(MetricsConfig::default())
    .with_grafana_config(GrafanaConfig::default())
    .with_loki_config(LokiConfig::default())
    .with_prometheus_server_config(PrometheusServerConfig::default())
    .manage_servers(true)
    .build()
    .await?;

Recording Metrics and Events

Track job execution and errors in your handlers:

// On success: record the job's duration and identifiers for Prometheus export
if let Some(qos) = &ctx.qos_service {
    qos.record_job_execution(
        JOB_ID,
        start_time.elapsed().as_secs_f64(),
        ctx.service_id,
        ctx.blueprint_id,
    );
}

// On failure: record a stable error label for alerting and dashboards
if let Some(qos) = &ctx.qos_service {
    qos.record_job_error(JOB_ID, "complex_operation_failure");
}
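
Putting both paths together in one handler (a sketch; do_work stands in for your job logic, and the context fields follow the examples above):

let start_time = std::time::Instant::now();
match do_work().await {
    Ok(output) => {
        if let Some(qos) = &ctx.qos_service {
            // Happy path: export latency for this job ID.
            qos.record_job_execution(
                JOB_ID,
                start_time.elapsed().as_secs_f64(),
                ctx.service_id,
                ctx.blueprint_id,
            );
        }
        Ok(output)
    }
    Err(e) => {
        if let Some(qos) = &ctx.qos_service {
            // Failure path: a stable label keeps error-rate panels queryable.
            qos.record_job_error(JOB_ID, "complex_operation_failure");
        }
        Err(e)
    }
}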

Custom On-Chain Metrics

Custom on-chain metrics let your Blueprint report arbitrary numeric values that are ABI-encoded into each heartbeat, stored on the OperatorStatusRegistry contract, and queryable by anyone. This enables transparent SLA enforcement, slashing based on performance, and cross-operator comparison.

How It Works

The flow from Rust to on-chain storage:

Blueprint Rust code                    Heartbeat Service                 On-Chain
───────────────────                    ─────────────────                 ────────
provider.add_on_chain_metric(          Periodically drains               Contract stores
  "response_time_ms", 150              metrics, ABI-encodes              MetricPair[] in
)                                      as MetricPair[], signs            operatorMetrics
provider.add_on_chain_metric(          and submits via                   mapping, validates
  "uptime_percent", 99                 submitHeartbeatDirect()           against definitions
)

Metrics use Solidity-compatible ABI encoding (MetricPair[]), not Rust-specific serialization. The encoding is handled automatically by the SDK.

On-Chain Setup (Service Owner)

Before operators can report custom metrics, the service owner must enable them on the OperatorStatusRegistry contract and optionally define validation bounds.

// Enable custom metrics for the service
registry.enableCustomMetrics(serviceId, true);
 
// Define metric schemas with validation bounds
IOperatorStatusRegistry.MetricDefinition[] memory defs =
    new IOperatorStatusRegistry.MetricDefinition[](2);
 
defs[0] = IOperatorStatusRegistry.MetricDefinition({
    name: "response_time_ms",
    minValue: 0,
    maxValue: 5000,
    required: true
});
 
defs[1] = IOperatorStatusRegistry.MetricDefinition({
    name: "uptime_percent",
    minValue: 0,
    maxValue: 100,
    required: false
});
 
registry.setMetricDefinitions(serviceId, defs);

MetricDefinition fields:

Field | Type | Description
name | string | Metric identifier (must match the Rust key)
minValue | uint256 | Minimum acceptable value (inclusive)
maxValue | uint256 | Maximum acceptable value (inclusive)
required | bool | If true, a missing metric emits MetricViolation

When a heartbeat arrives with metrics, the contract validates each reported value against these definitions. Out-of-bounds or missing required metrics emit a MetricViolation event but do not auto-slash. An off-chain keeper can monitor these events and call reportForSlashing() when policy warrants it.

Reporting Metrics in Rust

In your Blueprint Rust code, use the MetricsProvider trait to push on-chain metrics:

use blueprint_qos::metrics::types::MetricsProvider;
 
// Get the provider from the QoS service
let provider = qos_service.provider().unwrap();
 
// Report metrics (these accumulate until the next heartbeat drains them)
provider.add_on_chain_metric("response_time_ms".into(), 150).await;
provider.add_on_chain_metric("uptime_percent".into(), 99).await;

Metrics are accumulated in memory and automatically drained into the next heartbeat. No ABI encoding knowledge is required on the developer side.

The two metric APIs serve different purposes:

Method | Value Type | Destination | Use Case
add_custom_metric() | String | Prometheus / Grafana | Observability, dashboards
add_on_chain_metric() | u64 | On-chain via heartbeat | SLA enforcement, slashing, billing

Querying Metrics On-Chain

Anyone can read stored operator metrics from the contract:

// Get a specific metric value for an operator
uint256 responseTime = registry.getMetricValue(
    serviceId,
    operatorAddress,
    "response_time_ms"
);
 
// Get all metric definitions for a service
IOperatorStatusRegistry.MetricDefinition[] memory defs =
    registry.getMetricDefinitions(serviceId);
 
// Check if an operator's heartbeat is current
bool current = registry.isHeartbeatCurrent(serviceId, operatorAddress);
 
// Get operators who have missed too many heartbeats
address[] memory slashable = registry.getSlashableOperators(serviceId);

Metric Validation and Slashing

The contract validates metrics against MetricDefinition bounds on every heartbeat. Violations emit events:

event MetricViolation(
    uint64 indexed serviceId,
    address indexed operator,
    string metricName,
    string reason
);

Violation reasons include:

  • "required metric missing" — a required metric was not reported
  • "value below minimum" — reported value < minValue
  • "value above maximum" — reported value > maxValue

Slashing is intentionally decoupled from validation. Auto-slashing from metric violations is dangerous because transient spikes or network delays could trigger false positives. Instead:

  1. An off-chain keeper monitors MetricViolation events
  2. When policy warrants it (e.g., repeated violations), the keeper calls reportForSlashing(serviceId, operator, reason)
  3. The contract sets the operator’s status to Slashed
  4. The staking layer can then execute the actual slash
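
A minimal keeper sketch using alloy bindings. The event signature matches the one above; the RPC URL, REGISTRY_ADDRESS env var, and three-strikes policy are illustrative, and actually sending reportForSlashing requires a wallet-backed provider (shown commented out):

use alloy::primitives::Address;
use alloy::providers::ProviderBuilder;
use alloy::sol;
use std::collections::HashMap;

sol! {
    #[sol(rpc)]
    contract OperatorStatusRegistry {
        event MetricViolation(
            uint64 indexed serviceId,
            address indexed operator,
            string metricName,
            string reason
        );
        function reportForSlashing(uint64 serviceId, address operator, string reason) external;
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider = ProviderBuilder::new().on_http("http://127.0.0.1:8545".parse()?);
    let registry_addr: Address = std::env::var("REGISTRY_ADDRESS")?.parse()?;
    let registry = OperatorStatusRegistry::new(registry_addr, provider);

    // Tally MetricViolation events per operator over a block window.
    let mut strikes: HashMap<Address, u32> = HashMap::new();
    let events = registry.MetricViolation_filter().from_block(0u64).query().await?;
    for (ev, _log) in events {
        *strikes.entry(ev.operator).or_insert(0) += 1;
    }

    // Policy: flag only operators with repeated violations, so a transient
    // spike never triggers a slash.
    for (operator, count) in &strikes {
        if *count >= 3 {
            println!("flagging {operator}: {count} violations");
            // With a wallet-backed provider:
            // registry.reportForSlashing(service_id, *operator, "repeated violations".into())
            //     .send().await?;
        }
    }
    Ok(())
}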

ABI Encoding Details

The SDK uses alloy-sol-types to produce ABI-encoded bytes matching abi.decode(data, (MetricPair[])):

// This is handled internally, but for reference:
use alloy_sol_types::{sol, SolValue};

sol! {
    struct MetricPair {
        string name;
        uint256 value;
    }
}
 
fn encode_metric_pairs(metrics: &[(String, u64)]) -> Vec<u8> {
    let pairs: Vec<MetricPair> = metrics.iter().map(|(name, value)| {
        MetricPair {
            name: name.clone(),
            value: alloy_primitives::U256::from(*value),
        }
    }).collect();
    pairs.abi_encode()
}

The u64 to uint256 conversion is lossless: uint256 widens u64, and u64 itself is large enough for realistic metric values.
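
A round-trip sketch of the encoder above, using the SolValue import shown earlier (note: the abi_decode signature with a validate flag matches alloy-sol-types 0.x; newer releases drop the flag):

let bytes = encode_metric_pairs(&[("uptime_percent".to_string(), 99)]);
// Decode exactly as the contract side sees it: a MetricPair[] array.
let decoded =
    <Vec<MetricPair> as SolValue>::abi_decode(&bytes, true).expect("valid encoding");
assert_eq!(decoded[0].name, "uptime_percent");
assert_eq!(decoded[0].value, alloy_primitives::U256::from(99u64));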

End-to-End Example

Here is a complete example showing a Blueprint that reports response time and uptime metrics:

Solidity setup (service deployment script):

// In your Blueprint Service Manager constructor or setup
registry.configureHeartbeat(serviceId, HeartbeatConfig({
    interval: 60,
    maxMissed: 3,
    customMetrics: true
}));
 
registry.enableCustomMetrics(serviceId, true);
 
MetricDefinition[] memory defs = new MetricDefinition[](2);
defs[0] = MetricDefinition("response_time_ms", 0, 5000, true);
defs[1] = MetricDefinition("uptime_percent", 0, 100, false);
registry.setMetricDefinitions(serviceId, defs);

Rust Blueprint handler:

async fn handle_job(ctx: &BlueprintContext) -> Result<(), Error> {
    let start = std::time::Instant::now();
 
    // ... do work ...
 
    let duration_ms = start.elapsed().as_millis() as u64;
 
    // Report to on-chain metrics (flows to next heartbeat automatically)
    if let Some(provider) = ctx.qos_service.as_ref().and_then(|q| q.provider()) {
        provider.add_on_chain_metric("response_time_ms".into(), duration_ms).await;
        provider.add_on_chain_metric("uptime_percent".into(), 99).await;
    }
 
    Ok(())
}

Querying on-chain (from any contract or script):

uint256 rt = registry.getMetricValue(serviceId, operator, "response_time_ms");
require(rt <= 5000, "SLA violated");

Creating Grafana Dashboards

With a Grafana config in place, create a dashboard from the QoS service:

// Rebind mutably: creating a dashboard mutates the service
let mut qos_service = qos_service;
qos_service.create_dashboard("My Blueprint").await?;

The default dashboard template lives at crates/qos/config/grafana_dashboard.json in the SDK.

Accessing Metrics in Code

You can query the metrics provider directly (for custom metrics or status checks):

use blueprint_qos::metrics::types::MetricsProvider;
 
if let Some(qos) = &ctx.qos_service {
    if let Some(provider) = qos.provider() {
        let system_metrics = provider.get_system_metrics().await;
        let _cpu = system_metrics.cpu_usage;
 
        // Prometheus/Grafana metrics (string values)
        provider
            .add_custom_metric("custom.label".into(), "value".into())
            .await;
 
        // On-chain metrics (u64 values, included in next heartbeat)
        provider
            .add_on_chain_metric("jobs_completed".into(), 42)
            .await;
    }
}

Best Practices

DO:

  • Initialize QoS early in your Blueprint startup sequence.
  • Use BlueprintRunner::qos_service(...) to auto-wire RPC + keystore + status registry.
  • Keep Prometheus reachable (bind to 0.0.0.0 if scraped externally).
  • Replace default Grafana credentials when using managed servers.
  • Use add_on_chain_metric() for values that affect SLA/slashing; use add_custom_metric() for observability-only data.
  • Define MetricDefinition bounds conservatively. Tight bounds catch real issues; overly tight bounds cause false positives.
  • Set required: true only for metrics your Blueprint always reports. Optional metrics should use required: false.

DON’T:

  • Don’t enable heartbeats without setting BLUEPRINT_KEYSTORE_URI.
  • Don’t expose managed Grafana publicly without auth.
  • Don’t ignore QoS startup errors; they usually indicate misconfigured ports or credentials.
  • Don’t auto-slash on MetricViolation events. Use a keeper with policy logic to avoid slashing on transient spikes.
  • Don’t submit metrics with string keys that don’t match your MetricDefinition names; unrecognized metrics are stored but not validated. One way to keep names in sync is sketched below.
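
One way to keep Rust keys and MetricDefinition names in sync is a shared constants module (a sketch; the names match the examples in this guide, and duration_ms is illustrative):

// Single source of truth for metric names reported on-chain.
pub const METRIC_RESPONSE_TIME_MS: &str = "response_time_ms";
pub const METRIC_UPTIME_PERCENT: &str = "uptime_percent";

// At the call site, reuse the constant instead of retyping the string:
provider
    .add_on_chain_metric(METRIC_RESPONSE_TIME_MS.into(), duration_ms)
    .await;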

QoS Components Reference

Component | Primary Struct | Config | Purpose
Unified Service | QoSService | QoSConfig | Main entry point for QoS integration
Heartbeat | HeartbeatService | HeartbeatConfig | Liveness signals to the status registry
Metrics | MetricsService | MetricsConfig | System + job metrics and Prometheus export
On-Chain Metrics | MetricsProvider | N/A | add_on_chain_metric() for chain storage
ABI Encoding | MetricPair | N/A | Solidity-compatible encoding via alloy
Logging | N/A | LokiConfig | Log aggregation via Loki
Dashboards | GrafanaClient | GrafanaConfig | Dashboards and datasources
Server Management | ServerManager | Server configs | Manages Docker containers for the stack

Contract Reference

The OperatorStatusRegistry contract provides these key functions for metrics:

Function | Access | Description
enableCustomMetrics(serviceId, bool) | Service Owner | Enable/disable custom metric processing
setMetricDefinitions(serviceId, defs[]) | Service Owner | Set validation bounds for metrics
addMetricDefinition(serviceId, ...) | Service Owner | Add a single metric definition
getMetricValue(serviceId, operator, name) | Anyone | Read a stored metric value
getMetricDefinitions(serviceId) | Anyone | List all metric definitions
isHeartbeatCurrent(serviceId, operator) | Anyone | Check operator liveness
getSlashableOperators(serviceId) | Anyone | List operators past the heartbeat threshold
reportForSlashing(serviceId, operator, reason) | Anyone | Flag an operator for slashing
getOperatorState(serviceId, operator) | Anyone | Full operator state (heartbeat, status, metrics hash)