Telemetry

Piri uses OpenTelemetry to emit metrics and traces for observability. You can configure custom collectors to send this data to your own monitoring infrastructure.

Metrics

Piri emits metrics via OTLP (OpenTelemetry Protocol) that can be consumed by any compatible collector.

Host Metrics

System-level metrics for monitoring node health:

| Metric | Type | Unit | Description |
| --- | --- | --- | --- |
| system_cpu_utilization | Gauge | ratio (0-1) | System-wide CPU utilization |
| system_memory_used_bytes | Gauge | bytes | System memory in use |
| system_memory_total_bytes | Gauge | bytes | Total system memory |
| piri_datadir_used_bytes | Gauge | bytes | Disk space used by data directory |
| piri_datadir_free_bytes | Gauge | bytes | Free disk space for data directory |
| piri_datadir_total_bytes | Gauge | bytes | Total disk space for data directory |

Job Queue Metrics

Track task execution in internal job queues:

| Metric | Type | Description |
| --- | --- | --- |
| active_jobs | UpDownCounter | Currently running jobs |
| queued_jobs | UpDownCounter | Jobs waiting in queue |
| failed_jobs | Counter | Permanently failed jobs |
| job_duration | Histogram | Job execution duration (seconds) |

Labels:

| Label | Description |
| --- | --- |
| queue | Name of the job queue (e.g., replicator, aggregator, egress_tracker) |
| job | Type of job being executed |
| status | Job outcome (success or failure) |
| attempt | Retry attempt number (1-based) |
| failure_reason | Reason for permanent failure (only on failed_jobs) |

HTTP Server Metrics

Standard OpenTelemetry HTTP instrumentation:

| Metric | Type | Description |
| --- | --- | --- |
| http.server.request.duration | Histogram | Request latency |
| http.server.request.body.size | Histogram | Request body size |
| http.server.response.body.size | Histogram | Response body size |

PDP Metrics

Provable Data Possession task metrics:

| Metric | Type | Description |
| --- | --- | --- |
| chain_current_epoch | Gauge | Current Filecoin chain epoch |
| next_challenge_window_start_epoch | Gauge | Epoch when next challenge window starts |
| pdp_next_failure | Counter | Next proving period task failures |
| pdp_prove_failure | Counter | Proof generation task failures |
| message_send_failure | Counter | Blockchain message send failures |
| message_estimate_gas_failure | Counter | Gas estimation failures |

Replication Metrics

| Metric | Type | Description |
| --- | --- | --- |
| transfer_duration | Histogram | Replica transfer operation duration |

Labels:

| Label | Description |
| --- | --- |
| source | Origin endpoint where data is pulled from |
| sink | Destination endpoint where data is written to (this node) |

Server Info

Build and runtime information:

| Metric | Type | Description |
| --- | --- | --- |
| piri_server_info | Info | Server metadata |

Labels:

| Label | Description |
| --- | --- |
| version | Piri software version |
| commit | Git commit hash of the build |
| built_by | Build system identifier |
| build_date | When the binary was compiled |
| start_time_unix | Server start time (Unix timestamp) |
| server_type | Server mode (full or ucan) |
| did | Server's Decentralized Identifier |
| owner_address | Ethereum address of node owner |
| public_url | Server's publicly accessible URL |
| proof_set | PDP proof set ID |

Traces

Distributed tracing provides end-to-end visibility into operations:

| Span | Description |
| --- | --- |
| blob.accept | Blob acceptance operations |
| blob.allocate | Blob allocation operations |
| space.content.retrieve | Content retrieval operations |
| AddRoots | PDP root addition operations |

Traces use parent-based sampling, so a span follows the sampling decision of its parent, and trace context is propagated between services using W3C Trace Context headers.

Integration

Prometheus

Use an OpenTelemetry Collector with a Prometheus exporter:

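# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        # Must match the endpoint Piri sends to. By convention 4317 is the
        # OTLP/gRPC port and 4318 the OTLP/HTTP port.
        endpoint: "0.0.0.0:4317"

exporters:
  prometheus:
    # Address that Prometheus scrapes for the collected metrics.
    endpoint: "0.0.0.0:9090"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]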

Configure Piri to send metrics to your collector:

[[telemetry.metrics]]
endpoint = "http://localhost:4317"
insecure = true
publish_interval = "30s"
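
To complete this pipeline, a Prometheus server has to scrape the address that the collector's prometheus exporter listens on. The fragment below is a minimal sketch of a matching scrape configuration. The job name and the otel-collector hostname are assumptions for illustration, not values taken from Piri's documentation. Note that 9090 is also the Prometheus server's own default listen port, so if the collector and Prometheus run on the same host, one of the two ports needs to change.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: "piri"                      # hypothetical job name
    scrape_interval: 30s                  # same cadence as publish_interval above
    static_configs:
      - targets: ["otel-collector:9090"]  # hypothetical host; port from the exporter endpoint
```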

Jaeger

For distributed tracing, run a Jaeger backend with OTLP ingestion enabled and point Piri's trace exporter at it:

[[telemetry.traces]]
endpoint = "http://jaeger:4317"
insecure = true
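
For local testing, a Jaeger instance reachable under the hostname jaeger can be started with Docker Compose. This is a sketch under assumptions: the service layout and the image tag are illustrative, and this section does not state whether Piri exports traces over OTLP/gRPC or OTLP/HTTP. Jaeger accepts OTLP/gRPC on 4317 and OTLP/HTTP on 4318, so both ports are published here.

```yaml
# docker-compose.yaml (illustrative, not part of Piri's documentation)
services:
  jaeger:
    image: jaegertracing/all-in-one:latest  # pin a specific version in practice
    environment:
      COLLECTOR_OTLP_ENABLED: "true"        # needed on older releases; newer ones enable OTLP by default
    ports:
      - "16686:16686"  # Jaeger web UI
      - "4317:4317"    # OTLP over gRPC
      - "4318:4318"    # OTLP over HTTP
```

Once the container is running, traces should appear in the Jaeger UI on port 16686 after Piri handles requests that produce the spans listed above.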

Grafana

Add Prometheus as a data source in Grafana and build dashboards from the metrics above. Key metrics to monitor:

  • System health: system_cpu_utilization, system_memory_used_bytes, piri_datadir_free_bytes
  • Job queue health: active_jobs, failed_jobs, job_duration
  • API performance: http.server.request.duration (p95, p99)
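
The key metrics above can also drive Prometheus recording and alerting rules, which Grafana can then chart or display. The rules file below is a sketch, not a recommended policy: the file name, rule names, and thresholds are hypothetical. It also assumes the collector's Prometheus exporter applies its usual name translation, so that http.server.request.duration appears as http_server_request_duration_seconds and the failed_jobs counter as failed_jobs_total, and that failed_jobs carries the queue label. Check the names on the exporter's scrape endpoint before relying on these expressions.

```yaml
# piri-rules.yml (hypothetical file name), referenced from rule_files in prometheus.yml
groups:
  - name: piri
    rules:
      # p95 HTTP request latency over the last 5 minutes (metric name assumed)
      - record: piri:http_request_duration_seconds:p95
        expr: histogram_quantile(0.95, sum by (le) (rate(http_server_request_duration_seconds_bucket[5m])))

      # Data directory has less than 10% free space (threshold is an example)
      - alert: PiriDataDirLowSpace
        expr: piri_datadir_free_bytes / piri_datadir_total_bytes < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Piri data directory has less than 10% free space"

      # Any permanently failed job in the last 15 minutes (metric name assumed)
      - alert: PiriJobsFailing
        expr: sum by (queue) (increase(failed_jobs_total[15m])) > 0
        labels:
          severity: warning
        annotations:
          summary: "Jobs in queue {{ $labels.queue }} failed permanently"
```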

Configuration

See Configuration > telemetry for collector setup options.

Analytics

Piri can optionally send anonymized analytics to Storacha to help improve the software. See Operations > Telemetry for details and opt-out instructions.