Version: dev

Monitoring and metrics

Draft page

For the full metrics list, Prometheus configuration, Grafana setup, and OpenTelemetry instrumentation, see Monitoring in Advanced operations.

The Aztec node exposes more than 100 Prometheus metrics, but most operators need to watch only a handful to keep their sequencer healthy.

TL;DR

  • Alert on L2 block height not advancing for 15 minutes.
  • Alert on publisher balance below 0.5 ETH. A publisher that runs out of ETH misses proposals, which is slashable as inactivity.
  • Alert on peer count below 5, which usually indicates a network reachability problem.
  • For everything else, the community dashboards (dashtec.xyz, aztec.vision, slashveto.me) cover you.

The metrics that matter

Aztec emits OpenTelemetry metrics, which Prometheus scrapes under snake-cased names: the dots in the OpenTelemetry name are replaced with underscores, so aztec.archiver.block_height becomes aztec_archiver_block_height on the wire.
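The renaming described above can be sketched in a few lines of Python. This is a minimal illustration, not the exporter's actual code: the function name is hypothetical, and it assumes only the classic Prometheus rule that metric names consist of ASCII letters, digits, underscores, and colons. Real exporters may apply further changes, such as appending unit suffixes.

```python
import re


def otel_to_prometheus_name(otel_name: str) -> str:
    """Map a dotted OpenTelemetry metric name to a Prometheus-style name.

    Hypothetical helper: every character outside [a-zA-Z0-9_:] is replaced
    with an underscore, which covers the dot-to-underscore case above.
    """
    return re.sub(r"[^a-zA-Z0-9_:]", "_", otel_name)


print(otel_to_prometheus_name("aztec.archiver.block_height"))
```

This is handy when you find a metric in the OpenTelemetry-style reference and need the name to type into a Prometheus query.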

Prometheus metric | What it tells you | Alert if
aztec_archiver_block_height | Your local view of the L2 chain tip | No advance for 15 minutes
aztec_archiver_l1_block_height | Your local view of L1 progression | No advance for 5 minutes (likely your L1 RPC is down)
aztec_peer_manager_peer_count | P2P connectivity to other nodes | Drops below 5
aztec_l1_publisher_balance | Wei in your publisher account for paying L1 gas | Drops below 0.5 ETH (5e17 wei)
aztec_mempool_tx_count | Transactions waiting to be included | Sustained growth or sudden spike
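Because aztec_l1_publisher_balance is reported in wei, the 0.5 ETH threshold has to be converted before comparing. The sketch below shows that arithmetic with integer math; the constant and function names are hypothetical and chosen only for illustration.

```python
WEI_PER_ETH = 10**18
# 0.5 ETH expressed in wei (5e17), kept as an integer to avoid float rounding.
MIN_PUBLISHER_BALANCE_WEI = WEI_PER_ETH // 2


def wei_to_eth(balance_wei: int) -> float:
    """Convert a wei balance to ETH for display purposes."""
    return balance_wei / WEI_PER_ETH


def publisher_balance_low(balance_wei: int,
                          threshold_wei: int = MIN_PUBLISHER_BALANCE_WEI) -> bool:
    """Return True when the balance has dropped below the alert threshold."""
    return balance_wei < threshold_wei


balance = 4 * 10**17  # example reading: 0.4 ETH
if publisher_balance_low(balance):
    print(f"Publisher balance low: {wei_to_eth(balance)} ETH")
```

Comparing in wei rather than ETH keeps the check exact, which matters when the balance sits close to the threshold.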

In addition, watch the standard system metrics (process_cpu_seconds_total, process_resident_memory_bytes) to track CPU and memory use on the underlying host.

For sequencer state, use aztec_sequencer_state_transition_buffer_duration (timing of state transitions). The full metric list is in Metrics reference.

What every operator should alert on

In priority order:

  1. No L2 blocks processed in the last 15 minutes (critical). Your node is stuck or the network has stalled. Either way, you need to know immediately.
  2. Publisher balance below 0.5 ETH (critical). When the publisher runs dry, you can no longer publish proposals, which is a slashable inactivity offense.
  3. Peer count below 5 (warning). This usually indicates a network reachability problem. Check port forwarding and firewall rules.
  4. L1 block height stalled for 5+ minutes (warning). Your L1 RPC is degraded. See L1 RPC for common causes.
  5. Sustained CPU usage above 70% across all cores (warning). This may indicate that the node is struggling to keep up; check disk IOPS and RAM headroom.
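The two "no advance" alerts in this list follow the same pattern: look at the samples in a trailing window and check whether the value ever moved past where it started. The sketch below illustrates that logic under stated assumptions: samples arrive as chronologically ordered (timestamp, value) pairs, and all names are hypothetical. In practice this check would normally live in a Prometheus alerting rule rather than in application code.

```python
from typing import Sequence, Tuple

L2_STALL_WINDOW_S = 15 * 60  # window for aztec_archiver_block_height
L1_STALL_WINDOW_S = 5 * 60   # window for aztec_archiver_l1_block_height


def has_stalled(samples: Sequence[Tuple[float, float]],
                window_s: float, now: float) -> bool:
    """Return True if the metric did not advance within the trailing window.

    `samples` holds (unix_timestamp, value) pairs in chronological order.
    """
    recent = [value for ts, value in samples if now - window_s <= ts <= now]
    if len(recent) < 2:
        # Too little data to prove progress; treat missing samples as a
        # stall so that a dead scrape target is also noticed.
        return True
    # No value in the window exceeds the first one: the height never moved.
    return max(recent) <= recent[0]


stuck = [(0, 100), (300, 100), (600, 100), (900, 100)]
moving = [(0, 100), (450, 101), (900, 102)]
print(has_stalled(stuck, L2_STALL_WINDOW_S, now=900))
print(has_stalled(moving, L2_STALL_WINDOW_S, now=900))
```

Treating an empty window as a stall is a deliberate choice here: a node that stops reporting metrics altogether is at least as urgent as one whose block height stops moving.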

How to wire this up

The shortest path from zero to a working monitoring stack:

  1. Enable metrics on your node by setting OTEL_EXPORTER_OTLP_METRICS_ENDPOINT and OTEL_SERVICE_NAME in your node's environment. Aztec exports OpenTelemetry by default.
  2. Run Prometheus to scrape http://<your-node>:<metrics-port>/metrics at 15-second intervals.
  3. Run Grafana pointed at Prometheus.
  4. Import a dashboard. The community-maintained Grafana dashboard for Aztec is ID 23054 (verify the current ID against the pittpv monitoring script, which is kept up to date alongside protocol releases).
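Before wiring up the full stack, it can help to confirm that the metrics endpoint returns what you expect. The sketch below parses a snippet of Prometheus text exposition format into a name-to-value mapping. It is a simplified illustration, not a complete parser: the function name is hypothetical, the sample text (including the service label) is made up, and when a metric appears with several label sets only the last value is kept.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition lines into {metric_name: value}."""
    values: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Form: name{label="x",...} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:].split()
        else:
            # Form: name value [timestamp]
            name, *rest = line.split()
        if rest:
            values[name] = float(rest[0])
    return values


# Made-up sample of what a scrape response might contain.
SAMPLE = """\
# TYPE aztec_archiver_block_height gauge
aztec_archiver_block_height 1234
aztec_peer_manager_peer_count{service="node"} 7
"""

metrics = parse_metrics(SAMPLE)
print(metrics["aztec_archiver_block_height"])
```

To run this against a live node, fetch the body of the /metrics URL from step 2 (for example with urllib.request.urlopen) and pass the decoded text to parse_metrics.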

For step-by-step instructions on each step, including detailed Prometheus and Grafana configuration, see Monitoring setup guides in Advanced operations.

Community monitoring options

If you do not want to maintain your own Prometheus + Grafana stack:

  • pittpv's monitoring script is a bundled installer with Telegram alerts, and is the most widely used operator-monitoring tool today. See Operator tooling.
  • dashtec.xyz shows per-epoch performance for any registered attester. No install required; sign in and add your attester to your watchlist.
  • aztec.vision surfaces misconfigured coinbase addresses and provider-level stats. Useful even if you only check it manually a few times a week.

What about alerting on slashing risk?

The Aztec node does not natively emit a "you are about to be slashed" metric. The slashing voting process happens on L1 through the TallySlashingProposer contract; by the time a slash payload is queued, it is too late to fix the underlying behavior.

The actionable proxies:

  • Watch aztec_l1_publisher_balance (above). Most slashing incidents trace back to a failed publish, which traces back to an L1 issue.
  • Subscribe to slashveto.me announcements. The community veto council publishes pending slash payloads before they execute.
  • Watch your performance on dashtec.xyz. A drop in your attestation rate is the early signal that an inactivity payload is coming.

See Slashing for the full slashing context.
