n8n Monitoring Setup

Comprehensive monitoring and observability are essential for production n8n deployments. This guide covers Prometheus metrics, logging, health checks, alerting rules, and Grafana dashboards.

Info (Production Monitoring): Proper monitoring is crucial for production deployments. It helps you identify issues early, track performance, and ensure system reliability.

Monitoring Overview

Tip (Monitoring Strategy): Implement monitoring at multiple levels: application metrics, infrastructure metrics, and business metrics for comprehensive observability.

Metric Categories

Application Metrics: Workflow executions, API calls, performance
Queue Metrics: Job processing, queue depths, worker utilization
System Metrics: CPU, memory, disk usage
Custom Metrics: Business-specific KPIs

Monitoring Components

Prometheus: Metrics collection and storage
ServiceMonitor: Prometheus Operator resource that tells Prometheus which Kubernetes services to scrape
Logging: Structured logging with configurable levels
Health Checks: Liveness and readiness probes
Alerting: Prometheus alerting rules

Prometheus Metrics Configuration

Note (Prometheus Operator): This guide assumes you have Prometheus Operator installed. If not, you'll need to set up Prometheus separately.

Basic ServiceMonitor Setup

serviceMonitor:
  enabled: true
  interval: 30s
  timeout: 10s
  labels:
    release: prometheus
  include:
    defaultMetrics: true
    cacheMetrics: false
    messageEventBusMetrics: false
    workflowIdLabel: false
    nodeTypeLabel: false
    credentialTypeLabel: false
    apiEndpoints: false
    queueMetrics: false

Advanced ServiceMonitor Configuration

serviceMonitor:
  enabled: true
  namespace: monitoring
  interval: 15s
  timeout: 10s
  labels:
    release: prometheus
    team: platform
  targetLabels:
    - app.kubernetes.io/name
    - app.kubernetes.io/instance
  metricRelabelings:
    - sourceLabels: [prometheus_replica]
      regex: (.*)
      targetLabel: another_prometheus_replica
      action: replace
  include:
    defaultMetrics: true
    cacheMetrics: true
    messageEventBusMetrics: true
    workflowIdLabel: true
    nodeTypeLabel: true
    credentialTypeLabel: true
    apiEndpoints: true
    apiPathLabel: true
    apiMethodLabel: true
    apiStatusCodeLabel: true
    queueMetrics: true

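Warning (Metric Volume): Enabling every metric and label in a high-traffic environment increases Prometheus storage requirements and query latency. Labels such as workflowIdLabel, nodeTypeLabel, and apiPathLabel multiply the number of time series, so enable them selectively.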
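To get a rough feel for that cost, the number of time series for one metric is bounded by the product of the distinct values of each label. The sketch below does this arithmetic in Python. The label names and counts are illustrative assumptions, not measurements from an n8n instance.

```python
from math import prod


def estimate_series(label_cardinalities: dict[str, int], series_per_combo: int = 1) -> int:
    """Upper bound on the number of time series for one metric.

    label_cardinalities maps a label name to its number of distinct values.
    series_per_combo is the number of series emitted per label combination
    (for a histogram, roughly the bucket count plus _sum and _count).
    """
    return prod(label_cardinalities.values()) * series_per_combo


# Hypothetical deployment: 200 workflows and 50 node types.
without_labels = estimate_series({})
with_labels = estimate_series({"workflow_id": 200, "node_type": 50})
print(without_labels, with_labels)
```

Not every label combination occurs in practice, so the real count is usually lower, but the product shows how quickly per-workflow and per-node labels can grow.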

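Available Metrics

The names listed here may not match every n8n version. For example, the Grafana dashboard later in this guide queries n8n_scaling_mode_queue_jobs_completed rather than n8n_queue_bull_queue_completed. Check your instance's metrics endpoint to confirm which names it exposes before building queries or alerts on them.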

Default Metrics

n8n_execution_total - Total workflow executions
n8n_execution_duration_seconds - Execution duration histogram
n8n_execution_failed_total - Failed executions
n8n_workflow_total - Total workflows
n8n_credential_total - Total credentials
n8n_node_total - Total nodes

Queue Metrics (Queue Mode)

n8n_queue_bull_queue_waiting - Jobs waiting in queue
n8n_queue_bull_queue_active - Active jobs
n8n_queue_bull_queue_completed - Completed jobs
n8n_queue_bull_queue_failed - Failed jobs
n8n_queue_bull_queue_delayed - Delayed jobs

API Metrics

n8n_api_requests_total - Total API requests
n8n_api_request_duration_seconds - API request duration
n8n_api_requests_failed_total - Failed API requests

Cache Metrics

n8n_cache_hits_total - Cache hits
n8n_cache_misses_total - Cache misses
n8n_cache_size_bytes - Cache size
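One way to confirm which of these metrics an instance actually exposes is to parse the text that Prometheus scrapes. The following is a minimal sketch of such a parser. The sample text is made up for illustration, and in practice you would feed it the body returned by the n8n metrics endpoint (assumed here to be /metrics).

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Simplified: skips comment lines, ignores optional timestamps, and does
    not handle label values that contain spaces.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        samples[parts[0]] = float(parts[1])
    return samples


def metric_names(samples: dict[str, float]) -> list[str]:
    """Return the distinct metric names with their label sets stripped."""
    return sorted({series.split("{", 1)[0] for series in samples})


# Made-up sample in the Prometheus text format.
SAMPLE = """\
# TYPE n8n_nodejs_eventloop_lag_seconds gauge
n8n_nodejs_eventloop_lag_seconds 0.004
n8n_version_info{version="v1.0.0"} 1
n8n_scaling_mode_queue_jobs_completed 42
"""

print(metric_names(parse_metrics(SAMPLE)))
```

Comparing this list against the names used in your queries and alert rules catches renamed or disabled metrics before a dashboard silently shows no data.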

Prometheus Queries

Tip (Query Optimization): Use appropriate time ranges and aggregation functions to optimize query performance in Prometheus.

Basic Queries

# Execution rate (executions per second)
rate(n8n_execution_total[5m])

# Average execution duration (seconds)
rate(n8n_execution_duration_seconds_sum[5m]) / rate(n8n_execution_duration_seconds_count[5m])

# Error rate
rate(n8n_execution_failed_total[5m])

# Queue depth (queue mode)
n8n_queue_bull_queue_waiting

# API request rate
rate(n8n_api_requests_total[5m])
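The histogram_quantile function used in this guide does not see individual execution times. It estimates a percentile from cumulative bucket counts by linear interpolation. The sketch below reproduces that idea in Python under simplifying assumptions (classic buckets, a lower bound of 0 for the first bucket). The bucket boundaries and counts are invented for the example.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) buckets.

    The last bucket is expected to have an upper bound of +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest
                # finite bound, since nothing more precise is known.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Invented example: 100 executions, bucket bounds in seconds.
example = [(1.0, 50), (5.0, 90), (10.0, 98), (math.inf, 100)]
print(histogram_quantile(0.95, example))
```

Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the values you care about.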

Advanced Queries

# Success rate
(
  rate(n8n_execution_total[5m]) - rate(n8n_execution_failed_total[5m])
) / rate(n8n_execution_total[5m]) * 100

# 95th percentile execution time
histogram_quantile(0.95, rate(n8n_execution_duration_seconds_bucket[5m]))

# Queue utilization (queue mode)
n8n_queue_bull_queue_active / (n8n_queue_bull_queue_waiting + n8n_queue_bull_queue_active) * 100

# Cache hit ratio
rate(n8n_cache_hits_total[5m]) / (rate(n8n_cache_hits_total[5m]) + rate(n8n_cache_misses_total[5m])) * 100
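The arithmetic behind the rate-based queries above can be sketched in a few lines of Python. This is a simplified model of rate() (it handles counter resets but skips the extrapolation Prometheus applies at window edges), and the sample values are invented.

```python
from typing import Optional


def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp, value) samples.

    A drop in value is treated as a counter reset, so the new value is
    counted as the increase since the reset.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def success_rate_percent(total_rate: float, failed_rate: float) -> Optional[float]:
    """Share of executions that did not fail, or None when there is no traffic."""
    if total_rate == 0:
        return None  # the PromQL expression would evaluate to NaN here
    return (total_rate - failed_rate) / total_rate * 100


# Invented samples taken 60 seconds apart.
total = counter_rate([(0, 100), (60, 160), (120, 220)])
failed = counter_rate([(0, 5), (60, 8), (120, 11)])
print(total, failed, success_rate_percent(total, failed))
```

The None branch mirrors a practical concern: with zero executions in the window, the success-rate query has no meaningful value, so alerts built on it will not fire during idle periods.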

Logging Configuration

Info (Log Management): Configure appropriate log levels and outputs based on your environment. Use structured logging for better log analysis.

Basic Logging

log:
  level: info
  output:
    - console
  scopes: []

Advanced Logging

log:
  level: info
  output:
    - console
    - file
  scopes:
    - concurrency
    - external-secrets
    - license
    - multi-main-setup
    - pubsub
    - redis
    - scaling
    - waiting-executions
  file:
    location: "logs/n8n.log"
    maxsize: 16
    maxcount: "100"

Structured Logging

log:
  level: info
  output:
    - console
  scopes:
    - concurrency
    - redis
    - scaling

Log Aggregation

# Configure log forwarding to external systems
main:
  extraContainers:
    - name: fluentd
      image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
      volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
  volumes:
    - name: varlog
      hostPath:
        path: /var/log
    - name: varlibdockercontainers
      hostPath:
        path: /var/lib/docker/containers

Health Checks

Basic Health Checks

main:
  livenessProbe:
    httpGet:
      path: /healthz
      port: http
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3

  readinessProbe:
    httpGet:
      path: /healthz/readiness
      port: http
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 3
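To check these endpoints by hand, for example through a port-forward, a small script can mimic what an HTTP probe does: request the path and treat any status from 200 to 399 as healthy. The sketch below uses only the Python standard library. The URL in the usage comment is an assumption about your setup (5678 is the default n8n port).

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the endpoint answers with a 2xx or 3xx status.

    The opener parameter exists so the function can be exercised without
    a network connection.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


# Example usage (assumes n8n is reachable on localhost:5678):
#   http_probe("http://localhost:5678/healthz")
#   http_probe("http://localhost:5678/healthz/readiness", timeout=3.0)
```

The timeout argument plays the same role as timeoutSeconds in the probe definition: a slow response counts as a failure.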

Advanced Health Checks

main:
  livenessProbe:
    httpGet:
      path: /healthz
      port: http
      httpHeaders:
        - name: X-Custom-Header
          value: health-check
    initialDelaySeconds: 60
    periodSeconds: 30
    timeoutSeconds: 10
    failureThreshold: 3
    successThreshold: 1

  readinessProbe:
    httpGet:
      path: /healthz/readiness
      port: http
    initialDelaySeconds: 10
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3
    successThreshold: 1

  startupProbe:
    httpGet:
      path: /healthz
      port: http
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 30

Queue Mode Health Checks

worker:
  livenessProbe:
    httpGet:
      path: /healthz
      port: http
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3

  readinessProbe:
    httpGet:
      path: /healthz/readiness
      port: http
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 3

  startupProbe:
    exec:
      command: ["/bin/sh", "-c", "ps aux | grep '[n]8n'"]
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 30

webhook:
  livenessProbe:
    httpGet:
      path: /healthz
      port: http
    initialDelaySeconds: 30
    periodSeconds: 10
    timeoutSeconds: 5
    failureThreshold: 3

  readinessProbe:
    httpGet:
      path: /healthz/readiness
      port: http
    initialDelaySeconds: 5
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 3

Alerting Rules

Basic Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: n8n-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
    - name: n8n
      rules:
        - alert: N8NHighErrorRate
          expr: rate(n8n_execution_failed_total[5m]) > 0.1
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High n8n execution error rate"
            description: "n8n is experiencing a high error rate of {{ $value }} errors per second"

        - alert: N8NHighExecutionTime
          expr: histogram_quantile(0.95, rate(n8n_execution_duration_seconds_bucket[5m])) > 300
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High n8n execution time"
            description: "95th percentile execution time is {{ $value }} seconds"

        - alert: N8NQueueDepth
          expr: n8n_queue_bull_queue_waiting > 100
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High n8n queue depth"
            description: "Queue depth is {{ $value }} jobs"
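The for field in these rules delays firing until the expression has been true continuously for the given duration. Until then the alert is pending, and a single evaluation where the condition is false resets the clock. The sketch below simulates that state machine in Python. The 30-second evaluation interval is an assumption for the example, not a value from this chart.

```python
def alert_states(condition_by_eval: list[bool], eval_interval_s: int, for_s: int) -> list[str]:
    """Return the alert state after each rule evaluation.

    condition_by_eval holds whether the alert expression matched at each
    evaluation, in order.
    """
    states = []
    active_since = None
    for index, matched in enumerate(condition_by_eval):
        now = index * eval_interval_s
        if not matched:
            active_since = None  # any gap resets the pending timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# A rule with "for: 2m", evaluated every 30 seconds (assumed interval).
history = [True, True, True, True, True, False, True]
print(alert_states(history, eval_interval_s=30, for_s=120))
```

Short for values react faster but page on brief spikes. Longer values trade detection time for fewer false alarms.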

Advanced Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: n8n-advanced-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
    - name: n8n.advanced
      rules:
        - alert: N8NLowSuccessRate
          expr: |
            (
              rate(n8n_execution_total[5m]) - rate(n8n_execution_failed_total[5m])
            ) / rate(n8n_execution_total[5m]) < 0.95
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Low n8n success rate"
            description: "Success rate is {{ $value | humanizePercentage }}"

        - alert: N8NHighMemoryUsage
          expr: (container_memory_usage_bytes{container="n8n"} / container_spec_memory_limit_bytes{container="n8n"}) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High n8n memory usage"
            description: "Memory usage is {{ $value | humanizePercentage }}"

        - alert: N8NHighCPUUsage
          expr: rate(container_cpu_usage_seconds_total{container="n8n"}[5m]) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High n8n CPU usage"
            description: "CPU usage is {{ $value | humanize }} cores"

        - alert: N8NQueueStuck
          expr: n8n_queue_bull_queue_active > 0 and rate(n8n_queue_bull_queue_completed[5m]) == 0
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "n8n queue appears stuck"
            description: "Active jobs: {{ $value }}, no completions in 5m"

        - alert: N8NHighAPILatency
          expr: histogram_quantile(0.95, rate(n8n_api_request_duration_seconds_bucket[5m])) > 5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High n8n API latency"
            description: "95th percentile API latency is {{ $value }} seconds"

Grafana Dashboards

Info (Dashboard Templates): These dashboard examples are based on actual n8n metrics and can be imported directly into Grafana. Customize them based on your specific monitoring needs.
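Before importing a dashboard, it can help to list the metrics its panels query and compare them with what your instance exposes. The sketch below walks a dashboard's panels and collects every n8n_ metric name found in the target expressions. The inline JSON is a trimmed, illustrative fragment, and the function names are hypothetical helpers.

```python
import json
import re


def panel_queries(dashboard: dict) -> list[tuple[str, str]]:
    """Return (panel title, PromQL expression) pairs for all panel targets."""
    return [
        (panel.get("title", ""), target["expr"])
        for panel in dashboard.get("panels", [])
        for target in panel.get("targets", [])
        if "expr" in target
    ]


def referenced_metrics(dashboard: dict) -> list[str]:
    """Return the distinct n8n_ metric names used across all panels."""
    names: set[str] = set()
    for _, expr in panel_queries(dashboard):
        names.update(re.findall(r"\bn8n_[A-Za-z0-9_]+", expr))
    return sorted(names)


# Trimmed fragment for illustration only.
DASHBOARD_JSON = """
{
  "panels": [
    {"title": "Event Loop Lag",
     "targets": [{"expr": "n8n_nodejs_eventloop_lag_seconds{instance=~\\"$instance\\"}"}]},
    {"title": "Last Week Completed Jobs",
     "targets": [{"expr": "sum(increase(n8n_scaling_mode_queue_jobs_completed[1w]))"}]}
  ]
}
"""

print(referenced_metrics(json.loads(DASHBOARD_JSON)))
```

Any name in this list that is missing from the metrics endpoint points to a panel that will render empty until the corresponding metric is enabled.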

Comprehensive n8n Dashboard

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "description": "n8n prometheus client basic metrics",
  "editable": true,
  "fiscalYearStartMonth": 0,
  "gnetId": 11159,
  "graphTooltip": 0,
  "iteration": 1750529070188,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 9,
        "x": 0,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 6,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "paceLength": 10,
      "percentage": false,
      "pluginVersion": "8.2.7",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "exemplar": true,
          "expr": "irate(n8n_process_cpu_user_seconds_total{instance=~\"$instance\"}[2m]) * 100",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "User CPU - {{instance}}",
          "refId": "A"
        },
        {
          "exemplar": true,
          "expr": "irate(n8n_process_cpu_system_seconds_total{instance=~\"$instance\"}[2m]) * 100",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "Sys CPU - {{instance}}",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Process CPU Usage",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "percent",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 8,
        "x": 9,
        "y": 0
      },
      "hiddenSeries": false,
      "id": 8,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "paceLength": 10,
      "percentage": false,
      "pluginVersion": "8.2.7",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "exemplar": true,
          "expr": "n8n_nodejs_eventloop_lag_seconds{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "{{role}}",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Event Loop Lag",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "s",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "datasource": "prometheus",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "decimals": 0,
          "mappings": [],
          "noValue": "0",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "none"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 7,
        "w": 3,
        "x": 17,
        "y": 0
      },
      "id": 14,
      "options": {
        "colorMode": "value",
        "graphMode": "none",
        "justifyMode": "auto",
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "last"
          ],
          "fields": "",
          "values": false
        },
        "text": {},
        "textMode": "value"
      },
      "pluginVersion": "8.2.7",
      "targets": [
        {
          "exemplar": true,
          "expr": "sum(increase(n8n_scaling_mode_queue_jobs_completed[1w]))",
          "interval": "",
          "legendFormat": "",
          "refId": "A"
        }
      ],
      "title": "Last Week Completed Jobs",
      "type": "stat"
    },
    {
      "cacheTimeout": null,
      "datasource": "prometheus",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [
            {
              "options": {
                "match": "null",
                "result": {
                  "text": "N/A"
                }
              },
              "type": "special"
            }
          ],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "none"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 3,
        "w": 4,
        "x": 20,
        "y": 0
      },
      "id": 2,
      "interval": "",
      "links": [],
      "maxDataPoints": 100,
      "options": {
        "colorMode": "none",
        "graphMode": "none",
        "justifyMode": "auto",
        "orientation": "horizontal",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "text": {},
        "textMode": "name"
      },
      "pluginVersion": "8.2.7",
      "targets": [
        {
          "exemplar": true,
          "expr": "sum(n8n_nodejs_version_info{instance=~\"$instance\"}) by (version)",
          "format": "time_series",
          "instant": false,
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "{{version}}",
          "refId": "A"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "Node.js Version",
      "type": "stat"
    },
    {
      "cacheTimeout": null,
      "datasource": "prometheus",
      "fieldConfig": {
        "defaults": {
          "color": {
            "fixedColor": "#F2495C",
            "mode": "fixed"
          },
          "mappings": [
            {
              "options": {
                "match": "null",
                "result": {
                  "text": "N/A"
                }
              },
              "type": "special"
            }
          ],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "none"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 4,
        "w": 4,
        "x": 20,
        "y": 3
      },
      "id": 4,
      "interval": null,
      "links": [],
      "maxDataPoints": 100,
      "options": {
        "colorMode": "none",
        "graphMode": "none",
        "justifyMode": "auto",
        "orientation": "horizontal",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "text": {},
        "textMode": "name"
      },
      "pluginVersion": "8.2.7",
      "targets": [
        {
          "exemplar": true,
          "expr": "sum(n8n_version_info{instance=~\"$instance\"}) by (version)",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "{{version}}",
          "refId": "A"
        }
      ],
      "timeFrom": null,
      "timeShift": null,
      "title": "n8n version",
      "type": "stat"
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 16,
        "x": 0,
        "y": 7
      },
      "hiddenSeries": false,
      "id": 7,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "rightSide": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "paceLength": 10,
      "percentage": false,
      "pluginVersion": "8.2.7",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "exemplar": true,
          "expr": "n8n_process_resident_memory_bytes{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "Process Memory - {{role}}",
          "refId": "A"
        },
        {
          "exemplar": true,
          "expr": "n8n_nodejs_heap_size_total_bytes{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "Heap Total - {{role}}",
          "refId": "B"
        },
        {
          "exemplar": true,
          "expr": "n8n_nodejs_heap_size_used_bytes{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "Heap Used - {{role}}",
          "refId": "C"
        },
        {
          "exemplar": true,
          "expr": "n8n_nodejs_external_memory_bytes{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "External Memory - {{role}}",
          "refId": "D"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Process Memory Usage",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "bytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 7,
        "w": 8,
        "x": 16,
        "y": 7
      },
      "hiddenSeries": false,
      "id": 9,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "paceLength": 10,
      "percentage": false,
      "pluginVersion": "8.2.7",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "exemplar": true,
          "expr": "n8n_nodejs_active_handles_total{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "Active Handler - {{role}}",
          "refId": "A"
        },
        {
          "exemplar": true,
          "expr": "n8n_nodejs_active_requests_total{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "Active Request - {{role}}",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Active Handlers/Requests Total",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 8,
        "w": 8,
        "x": 0,
        "y": 14
      },
      "hiddenSeries": false,
      "id": 10,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "rightSide": false,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "paceLength": 10,
      "percentage": false,
      "pluginVersion": "8.2.7",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "exemplar": true,
          "expr": "n8n_nodejs_heap_space_size_total_bytes{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "Heap Total - {{role}} - {{space}}",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Heap Total Detail",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "bytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 8,
        "w": 8,
        "x": 8,
        "y": 14
      },
      "hiddenSeries": false,
      "id": 11,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "rightSide": false,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "paceLength": 10,
      "percentage": false,
      "pluginVersion": "8.2.7",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "exemplar": true,
          "expr": "n8n_nodejs_heap_space_size_used_bytes{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "Heap Used - {{role}} - {{space}}",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Heap Used Detail",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "bytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "prometheus",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 8,
        "w": 8,
        "x": 16,
        "y": 14
      },
      "hiddenSeries": false,
      "id": 12,
      "legend": {
        "alignAsTable": true,
        "avg": true,
        "current": true,
        "max": true,
        "min": true,
        "rightSide": false,
        "show": true,
        "total": false,
        "values": true
      },
      "lines": true,
      "linewidth": 1,
      "links": [],
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "paceLength": 10,
      "percentage": false,
      "pluginVersion": "8.2.7",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "exemplar": true,
          "expr": "n8n_nodejs_heap_space_size_available_bytes{instance=~\"$instance\"}",
          "format": "time_series",
          "interval": "",
          "intervalFactor": 1,
          "legendFormat": "Heap Available - {{role}} - {{space}}",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Heap Available Detail",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "bytes",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "refresh": "30s",
  "schemaVersion": 32,
  "style": "dark",
  "tags": [
    "n8n"
  ],
  "templating": {
    "list": [
      {
        "allValue": null,
        "current": {
          "selected": true,
          "text": [
            "All"
          ],
          "value": [
            "$__all"
          ]
        },
        "datasource": "prometheus",
        "definition": "label_values(n8n_nodejs_version_info, instance)",
        "description": null,
        "error": null,
        "hide": 0,
        "includeAll": true,
        "label": "instance",
        "multi": true,
        "name": "instance",
        "options": [],
        "query": {
          "query": "label_values(n8n_nodejs_version_info, instance)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "tagValuesQuery": "",
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      }
    ]
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "timezone": "browser",
  "title": "N8N Application Dashboard",
  "uid": "PTSqcpJWk",
  "version": 1
}

Enhanced Prometheus Queries

tip

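Query Optimization: These queries apply rate() or irate() to counters and, in most cases, filter on the dashboard's $instance variable. Wrap them in an aggregation such as sum by (instance) when you need a single series per instance.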

System Performance Queries

# User CPU usage per instance, as a percentage of one core
irate(n8n_process_cpu_user_seconds_total{instance=~"$instance"}[2m]) * 100

# Memory Usage (per instance)
n8n_process_resident_memory_bytes{instance=~"$instance"}

# Event Loop Lag (critical for performance)
n8n_nodejs_eventloop_lag_seconds{instance=~"$instance"}

# Active Handles and Requests
n8n_nodejs_active_handles_total{instance=~"$instance"}
n8n_nodejs_active_requests_total{instance=~"$instance"}

Workflow Execution Queries

# Execution Rate (per instance)
rate(n8n_execution_total{instance=~"$instance"}[5m])

# Success Rate (per instance, in percent; evaluates to NaN while no executions occur)
(
  rate(n8n_execution_total{instance=~"$instance"}[5m]) -
  rate(n8n_execution_failed_total{instance=~"$instance"}[5m])
) / rate(n8n_execution_total{instance=~"$instance"}[5m]) * 100

# Execution Duration Percentiles
histogram_quantile(0.50, rate(n8n_execution_duration_seconds_bucket{instance=~"$instance"}[5m]))
histogram_quantile(0.95, rate(n8n_execution_duration_seconds_bucket{instance=~"$instance"}[5m]))
histogram_quantile(0.99, rate(n8n_execution_duration_seconds_bucket{instance=~"$instance"}[5m]))

# Error Rate
rate(n8n_execution_failed_total{instance=~"$instance"}[5m])

Queue Mode Queries

# Queue Depth by Status
n8n_queue_bull_queue_waiting{instance=~"$instance"}
n8n_queue_bull_queue_active{instance=~"$instance"}
n8n_queue_bull_queue_delayed{instance=~"$instance"}

# Queue Processing Rate
rate(n8n_queue_bull_queue_completed{instance=~"$instance"}[5m])
rate(n8n_queue_bull_queue_failed{instance=~"$instance"}[5m])

# Worker Utilization
n8n_queue_bull_queue_active{role="worker"} /
(n8n_queue_bull_queue_waiting{role="worker"} + n8n_queue_bull_queue_active{role="worker"}) * 100

# Queue Stuck Detection
n8n_queue_bull_queue_active{instance=~"$instance"} > 0 and
rate(n8n_queue_bull_queue_completed{instance=~"$instance"}[5m]) == 0
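The stuck-queue expression above can be turned into an alert so that a short pause between jobs does not page anyone. The following is a minimal sketch, assuming Prometheus Operator is installed and that your Prometheus instance selects rules labelled release: prometheus; the resource name, group name, alert name, 10m hold time, and severity label are illustrative assumptions, not values defined by the chart.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: n8n-queue-alerts          # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus           # assumption: must match your Prometheus ruleSelector
spec:
  groups:
    - name: n8n.queue
      rules:
        - alert: N8nQueueStuck    # hypothetical alert name
          # Same logic as the query above: jobs are active but none complete
          expr: |
            n8n_queue_bull_queue_active > 0
            and
            rate(n8n_queue_bull_queue_completed[5m]) == 0
          for: 10m                # assumption: tolerate up to 10 minutes without completions
          labels:
            severity: warning
          annotations:
            summary: "n8n queue on {{ $labels.instance }} has active jobs but no completions"
```

The for clause is the main tuning knob here: long-running workflows can legitimately keep a job active for several minutes, so set it above your longest expected execution time.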

API Performance Queries

# API Request Rate
rate(n8n_api_requests_total{instance=~"$instance"}[5m])

# API Response Time Percentiles
histogram_quantile(0.50, rate(n8n_api_request_duration_seconds_bucket{instance=~"$instance"}[5m]))
histogram_quantile(0.95, rate(n8n_api_request_duration_seconds_bucket{instance=~"$instance"}[5m]))

# API Error Rate
rate(n8n_api_requests_failed_total{instance=~"$instance"}[5m])

Memory and Heap Queries

# Heap Memory Usage
n8n_nodejs_heap_size_used_bytes{instance=~"$instance"}
n8n_nodejs_heap_size_total_bytes{instance=~"$instance"}

# Heap Space Details
n8n_nodejs_heap_space_size_used_bytes{instance=~"$instance"}
n8n_nodejs_heap_space_size_available_bytes{instance=~"$instance"}

# External Memory
n8n_nodejs_external_memory_bytes{instance=~"$instance"}

Advanced Analytics Queries

# Execution Trends (hourly)
increase(n8n_execution_total{instance=~"$instance"}[1h])

# Success Rate Trends (hourly)
(
  increase(n8n_execution_total{instance=~"$instance"}[1h]) -
  increase(n8n_execution_failed_total{instance=~"$instance"}[1h])
) / increase(n8n_execution_total{instance=~"$instance"}[1h]) * 100

# Queue Processing Efficiency
rate(n8n_queue_bull_queue_completed{instance=~"$instance"}[5m]) /
(n8n_queue_bull_queue_waiting{instance=~"$instance"} + n8n_queue_bull_queue_active{instance=~"$instance"})

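# Resource Utilization Score (mean of CPU % and memory %; resident memory can exceed the V8 heap total, so the result is not capped at 100)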
(
  irate(n8n_process_cpu_user_seconds_total{instance=~"$instance"}[2m]) * 100 +
  (n8n_process_resident_memory_bytes{instance=~"$instance"} /
   n8n_nodejs_heap_size_total_bytes{instance=~"$instance"}) * 100
) / 2

warning

Query Performance: Use appropriate time ranges and consider using recording rules for complex queries that are frequently executed.
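As a rough illustration of the recording-rule advice above, the sketch below precomputes the execution rate, the failure rate, and the 95th-percentile duration once per minute. It assumes Prometheus Operator and a rule selector matching release: prometheus; the resource name, group name, and recorded series names are hypothetical and follow the common level:metric:operations naming pattern.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: n8n-recording-rules       # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus           # assumption: must match your Prometheus ruleSelector
spec:
  groups:
    - name: n8n.recording
      interval: 1m                # evaluate once per minute
      rules:
        # Executions per second, one series per instance
        - record: instance:n8n_execution:rate5m
          expr: sum by (instance) (rate(n8n_execution_total[5m]))
        # Failed executions per second, one series per instance
        - record: instance:n8n_execution_failed:rate5m
          expr: sum by (instance) (rate(n8n_execution_failed_total[5m]))
        # 95th-percentile execution duration per instance
        - record: instance:n8n_execution_duration_seconds:p95_5m
          expr: |
            histogram_quantile(
              0.95,
              sum by (instance, le) (rate(n8n_execution_duration_seconds_bucket[5m]))
            )
```

A dashboard panel could then query instance:n8n_execution:rate5m{instance=~"$instance"} instead of recomputing the rate on every refresh.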

Sentry Integration

Basic Sentry Setup

sentry:
  enabled: true
  backendDsn: "https://your-sentry-dsn@sentry.io/project"
  frontendDsn: "https://your-sentry-dsn@sentry.io/project"

Advanced Sentry Configuration

sentry:
  enabled: true
  backendDsn: "https://your-sentry-dsn@sentry.io/project"
  frontendDsn: "https://your-sentry-dsn@sentry.io/project"
  externalTaskRunnersDsn: "https://your-sentry-dsn@sentry.io/project"

Database Monitoring

PostgreSQL Monitoring

# Enable PostgreSQL metrics
postgresql:
  enabled: true
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      labels:
        release: prometheus

Redis Monitoring (Queue Mode)

# Enable Redis metrics
redis:
  enabled: true
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      labels:
        release: prometheus

Storage Monitoring

Filesystem Storage Monitoring

# Monitor filesystem usage
main:
  extraContainers:
    - name: storage-monitor
      image: busybox
      command:
        - /bin/sh
        - -c
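        - |
          # Write the used-space percentage of /data to a file once a minute.
          # df -P keeps each filesystem on a single line, so tail/awk stay reliable.
          while true; do
            df -P /data | tail -1 | awk '{print $5}' | tr -d '%' > /tmp/disk-usage
            sleep 60
          done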
      volumeMounts:
        - name: n8n-binary-data
          mountPath: /data
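The sidecar above only writes the percentage to a file inside its own container, so Prometheus never sees it. If the data directory is backed by a PersistentVolumeClaim and your Prometheus already scrapes the kubelet (as kube-prometheus-stack does by default), the kubelet's volume statistics can be alerted on instead. This is a sketch under those assumptions; the PVC name pattern, the 85% threshold, and all resource and alert names are hypothetical, and some storage drivers do not report these statistics.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: n8n-storage-alerts        # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus           # assumption: must match your Prometheus ruleSelector
spec:
  groups:
    - name: n8n.storage
      rules:
        - alert: N8nVolumeAlmostFull   # hypothetical alert name
          # Used bytes divided by capacity, per PVC; adjust the regex to your PVC names
          expr: |
            kubelet_volume_stats_used_bytes{persistentvolumeclaim=~".*n8n.*"}
              /
            kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~".*n8n.*"}
              > 0.85
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is {{ $value | humanizePercentage }} full"
```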

Troubleshooting

Common Monitoring Issues

ServiceMonitor Not Scraping

# Check ServiceMonitor status
kubectl get servicemonitor -n monitoring

# Check Prometheus targets (after port-forwarding, open http://localhost:9090/targets)
kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring

# Check n8n metrics endpoint
kubectl exec -it <n8n-pod> -- curl -s http://localhost:5678/metrics
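A frequent cause of a ServiceMonitor that exists but is never scraped is a label mismatch: Prometheus Operator only picks up ServiceMonitors whose labels match the selector on the Prometheus resource. The excerpt below is a sketch of what to look for in the output of kubectl get prometheus -n monitoring -o yaml; the resource name is hypothetical, and the release: prometheus label is assumed to be the one set under serviceMonitor.labels in this guide.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus                # hypothetical name
  namespace: monitoring
spec:
  # ServiceMonitors must carry every label listed here to be selected
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
  # An empty selector matches all namespaces; if this field is omitted,
  # only ServiceMonitors in the Prometheus namespace are considered
  serviceMonitorNamespaceSelector: {}
```

If the selector uses a different label, either change serviceMonitor.labels in the n8n values to match it or adjust the selector on the Prometheus resource.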

High Memory Usage

# Check memory usage
kubectl top pods -l app.kubernetes.io/name=n8n

# Check memory limits
kubectl describe pod <n8n-pod>

# Search logs for memory-related messages (for example, out-of-memory errors)
kubectl logs <n8n-pod> | grep -i memory

High CPU Usage

# Check CPU usage
kubectl top pods -l app.kubernetes.io/name=n8n

# Search logs for CPU-related messages
kubectl logs <n8n-pod> | grep -i cpu

# Check execution metrics
kubectl exec -it <n8n-pod> -- curl -s http://localhost:5678/metrics | grep execution

Performance Optimization

Metrics Collection Optimization

serviceMonitor:
  enabled: true
  interval: 60s  # Increase interval for high-volume deployments
  timeout: 30s
  include:
    defaultMetrics: true
    cacheMetrics: false  # Disable if not needed
    messageEventBusMetrics: false
    queueMetrics: true

Logging Optimization

log:
  level: warn  # Log only warnings and errors in production
  output:
    - console
  scopes:
    - redis
    - scaling
  file:
    maxsize: 32  # Increase file size
    maxcount: "50"  # Reduce file count

Next Steps

Usage Guide - Quick start and basic deployment
Configuration Guide - Detailed configuration options
Database Setup - PostgreSQL and external database configuration
Queue Mode Setup - Distributed execution with Redis
Storage Configuration - Binary data storage options
Troubleshooting - Common issues and solutions

Best Practices

Monitoring Strategy

Start with basic metrics and expand gradually
Use appropriate alert thresholds
Monitor both application and infrastructure metrics
Set up dashboards for different user roles
Review and tune alerts regularly

Performance

Use appropriate scrape intervals
Filter metrics to reduce cardinality
Optimize Prometheus queries
Use recording rules for complex queries
Monitor monitoring system performance

Reliability

Set up monitoring for the monitoring system
Use multiple alerting channels
Test alerting rules regularly
Document alert procedures
Set up escalation policies

Security

Secure Prometheus endpoints
Use RBAC for monitoring access
Encrypt sensitive metrics
Audit monitoring access
Apply security updates regularly
