n8n Troubleshooting Guide

This guide covers common issues and solutions when deploying n8n using the Helm chart.

info

Troubleshooting Approach: Always start with the quick diagnostics section to gather basic information about your deployment before diving into specific issues.

Quick Diagnostics

tip

Diagnostic Order: For systematic troubleshooting, work through the checks in this sequence: Pod Status → Services → ConfigMaps/Secrets → Application Logs.

Check Pod Status

# Check all n8n-related pods
kubectl get pods -l app.kubernetes.io/name=n8n

# Check pod details
kubectl describe pod <pod-name>

# Check pod logs
kubectl logs <pod-name>

Check Services

# Check service status
kubectl get svc -l app.kubernetes.io/name=n8n

# Check service endpoints
kubectl get endpoints -l app.kubernetes.io/name=n8n

Check ConfigMaps and Secrets

# Check ConfigMaps
kubectl get configmap -l app.kubernetes.io/name=n8n

# Check Secrets
kubectl get secret -l app.kubernetes.io/name=n8n
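
All of the above can be run in one pass. A minimal sketch, assuming the default app.kubernetes.io/name=n8n label selector (adjust it to match your release):

#!/bin/sh
# One-pass n8n diagnostics: pods, services, endpoints, and config objects
SEL="app.kubernetes.io/name=n8n"
kubectl get pods -l "$SEL" -o wide
kubectl get svc,endpoints -l "$SEL"
kubectl get configmap,secret -l "$SEL"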

Common Issues

warning

Issue Resolution: Always check the logs first when troubleshooting. Most issues can be identified from application logs.

Pod Startup Issues

Pod Stuck in Pending State

Symptoms:

  • Pod remains in Pending state
  • Pod description shows FailedScheduling events (or, occasionally, no events at all)

Solutions:

  1. Check Resource Requests:
kubectl describe pod <pod-name> | grep -A 10 "Events:"
  2. Check Node Resources:
kubectl top nodes
kubectl describe node <node-name>
  3. Check Storage:
kubectl get pvc
kubectl describe pvc <pvc-name>
note

Resource Issues: Pods stuck in Pending state are usually due to insufficient cluster resources or storage issues.
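
To surface the scheduler's reasoning directly, list recent FailedScheduling events (standard kubectl, no chart-specific assumptions):

# Show why pods could not be scheduled, newest last
kubectl get events --field-selector reason=FailedScheduling --sort-by=.lastTimestamp

The event message names the failing constraint, e.g. insufficient cpu/memory or an unbound PersistentVolumeClaim.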

Pod Stuck in CrashLoopBackOff

Symptoms:

  • Pod repeatedly crashes and restarts
  • Exit code 1 (application error) or 137 (OOMKilled)

Solutions:

  1. Check Application Logs:
kubectl logs <pod-name> --previous
  2. Check Resource Limits:
kubectl describe pod <pod-name> | grep -A 5 "Containers:"
  3. Check Configuration:
kubectl get configmap <configmap-name> -o yaml
danger

CrashLoopBackOff: This indicates a serious configuration or resource issue. Check logs immediately and verify all configuration values.
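
A quick way to see why the container last died is to read its recorded termination state (the index 0 assumes a single-container pod):

# Print the reason for the last container termination
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

OOMKilled points at memory limits; Error means the application exited on its own and the logs hold the answer.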

Database Connection Issues

warning

Database Issues: Database connection problems are among the most common issues. Always verify database credentials and connectivity.

PostgreSQL Connection Failures

Symptoms:

  • Database connection timeout errors
  • Authentication failures
  • Cloud SQL Proxy sidecar errors (if using GKE/Cloud SQL)

Solutions:

  1. Check Database Pod:
kubectl get pods -l app.kubernetes.io/name=postgresql
kubectl logs -l app.kubernetes.io/name=postgresql
  2. Test Database Connection:
kubectl exec -it <n8n-pod> -- nc -zv <db-host> <db-port>
  3. Check Database Credentials (a decoding sketch follows this list):
kubectl get secret <db-secret> -o yaml
  4. Verify Database Configuration:
# Check values.yaml
db:
  type: postgresdb
  logging:
    enabled: true
    options: error
    maxQueryExecutionTime: 1000

postgresql:
  enabled: true
  auth:
    username: n8n
    password: your-secure-password
    database: n8n
  5. If using Cloud SQL Proxy (GKE/Cloud SQL):
    • Check the Cloud SQL Proxy sidecar logs:
      kubectl logs <n8n-pod> -c cloudsql-proxy
    • Ensure the service account has the Cloud SQL Client role.
    • Verify the instance connection name is correct.
    • Make sure the proxy port matches externalPostgresql.port.
    • See the Cloud SQL Proxy docs for more troubleshooting.
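
Note that -o yaml shows secret values base64-encoded. A quick sanity check is to decode the stored password and compare it with what the database expects (this assumes the secret keeps it under a key named password; adjust to your secret's layout):

# Decode the database password from the secret
kubectl get secret <db-secret> -o jsonpath='{.data.password}' | base64 -d; echo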

SQLite Issues

Symptoms:

  • Database lock errors
  • Permission denied errors

Solutions:

  1. Check File Permissions:
kubectl exec -it <n8n-pod> -- ls -la /home/node/.n8n/
  2. Check Disk Space:
kubectl exec -it <n8n-pod> -- df -h
  3. Check SQLite Configuration (enable vacuum):
db:
  type: sqlite
  sqlite:
    database: "database.sqlite"
    poolSize: 0
    vacuum: true
tip

SQLite Debugging: Enable SQLite logging and VACUUM operations to identify and resolve database issues.
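
If the image ships the sqlite3 CLI (this guide uses it below to inspect tables), you can also check and compact the database directly. Run these while n8n is idle, since VACUUM takes an exclusive lock:

# Verify database integrity
kubectl exec -it <n8n-pod> -- sqlite3 /home/node/.n8n/database.sqlite "PRAGMA integrity_check;"

# Rebuild the database file to reclaim space
kubectl exec -it <n8n-pod> -- sqlite3 /home/node/.n8n/database.sqlite "VACUUM;"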

Queue Mode Issues

info

Queue Mode: Queue mode issues often involve Redis connectivity or configuration problems. Verify Redis is running and accessible.

Redis Connection Problems

Symptoms:

  • Queue mode not working
  • Redis connection errors

Solutions:

  1. Check Redis Pod:
kubectl get pods -l app.kubernetes.io/name=redis
kubectl logs -l app.kubernetes.io/name=redis
  2. Test Redis Connection (an authenticated variant follows this list):
kubectl exec -it <n8n-pod> -- redis-cli -h <redis-host> ping
  3. Check Redis Configuration:
redis:
  enabled: true
  architecture: standalone
  master:
    persistence:
      enabled: true
      size: 5Gi
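
If Redis auth is enabled, a plain ping fails with NOAUTH. Assuming the bundled Bitnami Redis (whose image includes redis-cli), test with the password from inside the Redis pod:

# Authenticated connectivity test; expect PONG
kubectl exec -it <redis-pod> -- redis-cli -a <redis-password> ping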

Worker Node Issues

Symptoms:

  • Workers not processing jobs
  • Queue backlog

Solutions:

  1. Check Worker Pods:
kubectl get pods -l app.kubernetes.io/component=worker
kubectl logs -l app.kubernetes.io/component=worker
  2. Check Worker Configuration:
worker:
  mode: queue
  count: 2
  concurrency: 10
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi
  3. Check Queue Status:
kubectl exec -it <n8n-pod> -- curl -s http://localhost:5678/metrics | grep queue
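
To see whether a backlog is draining or growing, sample the queue metrics repeatedly. A sketch, assuming metrics are enabled in the chart (e.g. via serviceMonitor) so that /metrics is served:

# Re-run the queue metrics check every 10 seconds
watch -n 10 "kubectl exec <n8n-pod> -- curl -s http://localhost:5678/metrics | grep -i queue"

If the backlog climbs while workers sit idle, check worker concurrency and Redis connectivity before scaling up.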

MCP and Form Endpoint Issues

info

Advanced Endpoints: MCP and Form endpoints are only available in queue mode with a PostgreSQL database. Verify that your configuration meets these requirements.

MCP Endpoint Problems

Symptoms:

  • MCP endpoints not accessible
  • AI model integration failures
  • MCP authentication errors

Solutions:

  1. Verify Queue Mode Configuration:
# Ensure queue mode is enabled
webhook:
  mode: queue
  url: "https://yourdomain.com"

# Ensure PostgreSQL is configured
db:
  type: postgresdb
  2. Check MCP Endpoint Accessibility:
# Test MCP endpoint
curl -I https://yourdomain.com/mcp/

# Test MCP test endpoint
curl -I https://yourdomain.com/mcp-test/
  3. Check Ingress Configuration:
# Verify ingress includes MCP paths
kubectl get ingress -o yaml | grep -A 10 -B 5 mcp
  4. Check Webhook Node Logs:
# MCP endpoints are handled by webhook nodes
kubectl logs -l app.kubernetes.io/component=webhook
  5. Verify MCP Authentication:
# If using MCP authentication
ingress:
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: mcp-auth
warning

MCP Requirements: MCP endpoints require a PostgreSQL database and queue mode. They are not available with SQLite or single-node deployments.
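
Both requirements can be confirmed against the live release in one step. A sketch, assuming your Helm release name (--all includes computed defaults):

# Expect mode: queue under webhook and type: postgresdb under db
helm get values <release-name> --all | grep -E "mode:|type:"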

Form Endpoint Problems

Symptoms:

  • Form endpoints not accessible
  • Form submission failures
  • Form waiting workflows not working

Solutions:

  1. Verify Form Endpoint Configuration:
# Ensure queue mode is enabled
webhook:
  mode: queue
  url: "https://yourdomain.com"
  2. Check Form Endpoint Accessibility:
# Test form endpoint
curl -I https://yourdomain.com/form/

# Test form test endpoint
curl -I https://yourdomain.com/form-test/

# Test form waiting endpoint
curl -I https://yourdomain.com/form-waiting/
  3. Check Form Trigger Node Configuration:
# Verify the form trigger is properly configured in the n8n UI
# Check workflow execution logs
kubectl logs -l app.kubernetes.io/name=n8n | grep -i form
  4. Test Form Submission:
# Test form submission
curl -X POST https://yourdomain.com/form/ \
  -H "Content-Type: application/json" \
  -d '{"test": "data"}'
  5. Check Webhook Node Status:
# Form endpoints are handled by webhook nodes
kubectl get pods -l app.kubernetes.io/component=webhook
kubectl logs -l app.kubernetes.io/component=webhook
tip

Form Testing: Use the /form-test/ endpoint to test form functionality before deploying to production.

Endpoint Routing Issues

Symptoms:

  • Endpoints routed to wrong nodes
  • Performance issues with specific endpoints

Solutions:

  1. Verify Endpoint Routing:
# Check which nodes handle which endpoints
kubectl get pods -l app.kubernetes.io/name=n8n -o wide
  2. Check Service Configuration:
# Verify service endpoints
kubectl get endpoints -l app.kubernetes.io/name=n8n
  3. Monitor Endpoint Performance:
# Check metrics for endpoint performance
kubectl exec -it <n8n-pod> -- curl -s http://localhost:5678/metrics | grep api
info

Routing Logic: In queue mode, test endpoints (/mcp-test/, /form-test/) are handled by main nodes, while production endpoints (/mcp/, /form/) are handled by webhook nodes.
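
A practical way to confirm this routing is to hit a test endpoint and see which component logs the request. The component=main label below is an assumption; use whatever component labels your release applies:

# Trigger a test request, then check which component handled it
curl -s -o /dev/null https://yourdomain.com/form-test/
kubectl logs -l app.kubernetes.io/component=main --since=2m | grep -i form
kubectl logs -l app.kubernetes.io/component=webhook --since=2m | grep -i form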

Persistence Issues

info

Enhanced Features: The chart supports comprehensive persistence for all node types. This feature requires correct configuration to work; the checks below cover the most common failure points.

Common Persistence Issues

  • PVC not bound
  • Volume mount errors
  • Data loss after pod restarts

Example Checks

# Check PVC status
kubectl get pvc -l app.kubernetes.io/name=n8n

# Check volume mounts in pods
kubectl describe pod <n8n-pod> | grep -A 10 "Volumes:"

# Check disk space
kubectl exec -it <n8n-pod> -- df -h

Storage Issues

S3 Connection Problems

Symptoms:

  • Binary data upload failures
  • S3 authentication errors

Solutions:

  1. Check S3 Credentials:
kubectl get secret <s3-secret> -o yaml
  2. Test S3 Connectivity (a bucket-level check follows this list):
kubectl exec -it <n8n-pod> -- curl -I https://s3.amazonaws.com
  3. Check S3 Configuration:
binaryData:
  mode: "s3"
  s3:
    host: s3.amazonaws.com
    bucketName: n8n-binary-data
    bucketRegion: us-east-1
    accessKey: your-access-key
    accessSecret: your-secret-key
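
The generic endpoint check above only proves outbound HTTPS works. To exercise the bucket itself, request its virtual-hosted-style URL (substitute your bucket and region; a 403 response still proves DNS and routing, whereas a connection failure points at network policy or egress rules):

# Bucket-level reachability test
kubectl exec -it <n8n-pod> -- curl -I https://<bucket-name>.s3.<region>.amazonaws.com/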

Filesystem Storage Issues

Symptoms:

  • Permission denied errors
  • Disk space issues

Solutions:

  1. Check Volume Permissions:
kubectl exec -it <n8n-pod> -- ls -la /data
  2. Check Disk Space:
kubectl exec -it <n8n-pod> -- df -h
  3. Fix Permissions:
kubectl exec -it <n8n-pod> -- chown -R 1000:1000 /data
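
The chown above is a one-off fix: it is lost when the pod is replaced and fails if the container does not run as root. A more durable sketch, assuming the chart exposes a pod security context for the main node, is to let Kubernetes set group ownership via fsGroup:

# Make mounted volumes group-owned by the n8n user's group (uid/gid 1000)
main:
  securityContext:
    fsGroup: 1000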

Network Issues

Ingress Problems

Symptoms:

  • Cannot access n8n from outside
  • 404 or 502 errors

Solutions:

  1. Check Ingress Status:
kubectl get ingress
kubectl describe ingress <ingress-name>
  2. Check Ingress Controller:
kubectl get pods -n ingress-nginx
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx
  3. Test Service:
kubectl port-forward svc/<n8n-service> 8080:5678
curl http://localhost:8080

Service Connectivity

Symptoms:

  • Pods cannot communicate
  • Service discovery issues

Solutions:

  1. Check Service Endpoints:
kubectl get endpoints <service-name>
  2. Test Pod-to-Pod Communication:
kubectl exec -it <pod1> -- nc -zv <pod2-ip> <port>
  3. Check Network Policies:
kubectl get networkpolicy
kubectl describe networkpolicy <policy-name>

Performance Issues

High Memory Usage

Symptoms:

  • Pods being OOM killed
  • Slow response times

Solutions:

  1. Check Memory Usage:
kubectl top pods -l app.kubernetes.io/name=n8n
  2. Increase Memory Limits:
main:
  resources:
    requests:
      memory: 512Mi
    limits:
      memory: 2Gi
  3. Enable Memory Monitoring:
serviceMonitor:
  enabled: true
  include:
    defaultMetrics: true

High CPU Usage

Symptoms:

  • Slow workflow execution
  • High CPU utilization

Solutions:

  1. Check CPU Usage:
kubectl top pods -l app.kubernetes.io/name=n8n
  2. Increase CPU Limits:
main:
  resources:
    requests:
      cpu: 500m
    limits:
      cpu: 2000m
  3. Scale Workers:
worker:
  mode: queue
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
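
When autoscaling is enabled, also verify that the HorizontalPodAutoscaler sees metrics; targets showing <unknown> usually mean the metrics server is missing:

# Check autoscaler status and recent scaling events
kubectl get hpa
kubectl describe hpa <hpa-name>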

Debugging Commands

General Debugging

# Get detailed pod information
kubectl describe pod <pod-name>

# Get pod logs with timestamps
kubectl logs <pod-name> --timestamps

# Get logs from previous container
kubectl logs <pod-name> --previous

# Execute commands in pod
kubectl exec -it <pod-name> -- /bin/sh

# Check environment variables
kubectl exec -it <pod-name> -- env | grep N8N

Database Debugging

# Connect to PostgreSQL
kubectl exec -it <postgres-pod> -- psql -U n8n -d n8n

# Check SQLite database
kubectl exec -it <n8n-pod> -- sqlite3 /home/node/.n8n/database.sqlite ".tables"

# Check database logs
kubectl logs <n8n-pod> | grep -i database

Network Debugging

# Check DNS resolution
kubectl exec -it <pod-name> -- nslookup <service-name>

# Test network connectivity
kubectl exec -it <pod-name> -- nc -zv <host> <port>

# Check network policies
kubectl get networkpolicy
kubectl describe networkpolicy <policy-name>

Storage Debugging

# Check volume mounts
kubectl exec -it <pod-name> -- mount | grep n8n

# Check file permissions
kubectl exec -it <pod-name> -- ls -la /data

# Check disk usage
kubectl exec -it <pod-name> -- df -h

Affinity Debugging

tip

Affinity Issues: Pod affinity and anti-affinity rules can cause scheduling issues. Check affinity configuration when pods are stuck in Pending state.

Check Affinity Configuration

# Check current affinity settings
kubectl get pod <pod-name> -o yaml | grep -A 20 affinity

# Check node labels
kubectl get nodes --show-labels

# Check pod labels
kubectl get pods -l app.kubernetes.io/name=n8n --show-labels

Common Affinity Issues

Symptoms:

  • Pods stuck in Pending state
  • "0/1 nodes are available" errors
  • Scheduling failures due to affinity rules

Solutions:

  1. Check Node Availability:
# Check if nodes match affinity requirements
kubectl get nodes -l node-type=compute-optimized
kubectl get nodes -l storage-type=ssd
  2. Verify Topology Keys:
# Check available topology keys
kubectl get nodes -o jsonpath='{.items[*].metadata.labels.topology\.kubernetes\.io/zone}' | tr ' ' '\n' | sort | uniq
kubectl get nodes -o jsonpath='{.items[*].metadata.labels.kubernetes\.io/hostname}' | tr ' ' '\n' | sort | uniq
  3. Check Affinity Rules:
# Example: Debug affinity configuration
main:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-type
                operator: In
                values:
                  - compute-optimized # Verify this label exists
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: n8n
          topologyKey: kubernetes.io/hostname # Verify this topology key exists
  4. Troubleshoot Anti-Affinity Conflicts:
# Check if anti-affinity rules are too restrictive
kubectl get pods -l app.kubernetes.io/name=n8n -o wide

# Check node capacity
kubectl describe node <node-name> | grep -A 10 "Allocated resources"

Affinity Configuration Examples

Fix: Relax Anti-Affinity Rules

# Change from required to preferred
worker:
  mode: queue
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution: # Changed from requiredDuringSchedulingIgnoredDuringExecution
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: n8n
                app.kubernetes.io/component: worker
            topologyKey: kubernetes.io/hostname

Fix: Use Correct Topology Keys

# Use available topology keys
worker:
  mode: queue
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: n8n
                app.kubernetes.io/component: worker
            topologyKey: topology.kubernetes.io/zone # Use available topology key
warning

Deprecation Notice: The top-level affinity field is deprecated. Use the specific affinity configurations under main, worker, and webhook blocks instead.

info

Affinity Debugging: When troubleshooting scheduling issues, always check if affinity rules are preventing pod placement and verify that required node labels and topology keys exist.

Log Analysis

Common Log Patterns

Database Connection Errors

Error: connect ECONNREFUSED
Error: timeout acquiring a connection
Error: authentication failed

Queue Mode Errors

Error: Redis connection failed
Error: Queue processing failed
Error: Worker not responding

Storage Errors

Error: S3 upload failed
Error: Permission denied
Error: Disk space full

Log Filtering

# Filter for errors
kubectl logs <pod-name> | grep -i error

# Filter for specific components
kubectl logs <pod-name> | grep -i database
kubectl logs <pod-name> | grep -i redis
kubectl logs <pod-name> | grep -i s3

# Filter by time
kubectl logs <pod-name> --since=1h
kubectl logs <pod-name> --since-time="2024-01-01T00:00:00Z"

Recovery Procedures

Pod Recovery

# Restart deployment
kubectl rollout restart deployment <deployment-name>

# Scale down and up
kubectl scale deployment <deployment-name> --replicas=0
kubectl scale deployment <deployment-name> --replicas=1

# Delete and recreate pod
kubectl delete pod <pod-name>

Database Recovery

# Backup database
kubectl exec -it <n8n-pod> -- pg_dump -h <db-host> -U n8n -d n8n > backup.sql

# Restore database
kubectl exec -it <n8n-pod> -- psql -h <db-host> -U n8n -d n8n < backup.sql

Configuration Recovery

# Check current configuration
kubectl get configmap <configmap-name> -o yaml

# Update configuration
kubectl patch configmap <configmap-name> --patch '{"data":{"key":"value"}}'

# Restart to apply changes
kubectl rollout restart deployment <deployment-name>

Prevention

Best Practices

  1. Resource Planning:

    • Set appropriate resource requests and limits
    • Monitor resource usage regularly
    • Plan for scaling needs
  2. Monitoring:

    • Enable comprehensive monitoring
    • Set up alerting for critical metrics
    • Regular log analysis
  3. Backup Strategy:

    • Regular database backups
    • Configuration backups
    • Disaster recovery testing
  4. Security:

    • Use Kubernetes secrets for sensitive data
    • Enable RBAC
    • Regular security updates
  5. Documentation:

    • Document custom configurations
    • Maintain runbooks for common issues
    • Regular review and updates

Getting Help

Support Information

When seeking help, please provide:

  1. Environment Details:

    • Kubernetes version
    • Helm version
    • Chart version
    • n8n version
  2. Configuration:

    • Relevant parts of values.yaml
    • Custom configurations
  3. Error Information:

    • Complete error messages
    • Pod logs
    • Events from kubectl describe
  4. Steps to Reproduce:

    • Exact commands used
    • Expected vs actual behavior
