Temporal vs n8n vs Airflow Batch Processing

What You’ll Need

  • n8n Cloud or self-hosted n8n instance
  • Hetzner VPS or Contabo VPS for self-hosting Temporal/Airflow
  • DigitalOcean as an alternative hosting option
  • Docker and Docker Compose (free)
  • Python 3.9+ for Airflow workflows
  • Node.js 16+ for n8n custom nodes (optional)

Table of Contents

  1. Why Batch Processing Matters
  2. Understanding Temporal, n8n, and Airflow
  3. Temporal for Distributed Batch Jobs
  4. n8n’s Approach to Batch Processing
  5. Airflow: The Batch Processing Pioneer
  6. Head-to-Head Comparison
  7. Real-World Implementation Examples
  8. Getting Started

Why Batch Processing Matters

I’ve spent the last three years helping teams move data at scale, and I can tell you: batch processing is the backbone of serious data operations. Whether you’re processing daily user exports, running nightly reconciliations, or ingesting thousands of records from a data warehouse, your choice of orchestration tool can make or break your workflow efficiency.

The problem? Everyone claims their tool is best for batch processing. Temporal boasts distributed resilience. Airflow owns the data engineering space. And n8n promises low-code simplicity. So which one actually delivers when you need to process 100,000 records reliably?

This guide cuts through the marketing. I’m walking you through real batch scenarios, actual config code, and honest trade-offs so you can pick the right tool today—not waste six months on the wrong one tomorrow.


Understanding Temporal, n8n, and Airflow

Let me be direct: these three tools solve the same problem in fundamentally different ways.

Temporal is a distributed workflow engine that began as a fork of Uber’s Cadence project. It’s designed for long-running, fault-tolerant operations: workflow state is persisted as an event history and rebuilt by replay after a failure. Think: microservices orchestration at Netflix scale.

n8n is a visual workflow automation platform that’s perfect for connecting SaaS tools and APIs without code. You drag, drop, and deploy. It’s self-hostable (which I cover in depth in my Self-Hosted Workflow Automation vs Cloud Zapier Alternatives guide), making it enterprise-friendly.

Airflow is Apache’s open-source DAG (Directed Acyclic Graph) scheduler, built specifically for data engineering. It’s Python-first and has the most mature ecosystem for batch jobs.

For batch processing specifically, they occupy different lanes:

  • Temporal: Best for complex, distributed batch workflows with strict reliability needs
  • n8n: Best for mid-scale batch jobs connecting SaaS platforms
  • Airflow: Best for data engineering pipelines with complex dependencies

Temporal for Distributed Batch Jobs

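Temporal’s killer feature for batch processing is durability through replay. If your worker crashes mid-batch, Temporal rebuilds the workflow’s state from its recorded event history, so activities that already completed are not run again. Only the activity that was in flight when the crash happened is retried.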
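To make the replay idea concrete, here is a toy model in plain TypeScript. It is not Temporal's implementation or API: DurableContext, StepLog, and runBatch are hypothetical names, the steps are synchronous for brevity, and the "history" is just an in-memory array. It only illustrates why a step whose result was already recorded is not executed a second time.

```typescript
// Toy model of replay (an illustration, not Temporal's actual mechanism).
// The log holds the recorded result of each completed step, in order.
type StepLog = unknown[];

class DurableContext {
  private cursor = 0;
  private readonly log: StepLog;

  constructor(log: StepLog) {
    this.log = log;
  }

  // Execute a step once; on a later run, return the recorded result instead.
  step<T>(fn: () => T): T {
    const index = this.cursor++;
    if (index < this.log.length) {
      return this.log[index] as T; // already completed: do not re-execute
    }
    const result = fn();
    this.log.push(result); // record the result before moving on
    return result;
  }
}

// Hypothetical batch: three chunks, each processed as one recorded step.
function runBatch(log: StepLog, work: (chunk: number) => string): string[] {
  const ctx = new DurableContext(log);
  return [0, 1, 2].map((chunk) => ctx.step(() => work(chunk)));
}

const stepLog: StepLog = [];
const executedChunks: number[] = [];
let crashOnce = true;

// Simulated work that crashes the first time it reaches chunk 2.
function processChunk(chunk: number): string {
  if (chunk === 2 && crashOnce) {
    crashOnce = false;
    throw new Error('worker crashed');
  }
  executedChunks.push(chunk);
  return `chunk-${chunk}-done`;
}

try {
  runBatch(stepLog, processChunk);
} catch {
  // First run dies on chunk 2; chunks 0 and 1 are already in the log.
}

// Second run: chunks 0 and 1 come from the log, only chunk 2 executes.
const replayResults = runBatch(stepLog, processChunk);
console.log(replayResults, executedChunks);
```

Each chunk is executed exactly once across both runs, even though the batch function itself ran twice from the top.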

Here’s a real batch processor that handles customer data exports:

import * as wf from '@temporalio/workflow';
import type * as activities from './activities';

export async function customerExportBatch(batchId: string, customerIds: string[]) {
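  // Activity options: each activity call may run for up to 10 minutes,
  // and a failing call is attempted at most 3 times in total.
  const { processCustomerBatch, uploadToS3, notifyAdmin } = wf.proxyActivities<typeof activities>({
    startToCloseTimeout: '10 minutes',
    // The TypeScript SDK names this option `retry`, not `retryPolicy`.
    retry: {
      initialInterval: '5 seconds',
      maximumInterval: '1 minute',
      maximumAttempts: 3,
    },
  });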

  const chunkSize = 100;
  const chunks: string[][] = [];
  
  for (let i = 0; i < customerIds.length; i += chunkSize) {
    chunks.push(customerIds.slice(i, i + chunkSize));
  }

  const results: string[] = [];
  
  for (const chunk of chunks) {
    const result = await processCustomerBatch(batchId, chunk);
    results.push(result);
  }

  const s3Path = await uploadToS3(batchId, results);
  await notifyAdmin(batchId, s3Path);
  
  return { batchId, processedCount: customerIds.length, s3Path };
}
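To see what the retry settings in the workflow amount to, the sketch below computes the wait before each retry. It assumes exponential backoff with a coefficient of 2 (Temporal's documented default when none is set) and ignores everything else a real retry policy can carry. RetrySketch and retryDelaysMs are hypothetical names for illustration, not SDK types.

```typescript
// Hypothetical mirror of the retry settings used above, in milliseconds.
interface RetrySketch {
  initialIntervalMs: number;
  maximumIntervalMs: number;
  backoffCoefficient: number; // assumed to be 2 when not configured
  maximumAttempts: number; // counts the first attempt as well
}

// Returns the wait before each retry. With N attempts there are N - 1 waits.
function retryDelaysMs(p: RetrySketch): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < p.maximumAttempts - 1; retry++) {
    const raw = p.initialIntervalMs * Math.pow(p.backoffCoefficient, retry);
    delays.push(Math.min(raw, p.maximumIntervalMs)); // cap at the maximum interval
  }
  return delays;
}

// The values from the workflow: 5 s initial, 1 min cap, 3 attempts.
const workflowDelays = retryDelaysMs({
  initialIntervalMs: 5_000,
  maximumIntervalMs: 60_000,
  backoffCoefficient: 2,
  maximumAttempts: 3,
});
console.log(workflowDelays); // [5000, 10000]

// With 6 attempts the cap becomes visible on the last wait.
const longerDelays = retryDelaysMs({
  initialIntervalMs: 5_000,
  maximumIntervalMs: 60_000,
  backoffCoefficient: 2,
  maximumAttempts: 6,
});
console.log(longerDelays); // [5000, 10000, 20000, 40000, 60000]
```

Under these assumptions a chunk gets about 15 seconds of backoff in total before the workflow sees a failure, which is short for a database outage. Raising maximumAttempts is the simplest lever if that matters for your batch.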

The activity implementation:

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import * as fs from 'fs';

const s3 = new S3Client({ region: 'us-east-1' });

export async function processCustomerBatch(batchId: string, customerIds: string[]): Promise<string> {
  const exportData: Record<string, unknown>[] = [];
  
  for (const customerId of customerIds) {
    const customer = await fetchCustomerFromDB(customerId);
    exportData.push({
      id: customerId,
      name: customer.name,
      email: customer.email,
      createdAt: customer.createdAt,
      lastPurchase: customer.lastPurchase,
    });
  }

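  // Note: this writes to the worker's local disk, so uploadToS3 has to run on a
  // worker that can read the same filesystem.
  const fileName = `batch-${batchId}-${Date.now()}.json`;
  fs.writeFileSync(fileName, JSON.stringify(exportData, null, 2));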
  
  return fileName;
}

export async function uploadToS3(batchId: string, filePaths: string[]): Promise<string> {
  const s3Paths: string[] = [];
  
  for (const filePath of filePaths) {
    const fileContent = fs.readFileSync(filePath);
    const key = `exports/${batchId}/${filePath}`;
    
    const command = new PutObjectCommand({
      Bucket: 'your-bucket-name',
      Key: key,
      Body: fileContent,
    });
    
    await s3.send(command);
    s3Paths.push(`s3://your-bucket-name/${key}`);
    fs.unlinkSync(filePath);
  }
  
  return s3Paths.join(', ');
}

export async function notifyAdmin(batchId: string, s3Path: string): Promise<void> {
  console.log(`Batch ${batchId} completed. Files available at: ${s3Path}`);
}

// Stub standing in for a real database lookup.
async function fetchCustomerFromDB(customerId: string): Promise<Record<string, unknown>> {
  return {
    name: `Customer ${customerId}`,
    email: `customer${customerId}@example.com`,
    createdAt: new Date().toISOString(),
    lastPurchase: new Date(Date.now() - 86400000).toISOString(),
  };
}

To run this, you deploy with a Worker:

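import { NativeConnection, Worker } from '@temporalio/worker';
import * as activities from './activities';

async function run() {
  // Worker.create expects an established NativeConnection, not a plain options object.
  const connection = await NativeConnection.connect({
    address: 'temporal.example.com:7233',
  });

  const worker = await Worker.create({
    connection,
    workflowsPath: require.resolve('./workflows'),
    activities,
    taskQueue: 'batch-processing',
  });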

  await worker.run();
}

run().catch(err => {
  console.error('Worker failed:', err);
  process.exit(1);
});

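Temporal shines here because a failure doesn’t throw away finished work. If the database call fails on customer #7,500, the activity for that 100-customer chunk is retried, and if the worker itself dies, a restarted worker replays the history and resumes at the same chunk. The 74 chunks that already completed are not run again. Activities execute at least once, though, so the chunk that was in flight can be processed twice; keep processCustomerBatch idempotent if duplicates matter.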
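Because a retried activity restarts its chunk from the beginning, the usual way to avoid duplicates is to make the chunk processing idempotent. The sketch below shows one way to do that, under loud assumptions: exportedRows is an in-memory stand-in for a durable store keyed by customer ID, the failure is injected by hand, and runWithRetries is a minimal loop standing in for Temporal's activity retries. None of these names come from the Temporal SDK.

```typescript
// Hypothetical stand-in for a durable store, e.g. a table keyed by customer ID.
const exportedRows = new Map<string, { id: string; attempt: number }>();

// Inject a single transient failure on this customer.
let failAtId: string | null = 'c3';

// Process a chunk, skipping customers that an earlier attempt already wrote.
function processChunkIdempotently(customerIds: string[], attempt: number): void {
  for (const id of customerIds) {
    if (exportedRows.has(id)) {
      continue; // written by a previous attempt: skip, do not duplicate
    }
    if (id === failAtId) {
      failAtId = null; // fail only once
      throw new Error(`transient failure on ${id}`);
    }
    exportedRows.set(id, { id, attempt });
  }
}

// Minimal retry loop standing in for activity retries. Returns attempts used.
function runWithRetries(customerIds: string[], maxAttempts: number): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      processChunkIdempotently(customerIds, attempt);
      return attempt;
    } catch (err) {
      if (attempt === maxAttempts) {
        throw err; // out of attempts: surface the failure
      }
    }
  }
  throw new Error('maxAttempts must be at least 1');
}

const attemptsUsed = runWithRetries(['c1', 'c2', 'c3', 'c4'], 3);
console.log(attemptsUsed, [...exportedRows.values()]);
```

The first attempt writes c1 and c2 and then fails on c3. The second attempt skips c1 and c2 and writes only c3 and c4, so every customer ends up stored exactly once. In a real activity the same effect usually comes from an upsert or a uniqueness constraint rather than a lookup before each write.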


n8n’s Approach to Batch Processing

n8n handles batch processing differently: it’s about connecting existing systems and transforming data visually. You’re not writing orchestration logic—you’re wiring integrations.

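Here’s a simplified workflow JSON for a batch job that exports Stripe customers to Postgres. Parameter names vary between node versions, so treat it as a sketch of the structure rather than a file to import as-is: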

{
  "nodes": [
    {
      "parameters": {
        "resource": "customer",
        "operation": "getAll",
        "returnAll": true,
        "options": {}
      },
      "id": "stripe-fetch",
      "name": "Stripe Get All Customers",
      "type": "n8n-nodes-base.stripe",
      "typeVersion": 1,
      "position": [250, 300]
    },
    {
      "parameters": {
        "mode": "runOnceForAllItems",
        "expression": "={\n  \"id\": $json.id,\n  \"name\": $json.name,\n  \"email\": $json.email,\n  \"created\": $json.created,\n  \"phone\": $json.phone\n}"
      },
      "id": "transform-data",
      "name": "Transform to DB Schema",
      "type": "n8n-nodes-base.set",
      "typeVersion": 3.5,
      "position": [550, 300]
    },
    {
      "parameters": {
        "host": "postgres.example.com",
        "database": "analytics",
        "user": "batch_user",
        "password": "={{ $env.POSTGRES_PASS }}",
        "ssl": true,
        "port": 5432
      },
      "id": "postgres-config",
      "name": "Postgres Connection",
      "type": "n8n-nodes-base.postgres",
      "typeVersion": 2.4,
      "position": [850, 300],
      "credentials": {
        "postgresCredential": "postgres_prod"
      }
    },
    {
      "parameters": {
        "query": "INSERT INTO customers (id, name, email, created, phone, synced_at) VALUES ($1, $2, $3, to_timestamp($4), $5, NOW()) ON CONFLICT (id) DO UPDATE SET name=$2, email=$3, phone=$5, synced_at=NOW();",
        "queryParams": "=id, name, email, created, phone"
      },
      "id": "upsert-customers",
      "name": "Upsert Customers",
      "type": "n8n-nodes-base.postgres",
      "typeVersion": 2.4,
      "position": [850, 450]
    },
    {
      "parameters": {
        "text": "=Batch completed. Synced {{ $('Stripe Get All Customers').all().length }} customers to Postgres.",
        "sendTo": "channel",
        "channel": "batch-logs"
      },
      "id": "slack-notify",
      "name": "Notify Slack",
      "type": "n8n-nodes-base.slack",
      "typeVersion": 2.1,
      "position": [1150, 300],
      "credentials": {
        "slackApi": "slack_workspace"
      }
    }
  ],
  "connections": {
    "Stripe Get All Customers": {
      "main": [
        [
          {
            "node": "Transform to DB Schema",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Transform to DB Schema": {
      "main": [
        [
          {
            "node": "Upsert Customers",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Upsert Customers": {
      "main": [
        [
          {
            "node": "Notify Slack",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}
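The transform and upsert steps in that workflow boil down to mapping each Stripe customer onto the five positional parameters of the INSERT statement. Here is the same mapping as a short TypeScript sketch, which can help when you debug the node configuration. StripeCustomerSketch and toUpsertParams are hypothetical names, and the interface covers only the fields this workflow uses.

```typescript
// Subset of a Stripe customer object. `created` is a Unix timestamp in seconds,
// which is why the SQL wraps $4 in to_timestamp().
interface StripeCustomerSketch {
  id: string;
  name: string | null;
  email: string | null;
  created: number;
  phone: string | null;
}

// Positional parameters in the order the query expects: $1..$5.
type UpsertParams = [string, string | null, string | null, number, string | null];

function toUpsertParams(c: StripeCustomerSketch): UpsertParams {
  return [c.id, c.name, c.email, c.created, c.phone];
}

// What to_timestamp($4) yields, expressed in JavaScript terms.
function createdAsIso(c: StripeCustomerSketch): string {
  return new Date(c.created * 1000).toISOString();
}

// Made-up sample record for illustration.
const sampleCustomer: StripeCustomerSketch = {
  id: 'cus_123',
  name: 'Ada Example',
  email: 'ada@example.com',
  created: 1700000000,
  phone: null,
};

const upsertParams = toUpsertParams(sampleCustomer);
console.log(upsertParams);
console.log(createdAsIso(sampleCustomer)); // 2023-11-14T22:13:20.000Z
```

Keeping the parameter order in one place matters because the query also reuses $2, $3, and $5 in its ON CONFLICT clause, so a swapped position corrupts both the insert and the update path.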

To schedule this daily, put n8n’s Schedule Trigger node in front of the Stripe node.

