Airbyte vs n8n vs Fivetran: ETL Pipelines

The Core Differences

I’ve been building data pipelines for five years now, and the “which ETL tool” question keeps coming up. The answer? It depends—but let me break down why these three platforms diverge in ways that matter.

Airbyte is the open-source movement’s answer to expensive proprietary ETL. It’s built for data engineers who want control and transparency. n8n started as workflow automation but evolved into a capable ETL platform with a visual builder that makes complex logic accessible. Fivetran is the enterprise play—fully managed, pre-built connectors, and you pay for convenience.

Here’s the practical reality: if you’re bootstrapped or learning, Airbyte or n8n are your friends. If your company budgeted six figures for data infrastructure, Fivetran’s your bet. But cost alone isn’t the story. Let me walk you through what actually matters.

Airbyte Deep Dive

Airbyte launched in 2020 and quickly became the darling of the self-hosted ETL crowd. Why? It solved a real problem: data connectors are a nightmare to build from scratch.

I deployed Airbyte on a Hetzner VPS last year and was shocked at how straightforward the connector library is. Out of the box, you get 300+ pre-built connectors for everything from Stripe to Salesforce to PostgreSQL.

The architecture is clean: a Docker-based Airbyte server runs on your infrastructure, manages jobs, and stores metadata. You point it at source and destination databases, define sync schedules, and it handles the rest.
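To make the source, destination, and schedule model concrete, here is a toy sketch in Python. It is not Airbyte's API: the `Connection` class and `due_connections` function are hypothetical names invented for illustration, and the scheduling rule (a fixed interval since the last sync) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Connection:
    """Hypothetical stand-in for a connection: source -> destination on a schedule."""
    name: str
    source: str
    destination: str
    sync_every: timedelta
    last_sync: Optional[datetime] = None


def due_connections(connections: List[Connection], now: datetime) -> List[str]:
    """Return names of connections whose interval has elapsed or that never ran."""
    due = []
    for conn in connections:
        if conn.last_sync is None or now - conn.last_sync >= conn.sync_every:
            due.append(conn.name)
    return due


now = datetime(2024, 1, 1, 12, 0)
connections = [
    Connection("stripe_to_pg", "Stripe", "PostgreSQL", timedelta(hours=1), now - timedelta(minutes=90)),
    Connection("sf_to_pg", "Salesforce", "PostgreSQL", timedelta(hours=24), now - timedelta(hours=2)),
    Connection("pg_to_pg", "PostgreSQL", "PostgreSQL", timedelta(hours=6)),
]
print(due_connections(connections, now))
```

The point is only the shape of the data: each connection pairs one source with one destination, and the server decides when a sync is due.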

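Here’s what a minimal Airbyte Docker setup looked like when I deployed it (newer Airbyte releases have moved away from Docker Compose toward the abctl installer, so check the current docs before copying this):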

git clone https://github.com/airbytehq/airbyte.git
cd airbyte
docker-compose up

That’s it. Airbyte runs on localhost:8000 and you configure everything via the web UI. But here’s where it gets interesting for developers: you can define custom connectors in Python.

Let me show you a custom Python connector skeleton:

import time
from typing import Any, Iterable, Mapping, Optional

import psycopg2
from psycopg2 import sql
from airbyte_cdk.models import (
    AirbyteCatalog,
    AirbyteConnectionStatus,
    AirbyteMessage,
    AirbyteRecordMessage,
    AirbyteStream,
    ConfiguredAirbyteCatalog,
    Status,
    SyncMode,
    Type,
)
from airbyte_cdk.sources import Source


def _connect(config: Mapping[str, Any]):
    """Open a psycopg2 connection from the connector config."""
    return psycopg2.connect(
        host=config["host"],
        port=config["port"],
        user=config["user"],
        password=config["password"],
        dbname=config["database"],
    )


class CustomPostgresConnector(Source):
    def check(self, logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus:
        # Connection status uses the Status enum; Type is for message types
        try:
            _connect(config).close()
            return AirbyteConnectionStatus(status=Status.SUCCEEDED)
        except psycopg2.Error as e:
            return AirbyteConnectionStatus(status=Status.FAILED, message=str(e))

    def discover(self, logger, config: Mapping[str, Any]) -> AirbyteCatalog:
        conn = _connect(config)
        streams = []
        try:
            with conn.cursor() as cursor:
                cursor.execute(
                    "SELECT table_name FROM information_schema.tables "
                    "WHERE table_schema = 'public'"
                )
                tables = [row[0] for row in cursor.fetchall()]
                for table_name in tables:
                    # Parameterized query instead of string interpolation
                    cursor.execute(
                        "SELECT column_name, data_type FROM information_schema.columns "
                        "WHERE table_schema = 'public' AND table_name = %s",
                        (table_name,),
                    )
                    # Simplification: every column is declared as a string;
                    # a real connector would map data_type to JSON Schema types
                    properties = {col: {"type": "string"} for col, _ in cursor.fetchall()}
                    streams.append(
                        AirbyteStream(
                            name=table_name,
                            json_schema={"type": "object", "properties": properties},
                            # read() below only does full refreshes
                            supported_sync_modes=[SyncMode.full_refresh],
                        )
                    )
        finally:
            conn.close()
        return AirbyteCatalog(streams=streams)

    def read(
        self,
        logger,
        config: Mapping[str, Any],
        catalog: ConfiguredAirbyteCatalog,
        state: Optional[Mapping[str, Any]] = None,
    ) -> Iterable[AirbyteMessage]:
        conn = _connect(config)
        try:
            with conn.cursor() as cursor:
                for configured_stream in catalog.streams:
                    name = configured_stream.stream.name
                    # sql.Identifier quotes the table name safely
                    cursor.execute(
                        sql.SQL("SELECT * FROM {}").format(sql.Identifier("public", name))
                    )
                    columns = [desc[0] for desc in cursor.description]
                    for row in cursor:
                        # Note: values such as datetimes may need converting
                        # to JSON-friendly types before they are emitted
                        yield AirbyteMessage(
                            type=Type.RECORD,
                            record=AirbyteRecordMessage(
                                stream=name,
                                data=dict(zip(columns, row)),
                                emitted_at=int(time.time() * 1000),
                            ),
                        )
        finally:
            conn.close()

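That’s a working skeleton of an Airbyte source rather than a production connector: every column is typed as a string, and it only does full refreshes. To use it, you package the connector as a Docker image and add it to your Airbyte instance as a custom connector. The benefit? Complete control over how data flows.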
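Under the hood, a source hands data to Airbyte as JSON messages, one per line. Here is a minimal sketch of what a RECORD message looks like on the wire, written without the CDK so it runs anywhere. The helper name `to_record_lines` is hypothetical, and `default=str` is my assumption for handling values such as datetimes.

```python
import json
import time
from typing import Any, Dict, Iterable, Iterator


def to_record_lines(stream: str, rows: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Serialize rows as RECORD messages, one JSON object per line."""
    for row in rows:
        message = {
            "type": "RECORD",
            "record": {
                "stream": stream,
                "data": row,
                "emitted_at": int(time.time() * 1000),  # epoch milliseconds
            },
        }
        # default=str (an assumption) stringifies values json cannot encode
        yield json.dumps(message, default=str)


rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
for line in to_record_lines("users", rows):
    print(line)
```

Seeing the raw messages helps when debugging a connector, because you can run it locally and read its output directly.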

Cost reality: Airbyte is free if self-hosted. They offer managed cloud at $0.50 per sync job, which adds up fast if you run hundreds daily. Storage is separate.
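To see how quickly that adds up, here is a back-of-the-envelope calculation. It takes the $0.50 per sync figure quoted above at face value and assumes a 30-day month; the function name is hypothetical, and you should check current pricing before relying on the numbers.

```python
PRICE_PER_SYNC_USD = 0.50  # figure quoted in the text, not verified pricing


def monthly_sync_cost(syncs_per_day: int, days: int = 30,
                      price: float = PRICE_PER_SYNC_USD) -> float:
    """Estimated monthly bill for a given number of daily sync jobs."""
    return syncs_per_day * days * price


for syncs in (10, 100, 300):
    print(f"{syncs} syncs/day -> ${monthly_sync_cost(syncs):,.2f}/month")
```

At 100 syncs a day that works out to $1,500 a month under these assumptions, which is the point where self-hosting starts to look attractive.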

When Airbyte wins: You need ultimate control, have a large engineering team, or need custom logic that pre-built connectors can’t provide.

n8n for ETL Workflows

I built my first production n8n workflow in 2022 and immediately saw why people call it “Zapier but actually yours.” n8n Cloud gives you hosted workflows, but if you care about data sovereignty—or price—self-host it.

The key difference from Airbyte: n8n is workflow-first, not data-movement-first. You can absolutely use it for ETL, but the mental model is “what operations do I want to chain together?” rather than “how do I sync this database?”

That said, n8n has one advantage Airbyte doesn’t: it’s a full integration platform, so you can transform, validate, and route data in one place.
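The "chain of operations" mental model can be sketched in plain Python. This is an illustration of the idea, not how n8n is implemented: the step functions, the `run_chain` helper, and the routing rule are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]
Step = Callable[[Item], Optional[Item]]


def normalize_email(item: Item) -> Optional[Item]:
    """Transform: trim whitespace and lowercase the address."""
    return {**item, "email": item["email"].strip().lower()}


def require_valid_email(item: Item) -> Optional[Item]:
    """Validate: drop items without an '@' (a deliberately crude check)."""
    return item if "@" in item["email"] else None


def route(item: Item) -> Optional[Item]:
    """Route: tag each item with a destination based on its domain."""
    target = "internal" if item["email"].endswith("@example.com") else "external"
    return {**item, "route": target}


def run_chain(items: List[Item], steps: List[Step]) -> List[Item]:
    """Pass each item through the steps in order; a None result drops the item."""
    out = []
    for item in items:
        current: Optional[Item] = item
        for step in steps:
            current = step(current)
            if current is None:
                break
        if current is not None:
            out.append(current)
    return out


items = [
    {"id": 1, "email": " Ada@Example.com "},
    {"id": 2, "email": "not-an-email"},
    {"id": 3, "email": "bob@other.org"},
]
result = run_chain(items, [normalize_email, require_valid_email, route])
print(result)
```

Each function plays the role of one node in a workflow, and the list passed to `run_chain` plays the role of the wiring between them.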

Here’s a simplified ETL workflow in n8n’s JSON format (the UI exports something close to this, though exact parameter names vary by node version):

{
  "nodes": [
    {
      "parameters": {
        "operation": "executeQuery",
        "query": "SELECT id, email, created_at FROM users WHERE created_at > NOW() - INTERVAL '1 day'"
      },
      "name": "Fetch Recent Users",
      "type": "n8n-nodes-base.postgres",
      "typeVersion": 1,
      "position": [250, 300],
      "credentials": {
        "postgres": "prod_postgres_creds"
      }
    },
    {
      "parameters": {
        "operation": "insert",
        "schema": "public",
        "table": "user_emails",
        "columns": "id,email,created_at"
      },
      "name": "Insert to Data Warehouse",
      "type": "n8n-nodes-base.postgres",
      "typeVersion": 1,
      "position": [550, 300],
      "credentials": {
        "postgres": "warehouse_postgres_creds"
      }
    },
    {
      "parameters": {
        "method": "POST",
        "url": "https://api.segment.com/v1/batch",
        "authentication": "genericCredentialType",
        "genericAuthType": "httpBasicAuth",
        "sendBody": true,
        "bodyParameters": {
          "parameters": [
            {
              "name": "batch",
              "value": "={{ $input.all().map(item => ({ type: 'identify', userId: item.json.id, traits: { email: item.json.email }, timestamp: item.json.created_at })) }}"
            }
          ]
        }
      },
      "name": "Send to Segment",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 3,
      "position": [850, 300],
      "executeOnce": true,
      "credentials": {
        "httpBasicAuth": "segment_auth"
      }
    },
    {
      "parameters": {},
      "name": "Start",
      "type": "n8n-nodes-base.start",
      "typeVersion": 1,
      "position": [50, 300]
    }
  ],
  "connections": {
    "Start": {
      "main": [
        [
          {
            "node": "Fetch Recent Users",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Recent Users": {
      "main": [
        [
          {
            "node": "Insert to Data Warehouse",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Insert to Data Warehouse": {
      "main": [
        [
          {
            "node": "Send to Segment",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}

Notice what just happened: we fetched from Postgres, inserted into a data warehouse, and pushed to a third-party API—all in one workflow. Airbyte would require two separate connectors and external orchestration to chain them.
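In the workflow JSON, the order of operations comes from the "connections" object, not from the order of the "nodes" array. Here is a small Python sketch that walks those connections to recover the chain. The breadth-first walk is my simplification for illustration, not n8n's actual execution engine, and `execution_order` is a hypothetical helper.

```python
from typing import Any, Dict, List


def execution_order(workflow: Dict[str, Any], start: str) -> List[str]:
    """Follow 'main' connections from `start`; return node names in visit order."""
    order: List[str] = []
    queue = [start]
    seen = {start}
    while queue:
        name = queue.pop(0)
        order.append(name)
        # Each node maps to a list of output branches, each a list of links
        for branch in workflow["connections"].get(name, {}).get("main", []):
            for link in branch:
                if link["node"] not in seen:
                    seen.add(link["node"])
                    queue.append(link["node"])
    return order


workflow = {"connections": {
    "Start": {"main": [[{"node": "Fetch Recent Users", "type": "main", "index": 0}]]},
    "Fetch Recent Users": {"main": [[{"node": "Insert to Data Warehouse", "type": "main", "index": 0}]]},
    "Insert to Data Warehouse": {"main": [[{"node": "Send to Segment", "type": "main", "index": 0}]]},
}}
print(execution_order(workflow, "Start"))
```

A check like this is handy in CI if you keep exported workflows in version control and want to catch a node that was accidentally left unconnected.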

Self-hosting n8n on a Contabo VPS costs you roughly $5-10/month for infrastructure, and the software is free. Managed n8n Cloud starts at $25/month.

Fivetran’s Enterprise Approach

Fivetran is the opposite of DIY. You pay Fivetran thousands per month, and they handle the connectors, transformations, monitoring—everything.

The appeal is real: pre-built connectors for 300+ data sources, automatic schema updates, connectors that just work. No Python,
