If you’re working with Oracle Database and want to leverage the power of Apache Iceberg for your data lakehouse, let me walk you through the different ways you can make this connection happen, from simple batch loads to real-time CDC pipelines.
Why Connect Oracle to Iceberg Anyway?
Before we dive into the “how,” let’s quickly talk about the “why.” Oracle databases are great for transactional workloads, but when you need to run large-scale analytics, machine learning models, or feed data to multiple downstream systems, Iceberg gives you that flexibility.
You get an open table format that works across different compute engines (Spark, Flink, Trino), time-travel capabilities for historical queries, and schema evolution without the headaches.
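To make the time-travel point concrete, here’s a minimal sketch of an Iceberg time-travel query from Spark SQL. It assumes a SparkSession `spark` already wired up with an Iceberg catalog (the full setup is shown under Approach 4), and `lake.sales.orders` is a placeholder table name.

```python
# Time travel: read the table as it existed at a given moment.
# Assumes `spark` is a SparkSession with an Iceberg catalog named "lake".
spark.sql("""
    SELECT *
    FROM lake.sales.orders
    TIMESTAMP AS OF '2024-01-01 00:00:00'
""").show()
```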
Four Main Approaches:
There are essentially four ways to connect Oracle to Iceberg, depending on your requirements:
- Batch loading using Oracle Data Transforms
- Real-time CDC (Change Data Capture) using Oracle GoldenGate
- Third-party CDC tools like Estuary Flow or Airbyte
- Manual ETL using Spark or custom scripts
Let me break down each approach so you can pick what works best for your use case.
Approach 1: Oracle Data Transforms (The Oracle-Native Way)
If you’re already in the Oracle ecosystem and want a straightforward solution, Oracle Data Transforms supports Apache Iceberg as a target.[2]
Setting Up the Connection:
First, you’ll create an Apache Iceberg connection in Oracle Data Transforms:
- Catalog Type: Choose REST Catalog (stores Iceberg data in Oracle Object Storage)
- Catalog URI: Your REST catalog endpoint
- Warehouse Location: Where you want to store the data (e.g., in Oracle Object Storage)
- Authentication: OAuth2 (either the `client_credentials` or the `authorization_code` grant)
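Data Transforms handles all of this through its UI, but if you want to sanity-check the same REST catalog endpoint from code first, here’s a minimal sketch using pyiceberg. The catalog name, URI, credentials, and warehouse below are placeholders, not values from the Oracle docs.

```python
from pyiceberg.catalog import load_catalog

# Placeholder values -- substitute your REST catalog endpoint, OAuth2
# client credentials, and warehouse location.
catalog = load_catalog(
    "my_rest_catalog",
    **{
        "type": "rest",
        "uri": "https://catalog.example.com/iceberg",
        "credential": "my-client-id:my-client-secret",  # client_credentials grant
        "warehouse": "my-warehouse",
    },
)

# Listing namespaces is a cheap way to confirm the connection works.
print(catalog.list_namespaces())
```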
Loading Data
- Create a Data Load and select your Oracle database as the source
- Choose Apache Iceberg as the target connection
- Specify or create a namespace for your Iceberg tables
- Select the tables you want to load
- Choose your load operation: Incremental Merge or Incremental Append
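A note on that last choice: Incremental Merge is essentially an upsert keyed on the primary key, which Data Transforms runs for you behind the scenes. If you’re curious what that amounts to, here’s an illustrative sketch of the equivalent MERGE on an Iceberg table in Spark SQL; the table, view, and key names are placeholders.

```python
# Conceptually, an incremental merge upserts new and changed rows into the
# Iceberg table. `staged_orders` stands in for a temp view holding the rows
# just extracted from Oracle; requires the Iceberg SQL extensions.
spark.sql("""
    MERGE INTO lake.sales.orders AS t
    USING staged_orders AS s
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```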
Approach 2: Real-Time CDC with Oracle GoldenGate
Why GoldenGate for Iceberg?
GoldenGate captures changes from your Oracle database in real time and streams them to Iceberg tables. It’s a strong fit for:
- Keeping your data lake in sync with production databases
- Feeding real-time analytics dashboards
- Building event-driven architectures
- Supporting AI/ML pipelines that need fresh data
The Setup Process
Oracle has detailed documentation for replicating from Autonomous Database to Iceberg, and the process is similar for regular Oracle databases:
- Configure your source Oracle database with supplemental logging enabled (see the sketch after this list)
- Set up GoldenGate deployment in OCI (or on-premises)
- Create an Iceberg connection in GoldenGate pointing to your catalog (AWS Glue, REST catalog, etc.)
- Define your replication by mapping source tables to target Iceberg tables
- Start the replication and monitor the data flow
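For the supplemental-logging piece of step 1, the database-level setting is a single DDL statement. Here’s a minimal sketch using the python-oracledb driver, assuming a user with the ALTER DATABASE privilege; the connection details are placeholders, and the exact statements can vary with your database configuration, so check the GoldenGate docs for your version.

```python
import oracledb

# Placeholder connection details.
conn = oracledb.connect(user="admin", password="your-password",
                        dsn="db-host:1521/ORCLPDB1")
with conn.cursor() as cur:
    # Minimal database-level supplemental logging, required for CDC.
    cur.execute("ALTER DATABASE ADD SUPPLEMENTAL LOG DATA")
    # GoldenGate also expects table-level logging, enabled per table with
    # ADD TRANDATA from the GoldenGate admin client (not shown here).
```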
Approach 3: Third-Party CDC Tools (Estuary Flow, Airbyte)
If you’re looking for alternatives to GoldenGate or want something cloud-agnostic, tools like Estuary Flow and Airbyte offer Oracle-to-Iceberg connectors.
Estuary Flow Approach
Estuary Flow is particularly interesting because it’s built for real-time streaming with a no-code setup.
Prerequisites:
- Oracle 11g or later
- Oracle user with SELECT access and access to the LogMiner views (for CDC)
- Supplemental logging enabled
- S3 bucket for Iceberg storage
- AWS Glue or REST catalog for Iceberg metadata
- EMR Serverless (for standard updates)
Step 1: Configure Oracle as Source
Log into Estuary Flow, create a new capture, and select Oracle. Fill in your connection details (hostname, port, database name, credentials), and optionally configure SSH tunneling if your database is behind a firewall.
Step 2: Configure Iceberg as Destination
Set up your Iceberg materialization by providing:
- Catalog type (AWS Glue or REST)
- S3 bucket details
- AWS credentials with appropriate permissions
- EMR Serverless application ID
Step 3: Map and Sync
Define which Oracle tables should sync to which Iceberg tables, configure any transformations, and start the pipeline. Estuary handles CDC, schema enforcement, and orchestration automatically.
Airbyte Alternative
Airbyte offers a similar no-code approach:
- Set up Oracle DB as source connector
- Choose Apache Iceberg as destination
- Define sync frequency and data mapping
- Let Airbyte handle the extraction and loading
You can self-host Airbyte Open Source or use their managed cloud service.
Approach 4: Manual Integration with Spark
For those who prefer full control or have custom requirements, you can build your own pipeline using Apache Spark.
High-Level Steps
Extract from Oracle:
- Use JDBC to connect Spark to Oracle and read data into DataFrames.
Transform (if needed):
- Apply any business logic or data cleaning using Spark’s DataFrame API.
Write to Iceberg:
- Configure Spark with Iceberg libraries and write data to Iceberg tables.
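Here’s a minimal end-to-end sketch in PySpark. It assumes the Iceberg Spark runtime jar and the Oracle JDBC driver are on the classpath; the catalog name (`lake`), bucket, connection details, and table names are all placeholders.

```python
from pyspark.sql import SparkSession, functions as F

# Spark session with a Hadoop-type Iceberg catalog named "lake".
spark = (
    SparkSession.builder
    .appName("oracle-to-iceberg")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://my-bucket/warehouse")
    .getOrCreate()
)

# 1. Extract: read an Oracle table over JDBC into a DataFrame.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "etl_user")
    .option("password", "etl_password")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# 2. Transform: apply business logic, e.g. normalize column names.
orders = orders.select([F.col(c).alias(c.lower()) for c in orders.columns])

# 3. Load: write the DataFrame out as an Iceberg table.
orders.writeTo("lake.sales.orders").createOrReplace()
```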
Wrapping Up
Connecting Oracle Database to Apache Iceberg opens up a world of possibilities for modern analytics, AI/ML workloads, and data sharing across teams. Whether you go with Oracle’s native tools, GoldenGate’s real-time CDC, third-party platforms like Estuary Flow, or custom Spark pipelines, the key is choosing the approach that matches your technical requirements and team capabilities.