Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Lakebridge transpile to translate from oracle to databricks sql

arushigulati
Databricks Partner

Hi Community,

I am currently working on a PoC to migrate data from Oracle to Databricks. As part of this, we are attempting to automate the DDL conversion process.

We are leveraging Databricks Labs Lakebridge for transpilation, but it is failing to convert a standard Oracle DDL. It appears the transpiler is unable to parse the PRIMARY KEY constraint.

CLI Command Used:

Bash
 
databricks labs lakebridge transpile \
--source-dialect oracle \
--input-source $INPUT_PATH \
--output-folder $OUTPUT_PATH \
--error-file-path $ERROR_PATH

Source Oracle DDL:

SQL
 
CREATE TABLE CUSTOMERS_V1 (
    CUSTOMER_ID NUMBER NOT NULL,
    CUSTOMER_NAME VARCHAR2(100) NOT NULL,
    EMAIL VARCHAR2(100),
    CITY VARCHAR2(50),
    PRIMARY KEY (CUSTOMER_ID)
);

Error (screenshots attached): the transpiler fails on the PRIMARY KEY clause. It is unable to parse the simple PRIMARY KEY keyword.

Observation: if I remove the PRIMARY KEY line, the transpilation succeeds. That workaround isn't viable for a large-scale migration, though, where we need to maintain schema integrity (even if primary keys are only informational in Unity Catalog).

Has anyone else encountered this limitation with the Lakebridge CLI? Is there a specific configuration or a newer dialect flag I should be using to ensure constraints are handled correctly?

Thanks in advance for your help!

2 REPLIES

Marc_Gibson96
Contributor

Hi arushigulati,

I encountered the same issues when attempting a similar PoC between Oracle and Databricks SQL / Python. Out of the box, the transpiler covered a good ~80% of what was fed through, but it would not always handle the specific SQL flavor or the way the original script author formatted their code.

To get around this, I ended up writing a custom configuration for Bladebridge. Within this I would inherit the original Oracle base config that the out-of-the-box product uses (base_oracle2databricks_sql.json) and then write my customizations on top.

Based on your specific issue, you could try following the documentation section on line_subst for your primary-key customization.
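For illustration only, here is a sketch of what such an override might look like. I am not certain of the exact key names Bladebridge expects, so treat every key and value below as hypothetical and verify it against the Bladebridge configuration documentation:

```json
{
  "base_config": "base_oracle2databricks_sql.json",
  "line_subst": [
    {
      "match": "PRIMARY KEY",
      "replace": "-- PRIMARY KEY (re-added via ALTER TABLE after migration)"
    }
  ]
}
```

The idea is to comment the constraint out so the parser gets past it, then re-introduce it as an informational constraint on the Databricks side.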

Once you are done, make sure to re-run the install-transpile command and reference the path of your new config (example from the linked documentation):

databricks labs lakebridge install-transpile
Do you want to override the existing installation? (default: no): yes
Specify the config file to override the default[Bladebridge] config - press <enter> for none (default: <none>):
my_custom_config.json

 

SteveOstrowski
Databricks Employee

Hi @arushigulati,

Lakebridge (the Databricks Labs project formerly known as Remorph) does support Oracle as a source dialect for transpilation, but the DDL handling, particularly around constraints like PRIMARY KEY, has some gaps depending on the version you are running. Here is a walkthrough of how to approach this.

UNDERSTANDING THE PRIMARY KEY PARSING ISSUE

The Oracle dialect parser in Lakebridge may not fully handle inline PRIMARY KEY constraint definitions in all cases. This is a known area where coverage is still being expanded. If your Oracle DDL includes an inline PRIMARY KEY like this:

CREATE TABLE CUSTOMERS_V1 (
  CUSTOMER_ID NUMBER(10) PRIMARY KEY,
  FIRST_NAME  VARCHAR2(50),
  LAST_NAME   VARCHAR2(50)
);

...and the transpiler fails on the PRIMARY KEY clause, there are a few approaches you can use.

OPTION 1: PRE-PROCESS YOUR DDL TO SEPARATE CONSTRAINTS

You can extract the PRIMARY KEY constraints from the CREATE TABLE statements and handle them separately. Convert inline constraints to out-of-line (table-level) constraints before transpiling:

CREATE TABLE CUSTOMERS_V1 (
  CUSTOMER_ID NUMBER(10) NOT NULL,
  FIRST_NAME  VARCHAR2(50),
  LAST_NAME   VARCHAR2(50)
);

Then, after transpilation, add the PRIMARY KEY constraint as a separate ALTER TABLE statement in Databricks SQL:

ALTER TABLE CUSTOMERS_V1 ADD CONSTRAINT pk_customers PRIMARY KEY (CUSTOMER_ID);
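The two steps above (demote the inline constraint to NOT NULL, then re-add it after transpilation) can be scripted. Below is a minimal sketch, assuming uppercase SQL keywords and the inline "column datatype PRIMARY KEY" form shown above; the function name and regexes are my own, not part of Lakebridge:

```python
import re

def split_inline_pk(ddl: str):
    """Rewrite inline PRIMARY KEY column constraints to NOT NULL and
    collect ALTER TABLE statements that re-add the keys afterwards.
    Assumes uppercase keywords, as in typical exported Oracle DDL."""
    alters = []

    def handle_table(m):
        table, body = m.group(1), m.group(2)

        def handle_col(cm):
            alters.append(
                f"ALTER TABLE {table} ADD CONSTRAINT pk_{table.lower()} "
                f"PRIMARY KEY ({cm.group(1)});"
            )
            # Keep the column definition, demote the inline constraint.
            return cm.group(0).replace("PRIMARY KEY", "NOT NULL")

        # <column> <datatype> PRIMARY KEY  ->  <column> <datatype> NOT NULL
        new_body = re.sub(r'(\w+)\s+[\w()]+\s+PRIMARY\s+KEY', handle_col, body)
        return f"CREATE TABLE {table} ({new_body});"

    cleaned = re.sub(
        r'CREATE\s+TABLE\s+(\w+)\s*\((.*?)\)\s*;',
        handle_table,
        ddl,
        flags=re.DOTALL,
    )
    return cleaned, alters
```

Run the cleaned DDL through Lakebridge, then apply the collected ALTER statements to the transpiled tables.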

Databricks supports informational PRIMARY KEY and FOREIGN KEY constraints on Unity Catalog managed and external tables (Databricks Runtime 11.3 LTS and above). These constraints are not enforced, but they are useful for documentation, query optimization hints, and compatibility with BI tools.

Documentation: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-table-constraint.html

OPTION 2: POST-PROCESS THE TRANSPILED OUTPUT

Run Lakebridge transpile on your Oracle SQL files, then write a small script to add back any constraints that were dropped during transpilation. For example, you could parse the original Oracle DDL for PRIMARY KEY definitions and generate corresponding ALTER TABLE statements in Databricks SQL syntax.

A simple Python approach (a heuristic that only handles table-level PRIMARY KEY (...) constraints; inline column-level definitions would need a different pattern):

import re

with open("original_oracle.sql") as f:
    oracle_ddl = f.read()

# Capture the table name and the column list of a table-level
# PRIMARY KEY constraint. [^;]*? keeps the match inside one statement.
pattern = r'CREATE\s+TABLE\s+(\w+)\s*\([^;]*?PRIMARY\s+KEY\s*\(\s*([\w\s,]+?)\s*\)'
for match in re.finditer(pattern, oracle_ddl, re.IGNORECASE):
    table_name, cols = match.group(1), match.group(2)
    print(f"ALTER TABLE {table_name} ADD CONSTRAINT pk_{table_name.lower()} "
          f"PRIMARY KEY ({cols});")

OPTION 3: FILE A GITHUB ISSUE FOR ORACLE PRIMARY KEY SUPPORT

Since Lakebridge is an open-source Databricks Labs project, you can file a bug or feature request on the GitHub repository:

https://github.com/databrickslabs/remorph/issues

There are existing issues tracking Oracle dialect improvements (for example, issue #2170 for Oracle outer join syntax, and issue #1801 for Oracle DDL data type conversions). Filing an issue specifically for inline PRIMARY KEY constraint parsing in Oracle DDL will help the maintainers prioritize this.

OPTION 4: USE THE DATABRICKS CLI TRANSPILE COMMAND WITH THE LATEST VERSION

Make sure you are running the latest version of Lakebridge, as Oracle dialect support has been actively improved. You can install or update via:

pip install databricks-labs-lakebridge --upgrade

Then run the transpile with:

databricks labs lakebridge transpile --source-dialect oracle --input-source /path/to/oracle/sql --output-folder /path/to/output

Check the output directory for any error reports or partial transpilation results. The tool generates a summary that shows which files transpiled successfully and which had issues.

KEY POINTS ABOUT DATABRICKS CONSTRAINTS

Once your DDL is transpiled, keep in mind:

1. PRIMARY KEY and FOREIGN KEY constraints in Databricks are informational (not enforced). They are supported on Unity Catalog tables.

2. You can add constraints during CREATE TABLE or via ALTER TABLE:

CREATE TABLE catalog.schema.customers_v1 (
  customer_id INT NOT NULL,
  first_name STRING,
  last_name STRING,
  CONSTRAINT pk_customers PRIMARY KEY (customer_id)
);

3. These constraints are valuable for downstream BI tools, query optimizers, and documentation even though Databricks does not enforce them at write time.

ADDITIONAL RESOURCES

Lakebridge documentation: https://databrickslabs.github.io/lakebridge/
Lakebridge GitHub repository: https://github.com/databrickslabs/remorph
Databricks SQL constraints reference: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-table-constraint.html
Migration guide: https://docs.databricks.com/en/migration/index.html

I hope this helps you move forward with your Oracle to Databricks migration. If you can share the specific error message you are seeing, I can provide more targeted guidance.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.