Hi @arushigulati,
Lakebridge (the Databricks Labs project formerly known as Remorph) does support Oracle as a source dialect for transpilation, but the DDL handling, particularly around constraints like PRIMARY KEY, has some gaps depending on the version you are running. Here is a walkthrough of how to approach this.
UNDERSTANDING THE PRIMARY KEY PARSING ISSUE
The Oracle dialect parser in Lakebridge may not fully handle inline PRIMARY KEY constraint definitions in all cases. This is a known area where coverage is still being expanded. If your Oracle DDL includes an inline PRIMARY KEY like this:
CREATE TABLE CUSTOMERS_V1 (
    CUSTOMER_ID NUMBER(10) PRIMARY KEY,
    FIRST_NAME VARCHAR2(50),
    LAST_NAME VARCHAR2(50)
);
...and the transpiler fails on the PRIMARY KEY clause, there are a few approaches you can use.
OPTION 1: PRE-PROCESS YOUR DDL TO SEPARATE CONSTRAINTS
You can extract the PRIMARY KEY constraints from the CREATE TABLE statements and handle them separately. Convert inline constraints to out-of-line (table-level) constraints before transpiling:
CREATE TABLE CUSTOMERS_V1 (
    CUSTOMER_ID NUMBER(10) NOT NULL,
    FIRST_NAME VARCHAR2(50),
    LAST_NAME VARCHAR2(50)
);
Then, after transpilation, add the PRIMARY KEY constraint as a separate ALTER TABLE statement in Databricks SQL:
ALTER TABLE CUSTOMERS_V1 ADD CONSTRAINT pk_customers PRIMARY KEY (CUSTOMER_ID);
Databricks supports informational PRIMARY KEY and FOREIGN KEY constraints on Unity Catalog managed and external tables (Databricks Runtime 11.3 LTS and above). These constraints are not enforced, but they are useful for documentation, query optimization hints, and compatibility with BI tools.
Documentation: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-table-constraint.html
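If you have many DDL files, this inline-to-table-level conversion can be scripted rather than done by hand. Here is a best-effort sketch (the function name is mine, and the regexes assume simple single-column inline PRIMARY KEYs and semicolon-terminated CREATE TABLE statements, so treat it as a starting point, not a full Oracle parser):

```python
import re

def split_inline_pk(ddl: str):
    """Rewrite inline PRIMARY KEY constraints as NOT NULL and collect
    (table, column) pairs so the PK can be re-added via ALTER TABLE.
    Assumes simple single-column inline PKs and ';'-terminated statements."""
    found = []

    def rewrite_table(m):
        table, body = m.group(1), m.group(2)

        def rewrite_col(cm):
            # Record the column, then emit "<col> <type> NOT NULL" in its place.
            found.append((table, cm.group(1)))
            return f"{cm.group(1)} {cm.group(2)} NOT NULL"

        body = re.sub(r"(\w+)\s+([\w(),]+)\s+PRIMARY\s+KEY",
                      rewrite_col, body, flags=re.IGNORECASE)
        return f"CREATE TABLE {table} ({body});"

    cleaned = re.sub(r"CREATE\s+TABLE\s+(\w+)\s*\((.*?)\)\s*;",
                     rewrite_table, ddl, flags=re.DOTALL | re.IGNORECASE)
    alters = [
        f"ALTER TABLE {t} ADD CONSTRAINT pk_{t.lower()} PRIMARY KEY ({c});"
        for t, c in found
    ]
    return cleaned, alters

# Demo on the example table from above:
ddl = """CREATE TABLE CUSTOMERS_V1 (
    CUSTOMER_ID NUMBER(10) PRIMARY KEY,
    FIRST_NAME VARCHAR2(50),
    LAST_NAME VARCHAR2(50)
);"""
cleaned, alters = split_inline_pk(ddl)
print(cleaned)
print("\n".join(alters))
```

Feed the cleaned DDL to the transpiler and run the generated ALTER statements after your tables exist in Databricks.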
OPTION 2: POST-PROCESS THE TRANSPILED OUTPUT
Run Lakebridge transpile on your Oracle SQL files, then write a small script to add back any constraints that were dropped during transpilation. For example, you could parse the original Oracle DDL for PRIMARY KEY definitions and generate corresponding ALTER TABLE statements in Databricks SQL syntax.
A simple Python approach (note that this regex is a rough heuristic: it assumes the PRIMARY KEY is on the first column of each table, and with DOTALL it can match across statements in more complex files, so sanity-check the output):

import re

oracle_ddl = open("original_oracle.sql").read()

# Capture the table name and the first column name of any CREATE TABLE
# whose body contains an inline PRIMARY KEY clause.
pattern = r'CREATE\s+TABLE\s+(\w+)\s*\(.*?(\w+)\s+\w+.*?PRIMARY\s+KEY'
for match in re.finditer(pattern, oracle_ddl, re.DOTALL | re.IGNORECASE):
    table_name = match.group(1)
    col_name = match.group(2)
    print(f"ALTER TABLE {table_name} ADD CONSTRAINT pk_{table_name.lower()} PRIMARY KEY ({col_name});")
OPTION 3: FILE A GITHUB ISSUE FOR ORACLE PRIMARY KEY SUPPORT
Since Lakebridge is an open-source Databricks Labs project, you can file a bug or feature request on the GitHub repository:
https://github.com/databrickslabs/remorph/issues
There are existing issues tracking Oracle dialect improvements (for example, issue #2170 for Oracle outer join syntax, and issue #1801 for Oracle DDL data type conversions). Filing an issue specifically for inline PRIMARY KEY constraint parsing in Oracle DDL will help the maintainers prioritize this.
OPTION 4: USE THE DATABRICKS CLI TRANSPILE COMMAND WITH THE LATEST VERSION
Make sure you are running the latest version of Lakebridge, as Oracle dialect support has been actively improved. You can install or update via:
pip install databricks-labs-lakebridge --upgrade
Then run the transpile with:
databricks labs lakebridge transpile --source-dialect oracle --input-dir /path/to/oracle/sql --output-dir /path/to/output
Check the output directory for any error reports or partial transpilation results. The tool generates a summary that shows which files transpiled successfully and which had issues.
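For large batches, one quick sanity check is to diff the input and output file sets to find files that produced no transpiled output at all (the helper below is my own, not part of Lakebridge, and it assumes output files keep their input names; adjust for however your version names its outputs):

```python
def missing_outputs(input_names, output_names):
    """Return input file names that have no transpiled counterpart,
    i.e. good candidates to inspect for parse errors."""
    produced = set(output_names)
    return [name for name in input_names if name not in produced]

# Example wiring with the placeholder paths from the CLI command above:
# from pathlib import Path
# src = [p.name for p in Path("/path/to/oracle/sql").rglob("*.sql")]
# out = [p.name for p in Path("/path/to/output").rglob("*.sql")]
# print(missing_outputs(src, out))
```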
KEY POINTS ABOUT DATABRICKS CONSTRAINTS
Once your DDL is transpiled, keep in mind:
1. PRIMARY KEY and FOREIGN KEY constraints in Databricks are informational (not enforced). They are supported on Unity Catalog tables.
2. You can add constraints during CREATE TABLE or via ALTER TABLE:
CREATE TABLE catalog.schema.customers_v1 (
    customer_id INT NOT NULL,
    first_name STRING,
    last_name STRING,
    CONSTRAINT pk_customers PRIMARY KEY (customer_id)
);
3. These constraints are valuable for downstream BI tools, query optimizers, and documentation even though Databricks does not enforce them at write time.
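If you end up re-adding many constraints after transpilation, generating the ALTER statements from a simple table-to-columns mapping keeps constraint naming consistent. A minimal sketch (the helper and the example mapping are illustrative, not part of Lakebridge; multi-column keys work too):

```python
def pk_alter_statements(pks):
    """Build Databricks ALTER TABLE ... ADD CONSTRAINT statements from a
    {table: [pk_columns]} mapping. Constraint names are derived from the
    last component of the (possibly three-level) table name."""
    return [
        f"ALTER TABLE {table} ADD CONSTRAINT "
        f"pk_{table.split('.')[-1].lower()} "
        f"PRIMARY KEY ({', '.join(cols)});"
        for table, cols in pks.items()
    ]

# Example with a three-level Unity Catalog name:
for stmt in pk_alter_statements({"catalog.schema.customers_v1": ["customer_id"]}):
    print(stmt)
```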
ADDITIONAL RESOURCES
Lakebridge documentation: https://databrickslabs.github.io/lakebridge/
Lakebridge GitHub repository: https://github.com/databrickslabs/remorph
Databricks SQL constraints reference: https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-table-constraint.html
Migration guide: https://docs.databricks.com/en/migration/index.html
I hope this helps you move forward with your Oracle to Databricks migration. If you can share the specific error message you are seeing, I can provide more targeted guidance.
* This reply was drafted with an agent system I built, which researches and drafts responses from the documentation I have available and from previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.