Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks Database synced tables

prakharsachan
New Contributor

When I deploy synced tables together with the pipelines that create their source tables (the tables the synced tables read from) using DABs for the first time, deployment fails with an error that the source tables don't exist — which is expected, since the pipelines haven't run yet. What is the workaround for this?

1 ACCEPTED SOLUTION

Ashwin_DSA
Databricks Employee

Hi @prakharsachan,

synced_database_table creation assumes the Unity Catalog source table referenced in spec.source_table_full_name already exists and is readable. The API treats this as the source table to sync from, and if it can't be read, you'll see errors like SOURCE_READ_ERROR or TABLE_DOES_NOT_EXIST from the synced table pipeline. In practice, that means you must materialize the source UC table (for example, by running the Lakeflow pipeline once) before creating the synced_database_table resource in a bundle.

Because bundles only define and deploy resources (they don't run your pipelines to materialize data), there isn't a one-shot way today to create the pipeline, run it, and then create the synced table in a single DAB deploy.

The workaround is a two-stage deployment.

In stage 1, deploy and run the source pipelines. The bundle contains only the Lakeflow pipelines that build the UC source tables (no synced_database_tables yet). Run `databricks bundle deploy`, then trigger those pipelines (manually, via Jobs, or in CI) so the Delta/UC tables actually exist.
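As an illustrative sketch of stage 1 (the bundle name, resource key, paths, and catalog/schema names are all hypothetical; check the bundle schema for your CLI version), the bundle declares only the pipeline that materializes the source table:

```yaml
# databricks.yml — stage 1: only the pipeline that builds the UC source tables
bundle:
  name: synced-tables-demo

resources:
  pipelines:
    build_source_tables:
      name: build-source-tables
      catalog: main                  # UC catalog the pipeline writes to (hypothetical)
      schema: sales                  # target schema for the source tables (hypothetical)
      libraries:
        - notebook:
            path: ./src/build_orders_table
```

After `databricks bundle deploy`, trigger the pipeline (for example with `databricks bundle run build_source_tables`) so the source table, say main.sales.orders, actually exists before moving to stage 2.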

In stage 2, add the synced tables to the bundle. Extend the same bundle with resources.synced_database_tables entries pointing at the now-existing source_table_full_name tables, then deploy again. create_synced_database_table now succeeds because validation can read the source tables.
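Stage 2 then extends the same bundle. A minimal sketch, with hypothetical instance, table, and column names (the exact fields mirror the synced table API; verify them against the current bundle schema):

```yaml
# databricks.yml — stage 2: add the synced table, now that the source exists
resources:
  synced_database_tables:
    orders_synced:
      name: main.sales.orders_synced                # UC name of the synced table
      database_instance_name: my-database-instance  # target database instance (hypothetical)
      logical_database_name: sales_db
      spec:
        source_table_full_name: main.sales.orders   # must exist and be readable at deploy time
        primary_key_columns:
          - order_id
        scheduling_policy: SNAPSHOT                 # or TRIGGERED / CONTINUOUS
```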

This is the same pattern Databricks recommends for other resources that cannot be referenced until they exist, such as UC volumes: first deploy the volume, then reference it (for example in artifact_path) in subsequent deployments.
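For comparison, the volume pattern looks like this (catalog, schema, and volume names hypothetical) — the volume is created in one deploy, and only a later deploy references it:

```yaml
# Deploy 1: create the UC volume
resources:
  volumes:
    bundle_artifacts:
      catalog_name: main
      schema_name: sales
      name: artifacts

# Deploy 2: only after the volume exists, point artifact_path at it
workspace:
  artifact_path: /Volumes/main/sales/artifacts
```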

Hope this helps.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***
