Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Ingest data from SQL Server

Vamsi_S
New Contributor II

I've been working on data ingestion from SQL Server to Unity Catalog (UC) using Lakeflow Connect. Lakeflow Connect genuinely makes the work easier when everything is set up correctly. I am trying to incorporate it into Databricks Asset Bundles (DAB), and that works fine with the 'schema' and 'table' tags for 'object' in 'ingestion_definition'. But what if someone wants to clean the table names before ingesting them into UC? For now, 'object' only allows 'report', 'table', and 'schema'. If I create a Python file or notebook that gets the table names from SQL Server using 'source_schema' and then cleans/modifies them, I have no way to ingest those tables dynamically. I understand that DAB doesn't allow runtime generation or modification of files, but I would love to have a way to do something like the below.
pipeline_sqlserver:
  name: sqlserver-ingestion-pipeline
  ingestion_definition:
    ingestion_gateway_id: ${resources.pipelines.gateway.id}
    objects: []
    library:
      file: .yml/.json/any other format

Link to Ingestion documentation: Ingest data from SQL Server | Databricks on AWS

The idea: give 'objects' an empty list and pull the table names from a YAML file, a JSON file, or any other format that the preprocessing script can write the tables into.
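
For illustration, the preprocessing step I have in mind would look roughly like this (a rough sketch only; the pyodbc connection string, the cleaning rule, and the tables.json path are placeholders I made up, not anything from the Lakeflow Connect docs):

# preprocess_tables.py - build a cleaned list of tables to ingest (illustrative sketch)
import json
import re

import pyodbc  # assumes network access to the source SQL Server

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=<host>;DATABASE=<db>;UID=<user>;PWD=<pwd>"
)

source_schema = "dbo"  # placeholder
rows = conn.cursor().execute(
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_SCHEMA = ? AND TABLE_TYPE = 'BASE TABLE'",
    source_schema,
).fetchall()

def clean(name: str) -> str:
    # replace special characters so the name is a valid UC table name
    return re.sub(r"[^0-9A-Za-z_]", "_", name).lower()

objects = [
    {
        "source_schema": source_schema,
        "source_table": row.TABLE_NAME,
        "destination_table": clean(row.TABLE_NAME),
    }
    for row in rows
]

# dump to a file that a later step (CI job or notebook) can read
with open("tables.json", "w") as f:
    json.dump(objects, f, indent=2)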
If anyone has already faced the same issue and has a solution to it, I would greatly appreciate it if you could share it here. Thanks!

3 REPLIES

Khaja_Zaffer
Contributor

Hello @Vamsi_S 

Good day! Did you try preprocessing the table names in CI/CD and generating the YAML dynamically? That is what I would recommend for dynamic, automated ingestion.

Did you contact your Databricks account manager (in case you are working with a company) to raise a feature request?
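
As a sketch, a pre-deploy CI step could render the cleaned table list into the objects block for the bundle, something like the following (the field names follow the table spec shown in the SQL Server connector docs, but please verify them against your DAB/API version; the paths and catalog/schema names are placeholders):

# render_objects.py - turn the cleaned table list into a DAB objects block (illustrative)
import json

import yaml  # pyyaml

with open("tables.json") as f:
    tables = json.load(f)

objects = [
    {
        "table": {
            "source_schema": t["source_schema"],
            "source_table": t["source_table"],
            "destination_catalog": "main",          # placeholder
            "destination_schema": "sqlserver_raw",  # placeholder
            "destination_table": t["destination_table"],
        }
    }
    for t in tables
]

# write a YAML fragment that the CI job splices into the bundle
# (or into a variable file) before running `databricks bundle deploy`
with open("objects.yml", "w") as f:
    yaml.safe_dump({"objects": objects}, f, sort_keys=False)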

Vamsi_S
New Contributor II

Hey @Khaja_Zaffer 

Thanks for sharing your thoughts. Yes, dynamically generating the YAML works in this case, but I wanted to know whether there is any other workaround officially made available by Databricks.

I did not contact anyone from Databricks. Before reaching out, I wanted to try all the different possibilities.
Thank you again!

Vamsi_S
New Contributor II

I've been using the notebook style because some transformations need to be applied to the destination_table names, as the source_table names have special characters. My requirement, as an example: I have 5 tables from one schema and 10 individual tables from another schema. Since each table name needs to be transformed, I fetch all the tables and build a list of objects.

On the first run, with 15 objects, the pipeline is created with the name specified in the notebook. By the second run, a new table has been created in the schema, so there are now 16 objects, which changes the ingestion definition of the pipeline. The second run therefore creates a new pipeline named "Managed Ingestion Pipeline - <original-pipeline_id>". This has been the issue, as we are not able to capture the history of the pipeline properly. I just want a way to update the ingestion definition of the pipeline while keeping the same pipeline name. The documentation for notebook-style ingestion pipelines with Lakeflow Connect is linked below.
SQL Server CDC Connector Setup

This is a code snippet I am working on:

(screenshot of the code snippet: Vamsi_S_0-1757599917896.png)
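
In text form, the pattern I am after is roughly the following (only a sketch, assuming the pipelines.create/update methods and the IngestionPipelineDefinition / IngestionConfig / TableSpec classes from the Databricks Python SDK that the connector docs use; the catalog/schema names, gateway id, and cleaned_tables list are placeholders):

# create the pipeline on the first run, then update its ingestion definition on later runs (sketch)
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.pipelines import (
    IngestionConfig,
    IngestionPipelineDefinition,
    TableSpec,
)

w = WorkspaceClient()
PIPELINE_NAME = "sqlserver-ingestion-pipeline"       # placeholder
gateway_pipeline_id = "<gateway-pipeline-id>"        # placeholder
cleaned_tables = [("Order Details", "order_details")]  # (source, cleaned destination) pairs

objects = [
    IngestionConfig(
        table=TableSpec(
            source_schema="dbo",
            source_table=src,
            destination_catalog="main",
            destination_schema="sqlserver_raw",
            destination_table=dst,
        )
    )
    for src, dst in cleaned_tables
]

definition = IngestionPipelineDefinition(
    ingestion_gateway_id=gateway_pipeline_id,
    objects=objects,
)

existing = [p for p in w.pipelines.list_pipelines() if p.name == PIPELINE_NAME]
if existing:
    # keep the same pipeline id/name, only refresh the ingestion definition
    w.pipelines.update(
        pipeline_id=existing[0].pipeline_id,
        name=PIPELINE_NAME,
        ingestion_definition=definition,
    )
else:
    w.pipelines.create(name=PIPELINE_NAME, ingestion_definition=definition)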

Please advise if anyone has a solution to this issue.