ingest csv file on-prem to delta table on databricks

pshuk — Wed, 22 May 2024 15:20:10 GMT

Hi,

I want to create a Delta Live Table from a CSV file that I generate locally (on-prem). A little background: I have a working ELT pipeline that finds newly generated files (since the last upload), uploads them to a Databricks volume, and at the same time creates a CSV file locally with all the metadata about these files. Is there any way to create a Delta Live Table in Databricks from this CSV file once the upload has finished? I am using the Databricks CLI to upload the files, but I haven't found a way to create the table using the CLI.

Any help would be greatly appreciated.

TIA.

Re: ingest csv file on-prem to delta table on databricks

raphaelblg — Wed, 22 May 2024 20:40:49 GMT

Hello @pshuk ,

Based on your description, you have an external pipeline that writes CSV files to a specific storage location and you wish to set up a DLT based on the output of this pipeline.

DLT has access to a feature called Auto Loader, which can incrementally list and ingest these files automatically. I recommend starting with a simple scenario based on the Load data with Delta Live Tables guide.

For example:

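    import dlt

    @dlt.table
    def raw_data():
        # Auto Loader ("cloudFiles") incrementally picks up new CSV files
        # that land in the source location.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .load("external_pipeline_output_location/")
        )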

Next, you can explore the Auto Loader settings to further customize your ingestion logic.
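As a rough sketch of what that customization might look like, the snippet below passes a few commonly used CSV and Auto Loader options to the same kind of table definition. The table name `raw_data_with_options`, the `SOURCE_PATH` placeholder, and the particular options chosen are assumptions for illustration; whether you need a header row, type inference, or schema evolution depends on how your metadata CSV is written. `dlt.table` and `spark` are only available inside a DLT pipeline, so the definition is guarded.

```python
# Sketch only: the names and option choices below are illustrative assumptions.
try:
    import dlt  # provided by the Delta Live Tables runtime
except ImportError:
    dlt = None  # allows this file to be imported outside a pipeline

# Placeholder from the example above; replace with your volume path.
SOURCE_PATH = "external_pipeline_output_location/"

AUTOLOADER_OPTIONS = {
    "cloudFiles.format": "csv",                       # source files are CSV
    "header": "true",                                 # first row holds column names
    "cloudFiles.inferColumnTypes": "true",            # infer types instead of all strings
    "cloudFiles.schemaEvolutionMode": "addNewColumns",  # accept new columns over time
}

# An unrelated package named "dlt" also exists on PyPI, hence the hasattr check.
if dlt is not None and hasattr(dlt, "table"):

    @dlt.table(comment="Metadata CSV ingested with Auto Loader (sketch)")
    def raw_data_with_options():
        # `spark` is injected by the pipeline runtime.
        return (
            spark.readStream.format("cloudFiles")
            .options(**AUTOLOADER_OPTIONS)
            .load(SOURCE_PATH)
        )
```

Keeping the options in one dictionary makes it easy to adjust them without touching the table definition itself.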

It would be beneficial to read about Continuous vs Triggered Pipeline Execution to determine the best trigger option for your pipeline. You can set the DLT pipeline to run continuously as a streaming sink, or have it triggered on new file events (there are other trigger options as well).
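To make the triggered vs continuous choice concrete: in the pipeline settings it comes down to a `continuous` flag. Here is a minimal sketch that builds such a settings document as JSON. The helper `build_pipeline_settings`, the pipeline name, and the notebook path are all hypothetical, and a real pipeline will usually need more fields (target schema, cluster configuration, and so on).

```python
import json


def build_pipeline_settings(name, notebook_path, continuous=False):
    """Return a minimal DLT pipeline settings dict (hypothetical helper)."""
    return {
        "name": name,
        # False = triggered: process what is available, then stop.
        # True  = continuous: keep running and ingest files as they arrive.
        "continuous": continuous,
        # Notebook that contains the @dlt.table definitions.
        "libraries": [{"notebook": {"path": notebook_path}}],
    }


settings = build_pipeline_settings(
    name="csv_metadata_ingest",  # hypothetical pipeline name
    notebook_path="/Workspace/Users/someone@example.com/ingest_csv",  # hypothetical path
    continuous=False,
)

settings_json = json.dumps(settings, indent=2)
print(settings_json)
```

With `continuous` set to false, the pipeline only processes new files when an update is started, which fits a flow where your upload script finishes first and the table refresh happens afterwards. I believe the Databricks CLI exposes pipeline creation and updates through its `pipelines` command group, but please check the CLI reference for the exact invocation.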

If you would like to perform the DLT setup through the CLI, I suggest consulting this documentation page as a reference: Develop Delta Live Tables pipelines with Databricks Asset Bundles.
