Hello @pshuk ,
Based on your description, you have an external pipeline that writes CSV files to a specific storage location, and you wish to set up a DLT pipeline on top of its output.
DLT includes a feature called Autoloader, which can incrementally list and ingest these files automatically. I recommend starting with a simple scenario based on the Load data with Delta Live Tables guide.
For example:
import dlt

@dlt.table
def raw_data():
    # Auto Loader ("cloudFiles") incrementally discovers and ingests new CSV
    # files landing in the external pipeline's output location.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("external_pipeline_output_location/")
    )
Next, you can explore the Autoloader Settings to further customize your ingestion logic.
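For instance, here is a sketch with a few commonly used options layered onto the example above (the schema hint column "amount" is purely illustrative, and the path is still the placeholder from before):

import dlt

@dlt.table
def raw_data():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        # Treat the first line of each CSV file as a header row.
        .option("header", "true")
        # Infer column types instead of reading every column as a string.
        .option("cloudFiles.inferColumnTypes", "true")
        # Optional schema hint -- "amount" is a hypothetical column name.
        .option("cloudFiles.schemaHints", "amount DOUBLE")
        .load("external_pipeline_output_location/")
    )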
It would also be worth reading about Continuous vs Triggered Pipeline Execution to decide which trigger option suits your pipeline best. You can run the DLT pipeline continuously as a streaming sink, or trigger it when new file events arrive (there are other trigger options as well).
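This choice is controlled by the continuous flag in the pipeline settings. As a minimal sketch (the pipeline name is a placeholder):

{
  "name": "csv_ingest_pipeline",
  "continuous": true
}

Leaving continuous set to false (the default) gives you a triggered pipeline, which you can start manually, on a schedule, or from a file arrival trigger in a Databricks job.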
If you would like to perform the DLT setup through the CLI, I suggest consulting this documentation page as a reference: Develop Delta Live Tables pipelines with Databricks Asset Bundles.
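As a rough, illustrative sketch of what such a bundle can look like (the bundle name, resource key, notebook path, and workspace URL are all placeholders you would replace with your own values):

# databricks.yml -- minimal sketch, placeholders throughout
bundle:
  name: csv_ingest

resources:
  pipelines:
    csv_ingest_pipeline:
      name: csv_ingest_pipeline
      continuous: false          # set to true for continuous execution
      libraries:
        - notebook:
            path: ./dlt_csv_ingest.py   # notebook containing the @dlt.table code above

targets:
  dev:
    default: true
    workspace:
      host: https://<your-workspace-url>

You can then use databricks bundle validate, databricks bundle deploy -t dev, and databricks bundle run csv_ingest_pipeline to validate, deploy, and start the pipeline from the CLI.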
Best regards,
Raphael Balogo
Sr. Technical Solutions Engineer
Databricks