cancel
Showing results for 
Search instead for 
Did you mean: 
Community Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.
cancel
Showing results for 
Search instead for 
Did you mean: 

ingest csv file on-prem to delta table on databricks

pshuk
New Contributor III

Hi,

So I want to create a delta live table using a csv file that I create locally (on-prem). A little background: So I have a working ELT pipeline that finds newly generated files (since the last upload), and upload them to databricks volume and at the same time create a csv file locally with all the meta data information about these files. Is there any way I can create a delta live table at the databricks using this csv file after finishing my upload. I am using databricks CLI to upload files but haven't found a way to create the table using CLI. 

Any help would be greatly appreciated.

TIA.

1 REPLY 1

raphaelblg
Honored Contributor
Honored Contributor

Hello @pshuk ,

Based on your description, you have an external pipeline that writes CSV files to a specific storage location and you wish to set up a DLT based on the output of this pipeline.

DLT offers has access to a feature called Autoloader, which can incrementally list and ingest these files automatically. I recommend starting with a simple scenario based on the Load data with Delta Live Tables guide.

For example:

 

@dlt.table
def raw_data():
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .load("external_pipeline_output_location/")
  )

 

Next, you can explore the Autoloader Settings to further customize your ingestion logic.

It would be beneficial to read about Continuous vs Triggered Pipeline Execution to determine the best trigger option for your pipeline. You can set the DLT to run continuously as a streaming sink, or set the trigger for the pipeline to be on new file events (there are other trigger options as well).

If you would like to perform the DLT setup through the CLI, I suggest you to consult this documentation page as a reference: Develop Delta Live Tables pipelines with Databricks Asset Bundles.

 

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks
Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!