
ingest csv file on-prem to delta table on databricks

pshuk
New Contributor III

Hi,

I want to create a Delta Live Table from a CSV file that I generate locally (on-prem). A little background: I have a working ELT pipeline that finds newly generated files (since the last upload) and uploads them to a Databricks volume, while also creating a CSV file locally with all the metadata about those files. Is there any way to create a Delta Live Table on Databricks from this CSV file once the upload finishes? I am using the Databricks CLI to upload the files, but I haven't found a way to create the table through the CLI.
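
For reference, my upload step looks roughly like this (the volume and file names below are just placeholders):

# copy a newly generated data file and the local metadata CSV into a Unity Catalog volume
databricks fs cp ./output/batch_0042.dat dbfs:/Volumes/main/raw/landing/
databricks fs cp ./output/batch_0042_meta.csv dbfs:/Volumes/main/raw/metadata/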

Any help would be greatly appreciated.

TIA.


raphaelblg
Honored Contributor II

Hello @pshuk ,

Based on your description, you have an external pipeline that writes CSV files to a specific storage location and you wish to set up a DLT based on the output of this pipeline.

DLT has access to a feature called Autoloader, which can incrementally list and ingest these files automatically. I recommend starting with a simple scenario based on the Load data with Delta Live Tables guide.

For example:

 

import dlt

# Auto Loader (cloudFiles) incrementally picks up new CSV files from the source location
@dlt.table
def raw_data():
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .load("external_pipeline_output_location/")
  )

 

Next, you can explore the Autoloader Settings to further customize your ingestion logic.
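
For instance, a couple of options that are often useful for CSV sources (whether they apply depends on your files, so treat this as a sketch):

import dlt

@dlt.table
def raw_data():
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      # treat the first line of each CSV file as column headers
      .option("header", "true")
      # infer column types instead of ingesting everything as strings
      .option("cloudFiles.inferColumnTypes", "true")
      .load("external_pipeline_output_location/")
  )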

It would be beneficial to read about Continuous vs Triggered Pipeline Execution to determine the best trigger option for your pipeline. You can set the DLT pipeline to run continuously as a streaming sink, or trigger it on new file events (there are other trigger options as well).
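
As a rough illustration, the execution mode is controlled by the continuous flag in the pipeline settings (the pipeline name and notebook path below are made up):

{
  "name": "csv_metadata_ingest",
  "continuous": false,
  "libraries": [
    { "notebook": { "path": "/Users/me/dlt/ingest_csv" } }
  ]
}

With "continuous": false the pipeline runs as a triggered pipeline that you start on demand, for example right after your upload completes.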

If you would like to perform the DLT setup through the CLI, I suggest consulting this documentation page as a reference: Develop Delta Live Tables pipelines with Databricks Asset Bundles.
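
A minimal bundle sketch, assuming the current Databricks CLI with Asset Bundles support (the bundle name, pipeline name, and paths are placeholders):

# databricks.yml
bundle:
  name: csv-ingest

resources:
  pipelines:
    csv_metadata_ingest:
      name: csv_metadata_ingest
      continuous: false
      libraries:
        - notebook:
            path: ./src/ingest_csv.py

targets:
  dev:
    workspace:
      host: https://<your-workspace>.cloud.databricks.com

You could then deploy and trigger the pipeline from your on-prem machine with databricks bundle deploy -t dev and databricks bundle run csv_metadata_ingest -t dev, which would slot in right after your existing upload step.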

 

Best regards,

Raphael Balogo
Sr. Technical Solutions Engineer
Databricks
