Hello @pshuk ,
Based on your description, you have an external pipeline that writes CSV files to a specific storage location, and you wish to set up a DLT pipeline on top of its output.
DLT includes a feature called Autoloader, which can incrementally list and ingest these files automatically. I recommend starting with a simple scenario based on the Load data with Delta Live Tables guide.
For example:
import dlt

@dlt.table
def raw_data():
    # Auto Loader ("cloudFiles") incrementally discovers and ingests new CSV
    # files landing in the external pipeline's output location.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("external_pipeline_output_location/")
    )
Next, you can explore the Autoloader Settings to further customize your ingestion logic.
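For instance, here is a sketch with a few commonly used options layered onto the example above (the schema hint column "amount" is purely illustrative, and the path is still the placeholder from before):

import dlt

@dlt.table
def raw_data():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        # Treat the first line of each CSV file as a header row.
        .option("header", "true")
        # Infer column types instead of reading every column as a string.
        .option("cloudFiles.inferColumnTypes", "true")
        # Optional schema hint -- "amount" is a hypothetical column name.
        .option("cloudFiles.schemaHints", "amount DOUBLE")
        .load("external_pipeline_output_location/")
    )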
It would also be worth reading about Continuous vs Triggered Pipeline Execution to decide which trigger option suits your pipeline best. You can run the DLT pipeline continuously as a streaming sink, or trigger it when new file events arrive (there are other trigger options as well).
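This choice is controlled by the continuous flag in the pipeline settings. As a minimal sketch (the pipeline name is a placeholder):

{
  "name": "csv_ingest_pipeline",
  "continuous": true
}

Leaving continuous set to false (the default) gives you a triggered pipeline, which you can start manually, on a schedule, or from a file arrival trigger in a Databricks job.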
If you would like to perform the DLT setup through the CLI, I suggest consulting this documentation page as a reference: Develop Delta Live Tables pipelines with Databricks Asset Bundles.
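As a rough, illustrative sketch of what such a bundle can look like (the bundle name, resource key, notebook path, and workspace URL are all placeholders you would replace with your own values):

# databricks.yml -- minimal sketch, placeholders throughout
bundle:
  name: csv_ingest

resources:
  pipelines:
    csv_ingest_pipeline:
      name: csv_ingest_pipeline
      continuous: false          # set to true for continuous execution
      libraries:
        - notebook:
            path: ./dlt_csv_ingest.py   # notebook containing the @dlt.table code above

targets:
  dev:
    default: true
    workspace:
      host: https://<your-workspace-url>

You can then use databricks bundle validate, databricks bundle deploy -t dev, and databricks bundle run csv_ingest_pipeline to validate, deploy, and start the pipeline from the CLI.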
Best regards,
Raphael Balogo
Sr. Technical Solutions Engineer
Databricks