Deployment-ready sample source-code for Delta Live Table & Autoloader
08-11-2023 06:52 AM
Hi all,
We are planning to develop an Auto Loader-based DLT pipeline that needs to be
- Deployable via a CI/CD Pipeline
- Observable
Can somebody please point us to sample source code we can use as a firm foundation, instead of falling into newbie patterns?
Thanks in advance
Sanjay
1 REPLY
08-16-2023 03:38 PM
Hi @Sanjay_AMP
Delta Live Tables and Auto Loader can be used together to incrementally ingest data from cloud object storage.
• Python code example (see the snippet after this list):
- Define a table called "customers" that reads CSV files from cloud object storage.
- Define a table called "sales_orders_raw" that reads JSON files from cloud object storage.
• SQL code example (a sketch follows the Python snippet below):
- Create or refresh a streaming table called "customers" that selects all data from CSV files in cloud object storage.
- Create or refresh a streaming table called "sales_orders_raw" that selects all data from JSON files in cloud object storage.
• Options can be passed to the cloud_files() method using the map() function (example below).
• A schema can be specified for formats that don't support schema inference (example below).
• Additional code examples can be found in the Databricks documentation.
import dlt

@dlt.table
def customers():
    # Incrementally ingest CSV files from cloud object storage with Auto Loader
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/databricks-datasets/retail-org/customers/")
    )

@dlt.table
def sales_orders_raw():
    # Incrementally ingest JSON files from cloud object storage with Auto Loader
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/databricks-datasets/retail-org/sales_orders/")
    )
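The SQL version described in the bullets above would look roughly like this; a minimal sketch using the same sample dataset paths and the cloud_files() table-valued function (newer runtimes accept CREATE OR REFRESH STREAMING TABLE; older DLT releases used CREATE OR REFRESH STREAMING LIVE TABLE):

-- Streaming table over CSV files, equivalent to the Python "customers" table
CREATE OR REFRESH STREAMING TABLE customers
AS SELECT * FROM cloud_files("/databricks-datasets/retail-org/customers/", "csv");

-- Streaming table over JSON files, equivalent to the Python "sales_orders_raw" table
CREATE OR REFRESH STREAMING TABLE sales_orders_raw
AS SELECT * FROM cloud_files("/databricks-datasets/retail-org/sales_orders/", "json");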

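To pass reader options to cloud_files() in SQL, supply them as string key/value pairs via map(). A sketch follows; the header and delimiter options shown are illustrative, not required by the sample dataset:

CREATE OR REFRESH STREAMING TABLE customers
AS SELECT * FROM cloud_files(
  "/databricks-datasets/retail-org/customers/",
  "csv",
  -- reader options are passed as a map of string key/value pairs
  map("header", "true", "delimiter", ",")
);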

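And for formats where Auto Loader cannot infer a schema, the columns can be declared on the streaming table itself. A hypothetical sketch; the table name, column names, and types below are made up for illustration:

CREATE OR REFRESH STREAMING TABLE sales_orders_typed (
  -- explicit schema, used instead of schema inference
  order_number BIGINT,
  order_datetime STRING,
  customer_id STRING
)
AS SELECT * FROM cloud_files("/databricks-datasets/retail-org/sales_orders/", "json");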