
Can a Delta table be the source for streaming/Auto Loader?

QPeiran
New Contributor III

Hi,

Since Auto Loader only accepts "append-only" data as its source, I am wondering whether a Delta table can also be the source.

Will VACUUM (which deletes stale files) or the _delta_log directory (which adds a nested folder with file formats other than Parquet) break the Auto Loader mechanism?

2 REPLIES

QPeiran
New Contributor III

What confuses me is this page: https://docs.databricks.com/en/ingestion/auto-loader/options.html#file-format-options

It lists a variety of formats that can be ingested as the source for Auto Loader, but Delta Lake is not mentioned anywhere, which makes me wonder whether Auto Loader can ingest Delta Lake files in a streaming manner.

The Delta Lake VACUUM operation does remove files, so I am not sure whether that kind of removal is still compatible with Auto Loader's "append-only" rule or would break it.

As for _delta_log, it stores checkpoint files in Parquet but also contains a mix of JSON and CRC files. Will this mix of files break Auto Loader?

artsheiko
Honored Contributor

Hi @QPeiran,

Auto Loader is a feature for ingesting files into the data platform. Once your data is stored in a Delta table, you can rely on spark.readStream.table("<my_table_name>") to continuously read from the table.
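As a rough illustration of reading a Delta table as a stream, here is a minimal sketch. The function name, the table names, and the checkpoint path are hypothetical, and the `availableNow` trigger assumes a reasonably recent Spark/Databricks runtime; adjust it to your own setup.

```python
def stream_delta_table(spark, source_table, target_table, checkpoint_path):
    """Start a streaming query that reads one Delta table and appends to another.

    `spark` is an existing SparkSession. All table names and paths passed in
    are placeholders chosen for this example, not values from the thread.
    """
    # Read the Delta table itself as a streaming source (no Auto Loader here).
    source = spark.readStream.table(source_table)

    return (
        source.writeStream
        # The checkpoint records which table versions were already processed.
        .option("checkpointLocation", checkpoint_path)
        # Assumption: process everything currently available, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

In a notebook this could be called as `stream_delta_table(spark, "bronze.events", "silver.events", "/tmp/checkpoints/events")`, where every argument is an illustrative value.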

Take a look at the CDC demo, which showcases the integration with Auto Loader and applies modifications using Structured Streaming.

Depending on your needs, materialized views may also be useful in your use case: you can create a bronze layer with Auto Loader and then add an MV on top of it.
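For the bronze layer mentioned above, a minimal sketch of an Auto Loader ingest might look like the following. It assumes a Databricks runtime (the `cloudFiles` source is not available in open-source Spark), and the function name, paths, and table name are hypothetical placeholders.

```python
def ingest_files_to_bronze(spark, source_path, file_format, bronze_table,
                           schema_path, checkpoint_path):
    """Start an Auto Loader stream that lands raw files in a bronze Delta table.

    `spark` is an existing SparkSession on Databricks. `file_format` should be
    one of the file formats Auto Loader supports, for example "json" or "csv".
    """
    raw = (
        spark.readStream.format("cloudFiles")
        # Format of the incoming files in the source directory.
        .option("cloudFiles.format", file_format)
        # Where Auto Loader keeps the inferred schema between runs.
        .option("cloudFiles.schemaLocation", schema_path)
        .load(source_path)
    )

    return (
        raw.writeStream
        .option("checkpointLocation", checkpoint_path)
        # Assumption: ingest the files currently available, then stop.
        .trigger(availableNow=True)
        .toTable(bronze_table)
    )
```

Downstream consumers would then read the bronze table itself, via a materialized view or `spark.readStream.table(...)`, rather than pointing Auto Loader at the table's underlying files.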
