Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to make streaming files?

RIDBX
New Contributor III

Thanks for reviewing my threads.

I am trying to test streaming tables/files in Databricks Free Edition.

-- Create test streaming table
CREATE OR REFRESH STREAMING TABLE user.demo.test_bronze_st AS
SELECT * FROM STREAM read_files('/Volumes/xxx_ws/demo/raw_files/test');

I created a test file via the data upload option in Catalog Explorer. The file was created, but it is not being treated as a streaming source.

The streaming CREATE statement above fails, but

CREATE OR REFRESH TABLE user.demo.test_bronze_st AS
SELECT * FROM read_files('/Volumes/xxx_ws/demo/raw_files/test');

is working. The output window shows 0 rows created, but when I check the catalog it tells a different story: the data is there in the user.demo.test_bronze_st table.

Is this the expected behavior?

How do I create a streaming table with input data files sitting in a local Windows folder?

Thanks .

2 REPLIES

ManojkMohan
Honored Contributor II

@RIDBX The Free Edition only allows access to serverless compute resources, and many advanced streaming features are not supported. For example, custom storage locations and online/streaming tables are explicitly noted as unsupported features in this tier.

https://docs.databricks.com/aws/en/getting-started/free-edition-limitations

Databricks in general requires source files to be present in cloud storage accessible from your workspace.

Alternate approaches

  • Simulate streaming by incrementally adding new files or data batches to a source folder or table, then re-running batch queries that consume only the new data.
  • Upload files in small increments to a cloud-mounted directory (like DBFS), then use Auto Loader in batch mode to process newly added files during each job run. This mimics streaming ingestion.
  • In notebooks, use loops or scheduled notebook jobs that periodically append new data files or rows into a Delta table, reading from that table each time new input arrives.
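The first approach above can be sketched in plain Python. This is only an illustration of the idea (track which files have been processed, then consume just the new ones on each run); the function name `process_new_files` and the JSON checkpoint file are my own illustrative choices, not Databricks APIs. Auto Loader does something conceptually similar with its checkpoint location.

```python
import json
from pathlib import Path

def process_new_files(source_dir: str, checkpoint_path: str):
    """Read only files not seen in previous runs, mimicking
    incremental (streaming-style) ingestion over a folder."""
    checkpoint = Path(checkpoint_path)
    # Load the set of already-processed file names, if any.
    seen = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()

    new_rows = []
    for f in sorted(Path(source_dir).glob("*.csv")):
        if f.name not in seen:
            # Only files added since the last run contribute rows.
            new_rows.extend(f.read_text().splitlines())
            seen.add(f.name)

    # Persist the updated checkpoint so the next run skips these files.
    checkpoint.write_text(json.dumps(sorted(seen)))
    return new_rows
```

Each re-run picks up only files added since the previous run, so repeatedly dropping files into `source_dir` and re-running approximates a streaming source without needing `STREAMING TABLE` support.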

RIDBX
New Contributor III

Thanks for weighing in. Are you saying

CREATE OR REFRESH STREAMING TABLE user.demo.test_bronze_st cannot be used in Free Edition?

If it can be used, how do I create STREAM read_files('/Volumes/xxx_ws/demo/raw_files/test.csv')

when the .csv is sitting on a local drive?

Thanks .
