Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Best Practices for Daily Source-to-Bronze Data Ingestion in Databricks

JissMathew
Valued Contributor

How can we effectively manage source-to-bronze data ingestion from a project perspective, particularly when considering daily scheduling strategies using either Auto Loader or Serverless Warehouse COPY INTO commands?
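For context, the two ingestion patterns mentioned in the question can be sketched as follows. This is an illustrative sketch that runs on a Databricks cluster (not locally); the bucket path, table name, and file format are placeholder assumptions.

```python
# Option A: COPY INTO -- idempotent batch ingestion, a good fit for a
# daily scheduled job on a serverless SQL warehouse. Already-loaded
# files are skipped on re-runs.
spark.sql("""
  COPY INTO bronze.raw_events
  FROM 's3://my-bucket/landing/events/'
  FILEFORMAT = JSON
  COPY_OPTIONS ('mergeSchema' = 'true')
""")

# Option B: Auto Loader -- incremental file discovery via the
# cloudFiles source. With trigger(availableNow=True) it processes all
# pending files and stops, so it can also be run on a daily schedule.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas/raw_events")
    .load("s3://my-bucket/landing/events/")
    .writeStream
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/raw_events")
    .trigger(availableNow=True)
    .toTable("bronze.raw_events"))
```

Both options can be wired into a daily Workflows schedule; Auto Loader additionally tracks discovered files in the checkpoint, which keeps daily runs incremental without listing the whole bucket.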

Jiss Mathew
India.
3 REPLIES

BigRoux
Databricks Employee

JissMathew, you have options when ingesting data from Cloud Storage into Delta Lake.

Your first approach should be Delta Live Tables (DLT); it is the easiest and most cost-effective option. Among other benefits, it manages the infrastructure for you, so you don't have to think about what type of compute resources to use.
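A minimal DLT pipeline for source-to-bronze might look like the sketch below. This is an illustrative source file for a Databricks DLT pipeline (it does not run outside one); the table name, path, and JSON format are placeholder assumptions.

```python
import dlt
from pyspark.sql.functions import current_timestamp

@dlt.table(comment="Raw events landed daily from cloud storage")
def bronze_raw_events():
    # DLT manages the checkpoint and cluster; Auto Loader (cloudFiles)
    # handles incremental file discovery.
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://my-bucket/landing/events/")
        .withColumn("_ingested_at", current_timestamp())
    )
```

Scheduling is then handled by the pipeline itself: a triggered DLT pipeline can be run once a day, and DLT provisions and tears down the compute around each update.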

Another option is to create Structured Streaming jobs and schedule them using Workflows. Like DLT, you can leverage Auto Loader to pull files from cloud storage very efficiently.
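For the Workflows route, the daily schedule is part of the job definition. A hedged sketch of a Jobs API 2.1 payload, with the job name, notebook path, and cron time all placeholder assumptions:

```json
{
  "name": "daily-bronze-ingest",
  "tasks": [
    {
      "task_key": "ingest_bronze",
      "notebook_task": { "notebook_path": "/Repos/ingest/bronze_autoloader" }
    }
  ],
  "schedule": {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED"
  }
}
```

The notebook referenced by the task would contain the Auto Loader `readStream`/`writeStream` logic, typically with `trigger(availableNow=True)` so each daily run drains pending files and exits.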

DLT is the best way forward though.

Cheers, Louis.

JissMathew
Valued Contributor

@BigRoux thank you for your reply. To implement DLT, do we need a multi-node cluster?

 

Jiss Mathew
India.

BigRoux
Databricks Employee

No, it is not a strict requirement. You can run the job on a single-node job cluster if the job is small.
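For reference, a single-node cluster is expressed as zero workers plus the single-node profile. A minimal sketch of such a cluster spec, where the runtime version and node type are placeholder assumptions to adapt to your workspace:

```json
{
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 0,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*]"
  },
  "custom_tags": { "ResourceClass": "SingleNode" }
}
```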
