Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Best Practices for Daily Source-to-Bronze Data Ingestion in Databricks

JissMathew
New Contributor II

How can we effectively manage source-to-bronze data ingestion from a project perspective, particularly when considering daily scheduling strategies using either Auto Loader or Serverless Warehouse COPY INTO commands?

3 REPLIES

BigRoux
Databricks Employee

JissMathew, you have several options for ingesting data from cloud storage into Delta Lake.

Your first approach should be Delta Live Tables (DLT); it will be the easiest and most cost-effective option. Among other benefits, it manages the infrastructure for you, so you don't have to think about what type of compute resources to use.
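
For anyone following along, here is a minimal sketch of what a bronze table in a DLT pipeline could look like. It is only an illustration under assumptions: the helper name `define_bronze_table`, the table name `bronze_events`, the JSON format, and the example path are all hypothetical and not from this thread. The `dlt` module and `spark` session are passed in as arguments because both exist only inside a running pipeline.

```python
def define_bronze_table(dlt, spark, source_path,
                        table_name="bronze_events", file_format="json"):
    """Register one streaming bronze table with the pipeline.

    `dlt` is the Delta Live Tables module and `spark` is the active session;
    both are available inside a pipeline notebook. All names and defaults
    here are placeholders for illustration.
    """

    @dlt.table(name=table_name, comment="Raw files ingested with Auto Loader")
    def bronze():
        # Auto Loader ("cloudFiles") picks up only files it has not seen yet.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", file_format)
            .load(source_path)
        )

    return bronze


# Inside a pipeline notebook (hypothetical path):
# import dlt
# define_bronze_table(dlt, spark, "/Volumes/main/landing/events")
```

The pipeline itself can then be scheduled daily, so each run processes whatever new files have arrived since the previous run.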

Another option is to create Structured Streaming jobs and schedule them with Workflows. As with DLT, you can use Auto Loader to pull files from cloud storage very efficiently.
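
As a rough sketch of that second option, the function below reads new files with Auto Loader and writes them to a bronze Delta table, then stops once the backlog is processed, which suits a daily Workflows schedule. The function names, paths, table name, and JSON format are assumptions made for illustration, not details from this thread, and `spark` is assumed to be the session Databricks provides.

```python
def autoloader_options(file_format, schema_location):
    """Build Auto Loader reader options (values are illustrative)."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_to_bronze(spark, source_path, target_table, checkpoint_path,
                     file_format="json"):
    """Load files not yet ingested from source_path into target_table."""
    options = autoloader_options(file_format, f"{checkpoint_path}/schema")
    stream = (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
    return (
        stream.writeStream
        # The checkpoint records which files were already ingested.
        .option("checkpointLocation", checkpoint_path)
        # Process everything currently available, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )


# Example call from a scheduled notebook task (hypothetical names):
# ingest_to_bronze(spark, "/Volumes/main/landing/events",
#                  "main.bronze.events", "/Volumes/main/checkpoints/events")
```

Because the checkpoint tracks progress, rerunning the job on the next day picks up only the files that landed since the last run.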

DLT is the best way forward though.

Cheers, Louis.

JissMathew
New Contributor II

@BigRoux thank you for your reply. To implement DLT, do we need a multi-node cluster?

 

BigRoux
Databricks Employee

No, it is not a strict requirement. If the job is small, you can run it on a single-node job cluster.
