
What is the best way to handle a huge gzipped file dropped to S3?

RIDBX
New Contributor II


I have found some interesting suggestions in response to my earlier questions. Thanks for reviewing my threads. Here is our situation.

We receive a data feed from on-prem into S3, and the feed can push data only in gzip format. When the files dropped into the S3 bucket/folder are huge (e.g. north of 20 GB), we face loading challenges in Databricks. Our current Databricks Auto Loader job takes a very long time, and any failure risks a retry of the load.

We know other databases have bulk-load options with parallel split/distribution to handle this situation.

Do we have such or better options in Databricks?

Can we use a Databricks external table over the gzip file in S3 with a partition option? Will this help?

Do we need to go to the upstream on-prem data push process and ask for files to be dropped into S3 in smaller sizes (e.g. 4 GB)? 😂

I see the standard way for this described as:

"Read the Gzip File from S3: Use boto3 to read the gzip file from S3 and load it into your Databricks environment."
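For context on why one huge file is slow: gzip is not a splittable format, so a single .gz file is generally decompressed by one reader from start to finish rather than in parallel. One possible workaround, if the upstream feed cannot be changed, is a pre-processing step that streams the large file and re-writes it as several smaller gzip files. The following is only a minimal sketch using the Python standard library; the function name `split_gzip_stream` and the `max_lines` parameter are hypothetical, and it assumes line-delimited data (e.g. CSV or JSON lines without embedded newlines). The input can be any binary file-like object, which in practice might be the body of an S3 download.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def split_gzip_stream(src: BinaryIO, max_lines: int) -> Iterator[bytes]:
    """Yield gzip-compressed chunks, each holding at most max_lines lines.

    Hypothetical helper: streams the input so the whole decompressed
    file never has to fit in memory at once.
    """
    if max_lines <= 0:
        raise ValueError("max_lines must be positive")

    buf = io.BytesIO()
    out = gzip.GzipFile(fileobj=buf, mode="wb")
    count = 0

    with gzip.GzipFile(fileobj=src, mode="rb") as reader:
        for line in reader:
            out.write(line)
            count += 1
            if count == max_lines:
                # Finish the current chunk and start a new one.
                out.close()
                yield buf.getvalue()
                buf = io.BytesIO()
                out = gzip.GzipFile(fileobj=buf, mode="wb")
                count = 0

    # Flush the final, possibly shorter, chunk.
    out.close()
    if count:
        yield buf.getvalue()
```

Each yielded chunk is a complete gzip file in its own right, so the chunks could be written back to S3 as separate objects and then loaded in parallel. Note that the split itself is still a single sequential pass over the original file.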

How have folks in this community addressed these issues?

Thanks for your guidance.


2 REPLIES

Kaniz
Community Manager

Hi @RIDBX

RIDBX
New Contributor II

@Kaniz 

Thanks for weighing in.

I learned that RDD is the predecessor to DataFrames. What is the reason RDDs perform better than DataFrames?

Are RDDs still used for new implementations?

Thanks for patiently addressing my questions. Are you able to tell us in what situation/condition each option is applicable?

Thanks for guidance.

 
