Data Engineering
Need help with loading 11 TB of data into a Spark DataFrame using managed GCP Databricks.

sh23
New Contributor II

I am using managed Databricks on GCP. I have 11 TB of data with 5B rows. The data from the source is not partitioned. I'm having trouble loading the data into a DataFrame and doing further processing on it. I have tried a couple of executor configurations, but none of them seem to work. Can you guide me to the best practice for loading huge data into a DataFrame?

The data is in nested JSON format, and the schema is not consistent across documents. The source of the data is MongoDB.
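For reference, the load currently looks roughly like the sketch below. This is a minimal sketch only; it assumes the MongoDB Spark Connector 10.x, and the connection URI, database, and collection names are placeholders.

# Minimal sketch of the current read, assuming the MongoDB Spark Connector 10.x.
# The connection URI, database, and collection names are placeholders.
df = (
    spark.read.format("mongodb")
    .option("connection.uri", "mongodb://<host>:27017")
    .option("database", "<database>")
    .option("collection", "<collection>")
    .load()
)
df.printSchema()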

Things I have tried already:

n1-standard-4 with 20 executors - job aborted after 2+ hours

n1-standard-8 with 8 executors - job aborted after 2+ hours

I know these are not best practices, but I also tried setting the Spark config below (a quick check of what actually gets applied is sketched right after these settings):

spark.executor.memory 0

spark.driver.memory 0

spark.driver.maxResultSize 0
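Those three lines were entered as-is in the cluster's Spark config. My understanding is that 0 is only meaningful for spark.driver.maxResultSize (it removes the result-size limit), while the two memory settings normally expect a size such as 8g. A quick way to see what actually got applied from a notebook (spark.sparkContext.getConf() is standard PySpark):

# Print the effective values of the configs I set; the second argument to get()
# is just the fallback string shown when a key is not set on the cluster.
for key in ("spark.executor.memory", "spark.driver.memory", "spark.driver.maxResultSize"):
    print(key, "=", spark.sparkContext.getConf().get(key, "<not set>"))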

I want to know the right executor size, machine type, and Spark config for my use case. Any suggestion that helps us save credits would be an added advantage. We plan to run data quality checks on this data, so we will need to read the entire dataset.
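As a rough back-of-envelope sketch (purely illustrative, assuming ~128 MB input partitions and the cluster sizes I tried above), the scale works out to something like this:

# Back-of-envelope sizing; all numbers are illustrative assumptions, not a recommendation.
data_mb = 11 * 1024 * 1024            # 11 TB expressed in MB
partition_mb = 128                    # assumed input partition size
num_partitions = data_mb // partition_mb
print(f"~{num_partitions:,} partitions of {partition_mb} MB")   # ~90,112 partitions

cores_per_executor = 4                # e.g. n1-standard-4 (4 vCPUs), as in my first attempt
executors = 20
total_cores = cores_per_executor * executors
print(f"{total_cores} cores -> ~{num_partitions / total_cores:,.0f} task waves per full pass")

Even at this rough level it shows how many waves of tasks a single full pass over the data needs with the clusters I tried, which is why I am asking about sizing.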

Thanks in advance.

1 ACCEPTED SOLUTION


Kaniz
Community Manager

Hi @Shruti S, these articles can help you with data management in Databricks.


2 REPLIES

Kaniz
Community Manager


Hi @Shruti S,

Just a friendly follow-up: are you getting any errors? Please share the error stack trace so we can help you narrow down the root cause of this issue.
