How can we read files from Azure Blob Storage and process them in parallel in Databricks using PySpark? As of now we are reading all 10 files at a time into a dataframe and flattening it. Thanks & Regards, Sujata
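A minimal sketch of one common pattern, assuming the files are JSON and that the storage account, container, secret scope, and the nested "items" column are placeholders: point a single spark.read at the whole folder (or a glob) so Spark schedules the file reads across executor tasks, then flatten the nested column with explode.

```python
from pyspark.sql.functions import explode, col

# Assumed placeholders: replace with your storage account, container and secret scope.
storage_account = "mystorageaccount"
container = "mycontainer"
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-key"),  # hypothetical secret scope/key
)

# One read over the whole folder lets Spark read all 10 files in parallel,
# instead of looping over them one by one on the driver.
path = f"wasbs://{container}@{storage_account}.blob.core.windows.net/input/*.json"
df = spark.read.json(path)

# Example of flattening a nested array column; "items" is an assumed column name.
flat_df = df.withColumn("item", explode(col("items"))).drop("items")
```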
Truncate False is not working on a Delta table: df_delta.show(df_delta.count(), False). Compute size: Single Node - Standard_F4S - 8 GB memory, 4 cores. How much data can we persist at most in a Delta table (Parquet files), and how fast can we retrieve it?
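For reference, a small sketch of the show() signature, assuming the Delta path below is a placeholder: truncate=False prints full column values, but passing df_delta.count() as the row count asks the driver to render every row, which on a single-node 8 GB cluster can be slow or exhaust memory.

```python
# Read the Delta table; the path is an assumed placeholder.
df_delta = spark.read.format("delta").load("/mnt/delta/my_table")

# truncate=False prints full column values instead of cutting them at 20 characters.
df_delta.show(20, truncate=False)

# show(df_delta.count(), False) pulls every row to the driver for display;
# on a small single-node cluster it is usually better to cap the row count
# or write the result out to storage instead of printing it.
```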
How can we persist 300 million records? What is the best option to persist data: the Databricks Hive metastore, Azure storage, or a Delta table? What limitations do we have for Databricks Delta tables in terms of data volume? We have a use case where testers should be...
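A hedged sketch of one option: a Delta table is Parquet files in cloud storage plus a transaction log, so 300 million rows is well within normal use. The DataFrame name, partition column, and table name below are assumptions.

```python
# Assumed names: source_df, the "event_date" partition column, and the table name.
(
    source_df.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")        # partition on a low-cardinality column for pruning
    .saveAsTable("analytics.events")  # registered in the metastore; data stored as Parquet + Delta log
)

# Compacting small files can speed up later reads (OPTIMIZE is a Databricks Delta command).
spark.sql("OPTIMIZE analytics.events")
```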
Can you provide us a sample to read JSON files in parallel from blob storage? We are reading all files one by one from the directory, and it is taking time to load them into a data frame. Thank you
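A minimal sketch under assumed paths and schema: instead of looping over files, pass the directory (or an explicit list of paths) to a single spark.read.json call so the files are read in parallel across the cluster; supplying a schema up front also avoids a second pass over the data for schema inference.

```python
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Assumed schema; adjust the fields to your JSON layout.
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
])

# One read over the whole folder: Spark splits the files across executor tasks.
df = spark.read.schema(schema).json(
    "wasbs://mycontainer@myaccount.blob.core.windows.net/data/json/"  # placeholder path
)

# An explicit list of paths is also read in parallel in a single call:
# df = spark.read.schema(schema).json(["/mnt/blob/file1.json", "/mnt/blob/file2.json"])
```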