01-11-2024 07:06 PM
Reading a 130 GB file without multiLine set to true takes 6 minutes.
My file contains multiline records, so I need that option.
How can I speed up the read?
I am using the command below:
01-12-2024 02:14 AM
Hi @vishwanath_1, reading large CSV files with multiline records in Databricks can be time-consuming, because multiline records are more complex to parse.
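Use Explicit Schema: One way to speed up reading a CSV into a DataFrame is to supply an explicit schema. If inferSchema is enabled, Spark makes an extra pass over the data to determine the column types; providing the schema up front avoids that pass, which matters on a 130 GB file.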
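Ensure Proper Quoting: When you use the multiLine option, Spark relies on quoting to tell a line break inside a field from the end of a record, so every field that contains a line break must be enclosed in the quote character (" by default). If embedded quotes are escaped by doubling them (""), set the escape option to " as well, since Spark's default escape character is a backslash. If your data doesn't follow these rules, records can be parsed incorrectly and the read can slow down.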
Consider Data Partitioning: With multiLine enabled, Spark cannot split a single CSV file, so one task reads the whole file. If the data can be delivered as many smaller files instead of one 130 GB file, Spark can read them in parallel, which can significantly improve performance. The split must fall between complete records, never inside a multiline record, so this only works if the files are cut at record boundaries.
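The three suggestions above (explicit schema, quoting options, and splitting the input at record boundaries) can be sketched with PySpark plus the Python standard library. This is a minimal sketch, not a tested Databricks recipe. Assumptions: `spark` is an existing SparkSession; the column names in `COLUMNS`, the function names, and `rows_per_file` are hypothetical; `escape='"'` assumes embedded quotes are doubled as in RFC 4180. Note that `split_csv` is itself one sequential pass over the file, so it pays off mainly if the data is read more than once, or better, if whatever produces the file can write smaller files directly.

```python
import csv
import os

# Hypothetical columns; replace with the real names and Spark SQL types of your file.
COLUMNS = [("id", "INT"), ("name", "STRING"), ("comment", "STRING")]


def ddl_schema(columns):
    """Build a Spark DDL schema string such as 'id INT, name STRING'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def read_multiline_csv(spark, path, columns=COLUMNS):
    """Read a multiline CSV with an explicit schema, so no inferSchema pass is needed.

    quote='"' is Spark's default; escape='"' assumes embedded quotes are
    doubled ("") as in RFC 4180. Adjust both to match how your file is written.
    """
    return (
        spark.read.schema(ddl_schema(columns))
        .option("header", True)
        .option("multiLine", True)
        .option("quote", '"')
        .option("escape", '"')
        .csv(path)  # a directory of part files is also accepted
    )


def split_csv(src_path, out_dir, rows_per_file):
    """Split a CSV into part files without cutting any record in half.

    csv.reader treats a quoted field containing line breaks as part of one
    record, so every part holds only complete records. Each part repeats
    the header row. Returns the list of part-file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input: nothing to split
            return paths
        out = None
        writer = None
        try:
            for i, row in enumerate(reader):
                if i % rows_per_file == 0:  # start a new part file
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"part-{len(paths):05d}.csv")
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)  # re-quotes fields with line breaks
                    writer.writerow(header)
                    paths.append(path)
                writer.writerow(row)
        finally:
            if out is not None:
                out.close()
    return paths
```

Usage would look like `split_csv("/path/to/big.csv", "/path/to/parts", 1_000_000)` followed by `read_multiline_csv(spark, "/path/to/parts")`; the paths and row count are placeholders. Because `header` is set to true, Spark skips the header row at the start of each part file.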
01-12-2024 05:02 AM
Hi @vishwanath_1, can you try setting the config below and check whether it resolves the issue?
set spark.databricks.sql.csv.edgeParserSplittable=true;
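If the read runs from a Python notebook cell rather than a SQL cell, the same session setting can be applied with `spark.conf.set`, which sets session-level Spark SQL configuration the way `SET` does. A minimal sketch, assuming an existing SparkSession named `spark`; the function name is hypothetical, and the setting name is taken from the reply above and appears to be Databricks-specific.

```python
def enable_splittable_csv_parser(spark):
    """PySpark equivalent of the SQL statement
    SET spark.databricks.sql.csv.edgeParserSplittable=true;
    The value applies to the current Spark session only.
    """
    spark.conf.set("spark.databricks.sql.csv.edgeParserSplittable", "true")
```

Call it once before `spark.read...` in the same session.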
01-21-2024 10:18 PM
With set spark.databricks.sql.csv.edgeParserSplittable=true; the read now takes 30 minutes less than the usual 4 hours.
Is there any other setting that could make it faster?
01-22-2024 08:00 AM
You can also try enabling Photon, which may further speed up the read operation.
01-18-2024 04:17 AM
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!