I am reading a 130 GB CSV file with multiLine=true and it is taking 4 hours just to read.
01-11-2024 07:06 PM
Reading the same 130 GB file without multiLine=true takes 6 minutes.
My file contains multi-line records.
How can I speed up the read here?
I am using the command below.
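The original command is not shown in the thread; below is a minimal sketch of a typical multiLine CSV read in PySpark, with a hypothetical file path and options. The slowdown itself is expected: with multiLine=true, Spark cannot safely split the file on newlines, so by default a single task ends up scanning the whole 130 GB instead of many tasks reading splits in parallel.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical path and options; the actual command from the post is not shown.
df = (
    spark.read
    .option("header", "true")
    .option("multiLine", "true")  # required when a single record spans multiple lines
    .option("escape", '"')        # common setting when multi-line fields are quoted
    .csv("/mnt/raw/big_file.csv")
)
df.count()
```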
01-12-2024 05:02 AM
Hi @vishwanath_1, can you try setting the config below and see if it resolves the issue?
set spark.databricks.sql.csv.edgeParserSplittable=true;
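If the read runs from a notebook or Python job rather than SQL, the same flag can be set on the session config before the read; a minimal sketch, assuming an existing Spark session named `spark` and a hypothetical file path:

```python
# Equivalent to the SQL SET above: allow the Databricks CSV edge parser to split
# multi-line CSV files across tasks instead of reading them in a single task.
spark.conf.set("spark.databricks.sql.csv.edgeParserSplittable", "true")

df = (
    spark.read
    .option("header", "true")
    .option("multiLine", "true")
    .csv("/mnt/raw/big_file.csv")  # hypothetical path
)
```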
01-21-2024 10:18 PM
By using set spark.databricks.sql.csv.edgeParserSplittable=true;
the read now takes about 30 minutes less than the usual 4 hours.
Is there any other setting that can make it faster?
01-22-2024 08:00 AM
You can also try enabling Photon, which can help speed up the read operation.
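Photon is enabled at the cluster level (by selecting a Photon-enabled runtime when creating or editing the cluster) rather than in code. A minimal sketch for checking the session config from a notebook, assuming the key `spark.databricks.photon.enabled` (the key name is an assumption, not confirmed in this thread):

```python
# Hypothetical check; the config key is an assumption and may differ by runtime.
# The second argument is a default, so this does not fail if the key is absent.
photon_flag = spark.conf.get("spark.databricks.photon.enabled", "false")
print(f"Photon enabled (per session config): {photon_flag}")
```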
09-22-2024 12:38 AM - edited 09-22-2024 12:43 AM
Hi @Lakshay, where did you find this config? Can you share a link to its documentation?

