Loading multiple gz files from ADLS to Delta Lake/Delta table in ADB
04-28-2023 03:21 PM
I have several gzipped CSV files (file.csv.gz) in an ADLS folder, each of them quite large. All of these files are extracted from the same base table, so they contain similar data for different dates. How can I load them into Delta Lake / a Delta table? We would like to do a quick POC on how fast Databricks is at reading the data from these files once it is in Delta, as other platform(s) couldn't scale well.
Any help would be much appreciated.
Krish.
05-09-2023 06:03 AM
Hi, you can read .gz files directly with Spark; it decompresses them based on the file extension.
https://stackoverflow.com/questions/42761912/how-to-read-gz-compressed-file-by-pyspark
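For the Delta part, here is a minimal sketch of what the load could look like. It assumes the files have a header row and that `spark` is the session a Databricks notebook already provides; the function name, the `abfss://` path, and the table name are placeholders made up for illustration, not anything from your environment. One thing to keep in mind: gzip is not a splittable format, so each .gz file is read by a single task. Once the data is written to Delta, later reads are no longer limited by that.

```python
def load_gz_csv_to_delta(spark, source_glob, target_table):
    """Read gzipped CSV files and append them to a Delta table.

    spark        -- an existing SparkSession (predefined in Databricks notebooks)
    source_glob  -- path or glob matching the .csv.gz files (hypothetical below)
    target_table -- name of the Delta table to append to (hypothetical below)
    """
    # Spark picks the gzip codec from the .gz extension, so no extra
    # decompression step is needed. Assumption: the files have a header row.
    # Without a schema or inferSchema, every column is read as a string.
    df = (
        spark.read
        .option("header", "true")
        .csv(source_glob)
    )

    # Append the rows to a Delta table; the table is created if it is missing.
    df.write.format("delta").mode("append").saveAsTable(target_table)
    return df
```

In a notebook you would call it along the lines of `load_gz_csv_to_delta(spark, "abfss://<container>@<account>.dfs.core.windows.net/<folder>/*.csv.gz", "poc.base_table")`, with your own path and table name filled in. For a real comparison you would probably also want to pass an explicit schema rather than leaving every column as a string.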
Please let us know if this helps. Also, please tag @Debayan with your next comment so that I will get notified. Thank you!

