Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to read a compressed file in Spark if the filename does not include the file extension for that compression format?

Anonymous

For example, let's say I have a file called some-file, which is a gzipped text file. If I try spark.read.text('some-file'), it returns a bunch of gibberish, because Spark doesn't know the file is gzipped. I'm looking for a way to tell Spark manually that the file is gzipped so it decompresses it accordingly. I searched but couldn't find a good answer; the answers I did find say it can't be done.


sean_owen
Databricks Employee

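Other than renaming the file, I'm not sure you can do much. Deciding how to decompress a file happens a level below Spark, in the Hadoop input APIs, and from the source it looks like the codec is chosen purely from the file name: Hadoop's CompressionCodecFactory matches on the extension, so a name ending in .gz selects the gzip codec and a name with no extension gets no codec at all.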
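A minimal sketch of the rename workaround in plain Python, assuming the file sits on a filesystem that Python's shutil can reach and that the destination directory is visible to every Spark executor (a driver-local temp directory only works with local-mode Spark). The helper names looks_gzipped, copy_with_gz_suffix and read_text_via_gz_copy are hypothetical, and the magic-byte check is an extra guard rather than part of the original suggestion. On DBFS, dbutils.fs.cp would take the place of shutil.copyfile.

```python
import os
import shutil
import tempfile
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def looks_gzipped(path: str) -> bool:
    """Return True if the file begins with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def copy_with_gz_suffix(src_path: str, dest_dir: str) -> str:
    """Copy src_path into dest_dir under its own name plus '.gz', so that
    Hadoop's extension-based codec lookup picks the gzip codec."""
    dest_path = os.path.join(dest_dir, os.path.basename(src_path) + ".gz")
    shutil.copyfile(src_path, dest_path)
    return dest_path


def read_text_via_gz_copy(spark, src_path: str, dest_dir: Optional[str] = None):
    """Hypothetical helper: copy to a .gz name, then let spark.read.text
    decompress it transparently. dest_dir must be readable by all executors."""
    if not looks_gzipped(src_path):
        raise ValueError(f"{src_path} does not start with the gzip magic bytes")
    if dest_dir is None:
        # Driver-local temp directory: only suitable for local-mode Spark.
        dest_dir = tempfile.mkdtemp()
    gz_path = copy_with_gz_suffix(src_path, dest_dir)
    # Spark reads lazily, so the copy is intentionally left in place.
    return spark.read.text(gz_path)
```

Copying rather than moving keeps the original file untouched; if the file is large and you own it, renaming in place avoids the extra I/O.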

If the files aren't big, you can load the raw bytes of each file with spark.read.format("binaryFile").load(...) (or sc.binaryFiles on the RDD API), apply a UDF that gunzips the bytes with a library, and then decode the result as a string. In Scala you can then turn that into a Dataset[String] and pass it to readers like spark.read.csv; in PySpark, spark.read.csv likewise accepts an RDD of strings. Either way, that at least gets you the whole text of each file.
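A rough PySpark sketch of the binary-file approach, assuming Spark 3.0+ (for the binaryFile source) and files small enough to decompress whole in memory; gzip is not splittable anyway, so each file ends up in a single task. The function names are hypothetical. read_gzipped_csv uses sc.binaryFiles instead so it can hand an RDD of lines to spark.read.csv, which PySpark accepts in place of a path.

```python
import gzip


def gunzip_to_text(content: bytes, encoding: str = "utf-8") -> str:
    """Decompress one whole gzip file held in memory and decode it to text."""
    return gzip.decompress(content).decode(encoding)


def read_gzipped_text(spark, path: str):
    """One row per file with columns (path, text). Needs Spark 3.0+."""
    # Imported lazily so gunzip_to_text can be used without pyspark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    gunzip_udf = F.udf(gunzip_to_text, StringType())
    return (
        spark.read.format("binaryFile")  # columns: path, modificationTime, length, content
        .load(path)
        .select("path", gunzip_udf("content").alias("text"))
    )


def read_gzipped_csv(spark, path: str, **csv_options):
    """Decompress each file, split it into lines, and parse the lines as CSV."""
    # binaryFiles yields (file_path, file_bytes) pairs.
    lines = spark.sparkContext.binaryFiles(path).flatMap(
        lambda name_and_bytes: gunzip_to_text(name_and_bytes[1]).splitlines()
    )
    # PySpark's DataFrameReader.csv accepts an RDD of CSV row strings.
    return spark.read.csv(lines, **csv_options)
```

For a single gzipped CSV, a call like read_gzipped_csv(spark, "/data/some-file", header=True) would parse it into a DataFrame. Splitting with splitlines() assumes no quoted CSV field spans multiple lines.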

