Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Unable to unpack or read rar file

TechExplorer
New Contributor II

Hi everyone,

I'm encountering an issue with the following code when trying to unpack or read a RAR file in Databricks: 

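import io
import pandas as pd
import rarfile
# s3_path must be a driver-local or FUSE-mounted path (e.g. /dbfs/...); rarfile cannot open s3:// URLs
with rarfile.RarFile(s3_path) as rf:
    for file_info in rf.infolist():
        if file_info.is_dir():  # skip directory entries, which have no data
            continue
        with rf.open(file_info) as file:
            file_content = io.BytesIO(file.read())
            df = pd.read_csv(file_content, nrows=10)
            display(df)  # Databricks notebook display helper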

The error message I receive is: "cannot find working tool".

Has anyone successfully managed to unpack or read a RAR file using rarfile.RarFile in Databricks? Any insights or alternative approaches would be greatly appreciated!

Thanks in advance!

1 ACCEPTED SOLUTION

Walter_C
Databricks Employee

Using rarfile.RarFile directly in Databricks often fails with "cannot find working tool" because the rarfile Python library only parses the archive structure itself; to decompress the file contents it calls an external command-line tool (such as unrar, unar, or bsdtar), which must be installed on the host machine. These utilities are typically not pre-installed on Databricks clusters, especially shared or managed ones, and that is the root cause of your error.
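To see which of these tools the driver can actually find, a rough check from Python is to look each one up on the PATH. This is a minimal sketch: the candidate list (unrar, unar, 7z, bsdtar) reflects the tools recent rarfile releases can use and is an assumption you should adjust to your rarfile version, and rarfile itself goes further by running a candidate to confirm it works, so being on the PATH is necessary but not a guarantee. Python in a notebook cell runs on the driver, which is also where `%sh` commands run.

```python
import shutil

# Assumed default executable names of the external tools rarfile can use.
CANDIDATE_TOOLS = ("unrar", "unar", "7z", "bsdtar")


def available_rar_tools(candidates=CANDIDATE_TOOLS):
    """Return {tool_name: path} for each candidate executable found on PATH."""
    found = {}
    for name in candidates:
        path = shutil.which(name)  # None if the executable is not on PATH
        if path is not None:
            found[name] = path
    return found


if __name__ == "__main__":
    tools = available_rar_tools()
    if tools:
        for name, path in tools.items():
            print(f"{name}: {path}")
    else:
        # This is the situation that surfaces as "cannot find working tool".
        print("No RAR extraction tool found on PATH")
```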

 

  • Install unrar on your cluster (only possible on single-user/dedicated clusters): %sh apt-get update && apt-get install -y unrar
  • Once it is installed, confirm that the unrar command is available: %sh which unrar
  • Then rerun your Python code that uses rarfile.RarFile. The rarfile package should now find the system tool and extract the files.
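Once a tool is installed, the loop from the question can be wrapped in a small helper. This is a sketch under stated assumptions: `preview_csv_members` is a hypothetical name, only members whose names end in `.csv` are parsed, and it relies on rarfile mirroring the zipfile API (`infolist()`, `open()`, `.filename`, and `.is_dir()`, the last available in rarfile 4.x). Because of that, the same function also accepts a `zipfile.ZipFile`, which makes it easy to test without any RAR tooling.

```python
import io

import pandas as pd


def preview_csv_members(archive, nrows=10):
    """Read the first `nrows` rows of every CSV member of an open archive.

    `archive` is any object with a zipfile-style API, e.g. rarfile.RarFile
    or zipfile.ZipFile. Returns a dict mapping member name -> DataFrame.
    """
    previews = {}
    for info in archive.infolist():
        # Skip directory entries and anything that is not a CSV file.
        if info.is_dir() or not info.filename.lower().endswith(".csv"):
            continue
        with archive.open(info) as member:
            # Buffer the member in memory so pandas gets a seekable stream.
            buffer = io.BytesIO(member.read())
        previews[info.filename] = pd.read_csv(buffer, nrows=nrows)
    return previews
```

In a notebook you would call it inside `with rarfile.RarFile(local_path) as rf:` and pass each returned DataFrame to `display()`; `local_path` is a placeholder for a driver-local or `/dbfs/...` path. If no tool is installed, the call still fails with "cannot find working tool", which current rarfile releases raise as `rarfile.RarCannotExec`.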

 


3 REPLIES

TechExplorer
New Contributor II

Thank you! This solved my problem.

Upendra_Dwivedi
Contributor

Hi @Walter_C,

I am also using the unrar utility, but the problem is that it is proprietary software. I am working for a client, and its license could cause issues. What is an alternative to unrar that would eliminate the risk of license-compliance problems?
