Data Engineering

How do I download and unzip datasets from Kaggle into DBFS?

StephanieRivera
Valued Contributor II
 
Accepted Solutions

StephanieRivera
Valued Contributor II

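As shown on Stack Overflow, the opendatasets library can do this:

# Requires the opendatasets package (%pip install opendatasets).
import opendatasets as od

# Download the competition data into a DBFS path via the /dbfs mount.
od.download("https://www.kaggle.com/competitions/tlvmc-parkinsons-freezing-gait-prediction/data", "/dbfs/FileStore/mypath/")

When this runs, the output first shows the zip file being downloaded. Once the download is complete, the archive is extracted automatically: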

Extracting archive /dbfs/FileStore/mypath/tlvmc-parkinsons-freezing-gait-prediction/tlvmc-parkinsons-freezing-gait-prediction.zip to /dbfs/FileStore/mypath/tlvmc-parkinsons-freezing-gait-prediction
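To confirm that the extraction produced the expected files, one option is to list everything under the target directory. This is a minimal sketch, not part of the original answer: the helper name `list_extracted_files` is hypothetical, and it assumes the `/dbfs` mount is readable from the driver as an ordinary directory.

```python
from pathlib import Path
from typing import List


def list_extracted_files(root: str) -> List[str]:
    """Return the paths of all files under root, relative to root, sorted."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Directory taken from the "Extracting archive" message above.
    target = "/dbfs/FileStore/mypath/tlvmc-parkinsons-freezing-gait-prediction"
    if Path(target).is_dir():
        for name in list_extracted_files(target):
            print(name)
```

If the list is empty or the directory does not exist, the download or extraction step likely did not complete.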
 

Hi @Stephanie Rivera. In a Databricks notebook, you can handle this with Python, Scala, or bash.

I have not tried it, but the commands below should work, since they work in a native shell.

%sh curl some_url --output myfile.zip
 
%sh unzip myfile.zip -d "some directory"
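The same two steps can also be done from a Python cell using only the standard library. The following is a rough sketch under stated assumptions: `download_and_unzip` is a hypothetical helper, the URL must be directly downloadable without a login (Kaggle links usually require authentication, so a bare URL may not work there), and the paths are placeholders.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path
from typing import List


def download_and_unzip(url: str, zip_path: str, extract_dir: str) -> List[str]:
    """Download url to zip_path, extract it into extract_dir, return member names."""
    Path(zip_path).parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk so large archives are not held in memory.
    with urllib.request.urlopen(url) as response, open(zip_path, "wb") as out:
        shutil.copyfileobj(response, out)
    Path(extract_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)
        return archive.namelist()


# Example call with placeholder values (not run here):
# download_and_unzip("some_url", "/tmp/myfile.zip", "/tmp/some_directory")
```

The function returns the names of the archive members, which makes it easy to check what was extracted.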

karthik_p
Esteemed Contributor

@Stephanie Rivera, please download your Kaggle file and unzip it. If it is smaller than 100 MB, you can upload it directly by following the steps at https://docs.databricks.com/ingestion/add-data/upload-data.html. Otherwise, take the example below and replace the zip URL and target path with your own:

%sh curl https://resources.lendingclub.com/LoanStats3a.csv.zip --output /tmp/LoanStats3a.csv.zip

unzip /tmp/LoanStats3a.csv.zip
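Note that the example above downloads to /tmp and unzips on the driver's local disk, so the extracted files still need to be copied into DBFS. A minimal sketch of that copy step follows; `copy_files` is a hypothetical helper, the destination path is a placeholder, and it assumes the `/dbfs` mount is writable from the driver.

```python
import shutil
from pathlib import Path
from typing import List


def copy_files(src_dir: str, dest_dir: str, pattern: str = "*.csv") -> List[str]:
    """Copy files matching pattern from src_dir into dest_dir; return copied names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(src_dir).glob(pattern)):
        if src.is_file():
            # copyfile copies contents only, without permission bits or metadata.
            shutil.copyfile(src, dest / src.name)
            copied.append(src.name)
    return copied


# Example call with placeholder paths (not run here):
# copy_files(".", "/dbfs/FileStore/mypath/")
```

Only the top level of the source directory is scanned; nested folders would need a recursive pattern or a separate copy.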

Debayan
Esteemed Contributor III

Hi, you can refer to https://docs.databricks.com/files/unzip-files.html. You can download the file with curl and then unzip it as described in that doc.

Please let us know if this helps.

Also, please tag @Debayan in your next update so that I am notified.
