Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I need to move a group of files (Python or Scala files) or a folder from a DBFS location to the user workspace directory in Azure Databricks to do testing on the files. It is very difficult to upload each file one by one into the user workspace directory, so is it...
Problem statement: Source file format: .tar.gz. Avg size: 10 MB. Number of tar.gz files: 1000. Each tar.gz file contains around 20,000 CSV files. Requirement: untar the tar.gz files and write the CSV files to blob storage / an intermediate storage layer for further...
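A minimal driver-side sketch of the untar step, assuming the archives are reachable through a /dbfs path (the folder names below are placeholders, not taken from the original post):

import os
import tarfile

# Hypothetical locations; adjust to your own mounts.
src_dir = "/dbfs/mnt/raw/tarfiles"      # where the .tar.gz files land
dst_dir = "/dbfs/mnt/staging/csv"       # intermediate layer for the extracted CSVs

for name in os.listdir(src_dir):
    if name.endswith(".tar.gz"):
        with tarfile.open(os.path.join(src_dir, name), mode="r:gz") as tar:
            # Extract the ~20,000 member CSVs into a per-archive subfolder
            tar.extractall(path=os.path.join(dst_dir, name[:-len(".tar.gz")]))

With 1000 archives this single-threaded loop on the driver will be slow; distributing the list of archive names across the cluster is the usual next step, but the extraction call itself stays the same.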
@Hubert Dudek Thanks for your suggestions. After creating the storage account in the same region as Databricks I can see that performance is as expected. Now it is clear that the issue was the /mnt/ location being in a different region than Databricks. I would ...
Yes, you just need to create a service account for Databricks and then assign the Storage Admin role on the bucket. After that you can mount GCS the standard way:
bucket_name = "<bucket-name>"
mount_name = "<mount-name>"
dbutils.fs.mount("gs://%s" % bucket_name, "/m...
Hi folks, I have installed and configured the Databricks CLI on my local machine. I tried to copy a local file from my personal computer to a dbfs:/ path with dbfs cp. I can see the file being copied from local, but it is only visible locally. I am not able to ...
Hi, could you try to save the file from your local machine to the dbfs:/FileStore location?
# Put local file test.py to dbfs:/FileStore/test.py
dbfs cp test.py dbfs:/FileStore/test.py
Dear community, I have the following problem:
%fs mv '/FileStore/Tree_point_classification-1.dlpk' '/dbfs/mnt/group22/Tree_point_classification-1.dlpk'
I have uploaded a file of an ML model and moved it to the directory with the command above. When I now check ...
There is dbfs:/dbfs/ displayed, so maybe the file is in the /dbfs/dbfs directory? Please check it and try to open it with open('/dbfs/dbfs. You can also use "Data" from the left menu to browse the DBFS file system more easily.
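A quick way to confirm and fix this from a notebook, assuming the file really did land under dbfs:/dbfs/ because %fs treated '/dbfs/mnt/...' as a path relative to the DBFS root (the paths are the ones from the question):

# List the suspected location first
display(dbutils.fs.ls("dbfs:/dbfs/mnt/group22"))

# If the file is there, move it to the intended mount path
dbutils.fs.mv(
    "dbfs:/dbfs/mnt/group22/Tree_point_classification-1.dlpk",
    "dbfs:/mnt/group22/Tree_point_classification-1.dlpk")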
I am downloading multiple files by web scraping and by default they are stored in /tmp
I can copy a single file by providing the filename and path
%fs cp file:/tmp/2020-12-14_listings.csv.gz dbfs:/tmp
but when I try to copy multiple files I get an ...
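One way to handle the multi-file case is to loop over the local directory and copy each file with dbutils.fs.cp; a sketch, assuming the scraped files all end in .csv.gz (that filter is an assumption):

import os

for name in os.listdir("/tmp"):
    if name.endswith(".csv.gz"):
        # Copy each scraped file from the driver's local disk to DBFS
        dbutils.fs.cp("file:/tmp/" + name, "dbfs:/tmp/" + name)

Alternatively, dbutils.fs.cp("file:/tmp", "dbfs:/tmp", recurse=True) copies the whole directory, but it also picks up anything else sitting in /tmp.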
Hi,
I am looking at my Databricks workspace and it looks like I am missing the databricks-datasets root folder in DBFS. The DBFS root folders I can view are FileStore, local_disk0, mnt, pipelines and user.
Can I mount Databricks-dataset or am I missing some...
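A quick check from a notebook, assuming the datasets are exposed at the usual /databricks-datasets path (it is provided by Databricks rather than something you mount yourself):

# List the DBFS root and the datasets folder to see whether it is reachable
display(dbutils.fs.ls("/"))
display(dbutils.fs.ls("/databricks-datasets"))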
When trying to upload to DBFS from my local machine I get the error "Error occurred when processing file ... : Server responded with 0 code".
DBR 7.3 LTS
Spark 3.0.1
Scala 2.12
Uploading the file using the "upload" in the Databricks cloud console, the c...
I am facing the same issue with GCP Databricks. I am able to upload smaller files, but when I tried a 3 MB file, Databricks choked and I got the above error.
I tried with AWS Databricks and it works fine even for bigger files.
I ran the statement below and got this error:
%python
data = sqlContext.read.parquet("/FileStore/tables/ganesh.parquet")
display(data)
Error:
SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure:...
I'm having a similar issue reading a JSON file. It is ~550MB compressed and is on a single line:
val cfilename = "c_datafeed_20200128.json.gz"
val events = spark.read.json(s"/mnt/c/input1/$cfilename")
display(events)
The filename is correct and t...
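Since the default JSON reader expects one record per line (JSON Lines), a single-line 550 MB document usually needs the multiLine option; a Python sketch of the equivalent read, using the path from the question:

cfilename = "c_datafeed_20200128.json.gz"

# multiLine lets Spark parse a JSON document that is not newline-delimited;
# note that a single gzipped file is not splittable, so one task reads it all.
events = spark.read.option("multiLine", True).json("/mnt/c/input1/" + cfilename)
display(events)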
I have been trying to open a file on the dbfs using all different combinations:
if I use the following code:
with open("/dbfs/FileStore/df/Downloadedfile.csv", 'r', newline='') as f
I get IsADirectoryError: [Errno 21] Is a directory
with open("dbfs:...
To get rid of this error you can use Python's file-existence checks to confirm that Python actually sees the file, in other words, to make sure the path points to a real existing file rather than a directory. If the user do...
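A minimal check along those lines, using the path from the question:

import os

path = "/dbfs/FileStore/df/Downloadedfile.csv"

print(os.path.exists(path))   # does anything exist at this path?
print(os.path.isdir(path))    # the IsADirectoryError suggests this is True
print(os.path.isfile(path))   # False would mean the CSV actually lives inside that directory

if os.path.isdir(path):
    print(os.listdir(path))   # see what is really in there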
I have created a Databricks workspace in Azure. I have created a cluster for Python 3. I am creating a job using spark-submit parameters. How do I specify multiple files for --py-files in the spark-submit command for a Databricks job? All the files to be specified in ...
Hi @Nandha Kumar, please go through the docs below on passing Python files to a job: https://docs.databricks.com/dev-tools/api/latest/jobs.html#sparkpythontask
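For the --py-files part specifically, spark-submit takes a comma-separated list of files before the main script; a sketch of the spark_submit_task parameters in a Jobs API payload (the DBFS paths below are placeholders):

{
  "spark_submit_task": {
    "parameters": [
      "--py-files",
      "dbfs:/FileStore/jobs/helpers.py,dbfs:/FileStore/jobs/utils.py",
      "dbfs:/FileStore/jobs/main.py"
    ]
  }
}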