Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hello, how are you? I have a small problem. I need to unzip some .zip, .tar, and .gz files, and each of these may contain multiple files. Trying to unzip the .zip files, I got this error: FileNotFoundError: [Errno 2] No such file or directory: but the files are in ...
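A common cause of this error is passing a dbfs:/ path to Python's zipfile module, which only understands local paths. A minimal sketch of the usual workaround, assuming a hypothetical archive at dbfs:/FileStore/raw/archive.zip: copy the archive to the driver's local disk first, extract there, then copy results back.

```python
import zipfile

src = "dbfs:/FileStore/raw/archive.zip"   # hypothetical source path
local = "/tmp/archive.zip"

# zipfile/tarfile cannot read dbfs:/ URIs; stage the file on local disk.
dbutils.fs.cp(src, f"file:{local}")

with zipfile.ZipFile(local) as zf:
    zf.extractall("/tmp/extracted")

# Copy the extracted files back to DBFS if downstream jobs need them.
dbutils.fs.cp("file:/tmp/extracted", "dbfs:/FileStore/extracted", recurse=True)
```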
Hello experts. We are trying to clarify how to clean up the large number of files that are accumulating in the _delta_log folder (json, crc, and checkpoint files). We went through the related posts in the forum and followed the below: SET spark.da...
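For reference, Delta Lake cleans up old _delta_log entries automatically when checkpoints are written, governed by two table properties. A sketch of tightening them, assuming a hypothetical table my_db.my_table; note that a shorter log retention also shortens the time-travel window.

```python
spark.sql("""
    ALTER TABLE my_db.my_table SET TBLPROPERTIES (
        'delta.logRetentionDuration'        = 'interval 7 days',
        'delta.checkpointRetentionDuration' = 'interval 2 days'
    )
""")
```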
The documentation states that "drop table": Deletes the table and removes the directory associated with the table from the file system if the table is not an EXTERNAL table. An exception is thrown if the table does not exist. In case of an external table...
Hi, is there a way to force-delete files after dropping a table, rather than waiting 30 days to see the size in S3 decrease? The tables I dropped are related to dev and staging; I don't want to keep their files for 30 days.
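One approach often suggested for Hive-metastore Delta tables is to VACUUM with a zero-hour retention window before dropping. A sketch, assuming a hypothetical staging table with no concurrent readers or writers (shortening retention below the default is unsafe otherwise):

```python
# The check must be disabled to allow a retention window under the 7-day default.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

# Physically deletes all data files not referenced by the current table version.
spark.sql("VACUUM my_db.my_staging_table RETAIN 0 HOURS")
spark.sql("DROP TABLE my_db.my_staging_table")
```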
I'm using Auto Loader directory listing mode (without incremental file listing), and sometimes new files are not picked up and found in the cloud_files-listing. I have found that using the 'cloudFiles.backfillInterval' option can resolve the detection ...
If we set the backfill to 1 week, will it run only once a week, or will it look for old unprocessed files in every trigger? For example: if we set it to 1 day and the job runs every hour, will it look for files in the past 24 hours on a sliding ...
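For context, a sketch of where the option sits in an Auto Loader stream, with a hypothetical source path; backfillInterval asks Auto Loader to periodically re-list the source to catch files that normal discovery missed:

```python
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.backfillInterval", "1 day")  # periodic full re-list
      .load("s3://my-bucket/landing/"))                # hypothetical source path
```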
Hi, I noticed that there is quite a significant delay (2-10 s) between making a change to a file in Repos via the Databricks file edit window and the propagation of that change to the filesystem. Our engineers and scientists use YAML config files. If the...
I have a YAML file inside one of the subdirectories of a repo in Databricks. I have appended the repo path to sys.path, but I still can't access this file. https://docs.databricks.com/_static/notebooks/files-in-repos.html
Hello @chandan_a_v, were you able to solve this issue? I am also experiencing the same thing, where I cannot move a file with extension .yml from a repo folder to a shared workspace folder. As per the documentation, this is the limitation or functionality of data...
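Note that appending the repo path to sys.path only affects Python imports; to read a data file such as YAML, open it by its full workspace path. A sketch, assuming a hypothetical repo location and that PyYAML is installed:

```python
import sys, os, yaml

repo_root = "/Workspace/Repos/me@example.com/my-repo"  # hypothetical path
sys.path.append(repo_root)  # only makes .py modules importable

# Reading a data file requires the full path; sys.path does not help here.
with open(os.path.join(repo_root, "configs", "settings.yaml")) as f:
    cfg = yaml.safe_load(f)
```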
When I use the following code:

df.coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/path/mydata.csv")

it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...
Hi Daniel, may I know how you fixed this issue? I am facing a similar issue while writing csv/parquet to Blob/ADLS: it creates a separate folder with the filename and creates a partition file within that folder. I need to write just a file on to the b...
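Spark always writes a directory of part files, even with coalesce(1). A common workaround, sketched here with hypothetical paths, is to write to a temporary directory and then copy the single part file to the desired filename:

```python
tmp_dir = "dbfs:/tmp/mydata_out"   # hypothetical scratch location

(df.coalesce(1)                    # single partition -> single part file
   .write.mode("overwrite")
   .option("header", "true")
   .csv(tmp_dir))

# Locate the lone part file and copy it to the final single-file path.
part = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part, "dbfs:/path/mydata.csv")
dbutils.fs.rm(tmp_dir, recurse=True)
```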
How to check if a file exists in DBFS? Let's write a Python function to check whether the file exists:

def file_exists(path):
    try:
        dbutils.fs.ls(path)
        return True
    except Exception:
        # dbutils.fs.ls raises when the path does not exist
        return False
I created translations for decoded values and want to save the dictionary object to DBFS for mapping. However, I am unable to access DBFS without using dbutils or the PySpark library. Is there a way to access DBFS with the os and pandas Python libra...
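On clusters where the /dbfs FUSE mount is available, DBFS paths can be used like ordinary local paths, so plain Python (os, open, pandas) works without dbutils. A sketch with a hypothetical mapping and target path:

```python
import json, os

mapping = {"A": "Active", "I": "Inactive"}  # hypothetical dictionary

# dbfs:/FileStore/... is visible to local-file APIs as /dbfs/FileStore/...
with open("/dbfs/FileStore/mappings/decoded_values.json", "w") as f:
    json.dump(mapping, f)

print(os.path.exists("/dbfs/FileStore/mappings/decoded_values.json"))
```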
I've been running a notebook using files-in-repos. Previously this has worked fine. I'm unsure what's changed (I was testing integration with DCS on older runtimes, but I don't think I made any persistent changes), but now it's throwing an error (always...
I keep getting an error when creating a DataFrame or stream from certain CSV files where the header contains a BOM (Byte Order Mark) character. This is the error message: AnalysisException: [RequestId=e09c7c8d-2399-4d6a-84ae-216e6a9f8f6e ErrorClass=INVALI...
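One workaround to try, sketched with a hypothetical file path: set the reader's encoding explicitly, and if the BOM still leaks into the first column name, strip the \ufeff character manually.

```python
df = (spark.read
      .option("header", "true")
      .option("encoding", "UTF-8")       # may already handle the BOM
      .csv("dbfs:/data/sensors.csv"))    # hypothetical path

# Fallback: remove a leaked BOM from column names before further use.
df = df.toDF(*[c.lstrip("\ufeff") for c in df.columns])
```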
According to the documentation, the usage of external locations is preferred over the use of mount points. Unfortunately, the basic functionality to manipulate files seems to be missing. This is my scenario: create a download folder in an external locatio...
The main problem was related to the network configuration of the storage account: Databricks did not have access. Quite strange that it did manage to create folders... Currently the dbutils.fs functionality is working. For the zipfile manipulation: that on...
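For reference, dbutils.fs accepts cloud storage URIs directly, so once the external location is reachable, basic file manipulation needs no mount point. A sketch with a hypothetical ADLS path:

```python
base = "abfss://raw@mystorageacct.dfs.core.windows.net/downloads"  # hypothetical

dbutils.fs.mkdirs(base)                       # create the download folder
dbutils.fs.put(base + "/readme.txt", "test")  # write a small test file
display(dbutils.fs.ls(base))                  # verify access
```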
Hey, I get a wide table format in a CSV file, where each sensor has its own column. I want to store it in a Delta Live streaming table. But since that is inefficient in processing and storage space, due to the varying frequency and number of sensors, I want to tran...
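Assuming the goal is to unpivot the wide frame into a long (timestamp, sensor, value) layout, here is a sketch using the stack() SQL generator, with a hypothetical DataFrame df that has a timestamp column plus one column per sensor, all sharing a common value type:

```python
sensor_cols = [c for c in df.columns if c != "timestamp"]
pairs = ", ".join(f"'{c}', `{c}`" for c in sensor_cols)

# stack(n, name1, val1, ...) emits one output row per (sensor, value) pair.
long_df = df.selectExpr(
    "timestamp",
    f"stack({len(sensor_cols)}, {pairs}) as (sensor, value)",
)
```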
We are working on multiline nested (multilevel) files. The file is read and flattened using PySpark, and the DataFrame shows data via the display() method. When saving the same DataFrame, it gives an executor-lost-failure error. For some files it is givi...
As part of my batch processing, I archive a large number of small files received from the source system each day using the dbutils.fs.mv command. This takes hours, as dbutils.fs.mv moves the files one at a time. How can I speed this up?
@Dean Lovelace You can use multithreading. See the example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
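A minimal sketch of that idea, with hypothetical source and archive paths: since each dbutils.fs.mv is a blocking driver-side call, running the moves on a thread pool overlaps the per-file latency.

```python
from concurrent.futures import ThreadPoolExecutor

src_dir = "dbfs:/landing/incoming"   # hypothetical paths
dst_dir = "dbfs:/landing/archive"

files = [f.path for f in dbutils.fs.ls(src_dir)]

def move(path):
    dbutils.fs.mv(path, path.replace(src_dir, dst_dir))

# 32 workers is a starting point; tune to your driver and storage limits.
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(move, files))
```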