Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Tonny_Stark
by New Contributor III
  • 11781 Views
  • 7 replies
  • 1 kudos

FileNotFoundError: [Errno 2] No such file or directory when I try to unzip .tar or .zip files

Hello, how are you? I have a small problem. I need to unzip some .zip, .tar, and .gz files, each of which may contain multiple files. When trying to unzip the .zip files I got this error: FileNotFoundError: [Errno 2] No such file or directory: but the files are in ...
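A likely cause, for context: Python's zipfile and tarfile modules cannot open dbfs:/ URIs directly. A minimal sketch assuming the archives live on DBFS and the cluster has the /dbfs fuse mount (paths are hypothetical):

import zipfile
import tarfile

src = "/dbfs/FileStore/archives/data.zip"  # hypothetical path; dbfs:/ seen via the /dbfs mount
dst = "/tmp/extracted"

if src.endswith(".zip"):
    with zipfile.ZipFile(src) as zf:
        zf.extractall(dst)
elif src.endswith((".tar", ".tar.gz", ".tgz")):
    with tarfile.open(src) as tf:  # tarfile auto-detects gzip compression
        tf.extractall(dst)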

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Alfredo Vallejos Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feed...

6 More Replies
elgeo
by Valued Contributor II
  • 3552 Views
  • 6 replies
  • 8 kudos

Clean up _delta_log files

Hello experts. We are trying to clarify how to clean up the large number of files (json, crc, and checkpoint files) accumulating in the _delta_log folder. We went through the related posts in the forum and followed the below: SET spark.da...
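For context, _delta_log cleanup is driven by a table property rather than manual deletion; old JSON and checkpoint files are removed automatically when a new checkpoint is written after the retention interval has passed. A minimal sketch (the table name is hypothetical):

# Shorten the Delta log retention window; cleanup happens automatically
# at the next checkpoint after the interval expires. Do not delete
# _delta_log files by hand.
spark.sql("""
    ALTER TABLE my_schema.my_table
    SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
""")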

Latest Reply
Brad
Contributor II
  • 8 kudos

Awesome, thanks for the response.

5 More Replies
my_community2
by New Contributor III
  • 14071 Views
  • 9 replies
  • 6 kudos

Resolved! dropping a managed table does not remove the underlying files

The documentation states that "drop table": Deletes the table and removes the directory associated with the table from the file system if the table is not an EXTERNAL table. An exception is thrown if the table does not exist. In case of an external table...

Latest Reply
MajdSAAD_7953
New Contributor II
  • 6 kudos

Hi, is there a way to force-delete files after dropping the table, rather than waiting 30 days to see the size in S3 decrease? The tables I dropped relate to dev and staging; I don't want to keep their files for 30 days.
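For context, one approach often suggested for this is to run VACUUM with a zero retention window before dropping the table. A minimal sketch (the table name is hypothetical); disabling the retention check is unsafe if the table still has active readers or writers:

# WARNING: only safe if no concurrent readers/writers need older versions.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
spark.sql("VACUUM my_schema.my_table RETAIN 0 HOURS")  # delete unreferenced data files now
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "true")
spark.sql("DROP TABLE my_schema.my_table")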

8 More Replies
FabriceDeseyn
by Contributor
  • 8151 Views
  • 6 replies
  • 6 kudos

Resolved! What does autoloader's cloudfiles.backfillInterval do?

I'm using Auto Loader directory listing mode (without incremental file listing) and sometimes new files are not picked up and found in the cloud_files-listing. I have found that using the 'cloudFiles.backfillInterval' option can resolve the detection ...
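For reference, a minimal sketch of where the option sits in an Auto Loader stream (the format, paths, and schema location below are hypothetical):

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      # Periodically re-list the directory to pick up files missed by listing gaps.
      .option("cloudFiles.backfillInterval", "1 day")
      .option("cloudFiles.schemaLocation", "/tmp/schemas/landing")  # hypothetical
      .load("abfss://container@account.dfs.core.windows.net/landing/"))  # hypothetical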

Latest Reply
822025
New Contributor II
  • 6 kudos

If we set the backfill to 1 week, will it run only once a week, or will it look for old unprocessed files on every trigger? For example, if we set it to 1 day and the job runs every hour, will it look for files in the past 24 hours on a sliding ...

5 More Replies
Jiri_Koutny
by New Contributor III
  • 6212 Views
  • 11 replies
  • 3 kudos

Delay in files update on filesystem

Hi, I noticed that there is quite a significant delay (2-10 s) between making a change to a file in Repos via the Databricks file edit window and the propagation of that change to the filesystem. Our engineers and scientists use YAML config files. If the...

Latest Reply
Irka
New Contributor II
  • 3 kudos

Is there a solution to this? BTW, the "ls" command trick didn't work for me.

10 More Replies
chandan_a_v
by Valued Contributor
  • 2373 Views
  • 2 replies
  • 1 kudos

Can't import local files under repo

I have a YAML file inside one of the subdirectories in a Databricks repo. I have appended the repo path to sys.path, but I still can't access this file. https://docs.databricks.com/_static/notebooks/files-in-repos.html
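For context, a minimal sketch of the pattern from the linked notebook: modules are imported via sys.path, while plain files such as YAML are opened with their workspace path (the repo path and file name below are hypothetical):

import sys
import os
import yaml  # PyYAML

repo_root = "/Workspace/Repos/user@example.com/my-repo"  # hypothetical
sys.path.append(repo_root)  # makes Python modules in the repo importable

# Non-Python files are read with ordinary file APIs on runtimes that
# support files in Repos:
with open(os.path.join(repo_root, "conf/settings.yaml")) as f:
    config = yaml.safe_load(f)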

Latest Reply
Abhishek10745
New Contributor III
  • 1 kudos

Hello @chandan_a_v, were you able to solve this issue? I am also experiencing the same thing, where I cannot move a file with the extension .yml from a repo folder to a shared workspace folder. As per the documentation, this is a limitation or functionality of data...

1 More Replies
Danielsg94
by New Contributor II
  • 34156 Views
  • 5 replies
  • 1 kudos

Resolved! How can I write a single file to a blob storage using a Python notebook, to a folder with other data?

When I use the following code:

df.coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/path/mydata.csv")

it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...
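One common workaround, sketched below under the assumption that dbutils is available: write to a temporary directory, then copy the single part file to the desired name (the paths are hypothetical):

tmp_dir = "/mnt/blob/_tmp_mydata"  # hypothetical scratch directory

(df.coalesce(1)
   .write.format("csv")
   .option("header", "true")
   .mode("overwrite")
   .save(tmp_dir))

# Spark always writes a directory of part files; pick out the single part file.
part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
dbutils.fs.cp(part_file, "/mnt/blob/mydata.csv")  # copy next to the other data
dbutils.fs.rm(tmp_dir, True)                      # clean up the scratch directory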

Latest Reply
Simha
New Contributor II
  • 1 kudos

Hi Daniel, may I know how you fixed this issue? I am facing a similar issue while writing CSV/Parquet to Blob/ADLS: it creates a separate folder with the filename and creates a partition file within that folder. I need to write just a file on to the b...

4 More Replies
SimhadriRaju
by New Contributor
  • 52780 Views
  • 7 replies
  • 0 kudos

How to check if a file exists in Databricks

I have a while loop where I have to check whether a file exists or not; if it exists, read the file into a DataFrame, else go to another file.

Latest Reply
Amit_Dass
New Contributor II
  • 0 kudos

How to check if a file exists in DBFS? Let's write a Python function to check if the file exists or not:

def file_exists(path):
    try:
        dbutils.fs.ls(path)
        return True
    except Exception:
        return False

6 More Replies
MattPython
by New Contributor
  • 23651 Views
  • 4 replies
  • 0 kudos

How do you read files from the DBFS with OS and Pandas Python libraries?

I created translations for decoded values and want to save the dictionary object to DBFS for mapping. However, I am unable to access DBFS without using dbutils or the PySpark library. Is there a way to access DBFS with the OS and Pandas Python libra...
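For context, a minimal sketch of one common approach: on clusters with the DBFS fuse mount, plain Python file APIs (os, pandas) can reach DBFS through the /dbfs/ prefix (the path below is hypothetical):

import os
import pandas as pd

# dbfs:/FileStore/mappings.csv appears to local file APIs as /dbfs/FileStore/mappings.csv
local_path = "/dbfs/FileStore/mappings.csv"  # hypothetical path

if os.path.exists(local_path):
    df = pd.read_csv(local_path)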

Latest Reply
User16789202230
Databricks Employee
  • 0 kudos

db_path = 'file:///Workspace/Users/l<xxxxx>@databricks.com/TITANIC_DEMO/tested.csv'
df = spark.read.csv(db_path, header="True", inferSchema="True")

3 More Replies
jfarmer
by New Contributor II
  • 5963 Views
  • 3 replies
  • 1 kudos

PermissionError / Operation not Permitted with Files-in-Repos

I've been running a notebook using files-in-repos. Previously this has worked fine. I'm unsure what's changed (I was testing integration with DCS on older runtimes, but don't think I made any persistent changes), but now it's throwing an error (always...

Latest Reply
_carleto_
New Contributor II
  • 1 kudos

Hi @jfarmer, did you solve this issue? I'm having exactly the same challenge. Thanks!

2 More Replies
harraz
by New Contributor III
  • 2208 Views
  • 1 reply
  • 0 kudos

Issues loading CSV files that contain a BOM (Byte Order Mark) character

I keep getting an error when creating a DataFrame or stream from certain CSV files where the header contains a BOM (Byte Order Mark) character. This is the error message: AnalysisException: [RequestId=e09c7c8d-2399-4d6a-84ae-216e6a9f8f6e ErrorClass=INVALI...
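A minimal sketch of two common workarounds, assuming a UTF-8 file with a leading BOM (the paths are hypothetical):

import pandas as pd

# Option 1: pandas strips the BOM when decoding with the utf-8-sig codec.
pdf = pd.read_csv("/dbfs/landing/data.csv", encoding="utf-8-sig")  # hypothetical path

# Option 2: read with Spark, then strip the BOM from the first column name.
df = spark.read.option("header", "true").csv("dbfs:/landing/data.csv")
df = df.withColumnRenamed(df.columns[0], df.columns[0].lstrip("\ufeff"))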

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @mohamed harraz, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

Tjomme
by New Contributor III
  • 12303 Views
  • 7 replies
  • 8 kudos

Resolved! How to manipulate files in an external location?

According to the documentation, the usage of external locations is preferred over the use of mount points. Unfortunately, the basic functionality to manipulate files seems to be missing. This is my scenario: create a download folder in an external locatio...
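For reference, dbutils.fs can address external-location URLs directly once the workspace has the right Unity Catalog permissions; a minimal sketch (the storage URL is hypothetical):

base = "abfss://container@account.dfs.core.windows.net/downloads"  # hypothetical

dbutils.fs.mkdirs(base + "/archive")                           # create a folder in the external location
dbutils.fs.cp(base + "/file.zip", base + "/archive/file.zip")  # copy a file within it
dbutils.fs.rm(base + "/file.zip")                              # remove the original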

Latest Reply
Tjomme
New Contributor III
  • 8 kudos

The main problem was related to the network configuration of the storage account: Databricks did not have access. Quite strange that it did manage to create folders... Currently dbutils.fs functionality is working. For the zipfile manipulation: that on...

6 More Replies
simensma
by New Contributor II
  • 1871 Views
  • 3 replies
  • 1 kudos

Resolved! Autoload files in wide table format, but store it unpivot in Streaming Table

Hey, I get a wide table format in a CSV file, where each sensor has its own column. I want to store it in a Delta Live Streaming Table, but since that is inefficient in processing and storage space, due to the varying frequency and number of sensors, I want to tran...
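One way to express that unpivot, sketched with the SQL stack() generator inside a Delta Live Tables streaming table (the column names, paths, and schema location are hypothetical):

import dlt
from pyspark.sql.functions import expr

@dlt.table(name="sensor_readings_long")
def sensor_readings_long():
    wide = (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("cloudFiles.schemaLocation", "/tmp/schemas/sensors")  # hypothetical
            .load("/landing/sensors/"))  # hypothetical
    # Unpivot two hypothetical sensor columns into (sensor, value) rows.
    return wide.select(
        "timestamp",
        expr("stack(2, 'sensor_a', sensor_a, 'sensor_b', sensor_b) as (sensor, value)"),
    )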

Latest Reply
Vartika
Databricks Employee
  • 1 kudos

Hi @Simen Småriset, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

2 More Replies
konda1
by New Contributor
  • 1033 Views
  • 0 replies
  • 0 kudos

Getting an "Executor lost due to stage failure" error when writing a DataFrame to a Delta table or any file format like Parquet, CSV, or Avro

We are working on multiline nested (multilevel) files. The file is read and flattened using PySpark, and the DataFrame shows data using the display() method. When saving the same DataFrame, it gives an executor lost failure error. For some files it is givi...

Dean_Lovelace
by New Contributor III
  • 4925 Views
  • 1 reply
  • 1 kudos

Resolved! Efficiently move multiple files with dbutils.fs.mv command on abfs storage

As part of my batch processing I archive a large number of small files received from the source system each day using the dbutils.fs.mv command. This takes hours, as dbutils.fs.mv moves the files one at a time. How can I speed this up?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Dean Lovelace You can use multithreading. See the example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
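A minimal sketch of that multithreading approach, with hypothetical source and destination paths:

from concurrent.futures import ThreadPoolExecutor

src_dir = "abfss://container@account.dfs.core.windows.net/landing/"  # hypothetical
dst_dir = "abfss://container@account.dfs.core.windows.net/archive/"  # hypothetical

file_names = [f.name for f in dbutils.fs.ls(src_dir)]

def move_one(name):
    # Each dbutils.fs.mv call still moves one file; the pool runs many calls concurrently.
    dbutils.fs.mv(src_dir + name, dst_dir + name)

with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(move_one, file_names))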
