Hi, I noticed that there is quite a significant delay (2 - 10s) between making a change to some file in Repos via Databricks file edit window and propagation of such change to the filesystem. Our engineers and scientists use YAML config files. If the...
When I use the following code: df
.coalesce(1)
.write.format("com.databricks.spark.csv")
.option("header", "true")
.save("/path/mydata.csv")it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...
Hi Daniel,May I know, how did you fix this issue. I am facing similar issue while writing csv/parquet to blob/adls, it creates a separate folder with the filename and creates a partition file within that folder.I need to write just a file on to the b...
How to check if a file exists in DBFS?Let's write a Python function to check if the file exists or not-------------------------------------------------------------def file_exists(path): try: dbutils.fs.ls(path) return True except ...
I created translations for decoded values and want to save the dictionary object the DBFS for mapping. However, I am unable to access the DBFS without using dbutils or PySpark library. Is there a way to access the DBFS with OS and Pandas Python libra...
I'm using autoloader directory listing mode (without incremental file listing) and sometimes, new files are not picked up and found in the cloud_files-listing.I have found that using the 'cloudfiles.backfillInterval'-option can resolve the detection ...
I've been running a notebook using files-in-repo. Previously this has worked fine. I'm unsure what's changed (I was testing integration with DCS on older runtimes, but don't think I made any persistent changes)--but now it's throwing an error (always...
the documentation states that "drop table":Deletes the table and removes the directory associated with the table from the file system if the table is not EXTERNAL table. An exception is thrown if the table does not exist.In case of an external table...
Hi,There is a way to force delete files after drop the table and don't wait 30 days to see size in S3 decrease?Tables that I dropped related to the dev and staging, I don't want to keep there files for 30 days
I keep getting and error when creating dataframe or steam from certain CSV files where the header contains BOM (Byte Order Mark) character This is the error message:AnalysisException: [RequestId=e09c7c8d-2399-4d6a-84ae-216e6a9f8f6e ErrorClass=INVALI...
Hi @mohamed harraz​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.
According to the documentation, the usage of external locations is preferred over the use of mount points.Unfortunately the basic funtionality to manipulate files seems to be missing.This is my scenario:create a download folder in an external locatio...
The main problem was related to the network configuration of the storage account: Databricks did not have access. Quite strange that it did manage to create folders...Currently dbutils.fs functionality is working.For the zipfile manipulation: that on...
Hey, I get wide table format in csv file. Where each sensor have its own column. I want to store it in Delta Live Streaming Table. But since it is inefficient to process it and storage space, due to varying frequency and sensor amount. I want to tran...
Hi @Simen Småriset​,Hope everything is going great.Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...
We are working on multiline nested ( multilevel).The file is read and flattened using pyspark and the data frame is showing data using display() method. when saving the same dataframe it is giving executor lost failure error.for some files it is givi...
As part of my batch processing I archive a large number of small files received from the source system each day using the dbutils.fs.mv command. This takes hours as dbutils.fs.mv moves the files one at a time.How can I speed this up?
@Dean Lovelace​ You can use multithreading.See example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
I have uploaded a csv data file and used it in a spark job three months back. I am now running the same spark job with a new cluster created. Program is running properly. I want to know where I can see the previously uploaded csv data file.
@Pranay Gupta​ you can see that in dbfs root directory, based on path you provided in job. please check .please go to data explorer and select below option that i shown in screen shot
Hey,we have an issue in that we can access the SQL files whenever the notebook is in the repo path but whenever the CICD pipeline imports the repo notebooks and SQL files to the shared workspace, we can list the SQL files but can not read them.we cha...
@Nermin Yehia​ yes, as you are moving files to different location manually , just update as can manage permissions in target and that should take care of everything
I have below code which works for the path below but fails for path = azure storage account path. i have enough access to write and update the storage account. I would like to know what wrong am I doing and the path below which works , how can i phys...
@Ankit Kumar​ :The error message you received indicates that the user does not have sufficient permission to access the Azure Blob Storage account. You mentioned that you have enough access to write and update the storage account, but it's possible t...