Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

chandan_a_v
by Valued Contributor
  • 1257 Views
  • 2 replies
  • 1 kudos

Can't import local files under repo

I have a YAML file inside one of the subdirectories of a Databricks repo. I have appended the repo path to sys.path, but I still can't access this file. https://docs.databricks.com/_static/notebooks/files-in-repos.html

Latest Reply
Abhishek10745
New Contributor III
  • 1 kudos

Hello @chandan_a_v, were you able to solve this issue? I am also experiencing the same thing, where I cannot move a file with the extension .yml from a repo folder to a shared workspace folder. As per the documentation, this is the limitation or functionality of data...
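For anyone else hitting this: sys.path only affects Python imports, so a YAML file in a repo is read with ordinary file APIs relative to the notebook's directory, as in this sketch (folder and file names are illustrative):

    import os
    import yaml

    # In Repos, os.getcwd() returns the notebook's folder, so relative
    # paths into the repo work with plain open().
    cfg_path = os.path.join(os.getcwd(), "configs", "settings.yml")  # illustrative names
    with open(cfg_path) as f:
        config = yaml.safe_load(f)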

Jiri_Koutny
by New Contributor III
  • 3076 Views
  • 10 replies
  • 3 kudos

Delay in files update on filesystem

Hi, I noticed that there is quite a significant delay (2-10 s) between making a change to a file in Repos via the Databricks file editor and that change propagating to the filesystem. Our engineers and scientists use YAML config files. If the...

Latest Reply
DaniyarZ
New Contributor II
  • 3 kudos

There is a trick: executing a "%sh ls" command forces an immediate refresh of the filesystem.
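In notebook terms, that means running something like this in a separate cell right before reading the file (the repo path is illustrative):

    %sh
    # Listing the folder from the shell forces the Repos FUSE view to refresh
    ls /Workspace/Repos/my-user/my-repo/configs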

Danielsg94
by New Contributor II
  • 15634 Views
  • 6 replies
  • 2 kudos

Resolved! How can I write a single file to a blob storage using a Python notebook, to a folder with other data?

When I use the following code:

    df.coalesce(1)
      .write.format("com.databricks.spark.csv")
      .option("header", "true")
      .save("/path/mydata.csv")

it writes several files, and when used with .mode("overwrite"), it will overwrite everything in th...

Latest Reply
Simha
New Contributor II
  • 2 kudos

Hi Daniel, may I know how you fixed this issue? I am facing a similar issue while writing CSV/Parquet to Blob/ADLS: it creates a separate folder with the filename and creates a partition file within that folder. I need to write just a file on to the b...
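A common workaround is to let Spark write its single part file into a scratch folder and then promote that file to the target path with dbutils.fs. A minimal sketch (paths and the scratch folder name are illustrative, using the built-in csv writer):

    # Write the coalesced output to a scratch folder, then promote the
    # single part-*.csv file to the desired single-file path.
    tmp_dir = "/tmp/mydata_out"
    (df.coalesce(1)
       .write.format("csv")
       .option("header", "true")
       .mode("overwrite")
       .save(tmp_dir))

    part = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
    dbutils.fs.cp(part, "/path/mydata.csv")
    dbutils.fs.rm(tmp_dir, True)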

SimhadriRaju
by New Contributor
  • 42076 Views
  • 7 replies
  • 0 kudos

How to check whether a file exists in Databricks

I have a while loop in which I have to check whether a file exists or not; if it exists, read the file into a data frame, else go to another file.

Latest Reply
Amit_Dass
New Contributor II
  • 0 kudos

How to check if a file exists in DBFS? Let's write a Python function to check whether the file exists or not:

    def file_exists(path):
        try:
            dbutils.fs.ls(path)
            return True
        except Exception:
            return False
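Illustrative usage inside the while loop from the question (the path is hypothetical):

    path = "/mnt/landing/batch_001.csv"
    if file_exists(path):
        df = spark.read.csv(path, header=True)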

MattPython
by New Contributor
  • 11909 Views
  • 4 replies
  • 0 kudos

How do you read files from the DBFS with OS and Pandas Python libraries?

I created translations for decoded values and want to save the dictionary object to the DBFS for mapping. However, I am unable to access the DBFS without using dbutils or the PySpark library. Is there a way to access the DBFS with the os and pandas Python libra...

Latest Reply
User16789202230
New Contributor II
  • 0 kudos

    db_path = 'file:///Workspace/Users/l<xxxxx>@databricks.com/TITANIC_DEMO/tested.csv'
    df = spark.read.csv(db_path, header="True", inferSchema="True")
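For the original os/pandas question: on clusters with the DBFS FUSE mount, DBFS paths are visible as local files under /dbfs, so the standard libraries work unchanged. A sketch (the path is illustrative):

    import os
    import pandas as pd

    # DBFS path dbfs:/FileStore/tables/tested.csv appears locally
    # under /dbfs thanks to the FUSE mount.
    local_path = "/dbfs/FileStore/tables/tested.csv"
    print(os.path.exists(local_path))  # plain os calls work
    pdf = pd.read_csv(local_path)      # and so does pandas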

FabriceDeseyn
by Contributor
  • 4024 Views
  • 5 replies
  • 6 kudos

Resolved! What does autoloader's cloudfiles.backfillInterval do?

I'm using Auto Loader directory listing mode (without incremental file listing), and sometimes new files are not picked up and found in the cloud_files-listing. I have found that using the 'cloudFiles.backfillInterval' option can resolve the detection ...

Latest Reply
Kiranrathod
New Contributor III
  • 6 kudos

Hi @Lakshay Goel​, where can I set the backfillInterval property in the code? Do you have any sample code for this use case?
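In general terms, the property is just another cloudFiles option on the stream reader. A minimal sketch (paths and file format are illustrative; interval values such as "1 day" follow the documented format):

    df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/schema")  # illustrative
          .option("cloudFiles.backfillInterval", "1 day")  # periodic full-listing backfill
          .load("/mnt/landing/input"))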

jfarmer
by New Contributor II
  • 3544 Views
  • 3 replies
  • 1 kudos

PermissionError / Operation not Permitted with Files-in-Repos

I've been running a notebook using files-in-repos. Previously this worked fine. I'm unsure what's changed (I was testing integration with DCS on older runtimes, but I don't think I made any persistent changes), but now it's throwing an error (always...

Latest Reply
_carleto_
New Contributor II
  • 1 kudos

Hi @jfarmer, did you solve this issue? I'm having exactly the same challenge. Thanks!

my_community2
by New Contributor III
  • 7625 Views
  • 8 replies
  • 6 kudos

Resolved! dropping a managed table does not remove the underlying files

The documentation states that DROP TABLE "deletes the table and removes the directory associated with the table from the file system if the table is not an EXTERNAL table. An exception is thrown if the table does not exist." In the case of an external table...

Latest Reply
MajdSAAD_7953
New Contributor II
  • 6 kudos

Hi, is there a way to force-delete the files after dropping the table, rather than waiting 30 days to see the size in S3 decrease? The tables I dropped relate to dev and staging; I don't want to keep their files for 30 days.
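One commonly used approach is to vacuum with zero retention before dropping, sketched below under the assumption that the table still exists (table name is illustrative; disabling the retention check removes a safety net, so use it with care):

    # Run BEFORE dropping the managed Delta table so the data files
    # are removed immediately instead of after the retention window.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    spark.sql("VACUUM dev_db.events RETAIN 0 HOURS")  # illustrative table name
    spark.sql("DROP TABLE dev_db.events")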

harraz
by New Contributor III
  • 1285 Views
  • 1 reply
  • 0 kudos

Issues loading CSV files that contain a BOM (Byte Order Mark) character

I keep getting an error when creating a dataframe or stream from certain CSV files where the header contains a BOM (Byte Order Mark) character. This is the error message: AnalysisException: [RequestId=e09c7c8d-2399-4d6a-84ae-216e6a9f8f6e ErrorClass=INVALI...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @mohamed harraz​, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer. Thanks.
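For anyone hitting the same BOM issue, one workaround (a sketch, not the only fix; the path is illustrative) is to read the file with an encoding that strips the BOM, e.g. via pandas, and then convert to a Spark DataFrame:

    import pandas as pd

    # 'utf-8-sig' strips a leading UTF-8 BOM so the first header
    # column name parses cleanly.
    pdf = pd.read_csv("/dbfs/mnt/landing/file_with_bom.csv", encoding="utf-8-sig")
    df = spark.createDataFrame(pdf)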

Tjomme
by New Contributor III
  • 6486 Views
  • 7 replies
  • 8 kudos

Resolved! How do I manipulate files in an external location?

According to the documentation, the usage of external locations is preferred over the use of mount points. Unfortunately, the basic functionality to manipulate files seems to be missing. This is my scenario: create a download folder in an external locatio...

Latest Reply
Tjomme
New Contributor III
  • 8 kudos

The main problem was related to the network configuration of the storage account: Databricks did not have access. Quite strange that it did manage to create folders... Currently the dbutils.fs functionality is working. For the zipfile manipulation: that on...
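For reference, once networking and permissions are in place, dbutils.fs operates on external-location URLs directly. A sketch (the storage account and container names are illustrative):

    # List, copy, and delete files in an external location by URL.
    base = "abfss://downloads@mystorageacct.dfs.core.windows.net/ingest"
    dbutils.fs.ls(base)
    dbutils.fs.cp(base + "/source.zip", base + "/archive/source.zip")
    dbutils.fs.rm(base + "/source.zip")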

simensma
by New Contributor II
  • 917 Views
  • 3 replies
  • 1 kudos

Resolved! Autoload files in wide table format, but store them unpivoted in a streaming table

Hey, I get data in a wide table format in a CSV file, where each sensor has its own column. I want to store it in a Delta Live streaming table, but since that is inefficient in processing and storage space, due to varying frequency and sensor count, I want to tran...
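One way to do the unpivot inside a Delta Live Tables streaming table is with stack(), as in this sketch (the column names timestamp, temp, and pressure are assumed; stack's column list must be enumerated up front):

    import dlt
    from pyspark.sql import functions as F

    @dlt.table
    def sensor_readings_long():
        wide = (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "csv")
                .option("header", "true")
                .load("/mnt/landing/sensors"))  # illustrative path
        # stack() unpivots the (illustrative) sensor columns into rows
        return wide.select(
            "timestamp",
            F.expr("stack(2, 'temp', temp, 'pressure', pressure) as (sensor, value)"),
        )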

Latest Reply
Vartika
Moderator
  • 1 kudos

Hi @Simen Småriset​, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

konda1
by New Contributor
  • 582 Views
  • 0 replies
  • 0 kudos

Getting "executor lost due to stage failure" error when writing a data frame to a Delta table or any file format like Parquet, CSV, or Avro

We are working with multiline, nested (multilevel) files. The file is read and flattened using PySpark, and the data frame shows its data using the display() method. When saving the same dataframe, it gives an executor-lost failure error. For some files it is givi...

Dean_Lovelace
by New Contributor III
  • 2688 Views
  • 1 reply
  • 1 kudos

Resolved! Efficiently move multiple files with dbutils.fs.mv command on abfs storage

As part of my batch processing, I archive a large number of small files received from the source system each day using the dbutils.fs.mv command. This takes hours, as dbutils.fs.mv moves the files one at a time. How can I speed this up?

Latest Reply
daniel_sahal
Esteemed Contributor
  • 1 kudos

@Dean Lovelace​ You can use multithreading. See the example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
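A minimal sketch of that approach (paths and worker count are illustrative; dbutils.fs.mv is I/O-bound, so a thread pool overlaps the per-file latency):

    from concurrent.futures import ThreadPoolExecutor

    src = "abfss://raw@myacct.dfs.core.windows.net/incoming"  # illustrative
    dst = "abfss://raw@myacct.dfs.core.windows.net/archive"

    files = [f.path for f in dbutils.fs.ls(src)]

    def archive(path):
        # Move one file, keeping its name under the archive folder
        dbutils.fs.mv(path, dst + "/" + path.rsplit("/", 1)[-1])

    with ThreadPoolExecutor(max_workers=32) as pool:
        list(pool.map(archive, files))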

Prannu
by New Contributor II
  • 1094 Views
  • 2 replies
  • 1 kudos

Location of files previously uploaded on DBFS

I uploaded a CSV data file and used it in a Spark job three months back. I am now running the same Spark job with a newly created cluster, and the program runs properly. I want to know where I can see the previously uploaded CSV data file.

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@Pranay Gupta​ You can see that in the DBFS root directory, based on the path you provided in the job. Please go to Data Explorer and select the option shown in the screenshot.
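If the file was uploaded through the UI, it typically lands under the FileStore area of the DBFS root. A quick way to check from a notebook (this folder is the common default; adjust it if your upload used another path):

    # Uploaded CSVs usually appear under dbfs:/FileStore/tables
    display(dbutils.fs.ls("dbfs:/FileStore/tables"))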

nyehia
by Contributor
  • 5161 Views
  • 19 replies
  • 1 kudos

Cannot access SQL files in the shared workspace

Hey, we have an issue: we can access the SQL files whenever the notebook is in the repo path, but whenever the CI/CD pipeline imports the repo notebooks and SQL files into the shared workspace, we can list the SQL files but cannot read them. We cha...

Latest Reply
karthik_p
Esteemed Contributor
  • 1 kudos

@Nermin Yehia​ Yes, since you are moving files to a different location manually, just update the permissions to Can Manage on the target, and that should take care of everything.
