Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I need to move a group of files (Python or Scala files) or a folder from a DBFS location to the user workspace directory in Azure Databricks to do testing on the files. It is very difficult to upload each file one by one into the user workspace directory, so is it...
Problem statement: Source file format: .tar.gz. Avg size: 10 MB. Number of tar.gz files: 1000. Each tar.gz file contains around 20,000 CSV files. Requirement: untar the tar.gz files and write the CSV files to blob storage / an intermediate storage layer for further...
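A minimal driver-side sketch of the untar step, assuming the archives are reachable through a /dbfs path (the folder names below are placeholders, not taken from the original post):

import os
import tarfile

# Hypothetical locations; adjust to your own mounts.
src_dir = "/dbfs/mnt/raw/tarfiles"      # where the .tar.gz files land
dst_dir = "/dbfs/mnt/staging/csv"       # intermediate layer for the extracted CSVs

for name in os.listdir(src_dir):
    if name.endswith(".tar.gz"):
        with tarfile.open(os.path.join(src_dir, name), mode="r:gz") as tar:
            # Extract the ~20,000 member CSVs into a per-archive subfolder
            tar.extractall(path=os.path.join(dst_dir, name[:-len(".tar.gz")]))

With 1000 archives this single-threaded loop on the driver will be slow; distributing the list of archive names across the cluster is the usual next step, but the extraction call itself stays the same.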
@Hubert Dudek Thanks for your suggestions. After creating the storage account in the same region as Databricks I can see that performance is as expected. Now it is clear that the issue was the /mnt/ location being in a different region than Databricks. I would ...
Yes, you just need to create a service account for Databricks and then assign the Storage Admin role on the bucket. After that you can mount GCS the standard way:
bucket_name = "<bucket-name>"
mount_name = "<mount-name>"
dbutils.fs.mount("gs://%s" % bucket_name, "/m...
Hi folks, I have installed and configured the Databricks CLI on my local machine. I tried to copy a local file from my personal computer to a dbfs:/ path with dbfs cp. I can see the file being copied from local, but it is only visible locally. I am not able to ...
Hi, could you try to save the file from your local machine to the dbfs:/FileStore location?
# Put local file test.py to dbfs:/FileStore/test.py
dbfs cp test.py dbfs:/FileStore/test.py
Dear community, I have the following problem:
%fs mv '/FileStore/Tree_point_classification-1.dlpk' '/dbfs/mnt/group22/Tree_point_classification-1.dlpk'
I have uploaded a file of an ML model and moved it to the directory with the command above. When I now check ...
There is dbfs:/dbfs/ displayed, so maybe the file is in the /dbfs/dbfs directory? Please check it and try to open it with open('/dbfs/dbfs. You can also use "Data" from the left menu to browse the DBFS file system more easily.
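A quick way to confirm and fix this from a notebook, assuming the file really did land under dbfs:/dbfs/ because %fs treated '/dbfs/mnt/...' as a path relative to the DBFS root (the paths are the ones from the question):

# List the suspected location first
display(dbutils.fs.ls("dbfs:/dbfs/mnt/group22"))

# If the file is there, move it to the intended mount path
dbutils.fs.mv(
    "dbfs:/dbfs/mnt/group22/Tree_point_classification-1.dlpk",
    "dbfs:/mnt/group22/Tree_point_classification-1.dlpk")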
I am downloading multiple files by web scraping and by default they are stored in /tmp
I can copy a single file by providing the filename and path
%fs cp file:/tmp/2020-12-14_listings.csv.gz dbfs:/tmp
but when I try to copy multiple files I get an ...
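One way to handle the multi-file case is to loop over the local directory and copy each file with dbutils.fs.cp; a sketch, assuming the scraped files all end in .csv.gz (that filter is an assumption):

import os

for name in os.listdir("/tmp"):
    if name.endswith(".csv.gz"):
        # Copy each scraped file from the driver's local disk to DBFS
        dbutils.fs.cp("file:/tmp/" + name, "dbfs:/tmp/" + name)

Alternatively, dbutils.fs.cp("file:/tmp", "dbfs:/tmp", recurse=True) copies the whole directory, but it also picks up anything else sitting in /tmp.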
Hi,
I am looking at my Databricks workspace and it looks like I am missing the databricks-datasets root folder in DBFS. The DBFS root folders I can view are FileStore, local_disk0, mnt, pipelines and user.
Can I mount Databricks-dataset or am I missing some...
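A quick check from a notebook, assuming the datasets are exposed at the usual /databricks-datasets path (it is provided by Databricks rather than something you mount yourself):

# List the DBFS root and the datasets folder to see whether it is reachable
display(dbutils.fs.ls("/"))
display(dbutils.fs.ls("/databricks-datasets"))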
When trying to upload to DBFS from my local machine I get the error "Error occurred when processing file ... : Server responded with 0 code".
DBR 7.3 LTS
Spark 3.0.1
Scala 2.12
Uploading the file using the "upload" in the Databricks cloud console, the c...
I am facing the same issue with GCP Databricks. I am able to upload smaller files, but when I tried a 3 MB file, Databricks choked and I got the above error.
I tried with AWS Databricks and it works fine even for bigger files.
I ran the statement below and got this error:
%python
data = sqlContext.read.parquet("/FileStore/tables/ganesh.parquet")
display(data)
Error:
SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure:...
I'm having a similar issue reading a JSON file. It is ~550MB compressed and is on a single line:
val cfilename = "c_datafeed_20200128.json.gz"
val events = spark.read.json(s"/mnt/c/input1/$cfilename")
display(events)
The filename is correct and t...
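Since the default JSON reader expects one record per line (JSON Lines), a single-line 550 MB document usually needs the multiLine option; a Python sketch of the equivalent read, using the path from the question:

cfilename = "c_datafeed_20200128.json.gz"

# multiLine lets Spark parse a JSON document that is not newline-delimited;
# note that a single gzipped file is not splittable, so one task reads it all.
events = spark.read.option("multiLine", True).json("/mnt/c/input1/" + cfilename)
display(events)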
I have been trying to open a file on the dbfs using all different combinations:
if I use the following code:
with open("/dbfs/FileStore/df/Downloadedfile.csv", 'r', newline='') as f
I get IsADirectoryError: [Errno 21] Is a directory
with open("dbfs:...
To get rid of this error you can use Python's file-existence checks to confirm that Python actually sees the file, in other words, to make sure the path points to a real existing file rather than a directory. If the user do...
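A minimal check along those lines, using the path from the question:

import os

path = "/dbfs/FileStore/df/Downloadedfile.csv"

print(os.path.exists(path))   # does anything exist at this path?
print(os.path.isdir(path))    # the IsADirectoryError suggests this is True
print(os.path.isfile(path))   # False would mean the CSV actually lives inside that directory

if os.path.isdir(path):
    print(os.listdir(path))   # see what is really in there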
I have created a Databricks workspace in Azure. I have created a cluster for Python 3. I am creating a job using spark-submit parameters. How do I specify multiple files for --py-files in the spark-submit command for a Databricks job? All the files to be specified in ...
Hi @Nandha Kumar, please go through the docs below on passing Python files to a job: https://docs.databricks.com/dev-tools/api/latest/jobs.html#sparkpythontask
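For the --py-files part specifically, spark-submit takes a comma-separated list of files before the main script; a sketch of the spark_submit_task parameters in a Jobs API payload (the DBFS paths below are placeholders):

{
  "spark_submit_task": {
    "parameters": [
      "--py-files",
      "dbfs:/FileStore/jobs/helpers.py,dbfs:/FileStore/jobs/utils.py",
      "dbfs:/FileStore/jobs/main.py"
    ]
  }
}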