Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

su
by New Contributor
  • 2759 Views
  • 3 replies
  • 0 kudos

Reading from /tmp no longer working

Since yesterday, reading a file copied into the cluster is no longer working. What used to work:
blob = gcs_bucket.get_blob("dev/data.ndjson") -> works
blob.download_to_filename("/tmp/data-copy.ndjson") -> works
df = spark.read.json("/tmp/data-copy.ndjso...

Latest Reply
Evan_From_Bosto
New Contributor II
  • 0 kudos

I encountered this same issue and figured out a fix! For some reason, it seems like only %sh cells can access the /tmp directory. So I just did:
%sh cp /tmp/<file> /dbfs/<desired-location>
and then accessed it from there using Spark.
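The copy step in the reply above can also be done from a Python cell. This is a minimal sketch, not the poster's code: it assumes DBFS is exposed on the driver through the `/dbfs` mount (as the reply's command implies), and the function name `copy_to_shared` and the example paths are hypothetical. On Databricks, a path given to Spark without a scheme is typically resolved against DBFS rather than the driver's local disk, which is one likely reason a bare `/tmp/...` path is not found.

```python
import shutil
from pathlib import Path


def copy_to_shared(local_file: str, shared_dir: str) -> str:
    """Copy a driver-local file into a directory Spark can read from.

    Returns the path of the copy as a string.
    """
    src = Path(local_file)
    dest_dir = Path(shared_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    dest = dest_dir / src.name
    shutil.copy(src, dest)  # same effect as `cp <src> <dest>` in a %sh cell
    return str(dest)


# Hypothetical usage in a Databricks notebook (paths are placeholders):
#   copied = copy_to_shared("/tmp/data-copy.ndjson", "/dbfs/tmp/landing")
#   df = spark.read.json(copied.replace("/dbfs", "dbfs:", 1))
```

Reading the driver-local file directly with an explicit `file:` prefix (for example `file:/tmp/data-copy.ndjson`) may also work, depending on the cluster configuration.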

2 More Replies
ferbystudy
by New Contributor III
  • 2388 Views
  • 4 replies
  • 3 kudos

Resolved! Can't read a simple .CSV from a blob

Guys, I am using "Databricks Community" to study. I put some files in a Blob and granted all access, but I have no idea why Databricks is not reading them. Please see the code below, and thanks for helping!

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Fernando Rezende, Thank you for sharing the solution with us. It would mean a lot if you could select the "Best Answer" to help others find the correct answer faster. This makes that answer appear right after the question, so it's easier to find w...

3 More Replies
hare
by New Contributor III
  • 3390 Views
  • 3 replies
  • 6 kudos

"Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"

Hi All, We are getting JSON files in an Azure blob container, and their "Blob Type" is "Append Blob". We are getting the error "AnalysisException: Unable to infer schema for JSON. It must be specified manually." when we try to read them using the below-mentioned scr...

Latest Reply
User16856839485
New Contributor II
  • 6 kudos

There currently does not appear to be direct support for append blob reads; however, converting the append blob to a block blob [and then to parquet or delta, etc.] is a viable option: https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga...

2 More Replies
Athar
by New Contributor
  • 1521 Views
  • 4 replies
  • 3 kudos

How to import blob storage container with sub-directories as a database in databricks sql?

I am trying to load blob storage into a Databricks SQL warehouse. I followed this document: https://docs.databricks.com/data/data-sources/azure/azure-storage.html, but this doesn't seem to be working. The query executed fine, but the created schema was empty. An...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Athar Abbas, we haven't heard from you since the last response from @Prabakar Ammeappin and @Bilal Aslam, and I was checking back to see if their suggestions helped you. Otherwise, if you have a solution, please share it with the community as i...

3 More Replies
deisou
by New Contributor
  • 2130 Views
  • 4 replies
  • 2 kudos

Resolved! What is the best strategy for backing up a large Databricks Delta table that is stored in Azure blob storage?

I have a large delta table that I would like to back up and I am wondering what is the best practice for backing it up. The goal is so that if there is any accidental corruption or data loss either at the Azure blob storage level or within Databricks...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @deisou, Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you. Cheers!

3 More Replies
mayuri18kadam
by New Contributor II
  • 3836 Views
  • 3 replies
  • 0 kudos

Resolved! com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch

Hi, I am getting the following error:
com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18/part-00003-tid-4...

Latest Reply
mayuri18kadam
New Contributor II
  • 0 kudos

Yes, I can read from a notebook with DBR 6.4 when I specify this path: wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18 but the same using DBR 6.4 from spark-submit, it f...

2 More Replies
frank26364
by New Contributor III
  • 23789 Views
  • 7 replies
  • 4 kudos

Resolved! Export Databricks results to Blob in a csv file

Hello everyone, I want to export my data from Databricks to the blob. My Databricks commands select some PDFs from my blob, run Form Recognizer, and export the output results to my blob. Here is the code:
%pip install azure.storage.blob
%pip install...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Francis Bouliane - Thank you for sharing the solution.

6 More Replies
RKNutalapati
by Valued Contributor
  • 2816 Views
  • 4 replies
  • 4 kudos

Reading and saving Blob data from Oracle to Databricks S3 is slow

I am trying to import a table from Oracle that has around 1.3 million rows, and one of the columns is a Blob; the total size of the data on Oracle is around 250+ GB. Reading it and saving it to S3 as a Delta table takes around 60 min. I tried with parallel(200 thread...

Latest Reply
User16829050420
New Contributor III
  • 4 kudos

Hello @Rama Krishna N - We will need to check the task in the Spark UI to determine whether the operation is the read from the Oracle database or the write to S3. The task should show the specific operation in the UI. Also, the active threads on the Spark UI will ...

3 More Replies
Nik
by New Contributor III
  • 10238 Views
  • 19 replies
  • 0 kudos

write from a Dataframe to a CSV file, CSV file is blank

Hi, I am reading from a text file from a blob:
val sparkDF = spark.read.format(file_type)
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", file_delimiter)
  .load(wasbs_string + "/" + PR_FileName)
Then I test my Datafra...

Latest Reply
nl09
New Contributor II
  • 0 kudos

Create a temp folder inside the output folder. Copy the part-00000* file, with the desired file name, to the output folder. Delete the temp folder. Python code snippet to do the same:
fpath=output+'/'+'temp'
def file_exists(path):
    try:
        dbutils.fs.ls(path)
        return...
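The three steps in this reply (write into a temp folder, copy the part-00000* file out under the desired name, delete the temp folder) can be sketched as follows. This is an illustration under stated assumptions, not the poster's truncated snippet: it uses only the Python standard library on a local filesystem, and the function name `promote_part_file` is hypothetical. On Databricks the same steps would normally go through `dbutils.fs.ls`, `dbutils.fs.cp`, and `dbutils.fs.rm`.

```python
import shutil
from pathlib import Path


def promote_part_file(output_dir: str, final_name: str) -> Path:
    """Copy the single part-00000* file from <output_dir>/temp to
    <output_dir>/<final_name>, then delete the temp folder."""
    out = Path(output_dir)
    temp = out / "temp"
    # Spark names its data files part-00000-<id>.<ext>; marker files such as
    # _SUCCESS and hidden .crc files do not match this pattern.
    parts = sorted(temp.glob("part-00000*"))
    if len(parts) != 1:
        raise FileNotFoundError(
            f"expected exactly one part-00000* file in {temp}, found {len(parts)}"
        )
    target = out / final_name
    shutil.copy(parts[0], target)  # copy under the desired file name
    shutil.rmtree(temp)            # remove the temp folder and its contents
    return target
```

The sketch assumes the DataFrame was written as a single partition (for example after `coalesce(1)`), so that exactly one part file exists; it raises an error otherwise rather than guessing which file to keep.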

18 More Replies
AlaQabaja
by New Contributor II
  • 4096 Views
  • 3 replies
  • 0 kudos

Get last modified date or create date for azure blob container

Hi Everyone, I am trying to implement a way in Python to only read files that weren't loaded since the last run of my notebook. The way I am thinking of implementing this is to keep track of the last time my notebook finished in a database table. Nex...
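The idea described in the question (store the last run time, then read only files modified after it) can be sketched in plain Python. This is a hedged illustration, not a solution from the thread: `BlobInfo` and `files_since` are hypothetical names, and the sketch assumes a listing of the container that yields each file's name and last-modified timestamp is already available, for example from a storage SDK.

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple


class BlobInfo(NamedTuple):
    """Hypothetical record for one listed file."""
    name: str
    last_modified: datetime


def files_since(blobs: Iterable[BlobInfo], last_run: datetime) -> List[str]:
    """Return the names of files modified strictly after the last run."""
    return [b.name for b in blobs if b.last_modified > last_run]


# Example with made-up timestamps (all timezone-aware, in UTC):
last_run = datetime(2022, 1, 2, tzinfo=timezone.utc)
listing = [
    BlobInfo("old.json", datetime(2022, 1, 1, tzinfo=timezone.utc)),
    BlobInfo("new.json", datetime(2022, 1, 3, tzinfo=timezone.utc)),
]
to_load = files_since(listing, last_run)
```

Keeping both timestamps timezone-aware avoids comparison errors between naive and aware `datetime` values. After a successful load, the stored last-run time would be updated so that the next run skips the files just read.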

2 More Replies