Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

RKNutalapati
by Valued Contributor
  • 4304 Views
  • 5 replies
  • 4 kudos

Reading BLOB data from Oracle and saving it to S3 via Databricks is slow

I am trying to import a table from Oracle that has around 1.3 million rows, one of whose columns is a BLOB; the total size of the data in Oracle is around 250+ GB. Reading it and saving it to S3 as a Delta table takes around 60 minutes. I tried with parallel (200 thread...

Latest Reply
vinita_mehta
New Contributor II
  • 4 kudos

Any update on this topic? What would be the best option to read from Oracle and write to ADLS?

4 More Replies
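For a large Oracle table like the one in this post, the usual lever is a partitioned JDBC read so Spark opens many connections instead of one. A minimal sketch follows; the host, service name, table, and bound values are hypothetical placeholders, not from the thread.

```python
# Sketch: option map for a partitioned spark.read JDBC pull of a large
# Oracle table, before writing it out as a Delta table. All connection
# details below are invented placeholders.
def oracle_read_options(table, partition_column, lower, upper, num_partitions):
    """Build the option map for a parallel JDBC read of an Oracle table."""
    return {
        "url": "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1",  # placeholder URL
        "dbtable": table,
        "partitionColumn": partition_column,   # numeric, ideally indexed column
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),  # concurrent JDBC connections
        "fetchsize": "10000",                  # rows per round trip; tune down for BLOBs
    }

opts = oracle_read_options("APP.DOCS", "DOC_ID", 1, 1_300_000, 64)
# On a cluster:
# df = spark.read.format("jdbc").options(**opts).load()
# df.write.format("delta").save("s3://bucket/path")  # assumed target path
```

With a BLOB column, a smaller `fetchsize` often helps because each row is large; the right `numPartitions` depends on what the Oracle side can tolerate.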
su
by New Contributor
  • 4117 Views
  • 3 replies
  • 0 kudos

Reading from /tmp no longer working

Since yesterday, reading a file copied onto the cluster no longer works. What used to work: blob = gcs_bucket.get_blob("dev/data.ndjson") -> works; blob.download_to_filename("/tmp/data-copy.ndjson") -> works; df = spark.read.json("/tmp/data-copy.ndjso...

Latest Reply
Evan_From_Bosto
New Contributor II
  • 0 kudos

I encountered this same issue and figured out a fix! For some reason, it seems like only %sh cells can access the /tmp directory. So I just did %sh cp /tmp/<file> /dbfs/<desired-location> and then accessed it from there using Spark.

2 More Replies
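The underlying issue is that a file downloaded to the driver's local /tmp is not visible to Spark executors; copying it under the /dbfs FUSE mount makes it readable cluster-wide. A small sketch of that workaround, with the DBFS path demonstrated on a temporary stand-in directory:

```python
# Sketch of the /tmp workaround: stage a driver-local file under /dbfs so
# spark.read can see it from every node. Paths are illustrative.
import os
import shutil
import tempfile

def stage_for_spark(local_path, dbfs_dir="/dbfs/FileStore/staging"):
    """Copy a driver-local file to a DBFS path that Spark executors can read."""
    os.makedirs(dbfs_dir, exist_ok=True)
    target = os.path.join(dbfs_dir, os.path.basename(local_path))
    shutil.copy(local_path, target)
    return target

# Demo with temporary directories standing in for /tmp and /dbfs:
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data-copy.ndjson")
    with open(src, "w") as f:
        f.write('{"id": 1}\n')
    staged = stage_for_spark(src, dbfs_dir=os.path.join(tmp, "dbfs"))
    # On Databricks: df = spark.read.json("dbfs:/FileStore/staging/data-copy.ndjson")
```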
ferbystudy
by New Contributor III
  • 3789 Views
  • 3 replies
  • 3 kudos

Resolved! Can't read a simple .CSV from a blob

Guys, I am using Databricks Community Edition to study. I put some files in a blob and granted all access, but I have no idea why Databricks is not reading them. Please see the code below, and thanks for helping!

Latest Reply
ferbystudy
New Contributor III
  • 3 kudos

Guys, I found the problem! First I went to the data lake and set all access to public / granted the user owner access. I had already mounted the container before, so after these changes you need to unmount and then mount again! Yeah, after that it ...

2 More Replies
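The accepted fix boils down to remounting so the cluster picks up the new permissions. A minimal sketch, assuming made-up container, storage account, and mount point names (the dbutils calls are shown commented because they only run on Databricks):

```python
# Sketch of the unmount/remount fix after changing blob container permissions.
# Container, account, and mount point names are invented for illustration.
def wasbs_url(container, storage_account):
    """Build the wasbs:// source URL used when mounting Azure blob storage."""
    return f"wasbs://{container}@{storage_account}.blob.core.windows.net"

source = wasbs_url("study-files", "mystorageacct")
# dbutils.fs.unmount("/mnt/study")                  # drop the stale mount
# dbutils.fs.mount(source=source,
#                  mount_point="/mnt/study",
#                  extra_configs={...})             # storage key config elided
```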
hare
by New Contributor III
  • 4407 Views
  • 1 replies
  • 5 kudos

"Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"

Hi All, we are getting JSON files in an Azure blob container whose "Blob Type" is "Append Blob". We get the error "AnalysisException: Unable to infer schema for JSON. It must be specified manually." when we try to read them using the below-mentioned scr...

Latest Reply
User16856839485
Databricks Employee
  • 5 kudos

There currently does not appear to be direct support for append blob reads; however, converting the append blob to a block blob (and then to Parquet or Delta, etc.) is a viable option: https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga...

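When Spark cannot infer a schema (the AnalysisException above), supplying one manually is the usual escape hatch. A sketch using a Spark DDL schema string; the field names are invented, since the thread does not show the actual JSON layout:

```python
# Sketch: build a Spark DDL schema string to pass to spark.read.schema(...)
# when inference fails. Field names and types here are hypothetical.
def ddl_schema(fields):
    """Render a Spark DDL schema string from (name, type) pairs."""
    return ", ".join(f"{name} {dtype}" for name, dtype in fields)

schema = ddl_schema([("id", "BIGINT"), ("event", "STRING"), ("ts", "TIMESTAMP")])
# On a cluster:
# df = spark.read.schema(schema).json("wasbs://container@account.blob.core.windows.net/path/")
```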
Athar
by New Contributor
  • 2180 Views
  • 3 replies
  • 3 kudos

How to import blob storage container with sub-directories as a database in databricks sql?

I am trying to load blob storage into a Databricks SQL warehouse. I followed this document: https://docs.databricks.com/data/data-sources/azure/azure-storage.html, but it doesn't seem to be working. The query executed fine, but the created schema was empty. An...

Latest Reply
BilalAslamDbrx
Databricks Employee
  • 3 kudos

@Athar Abbas​ the simplest thing would be to create a SAS token to the ADLS Gen 2 container and then use the COPY INTO command with the AZURE_SAS_TOKEN credential: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/az...

2 More Replies
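The suggested COPY INTO approach can be sketched as a rendered SQL statement. The table name, path, and token below are placeholders, and the statement shape follows the documented Databricks COPY INTO syntax as I understand it; check the linked docs before relying on it:

```python
# Sketch: render a COPY INTO statement that authenticates to ADLS Gen2 with
# a SAS token, per the reply above. All identifiers are placeholders.
def copy_into_stmt(table, source_path, sas_token):
    """Build a Databricks COPY INTO statement using an AZURE_SAS_TOKEN credential."""
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_path}'\n"
        f"WITH (CREDENTIAL (AZURE_SAS_TOKEN = '{sas_token}'))\n"
        f"FILEFORMAT = PARQUET"
    )

stmt = copy_into_stmt(
    "main.landing",
    "abfss://data@acct.dfs.core.windows.net/in",
    "sv=...placeholder...",
)
# On a SQL warehouse or cluster: spark.sql(stmt)
```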
deisou
by New Contributor
  • 3618 Views
  • 4 replies
  • 2 kudos

Resolved! What is the best strategy for backing up a large Databricks Delta table that is stored in Azure blob storage?

I have a large delta table that I would like to back up and I am wondering what is the best practice for backing it up. The goal is so that if there is any accidental corruption or data loss either at the Azure blob storage level or within Databricks...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @deisou, just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you. Cheers!

3 More Replies
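One common answer to the backup question in this post is Delta's DEEP CLONE, which copies both data files and metadata to a separate location and can be re-run to refresh the copy incrementally. A sketch with invented table names and backup path:

```python
# Sketch: back up a Delta table with DEEP CLONE to a separate storage
# location. Table names and the target path are illustrative only.
def deep_clone_stmt(source_table, backup_table, location):
    """Build a CREATE ... DEEP CLONE statement for a Delta table backup."""
    return (
        f"CREATE OR REPLACE TABLE {backup_table} "
        f"DEEP CLONE {source_table} "
        f"LOCATION '{location}'"
    )

stmt = deep_clone_stmt(
    "prod.events",
    "backup.events",
    "abfss://backup@acct.dfs.core.windows.net/events",
)
# On a cluster: spark.sql(stmt)  # re-running refreshes the clone
```

Cloning to a different storage account also covers the accidental-corruption scenario the poster describes, since the backup does not share data files with the source.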
mayuri18kadam
by New Contributor II
  • 5879 Views
  • 3 replies
  • 0 kudos

Resolved! com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch

Hi, I am getting the following error: com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18/part-00003-tid-4...

Latest Reply
mayuri18kadam
New Contributor II
  • 0 kudos

Yes, I can read from a notebook with DBR 6.4 when I specify this path: wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18, but the same using DBR 6.4 from spark-submit, it f...

2 More Replies
frank26364
by New Contributor III
  • 36170 Views
  • 5 replies
  • 4 kudos

Resolved! Export Databricks results to Blob in a csv file

Hello everyone, I want to export my data from Databricks to the blob. My Databricks commands select some PDFs from my blob, run Form Recognizer, and export the output results to my blob. Here is the code: %pip install azure.storage.blob %pip install...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

@Francis Bouliane​ - Thank you for sharing the solution.

4 More Replies
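For exporting results like these to a blob as CSV, one workable pattern is to build the CSV bytes locally and upload them with the azure-storage-blob SDK. A sketch; the upload call is commented because it needs real credentials, and the connection string, container, and blob names are placeholders:

```python
# Sketch: serialize result rows to CSV bytes with the standard library,
# then upload with azure-storage-blob. Upload target names are invented.
import csv
import io

def rows_to_csv_bytes(header, rows):
    """Render a header plus rows as UTF-8 CSV bytes ready for upload."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

payload = rows_to_csv_bytes(["file", "field", "value"],
                            [["a.pdf", "total", "12.50"]])
# from azure.storage.blob import BlobClient
# blob = BlobClient.from_connection_string(conn_str, "results", "output.csv")
# blob.upload_blob(payload, overwrite=True)
```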
Nik
by New Contributor III
  • 14782 Views
  • 19 replies
  • 0 kudos

Writing from a DataFrame to a CSV file, CSV file is blank

Hi, I am reading from a text file from a blob: val sparkDF = spark.read.format(file_type).option("header", "true").option("inferSchema", "true").option("delimiter", file_delimiter).load(wasbs_string + "/" + PR_FileName) Then I test my Datafra...

Latest Reply
nl09
New Contributor II
  • 0 kudos

Create a temp folder inside the output folder, copy the part-00000* file to the output folder under the desired file name, then delete the temp folder. Python code snippet to do the same: fpath = output + '/' + 'temp'; def file_exists(path): try: dbutils.fs.ls(path) return...

18 More Replies
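The cleanup step the reply describes can be sketched without dbutils: Spark's CSV writer produces a directory containing a part-00000-*.csv file, so getting a single named CSV means locating that part file and moving it. Demonstrated below on a temporary directory standing in for DBFS; names are illustrative:

```python
# Sketch: promote Spark's part-*.csv output file to a single stable file
# name, demonstrated on a local temp directory standing in for DBFS.
import glob
import os
import shutil
import tempfile

def promote_part_file(spark_output_dir, final_path):
    """Move the single part-*.csv file Spark produced to a stable file name."""
    part, = glob.glob(os.path.join(spark_output_dir, "part-*.csv"))  # expect exactly one
    shutil.move(part, final_path)
    return final_path

with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "out")
    os.makedirs(out)
    with open(os.path.join(out, "part-00000-abc.csv"), "w") as f:
        f.write("a,b\n1,2\n")               # stand-in for Spark's output
    final = promote_part_file(out, os.path.join(d, "report.csv"))
    with open(final) as f:
        content = f.read()
```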
AlaQabaja
by New Contributor II
  • 5685 Views
  • 3 replies
  • 0 kudos

Get last modified date or create date for azure blob container

Hi everyone, I am trying to implement a way in Python to only read files that weren't loaded since the last run of my notebook. The way I am thinking of implementing this is to keep track of the last time my notebook finished in a database table. Nex...

2 More Replies
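The incremental-load idea in this post reduces to: record the timestamp of the last successful run, then filter the blob listing to entries whose last-modified time is newer. A minimal sketch; the (name, last_modified) pairs stand in for the results of an azure-storage-blob container listing, which exposes a last_modified property per blob:

```python
# Sketch: select only blobs modified since the last recorded notebook run.
# The listing data below is fabricated to stand in for an Azure blob listing.
from datetime import datetime, timezone

def new_blobs(listing, last_run):
    """Return names of blobs whose last-modified time is after last_run."""
    return [name for name, modified in listing if modified > last_run]

last_run = datetime(2024, 1, 1, tzinfo=timezone.utc)   # read from a state table
listing = [
    ("old.csv",   datetime(2023, 12, 31, tzinfo=timezone.utc)),
    ("fresh.csv", datetime(2024, 1, 2, tzinfo=timezone.utc)),
]
pending = new_blobs(listing, last_run)
```

After processing, the notebook would write the new run timestamp back to the state table so the next run picks up from there.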