Data Engineering

Forum Posts

by RKNutalapati • Valued Contributor

10-16-2021 2:16:34 AM

4624 Views
5 replies
4 kudos

Reading and saving Blob data from Oracle to Databricks S3 is slow

I am trying to import a table from Oracle that has around 1.3 million rows, and one of the columns is a BLOB. The total size of the data in Oracle is 250+ GB. Reading it and saving it to S3 as a Delta table takes around 60 minutes. I tried with parallel(200 thread...
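For reference, a partitioned JDBC read is typically configured along the lines of the sketch below. It only builds the option map. The JDBC URL, the table name, the numeric key column `ID`, and its bounds are hypothetical placeholders, not details from the post. `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, and `fetchsize` are standard Spark JDBC options; credentials are left out on purpose.

```python
def jdbc_read_options(url, table, partition_column, lower, upper,
                      num_partitions, fetch_size=1000):
    """Build the option map for a partitioned Spark JDBC read."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        # Spark splits [lower, upper] on this column into num_partitions ranges.
        "partitionColumn": partition_column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
        # Rows fetched per round trip; a small value may suit wide BLOB rows.
        "fetchsize": str(fetch_size),
    }


options = jdbc_read_options(
    url="jdbc:oracle:thin:@//db-host:1521/SERVICE",  # hypothetical
    table="MY_SCHEMA.DOCUMENTS",                     # hypothetical
    partition_column="ID",                           # hypothetical numeric key
    lower=1,
    upper=1_300_000,
    num_partitions=16,
    fetch_size=100,
)

# In a Databricks notebook, where `spark` is predefined:
# df = spark.read.format("jdbc").options(**options).load()
# df.write.format("delta").save("s3://<bucket>/<path>")
```

Whether this helps depends on where the time goes; with large BLOBs the bottleneck may be the database or the network rather than the number of Spark tasks.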

Latest Reply

vinita_mehta
New Contributor II

10-16-2024 7:21:05 AM

4 kudos

Any update on this topic? What would be the best option for reading from Oracle and writing to ADLS?

by su • New Contributor

10-14-2022 6:29:43 AM

4295 Views
3 replies
0 kudos

Reading from /tmp no longer working

Since yesterday, reading a file copied into the cluster is no longer working. What used to work:
blob = gcs_bucket.get_blob("dev/data.ndjson") -> works
blob.download_to_filename("/tmp/data-copy.ndjson") -> works
df = spark.read.json("/tmp/data-copy.ndjso...

Latest Reply

Evan_From_Bosto
New Contributor II

01-05-2023 6:55:43 AM

0 kudos

I encountered this same issue, and figured out a fix! For some reason, it seems like only %sh cells can access the /tmp directory. So I just did... %sh cp /tmp/<file> /dbfs/<desired-location> and then accessed it from there using Spark.
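The same copy step can be sketched in Python instead of a %sh cell. This assumes the `/dbfs` FUSE mount is available to Python on the cluster, which the reply does not confirm; the helper name and the target directory are hypothetical.

```python
import shutil
from pathlib import Path


def copy_to_shared_location(local_path, target_dir):
    """Copy a driver-local file into target_dir and return the new file path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    destination = target / Path(local_path).name
    shutil.copy(local_path, destination)
    return str(destination)


# In a Databricks notebook (hypothetical target directory):
# copy_to_shared_location("/tmp/data-copy.ndjson", "/dbfs/tmp/landing")
# df = spark.read.json("dbfs:/tmp/landing/data-copy.ndjson")
```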

by ferbystudy • New Contributor III

11-07-2022 10:07:15 PM

3930 Views
3 replies
3 kudos

Resolved! Can't read a simple .CSV from a blob

Guys, I am using "Databricks Community" to study. I put some files in a Blob and granted all access, but I have no idea why DB is not reading them. Please see the code below, and thanks for helping!

Latest Reply

ferbystudy
New Contributor III

11-08-2022 1:45:15 PM

3 kudos

Guys, I found the problem! ****, Databricks! Hahaha. First I went to the data lake and set all access to public/granted all users owner access. I had already mounted it before, so after these changes you need to unmount and then mount again! Yeah, after that it ...
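The unmount-then-mount step could look roughly like the sketch below. The helper takes the file-system utility as a parameter so it can be read outside a notebook; in Databricks you would pass `dbutils.fs`. The source URL, mount point, and configuration values are placeholders, not values from the thread.

```python
def remount(fs, source, mount_point, extra_configs):
    """Unmount mount_point if it is currently mounted, then mount it again."""
    mounted = {m.mountPoint for m in fs.mounts()}
    if mount_point in mounted:
        fs.unmount(mount_point)
    fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)


# In a Databricks notebook (all values are placeholders):
# remount(
#     dbutils.fs,
#     source="wasbs://<container>@<account>.blob.core.windows.net",
#     mount_point="/mnt/<name>",
#     extra_configs={"<conf-key>": "<conf-value>"},
# )
```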

by hare • New Contributor III

05-19-2022 5:40:47 AM

4532 Views
1 replies
5 kudos

"Databricks" - "PySpark" - Read "JSON" file - Azure Blob container - "APPEND BLOB"

Hi All, We are receiving JSON files in an Azure blob container, and their "Blob Type" is "Append Blob". We get the error "AnalysisException: Unable to infer schema for JSON. It must be specified manually." when we try to read them using the below mentioned scr...

Latest Reply

User16856839485
Databricks Employee

10-13-2022 9:25:54 AM

5 kudos

There currently does not appear to be direct support for append blob reads. However, converting the append blob to a block blob (and then to Parquet, Delta, etc.) is a viable option: https://kb.databricks.com/en_US/data-sources/wasb-check-blob-types?_ga...

by Athar • New Contributor

07-01-2022 3:10:59 AM

2246 Views
3 replies
3 kudos

How to import a blob storage container with sub-directories as a database in Databricks SQL?

I am trying to upload blob storage to a Databricks SQL warehouse. I followed this document (https://docs.databricks.com/data/data-sources/azure/azure-storage.html), but it doesn't seem to be working. The query executed fine, but the created schema was empty. An...

Latest Reply

BilalAslamDbrx
Databricks Employee

07-08-2022 6:34:16 AM

3 kudos

@Athar Abbas the simplest thing would be to create a SAS token to the ADLS Gen 2 container and then use the COPY INTO command with the AZURE_SAS_TOKEN credential: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/az...
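As a rough illustration of that suggestion, the sketch below assembles a `COPY INTO` statement that passes a SAS token as a temporary credential. The table name, container path, file format, and the secret scope and key are all hypothetical, and the target table is assumed to exist already.

```python
def copy_into_statement(table, source_uri, sas_token, file_format="CSV"):
    """Build a COPY INTO statement that authenticates with an Azure SAS token."""
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_uri}' WITH (\n"
        f"  CREDENTIAL (AZURE_SAS_TOKEN = '{sas_token}')\n"
        f")\n"
        f"FILEFORMAT = {file_format}"
    )


# In a Databricks notebook (scope, key, table, and path are hypothetical):
# token = dbutils.secrets.get(scope="my-scope", key="sas-token")
# spark.sql(copy_into_statement(
#     "my_schema.my_table",
#     "abfss://<container>@<account>.dfs.core.windows.net/<path>",
#     token,
# ))
```

Reading the token from a secret scope keeps it out of the notebook source.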

by deisou • New Contributor

02-28-2022 9:56:37 PM

3834 Views
4 replies
2 kudos

Resolved! What is the best strategy for backing up a large Databricks Delta table that is stored in Azure blob storage?

I have a large delta table that I would like to back up and I am wondering what is the best practice for backing it up. The goal is so that if there is any accidental corruption or data loss either at the Azure blob storage level or within Databricks...

Latest Reply

Anonymous
Not applicable

04-27-2022 9:33:07 AM

2 kudos

Hi @deisou, just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you. Cheers!

by mayuri18kadam • New Contributor II

01-24-2022 8:45:25 PM

6207 Views
3 replies
0 kudos

Resolved! com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch

Hi, I am getting the following error: com.databricks.sql.io.FileReadException: Error while reading file wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18/part-00003-tid-4...

Latest Reply

mayuri18kadam
New Contributor II

01-26-2022 10:05:45 AM

0 kudos

Yes, I can read from a notebook with DBR 6.4 when I specify this path: wasbs:REDACTED_LOCAL_PART@blobStorageName.blob.core.windows.net/cook/processYear=2021/processMonth=12/processDay=30/processHour=18 but the same using DBR 6.4 from spark-submit, it f...

by frank26364 • New Contributor III

12-20-2021 5:38:54 AM

36450 Views
5 replies
4 kudos

Resolved! Export Databricks results to Blob in a csv file

Hello everyone, I want to export my data from Databricks to the blob. My Databricks commands select some PDFs from my blob, run Form Recognizer, and export the output results to my blob. Here is the code:
%pip install azure.storage.blob
%pip install...

Latest Reply

Anonymous
Not applicable

01-21-2022 12:01:01 PM

4 kudos

@Francis Bouliane - Thank you for sharing the solution.

by Data_Bricks1 • New Contributor III

10-13-2021 12:00:45 PM

1023 Views
1 replies
0 kudos

BLOB multi container data loading

Latest Reply

Anonymous
Not applicable

10-13-2021 12:31:37 PM

0 kudos

@Rajeswari Gummadi - Is this a duplicate of your other thread? I don't see any content and I want to make sure all your questions are answered.

by Nik • New Contributor III

09-04-2018 10:03:05 AM

15335 Views
19 replies
0 kudos

write from a Dataframe to a CSV file, CSV file is blank

Hi, I am reading a text file from a blob:
val sparkDF = spark.read.format(file_type)
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", file_delimiter)
  .load(wasbs_string + "/" + PR_FileName)
Then I test my Datafra...

Latest Reply

nl09
New Contributor II

06-25-2020 9:15:52 AM

0 kudos

Create a temp folder inside the output folder. Copy the part-00000* file, under the desired file name, to the output folder. Delete the temp folder. Python code snippet to do the same:
fpath=output+'/'+'temp'
def file_exists(path):
  try:
    dbutils.fs.ls(path)
    return...
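Since the snippet above is cut off, here is a minimal sketch of the three steps it describes. It is not the reply's original code. The helper takes the file-system utility as a parameter (pass `dbutils.fs` in a notebook), and the function name and paths are hypothetical.

```python
def promote_part_file(fs, temp_dir, final_path):
    """Copy the single part-00000* file in temp_dir to final_path, then delete temp_dir."""
    parts = [f.path for f in fs.ls(temp_dir) if f.name.startswith("part-00000")]
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part-00000* file, found {len(parts)}")
    fs.cp(parts[0], final_path)
    fs.rm(temp_dir, True)  # second argument: delete recursively
    return final_path


# In a Databricks notebook (paths are hypothetical):
# temp_dir = output + "/temp"
# df.coalesce(1).write.option("header", "true").csv(temp_dir)
# promote_part_file(dbutils.fs, temp_dir, output + "/result.csv")
```

Writing with `coalesce(1)` is what guarantees a single part file; for large DataFrames that funnels all data through one task.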

by AlaQabaja • New Contributor II

09-19-2019 10:02:46 AM

5891 Views
3 replies
0 kudos

Get last modified date or create date for azure blob container

Hi Everyone, I am trying to implement a way in Python to read only the files that weren't loaded since the last run of my notebook. The way I am thinking of implementing this is to keep track of the last time my notebook finished in a database table. Nex...
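One way to sketch the filtering step described above is shown below. It assumes each listing entry exposes `path` and a `modificationTime` in epoch milliseconds, as the `FileInfo` objects returned by `dbutils.fs.ls` do on recent Databricks Runtime versions; older runtimes may not provide that field. The function names are hypothetical.

```python
from datetime import datetime, timezone


def to_epoch_ms(ts):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(ts.timestamp() * 1000)


def files_modified_since(file_infos, last_run_ms):
    """Return paths of files (not directories) modified after last_run_ms."""
    return [
        f.path
        for f in file_infos
        if not f.path.endswith("/") and f.modificationTime > last_run_ms
    ]


# In a Databricks notebook (mount point and timestamp are placeholders):
# last_run_ms = to_epoch_ms(datetime(2019, 9, 19, tzinfo=timezone.utc))
# new_paths = files_modified_since(dbutils.fs.ls("/mnt/<container>/"), last_run_ms)
# if new_paths:
#     df = spark.read.json(new_paths)
```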
