Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

charlieyou
by New Contributor
  • 6319 Views
  • 1 reply
  • 0 kudos

StreamingQueryException: Read timed out // Reading from a Delta-shared dataset

I have a workspace in GCP that's reading from a Delta-shared dataset hosted in S3. When trying to run a very basic DLT pipeline, I'm getting the error below. Any help would be awesome!
Code:
import dlt
@dlt.table
def fn(): return (spark.readStr...
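For reference, a minimal sketch of the kind of DLT pipeline the post describes, assuming the share has been mounted as a Unity Catalog catalog; the catalog and table names are placeholders, not the poster's truncated code:

    import dlt

    @dlt.table
    def shared_events():
        # Streaming read of the Delta-shared table; `shared_catalog.default.events`
        # is a hypothetical name standing in for the real shared table.
        return spark.readStream.table("shared_catalog.default.events")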

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Charlie You: The error message you're encountering suggests a timeout issue when reading from the Delta-shared dataset hosted in S3. There are a few potential reasons and solutions you can explore. Network connectivity: verify that the network conne...

Tracy_
by New Contributor II
  • 11041 Views
  • 5 replies
  • 0 kudos

Incorrect reading of CSV with inferSchema

Hi All, there is a CSV with a column ID (format: 8 digits with "D" at the end). When trying to read the CSV with .option("inferSchema", "true"), it returns the ID as a double and trims the "D". Is there any way (apart from inferSchema=False) to get the correct ...
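A minimal sketch of the usual workaround: declare the schema explicitly so the ID column stays a string and keeps its trailing "D". Only the ID column comes from the question; the other column and the path are hypothetical placeholders:

    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # "amount" stands in for whatever other columns the file has.
    schema = StructType([
        StructField("ID", StringType(), True),      # preserves the trailing "D"
        StructField("amount", DoubleType(), True),
    ])

    df = (spark.read
          .option("header", "true")
          .schema(schema)
          .csv("/path/to/file.csv"))                # placeholder path

Providing a full schema sidesteps inference entirely, so no column is silently re-typed.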

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @tracy ng Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...

4 More Replies
Erik_L
by Contributor II
  • 3578 Views
  • 2 replies
  • 1 kudos

Resolved! PySpark read of multiple Parquet files: type expansion failure

Problem: Reading nearly equivalent Parquet tables in a directory, where some have column X as float and some as double, fails.
Attempts at resolving:
  • Using streaming files
  • Removing Delta caching, vectorization
  • Using .cache() explicitly
Notes: This...
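One commonly suggested workaround, sketched under the assumption that the mismatch is only float vs. double on column X (the path is a placeholder): disable the vectorized Parquet reader, which cannot up-cast, and read with an explicit double schema:

    from pyspark.sql.types import StructType, StructField, DoubleType

    # The vectorized reader fails on float -> double promotion; the row-based
    # reader can up-cast, at some performance cost.
    spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

    schema = StructType([StructField("X", DoubleType(), True)])
    df = spark.read.schema(schema).parquet("/mnt/tables/")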

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Erik Louie Help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation! Regards

1 More Replies
su
by New Contributor
  • 3928 Views
  • 3 replies
  • 0 kudos

Reading from /tmp no longer working

Since yesterday, reading a file copied into the cluster is no longer working. What used to work:
blob = gcs_bucket.get_blob("dev/data.ndjson") -> works
blob.download_to_filename("/tmp/data-copy.ndjson") -> works
df = spark.read.json("/tmp/data-copy.ndjso...

Latest Reply
Evan_From_Bosto
New Contributor II
  • 0 kudos

I encountered this same issue and figured out a fix! For some reason, it seems like only %sh cells can access the /tmp directory. So I just did %sh cp /tmp/<file> /dbfs/<desired-location> and then accessed it from there using Spark.
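The same fix in pure Python, as a sketch reusing the paths from the question (and assuming the shell command above was meant to be cp): dbutils can copy from the driver-local filesystem into DBFS:

    # file:/ addresses the driver's local disk; dbfs:/ is visible to Spark.
    dbutils.fs.cp("file:/tmp/data-copy.ndjson", "dbfs:/tmp/data-copy.ndjson")
    df = spark.read.json("dbfs:/tmp/data-copy.ndjson")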

2 More Replies
syedmuhammedmeh
by New Contributor III
  • 2951 Views
  • 2 replies
  • 6 kudos

Resolved! Databricks Kafka Read Not connecting

I'm trying to read data from GCP Kafka through Azure Databricks, but I'm getting the warning below and the notebook is simply not completing. Any suggestions, please? WARN NetworkClient: Consumer groupId Bootstrap broker rack disconnected. Please note I've properly c...
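For comparison, a minimal structured-streaming Kafka read; the broker, topic, and offset settings are placeholders, since the post's configuration is not shown. A "Bootstrap broker ... disconnected" warning often points at network reachability or an advertised-listener mismatch rather than the Spark code itself:

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-host:9092")  # placeholder broker
          .option("subscribe", "my-topic")                        # placeholder topic
          .option("startingOffsets", "earliest")
          .load())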

Latest Reply
jose_gonzalez
Databricks Employee
  • 6 kudos

Could you share the full error stack trace from your driver's logs? This is only a warning message; we need to take a look at the error-level messages.

1 More Replies
BhagS
by New Contributor II
  • 5500 Views
  • 2 replies
  • 5 kudos

Resolved! Write an empty Delta file in the data lake

Hi all, currently I am trying to write an empty Delta file in the data lake. To do this, I am doing the following: reading a Parquet file from my landing zone (this file consists only of the schema of SQL tables):
df=spark.read.format('parquet').load(landingZ...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 5 kudos

Hi @bhagya s, since your source file is empty, there is no data file inside the centralizedZonePath directory, i.e., no .parquet file is created in the target location. However, _delta_log is the transaction log that holds the metadata of the Delta for...
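A sketch of the behaviour described here, with placeholder paths: writing an empty DataFrame in Delta format creates only the transaction log, not data files:

    df = spark.read.format("parquet").load("/mnt/landing/schema_only/")   # empty source
    df.write.format("delta").mode("overwrite").save("/mnt/centralized/empty_table/")
    # The target now contains only _delta_log/ (schema + metadata), no .parquet files.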

1 More Replies
Ben_Spark
by New Contributor III
  • 7129 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks Spark XML parser: support for namespaces declared at the ancestor level

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option. When I parse an XML file using the "rowValidationXSDPath" option, the parser can't recognize the prefixes/namespaces declared at the root level. For this to...
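For context, a minimal spark-xml read using the option under discussion (requires the spark-xml library on the cluster; the rowTag, XSD path, and input path are placeholders). As the thread concludes, this validation fails when namespace prefixes are declared on an ancestor element:

    df = (spark.read
          .format("xml")
          .option("rowTag", "record")                                # placeholder row tag
          .option("rowValidationXSDPath", "/dbfs/schemas/record.xsd")
          .load("dbfs:/data/input.xml"))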

Latest Reply
Ben_Spark
New Contributor III
  • 2 kudos

Hi, sorry for the late response; I got busy looking for a permanent solution to this problem. In the end, we are giving up on the XSD path parser. This option does not work when prefixes/namespaces are declared at the ancestor level. Thank you anyway for ...

3 More Replies
Krishscientist
by New Contributor III
  • 2654 Views
  • 1 reply
  • 2 kudos

Resolved! Issue when reading .wav file

Hi, I am developing a notebook to read .wav files and build a speech-matching scenario. I have saved files in "/FileStore/tables/doors_and_corners_kid_thats_where_they_get_you.wav". When I wrote code like this:
from scipy.io import wavfile
import numpy as np...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Try prefixing it with dbfs: either dbfs:/FileStore or /dbfs/FileStore.
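A sketch of that suggestion applied to the post's file: scipy reads through the local filesystem, so address the file via the /dbfs FUSE mount rather than the bare /FileStore path:

    from scipy.io import wavfile

    # Same file as in the question, reached through the FUSE mount.
    rate, data = wavfile.read(
        "/dbfs/FileStore/tables/doors_and_corners_kid_thats_where_they_get_you.wav")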

Anonymous
by Not applicable
  • 530 Views
  • 0 replies
  • 0 kudos

Find a local spokesperson for advice. Ask about their career path, how did they "get here"? Read books about speaking and writing. Analyze fa...

Find a local spokesperson for advice. Ask about their career path, how did they "get here"? Read books about speaking and writing. Analyze famous speeches with text-to-speech software for yourself and do not rely on books that tell you the "why" and "how" o...

Erik
by Valued Contributor III
  • 3639 Views
  • 4 replies
  • 2 kudos

Resolved! Does Z-ordering speed up reading of a single file?

Situation: we have one partition per date, and it just so happens that each partition ends up (after OPTIMIZE) as *a single* 128 MB file. We partition on date and Z-order on userid, and our query is something like "find max value of column A where useri...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Z-ordering makes sure that, in case you need to read multiple files, those files are co-located. For a single file this does not matter, as a single file is always local to itself. If you are certain that your Spark program will only read a single file,...
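For reference, the layout being discussed, sketched with a placeholder table name and filter values (the partition and Z-order columns follow the question):

    spark.sql("OPTIMIZE events ZORDER BY (userid)")   # "events" is hypothetical
    result = spark.sql(
        "SELECT MAX(A) FROM events WHERE date = '2024-01-01' AND userid = 42")

Because the date filter already narrows the scan to a single ~128 MB file, and Delta's data-skipping statistics are kept per file, the Z-order clustering inside that one file does not by itself prune further reads.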

3 More Replies