Data Engineering

Forum Posts

charlieyou
by New Contributor
  • 1168 Views
  • 1 reply
  • 0 kudos

StreamingQueryException: Read timed out // Reading from delta share'd dataset

I have a workspace in GCP that's reading from a delta-shared dataset hosted in S3. When trying to run a very basic DLT pipeline, I'm getting the below error. Any help would be awesome! Code: import dlt @dlt.table def fn(): return (spark.readStr...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Charlie You: The error message you're encountering suggests a timeout issue when reading from the Delta-shared dataset hosted in S3. There are a few potential reasons and solutions you can explore: Network connectivity: Verify that the network conne...

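The truncated code in the question appears to stream from a Delta Sharing table. A minimal sketch of that pattern, assuming a profile file and share/schema/table names that are placeholders here:

```python
# Hedged sketch of reading a Delta Sharing table as a stream, as the
# truncated question code appears to do. The profile path and the
# <share>.<schema>.<table> name are illustrative placeholders.

def sharing_url(profile_path, table):
    """Build the "<profile-file>#<share>.<schema>.<table>" source string."""
    return f"{profile_path}#{table}"

def read_shared_table(spark, profile_path, table):
    # Streaming read; long network round-trips to the S3-hosted share
    # are a plausible source of the "Read timed out" error.
    return (
        spark.readStream
        .format("deltaSharing")
        .load(sharing_url(profile_path, table))
    )
```

If the timeout persists, checking cross-cloud connectivity (GCP workspace to S3-hosted share) is usually the first step, as the reply suggests.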
Tracy_
by New Contributor II
  • 5300 Views
  • 5 replies
  • 0 kudos

Incorrect reading csv format with inferSchema

Hi All, There is a CSV with a column ID (format: 8 digits & "D" at the end). When trying to read the CSV with .option("inferSchema", "true"), it returns the ID as double and trims the "D". Is there any idea (apart from inferSchema=False) to get correct ...

image.png
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @tracy ng Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...

4 More Replies
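The usual workaround for this question is to pass an explicit schema instead of inferSchema, so IDs like "12345678D" stay strings. A minimal sketch (the second column name and type are illustrative, not from the question):

```python
# Hedged sketch: declare ID as STRING so the trailing "D" is kept.
# Only the ID column is from the question; "Amount" is a placeholder.
ID_SCHEMA_DDL = "ID STRING, Amount DOUBLE"  # DDL-style schema string

def read_csv_with_ids(spark, path):
    # Passing an explicit schema makes inferSchema irrelevant, so
    # values like "12345678D" are never coerced to double.
    return (
        spark.read
        .option("header", "true")
        .schema(ID_SCHEMA_DDL)
        .csv(path)
    )
```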
Erik_L
by Contributor II
  • 1443 Views
  • 2 replies
  • 1 kudos

Resolved! Pyspark read multiple Parquet type expansion failure

Problem: Reading nearly equivalent parquet tables in a directory, where some have column X with type float and some with type double, fails. Attempts at resolving: using streaming files; removing delta caching, vectorization; using .cache() explicitly. Notes: This...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Erik Louie Help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation! Regards

1 More Replies
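One workaround for mixed float/double Parquet files, sketched under the assumption that the files share all other columns: read each path separately, cast the offending column to double, then union. Paths and the column name "X" are from the question's description only.

```python
from functools import reduce

# Hedged sketch for the float-vs-double Parquet mismatch above:
# widen column "X" to double per file, then union the frames.
def read_with_widened_column(spark, paths, column="X"):
    from pyspark.sql.functions import col  # lazy import; needs pyspark at call time
    dfs = [
        spark.read.parquet(p).withColumn(column, col(column).cast("double"))
        for p in paths
    ]
    # unionByName aligns columns by name rather than position
    return reduce(lambda a, b: a.unionByName(b), dfs)
```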
su
by New Contributor
  • 2250 Views
  • 3 replies
  • 0 kudos

Reading from /tmp no longer working

Since yesterday, reading a file copied into the cluster is no longer working. What used to work: blob = gcs_bucket.get_blob("dev/data.ndjson") -> works; blob.download_to_filename("/tmp/data-copy.ndjson") -> works; df = spark.read.json("/tmp/data-copy.ndjso...

Latest Reply
Evan_From_Bosto
New Contributor II
  • 0 kudos

I encountered this same issue, and figured out a fix! For some reason, it seems like only %sh cells can access the /tmp directory. So I just did %sh cp /tmp/<file> /dbfs/<desired-location> and then accessed it from there using Spark.

2 More Replies
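The same staging step can be done from Python instead of a %sh cell. A sketch, assuming the /dbfs fuse mount is available and with illustrative paths:

```python
import shutil
from pathlib import Path

# Hedged sketch of the fix above: copy the driver-local /tmp file into
# DBFS via the /dbfs fuse mount, then read it with Spark from dbfs:/.
def stage_to_dbfs(local_path, dbfs_dir="/dbfs/tmp"):
    dest = Path(dbfs_dir) / Path(local_path).name
    shutil.copy(local_path, dest)
    # Spark would then read it as e.g.: spark.read.json(f"dbfs:/tmp/{dest.name}")
    return str(dest)
```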
syedmuhammedmeh
by New Contributor III
  • 1620 Views
  • 3 replies
  • 6 kudos

Resolved! Databricks Kafka Read Not connecting

I'm trying to read data from GCP Kafka through Azure Databricks but am getting the below warning, and the notebook is simply not completing. Any suggestion please? WARN NetworkClient: Consumer groupId Bootstrap broker rack disconnected. Please note I've properly c...

Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @Syed Mohammed Mehdi​, We haven’t heard from you since the last response from @Jose Gonzalez​ , and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to other...

2 More Replies
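For reference, a minimal sketch of the Kafka read in question; broker address and topic are placeholders, and a "Bootstrap broker ... disconnected" warning usually points at network reachability (firewall/VPC peering between Azure and GCP) rather than at these options:

```python
# Hedged sketch: basic Structured Streaming read from Kafka.
# bootstrap_servers and topic are illustrative placeholders; any
# SASL/SSL auth options the question's setup needs are omitted here.
def read_kafka_stream(spark, bootstrap_servers, topic):
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .load()
    )
```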
BhagS
by New Contributor II
  • 3258 Views
  • 4 replies
  • 5 kudos

Resolved! Write Empty Delta file in Datalake

Hi all, currently I am trying to write an empty delta file in the data lake. To do this I am doing the following: reading a parquet file from my landing zone (this file consists only of the schema of SQL tables): df = spark.read.format('parquet').load(landingZ...

image
Latest Reply
Kaniz
Community Manager
  • 5 kudos

Hi @bhagya s​ ​, We haven’t heard from you on the last response from @Noopur Nigam​ , and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Otherwise,...

3 More Replies
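One common way to materialize only the schema as a Delta table, sketched with the placeholder paths from the question: read the schema-bearing parquet file and write zero rows of it.

```python
# Hedged sketch: write an "empty" Delta table that carries only the
# schema of the source parquet file. Both paths are placeholders.
def write_empty_delta(spark, parquet_path, delta_path):
    df = spark.read.format("parquet").load(parquet_path)
    # limit(0) keeps the schema but drops all rows before writing
    df.limit(0).write.format("delta").mode("overwrite").save(delta_path)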
Ben_Spark
by New Contributor III
  • 3951 Views
  • 9 replies
  • 2 kudos

Resolved! Databricks Spark XML parser : support for namespace declared at the ancestor level.

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option. When I parse an XML file using the "rowValidationXSDPath" option, the parser can't recognize the prefixes/namespaces declared at the root level. For this to...

Latest Reply
Ben_Spark
New Contributor III
  • 2 kudos

Hi, sorry for the late response; I got busy looking for a permanent solution to this problem. In the end we are giving up on the XSD path parser. This option does not work when prefixes/namespaces are declared at the ancestor level. Thank you anyway for ...

8 More Replies
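For context, a sketch of the spark-xml read being discussed; file path and rowTag are illustrative. Since rowValidationXSDPath reportedly fails when namespaces are declared on an ancestor element, the pragmatic fallback the thread lands on is parsing without XSD validation:

```python
# Hedged sketch of the spark-xml options discussed above. Passing
# xsd_path=None skips the validation step that fails for
# ancestor-level namespace declarations.
def read_xml(spark, path, row_tag, xsd_path=None):
    reader = spark.read.format("xml").option("rowTag", row_tag)
    if xsd_path:
        reader = reader.option("rowValidationXSDPath", xsd_path)
    return reader.load(path)
```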
Krishscientist
by New Contributor III
  • 1640 Views
  • 1 reply
  • 2 kudos

Resolved! Issue when reading .wav file

Hi, I am developing a notebook to read .wav files and build a Speech Matching scenario. I have saved files in "/FileStore/tables/doors_and_corners_kid_thats_where_they_get_you.wav". When I wrote code like this: from scipy.io import wavfile; import numpy as np...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Try prefixing it with dbfs: either dbfs:/FileStore or /dbfs/FileStore

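The reply above works because local Python libraries like scipy need the /dbfs fuse-mount path rather than the bare FileStore path. A small hypothetical helper (not from the thread) that does the mapping:

```python
# Hypothetical helper: map a dbfs:/ or /FileStore path to the /dbfs
# fuse mount so local-file APIs like scipy's wavfile.read can open it.
def to_local_path(path):
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):]
    if path.startswith("/dbfs/"):
        return path  # already a fuse-mount path
    return "/dbfs" + path  # e.g. "/FileStore/..." -> "/dbfs/FileStore/..."

# scipy would then read:
# wavfile.read(to_local_path("/FileStore/tables/doors_and_corners_kid_thats_where_they_get_you.wav"))
```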
Anonymous
by Not applicable
  • 277 Views
  • 0 replies
  • 0 kudos

Find a local spokesperson for advice. Ask about their career path, how did they "get here"? Read books about speaking and writing. Analyze fa...

Find a local spokesperson for advice. Ask about their career path, how did they "get here"? Read books about speaking and writing. Analyze famous speeches text to speech software for yourself and do not rely on books that tell you the "why" and "how" o...

Erik
by Valued Contributor II
  • 1796 Views
  • 6 replies
  • 2 kudos

Resolved! Does Z-ordering speed up reading of a single file?

Situation: we have one partition per date, and it just so happens that each partition ends up (after optimize) as *a single* 128 MB file. We partition on date, and zorder on userid, and our query is something like "find max value of column A where useri...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Z-Order will make sure that in case you need to read multiple files, these files are co-located. For a single file this does not matter, as a single file is always local to itself. If you are certain that your spark program will only read a single file,...

5 More Replies
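For completeness, the layout being discussed is produced by OPTIMIZE with ZORDER; a sketch with table and column names taken only from the question's description:

```python
# Hedged sketch: run OPTIMIZE ... ZORDER BY via SQL. "events" is a
# placeholder table name; "userid" is the column from the question.
def zorder_table(spark, table="events", column="userid"):
    spark.sql(f"OPTIMIZE {table} ZORDER BY ({column})")
```

As the reply notes, this helps when a query must touch multiple files; within a single 128 MB file it changes nothing about read cost.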
Labels