Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

charlieyou
by New Contributor
  • 6319 Views
  • 1 reply
  • 0 kudos

StreamingQueryException: Read timed out // Reading from a Delta-shared dataset

I have a workspace in GCP that's reading from a Delta-shared dataset hosted in S3. When trying to run a very basic DLT pipeline, I'm getting the error below. Any help would be awesome!
Code:
import dlt
@dlt.table
def fn(): return (spark.readStr...
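For reference, a minimal sketch of the kind of DLT pipeline the post describes, assuming the share has been mounted as a Unity Catalog catalog; the catalog and table names are placeholders, not the poster's truncated code:

    import dlt

    @dlt.table
    def shared_events():
        # Streaming read of the Delta-shared table; `shared_catalog.default.events`
        # is a hypothetical name standing in for the real shared table.
        return spark.readStream.table("shared_catalog.default.events")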

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Charlie You: The error message you're encountering suggests a timeout issue when reading from the Delta-shared dataset hosted in S3. There are a few potential reasons and solutions you can explore. Network connectivity: verify that the network conne...

Tracy_
by New Contributor II
  • 11041 Views
  • 5 replies
  • 0 kudos

Incorrect reading of CSV with inferSchema

Hi All, there is a CSV with a column ID (format: 8 digits with "D" at the end). When trying to read the CSV with .option("inferSchema", "true"), it returns the ID as a double and trims the "D". Is there any way (apart from inferSchema=False) to get the correct ...
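A minimal sketch of the usual workaround: declare the schema explicitly so the ID column stays a string and keeps its trailing "D". Only the ID column comes from the question; the other column and the path are hypothetical placeholders:

    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # "amount" stands in for whatever other columns the file has.
    schema = StructType([
        StructField("ID", StringType(), True),      # preserves the trailing "D"
        StructField("amount", DoubleType(), True),
    ])

    df = (spark.read
          .option("header", "true")
          .schema(schema)
          .csv("/path/to/file.csv"))                # placeholder path

Providing a full schema sidesteps inference entirely, so no column is silently re-typed.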

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @tracy ng Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...

4 More Replies
Erik_L
by Contributor II
  • 3578 Views
  • 2 replies
  • 1 kudos

Resolved! PySpark read of multiple Parquet files: type expansion failure

Problem: Reading nearly equivalent Parquet tables in a directory, where some have column X as float and some as double, fails.
Attempts at resolving:
  • Using streaming files
  • Removing Delta caching, vectorization
  • Using .cache() explicitly
Notes: This...
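One commonly suggested workaround, sketched under the assumption that the mismatch is only float vs. double on column X (the path is a placeholder): disable the vectorized Parquet reader, which cannot up-cast, and read with an explicit double schema:

    from pyspark.sql.types import StructType, StructField, DoubleType

    # The vectorized reader fails on float -> double promotion; the row-based
    # reader can up-cast, at some performance cost.
    spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

    schema = StructType([StructField("X", DoubleType(), True)])
    df = spark.read.schema(schema).parquet("/mnt/tables/")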

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Erik Louie Help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation! Regards

1 More Replies
su
by New Contributor
  • 3928 Views
  • 3 replies
  • 0 kudos

Reading from /tmp no longer working

Since yesterday, reading a file copied into the cluster is no longer working. What used to work:
blob = gcs_bucket.get_blob("dev/data.ndjson") -> works
blob.download_to_filename("/tmp/data-copy.ndjson") -> works
df = spark.read.json("/tmp/data-copy.ndjso...

Latest Reply
Evan_From_Bosto
New Contributor II
  • 0 kudos

I encountered this same issue and figured out a fix! For some reason, it seems like only %sh cells can access the /tmp directory. So I just did %sh cp /tmp/<file> /dbfs/<desired-location> and then accessed it from there using Spark.
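The same fix in pure Python, as a sketch reusing the paths from the question (and assuming the shell command above was meant to be cp): dbutils can copy from the driver-local filesystem into DBFS:

    # file:/ addresses the driver's local disk; dbfs:/ is visible to Spark.
    dbutils.fs.cp("file:/tmp/data-copy.ndjson", "dbfs:/tmp/data-copy.ndjson")
    df = spark.read.json("dbfs:/tmp/data-copy.ndjson")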

2 More Replies
syedmuhammedmeh
by New Contributor III
  • 2951 Views
  • 2 replies
  • 6 kudos

Resolved! Databricks Kafka Read Not connecting

I'm trying to read data from GCP Kafka through Azure Databricks, but I'm getting the warning below and the notebook is simply not completing. Any suggestions, please? WARN NetworkClient: Consumer groupId Bootstrap broker rack disconnected. Please note I've properly c...
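For comparison, a minimal structured-streaming Kafka read; the broker, topic, and offset settings are placeholders, since the post's configuration is not shown. A "Bootstrap broker ... disconnected" warning often points at network reachability or an advertised-listener mismatch rather than the Spark code itself:

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker-host:9092")  # placeholder broker
          .option("subscribe", "my-topic")                        # placeholder topic
          .option("startingOffsets", "earliest")
          .load())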

Latest Reply
jose_gonzalez
Databricks Employee
  • 6 kudos

Could you share the full error stack trace from your driver's logs? This is only a warning message; we need to take a look at the error-level messages.

1 More Replies
BhagS
by New Contributor II
  • 5500 Views
  • 2 replies
  • 5 kudos

Resolved! Write an empty Delta file in the data lake

Hi all, currently I am trying to write an empty Delta file in the data lake. To do this, I am doing the following: reading a Parquet file from my landing zone (this file consists only of the schema of SQL tables):
df=spark.read.format('parquet').load(landingZ...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 5 kudos

Hi @bhagya s, since your source file is empty, there is no data file inside the centralizedZonePath directory, i.e., no .parquet file is created in the target location. However, _delta_log is the transaction log that holds the metadata of the Delta for...
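A sketch of the behaviour described here, with placeholder paths: writing an empty DataFrame in Delta format creates only the transaction log, not data files:

    df = spark.read.format("parquet").load("/mnt/landing/schema_only/")   # empty source
    df.write.format("delta").mode("overwrite").save("/mnt/centralized/empty_table/")
    # The target now contains only _delta_log/ (schema + metadata), no .parquet files.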

1 More Replies
Ben_Spark
by New Contributor III
  • 7129 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks Spark XML parser: support for namespaces declared at the ancestor level

I'm trying to use the Spark-XML API and I'm facing an issue with the XSD validation option. When I parse an XML file using the "rowValidationXSDPath" option, the parser can't recognize the prefixes/namespaces declared at the root level. For this to...
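For context, a minimal spark-xml read using the option under discussion (requires the spark-xml library on the cluster; the rowTag, XSD path, and input path are placeholders). As the thread concludes, this validation fails when namespace prefixes are declared on an ancestor element:

    df = (spark.read
          .format("xml")
          .option("rowTag", "record")                                # placeholder row tag
          .option("rowValidationXSDPath", "/dbfs/schemas/record.xsd")
          .load("dbfs:/data/input.xml"))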

Latest Reply
Ben_Spark
New Contributor III
  • 2 kudos

Hi, sorry for the late response; I got busy looking for a permanent solution to this problem. In the end, we are giving up on the XSD path parser. This option does not work when prefixes/namespaces are declared at the ancestor level. Thank you anyway for ...

3 More Replies
Krishscientist
by New Contributor III
  • 2654 Views
  • 1 reply
  • 2 kudos

Resolved! Issue when reading .wav file

Hi, I am developing a notebook to read .wav files and build a speech-matching scenario. I have saved files in "/FileStore/tables/doors_and_corners_kid_thats_where_they_get_you.wav". When I wrote code like this:
from scipy.io import wavfile
import numpy as np...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Try prefixing it with dbfs: either dbfs:/FileStore or /dbfs/FileStore.
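A sketch of that suggestion applied to the post's file: scipy reads through the local filesystem, so address the file via the /dbfs FUSE mount rather than the bare /FileStore path:

    from scipy.io import wavfile

    # Same file as in the question, reached through the FUSE mount.
    rate, data = wavfile.read(
        "/dbfs/FileStore/tables/doors_and_corners_kid_thats_where_they_get_you.wav")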

Anonymous
by Not applicable
  • 530 Views
  • 0 replies
  • 0 kudos

Find a local spokesperson for advice. Ask about their career path, how did they "get here"? Read books about speaking and writing. Analyze fa...

Find a local spokesperson for advice. Ask about their career path, how did they "get here"? Read books about speaking and writing. Analyze famous speeches with text-to-speech software for yourself and do not rely on books that tell you the "why" and "how" o...

Erik
by Valued Contributor III
  • 3639 Views
  • 4 replies
  • 2 kudos

Resolved! Does Z-ordering speed up reading of a single file?

Situation: we have one partition per date, and it just so happens that each partition ends up (after OPTIMIZE) as *a single* 128 MB file. We partition on date and Z-order on userid, and our query is something like "find max value of column A where useri...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Z-ordering makes sure that, in case you need to read multiple files, those files are co-located. For a single file this does not matter, as a single file is always local to itself. If you are certain that your Spark program will only read a single file,...
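For reference, the layout being discussed, sketched with a placeholder table name and filter values (the partition and Z-order columns follow the question):

    spark.sql("OPTIMIZE events ZORDER BY (userid)")   # "events" is hypothetical
    result = spark.sql(
        "SELECT MAX(A) FROM events WHERE date = '2024-01-01' AND userid = 42")

Because the date filter already narrows the scan to a single ~128 MB file, and Delta's data-skipping statistics are kept per file, the Z-order clustering inside that one file does not by itself prune further reads.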

3 More Replies