Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

hyedesign
by New Contributor II
  • 3882 Views
  • 6 replies
  • 0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like:
from pyspark.sql import functions as F
def upsert_source_one(self
    df_source = spark.readStream.format("delta").table(self.so...
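For reference, a minimal sketch of the foreachBatch upsert pattern being attempted, assuming a hypothetical Delta target table target_table keyed on an id column (names are illustrative, not from the original post):

from delta.tables import DeltaTable

def upsert_to_target(batch_df, batch_id):
    # Runs once per micro-batch: merge the incoming rows into the Delta target on the key column.
    target = DeltaTable.forName(batch_df.sparkSession, "target_table")
    (target.alias("t")
        .merge(batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.format("delta").table("source_table")
    .writeStream
    .foreachBatch(upsert_to_target)
    .option("checkpointLocation", "/tmp/checkpoints/upsert_demo")  # hypothetical path
    .trigger(availableNow=True)
    .start())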

Latest Reply
seans
New Contributor III
  • 0 kudos

Here is the full message: Exception has occurred: SparkConnectGrpcException (java.io.IOException) Connection reset by peer grpc._channel._MultiThreadedRendezvous: _MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.INTERNAL deta...

5 More Replies
brianbraunstein
by New Contributor II
  • 1731 Views
  • 2 replies
  • 0 kudos

spark.sql not supporting kwargs as documented

This documentation https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.SparkSession.sql.html#pyspark.sql.SparkSession.sql claims that spark.sql() should be able to take kwargs, such that the following should work: display...
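For context, the kwargs-style parameterization the documentation describes looks roughly like this minimal sketch (DataFrame and parameter names are illustrative): named arguments passed to spark.sql() are substituted into the {} placeholders in the query text.

from pyspark.sql import functions as F

df = spark.range(10).withColumn("value", F.col("id") * 2)

# {df} resolves to the DataFrame itself; {threshold} is substituted as a literal.
result = spark.sql("SELECT * FROM {df} WHERE value > {threshold}", df=df, threshold=5)
display(result)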

Latest Reply
adriennn
Valued Contributor
  • 0 kudos

It's working again in 15.4 LTS

1 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 2159 Views
  • 2 replies
  • 2 kudos

foreachBatch

With parameterized SQL queries in Structured Streaming's foreachBatch, there's no longer a need to create temp views for the MERGE command.

structured1.png
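A rough sketch of the pattern described above, assuming a hypothetical Delta target table target_table keyed on an id column: the micro-batch DataFrame is passed directly to a parameterized spark.sql() call, so no temp view is needed for the MERGE.

def merge_batch(batch_df, batch_id):
    # {source} resolves to the micro-batch DataFrame itself, replacing the old createOrReplaceTempView step.
    batch_df.sparkSession.sql(
        """
        MERGE INTO target_table AS t
        USING {source} AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
        """,
        source=batch_df,
    )

(spark.readStream.table("source_table")
    .writeStream
    .foreachBatch(merge_batch)
    .option("checkpointLocation", "/tmp/checkpoints/merge_demo")  # hypothetical path
    .start())

As the reply below notes, this parameterized-SQL behavior changed between DBR versions, so 15.4 LTS is the safer target.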
Latest Reply
adriennn
Valued Contributor
  • 2 kudos

Note that this functionality broke somewhere between DBR 13.3 and 15, so the best bet is 15.4 LTS. See: Solved: Parameterized spark.sql() not working - Databricks Community - 56510

1 More Replies
ekdz__
by New Contributor III
  • 6343 Views
  • 5 replies
  • 10 kudos

Is there any way to save the notebook in the "Results Only" view?

Hi! I'm looking for a solution to save a notebook in HTML format that has the "Results Only" view (without the executed code). Is there any possibility to do that? Thank you

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

Use the "+New dashboard" option in the top menu (picture icon). Add results there (use display() in code to show data), and then you can export the dashboard to HTML.

4 More Replies
jonyvp
by New Contributor III
  • 2422 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks Asset Bundles complex variable for cluster configuration substitute

Using this page of the DAB docs, I tried to substitute the cluster configuration with a variable. That way, I want to predefine different job cluster configurations. Doing exactly what is used in the docs yields this error: Error: failed to load [...]/speci...

Latest Reply
filipniziol
Esteemed Contributor
  • 4 kudos

Amazing! Great!

6 More Replies
NC
by New Contributor III
  • 514 Views
  • 1 reply
  • 1 kudos

Logging in Databricks

Hi All, I am trying to create a log using the Python logging package. Is this allowed in Databricks, and is there any sample working code that you can share? Thank you for your guidance. Best regards, NC

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @NC, I recommend the video below, which shows how to use logging in Databricks: Introduction to Logging and Quality Control in Databricks (youtube.com)
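As a starting point, a minimal sketch of using the standard Python logging module in a notebook (the logger name and format are illustrative):

import logging

# Configure a named logger rather than the root logger, so Databricks' own handlers are left alone.
logger = logging.getLogger("my_pipeline")
logger.setLevel(logging.INFO)

if not logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s - %(message)s"))
    logger.addHandler(handler)

logger.info("Starting ingestion step")
logger.warning("Row count lower than expected: %d", 42)

The messages should show up in the cell output and in the driver logs.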

joshbuttler
by New Contributor II
  • 643 Views
  • 1 reply
  • 1 kudos

Seeking Advice on Data Lakehouse Architecture with Databricks

I'm currently designing a data lakehouse architecture using Databricks and have a few questions. What are the best practices for efficiently ingesting both batch and streaming data into Delta Lake? Any recommended tools or approaches?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @joshbuttler, I think the best way is to use Auto Loader, which provides a highly efficient way to incrementally process new data while also guaranteeing each file is processed exactly once. It supports ingestion in a batch mode (Trigger.Available...
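A minimal Auto Loader sketch along those lines, with hypothetical landing, schema, and checkpoint paths and a hypothetical target table; trigger(availableNow=True) gives batch-style runs over the same incremental stream, while dropping it keeps the query running continuously.

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schemas/orders")  # hypothetical path
    .load("/Volumes/main/raw/orders")                                          # hypothetical landing path
    .writeStream
    .option("checkpointLocation", "/Volumes/main/raw/_checkpoints/orders")     # hypothetical path
    .trigger(availableNow=True)  # batch-style: process everything new, then stop
    .toTable("main.bronze.orders"))                                            # hypothetical target table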

Faiyaz17
by New Contributor
  • 1040 Views
  • 1 reply
  • 1 kudos

Best Practices as a Beginner

Hello everyone, I am working on a project where I need to conduct some analysis on a dataset with 1 billion rows. I extracted the Parquet file from Azure and saved it onto the DBFS. Every time I want to run SQL queries and do preprocessing/analysis, I...

Latest Reply
jennie258fitz
New Contributor III
  • 1 kudos

@Faiyaz17 wrote: Hello everyone, I am working on a project where I need to conduct some analysis on a dataset with 1 billion rows. I extracted the Parquet file from Azure and saved it onto the DBFS. Every time I want to run SQL queries and do preproces...
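One common pattern for this situation, sketched under the assumption that the Parquet files sit at an illustrative DBFS path, is to convert the data to a Delta table once so that later queries read the table instead of reprocessing the raw Parquet each time:

# One-time conversion: read the raw Parquet and persist it as a Delta table.
(spark.read.parquet("dbfs:/data/raw/events")  # hypothetical source path
    .write.format("delta")
    .mode("overwrite")
    .saveAsTable("main.analytics.events"))    # hypothetical table name

# Later sessions query the table directly, without touching the Parquet files again.
df = spark.table("main.analytics.events")
df.createOrReplaceTempView("events")
spark.sql("SELECT COUNT(*) FROM events WHERE event_date >= '2024-01-01'").show()  # event_date is illustrative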

jpwp
by New Contributor III
  • 25577 Views
  • 9 replies
  • 9 kudos

Resolved! How to specify entry_point for python_wheel_task?

Can someone provide me an example of a python_wheel_task and what the entry_point field should be? The jobs UI help popup says this about "entry_point": "Function to call when starting the wheel, for example: main. If the entry point does not exist in...
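In case a concrete example helps, here is a sketch of what the entry point itself looks like: a plain function inside your package, mapped to a name in the wheel's metadata (package, module, and names below are hypothetical):

# my_package/main.py
#
# The wheel's metadata maps an entry-point name to this function, e.g. in pyproject.toml:
#   [project.scripts]
#   my_job = "my_package.main:main"
# The job's python_wheel_task then references package_name "my_package" and entry_point "my_job".

import sys

def main() -> None:
    # Called when the python_wheel_task starts; job parameters arrive as command-line arguments.
    print(f"Job started with args: {sys.argv[1:]}")

if __name__ == "__main__":
    main()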

Latest Reply
MRMintechGlobal
New Contributor II
  • 9 kudos

Just want to confirm: my project uses PDM, not Poetry, and as such uses [project.entry-points.packages] rather than [tool.poetry.scripts], and the bundle is failing to run on the cluster because it can't find the entry point. Is this expected behavior?

8 More Replies
Jpeterson
by New Contributor III
  • 3718 Views
  • 7 replies
  • 4 kudos

Databricks SQL Warehouse, Tableau and spark.driver.maxResultSize error

I'm attempting to create a Tableau extract on Tableau Server with a connection to a large Databricks SQL warehouse. The extract process fails due to a spark.driver.maxResultSize error. Using a Databricks interactive cluster in the Data Science & Engineer...

Latest Reply
Gilbertson13
New Contributor II
  • 4 kudos

Despite these hurdles, it's a great experience to work with these tools. For a bit of fun and relaxation, I enjoy playing Wordle Unlimited to unwind after a day of tackling data issues.

6 More Replies
ggsmith
by Contributor
  • 1335 Views
  • 1 reply
  • 2 kudos

Resolved! DLT create_table vs create_streaming_table

What is the difference between the create_table and create_streaming_table functions in dlt? For example, this is how I have created a table that streams data from Kafka written as JSON files to a volume:
@dlt.table(
    name="raw_orders",
    table_...

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @ggsmith, if you check the examples, you will notice that dlt.create_streaming_table is more specialized and you may consider it to be your target. As per the documentation, check this example: https://www.reddit.com/r/databricks/comments/1b9jg3t/dedupin...
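To illustrate the distinction with a rough sketch (table, path, and column names are hypothetical): @dlt.table defines a dataset from a single query, while dlt.create_streaming_table declares a standalone streaming target that flows such as @dlt.append_flow or dlt.apply_changes can then write into.

import dlt
from pyspark.sql import functions as F

# Single-query dataset: the decorated function itself defines the table.
@dlt.table(name="raw_orders")
def raw_orders():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/orders"))  # hypothetical path

# Standalone streaming target: declared first, then fed by one or more flows.
dlt.create_streaming_table(name="orders_clean")

@dlt.append_flow(target="orders_clean")
def orders_from_raw():
    return dlt.read_stream("raw_orders").where(F.col("order_id").isNotNull())  # order_id is illustrative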

Vishalakshi
by New Contributor II
  • 8141 Views
  • 5 replies
  • 0 kudos

Need to automatically rerun failed jobs in Databricks

Hi all, I need to retrigger failed jobs automatically in Databricks. Can you please help me with all the possible ways to make this possible?

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Vishalakshi, I responded during the weekend, but it seems the responses were lost. You have the run object here. For example, the current criterion is to return only runs where run[state][result_state] == "FAILED", so basically all failed jobs. W...
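A sketch of that idea using the Databricks Python SDK (the job ID is a placeholder, and the SDK calls should be checked against the installed version): list the job's completed runs, keep those whose result state is FAILED, and repair them.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import RunResultState

w = WorkspaceClient()  # picks up the ambient Databricks authentication

JOB_ID = 123456789  # placeholder job ID

for run in w.jobs.list_runs(job_id=JOB_ID, completed_only=True):
    # Re-run only the failed tasks of runs that finished in a FAILED state.
    if run.state and run.state.result_state == RunResultState.FAILED:
        w.jobs.repair_run(run_id=run.run_id, rerun_all_failed_tasks=True)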

4 More Replies
cadull
by New Contributor II
  • 1070 Views
  • 2 replies
  • 1 kudos

Permission Issue with IDENTIFIER clause

Hi all, we are parameterizing environment-specific catalog names (like `mycatalog_dev` vs. `mycatalog_prd`) in Lakeview dashboard queries like this:
SELECT * FROM IDENTIFIER(:catalog_name || '.myschema.mytable')
This works fine in most cases. We have o...

Latest Reply
madams
Contributor II
  • 1 kudos

I've had quite a bit of fun with UC and view permissions. I don't think this is specific to using the IDENTIFIER() function, but I suspect it's related to UC permissions. What you'll need to ensure: the user or group who owns the view on catalog_b h...

1 More Replies
angel_ba
by New Contributor II
  • 4286 Views
  • 2 replies
  • 2 kudos

File trigger using Azure file share in Unity Catalog

Hello, I have Unity Catalog enabled in my workspace. The files are manually copied by customers into an Azure file share (domain-joined account, wabs) on an ad hoc basis. I would like to add a file trigger on the job so that as soon as a file arrives in t...

Latest Reply
adriennn
Valued Contributor
  • 2 kudos

@Diego33 Kaniz is half-bot, half-human, but unfortunately not gracing us with "sorry for the confusion" responses. After a quick search, I thought that maybe there's a possibility to use the web terminal and do a manual mount with the bash script t...

1 More Replies
