Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

databricksuser2
by New Contributor II
  • 1288 Views
  • 1 reply
  • 2 kudos

Structured streaming job sees throughput being capped after running normally for a few days

The job (written in PySpark) uses Azure Event Hubs as the source and a Databricks Delta table as the sink. The job is hosted in Azure Databricks. The transformation part is simple: the message body is converted from bytes to a JSON string, the JSON string is then a...
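For context, the pipeline being described looks roughly like this. A minimal sketch, assuming the Event Hubs Kafka-compatible endpoint is used; the namespace, event hub, payload schema, checkpoint path, and table name are all placeholders:

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Hypothetical schema for the JSON payload
    schema = StructType([
        StructField("device_id", StringType()),
        StructField("value", DoubleType()),
    ])

    raw = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
        .option("subscribe", "<eventhub-name>")
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.sasl.mechanism", "PLAIN")
        .option("kafka.sasl.jaas.config",
                'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required '
                'username="$ConnectionString" password="<connection-string>";')
        .load())

    # Bytes -> JSON string -> columns, appended to a Delta table
    parsed = (raw
        .select(F.col("value").cast("string").alias("body"))
        .select(F.from_json("body", schema).alias("data"))
        .select("data.*"))

    (parsed.writeStream
        .option("checkpointLocation", "/mnt/checkpoints/eventhub_job")
        .toTable("bronze.events"))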

Latest Reply
Noopur_Nigam
Databricks Employee

Hi @Databricks User10293847, you can try using auto-inflate and let the TUs (throughput units) increase automatically. The feature then scales automatically up to the maximum number of TUs you need, depending on the increase in your traffic. You can check the doc below: htt...
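For reference, auto-inflate is a setting on the Event Hubs namespace itself, not a Spark option. A hedged sketch of enabling it from Python, assuming the azure-mgmt-eventhub and azure-identity packages; the subscription, resource group, and namespace names are placeholders:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.eventhub import EventHubManagementClient

    client = EventHubManagementClient(DefaultAzureCredential(), "<subscription-id>")

    # Fetch the namespace, turn auto-inflate on, and cap the scale-out
    ns = client.namespaces.get("<resource-group>", "<namespace>")
    ns.is_auto_inflate_enabled = True
    ns.maximum_throughput_units = 20  # illustrative ceiling
    client.namespaces.begin_create_or_update("<resource-group>", "<namespace>", ns).result()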

ef-zee
by New Contributor III
  • 13682 Views
  • 3 replies
  • 7 kudos

How to resolve the INVALID_PARAMETER_VALUE error in a Delta Live Tables pipeline?

I am trying to execute a DLT pipeline, but I am getting an error which says: "INVALID_PARAMETER_VALUE: The field 'node_type_id' cannot be supplied when an instance pool ID is provided." I am using my company's Azure Databricks platform with premium b...
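The error message itself points at the likely fix: the pipeline's cluster settings (often inherited from a cluster policy) specify both node_type_id and an instance pool, and only one may be set. A hedged sketch of the relevant piece of the pipeline settings, shown as a Python dict mirroring the settings JSON; the IDs are placeholders:

    clusters = [
        {
            "label": "default",
            # Keep the pool and drop "node_type_id"; the two are mutually exclusive
            "instance_pool_id": "<pool-id>",
            "driver_instance_pool_id": "<driver-pool-id>",
            "num_workers": 2,
        }
    ]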

Latest Reply
Debayan
Databricks Employee

Do you have cluster ACL enabled?

2 More Replies
Cosimo_F_
by Contributor
  • 2767 Views
  • 3 replies
  • 3 kudos

Resolved! Do Databricks ipywidgets support plotly FigureWidget?

Hello, I'm trying to use plotly's FigureWidget but getting this error: "Error displaying widget: Cannot read properties of undefined (reading 'buffer')". This is the code:
from plotly import graph_objects as go
from plotly import express as px
from plotly im...

Latest Reply
Cosimo_F_
Contributor

Thank you for the suggestion! 10.4 does not seem to support ipywidgets but I tried with 11.0 and it works!
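For anyone landing here later, a minimal sanity check along those lines; this assumes DBR 11.0 or later, per the reply above:

    import plotly.graph_objects as go

    # FigureWidget renders via ipywidgets, which needs DBR 11.0+
    fig = go.FigureWidget(data=[go.Scatter(y=[1, 3, 2])])
    fig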

2 More Replies
Karthe
by New Contributor III
  • 3608 Views
  • 3 replies
  • 5 kudos

Resolved! Error while installing the "tsfresh" Python library in Databricks

Hi all, I am trying to install the "tsfresh" library in Databricks. However, I get the following error. Could anyone please help me here? ImportError: cannot import name 'rng_integers' from 'scipy._lib._util' (/databricks/python/lib/python3.7/site-package...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Hi, you posted it three times. Please kindly delete the duplicate posts. Please try to install it via Compute -> choose your cluster -> Libraries. I checked that it works on DBR 11.x.
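If a notebook-scoped install is preferable to a cluster library, something like this should behave the same way (run the install in its own cell). The ImportError in the question suggests a scipy version mismatch on the older Python 3.7 runtime, so a DBR 11.x cluster is assumed:

    %pip install tsfresh

    import tsfresh  # should import cleanly once the runtime's scipy is recent enough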

2 More Replies
aben1
by New Contributor
  • 1214 Views
  • 0 replies
  • 0 kudos

I have created a piece of Python code which led to a Python error. The job failed with an Internal Error, see below. The message after clicking o...

I have created a piece of Python code which led to a Python error. The job failed with an Internal Error, see below. The message after clicking on it states somewhat misleading info. Meanwhile, the real issue is fortunately described in the logs I d...

RohitKulkarni
by Contributor II
  • 4465 Views
  • 5 replies
  • 4 kudos

External table issue

Hello Team, I am using the df.write command and the table is getting created. If you refer to the screenshot below, the table got created in the Tables folder in the dedicated SQL pool, but I need it in the External Tables folder. Regards, RK

Latest Reply
-werners-
Esteemed Contributor III

If you actually write into Synapse, it is not an external table; the data resides in Synapse. If you want to have an external table, write the data to your data lake in Parquet/Delta Lake format and then create an external table on that location in s...
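A minimal sketch of that approach; the storage path and table names are placeholders:

    # Write to the data lake instead of directly into the Synapse pool
    (df.write
       .format("parquet")  # or "delta"
       .mode("overwrite")
       .save("abfss://container@account.dfs.core.windows.net/ext/mytable"))

    # Then, inside the Synapse dedicated SQL pool, define an external table over
    # that location (CREATE EXTERNAL TABLE ... WITH (LOCATION, DATA_SOURCE,
    # FILE_FORMAT)); it will then appear under the External Tables folder.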

4 More Replies
SujitOjha
by New Contributor
  • 1882 Views
  • 1 reply
  • 1 kudos

What is the way to do DEEP CLONE and copy the checkpoints folder also?

When I use DEEP CLONE, I don't see the checkpoint folder being copied. Is there a possibility to copy the checkpoint folder as well, as I have to resume the streaming job at the updated location?

Latest Reply
User16753725469
Contributor II

Delta clones are recommended for disaster recovery. A clone doesn't exactly replicate table history in the context of specific snapshots, but it does ensure that the changes are replicated. However, we can't use a cloned table with a copy of the source checkpoin...
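To make that concrete, a hedged sketch; the table and path names are placeholders. The clone copies data and metadata, but any stream resumed against it needs a fresh checkpoint:

    # Deep clone copies data files and the table definition, not stream checkpoints
    spark.sql("CREATE TABLE IF NOT EXISTS target.events DEEP CLONE source.events")

    # Resume the stream against the clone with a NEW checkpoint location;
    # the old checkpoint still references the source table's transaction log
    (spark.readStream.table("target.events")
       .writeStream
       .option("checkpointLocation", "/mnt/checkpoints/events_after_clone")
       .toTable("target.events_out"))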

Phani1
by Valued Contributor II
  • 1649 Views
  • 2 replies
  • 5 kudos

Delta table Concurrent Updates for Non-partitioned tables

When we implemented concurrent updates on a table which does not have a partition column, we ran into ConcurrentAppendException [we ensured that the condition is different for each concurrent update statement]. So do we need to go with the partition approach ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Please check that both streaming queries don't use the same checkpoint. An auto-increment ID can also cause problems, as it is kept in the schema. Schema evolution can also cause problems.
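If retrying is acceptable, a rough conflict-retry sketch; this assumes the delta-spark Python package exposes the exception on your runtime, and the SQL statement is a placeholder:

    import time
    from delta.exceptions import ConcurrentAppendException

    def update_with_retry(stmt, max_attempts=5):
        # On a non-partitioned table, Delta cannot prove concurrent writers touch
        # disjoint files, so conflicts can occur even with different WHERE clauses.
        for attempt in range(max_attempts):
            try:
                spark.sql(stmt)
                return
            except ConcurrentAppendException:
                time.sleep(2 ** attempt)  # simple exponential backoff
        raise RuntimeError("update still conflicting after retries")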

1 More Replies
19582
by New Contributor II
  • 1015 Views
  • 1 reply
  • 2 kudos

Run a simple spark-scala jar (hello-world) on existing running cluster

I have created a simple hello-world jar that I would like to run as a job. I also have an existing cluster. Now when I create a job to run on the existing cluster, it fails for some unknown reason (I don't see much in the errors), while if I run the same j...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Can you share a screenshot and your example jar?

User16826992666
by Valued Contributor
  • 4100 Views
  • 3 replies
  • 0 kudos

How do I make a parameter in a Databricks SQL dashboard apply to multiple visuals?

I have created a few queries and visualizations in Databricks SQL which use parameters. Each query has the same parameter, but when I pin the visualizations to a dashboard, each of the visuals keeps its own parameter drop-down. I want to have one dro...

Latest Reply
User16826992666
Valued Contributor

To achieve this, you can edit the source of the parameters on each of the visuals on the dashboard. The source for each visual can be changed to read from a shared dashboard parameter. These are the steps to do this: 1.) First click on the three dots...

2 More Replies
Panna
by New Contributor II
  • 1784 Views
  • 1 reply
  • 3 kudos

Is there only one element type option for an array?

I'm creating an array which contains both strings and doubles; just wondering if I can have multiple element types for one array column? Thanks

Latest Reply
Debayan
Databricks Employee

Elements of any type that share a least common type can be used: https://docs.databricks.com/sql/language-manual/functions/array.html#arguments. Please correct me if I misunderstood the requirement.
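A quick illustration of the point: a Spark array column has exactly one element type, so mixing strings and doubles means settling on a common type (here, by casting the double to string):

    from pyspark.sql import functions as F

    df = spark.range(1).select(
        # One element type per array: cast the double so both values fit
        F.array(F.lit("abc"), F.lit(1.5).cast("string")).alias("mixed")
    )
    df.printSchema()  # mixed: array<string>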

kfoster
by Contributor
  • 3475 Views
  • 4 replies
  • 0 kudos

Resolved! DLT Pipeline Retries

Is there a way to limit how many retries DLT Pipelines run when in "Production" mode? What is the key value pair I use in the configuration?

Latest Reply
Cedric
Databricks Employee

Hi @Kristian Foster, yes. We have two Spark configurations that can be set: pipelines.numStreamRetryAttempts and pipelines.numUpdateRetryAttempts. The former configures how many times we retry each flow before failing the update. The latter configures...
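For example, these go in the pipeline's configuration map; shown here as a Python dict mirroring the settings JSON, with illustrative values:

    configuration = {
        "pipelines.numStreamRetryAttempts": "5",  # retries per flow, illustrative
        "pipelines.numUpdateRetryAttempts": "0",  # retries for the whole update
    }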

3 More Replies
jowziph
by New Contributor III
  • 4243 Views
  • 2 replies
  • 7 kudos

How can I fix strange magic command behaviour in Databricks notebooks?

Hi, I have encountered a strange issue when using magic commands in my Azure Databricks workspace. When I create a cell and type the magic command for a language which is not the notebook's default, the code is instead interpreted as the default noteb...

Latest Reply
jowziph
New Contributor III

Hi, yeah, this is happening when a magic command is used in between two cells of the default language, but this is not the only case. This is happening when the magic command is manually typed out (e.g. %sql or %scala in a Python notebook), OR when the magic comm...

1 More Replies
Gim
by Contributor
  • 10233 Views
  • 7 replies
  • 10 kudos

Resolved! Delta Table storage best practices

Hi! We have a project where we do some data engineering for a client. I implemented scheduled batch processing with Databricks Auto Loader (stream w/ availableNow), since they primarily have numerous file exports from several sources. We wanted to foll...
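For context, the scheduled-batch Auto Loader pattern mentioned here looks roughly like this; the file format, paths, and table name are placeholders:

    (spark.readStream
       .format("cloudFiles")
       .option("cloudFiles.format", "csv")
       .option("cloudFiles.schemaLocation", "/mnt/schemas/exports")
       .load("/mnt/landing/exports")
       .writeStream
       .option("checkpointLocation", "/mnt/checkpoints/exports")
       .trigger(availableNow=True)  # process everything new, then stop
       .toTable("bronze.exports"))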

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Gimwell Young, just a friendly follow-up. Did any of the responses help you to resolve your question? If so, please mark it as best. Otherwise, please let us know if you still need help.

6 More Replies
kpendergast
by Contributor
  • 4732 Views
  • 6 replies
  • 4 kudos

Resolved! Hyperleaup to push data to Tableau Server

Would anyone care to share how they got the Hyperleaup library working? I am currently stuck at an error at publish and cannot seem to find a solution: TypeError: publish() got an unexpected keyword argument 'file_path'. I am %pip installing all the requir...

Latest Reply
badnishant79
New Contributor II

Hi. Yes, the dashboard includes multiple filters, but I only uploaded the dashboard to the server without any other sheets. I am looking into the extract that other users have suggested. Thanks.

5 More Replies
