Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Bency
by New Contributor III
  • 11393 Views
  • 1 reply
  • 2 kudos

Queries with streaming sources must be executed with writeStream.start();

When I try to perform some transformations on streaming data, I get the "Queries with streaming sources must be executed with writeStream.start();" error. My aim is to do a lookup for every column in each row of the streaming data. steaming_table=spark...

Latest Reply
Noopur_Nigam
Databricks Employee

Hi @Bency Mathew, you can use foreachBatch to perform the custom logic on each micro-batch. Please refer to this document: https://docs.databricks.com/structured-streaming/foreach.html#perform-streaming-writes-to-arbitrary-data-sinks-with-structured-s...

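A minimal sketch of the foreachBatch pattern the reply points to; the source stream, lookup table, and output paths here are hypothetical stand-ins:

# Static lookup table (hypothetical name) joined against each micro-batch.
lookup_df = spark.read.table("lookup_table")

def enrich_batch(batch_df, batch_id):
    # Inside foreachBatch the micro-batch is a plain DataFrame, so joins
    # and other batch-only transformations are allowed.
    enriched = batch_df.join(lookup_df, on="key", how="left")
    enriched.write.format("delta").mode("append").save("/mnt/output/enriched")

(streaming_df.writeStream  # streaming_df: the streaming source from the question
    .foreachBatch(enrich_batch)
    .option("checkpointLocation", "/mnt/output/_checkpoint")
    .start())
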
dceman
by New Contributor
  • 1650 Views
  • 0 replies
  • 0 kudos

Databricks with CloudWatch metrics without Instanceid dimension

I have jobs running on job clusters, and I want to send metrics to CloudWatch. I set up the CloudWatch agent following this guide, but the issue is that I can't create useful metrics dashboards and alarms, because I always have the InstanceId dimension, and InstanceId is d...

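Since no reply was posted, one hedged workaround sketch: bypass the agent's per-instance dimensions and publish job-level metrics directly with boto3, keyed on a stable dimension such as a job name (the namespace, metric, and job names below are hypothetical):

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Publish under a stable JobName dimension instead of InstanceId, so
# dashboards and alarms survive job-cluster instance churn.
cloudwatch.put_metric_data(
    Namespace="DatabricksJobs",
    MetricData=[{
        "MetricName": "RecordsProcessed",
        "Dimensions": [{"Name": "JobName", "Value": "nightly-etl"}],
        "Value": 12345.0,
        "Unit": "Count",
    }],
)
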
477061
by Contributor
  • 2753 Views
  • 3 replies
  • 0 kudos

Resolved! Renamed table cannot be written to or deleted from

I have renamed a table; however, on trying to write to it (or delete from it) I get the following error: `java.io.FileNotFoundException: No such file or directory: s3a://.../hive/warehouse/testing.db/renamed_table_name/_delta_log/00000000000000000002....

Latest Reply
Noopur_Nigam
Databricks Employee

Hi @477061, could you please test it on DBR 11.1 and see if the issue persists for you?

2 More Replies
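
Before trying a DBR upgrade, a hedged sketch of the first-line fixes commonly suggested for stale file references after a rename (the table name below matches the error path, but treat it as a placeholder):

# Drop cached file listings that may still point at the pre-rename location.
spark.catalog.clearCache()
spark.sql("REFRESH TABLE testing.renamed_table_name")

# Verify the Delta log is readable again before writing or deleting.
spark.sql("DESCRIBE HISTORY testing.renamed_table_name").show()
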
Taha_Hussain
by Databricks Employee
  • 2173 Views
  • 2 replies
  • 6 kudos

Register for Databricks Office Hours: September 28, 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT

Databricks Office Hours connects you directly with experts to answer your Databricks questions. Join us to:
• Troubleshoot your technical questions
• Learn the ...

Latest Reply
Taha_Hussain
Databricks Employee

Cont... Q: Do generated columns in Delta Live Tables include IDENTITY columns? A: My understanding is that generated columns in Delta Live Tables do not contain IDENTITY columns. Here is more on generated columns in DLT. Q: We store raw data for each cu...

1 More Reply
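
For context on the generated-columns answer above, a minimal Delta generated-column sketch (table and column names are hypothetical); the value is derived from another column, which is distinct from an IDENTITY column's monotonically assigned values:

spark.sql("""
  CREATE TABLE events (
    event_time TIMESTAMP,
    -- Computed from event_time on write; not an IDENTITY column.
    event_date DATE GENERATED ALWAYS AS (CAST(event_time AS DATE))
  ) USING DELTA
""")
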
Invincible
by New Contributor
  • 1685 Views
  • 2 replies
  • 2 kudos
Latest Reply
Noopur_Nigam
Databricks Employee

Hi @Pankaj Sharma, yes, you can run multiple jobs on one cluster if you choose an all-purpose cluster to run your jobs in Databricks. You can learn more about clusters in this document: https://docs.databricks.com/clusters/index.html

1 More Reply
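
A hedged sketch of pointing a job at an existing all-purpose cluster through the Jobs 2.1 API; the workspace URL, token, cluster ID, and notebook path are placeholders:

import requests

host = "https://<workspace>.cloud.databricks.com"  # placeholder
headers = {"Authorization": "Bearer <token>"}      # placeholder

# existing_cluster_id reuses the all-purpose cluster instead of creating
# a new job cluster, so several jobs can share it.
job_spec = {
    "name": "shared-cluster-job",
    "tasks": [{
        "task_key": "main",
        "existing_cluster_id": "0123-456789-abcdefgh",  # placeholder
        "notebook_task": {"notebook_path": "/Shared/example"},
    }],
}

resp = requests.post(f"{host}/api/2.1/jobs/create", headers=headers, json=job_spec)
print(resp.json())
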
databricksuser2
by New Contributor II
  • 1289 Views
  • 1 reply
  • 2 kudos

Structured streaming job sees throughput being capped after running normally for a few days

The job (written in PySpark) uses Azure Event Hubs as the source and a Databricks Delta table as the sink. The job is hosted in Azure Databricks. The transformation part is simple: the message body is converted from bytes to a JSON string, the JSON string is then a...

Latest Reply
Noopur_Nigam
Databricks Employee

Hi @Databricks User10293847, you can try using auto-inflate and let the TUs increase automatically. The feature scales automatically to the maximum limit of TUs you need, depending on the increase in your traffic. You can check this doc: htt...

ef-zee
by New Contributor III
  • 13777 Views
  • 3 replies
  • 7 kudos

How to resolve the INVALID_PARAMETER_VALUE error in the Delta Live Table pipeline?

I am trying to execute a DLT pipeline, but I am getting an error which says: "INVALID_PARAMETER_VALUE: The field 'node_type_id' cannot be supplied when an instance pool ID is provided." I am using my company's Azure Databricks platform with premium b...

Latest Reply
Debayan
Databricks Employee

Do you have cluster ACL enabled?

2 More Replies
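
The error usually points at the pipeline's cluster settings. A hedged sketch of the relevant fragment of the DLT settings JSON, with a placeholder pool ID: when instance_pool_id is supplied, node_type_id must be omitted entirely.

{
  "clusters": [
    {
      "label": "default",
      "instance_pool_id": "pool-placeholder-id",
      "num_workers": 2
    }
  ]
}
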
Cosimo_F_
by Contributor
  • 2779 Views
  • 3 replies
  • 3 kudos

Resolved! Do Databricks ipywidgets support plotly FigureWidget?

Hello, I'm trying to use plotly's FigureWidget but getting this error: "Error displaying widget: Cannot read properties of undefined (reading 'buffer')". This is the code:
from plotly import graph_objects as go
from plotly import express as px
from plotly im...

Latest Reply
Cosimo_F_
Contributor

Thank you for the suggestion! 10.4 does not seem to support ipywidgets but I tried with 11.0 and it works!

2 More Replies
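
A minimal FigureWidget sketch along the lines of what worked on DBR 11.0 (the data values are made up):

import plotly.graph_objects as go

# FigureWidget renders through ipywidgets, so it needs a runtime with
# ipywidgets support (DBR 11.0+ per the reply above).
fw = go.FigureWidget(data=[go.Scatter(x=[1, 2, 3], y=[4, 1, 2])])
fw  # last expression in a notebook cell displays the widget
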
Karthe
by New Contributor III
  • 3612 Views
  • 3 replies
  • 5 kudos

Resolved! Error while installing the "tsfresh" Python library in Databricks

Hi all, I am trying to install the "tsfresh" library in Databricks. However, I get the following error. Could anyone please help me here? ImportError: cannot import name 'rng_integers' from 'scipy._lib._util' (/databricks/python/lib/python3.7/site-package...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Hi, you posted this three times; please kindly delete the duplicate posts. Please try to install it via Compute -> choose your cluster -> Libraries. I checked that it works on DBR 11.x.

2 More Replies
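
Alternatively, a notebook-scoped install sketch; upgrading SciPy alongside is an assumption based on the rng_integers import error coming from an old SciPy build:

# Notebook-scoped libraries; affects only the current notebook session.
%pip install --upgrade scipy tsfresh
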
aben1
by New Contributor
  • 1216 Views
  • 0 replies
  • 0 kudos

I have created a piece of Python code which led to a Python error. The job failed with Internal Error, see below. The message after clicking o...

I have created a piece of Python code which led to a Python error. The job failed with Internal Error, see below. The message after clicking on it states somewhat misleading info. Meanwhile, the real issue is fortunately described in the logs I d...

RohitKulkarni
by Contributor II
  • 4474 Views
  • 5 replies
  • 4 kudos

External table issue

Hello team, I am using the df.write command and the table is getting created. If you refer to the screenshot below, the table got created in the Tables folder in the dedicated SQL pool, but I need it in the External Tables folder. Regards, RK

Latest Reply
-werners-
Esteemed Contributor III

If you actually write into Synapse, it is not an external table; the data resides in Synapse. If you want to have an external table, write the data to your data lake in Parquet/Delta Lake format and then create an external table on that location in S...

4 More Replies
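
A hedged sketch of the pattern described in the reply: land the data in the lake first, then declare an external table over that location (paths and Synapse object names are placeholders):

# 1) Write the data to the data lake in Parquet (placeholder path).
df.write.mode("overwrite").parquet("abfss://data@account.dfs.core.windows.net/sales/")

# 2) Then, in Synapse, create an external table over that location,
#    e.g. (placeholder data source and file format objects):
# CREATE EXTERNAL TABLE dbo.sales (id INT, amount DECIMAL(10, 2))
# WITH (LOCATION = '/sales/', DATA_SOURCE = lake_ds, FILE_FORMAT = parquet_ff);
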
SujitOjha
by New Contributor
  • 1891 Views
  • 1 reply
  • 1 kudos

Is there a way to do DEEP CLONE and also copy the checkpoints folder?

When I use DEEP CLONE, I don't see the checkpoint folder being copied. Is there a possibility to copy the checkpoint folder as well, as I have to resume the streaming job at the updated location?

Latest Reply
User16753725469
Contributor II

Delta clones are recommended for disaster recovery. A clone doesn't exactly replicate table history in the context of specific snapshots, but it does ensure that the changes are replicated. However, we can't use a cloned table with a copy of the source checkpoin...

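For reference, a minimal DEEP CLONE sketch (names are hypothetical); the streaming checkpoint lives outside the table directory and is not part of what CLONE copies:

spark.sql("CREATE TABLE target_db.events_clone DEEP CLONE source_db.events")

# The checkpoint is a separate directory; you can copy it yourself, but
# whether the copied state is safe to resume from is exactly the caveat
# raised in the reply above.
dbutils.fs.cp("/mnt/chk/events", "/mnt/chk/events_clone", recurse=True)
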
Phani1
by Valued Contributor II
  • 1654 Views
  • 2 replies
  • 5 kudos

Delta table Concurrent Updates for Non-partitioned tables

When we implemented concurrent updates on a table which does not have a partition column, we ran into ConcurrentAppendException [we ensured the WHERE condition is different for each concurrent update statement]. So do we need to go with the partition approach ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Please check that both streaming queries don't use the same checkpoint. An auto-increment ID can also cause problems, as it is kept in the schema. Schema evolution can also cause problems.

1 More Reply
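
A hedged sketch of the partition approach the question asks about: rebuild the table partitioned, then confine each concurrent UPDATE to its own partition so the transactions touch disjoint files (table and column names are hypothetical):

# Rebuild the table partitioned by a column the updates can be keyed on.
spark.sql("""
  CREATE TABLE events_by_region
  USING DELTA
  PARTITIONED BY (region)
  AS SELECT * FROM events
""")

# Each writer now includes its partition value in the predicate; a second
# job can concurrently run the same UPDATE with region = 'US'.
spark.sql("UPDATE events_by_region SET status = 'done' WHERE region = 'EU' AND id = 42")
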
19582
by New Contributor II
  • 1015 Views
  • 1 reply
  • 2 kudos

Run a simple Spark Scala jar (hello-world) on an existing running cluster

I have created a simple hello-world jar that I would like to run as a job. I also have an existing cluster. Now, when I create a job to run on the existing cluster, it fails for some unknown reason (I don't see much in the errors), while if I run the same j...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Can you share a screenshot and your example jar?

User16826992666
by Valued Contributor
  • 4113 Views
  • 3 replies
  • 0 kudos

How do I make a parameter in a Databricks SQL dashboard apply to multiple visuals?

I have created a few queries and visualizations in Databricks SQL which use parameters. Each query has the same parameter, but when I pin the visualizations to a dashboard, each of the visuals keeps its own parameter drop-down. I want to have one dro...

Latest Reply
User16826992666
Valued Contributor

To achieve this, you can edit the source of the parameters on each of the visuals on the dashboard. The source for each visual can be changed to read from a shared dashboard parameter. These are the steps to do this: 1.) First, click on the three dots...

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group