Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jwilliam
by Contributor
  • 2337 Views
  • 2 replies
  • 2 kudos

Resolved! Does libraries installation happen on Data Plane or Control Plane?

Currently, when I install libraries on my clusters, this error happens: WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol...

Latest Reply
Sivaprasad1
Valued Contributor II
  • 2 kudos

@John William: Yeah, that's true. All the clusters reside in the data plane.
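
Since installs run on the cluster nodes in the data plane, one quick check is outbound connectivity from a notebook attached to that cluster; a minimal sketch, with PyPI as an example endpoint:

    # Quick outbound connectivity check from a notebook attached to the cluster.
    # Library installs run on these data-plane nodes, so this exercises the same
    # network path pip uses; the PyPI endpoint here is just an example.
    import urllib.request

    resp = urllib.request.urlopen("https://pypi.org/simple/", timeout=10)
    print(resp.status)  # 200 means the data plane can reach PyPI directly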

1 More Reply
Bency
by New Contributor III
  • 13132 Views
  • 1 reply
  • 2 kudos

Queries with streaming sources must be executed with writeStream.start();

When I try to perform some transformations on streaming data, I get the "Queries with streaming sources must be executed with writeStream.start();" error. My aim is to do a lookup for every column in each row of the streaming data. steaming_table=spark...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hi @Bency Mathew, you can use foreachBatch to perform custom logic on each micro-batch. Please refer to the document below: https://docs.databricks.com/structured-streaming/foreach.html#perform-streaming-writes-to-arbitrary-data-sinks-with-structured-s...
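
A minimal sketch of that pattern, assuming a streaming DataFrame streaming_df and a static lookup table; all names and paths below are placeholders:

    # Minimal foreachBatch sketch; streaming_df stands in for the streaming
    # DataFrame from the post, and the lookup table, key, and paths are
    # placeholders.
    lookup_df = spark.table("lookup_db.reference")  # hypothetical static lookup

    def enrich(batch_df, batch_id):
        # Inside foreachBatch each micro-batch is a static DataFrame, so
        # ordinary joins and per-column lookups work without triggering the
        # writeStream.start() error.
        (batch_df.join(lookup_df, on="key", how="left")
                 .write.format("delta").mode("append")
                 .saveAsTable("target_db.enriched"))

    (streaming_df.writeStream
        .foreachBatch(enrich)
        .option("checkpointLocation", "/tmp/checkpoints/enrich")  # placeholder
        .start())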

dceman
by New Contributor
  • 2053 Views
  • 0 replies
  • 0 kudos

Databricks with CloudWatch metrics without Instanceid dimension

I have jobs running on job clusters, and I want to send metrics to CloudWatch. I set up the CW agent following this guide. But the issue is that I can't create useful metrics dashboards and alarms, because I always have the InstanceId dimension, and InstanceId is d...
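
A possible angle, hedged since the post's agent config isn't shown: the CloudWatch agent's aggregation_dimensions setting can publish roll-ups that drop per-instance dimensions entirely. A sketch of the relevant fragment, with illustrative namespace and metrics:

    # Sketch of a CloudWatch agent config fragment. An empty dimension set in
    # "aggregation_dimensions" publishes metrics aggregated across all
    # instances, i.e. without the InstanceId dimension. Values are illustrative.
    import json

    cw_config = {
        "metrics": {
            "namespace": "DatabricksJobClusters",
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "aggregation_dimensions": [[]],  # roll up with no dimensions
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
            },
        },
    }
    print(json.dumps(cw_config, indent=2))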

477061
by Contributor
  • 3347 Views
  • 3 replies
  • 0 kudos

Resolved! Renamed table cannot be written to or deleted from

I have renamed a table; however, on trying to write to it (or delete from it) I get the following error: `java.io.FileNotFoundException: No such file or directory: s3a://.../hive/warehouse/testing.db/renamed_table_name/_delta_log/00000000000000000002....

Latest Reply
Noopur_Nigam
Databricks Employee
  • 0 kudos

Hi @477061, could you please test this on DBR 11.1 and see if the issue persists for you?

2 More Replies
Taha_Hussain
by Databricks Employee
  • 2476 Views
  • 2 replies
  • 6 kudos

Register for Databricks Office Hours. September 28: 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT. Databricks Office Hours connects you directly with exper...

Register for Databricks Office Hours. September 28: 11:00 AM - 12:00 PM PT | 6:00 - 7:00 PM GMT. Databricks Office Hours connects you directly with experts to answer your Databricks questions. Join us to: • Troubleshoot your technical questions • Learn the ...

Latest Reply
Taha_Hussain
Databricks Employee
  • 6 kudos

Cont...
Q: Do generated columns in Delta Live Tables include IDENTITY columns?
A: My understanding is that generated columns in Delta Live Tables do not contain IDENTITY columns. Here is more on generated columns in DLT.
Q: We store raw data for each cu...
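
For context on that Q&A, a plain Delta generated column (as opposed to an IDENTITY column) can be declared like this; a hedged sketch with illustrative table and column names:

    # Sketch: a Delta generated column derived from another column. This is
    # not an IDENTITY column; the table and column names are illustrative.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS events (
            event_ts   TIMESTAMP,
            event_date DATE GENERATED ALWAYS AS (CAST(event_ts AS DATE))
        ) USING DELTA
    """)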

1 More Reply
Invincible
by New Contributor
  • 2107 Views
  • 2 replies
  • 2 kudos
Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hi @Pankaj Sharma, yes, you can run multiple jobs on one cluster if you choose an all-purpose cluster to run your jobs in Databricks. You can learn more about clusters in the document below: https://docs.databricks.com/clusters/index.html
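
As a sketch of what that looks like, a Jobs API 2.1 payload reusing a shared all-purpose cluster simply supplies existing_cluster_id instead of a new-cluster spec; the cluster ID and notebook path are placeholders:

    # Hedged sketch of a Jobs API 2.1 payload that reuses a shared all-purpose
    # cluster; the cluster ID and notebook path are placeholders.
    job_payload = {
        "name": "example-job",
        "tasks": [
            {
                "task_key": "main",
                "existing_cluster_id": "0000-000000-abcdefgh",  # shared cluster
                "notebook_task": {"notebook_path": "/Repos/demo/notebook"},
            }
        ],
    }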

1 More Reply
databricksuser2
by New Contributor II
  • 1571 Views
  • 1 reply
  • 2 kudos

Structured streaming job sees throughput being capped after running normally for a few days

The job (written in PySpark) uses Azure Event Hubs as the source and a Databricks Delta table as the sink. The job is hosted in Azure Databricks. The transformation part is simple: the message body is converted from bytes to a JSON string, and the JSON string is then a...

Latest Reply
Noopur_Nigam
Databricks Employee
  • 2 kudos

Hi @Databricks User10293847, you can try using auto-inflate and let the TUs increase automatically. The feature then scales automatically to the maximum limit of TUs you need, depending on the increase in your traffic. You can check the doc below: htt...

ef-zee
by New Contributor III
  • 16355 Views
  • 3 replies
  • 7 kudos

How to resolve the INVALID_PARAMETER_VALUE error in a Delta Live Tables pipeline?

I am trying to execute a DLT pipeline, but I am getting an error that says: "INVALID_PARAMETER_VALUE: The field 'node_type_id' cannot be supplied when an instance pool ID is provided." I am using my company's Azure Databricks platform with premium b...
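
The error message itself states the constraint: when the pipeline's cluster settings supply an instance pool, node_type_id must be omitted, since the pool already determines the node type. A hedged sketch of the relevant cluster entry, with placeholder IDs:

    # Sketch of a DLT pipeline cluster entry; when instance_pool_id is set,
    # the pool determines the node type, so node_type_id must be omitted.
    pipeline_cluster = {
        "label": "default",
        "instance_pool_id": "pool-placeholder",  # placeholder pool ID
        # "node_type_id": "Standard_DS3_v2",     # must NOT appear with a pool
        "num_workers": 2,
    }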

Latest Reply
Debayan
Databricks Employee
  • 7 kudos

Do you have cluster ACL enabled?

2 More Replies
Cosimo_F_
by Contributor
  • 4292 Views
  • 3 replies
  • 3 kudos

Resolved! Do Databricks ipywidgets support plotly FigureWidget?

Hello, I'm trying to use plotly's FigureWidget but I'm getting this error: "Error displaying widget: Cannot read properties of undefined (reading 'buffer')". This is the code: from plotly import graph_objects as go; from plotly import express as px; from plotly im...

Latest Reply
Cosimo_F_
Contributor
  • 3 kudos

Thank you for the suggestion! 10.4 does not seem to support ipywidgets, but I tried with 11.0 and it works!
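
For anyone hitting the same error, a minimal FigureWidget example, which per this thread renders on DBR 11.0 (with ipywidgets support) but not on 10.4:

    # Minimal FigureWidget example; renders on DBR 11.0+ per the reply above.
    from plotly import graph_objects as go

    fig = go.FigureWidget()
    fig.add_scatter(x=[1, 2, 3], y=[4, 2, 5], mode="lines+markers")
    fig  # displaying the widget requires ipywidgets support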

2 More Replies
Karthe
by New Contributor III
  • 4178 Views
  • 3 replies
  • 5 kudos

Resolved! Error while installing the "tsfresh" Python library in Databricks

Hi all, I am trying to install the "tsfresh" library in Databricks. However, I get the following error. Could anyone please help me here? ImportError: cannot import name 'rng_integers' from 'scipy._lib._util' (/databricks/python/lib/python3.7/site-package...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Hi, you posted this three times; please delete the duplicate posts. Please try to install via Compute -> choose your cluster -> Libraries. I checked that it works on DBR 11.x.
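
As an alternative to cluster-scoped libraries, a notebook-scoped install on DBR 11.x is a one-liner in a notebook cell (the magic must be the first line of the cell); a sketch:

    %pip install tsfresh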

2 More Replies
aben1
by New Contributor
  • 1556 Views
  • 0 replies
  • 0 kudos

I have created a piece of Python code which leads to a Python error. The job failed with an Internal Error, see below. The message after clicking o...

I have created a piece of Python code which leads to a Python error. The job failed with an Internal Error, see below. The message after clicking on it states somewhat misleading info. Meanwhile, the real issue is fortunately described in the logs. I d...

RohitKulkarni
by Contributor II
  • 6122 Views
  • 5 replies
  • 4 kudos

External table issue

Hello team, I am using the df.write command and the table is getting created. As the screenshot shows, the table got created in the Tables folder in the dedicated SQL pool, but I need it in the External Tables folder. Regards, RK

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

If you actually write into Synapse, it is not an external table; the data resides on Synapse. If you want an external table, write the data to your data lake in Parquet/Delta Lake format and then create an external table on that location in s...
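
A sketch of that approach, assuming a DataFrame df; the storage path is a placeholder, and the external-table definition is then run on the Synapse side:

    # Step 1 (Databricks): land the data in the lake as Parquet; the path is
    # a placeholder.
    (df.write
       .mode("overwrite")
       .parquet("abfss://curated@mylake.dfs.core.windows.net/sales"))
    # Step 2 (Synapse): define an external table over that location, e.g.
    # CREATE EXTERNAL TABLE ... WITH (LOCATION = '/sales', DATA_SOURCE = ...).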

4 More Replies
SujitOjha
by New Contributor
  • 2292 Views
  • 1 replies
  • 1 kudos

What is the way to do DEEP CLONE and copy the checkpoints folder also?

When I use DEEP CLONE, I don't see the checkpoint folder being copied. Is there a possibility of copying the checkpoint folder as well, as I have to resume the streaming job at the updated location?

Latest Reply
User16753725469
Contributor II
  • 1 kudos

Delta clones are recommended for disaster recovery. A clone doesn't exactly replicate table history in the context of specific snapshots, but it does ensure that the changes are replicated. However, we can't use a cloned table with a copy of the source checkpoin...
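
For reference, a hedged sketch of the DEEP CLONE statement itself, with placeholder table names; the streaming checkpoint directory lives outside the table and, as noted above, is not copied:

    # DEEP CLONE copies the table's data and metadata; the streaming job's
    # checkpoint directory is separate and is not copied. Names are placeholders.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS target_db.events_clone
        DEEP CLONE source_db.events
    """)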

Phani1
by Valued Contributor II
  • 2061 Views
  • 2 replies
  • 5 kudos

Delta table Concurrent Updates for Non-partitioned tables

When we implemented concurrent updates on a table that does not have a partition column, we ran into ConcurrentAppendException [we ensured the WHERE condition is different for each concurrent update statement]. So do we need to go with the partition approach ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

Please check that both streaming queries don't use the same checkpoint. An auto-increment ID can also cause problems, as it is kept in the schema. Schema evolution can also cause problems.

1 More Reply
19582
by New Contributor II
  • 1183 Views
  • 1 reply
  • 2 kudos

Run a simple Spark Scala jar (hello-world) on an existing running cluster

I have created a simple hello-world jar that I would like to run as a job. I also have an existing cluster. Now when I create a job to run on the existing cluster, it fails for some unknown reason (I don't see much in the errors), while if I run the same j...
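
For comparison, a hedged sketch of the Jobs API 2.1 task shape for a JAR on an existing cluster; the class name, jar path, and cluster ID are placeholders:

    # Sketch of a JAR task pinned to an existing cluster; all IDs and paths
    # are placeholders.
    jar_task = {
        "task_key": "hello",
        "existing_cluster_id": "0000-000000-abcdefgh",
        "spark_jar_task": {"main_class_name": "com.example.HelloWorld"},
        "libraries": [{"jar": "dbfs:/FileStore/jars/hello-world.jar"}],
    }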

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Can you share a screenshot and your example jar?

