Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by NickMendes, New Contributor III
  • 1985 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks SQL duplicates alert e-mail

Hi everyone, I've been working in DBC SQL creating a few e-mail alerts and have noticed that when one triggers, the e-mail notification gets duplicated. I've tried lots of testing in different situations; however, it keeps duplicating in my...

Latest Reply
NickMendes
New Contributor III
  • 1 kudos

After lots of testing, I've finally figured out a solution. I changed the "Notifications" setting to "When triggered, send notification at most every 1 day" and "Refresh" to "Refresh every 1 day". Now it is working perfectly.

  • 1 kudos
2 More Replies
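
The fix above was applied in the alert's UI settings. As a hedged sketch only, the same idea expressed against the legacy Databricks SQL Alerts preview API might look like the following; the endpoint shape and the "rearm" field (seconds between re-notifications) are assumptions and may differ in your workspace.

    import requests

    HOST = "https://<workspace-host>"        # placeholder
    TOKEN = "<personal-access-token>"        # placeholder
    ALERT_ID = "<alert-id>"                  # placeholder

    # Re-notify at most once every 24 hours (86400 seconds). The "rearm"
    # field and endpoint path are assumptions from the preview API.
    resp = requests.post(
        f"{HOST}/api/2.0/preview/sql/alerts/{ALERT_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"rearm": 86400},
    )
    resp.raise_for_status()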
by harsha4u, New Contributor II
  • 790 Views
  • 1 reply
  • 2 kudos

Any suggestions around automating the sizing of clusters and best practices around it? Other than enabling autoscaling, are there any other practices around creating right-sized driver/worker nodes?

Latest Reply
User16766737456
New Contributor III
  • 2 kudos

Autoscaling should help in sizing the clusters according to the workload. You may want to consider the recommendations here: https://docs.databricks.com/clusters/cluster-config-best-practices.html#cluster-sizing-considerations

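
To complement the sizing guide linked above, here is a minimal sketch of an autoscaling cluster spec as it might be passed to the Clusters API (POST /api/2.0/clusters/create); the runtime version and node type are placeholders, not recommendations.

    # Minimal autoscaling cluster spec; tune min/max workers to the workload.
    cluster_spec = {
        "cluster_name": "right-sized-etl",
        "spark_version": "10.4.x-scala2.12",   # placeholder runtime
        "node_type_id": "i3.xlarge",           # placeholder; choose memory- vs. compute-optimized
        "autoscale": {
            "min_workers": 2,                  # steady-state baseline
            "max_workers": 8,                  # cap to bound cost
        },
        "autotermination_minutes": 30,         # release idle clusters automatically
    }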
by Valentin1, New Contributor III
  • 1372 Views
  • 3 replies
  • 0 kudos


Is there any example that contains at least one widget that works with the Databricks SQL Create Dashboard API? I tried the following simple dashboard:{ "name": "Delta View", "dashboard_filters_enabled": false, "widgets": [ { ...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

@Valentin Rosca, right now Databricks does not recommend creating new widgets via the Queries and Dashboards API (https://docs.databricks.com/sql/api/queries-dashboards.html#operation/sql-analytics-create-dashboard). Also, copying a dashboard fr...

2 More Replies
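
For readers of this thread, a hedged sketch of the legacy preview flow being discussed: create the dashboard first, then attach a widget that references an existing visualization. The payload shapes follow the Redash-derived preview API and are assumptions; as the reply notes, Databricks discourages this path.

    import requests

    HOST = "https://<workspace-host>"    # placeholder
    HEADERS = {"Authorization": "Bearer <personal-access-token>"}

    # Step 1: create an empty dashboard.
    dash = requests.post(
        f"{HOST}/api/2.0/preview/sql/dashboards",
        headers=HEADERS,
        json={"name": "Delta View", "dashboard_filters_enabled": False},
    ).json()

    # Step 2: attach a widget pointing at an existing visualization.
    # The options/position shape is an assumption from the Redash lineage.
    requests.post(
        f"{HOST}/api/2.0/preview/sql/widgets",
        headers=HEADERS,
        json={
            "dashboard_id": dash["id"],
            "visualization_id": "<existing-visualization-id>",  # placeholder
            "width": 1,
            "options": {"position": {"col": 0, "row": 0, "sizeX": 3, "sizeY": 8}},
        },
    )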
by Isaac_Low, New Contributor II
  • 1384 Views
  • 3 replies
  • 3 kudos
Latest Reply
Isaac_Low
New Contributor II
  • 3 kudos

All good. I just imported the training material manually using the DBC link. Didn't need Repos for that.

2 More Replies
by davidvb, New Contributor II
  • 1965 Views
  • 2 replies
  • 1 kudos

I have a big problem creating a community account

It is impossible for me to create a community account. I enter my data on the web, and in the next step, when the website shows me the 3 sign-up options (Google, Amazon, etc.) and I click on "Get started with community account", the site shows me this. I have trie...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @david vazquez, it seems the website was down due to maintenance. Next time you can check the status page to see why the website is down: https://status.databricks.com/

1 More Replies
by darshan, New Contributor III
  • 1304 Views
  • 3 replies
  • 1 kudos

job init takes longer than notebook run

I am trying to understand why running a job takes longer than running the notebook manually. And if I try to run jobs concurrently using workflows or threads, is there a way to reduce job init time?

Latest Reply
Vivian_Wilfred
Honored Contributor
  • 1 kudos

Hi @darshan doshi, Jobs creates a job cluster in the backend before it starts the task execution, and this cluster creation may take extra time compared to running a notebook on an existing cluster. 1) If you run a multi-task job, you could selec...

2 More Replies
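
To make the reply's first suggestion concrete, here is a sketch of a Jobs API 2.1 payload in which two tasks share a single job cluster, so cluster creation is paid for once rather than per task; the names, node type, and notebook paths are placeholders.

    job_spec = {
        "name": "shared-cluster-job",
        "job_clusters": [
            {
                "job_cluster_key": "shared",
                "new_cluster": {
                    "spark_version": "10.4.x-scala2.12",  # placeholder runtime
                    "node_type_id": "i3.xlarge",          # placeholder node type
                    "num_workers": 2,
                },
            }
        ],
        "tasks": [
            {
                "task_key": "ingest",
                "job_cluster_key": "shared",              # reuse the shared cluster
                "notebook_task": {"notebook_path": "/Jobs/ingest"},
            },
            {
                "task_key": "transform",
                "depends_on": [{"task_key": "ingest"}],
                "job_cluster_key": "shared",              # same cluster, no re-init
                "notebook_task": {"notebook_path": "/Jobs/transform"},
            },
        ],
    }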
by karthik_p, Esteemed Contributor
  • 2417 Views
  • 4 replies
  • 9 kudos

Resolved! Unable to create Databricks workspace using Terraform on AWS

Hi Team, we are using the below workspace config scripts. When we previously created a workspace from an EC2 instance, we were able to create the workspace without any issue, but when we try to run through GitHub Actions, we are getting the below error: Erro...

Latest Reply
Prabakar
Esteemed Contributor III
  • 9 kudos

@karthik p, this can be fixed by setting a timeout. Please check this: https://kb.databricks.com/en_US/cloud/failed-credential-validation-checks-error-with-terraform

3 More Replies
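
A minimal Terraform sketch of the fix the reply points to: raising the create timeout on the workspace resource so credential validation has time to complete. Required arguments are elided, "30m" is an example value, and this assumes your provider version supports a timeouts block on this resource.

    resource "databricks_mws_workspaces" "this" {
      # ... account_id, workspace_name, credentials, storage, network, etc. ...

      # Give workspace creation (including credential validation) more time.
      timeouts {
        create = "30m"
      }
    }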
by MrsBaker, New Contributor II
  • 1112 Views
  • 1 reply
  • 1 kudos

display() not updating after 1000 rows

Hello folks! I am calling display() on a streaming query sourced from a Delta table. The output from display() shows the new rows added to the source table, but as soon as the output results hit 1000 rows, the output is not updated anymore. As a r...

Latest Reply
MrsBaker
New Contributor II
  • 1 kudos

An aggregate function followed by the timestamp field sorted in descending order did the trick:

streaming_df.groupBy("field1", "time_field").max("field2").orderBy(col("time_field").desc()).display()

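
A self-contained version of the workaround above, assuming a Delta source table and keeping the thread's placeholder field names: aggregating and then sorting by the timestamp descending keeps the newest rows inside display()'s bounded output.

    from pyspark.sql.functions import col

    # Placeholder source; the thread streams from a Delta table.
    streaming_df = spark.readStream.table("source_delta_table")

    (streaming_df
        .groupBy("field1", "time_field")
        .max("field2")
        .orderBy(col("time_field").desc())
        .display())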
by KateK, New Contributor II
  • 1669 Views
  • 3 replies
  • 1 kudos

How do you correctly access the spark context in DLT pipelines?

I have some code that uses RDDs, and the sc.parallelize() and rdd.toDF() methods to get a dataframe back out. The code works in a regular notebook (and if I run the notebook as a job) but fails if I do the same thing in a DLT pipeline. The error mess...

Latest Reply
KateK
New Contributor II
  • 1 kudos

Thanks for your help Alex, I ended up re-writing my code with Spark UDFs -- maybe there is a better solution with only the DataFrame API but I couldn't find it. To summarize my problem: I was trying to un-nest a large JSON blob (the fake data in my f...

2 More Replies
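
Since DLT does not expose the SparkContext, one DataFrame-only way to un-nest a JSON blob, sketched here with an invented schema and column names rather than the poster's code, is from_json plus explode:

    from pyspark.sql.functions import col, explode, from_json
    from pyspark.sql.types import ArrayType, StringType, StructField, StructType

    # Invented schema: each row's 'json_blob' column holds an array of objects.
    schema = ArrayType(StructType([
        StructField("id", StringType()),
        StructField("value", StringType()),
    ]))

    flat_df = (raw_df    # raw_df: any DataFrame with a 'json_blob' string column
        .withColumn("items", from_json(col("json_blob"), schema))
        .withColumn("item", explode(col("items")))
        .select(col("item.id"), col("item.value")))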
by palak231, New Contributor
  • 719 Views
  • 0 replies
  • 0 kudos

A/B Testing: A/B testing is the process of comparing two variations, or two versions, of the same item and offering the better of the two. Before doing A/B testing you need to focus on the one problem you want to resolve and ...

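
To make "comparing two variations" concrete, here is a minimal two-proportion z-test sketch; the visitor and conversion counts are invented for illustration.

    from math import erf, sqrt

    conv_a, n_a = 120, 2400   # variant A: conversions, visitors (invented)
    conv_b, n_b = 150, 2400   # variant B (invented)

    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    print(f"z={z:.2f}, p={p_value:.3f}")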
by Anonymous, Not applicable
  • 714 Views
  • 1 reply
  • 4 kudos

Happy August! On August 25th we are hosting another Community Social - we're doing these monthly! We want to make sure that we all have the chance to connect as a community often. Come network, talk data, and just get social! Join us for our August ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Wow! Super Exciting.

by Rajendra, New Contributor II
  • 1072 Views
  • 0 replies
  • 2 kudos

Does Databricks support writing data in Iceberg format?

As I understand, Databricks supports conversion from Iceberg format to Delta using the command below:

CONVERT TO DELTA iceberg.`abfss://container-name@storage-account-name.dfs.core.windows.net/path/to/table`; -- uses Iceberg manifest for metadata

However...

by jakubk, Contributor
  • 722 Views
  • 0 replies
  • 0 kudos

Databricks Spark SQL: custom table-valued function + struct really slow (minutes for a single row)

I'm using Azure Databricks. I have a custom table-valued function which takes a URL as a parameter and outputs a single-row table with certain elements from the URL extracted/labelled (I get search activity URLs, and when in a specific format I can retri...

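
For context, a minimal sketch of the kind of SQL table-valued function the post describes, built on Spark's parse_url(); the function name and extracted columns are illustrative, not the poster's code, and this assumes a runtime that supports SQL table functions.

    -- Define a table-valued function that labels parts of a URL.
    CREATE OR REPLACE FUNCTION parse_search_url(url STRING)
      RETURNS TABLE (host STRING, query STRING)
      RETURN SELECT parse_url(url, 'HOST'), parse_url(url, 'QUERY');

    -- Example call producing a single labelled row.
    SELECT * FROM parse_search_url('https://example.com/search?q=databricks');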
by sage5616, Valued Contributor
  • 13648 Views
  • 3 replies
  • 2 kudos

Resolved! Choosing the optimal cluster size/specs.

Hello everyone, I am trying to determine the appropriate cluster specifications/sizing for my workload: run a PySpark task to transform a batch of input Avro files to Parquet files and create or re-create persistent views on these Parquet files. This t...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

If the data is 100MB, then I'd try a single node cluster, which will be the smallest and least expensive. You'll have more than enough memory to store it all. You can automate this and use a jobs cluster.

2 More Replies
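
A sketch of the single-node jobs cluster the reply suggests for a workload this small; the singleNode profile conf and ResourceClass tag are the usual way to request single-node mode through the API, and the node type is a placeholder.

    # Driver-only cluster: zero workers plus the single-node profile.
    single_node_spec = {
        "spark_version": "10.4.x-scala2.12",   # placeholder runtime
        "node_type_id": "i3.xlarge",           # placeholder; any small node works for ~100 MB
        "num_workers": 0,
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }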
by Ross, New Contributor II
  • 2806 Views
  • 4 replies
  • 0 kudos

Failed to install cluster-scoped SparkR library

Attempting to install SparkR on the cluster; I successfully installed other packages such as tidyverse via CRAN. The error is copied below - any help you can provide is greatly appreciated! Databricks runtime 10.4 LTS. Library installation attempted on ...

Latest Reply
Vivian_Wilfred
Honored Contributor
  • 0 kudos

Hi @Ross Hamilton, I believe SparkR comes built in with Databricks RStudio and you don't have to install it explicitly. You can import it directly with library(SparkR), and from your comment above that works for you. The error message you see could be re...

3 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group