Hello, I have a streaming dataset (built with Delta Live Tables), and I want to create a live dashboard that shows changes instantly, without having to query the table on a fixed schedule (i.e., without manual refreshes). What would be the best solution...
I'm currently trying to attach more than one Maven artifact to a cluster in my Terraform configuration. How can I add more than one artifact?
Hi @KunalGaurav, this can be done by using a dynamic configuration block inside your databricks_cluster resource definition. In variable.tf, define a variable holding the list of Maven coordinates:

variable "listOfMavenPackages" {
  type    = list(string)
  default = [ "com.google.gua...
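For later readers, here is a hedged sketch of how that variable can drive the cluster libraries via a dynamic block (the variable name matches the snippet above; everything else is illustrative, not a verified config):

# Inside the databricks_cluster resource: one library block per Maven coordinate.
dynamic "library" {
  for_each = toset(var.listOfMavenPackages)
  content {
    maven {
      coordinates = library.value
    }
  }
}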
This is located in the Azure portal (I hope you have access to it), in your Databricks workspace settings. There you have 'Virtual network' and 'Private subnet name'. If you click on these, you get the address range (in CIDR notation; you can do a web ...
I am trying to access a Delta Share table that has a field of datatype INTERVAL DAY TO SECOND (sample data shown below). Accessing the table through Delta Sharing gives the error below. Any help in resolving this issue will be appreci...
If there is no data abnormality in Redshift when connecting to Spark from a shared cluster in Databricks, but the data suddenly decreases, what causes should I check? Also, is there any way to inspect the widget variables or code used on each execution?
Hello everyone, I am having an issue when running "ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS". As I understand it, this should update the min/max values for a column when you run it for all columns or a single one. One way to verify it, from what I ...
Hello @vlado101
The ANALYZE TABLE COMPUTE STATISTICS FOR ALL COLUMNS command in Databricks is used to compute statistics for all columns of a table. This information is persisted in the metastore and helps the query optimizer make decisions such as ...
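For reference, a minimal sketch of running the command from a notebook and inspecting the column statistics it records (table and column names are placeholders):

# Compute statistics for every column of the table.
spark.sql("ANALYZE TABLE main.default.my_table COMPUTE STATISTICS FOR ALL COLUMNS")

# Column-level stats (min, max, null count) show up in the metastore output.
spark.sql("DESCRIBE EXTENDED main.default.my_table my_column").show(truncate=False)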
Using Structured Streaming to read the change data feed from your Delta table lets you run incremental streaming aggregations, such as counts and sums.
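A minimal sketch, assuming the change data feed is enabled on the table (table name and sink are illustrative):

# Stream the change data feed and keep a running count per change type.
df = (spark.readStream
      .format("delta")
      .option("readChangeFeed", "true")
      .table("main.default.my_table"))

(df.groupBy("_change_type").count()
   .writeStream
   .outputMode("complete")
   .format("memory")          # in-memory sink for a quick live view
   .queryName("live_counts")  # query it with: SELECT * FROM live_counts
   .start())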
I have an SFTP server I need to routinely download Excel files from and put into GCP Cloud Storage buckets. Every variation of the file path, to either my GCS path or the built-in DBFS file system, gives an error of "[Errno 2] No such file or d...
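The usual cause of that error is passing a cloud-storage URI to plain Python file APIs. A hedged sketch of one workaround, assuming the paramiko library and the /dbfs driver mount are available (host, credentials, and paths are placeholders):

import paramiko

# Connect to the SFTP server.
transport = paramiko.Transport(("sftp.example.com", 22))
transport.connect(username="user", password="secret")
sftp = paramiko.SFTPClient.from_transport(transport)

# Download to the driver-local /dbfs mount; open()/sftp.get() cannot
# resolve gs:// URIs directly, which is what raises [Errno 2].
sftp.get("/remote/report.xlsx", "/dbfs/tmp/report.xlsx")
sftp.close()
transport.close()

# Then copy from DBFS to the GCS bucket with dbutils.
dbutils.fs.cp("dbfs:/tmp/report.xlsx", "gs://my-bucket/report.xlsx")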
Hi everybody, sharing data with an access token and the Databricks connector works fine in Power BI (Desktop). Now we want to switch to Delta Sharing. We set up a Delta Share to distribute data via open sharing to anyone outside our organization. Unity Cata...
Hi everybody, for anybody running into the same issue: it is a bug in the current Power BI version (2.121.644.0). I reverted to the April release (2.116.404.0), which works as expected.
I'm creating a new job in Databricks using the databricks-cli:

databricks jobs create --json-file ./deploy/databricks/config/job.config.json

With the following JSON:

{
  "name": "Job Name",
  "new_cluster": {
    "spark_version": "4.1.x-scala2.1...
This is an old post but still relevant for future readers, so I will answer how it is done. You need to add the base_parameters field in the notebook_task config, like the following:

"notebook_task": {
  "notebook_path": "...",
  "base_parameters": {
    ...
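For completeness, a hedged sketch of what a filled-in task could look like (the path and parameter names are placeholders):

"notebook_task": {
  "notebook_path": "/Repos/me/project/main",
  "base_parameters": {
    "env": "dev",
    "run_date": "2024-01-01"
  }
}

Inside the notebook, each value can then be read with dbutils.widgets.get("env").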
Using the 9.1 ML cluster at the moment, but I also tried 7.3 and 8.1. Databricks is deployed on Google Cloud and I was using the trial. It is quite difficult to debug when the Spark UI is only semi-accessible. Part of the results in raw HTML are visible, but all ...
Hello everyone, I tried to change a Databricks Runtime cluster from 12.2 LTS ML to 13.3 LTS ML; however, I got this error: Failed to add 1 container to the compute. Will attempt retry: false. Reason: Global init script failure. Global init script Instal...
Do Databricks Asset Bundles support run_job_task tasks? I've made various attempts to add a run_job_task with a specified job_id; see the code snippet below. I tried substituting the job_id using ${...} syntax, as well as three other ways, which I've...
Ah, I see it is a known bug in the Databricks CLI: Asset bundle run_job_task fails · Issue #812 · databricks/cli (github.com). Anyone facing this issue should comment on and keep an eye on that ticket for resolution.
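For anyone trying to reproduce it, a hedged sketch of the kind of bundle definition that hits the bug (resource names and the job_id are placeholders):

resources:
  jobs:
    orchestrator:
      name: orchestrator
      tasks:
        - task_key: run_child_job
          run_job_task:
            job_id: 123456789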
Does anybody know any in-notebook or JAR code to pull cluster tags from the runtime environment? Something like... dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply('user'), but for the cluster name?
Did you find any documentation for the spark.conf.get properties? I am trying to get some metadata about the environment my notebook is running in (specifically cluster custom tags), but cannot find any information besides a couple of forum posts.
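Not official documentation, but the undocumented clusterUsageTags Spark confs have worked for this; a hedged sketch (these keys are not a stable API and may change between runtimes):

import json

# Cluster name from the usage-tag Spark confs.
cluster_name = spark.conf.get("spark.databricks.clusterUsageTags.clusterName")

# All tags, including custom tags, as a JSON string of key/value pairs.
all_tags = json.loads(spark.conf.get("spark.databricks.clusterUsageTags.clusterAllTags"))
print(cluster_name, all_tags)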
I have a process that should run the same notebook with varying parameters, which translates to a job with queueing and concurrency enabled. When the first executions are triggered, the job runs work as expected, i.e. if the job has a max concurrency se...
Hi @Retired_mod, we double-checked everything: the resources are enough and all settings are properly set. I'll reach out to support by filing a new ticket. Thank you for your help.
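For readers building a similar setup, a minimal sketch of the job settings involved (Jobs API 2.1 fields; the values are placeholders):

"max_concurrent_runs": 5,
"queue": {
  "enabled": true
}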