Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MauricioS
by Databricks Partner
  • 2202 Views
  • 3 replies
  • 2 kudos

Delta Live Tables - Dynamic Target Schema

Hi all, I have a requirement where I need to migrate a few jobs from standard Databricks notebooks that are orchestrated by Azure Data Factory to DLT pipelines; pretty straightforward so far. The tricky part is that the data tables in the catalog are...

Latest Reply
fmadeiro
Contributor II
  • 2 kudos

@MauricioS Great question! Databricks Delta Live Tables (DLT) pipelines are very flexible, but by default, the target schema specified in the pipeline configuration (such as target or schema) is fixed. That said, you can implement strategies to enable...

2 More Replies
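As a rough illustration of the parameterization strategy the reply hints at: one common approach is to pass the target catalog and schema as pipeline configuration values and resolve fully qualified table names from them at runtime. In a real DLT pipeline the lookups would be `spark.conf.get` calls over values set in the pipeline settings; the configuration keys and defaults below are invented for the sketch.

```python
def qualified_target(conf: dict, table: str) -> str:
    """Resolve a fully qualified target table name from pipeline
    configuration. The key names and defaults here are hypothetical;
    in DLT, `conf.get` would be spark.conf.get."""
    catalog = conf.get("pipeline.target_catalog", "main")
    schema = conf.get("pipeline.target_schema", "default")
    return f"{catalog}.{schema}.{table}"

# e.g. a dev run overriding only the schema:
print(qualified_target({"pipeline.target_schema": "dev_sales"}, "orders"))
# → main.dev_sales.orders
```

The same pipeline code can then be deployed against different targets purely by editing the pipeline settings, with no notebook changes.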
Jfoxyyc
by Valued Contributor
  • 7720 Views
  • 6 replies
  • 2 kudos

Is there a way to catch the cancel button or the interrupt button in a Databricks notebook?

I'm running the oracledb package and it uses sessions. When you cancel a running query it doesn't close the session, even if you have a try/catch block, because a cancel or interrupt issues a kill command on the process. Is there a method to catch the canc...

Latest Reply
gustavo_woiler
New Contributor II
  • 2 kudos

I was having the same issue and I think I was finally able to solve it! When you simply except and capture the KeyboardInterrupt signal and do not raise it, the notebook gets into an endless cycle of "interrupting..." and never does anything. However, ...

5 More Replies
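The pattern gustavo_woiler describes can be sketched in plain Python: put the cleanup in a `finally` block and let the `KeyboardInterrupt` propagate rather than swallowing it (swallowing it is what produces the endless "interrupting..." state). The `FakeSession` below stands in for an oracledb session so the sketch is self-contained.

```python
class FakeSession:
    """Stand-in for an oracledb session, so the pattern is testable."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

def run_query(session, work):
    try:
        work()
    finally:
        # Runs even when the notebook's cancel button raises
        # KeyboardInterrupt; the exception still propagates afterwards.
        session.close()

def cancelled_work():
    raise KeyboardInterrupt  # what pressing Cancel effectively does

session = FakeSession()
try:
    run_query(session, cancelled_work)
except KeyboardInterrupt:
    pass  # re-raised past the cleanup, as the notebook expects
print(session.closed)  # → True
```

The key design choice is `finally` plus re-raise, not `except KeyboardInterrupt: pass`: the session gets closed, and the notebook still sees the interrupt and can finish cancelling.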
pranitha
by New Contributor II
  • 1222 Views
  • 3 replies
  • 0 kudos

instance_id in compute.node_timelines

I am trying to fetch active worker nodes from system tables using code like the one below: select count(distinct instance_id) from system.compute.node_timelines where cluster_id = "xx" group by instance_id, start_time, end_times. It gives an output like 20 but...

Latest Reply
pranitha
New Contributor II
  • 0 kudos

Hi @Alberto_Umana, thanks for replying. Even if we add the driver node it should be around 16-17, right, not 20? I checked all the clusters; for every cluster there is a difference of 5-7 nodes between the max_worker count and count(distinct insta...

2 More Replies
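The gap pranitha observes has a plausible mechanical explanation: over a time window, autoscaling and node replacement mean `count(distinct instance_id)` counts every instance that ever served the cluster, which exceeds the number alive at any one moment. A pure-Python sketch of the two quantities (the timeline rows are made up):

```python
def distinct_nodes(rows):
    """Every instance that ever appears in the window -- what
    count(distinct instance_id) over the whole window returns."""
    return len({iid for iid, _, _ in rows})

def max_concurrent(rows):
    """Peak number of instances alive at the same time -- closer to the
    cluster's actual worker count."""
    events = sorted((t, d) for _, s, e in rows for t, d in ((s, 1), (e, -1)))
    cur = best = 0
    for _, delta in events:
        cur += delta
        best = max(best, cur)
    return best

# i-1 is recycled and replaced by i-3 partway through the window
rows = [("i-1", 0, 3), ("i-2", 0, 6), ("i-3", 3, 6)]
print(distinct_nodes(rows), max_concurrent(rows))  # → 3 2
```

Three distinct instances, but never more than two at once: the same effect, scaled up, would account for 20 distinct ids on a 16-worker cluster.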
TejeshS
by Contributor
  • 1610 Views
  • 3 replies
  • 0 kudos

Event-based alerts on certain events from system audit tables

We need to implement an event-based trigger system that can detect any manual intervention performed by users. Upon detection of such an event, the system should automatically send a warning email. The events can be generated through DLT or other pro...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Unfortunately, the system events are only tracked via the system tables; the only option for more recent data is to re-execute the query each time it is needed.

2 More Replies
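Given that the audit data only lands in the system tables, the reply's suggestion amounts to polling: re-run the detection query on a schedule and alert on any rows it returns. A minimal, generic sketch; the query and email wiring are left as callables, and in practice `fetch_events` might wrap `spark.sql` over `system.access.audit` (an assumption about your setup).

```python
import time

def poll_and_alert(fetch_events, send_alert, polls, interval_s=60.0):
    """Run `fetch_events` `polls` times, `interval_s` apart, calling
    `send_alert` once per returned event row."""
    for i in range(polls):
        for event in fetch_events():
            send_alert(event)
        if i < polls - 1:
            time.sleep(interval_s)

# Stub wiring for illustration (event shape is invented):
batches = iter([[{"action": "manual_table_drop"}], []])
seen = []
poll_and_alert(lambda: next(batches), seen.append, polls=2, interval_s=0)
print(seen)  # → [{'action': 'manual_table_drop'}]
```

On Databricks, the natural home for such a loop is a scheduled job or a SQL alert on the same query, rather than a long-running notebook.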
thedatacrew
by Databricks Partner
  • 2897 Views
  • 6 replies
  • 0 kudos

Delta Live Tables - skipChangeCommits in SQL

Hi, could anyone tell me if the skipChangeCommits option is supported in SQL mode? I can use it successfully in Python, but it doesn't look like it is supported in SQL. It seems to be a glaring omission from the SQL support, or support for this will...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Unfortunately there is no ETA on this yet. If I hear anything about it, I will let you know!

5 More Replies
kyrrewk
by New Contributor II
  • 1172 Views
  • 3 replies
  • 0 kudos

Monitor progress when using databricks-connect

When using databricks-connect how can you monitor the progress? Ideally, we want something similar to what you get in the Databricks notebook, i.e., information about the jobs/stages. We are using Python.

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

I would suggest submitting this as a feature request through https://docs.databricks.com/en/resources/ideas.html#ideas

2 More Replies
matthiasn
by Databricks Partner
  • 3629 Views
  • 6 replies
  • 0 kudos

Resolved! Use temporary table credentials to access data in Databricks

Hi everybody, I tested the temporary table credentials API. It works great, as long as I use the credentials outside of Databricks (e.g. in a local DuckDB instance). But as soon as I try to use the short-lived credentials (Azure SAS, in my case) in Databric...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Hello Matthias, many thanks for sharing this valuable information, it is great to hear your issue got resolved.

5 More Replies
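For readers who land on this thread, the API in question can be exercised with nothing but the standard library. The sketch below only builds the request, it does not send it; the endpoint path and body fields follow my reading of the public Unity Catalog docs and should be verified against your workspace's API version before use.

```python
import json
import urllib.request

def temp_table_credentials_request(host, token, table_id, operation="READ"):
    """Build (but do not send) the POST for short-lived table credentials.
    Path and field names are assumptions to check against the docs."""
    body = json.dumps({"table_id": table_id, "operation": operation}).encode()
    return urllib.request.Request(
        f"{host}/api/2.1/unity-catalog/temporary-table-credentials",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = temp_table_credentials_request(
    "https://adb-123.azuredatabricks.net", "<token>", "some-table-uuid"
)
print(req.get_method(), json.loads(req.data)["operation"])  # → POST READ
```

As the thread discusses, the returned SAS/credentials work from external engines; using them from inside Databricks compute is the part that needed the workaround described above.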
Hubert-Dudek
by Databricks MVP
  • 5244 Views
  • 2 replies
  • 3 kudos

Bridging the SQL-Python Gap

Python often edges out SQL with its metaprogramming capabilities. However, dbt bridges this gap with Jinja templates. Introducing simple "for" loops, especially for parameter iteration, in Databricks SQL could significantly enhance the user experienc...

Latest Reply
Greg_c
New Contributor II
  • 3 kudos

Was this solved, @Rajeev45? Do you have any docs?

1 More Replies
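To make the post concrete: what a dbt Jinja `{% for %}` loop buys you is mechanical generation of repetitive SQL from a parameter list. The same effect, sketched in plain Python (the table and column names are invented):

```python
def union_by_region(regions, base_table):
    """Render one SELECT per parameter value and stitch them together
    with UNION ALL -- the shape a Jinja for-loop would template out."""
    selects = [
        f"SELECT '{r}' AS region, COUNT(*) AS n FROM {base_table}_{r}"
        for r in regions
    ]
    return "\nUNION ALL\n".join(selects)

sql = union_by_region(["emea", "amer", "apac"], "sales")
print(sql.count("UNION ALL"))  # → 2
```

This is exactly the kind of parameter iteration the post argues native Databricks SQL should support without reaching for dbt or a driver notebook.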
sahasimran98
by New Contributor II
  • 1907 Views
  • 3 replies
  • 0 kudos

Data Volume Read/Processed for a Databricks Workflow Job

Hello all, I have a Databricks instance hosted on Azure and I am using Diagnostic Settings to collect Databricks Jobs related logs in a Log Analytics workspace. So far, from the DatabricksJobs table in Azure Log Analytics, I am able to fetch basic job rela...

Latest Reply
saurabh18cs
Honored Contributor III
  • 0 kudos

Hi @sahasimran98, I think you're right; this is more applicable to Synapse, where such configuration exists, but you can still give it a try on Databricks and let us know the results here. Otherwise, try to find a spark-monitoring package on GitHub for Databr...

2 More Replies
KosmaS
by New Contributor III
  • 3864 Views
  • 4 replies
  • 1 kudos

Skewness / Salting with countDistinct

Hey everyone, I experience data skewness for: df = (source_df .unionByName(source_df.withColumn("region", lit("Country"))) .groupBy("zip_code", "region", "device_type") .agg(countDistinct("device_id").alias("total_active_unique"), count("device_id").a...

Latest Reply
Avinash_Narala
Databricks Partner
  • 1 kudos

You can make use of the Databricks native feature Liquid Clustering: cluster by the columns you use in your grouping statements, and it will handle the performance issues caused by data skew. For more information, please visit: https://docs.dat...

3 More Replies
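A complementary option to Liquid Clustering is the salting the thread title asks about. The subtlety with countDistinct is that a *random* salt would double-count ids that land in different shards; deriving the salt from the id itself keeps the count exact, because each distinct id maps to exactly one shard. A pure-Python sketch of the two-stage aggregation (in Spark, the two stages would be two groupBys):

```python
import hashlib
from collections import defaultdict

def salted_count_distinct(records, num_salts=4):
    """records: (group_key, device_id) pairs. Stage 1 deduplicates per
    (key, salt) shard; stage 2 sums shard sizes per key. Exact, because
    the salt is a hash of the id, so each id lands in one shard only."""
    # stage 1: per-(key, salt) distinct sets, spreading each hot key
    # across num_salts reducers
    stage1 = defaultdict(set)
    for key, device_id in records:
        salt = int(hashlib.md5(device_id.encode()).hexdigest(), 16) % num_salts
        stage1[(key, salt)].add(device_id)
    # stage 2: shard counts simply add up per key
    totals = defaultdict(int)
    for (key, _salt), ids in stage1.items():
        totals[key] += len(ids)
    return dict(totals)

records = [("10001", "d1"), ("10001", "d1"), ("10001", "d2"), ("94105", "d3")]
print(salted_count_distinct(records))  # → {'10001': 2, '94105': 1}
```

Note this exactness argument only holds for the distinct count; the plain `count` in the post's aggregation is additive anyway, so it survives salting trivially.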
garciargs
by New Contributor III
  • 1956 Views
  • 2 replies
  • 2 kudos

Resolved! Incremental load from two tables

Hi, I am looking to build an ETL process for an incremental-load silver table. This silver table, let's say "contracts_silver", is built by joining two bronze tables, "contracts_raw" and "customer". contracts_silver: CONTRACT_ID, STATUS, CUSTOMER_NAME; 1, SIGNED, Pet...

Latest Reply
garciargs
New Contributor III
  • 2 kudos

Hi @hari-prasad ,Thank you! Will give it a try.Regards!

1 More Replies
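For other readers, the shape of an incremental join-and-upsert can be sketched independently of any engine: track a watermark, pick up only contract rows newer than it, join them to the customer dimension, and upsert into the silver table keyed by contract id. Column names below are invented to match the post's example; in Databricks the upsert would typically be a MERGE INTO.

```python
def incremental_silver(silver, contracts_raw, customers, last_ts):
    """Upsert changed contracts into `silver` (a dict keyed by contract
    id) and return the new watermark."""
    names = {c["customer_id"]: c["name"] for c in customers}
    watermark = last_ts
    for row in contracts_raw:
        if row["updated_at"] > last_ts:  # incremental filter
            silver[row["contract_id"]] = {
                "status": row["status"],
                "customer_name": names.get(row["customer_id"]),
            }
            watermark = max(watermark, row["updated_at"])
    return watermark

silver = {}
contracts = [
    {"contract_id": 1, "status": "SIGNED", "customer_id": 7, "updated_at": 5},
    {"contract_id": 2, "status": "DRAFT", "customer_id": 8, "updated_at": 2},
]
customers = [{"customer_id": 7, "name": "Peter"},
             {"customer_id": 8, "name": "Ana"}]
ts = incremental_silver(silver, contracts, customers, last_ts=3)
print(ts, silver[1]["customer_name"])  # → 5 Peter
```

One wrinkle the sketch glosses over: if the *customer* side changes, rows already in silver go stale, so a real pipeline also needs a strategy (full refresh of affected keys, or CDC on both bronze tables).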
ashraf1395
by Honored Contributor
  • 1059 Views
  • 1 reply
  • 1 kudos

Solution Design for an ingestion workflow with 1000s of tables for each source

Working on an ingestion workflow in Databricks which extracts data from on-prem sources, following all standard practices of incremental load, idempotency, upsert, schema evolution, etc., and storing the data properly. Now we want to optimize t...

Latest Reply
Avinash_Narala
Databricks Partner
  • 1 kudos

I did similar work in a recent project, where I needed to run many SQL DDLs, so I automated the process using Databricks Jobs: capturing the dependencies in a metadata table and creating tasks accordingly in the job through the Jobs APIs, doing...

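The metadata-driven approach described in the reply can be sketched as: one metadata row per notebook/DDL with its dependencies, turned into the task list the Jobs API expects. The task shape below follows the Jobs API 2.1 task fields; the metadata columns and notebook paths are invented.

```python
def build_tasks(metadata):
    """Turn dependency-metadata rows into Jobs API 2.1 task entries."""
    tasks = []
    for row in metadata:
        task = {
            "task_key": row["name"],
            "notebook_task": {"notebook_path": row["path"]},
        }
        if row.get("depends_on"):
            # Jobs API encodes edges as a list of {"task_key": ...} refs
            task["depends_on"] = [{"task_key": d} for d in row["depends_on"]]
        tasks.append(task)
    return tasks

meta = [
    {"name": "create_dims", "path": "/Repos/etl/dims"},
    {"name": "create_facts", "path": "/Repos/etl/facts",
     "depends_on": ["create_dims"]},
]
tasks = build_tasks(meta)
print(tasks[1]["depends_on"])  # → [{'task_key': 'create_dims'}]
```

The resulting list drops into the `tasks` field of a jobs create/update payload, so adding a thousandth table is a metadata insert rather than a job edit.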
adityarai316
by New Contributor III
  • 3258 Views
  • 6 replies
  • 2 kudos

Mount point in unity catalog

Hi everyone, in my existing notebooks we have used mount point URLs like /mnt/, and we have more than 200 notebooks where we have used such URLs to fetch data/files from the container. Now, as we are upgrading to Unity Catalog, these URLs will no lon...

Latest Reply
NaveenBedadala
New Contributor II
  • 2 kudos

@adityarai316, did you get a solution? I am facing the same issue.

5 More Replies
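One low-churn way to handle the 200-notebook problem is to centralize the path translation in a single shared helper while migrating mounts to Unity Catalog volumes, so each notebook changes in one place. A sketch; the mount-to-volume mapping is something you would maintain yourself, and the catalog/schema/volume names here are invented.

```python
def mnt_to_volume(path, mapping):
    """Rewrite a legacy /mnt/<mount>/... path to its Unity Catalog
    /Volumes/<catalog>/<schema>/<volume>/... equivalent."""
    for mount, volume_root in mapping.items():
        prefix = f"/mnt/{mount}"
        if path == prefix or path.startswith(prefix + "/"):
            return volume_root + path[len(prefix):]
    raise ValueError(f"no Unity Catalog volume mapped for {path!r}")

mapping = {"rawdata": "/Volumes/main/landing/rawdata"}
print(mnt_to_volume("/mnt/rawdata/2024/file.csv", mapping))
# → /Volumes/main/landing/rawdata/2024/file.csv
```

A bulk find-and-replace across the 200 notebooks then only has to swap literal paths for calls to this one function, and future storage moves become a mapping edit.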
michaelh
by Databricks Partner
  • 6061 Views
  • 5 replies
  • 4 kudos

Resolved! AWS Databricks Cluster terminated.Reason:Container launch failure

We're developing a custom runtime for a Databricks cluster. We need to version and archive our clusters for a client. We made it run successfully in our own environment, but we're not able to make it work in the client's environment. It's a large corporation with...

Latest Reply
NandiniN
Databricks Employee
  • 4 kudos

This appears to be an issue with the security group. Kindly review the security group's inbound/outbound rules.

4 More Replies
franc_bomb
by New Contributor II
  • 2541 Views
  • 7 replies
  • 0 kudos

Cluster creation issue

Hello, I just started using the Databricks Community Edition for learning purposes. I have been trying to create a cluster, but the first time it failed, asking me to retry or contact support, and now it's just running forever. What could be the problem?

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Can you please perform one test: check with the cloud provider whether you are able to start a node?

6 More Replies