Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Jfoxyyc
by Valued Contributor
  • 7718 Views
  • 6 replies
  • 2 kudos

Is there a way to catch the cancel button or the interrupt button in a Databricks notebook?

I'm running the oracledb package, and it uses sessions. When you cancel a running query, the session isn't closed even if you have a try/catch block, because a cancel or interrupt issues a kill command on the process. Is there a method to catch the canc...

Latest Reply
gustavo_woiler
New Contributor II
  • 2 kudos

I was having the same issue and I think I was finally able to solve it! When you simply catch the KeyboardInterrupt signal and do not re-raise it, the notebook gets into an endless cycle of "interrupting..." and never does anything. However, ...
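The pattern described above can be sketched as follows. `FakeSession` is a hypothetical stand-in for an oracledb session; the point is only the shape of the handler: close the session, then re-raise so the interrupt actually propagates.

```python
class FakeSession:
    """Hypothetical stand-in for an oracledb session."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def run_query(session, work):
    try:
        work()  # the long-running query
    except KeyboardInterrupt:
        session.close()  # clean up the session first...
        raise            # ...then re-raise, or the notebook hangs on "interrupting..."
```

Swallowing the `KeyboardInterrupt` (omitting the `raise`) is what produces the endless "interrupting..." state mentioned above.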

5 More Replies
pranitha
by New Contributor II
  • 1220 Views
  • 3 replies
  • 0 kudos

instance_id in compute.node_timelines

I am trying to fetch active worker nodes from system tables using code like the below: select count(distinct instance_id) from system.compute.node_timelines where cluster_id = "xx" group by instance_id, start_time, end_time. It gives an output like 20 but...

Latest Reply
pranitha
New Contributor II
  • 0 kudos

Hi @Alberto_Umana, thanks for replying. Even if we add the driver node it should be around 16-17, right, not 20? I checked all the clusters; for every cluster there is a difference of 5-7 nodes between the max_worker count and count(distinct insta...
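One likely explanation for the gap, sketched in plain Python with illustrative data: counting distinct instance_id across the whole timeline also counts nodes that were replaced over time, while counting per snapshot and taking the maximum reflects the concurrent peak.

```python
# Illustrative data: each row is a (timestamp, set-of-instance-ids)
# snapshot, mimicking node-timeline output for one cluster.
timeline = [
    ("10:00", {"i-1", "i-2", "i-3"}),
    ("10:15", {"i-1", "i-2", "i-4"}),  # i-3 was replaced by i-4
]

# Distinct IDs over the whole window counts both the old and new node...
distinct_over_all_time = len(set().union(*(ids for _, ids in timeline)))

# ...while the per-snapshot maximum shows how many ran concurrently.
peak_concurrent = max(len(ids) for _, ids in timeline)
```

With node churn over the queried window, the all-time distinct count will always drift above the configured max_workers.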

2 More Replies
TejeshS
by Contributor
  • 1609 Views
  • 3 replies
  • 0 kudos

Event based Alert based on certain events from System Audit tables

We need to implement an event-based trigger system that can detect any manual intervention performed by users. Upon detection of such an event, the system should automatically send a warning email. The events can be generated through DLT or other pro...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Unfortunately, system events are only tracked via the system tables; the only option to get more recent data is to re-execute the query each time it is needed.

2 More Replies
thedatacrew
by Databricks Partner
  • 2895 Views
  • 6 replies
  • 0 kudos

Delta Live Tables - skipChangeCommits in SQL

Hi, could anyone tell me if the skipChangeCommits option is supported in SQL mode? I can use it successfully from Python, but it doesn't look like it is supported in SQL. It seems to be a glaring omission from the SQL support, or support for this will...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Unfortunately, there is no ETA on this yet. If I hear about it, I will let you know!

5 More Replies
kyrrewk
by New Contributor II
  • 1172 Views
  • 3 replies
  • 0 kudos

Monitor progress when using databricks-connect

When using databricks-connect how can you monitor the progress? Ideally, we want something similar to what you get in the Databricks notebook, i.e., information about the jobs/stages. We are using Python.

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

I would suggest you submit this as a feature request through https://docs.databricks.com/en/resources/ideas.html#ideas

2 More Replies
matthiasn
by Databricks Partner
  • 3614 Views
  • 6 replies
  • 0 kudos

Resolved! Use temporary table credentials to access data in Databricks

Hi everybody, I tested the temporary table credentials API. It works great, as long as I use the credentials outside of Databricks (e.g. in a local duckdb instance). But as soon as I try to use the short-lived credentials (Azure SAS for me) in Databric...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Hello Matthias, many thanks for sharing this valuable information; it is great to hear your issue got resolved.

5 More Replies
Hubert-Dudek
by Databricks MVP
  • 5242 Views
  • 2 replies
  • 3 kudos

Bridging the SQL-Python Gap

Python often edges out SQL with its metaprogramming capabilities. However, dbt bridges this gap with Jinja templates. Introducing simple "for" loops, especially for parameter iteration, in Databricks SQL could significantly enhance the user experienc...
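What a dbt-style Jinja "for" loop buys you can be sketched in plain Python; the region list and table names here are made up for illustration. A loop over parameters expands into one SELECT per value, stitched together with UNION ALL, which is what a `{% for %}` block in dbt renders to:

```python
# Hypothetical parameter list and table names for illustration.
regions = ["US", "EU", "APAC"]

selects = [
    f"SELECT '{r}' AS region, COUNT(*) AS n FROM events_{r.lower()}"
    for r in regions
]

# One SELECT per parameter, joined with UNION ALL - the same expansion
# a Jinja for-loop in a dbt model performs at compile time.
sql = "\nUNION ALL\n".join(selects)
```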

Latest Reply
Greg_c
New Contributor II
  • 3 kudos

Was this solved, @Rajeev45? Do you have any docs?

1 More Replies
sahasimran98
by New Contributor II
  • 1904 Views
  • 3 replies
  • 0 kudos

Data Volume Read/Processed for a Databricks Workflow Job

Hello All, I have a DBx instance hosted on Azure and I am using Diagnostic Settings to collect Databricks Jobs related logs in a Log Analytics workspace. So far, from the DatabricksJobs table in Azure Log Analytics, I am able to fetch basic job rela...

Latest Reply
saurabh18cs
Honored Contributor III
  • 0 kudos

Hi @sahasimran98, I think you're right; this is more valid for Synapse, where such a configuration exists, but you can still give it a try for Databricks and let us know the results here. Otherwise, try to find a spark-monitoring package on GitHub for Databr...

2 More Replies
KosmaS
by New Contributor III
  • 3853 Views
  • 4 replies
  • 1 kudos

Skewness / Salting with countDistinct

Hey everyone, I experience data skewness for: df = (source_df.unionByName(source_df.withColumn("region", lit("Country"))).groupBy("zip_code", "region", "device_type").agg(countDistinct("device_id").alias("total_active_unique"), count("device_id").a...
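One common workaround for skewed countDistinct is two-stage aggregation with a salt. Here is a minimal pure-Python sketch of the idea; in Spark the same shape is a groupBy on (key, salt) followed by a second groupBy on key, and all names are illustrative:

```python
from collections import defaultdict

def salted_count_distinct(rows, num_salts=8):
    # Stage 1: de-duplicate device_ids within each (key, salt) bucket,
    # so no single reducer holds one hot key's entire id set.
    partials = defaultdict(set)
    for key, device_id in rows:
        salt = hash(device_id) % num_salts  # deterministic salt spreads a hot key
        partials[(key, salt)].add(device_id)
    # Stage 2: merge the per-salt sets back together per key.
    totals = defaultdict(set)
    for (key, _salt), ids in partials.items():
        totals[key] |= ids
    return {k: len(v) for k, v in totals.items()}
```

The salt must be derived from the value being counted (not random per row), otherwise the same device_id could land in two buckets and be double-counted.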

Latest Reply
Avinash_Narala
Databricks Partner
  • 1 kudos

You can make use of the Databricks native feature Liquid Clustering: cluster by the columns you use in your grouping statements, and it will handle the performance issues caused by data skewness. For more information, please visit: https://docs.dat...

3 More Replies
garciargs
by New Contributor III
  • 1949 Views
  • 2 replies
  • 2 kudos

Resolved! Incremental load from two tables

Hi, I am looking to build an ETL process for an incremental-load silver table. This silver table, let's say "contracts_silver", is built by joining two bronze tables, "contracts_raw" and "customer".
contracts_silver: CONTRACT_ID | STATUS | CUSTOMER_NAME
1 | SIGNED | Pet...
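The incremental shape being asked about can be sketched in plain Python; in Databricks this would typically be a MERGE into contracts_silver from the new or changed bronze rows, and all data below is illustrative:

```python
# Hypothetical bronze data: customers keyed by id, plus the batch of
# new/changed contract rows picked up by the incremental read.
customers = {10: "Peter", 11: "Maria"}

def upsert_silver(silver, changed_contracts):
    # Upsert by CONTRACT_ID: existing rows are overwritten, new rows added,
    # and the customer join supplies CUSTOMER_NAME.
    for c in changed_contracts:
        silver[c["contract_id"]] = {
            "STATUS": c["status"],
            "CUSTOMER_NAME": customers.get(c["customer_id"]),
        }
    return silver
```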

Latest Reply
garciargs
New Contributor III
  • 2 kudos

Hi @hari-prasad, thank you! Will give it a try. Regards!

1 More Replies
ashraf1395
by Honored Contributor
  • 1058 Views
  • 1 replies
  • 1 kudos

Solution Design for an ingestion workflow with 1000s of tables for each source

Working on an ingestion workflow in Databricks which extracts data from on-prem sources, following all standard practices of incremental load, idempotency, upsert, schema evolution, etc., and storing data properly. Now we want to optimize t...

Latest Reply
Avinash_Narala
Databricks Partner
  • 1 kudos

I did a similar kind of work in my recent project, where I needed to run many SQL DDLs, so I automated the process using Databricks Jobs: capturing the dependencies in a metadata table and creating tasks accordingly in the job through the Jobs API, doing...
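That metadata-driven approach can be sketched like this. The metadata rows and notebook paths are hypothetical; the output matches the shape of the "tasks" array in a Databricks Jobs API job definition:

```python
# Hypothetical dependency metadata, as it might be read from a table.
metadata = [
    {"task": "load_customers", "depends_on": []},
    {"task": "load_contracts", "depends_on": []},
    {"task": "build_silver", "depends_on": ["load_customers", "load_contracts"]},
]

def build_tasks(metadata):
    # Turn each metadata row into a Jobs API task, wiring up depends_on
    # so the job runs in dependency order.
    tasks = []
    for row in metadata:
        task = {
            "task_key": row["task"],
            "notebook_task": {"notebook_path": f"/ingest/{row['task']}"},
        }
        if row["depends_on"]:
            task["depends_on"] = [{"task_key": d} for d in row["depends_on"]]
        tasks.append(task)
    return tasks
```

The resulting list can be posted as the "tasks" field when creating or updating the job via the Jobs API.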

adityarai316
by New Contributor III
  • 3256 Views
  • 6 replies
  • 2 kudos

Mount point in unity catalog

Hi everyone, in my existing notebooks we have used mount-point URLs like /mnt/, and we have more than 200 notebooks where we have used such URLs to fetch data/files from the container. Now, as we are upgrading to Unity Catalog, these URLs will no lon...
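One way to handle this at scale is a scripted rewrite of the notebook source. A minimal sketch, assuming a hypothetical mount-to-volume mapping (substitute your own catalog/schema/volume names):

```python
import re

# Hypothetical mount -> Unity Catalog volume mapping; substitute your own.
MOUNT_TO_VOLUME = {
    "/mnt/raw": "/Volumes/main/bronze/raw",
    "/mnt/curated": "/Volumes/main/silver/curated",
}

def rewrite_paths(source: str) -> str:
    # Replace each mount prefix wherever it appears in the notebook source.
    for mount, volume in MOUNT_TO_VOLUME.items():
        source = re.sub(re.escape(mount), volume, source)
    return source
```

If one mount path is a prefix of another (e.g. /mnt/raw and /mnt/raw2), process the longer ones first to avoid partial replacements.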

Latest Reply
NaveenBedadala
New Contributor II
  • 2 kudos

@adityarai316, did you get a solution? I am facing the same issue.

5 More Replies
michaelh
by Databricks Partner
  • 6057 Views
  • 5 replies
  • 4 kudos

Resolved! AWS Databricks Cluster terminated.Reason:Container launch failure

We're developing a custom runtime for a Databricks cluster. We need to version and archive our clusters for a client. We made it run successfully in our own environment, but we're not able to make it work in the client's environment. It's a large corporation with...

Latest Reply
NandiniN
Databricks Employee
  • 4 kudos

This appears to be an issue with the security group. Kindly review security group inbound/outbound rules.

4 More Replies
franc_bomb
by New Contributor II
  • 2540 Views
  • 7 replies
  • 0 kudos

Cluster creation issue

Hello, I just started using the Databricks community version for learning purposes. I have been trying to create a cluster, but the first time it failed, asking me to retry or contact support, and now it's just running forever. What could be the problem?

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Can you please perform one test: check on the cloud provider whether you are able to start a node?

6 More Replies
leymariv
by New Contributor
  • 777 Views
  • 1 replies
  • 0 kudos

Performance issue writing an extract of a huge unpartitioned single-column dataframe

I have a huge df (40 billion rows) shared via Delta Sharing that has only one column, 'payload', which contains JSON and is not partitioned. Even if all those payloads are not the same, they have a common col sessionId that I need to extract to be a...

Latest Reply
hari-prasad
Valued Contributor II
  • 0 kudos

Hi @leymariv, you can check the schema of the data in the Delta Sharing table using df.printSchema() to better understand the JSON structure. Use the from_json function to flatten or normalize the data into respective columns. Additionally, you can understand how dat...
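The extraction step, sketched in plain Python: in Spark this would be from_json or get_json_object on the payload column. The field name sessionId comes from the question; the rest of the payload is illustrative.

```python
import json

def extract_session_id(payload: str):
    # Parse one JSON payload string and pull out the shared sessionId field;
    # returns None when the field is absent.
    return json.loads(payload).get("sessionId")
```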
