Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MauricioS
by New Contributor II
  • 388 Views
  • 3 replies
  • 2 kudos

Delta Live Tables - Dynamic Target Schema

Hi all, I have a requirement where I need to migrate a few jobs from standard Databricks notebooks that are orchestrated by Azure Data Factory to DLT pipelines; pretty straightforward so far. The tricky part is that the data tables in the catalog are...

Latest Reply
fmadeiro
Contributor
  • 2 kudos

@MauricioS Great question! Databricks Delta Live Tables (DLT) pipelines are very flexible, but by default the target schema specified in the pipeline configuration (such as target or schema) is fixed. That said, you can implement strategies to enable...

2 More Replies
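The reply above mentions parameterizing the pipeline rather than hard-coding the target. A minimal sketch of that idea, assuming a config lookup similar to spark.conf / the DLT pipeline configuration (the key names `pipeline.catalog` and `pipeline.schema` are hypothetical):

```python
# Sketch: derive a fully qualified target table name from pipeline
# configuration instead of hard-coding it. `conf` stands in for the
# pipeline configuration; the key names are hypothetical.
def resolve_target(conf, table):
    catalog = conf.get("pipeline.catalog", "dev_catalog")
    schema = conf.get("pipeline.schema", "default")
    return f"{catalog}.{schema}.{table}"
```

Switching environments then only requires changing the pipeline configuration, e.g. `resolve_target({"pipeline.catalog": "prod", "pipeline.schema": "sales"}, "contracts")` yields `prod.sales.contracts`.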
Jfoxyyc
by Valued Contributor
  • 4654 Views
  • 6 replies
  • 2 kudos

Is there a way to catch the cancel button or the interrupt button in a Databricks notebook?

I'm running the oracledb package, and it uses sessions. When you cancel a running query, it doesn't close the session, even with a try/catch block, because a cancel or interrupt issues a kill command on the process. Is there a method to catch the canc...

Latest Reply
gustavo_woiler
New Contributor II
  • 2 kudos

I was having the same issue and I think I was finally able to solve it! When you simply catch the KeyboardInterrupt signal and do not re-raise it, the notebook gets into an endless cycle of "interrupting..." and never does anything. However, ...

5 More Replies
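The approach described in the reply, close the session on interrupt and then re-raise so the notebook actually stops, can be sketched as follows. `FakeSession` is a hypothetical stand-in for an oracledb connection:

```python
# Sketch: ensure an oracledb-style session is closed even when the
# notebook's Cancel button raises KeyboardInterrupt. `FakeSession` is a
# hypothetical stand-in for an oracledb connection object.
class FakeSession:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def run_query(session, work):
    try:
        return work()
    except KeyboardInterrupt:
        # Re-raise so the notebook stops cleanly instead of looping
        # on "interrupting...". Swallowing it causes the hang above.
        raise
    finally:
        # Runs on success, error, and interrupt alike.
        session.close()
```

The key point from the reply is the re-raise: cleanup happens in `finally`, but the KeyboardInterrupt must still propagate.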
pranitha
by New Contributor II
  • 243 Views
  • 3 replies
  • 0 kudos

instance_id in compute.node_timelines

I am trying to fetch active worker nodes from system tables using code like the one below: select count(distinct instance_id) from system.compute.node_timelines where cluster_id = "xx" group by instance_id, start_time, end_time. It gives an output like 20, but...

Latest Reply
pranitha
New Contributor II
  • 0 kudos

Hi @Alberto_Umana, thanks for replying. Even if we add the driver node, it should be around 16-17, right, not 20? I checked all the clusters; for every cluster there is a difference of 5-7 nodes between the max_worker count and count(distinct insta...

2 More Replies
TejeshS
by New Contributor III
  • 350 Views
  • 3 replies
  • 0 kudos

Event-based alerts on certain events from the system audit tables

We need to implement an event-based trigger system that can detect any manual intervention performed by users. Upon detection of such an event, the system should automatically send a warning email. The events can be generated through DLT or other pro...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Unfortunately, the system events are only tracked via the system tables; the only option for more recent data is to re-execute the query each time it is needed.

2 More Replies
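The re-execution approach from the reply can be wrapped in a simple watermark poll. A sketch, assuming hypothetical `fetch_events` and `send_warning_email` callables standing in for the system-table query and the email step:

```python
# Sketch of the re-execution approach: poll an audit source for events
# newer than the last seen timestamp and alert on each. `fetch_events`
# and `send_warning_email` are hypothetical stand-ins for a system-table
# query and a mail client.
def poll_once(fetch_events, send_warning_email, last_seen):
    new_events = [e for e in fetch_events() if e["event_time"] > last_seen]
    for e in sorted(new_events, key=lambda e: e["event_time"]):
        send_warning_email(e)
    # Advance the watermark so the next poll skips already-alerted events.
    return max((e["event_time"] for e in new_events), default=last_seen)
```

Scheduling `poll_once` (e.g. as a periodic job) and persisting the returned watermark between runs approximates the event-based trigger the question asks for.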
thedatacrew
by New Contributor III
  • 430 Views
  • 6 replies
  • 0 kudos

Delta Live Tables - skipChangeCommits in SQL

Hi, could anyone tell me if the skipChangeCommits option is supported in SQL mode? I can use it successfully from Python, but it doesn't look like it is supported in SQL. It seems to be a glaring omission from the SQL support, or support for this will...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Unfortunately, there is no ETA on this yet. If I hear anything about it, I will let you know!

5 More Replies
kyrrewk
by New Contributor II
  • 245 Views
  • 3 replies
  • 0 kudos

Monitor progress when using databricks-connect

When using databricks-connect how can you monitor the progress? Ideally, we want something similar to what you get in the Databricks notebook, i.e., information about the jobs/stages. We are using Python.

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

I would suggest submitting this as a feature request through https://docs.databricks.com/en/resources/ideas.html#ideas

2 More Replies
matthiasn
by New Contributor III
  • 741 Views
  • 6 replies
  • 0 kudos

Resolved! Use temporary table credentials to access data in Databricks

Hi everybody, I tested the temporary table credentials API. It works great as long as I use the credentials outside of Databricks (e.g., in a local DuckDB instance). But as soon as I try to use the short-lived credentials (Azure SAS for me) in Databric...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Hello Matthias, many thanks for sharing this valuable information; it is great to hear your issue got resolved.

5 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 2801 Views
  • 2 replies
  • 3 kudos

Bridging the SQL-Python Gap

Python often edges out SQL with its metaprogramming capabilities. However, dbt bridges this gap with Jinja templates. Introducing simple "for" loops, especially for parameter iteration, in Databricks SQL could significantly enhance the user experienc...

Latest Reply
Greg_c
New Contributor II
  • 3 kudos

Was this solved, @Rajeev45? Do you have any docs?

1 More Replies
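The parameter iteration the post asks for can be emulated today by generating the SQL text from a host-language loop, which is essentially what dbt's Jinja templates do. A sketch with illustrative table and column names:

```python
# Sketch: the "for" loop over parameters that Jinja gives dbt, done in
# plain Python by generating SQL text. Table and column names here are
# illustrative, not from the original post.
def union_by_region(table, regions):
    selects = [
        f"SELECT '{r}' AS region, COUNT(*) AS n FROM {table} WHERE region = '{r}'"
        for r in regions
    ]
    return "\nUNION ALL\n".join(selects)
```

The generated string can then be handed to a SQL engine; native loop support in Databricks SQL would remove this code-generation step.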
sahasimran98
by New Contributor II
  • 370 Views
  • 3 replies
  • 0 kudos

Data Volume Read/Processed for a Databricks Workflow Job

Hello all, I have a Databricks instance hosted on Azure, and I am using Diagnostic Settings to collect Databricks Jobs-related logs in a Log Analytics workspace. So far, from the DatabricksJobs table in Azure Log Analytics, I am able to fetch basic job-rela...

Latest Reply
saurabh18cs
Valued Contributor III
  • 0 kudos

Hi @sahasimran98, I think you're right; this is more applicable to Synapse, where such a configuration exists, but you can still give it a try for Databricks and let us know the results here. Otherwise, try to find a spark-monitoring package on GitHub for Databr...

2 More Replies
Odoo_ERP
by New Contributor II
  • 2769 Views
  • 1 replies
  • 1 kudos

Odoo ERP customization: Odoo is one of the most popular ERP software packages. It is widely used by companies. Odoo customization mainly includes changing the sy...

Odoo ERP customization: Odoo is one of the most popular ERP software packages. It is widely used by companies. Odoo customization mainly includes changing the system by including new features and functionalities in accordance with the business needs of the clien...

Latest Reply
Odoo_ERPservice
New Contributor II
  • 1 kudos

Odoo ERP customization refers to the process of tailoring the Odoo software to better suit the specific needs and workflows of a business. As one of the most widely used ERP solutions, Odoo offers a modular approach that can be easily customized acro...

KosmaS
by New Contributor III
  • 1141 Views
  • 4 replies
  • 1 kudos

Skewness / Salting with countDistinct

Hey everyone, I experience data skewness for: df = (source_df .unionByName(source_df.withColumn("region", lit("Country"))) .groupBy("zip_code", "region", "device_type") .agg(countDistinct("device_id").alias("total_active_unique"), count("device_id").a...

Latest Reply
Avinash_Narala
Valued Contributor II
  • 1 kudos

You can make use of the Databricks-native feature Liquid Clustering: cluster by the columns you use in grouping statements, and it will handle the performance issues due to data skewness. For more information, please visit https://docs.dat...

3 More Replies
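Besides the Liquid Clustering suggestion above, a standard remedy for skewed distinct-count aggregations is a two-stage salted aggregation. Because the salt is derived from the distinct key itself, each `device_id` lands in exactly one bucket per group, so the partial distinct counts sum exactly. A plain-Python sketch of the idea (in Spark the two stages would be two `groupBy` passes; the function and field names are illustrative):

```python
from collections import defaultdict

# Sketch of a two-stage salted distinct count. Salting by the id itself
# partitions the id space, so per-salt distinct sets never overlap and
# their sizes can simply be summed in stage 2.
def salted_distinct_count(rows, key_fn, id_fn, num_salts=8):
    # Stage 1: distinct ids per (group key, salt bucket).
    buckets = defaultdict(set)
    for r in rows:
        salt = hash(id_fn(r)) % num_salts
        buckets[(key_fn(r), salt)].add(id_fn(r))
    # Stage 2: sum the partial distinct counts per group key.
    totals = defaultdict(int)
    for (key, _salt), ids in buckets.items():
        totals[key] += len(ids)
    return dict(totals)
```

The first stage spreads a hot group key over `num_salts` smaller buckets, which is what relieves the skewed partition in a distributed setting.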
garciargs
by New Contributor III
  • 323 Views
  • 2 replies
  • 2 kudos

Resolved! Incremental load from two tables

Hi, I am looking to build an ETL process for an incremental-load silver table. This silver table, let's say "contracts_silver", is built by joining two bronze tables, "contracts_raw" and "customer". contracts_silver: CONTRACT_ID, STATUS, CUSTOMER_NAME; 1, SIGNED, Pet...

Latest Reply
garciargs
New Contributor III
  • 2 kudos

Hi @hari-prasad, thank you! Will give it a try. Regards!

1 More Replies
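The accepted answer is truncated above, but the general incremental pattern, filtering changed bronze rows by a watermark and re-joining only those, can be sketched in plain Python (all column names are illustrative; in practice this would be a MERGE over Delta tables):

```python
# Sketch: rebuild only the silver rows whose contracts changed since the
# last load, joining them to the customer table for the name lookup.
def incremental_silver(contracts, customers, last_load_ts):
    by_id = {c["CUSTOMER_ID"]: c["CUSTOMER_NAME"] for c in customers}
    # Watermark filter: only contracts touched after the last load.
    changed = [c for c in contracts if c["UPDATED_AT"] > last_load_ts]
    return [
        {"CONTRACT_ID": c["CONTRACT_ID"], "STATUS": c["STATUS"],
         "CUSTOMER_NAME": by_id.get(c["CUSTOMER_ID"])}
        for c in changed
    ]
```

The returned rows would then be upserted into contracts_silver keyed on CONTRACT_ID, and the watermark advanced.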
ashraf1395
by Valued Contributor III
  • 252 Views
  • 1 replies
  • 1 kudos

Solution Design for an ingestion workflow with 1000s of tables for each source

Working on an ingestion workflow in Databricks which extracts data from on-prem sources, following all standard practices of incremental load, idempotency, upsert, schema evolution, etc., and storing data properly. Now we want to optimize t...

Latest Reply
Avinash_Narala
Valued Contributor II
  • 1 kudos

I did similar work in a recent project, where I needed to run many SQL DDLs, so I automated the process using Databricks Jobs: capturing the dependencies in a metadata table and creating the corresponding job tasks through the Jobs API, doing...

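The metadata-driven approach described in the reply can be sketched as follows. The task payload shape mirrors the Databricks Jobs API's `task_key`/`depends_on` fields, but the metadata row format and the helper itself are hypothetical:

```python
# Sketch: turn metadata rows (table name plus parent tables) into a
# Jobs-API-style task list with depends_on wiring. The row format is a
# hypothetical metadata-table schema.
def tasks_from_metadata(rows):
    tasks = []
    for row in rows:
        tasks.append({
            "task_key": row["table"],
            "depends_on": [{"task_key": p} for p in row.get("parents", [])],
        })
    return tasks
```

The resulting list can be embedded in a job definition submitted via the Jobs API, so adding a new table to the ingestion workflow only requires a new metadata row.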
adityarai316
by New Contributor III
  • 1319 Views
  • 6 replies
  • 2 kudos

Mount point in unity catalog

Hi everyone, in my existing notebooks we have used mount-point URLs like /mnt/, and we have more than 200 notebooks that use these URLs to fetch data/files from the container. Now, as we are upgrading to Unity Catalog, these URLs will no lon...

Latest Reply
NaveenBedadala
New Contributor II
  • 2 kudos

@adityarai316, did you find a solution? I am facing the same issue.

5 More Replies
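One common migration path (an assumption here, since the thread's resolution is not shown) is to centralize a path-rewriting helper so the 200+ notebooks can be updated mechanically. A sketch mapping `/mnt/<container>/...` to a Unity Catalog volume path, where the catalog/schema names and the one-volume-per-container layout are assumptions to adjust to your own volume design:

```python
# Sketch: rewrite a legacy /mnt/<container>/... path to a Unity Catalog
# volume path. The "main"/"landing" names and one-volume-per-container
# layout are assumptions, not the thread's actual setup.
def to_volume_path(mnt_path, catalog="main", schema="landing"):
    if not mnt_path.startswith("/mnt/"):
        raise ValueError(f"not a mount path: {mnt_path}")
    container, _, rest = mnt_path[len("/mnt/"):].partition("/")
    return f"/Volumes/{catalog}/{schema}/{container}/{rest}".rstrip("/")
```

With a helper like this in a shared module, the notebooks only need their literal `/mnt/...` strings wrapped, rather than 200 hand-edited paths.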
michaelh
by New Contributor III
  • 4263 Views
  • 5 replies
  • 4 kudos

Resolved! AWS Databricks Cluster terminated.Reason:Container launch failure

We're developing a custom runtime for a Databricks cluster. We need to version and archive our clusters for a client. We made it run successfully in our own environment, but we're not able to make it work in the client's environment. It's a large corporation with...

Latest Reply
NandiniN
Databricks Employee
  • 4 kudos

This appears to be an issue with the security group. Kindly review the security group's inbound/outbound rules.

4 More Replies
