Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

allancruz
by New Contributor
  • 1678 Views
  • 1 reply
  • 1 kudos

Resolved! Embedding Dashboards on Databricks Apps

Hi Team, I recently tried the Hello World template and embedded the <iframe> from the dashboard that I created. It worked fine until I added some code for a login form (built with Dash/Plotly) in front of the dashboard a...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

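For readers hitting the same wall, here is a minimal sketch of the pattern the question describes: a Dash login form that gates the embedded dashboard iframe. The embed URL, element IDs, and credential check are placeholders (not a Databricks-confirmed recipe); swap in real authentication before using it.

    # Minimal sketch: gate a dashboard <iframe> behind a Dash login form.
    import dash
    from dash import dcc, html, Input, Output, State

    EMBED_URL = "https://<workspace-host>/embed/dashboardsv3/<dashboard-id>"  # placeholder

    app = dash.Dash(__name__)
    app.layout = html.Div([
        dcc.Input(id="user", placeholder="user"),
        dcc.Input(id="pw", type="password", placeholder="password"),
        html.Button("Log in", id="login-btn"),
        html.Div(id="content"),
    ])

    @app.callback(
        Output("content", "children"),
        Input("login-btn", "n_clicks"),
        State("user", "value"),
        State("pw", "value"),
        prevent_initial_call=True,
    )
    def show_dashboard(n_clicks, user, pw):
        # Placeholder check -- replace with real authentication.
        if user == "demo" and pw == "demo":
            return html.Iframe(src=EMBED_URL,
                               style={"width": "100%", "height": "800px"})
        return html.Div("Invalid credentials")

    if __name__ == "__main__":
        app.run()  # Dash >= 2.17; older versions use app.run_server()
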
databricksdata
by New Contributor
  • 1596 Views
  • 1 reply
  • 1 kudos

Resolved! Assistance Required with Auto Liquid Clustering Implementation Challenges

Hi Databricks Team, We are currently implementing Auto Liquid Clustering (ALC) on our Delta tables as part of our data optimization efforts. During this process, we have encountered several challenges and would appreciate your guidance on best practic...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

To implement Auto Liquid Clustering (ALC) on Delta tables in Databricks, especially when transitioning from external partitioned tables to unpartitioned managed tables, a careful and ordered process is crucial to avoid data duplication and ensure con...

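For context on the reply above, enabling ALC itself is a short DDL once the managed table exists. A hedged sketch, assuming a Unity Catalog managed table on a runtime that supports CLUSTER BY AUTO (the table name is a placeholder):

    # Enable automatic liquid clustering on a managed Delta table.
    spark.sql("CREATE TABLE IF NOT EXISTS main.sales.events (id BIGINT, ts TIMESTAMP) CLUSTER BY AUTO")
    # Or switch an existing managed table over:
    spark.sql("ALTER TABLE main.sales.events CLUSTER BY AUTO")
    # Optional one-off layout pass after a large backfill:
    spark.sql("OPTIMIZE main.sales.events")
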
saicharandeepb
by New Contributor III
  • 1831 Views
  • 1 reply
  • 1 kudos

Resolved! Understanding High I/O Wait Despite High CPU Utilization in system.compute Metrics

Hi everyone, I'm working on building a hardware metrics dashboard using the system.compute schema in Databricks, specifically leveraging the cluster, node_type, and node_timeline tables. While analyzing the data, I came across something that seems cont...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

Your observation highlights a subtlety in interpreting CPU metrics, especially in distributed environments like Databricks, where cluster- and node-level behavior can diverge from typical single-server intuition. Direct answer: no, seeing both high cp...

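A sketch of the kind of query behind such a dashboard, assuming the documented system.compute.node_timeline columns (verify the schema on your workspace):

    # Compare user/system CPU with I/O wait per cluster over the last day.
    df = spark.sql("""
        SELECT cluster_id,
               date_trunc('hour', start_time) AS hour,
               avg(cpu_user_percent)          AS cpu_user,
               avg(cpu_system_percent)        AS cpu_system,
               avg(cpu_wait_percent)          AS cpu_iowait
        FROM system.compute.node_timeline
        WHERE start_time >= current_timestamp() - INTERVAL 1 DAY
        GROUP BY cluster_id, date_trunc('hour', start_time)
    """)
    df.show()
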
piotrsofts
by New Contributor III
  • 1549 Views
  • 1 reply
  • 0 kudos

Resolved! LakeFlow Connect -> GA4 - creation of Liquid Clustered stream table

Hello! While creating a new data ingestion from GA4, can we set up Liquid Clustering (either manual or automatic) on the destination table that will contain the data fetched from GA4?

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, in Databricks, it is possible to set up Liquid Clustering—both manual and automatic—on destination tables that store data ingested from Google Analytics 4 (GA4). This feature significantly improves table management and query performance compared...

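To make the reply concrete, a hedged sketch of clustering an ingested GA4 destination table after creation (table and column names are placeholders; whether clustering can be set at creation time depends on the LakeFlow Connect release):

    # Manual clustering keys on the destination table; CLUSTER BY AUTO also works.
    spark.sql("ALTER TABLE main.analytics.ga4_events CLUSTER BY (event_date, event_name)")
    spark.sql("OPTIMIZE main.analytics.ga4_events")
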
IONA
by New Contributor III
  • 155 Views
  • 1 reply
  • 0 kudos

DABs (Databricks Asset Bundles)

Hi! I am relatively new to DABs, but getting on quite well. I have managed to deploy both a job that uses a notebook defined in the bundle itself and a job that points to a notebook living in an Azure DevOps git repo. While these are two viable solutio...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @IONA, you need to add a step to your CD pipeline to copy the notebook:

- checkout: self
- script: |
    cp path/to/notebook_in_repo/notebook.py .bundle/notebook.py
  displayName: 'Copy notebook into bundle'
- script: |
    databricks bundle deploy
  displayName: ...

fjrodriguez
by New Contributor III
  • 297 Views
  • 1 reply
  • 1 kudos

Resolved! Ingestion Framework

I would like to update my ingestion framework, which is orchestrated by ADF, runs a couple of Databricks notebooks, and copies the data to the DB afterwards. I want to rely on Databricks for everything; I thought this could be the design: Step 1. Expose target t...

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Hi @fjrodriguez, your assumptions are correct. 1) If you want to query or write to Azure SQL DB directly from Databricks SQL (using Unity Catalog), you need to create an External Connection in Unity Catalog and then define External Tables that point to...

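A sketch of the Unity Catalog objects that reply refers to, using Lakehouse Federation syntax (connection, secret scope, catalog, and table names are placeholders):

    # 1) Connection to Azure SQL, 2) foreign catalog over it, 3) query through it.
    spark.sql("""
        CREATE CONNECTION IF NOT EXISTS azure_sql_conn TYPE sqlserver
        OPTIONS (
          host 'myserver.database.windows.net',
          port '1433',
          user secret('my_scope', 'sql-user'),
          password secret('my_scope', 'sql-password')
        )
    """)
    spark.sql("""
        CREATE FOREIGN CATALOG IF NOT EXISTS azsql
        USING CONNECTION azure_sql_conn
        OPTIONS (database 'mydb')
    """)
    spark.sql("SELECT * FROM azsql.dbo.target_table LIMIT 5").show()
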
BenLambert
by Contributor
  • 3845 Views
  • 2 replies
  • 0 kudos

How to deal with deleted files in source directory in DLT?

We have a DLT pipeline that uses Auto Loader to detect files added to a source storage bucket. It reads these updated files and adds new records to a bronze streaming table. However, we would also like to automatically delete records from the bronz...

Latest Reply
boitumelodikoko
Valued Contributor
  • 0 kudos

I am looking for a solution to use with DLTs

1 More Reply
shashankB
by New Contributor III
  • 256 Views
  • 2 replies
  • 4 kudos

Resolved! How to invoke Databricks AI Assistant from a notebook cell?

Hello Community, I am exploring the Databricks AI Assistant and wondering if there is a way to invoke or interact with it directly from a notebook cell instead of using the workspace sidebar UI. Is there any built-in command (like %assistant) to open o...

Latest Reply
nayan_wylde
Honored Contributor III
  • 4 kudos

@shashankB No command like %assistant exists today for interacting with the Databricks Assistant. As @szymon_dybczak mentioned in the reply, those are the existing modes in which you can interact with the Assistant today. Also, there is no published Assistant-specific ...

1 More Reply
getsome
by New Contributor
  • 1948 Views
  • 1 reply
  • 1 kudos

Resolved! How to Efficiently Sync MLflow Traces and Asynchronous User Feedback with a Delta Table

I’m building a custom UI table (using Next.js and FastAPI) to display MLflow trace data from a Retrieval-Augmented Generation (RAG) application running on Databricks Managed MLflow 3.0. The table needs to show answer generation speed (from CHAT_MODEL...

Latest Reply
sarahbhord
Databricks Employee
  • 1 kudos

Hello! Here are the answers to your questions:
- Yes! See Databricks managed MLflow tracing; enable production monitoring or endpoint config to collect traces in a Delta table.
- We have example code for implementing async feedback collection.
- Definit...

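A hedged sketch of the trace-sync half, assuming MLflow 3.x on Databricks; the experiment ID, column selection, and table name are placeholders, and the exact trace columns vary by MLflow version:

    # Pull traces into pandas, keep scalar columns, and append them to Delta.
    import mlflow

    traces_pdf = mlflow.search_traces(experiment_ids=["<experiment-id>"])
    cols = ["request", "response", "execution_time_ms", "status"]  # inspect traces_pdf.columns first
    traces_df = spark.createDataFrame(traces_pdf[cols].astype(str))
    traces_df.write.format("delta").mode("append").saveAsTable("main.rag.mlflow_traces")
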
elgeo
by Valued Contributor II
  • 6076 Views
  • 3 replies
  • 2 kudos

SQL WHILE ... DO loops

Hello. Could you please suggest a workaround for a WHILE ... DO loop in Databricks SQL?

WHILE LSTART > 0 DO
  SET LSTRING = CONCAT(LSTRING, VSTRING2)

Thank you in advance.

Latest Reply
nayan_wylde
Honored Contributor III
  • 2 kudos

@elgeo Here are two alternatives. 1. Use a recursive CTE:

WITH RECURSIVE loop_cte (lstart, lstring) AS (
  SELECT 5 AS lstart, '' AS lstring
  UNION ALL
  SELECT lstart - 1, CONCAT(lstring, 'VSTRING2')
  FROM loop_cte
  WHERE lstart > 1
)
SELECT * FROM ...

2 More Replies
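For reference, a complete, runnable version of the recursive-CTE alternative sketched in that reply (literal values are illustrative; requires a runtime with recursive CTE support):

    spark.sql("""
        WITH RECURSIVE loop_cte (lstart, lstring) AS (
          SELECT 5, CAST('' AS STRING)
          UNION ALL
          SELECT lstart - 1, CONCAT(lstring, 'VSTRING2')
          FROM loop_cte
          WHERE lstart > 1
        )
        SELECT * FROM loop_cte
    """).show()
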
ashishasr
by New Contributor II
  • 266 Views
  • 1 reply
  • 1 kudos

SQL Stored Procedure in Databricks

Hello, is there a SQL Server-equivalent stored procedure in Databricks that supports a WHILE loop along with a delay, as below? Or is there another alternative to achieve the same?

while (select count(*) from schema.mart_daily with (nolock)) = 0
begin ...

Data Engineering
Databricks
DML
sql
stored procedure
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @ashishasr, yes, support for stored procedures is in public preview: CREATE PROCEDURE | Databricks on AWS. In the definition of a stored procedure you can use a compound statement (SQL compound statement, BEGIN ... END) with the definition of the SQL Pr...

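Until the stored-procedure preview is available on your runtime, the same poll-with-delay logic can live in a notebook cell. A plain Python sketch (the table name is taken from the question; the WAITFOR-style delay becomes time.sleep):

    import time

    # Re-check every 60 seconds until rows appear, mirroring the T-SQL loop.
    while spark.sql("SELECT COUNT(*) AS c FROM schema.mart_daily").first()["c"] == 0:
        time.sleep(60)
    # ...continue once data has arrived
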
AanchalSoni
by New Contributor III
  • 459 Views
  • 9 replies
  • 6 kudos

Streaming: results not getting updated on arrival of new files

Hi! I'm trying to stream some files using read_files.format("cloudFiles"). However, when new files arrive, the subsequent SQL query and monitoring graphs are not getting updated. Please suggest.

Latest Reply
saurabh18cs
Honored Contributor II
  • 6 kudos

Hi @AanchalSoni, if you are using .readStream, make sure you have set a trigger interval (e.g., .trigger(processingTime='1 minute')).

8 More Replies
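A minimal Auto Loader stream matching that advice; the paths, schema location, and table name are placeholders:

    (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "/Volumes/main/default/schemas/demo")
        .load("/Volumes/main/default/landing/")
        .writeStream
        .option("checkpointLocation", "/Volumes/main/default/checkpoints/demo")
        .trigger(processingTime="1 minute")
        .toTable("main.default.bronze_events"))
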
shweta_m
by New Contributor III
  • 1280 Views
  • 3 replies
  • 4 kudos

Resolved! Best Practices for Managing ACLs on Jobs and Job Clusters in Databricks

Hi all, I'm setting up access control for Databricks jobs and have two questions: Ephemeral job clusters: since job clusters are created at runtime, is it best practice to set ACLs on the job itself? The /api/2.0/permissions/clusters/{cluster_id} endp...

Latest Reply
shweta_m
New Contributor III
  • 4 kudos

Thanks! @juan_maedo @saurabh18cs 

2 More Replies
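For readers landing here, a hedged sketch of the job-level approach discussed in the thread, using the Jobs permissions REST endpoint (host, token, job ID, and group name are placeholders):

    import requests

    host, token, job_id = "https://<workspace-host>", "<token>", "123"
    resp = requests.put(
        f"{host}/api/2.0/permissions/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
        json={"access_control_list": [
            {"group_name": "data-engineers", "permission_level": "CAN_MANAGE_RUN"},
        ]},
    )
    resp.raise_for_status()
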
susmitsircar
by New Contributor III
  • 412 Views
  • 9 replies
  • 0 kudos

Proposal: Switch to Zstd Compression for Parquet to Reduce S3 Costs

We are thinking of changing the Spark configuration for Parquet files to use zstd compression. Configuration: spark.sql.parquet.compression.codec = zstd. This will only affect new data written by our Spark jobs; all existing data will remain compressed wi...

Latest Reply
susmitsircar
New Contributor III
  • 0 kudos

Yes, my belief is that it should support 7.3 LTS as well; we will prove it with thorough testing. Thanks for the discussion. Cheers.

8 More Replies
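The change under discussion is a single Spark config; a quick sketch (the output path is a placeholder):

    # New Parquet files are written with zstd; existing files keep their original
    # codec and stay readable, since the codec is recorded per file.
    spark.conf.set("spark.sql.parquet.compression.codec", "zstd")
    spark.range(1000).write.mode("overwrite").parquet("/tmp/zstd_demo")
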
parthesh24
by New Contributor II
  • 387 Views
  • 3 replies
  • 3 kudos

Resolved! from pyspark.ml.stat import KolmogorovSmirnovTest is not working on Serverless compute.

Hi everyone, I am trying to run a Kolmogorov–Smirnov (KS) test on a Spark DataFrame column in Databricks using the built-in pyspark.ml.stat.KolmogorovSmirnovTest. The goal is to apply the KS test directly on Spark DataFrame data without converting it...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @parthesh24, it looks like the KolmogorovSmirnovTest module is trying to access the SparkContext under the hood, which is not supported on serverless compute. You can check this yourself by trying to use sparkContext on serverless.

2 More Replies
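Given that constraint, one common workaround is to collect the column and run the test locally. A sketch using scipy (df and the value column are placeholders; this only suits columns small enough to fit in driver memory):

    from scipy import stats

    sample = df.select("value").toPandas()["value"]
    statistic, pvalue = stats.kstest(sample, "norm")  # test against a standard normal
    print(statistic, pvalue)
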
