Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bart_DE
by New Contributor II
  • 396 Views
  • 1 reply
  • 0 kudos

Resolved! Databricks Asset Bundle conditional job cluster size?

Hey folks, can someone please suggest if there is a way to spawn a job cluster of a given size when a parameter of the job invocation (e.g. file_name) contains a desired value? I have a job which 90% of the time deals with very small files, but the remai...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

Hi @Bart_DE No — a single job.yml file can’t “look inside” a parameter like file_name and then decide to spin up a different job-cluster size on the fly. Job-cluster definitions in Databricks Workflows (Jobs) are static. All the heavy lifting has to b...
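Building on that answer, one workable pattern is to define two jobs in the bundle (one with a small job cluster, one with a large one) and have a thin dispatcher pick which job to trigger based on the parameter. A minimal sketch with the Databricks Python SDK; the job IDs and the routing rule are placeholders, not values from this thread:

from databricks.sdk import WorkspaceClient

SMALL_JOB_ID = 111  # placeholder: bundle job defined with a small job cluster
LARGE_JOB_ID = 222  # placeholder: same task, defined with a large job cluster

def dispatch(file_name: str) -> None:
    w = WorkspaceClient()
    # Route on the invocation parameter instead of resizing one static job.
    job_id = LARGE_JOB_ID if "big" in file_name else SMALL_JOB_ID
    w.jobs.run_now(job_id=job_id, job_parameters={"file_name": file_name})

dispatch("daily_extract_big.csv")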

Vasu_Kumar_T
by New Contributor II
  • 212 Views
  • 1 reply
  • 0 kudos

Job performance issue: Configurations

Hello All, one job is taking more than 7 hrs; when we added the configuration below it takes <2:30 mins, but after deployment with the same parameters it again takes 7+ hrs. 1) spark.conf.set("spark.sql.shuffle.partitions", 500) --> spark.conf.set("spark.sql.s...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

Hi @Vasu_Kumar_T This is a classic Spark performance inconsistency issue. The fact that it works fine in your notebook but degrades after deployment suggests several potential causes. Here are the most likely culprits and solutions: Primary Suspects 1. ...
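One common fix along these lines is to set the shuffle configuration inside the job code itself, so the deployed run gets the same session settings as the interactive notebook (assuming the ambient spark session available in Databricks). The values are illustrative, not tuned recommendations:

# Pin the setting in the job itself rather than relying on notebook session state.
spark.conf.set("spark.sql.shuffle.partitions", "500")
# AQE can coalesce shuffle partitions at runtime, which often evens out
# notebook-vs-deployed differences.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")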

Mahtab67
by New Contributor
  • 326 Views
  • 1 reply
  • 0 kudos

Spark Kafka Client Not Using Certs from Default truststore

Hi Team, I'm working on connecting Databricks to an external Kafka cluster secured with SASL_SSL (SCRAM-SHA-512 + certificate trust). We've encountered an issue where certificates imported into the default JVM truststore (cacerts) via an init script ...

Latest Reply
lingareddy_Alva
Honored Contributor II
  • 0 kudos

Hi @Mahtab67 This is a common issue with Databricks and Kafka SSL connectivity. The problem stems from how Spark's Kafka connector handles SSL context initialization versus the JVM's default truststore. Root Cause Analysis: The Spark Kafka connector cre...
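A common workaround is to stop relying on the JVM default cacerts and hand the truststore to the connector explicitly through kafka.* options. A hedged sketch: broker address, topic, secret scope and file paths are placeholders, and the kafkashaded prefix on the SCRAM login module reflects Databricks' shaded Kafka client:

truststore_pw = dbutils.secrets.get("kafka-scope", "truststore-password")  # placeholder scope/key
jaas = (
    'kafkashaded.org.apache.kafka.common.security.scram.ScramLoginModule required '
    'username="{}" password="{}";'
).format(
    dbutils.secrets.get("kafka-scope", "user"),
    dbutils.secrets.get("kafka-scope", "password"),
)

df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1.example.com:9093")  # placeholder
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "SCRAM-SHA-512")
    .option("kafka.sasl.jaas.config", jaas)
    # Point the connector at the truststore directly instead of JVM cacerts.
    .option("kafka.ssl.truststore.location", "/dbfs/certs/kafka-truststore.jks")  # placeholder path
    .option("kafka.ssl.truststore.password", truststore_pw)
    .option("subscribe", "my_topic")  # placeholder
    .load()
)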

Sainath368
by New Contributor III
  • 289 Views
  • 1 reply
  • 0 kudos

COMPUTE DELTA STATISTICS vs COMPUTE STATISTICS - Data Skipping

Hi all, I recently altered the data skipping stats columns on my Delta Lake table to optimize data skipping. Now, I’m wondering about the best practice for updating statistics: Is running ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS sufficient a...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @Sainath368! Running ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS is a good practice after modifying data skipping stats columns on a Delta Lake table. However, this command doesn’t update query optimizer stats. For that, you’ll need to ...
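For reference, a minimal sketch of running both commands (the table name is a placeholder): DELTA STATISTICS recomputes the per-file min/max stats used for data skipping, while COMPUTE STATISTICS refreshes the table-level stats the query optimizer uses.

# Recompute file-level stats for data skipping after changing stats columns.
spark.sql("ANALYZE TABLE my_catalog.my_schema.my_table COMPUTE DELTA STATISTICS")
# Refresh optimizer statistics (row counts, column-level stats).
spark.sql("ANALYZE TABLE my_catalog.my_schema.my_table COMPUTE STATISTICS FOR ALL COLUMNS")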

Miloud_G
by New Contributor III
  • 543 Views
  • 2 replies
  • 2 kudos

Resolved! Issue on databricks bundle deploy

Hi
I am trying to configure Databricks Asset Bundles, but got an error on deployment:
Databricks bundle init ----------- OK
Databricks bundle validate ----- OK
Databricks bundle deploy ------ Fail
Error: PS C:\Databricks_DABs\DABs_Init\DABS_Init> databricks b...

Latest Reply
Miloud_G
New Contributor III
  • 2 kudos

Thank you Advika. I was able to enable workspace files with this script:
from databricks.sdk.core import ApiClient
client = ApiClient()
client.do("PATCH", "/api/2.0/workspace-conf", body={"enableWorkspaceFilesystem": "true"}, headers={"Content-Type": "applica...
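For readers landing here, a runnable version of that snippet (the original post is cut off; the Content-Type value is assumed to be application/json):

from databricks.sdk.core import ApiClient

client = ApiClient()
client.do(
    "PATCH",
    "/api/2.0/workspace-conf",
    body={"enableWorkspaceFilesystem": "true"},
    headers={"Content-Type": "application/json"},  # assumed; the original post is truncated here
)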

1 More Reply
Aidonis
by New Contributor III
  • 20789 Views
  • 3 replies
  • 4 kudos

Resolved! Load Data from Sharepoint Site to Delta table in Databricks

Hi, new to the community so sorry if my post lacks detail. I am trying to create a connection between Databricks and a SharePoint site to read Excel files into a Delta table. I can see there is a FiveTran partner connection that we can use to get sharepo...

Latest Reply
gaurav_singh_14
New Contributor II
  • 4 kudos

@Ajay-Pandey can we connect using a user ID, without using a client ID and secrets?

2 More Replies
turagittech
by New Contributor III
  • 224 Views
  • 1 reply
  • 0 kudos

DLT pipeline (Python): stop scanning all databases in source

Hi All, I have set up a DLT pipeline for SQL Server to use CDC as per this instruction: https://learn.microsoft.com/en-us/azure/databricks/ingestion/lakeflow-connect/sql-server-pipeline I have it working in principle; however, it scans all databases a...

Latest Reply
Renu_
Contributor III
  • 0 kudos

Hi @turagittech, to prevent the Databricks CDC pipeline from scanning all databases on your SQL Server, try setting up a new account with read access only to the specific database. Just ensure this account doesn’t have permissions to any other databa...

ankit001mittal
by New Contributor III
  • 278 Views
  • 1 reply
  • 0 kudos

How to stop SQL AI Functions usage

Hi guys, recently Databricks came up with a new feature, SQL AI Functions. Is there a way to stop users from using it without downgrading the runtime on the cluster, e.g. by using policies? Also, is there a way to stop users from using serverless, before there w...

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @ankit001mittal! Currently, there's no direct way to disable SQL AI Functions in Databricks. To restrict the use of serverless compute, you can set up serverless budget policies that allow you to monitor and limit usage to some extent. However,...

Divya_Bhadauria
by New Contributor III
  • 11427 Views
  • 6 replies
  • 2 kudos

Unable to run python script from git repo in Databricks job

I'm getting a "cannot read python file" error when running this job, which is configured to run a Python script from a git repo. Run result unavailable: run failed with error message Cannot read the python file /Repos/.internal/7c39d645692_commits/ff669d089cd8f93e9...

Latest Reply
SakthiGanesh
New Contributor II
  • 2 kudos

Hi @Divya_Bhadauria, I'm facing the same internal commit issue on my end. I didn't give any internal path in the Databricks workflow; I set the source to Azure DevOps Services with a branch name. But when I ran the workflow, it gave the below error a...

5 More Replies
amarnathpal
by New Contributor III
  • 422 Views
  • 4 replies
  • 0 kudos

Adding a New Column for Updated Date in Pipeline

I've successfully set up my pipeline and everything is working fine. I'd like to add a new column to our table that records the date whenever any record gets updated. Could you advise on how to go about this?

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

Do you want to add dates for the historical data as well?
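For rows written going forward, a minimal sketch is to stamp each batch with the current timestamp as it lands; the table and column names below are placeholders:

from pyspark.sql.functions import current_timestamp

source_df = spark.read.table("my_schema.staging_table")  # placeholder source
updated = source_df.withColumn("updated_at", current_timestamp())
updated.write.format("delta").mode("append").saveAsTable("my_schema.target_table")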

3 More Replies
Ramakrishnan83
by New Contributor III
  • 2957 Views
  • 2 replies
  • 0 kudos

Optimize and Vacuum Commands

Hi team, I am running a weekly purge process from Databricks notebooks that cleans up chunks of records from my tables used for audit purposes. The tables are external tables. I need clarification on the items below: 1. Do I need to run Optimize and Vacuum c...

Latest Reply
JaimeAnders
New Contributor II
  • 0 kudos

That's a valid point about minimal read queries! However, while immediate storage reduction might not be necessary, consistent data integrity and potential future reporting needs might still warrant occasional optimize and vacuuming, even with extern...
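For a weekly maintenance pass like the one described, a minimal sketch (the table name is a placeholder; 168 hours is the default VACUUM retention window, shorten it only with care):

# Compact small files left behind by the purge.
spark.sql("OPTIMIZE my_schema.audit_table")
# Remove files no longer referenced by the table and older than 7 days.
spark.sql("VACUUM my_schema.audit_table RETAIN 168 HOURS")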

1 More Reply
chiruinfo5262
by New Contributor II
  • 297 Views
  • 3 replies
  • 0 kudos

Trying to convert Oracle SQL to Databricks SQL but not getting the desired output

ORACLE SQL:
COUNT( CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN SELECTED_PERIOD_START_DATE AND SELECTED_PERIOD_END_DATE THEN 1 END ) SELECTED_PERIOD_BM,
COUNT( CASE WHEN TRUNC(WORKORDER.REPORTDATE) BETWEEN COMPARISON_PERIOD_START_DATE AND COMPARISON_...

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

Please review this for the reference: https://www.databricks.com/blog/how-migrate-your-oracle-plsql-code-databricks-lakehouse-platform
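On the TRUNC point specifically: Oracle's TRUNC(date) drops the time portion, which in Databricks SQL maps to CAST(... AS DATE) (or date_trunc('DAY', ...)). A hedged sketch of the fragment above; the original post elides how the period-date columns are supplied, so the join here is only a placeholder:

spark.sql("""
  SELECT
    COUNT(CASE WHEN CAST(w.REPORTDATE AS DATE)
               BETWEEN p.SELECTED_PERIOD_START_DATE AND p.SELECTED_PERIOD_END_DATE
               THEN 1 END) AS SELECTED_PERIOD_BM,
    COUNT(CASE WHEN CAST(w.REPORTDATE AS DATE)
               BETWEEN p.COMPARISON_PERIOD_START_DATE AND p.COMPARISON_PERIOD_END_DATE
               THEN 1 END) AS COMPARISON_PERIOD_BM
  FROM WORKORDER w
  CROSS JOIN period_params p  -- placeholder for however the period dates are supplied
""")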

2 More Replies
jeremy98
by Honored Contributor
  • 1399 Views
  • 6 replies
  • 2 kudos

Resolved! Get workflow metadata of a Databricks job

Hello community, is it possible to get the workflow metadata of a Databricks job that is running, like the start time, end time, triggered by, etc.? Using dbutils.widgets.get()?

Latest Reply
Juan_Cardona
New Contributor II
  • 2 kudos

Now the best practice for this is not using the API (some functions were deprecated for this purpose); instead, you should use job parameters: job_id = dbutils.widgets.get("job parameter name with job_id") job_run = dbutils.widgets.get("job parameter ...
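Concretely, the job can define parameters whose values are dynamic value references ({{job.id}}, {{job.run_id}}, {{job.start_time.iso_datetime}}, {{job.trigger.type}}), which the task then reads with widgets. A minimal sketch; the widget/parameter names are whatever you choose in the job configuration:

# Each widget name must match a job parameter whose value is a dynamic value reference.
job_id = dbutils.widgets.get("job_id")          # configured as {{job.id}}
run_id = dbutils.widgets.get("run_id")          # configured as {{job.run_id}}
start_time = dbutils.widgets.get("start_time")  # configured as {{job.start_time.iso_datetime}}
triggered_by = dbutils.widgets.get("trigger")   # configured as {{job.trigger.type}}
print(job_id, run_id, start_time, triggered_by)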

5 More Replies
eballinger
by Contributor
  • 749 Views
  • 5 replies
  • 0 kudos

Email notification to end users

Is there a way we can notify all of our Databricks end users by email when there is an issue? We currently have our jobs set up to notify the technical team when a job workflow fails. That part works fine. But we would like the ability to maybe us...

Latest Reply
eballinger
Contributor
  • 0 kudos

Thanks LRALVA & Isi, I like both of your suggestions. I did look into making my own notebook using smtplib but stopped because I do not know of any open SMTP server or cloud email service in the Azure cloud environment. This is why I was hoping to levera...
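For anyone who does have an SMTP relay available (for example an internal relay or an Azure Communication Services SMTP endpoint), a hedged smtplib sketch; host, port, credentials and addresses are all placeholders:

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Databricks: data refresh delayed"
msg["From"] = "noreply@example.com"
msg["To"] = "all-users@example.com"
msg.set_content("Today's load failed; dashboards may show stale data.")

with smtplib.SMTP("smtp.example.com", 587) as server:  # placeholder relay
    server.starttls()
    server.login("smtp_user", "smtp_password")  # placeholder credentials
    server.send_message(msg)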

4 More Replies
Ankit_Kothiya
by New Contributor II
  • 638 Views
  • 2 replies
  • 1 kudos

Databricks JDBC Driver Version 42 Limitations

We found that the Databricks JDBC driver does not support:
  • Connection.setAutoCommit(false)
  • Connection.commit()
  • Connection.rollback()
  • Execution of BEGIN TRANSACTION
Can you help us understand why these operations are not supported by the Databricks JDBC dr...

Latest Reply
Ankit_Kothiya
New Contributor II
  • 1 kudos

Thank you, @SP_6721, for your input! Could you please share an example snippet demonstrating how to handle batch processing, similar to what we typically do in a relational database?
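Not a substitute for transactions, but as one batching pattern from Python (rather than JDBC), the databricks-sql-connector (PEP 249) exposes executemany, which replays a parameterized INSERT per row. A hedged sketch; connection details and the table are placeholders, and the named-parameter syntax assumes a recent connector version:

from databricks import sql

rows = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}, {"id": 3, "val": "c"}]

with sql.connect(
    server_hostname="...",  # placeholder
    http_path="...",        # placeholder
    access_token="...",     # placeholder
) as conn:
    with conn.cursor() as cursor:
        # Sends the same parameterized statement once per row dict.
        cursor.executemany(
            "INSERT INTO my_schema.my_table (id, val) VALUES (:id, :val)",
            rows,
        )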

1 More Reply
