Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

Vasu_Kumar_T
by Databricks Partner
  • 586 Views
  • 1 reply
  • 0 kudos

Job performance issue : Configurations

Hello All, one job is taking more than 7 hrs; when we added the configuration below it took <2:30 mins, but after deployment with the same parameters it is again taking 7+ hrs. 1) spark.conf.set("spark.sql.shuffle.partitions", 500) --> spark.conf.set("spark.sql.s...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Vasu_Kumar_T This is a classic Spark performance inconsistency issue. The fact that it works fine in your notebook but degrades after deployment suggests several potential causes. Here are the most likely culprits and solutions: Primary Suspects: 1. ...

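One common reason a tuning that works in a notebook "disappears" after deployment is that `spark.conf.set` calls made interactively are not carried into the deployed job's cluster; the value has to be pinned in the job cluster's `spark_conf`. As a side note, the poster's value of 500 partitions is consistent with a widely used sizing heuristic, sketched below. The function and its 128 MB target are illustrative assumptions, not a Databricks API:

```python
# Illustrative heuristic (an assumption, not a Databricks API) for sizing
# spark.sql.shuffle.partitions: aim for roughly target_mb of shuffle data
# per partition, never going below Spark's default of 200.

def recommend_shuffle_partitions(shuffle_gb: float, target_mb: int = 128) -> int:
    """Return a partition count targeting ~target_mb per shuffle partition."""
    partitions = int(shuffle_gb * 1024 / target_mb)
    return max(200, partitions)

# ~62.5 GB of shuffle data at 128 MB per partition:
print(recommend_shuffle_partitions(62.5))  # -> 500
```

To make the setting survive deployment, the same value would go into the job definition (e.g. `"spark_conf": {"spark.sql.shuffle.partitions": "500"}` on the job cluster) rather than into a notebook cell.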
Mahtab67
by New Contributor
  • 1880 Views
  • 1 reply
  • 0 kudos

Spark Kafka Client Not Using Certs from Default truststore

Hi Team, I'm working on connecting Databricks to an external Kafka cluster secured with SASL_SSL (SCRAM-SHA-512 + certificate trust). We've encountered an issue where certificates imported into the default JVM truststore (cacerts) via an init script ...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Mahtab67 This is a common issue with Databricks and Kafka SSL connectivity. The problem stems from how Spark's Kafka connector handles SSL context initialization versus the JVM's default truststore. Root Cause Analysis: The Spark Kafka connector cre...

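Because the connector builds its own SSL context, a common workaround is to hand it the truststore explicitly through `kafka.*` options instead of relying on the JVM default `cacerts`. A minimal sketch of such an options map follows; every host, path, username, and password in it is a hypothetical placeholder, and the shaded `kafkashaded.` class prefix is what Databricks runtimes typically require:

```python
# Hedged sketch: pass the truststore explicitly to the Kafka source rather
# than relying on the JVM default cacerts. All hosts, paths, and secrets
# below are hypothetical placeholders.
kafka_options = {
    "kafka.bootstrap.servers": "broker1.example.com:9093",  # placeholder
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "SCRAM-SHA-512",
    "kafka.sasl.jaas.config": (
        "kafkashaded.org.apache.kafka.common.security.scram.ScramLoginModule "
        'required username="user" password="secret";'  # placeholders
    ),
    # Point the connector at the truststore the init script populated:
    "kafka.ssl.truststore.location": "/databricks/certs/truststore.jks",
    "kafka.ssl.truststore.password": "changeit",  # placeholder
    "subscribe": "events",  # placeholder topic
}

# In a Databricks notebook these options would feed the streaming reader:
# df = spark.readStream.format("kafka").options(**kafka_options).load()
```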
Sainath368
by Contributor
  • 1292 Views
  • 1 reply
  • 0 kudos

COMPUTE DELTA STATISTICS vs COMPUTE STATISTICS - Data Skipping

Hi all, I recently altered the data skipping stats columns on my Delta Lake table to optimize data skipping. Now, I'm wondering about the best practice for updating statistics: is running ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS sufficient a...

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @Sainath368! Running ANALYZE TABLE <table_name> COMPUTE DELTA STATISTICS is a good practice after modifying data skipping stats columns on a Delta Lake table. However, this command doesn’t update query optimizer stats. For that, you’ll need to ...

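The two statement variants discussed above serve different consumers: COMPUTE DELTA STATISTICS refreshes the file-level min/max stats used for data skipping, while COMPUTE STATISTICS feeds the query optimizer (e.g. join planning). A sketch, with a placeholder table name:

```python
# The two ANALYZE variants discussed above, as SQL strings. The table name
# is a placeholder. COMPUTE DELTA STATISTICS recomputes file-level min/max
# stats after changing the data-skipping stats columns; COMPUTE STATISTICS
# updates query-optimizer stats and is a separate step.
table = "catalog.schema.my_table"  # placeholder

recompute_skipping_stats = f"ANALYZE TABLE {table} COMPUTE DELTA STATISTICS"
recompute_optimizer_stats = f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS"

# In a notebook, both would be run via spark.sql(...):
# spark.sql(recompute_skipping_stats)
# spark.sql(recompute_optimizer_stats)
```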
Miloud_G
by New Contributor III
  • 1926 Views
  • 2 replies
  • 2 kudos

Resolved! issue on databricks bundle deploy

Hi, I am trying to configure Databricks Asset Bundles but got an error on deployment. databricks bundle init ----------- OK; databricks bundle validate ----- OK; databricks bundle deploy ------ Fail. Error: PS C:\Databricks_DABs\DABs_Init\DABS_Init> databricks b...

Latest Reply
Miloud_G
New Contributor III
  • 2 kudos

Thank you Advika. I was able to enable workspace files with this script: from databricks.sdk.core import ApiClient; client = ApiClient(); client.do("PATCH", "/api/2.0/workspace-conf", body={"enableWorkspaceFilesystem": "true"}, headers={"Content-Type": "applica...

1 More Replies
ankit001mittal
by New Contributor III
  • 1084 Views
  • 1 reply
  • 0 kudos

How to stop SQL AI Functions usage

Hi Guys, recently Databricks came up with a new feature, SQL AI Functions. Is there a way to stop users from using it without downgrading the runtime on the cluster, e.g. by using policies? Also, is there a way to stop users from using serverless, before there w...

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @ankit001mittal! Currently, there's no direct way to disable SQL AI Functions in Databricks. To restrict the use of serverless compute, you can set up serverless budget policies that allow you to monitor and limit usage to some extent. However,...

Divya_Bhadauria
by New Contributor III
  • 14099 Views
  • 6 replies
  • 2 kudos

Unable to run python script from git repo in Databricks job

I'm getting a "cannot read Python file" error when running this job, which is configured to run a Python script from a git repo. Run result unavailable: run failed with error message Cannot read the python file /Repos/.internal/7c39d645692_commits/ff669d089cd8f93e9...

Latest Reply
SakthiGanesh
New Contributor II
  • 2 kudos

Hi @Divya_Bhadauria, I'm facing the same internal-commit issue on my end. I didn't give any internal path in the Databricks workflow; I set the source to Azure DevOps Services with a branch name, but when I ran the workflow it gave the below error a...

5 More Replies
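The "/Repos/.internal/..._commits" paths in errors like the one above come from jobs that check the repo out themselves. One way this is typically configured is a git-sourced job, where the Jobs API payload names the provider, branch, and script path directly. A hedged sketch follows; the URL, branch, and file path are placeholders, and a real payload would also need a cluster specification:

```python
# Hedged sketch of a Jobs API 2.1 settings payload that runs a Python
# script directly from a git provider. URL, branch, and script path are
# hypothetical placeholders; a cluster spec is omitted for brevity.
job_settings = {
    "name": "run-script-from-git",
    "git_source": {
        "git_url": "https://dev.azure.com/org/project/_git/repo",  # placeholder
        "git_provider": "azureDevOpsServices",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "main",
            "spark_python_task": {
                "python_file": "jobs/etl.py",  # path relative to the repo root
                "source": "GIT",
            },
        }
    ],
}
```

With `"source": "GIT"`, the script path is resolved against the checked-out branch rather than a workspace `/Repos` path, which is the usual way to avoid stale internal-commit references.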
amarnathpal
by New Contributor III
  • 1375 Views
  • 4 replies
  • 0 kudos

Adding a New Column for Updated Date in Pipeline

I've successfully set up my pipeline and everything is working fine. I'd like to add a new column to our table that records the date whenever any record gets updated. Could you advise on how to go about this?

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

Do you want to add dates for the historical data as well?

3 More Replies
Ramakrishnan83
by New Contributor III
  • 3713 Views
  • 2 replies
  • 0 kudos

Optimize and Vacuum Commands

Hi team, I am running a weekly purge process from Databricks notebooks that cleans up chunks of records from my tables used for audit purposes. The tables are external tables. I need clarification on the items below: 1. Do I need to run Optimize and Vacuum c...

Latest Reply
JaimeAnders
New Contributor II
  • 0 kudos

That's a valid point about minimal read queries! However, while immediate storage reduction might not be necessary, consistent data integrity and potential future reporting needs might still warrant occasional optimize and vacuuming, even with extern...

1 More Replies
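For a weekly purge like the one described above, the maintenance pass usually boils down to an OPTIMIZE followed by a VACUUM per table. The helper below is an illustrative sketch (an assumption, not a Databricks API) that builds those statements; the table names are placeholders, and 168 hours matches Delta's default 7-day retention safety threshold:

```python
# Illustrative helper (an assumption, not a Databricks API) that builds the
# weekly maintenance statements discussed above for a list of audit tables.
# VACUUM's retention should respect delta.deletedFileRetentionDuration;
# 168 hours (7 days) is the default safety threshold.

def maintenance_statements(tables, retain_hours: int = 168):
    """Return OPTIMIZE + VACUUM SQL strings for each table, in order."""
    stmts = []
    for t in tables:
        stmts.append(f"OPTIMIZE {t}")
        stmts.append(f"VACUUM {t} RETAIN {retain_hours} HOURS")
    return stmts

for stmt in maintenance_statements(["audit.events", "audit.logins"]):  # placeholders
    print(stmt)
```

In a notebook each statement would be executed with `spark.sql(stmt)`; whether weekly OPTIMIZE is worthwhile for rarely-read external tables is the judgment call debated in the thread.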
jeremy98
by Honored Contributor
  • 3213 Views
  • 6 replies
  • 2 kudos

Resolved! Catch Metadata Workflow databricks

Hello community, is it possible to get the workflow metadata of a running Databricks job, like the start time, end time, triggered by, etc., using dbutils.widgets.get()?

Latest Reply
Juan_Cardona
Databricks Partner
  • 2 kudos

The best practice now is not to use the API (some functions were deprecated for this purpose); instead, use job parameters: job_id = dbutils.widgets.get("job parameter name with job_id") job_run = dbutils.widgets.get("job parameter ...

5 More Replies
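The job-parameter approach in the accepted answer works by pairing each parameter with a Databricks dynamic value reference, which the platform resolves before the task runs. A sketch of such a parameter map (the parameter names on the left are arbitrary choices; the `{{...}}` references are the documented dynamic values):

```python
# Hedged sketch: job parameters mapped to Databricks dynamic value
# references. The left-hand names are arbitrary; the {{...}} placeholders
# are resolved by the platform before the task sees them.
job_parameters = {
    "job_id": "{{job.id}}",
    "run_id": "{{job.run_id}}",
    "start_time": "{{job.start_time.iso_datetime}}",
    "trigger_type": "{{job.trigger.type}}",
}

# Inside the task notebook, each resolved value is read back with:
# job_id = dbutils.widgets.get("job_id")
```

The end time is not available this way while the job is still running; it has to come from the run record after completion.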
Ankit_Kothiya
by Databricks Partner
  • 1566 Views
  • 2 replies
  • 1 kudos

Databricks JDBC Driver Version 42 Limitations

We found that the Databricks JDBC driver does not support: Connection.setAutoCommit(false), Connection.commit(), Connection.rollback(), or execution of BEGIN TRANSACTION. Can you help us understand why these operations are not supported by the Databricks JDBC dr...

Latest Reply
Ankit_Kothiya
Databricks Partner
  • 1 kudos

Thank you, @SP_6721, for your input! Could you please share an example snippet demonstrating how to handle batch processing, similar to what we typically do in a relational database?

1 More Replies
venkad
by Contributor
  • 14788 Views
  • 5 replies
  • 7 kudos

Passing proxy configurations with databricks-sql-connector python?

Hi, I am trying to connect to a Databricks workspace that has IP access restriction enabled using databricks-sql-connector. Only my proxy server IPs are on the allow list. from databricks import sql   connection = sql.connect( server_hostname ='...

Latest Reply
ss2025
New Contributor II
  • 7 kudos

Is there any resolution for the above issue of setting up a proxy with the databricks-sql-connector?

4 More Replies
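One commonly suggested workaround, which I cannot confirm is honored by every connector version, is to route the connector's HTTPS traffic through the allow-listed proxy via the standard environment variables before opening the connection. The proxy hostname and port below are placeholders:

```python
# Hedged sketch: point standard proxy environment variables at the
# allow-listed proxy before creating the databricks-sql-connector
# connection. Whether a given connector version honors these variables
# depends on its HTTP stack; host and port are placeholders.
import os

os.environ["HTTPS_PROXY"] = "http://proxy.example.com:8080"  # placeholder
os.environ["HTTP_PROXY"] = "http://proxy.example.com:8080"   # placeholder

# The connection itself would then be opened as in the original post:
# from databricks import sql
# connection = sql.connect(server_hostname="...", http_path="...",
#                          access_token="...")
```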
Upendra_Dwivedi
by Databricks Partner
  • 3591 Views
  • 4 replies
  • 0 kudos

Resolved! How to enable Databricks Apps User Authorization?

Hi All, I am working on implementing user authorization in my Databricks app, but to enable user auth it asks: "A workspace admin must enable this feature to be able to request additional scopes. The user's API downscoped access token is incl...

Latest Reply
Upendra_Dwivedi
Databricks Partner
  • 0 kudos

Hi All, we can find this setting under Previews: go to the workspace > click your username > Previews.

3 More Replies
Ipshi
by New Contributor
  • 1011 Views
  • 1 reply
  • 0 kudos

Databricks Data Engineer Associate

Hi everyone, can anyone point me to practice tests or study materials for the Databricks Data Engineer Associate exam?

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @Ipshi! You can find resources for the Databricks Certified Data Engineer Associate exam in the Getting Ready for the Exam section of the exam-specific webpage. This section includes a detailed list of topics covered and sample q...

lawrence009
by Contributor
  • 3198 Views
  • 4 replies
  • 0 kudos

Blank Page after Logging In

On Feb 8 Singapore time, our Singapore workspace displayed a blank page (no interface or content) after login, while our workspace in Tokyo worked normally. This lasted the whole day and none of our troubleshooting yielded any clues. Then ever...

Latest Reply
ciro
New Contributor II
  • 0 kudos

After logging in, I’m getting a white screen, and it won’t load. I’ve tried clearing my cache and switching browsers, but nothing seems to work. This feels like something that really needs to be looked into. Has anyone figured out a way to fix it?

3 More Replies
pargit2
by New Contributor II
  • 951 Views
  • 1 reply
  • 0 kudos

feature store delta sharing

Hi, I have 2 workspaces, one for data engineers and one for the data science team, and I need to create the bronze and silver layers in the data engineering workspace. I want to build them a feature store; should I do it from the data science workspace or the data engineering ...

Latest Reply
ciro
New Contributor II
  • 0 kudos

I like the idea of using Feature Store with Delta Sharing, but I’m a bit worried about its limits like no partition filtering and no streaming support. These could cause problems with performance and scaling in real situations.
