Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Raj_DB
by Contributor
  • 307 Views
  • 1 replies
  • 1 kudos

Resolved! Automating Job Permission Updates in Databricks Using a Notebook

Hi everyone,
I am looking to create a notebook that, when executed by a user, performs the following actions:
  • Retrieves all Databricks jobs created by the current user
  • Checks whether a specific role already has permissions on those jobs
  • Automatically add...

Latest Reply
ziafazal
Databricks Partner
  • 1 kudos

Hi @Raj_DB, you can use the Databricks SDK to retrieve all jobs and filter them down to those whose owner is the current user. Something like this:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
# Specify the user email/username you want...
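
A minimal sketch of the flow described in this thread, assuming the `databricks-sdk` Python package is installed and the notebook is already authenticated. The helper names, the group name, and the `CAN_MANAGE_RUN` level are hypothetical, and matching on `creator_user_name` is only an assumption about what "owner" means here.

```python
def jobs_missing_group(w, group_name):
    """Return IDs of jobs created by the current user that `group_name` has no permission on."""
    me = w.current_user.me().user_name
    missing = []
    for job in w.jobs.list():
        # Assumption: "owned by the current user" is approximated by the job's creator.
        if job.creator_user_name != me:
            continue
        perms = w.permissions.get("jobs", str(job.job_id))
        groups = {acl.group_name for acl in (perms.access_control_list or [])}
        if group_name not in groups:
            missing.append(job.job_id)
    return missing


def grant_group(w, job_id, group_name):
    """Add a permission for `group_name` on one job without replacing existing grants."""
    # Imported lazily so the helper above stays importable without the SDK.
    from databricks.sdk.service.iam import AccessControlRequest, PermissionLevel

    w.permissions.update(
        "jobs",
        str(job_id),
        access_control_list=[
            AccessControlRequest(
                group_name=group_name,
                permission_level=PermissionLevel.CAN_MANAGE_RUN,  # hypothetical choice
            )
        ],
    )
```

In a notebook this could be driven by creating `w = WorkspaceClient()` and calling `grant_group(w, job_id, "data-engineers")` for each ID returned by `jobs_missing_group(w, "data-engineers")`. Checking first and then using `update` rather than `set` is meant to leave existing grants untouched.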

malterializedvw
by New Contributor III
  • 2449 Views
  • 9 replies
  • 4 kudos

Parametrizing queries in DAB deployments

Hi folks, I would like to ask for best practices for parametrizing queries in Databricks Asset Bundle deployments. This topic is relevant to differentiate between deployments on different environments as well as [dev]-deployments vs...

Latest Reply
abohlin
New Contributor II
  • 4 kudos

Came across this thread as I was facing the same exact issue as @malterializedvw and I want to comment my fix in case anyone else tears their hair out on this problem. In my databricks.yml file I put in a gold_catalog and a silver_catalog variable and...

8 More Replies
erigaud
by Honored Contributor
  • 9317 Views
  • 3 replies
  • 4 kudos

Get total number of files of a Delta table

I'm looking to find out programmatically how many files a Delta table is made of. I know I can do
%sql
DESCRIBE DETAIL my_table
But that would only give me the number of files of the current version. I am looking to know the total number of files (basically ...

Latest Reply
gmiguel
Databricks Partner
  • 4 kudos

The best way to get this is to execute the following statement:
ANALYZE TABLE [table_name] COMPUTE STORAGE METRICS;
Applies to: Databricks Runtime 18.0 and above

2 More Replies
flourishingsing
by New Contributor III
  • 421 Views
  • 1 replies
  • 0 kudos

Resolved! How can I retrieve a backfill run parameter in Python?

I'm trying to run a backfill with the following parameter. How can I access this in the Python script? Do I need to change anything in the yml? I usually set task parameters the following way: These are then parsed using the argparse Python module.

[Attached screenshots: flourishingsing_0-1779284296139.png, flourishingsing_1-1779284438804.png]
Latest Reply
flourishingsing
New Contributor III
  • 0 kudos

Found the following solution:
Add job-level parameters:
parameters:
  - name: run_timestamp
    default: "some_default_value"
Reference in task-level parameters:
tasks:
  - task_key: my_task
    spark_python_task:
      python_file: ../../script.py
      ...
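
On the Python side, a sketch of how the script could read that value with argparse, as mentioned in the question. It assumes the task passes the job-level parameter to the script as `--run_timestamp <value>`. The flag name and the `parse_args` helper are illustrative, since the task-level part of the snippet above is truncated.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line arguments passed to the spark_python_task."""
    parser = argparse.ArgumentParser(description="Example job script")
    # Assumed flag name; it should match whatever the task's parameters list passes.
    parser.add_argument("--run_timestamp", default="some_default_value")
    # parse_known_args ignores extra arguments the script does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    print(parse_args().run_timestamp)
```

Using `parse_known_args` rather than `parse_args` is a defensive choice here, so the script keeps working if the job passes arguments it does not know about.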

manish_de
by New Contributor III
  • 650 Views
  • 5 replies
  • 5 kudos

query based connector snapshot feature

In an ingestion pipeline, for a query-based connector, the Cursor column dropdown offers a batch snapshot option instead of a column name. If I choose batch snapshot, will the Databricks engine run select * from my source table, say Sql serv...

Latest Reply
michaelfriendly
New Contributor II
  • 5 kudos

@rbtv It may execute something very similar to a `SELECT *` on the source table unless the platform adds its own partitioning or optimisation behind the scenes. From what I've observed, selecting batch snapshot often means the connector handles each ...

4 More Replies
koen_hai
by New Contributor II
  • 494 Views
  • 2 replies
  • 0 kudos

Resolved! Custom and community connectors

Hi, the option to enable custom and community connectors does not seem to be available on the Previews page. How can this be enabled? Feature I'm referencing: Community connectors in Lakeflow Connect - Azure Databricks | Microsoft Learn

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @koen_hai, The Community Connectors feature is controlled from the workspace-level Previews page by a workspace admin. If you don’t see that option there, the workspace likely hasn’t been enrolled for the preview yet. In that case, please contact ...

1 More Replies
RTabur
by New Contributor III
  • 2765 Views
  • 4 replies
  • 2 kudos

[Bug] Orphan storage location

Hello, I'm not able to re-create an external location after removing its owner from the Databricks Account. I'm getting the following error:
Input path url 'abfss://foo@bar.dfs.core.windows.net/' overlaps with an existing external location within 'CreateEx...

Latest Reply
PL_db
Databricks Employee
  • 2 kudos

Your metastore admin can list all external locations.
Your metastore admin can then drop the external location.

3 More Replies
mnissen1337
by Contributor
  • 420 Views
  • 1 replies
  • 1 kudos

Resolved! Managing Default Start State for Continuous Streaming Jobs in Databricks Asset Bundles

I’ve created a notebook that uses Spark Structured Streaming and runs continuously, so I’ve deployed the corresponding Databricks job using the continuous trigger mode. What I’d like is for this job to start automatically only in certain environments ...

Latest Reply
mnissen1337
Contributor
  • 1 kudos

I figured out that the continuous property has a pause_status as well; not sure why I did not see this. So I think the above is solved!

mnissen1337
by Contributor
  • 786 Views
  • 3 replies
  • 0 kudos

Resolved! Best Compute Option for Near-Real-Time Databricks API Ingestion Pipeline

I’ve built an ingestion pipeline in Databricks consisting of two notebooks:
  • The first notebook calls an external API every four minutes to retrieve the latest available data.
  • Each API call returns approximately 109 rows.
  • The API only exposes the most re...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @mnissen1337, I would use serverless for that use case. It takes time for a job cluster to spin up (of course you can use pools, but given that your job needs to run every 5 minutes it doesn't make much sense), so serverless seems to be a great fi...

2 More Replies
vedanth
by New Contributor
  • 254 Views
  • 1 replies
  • 0 kudos

Salesforce Connector - Lakeflow Connect 400 Error

Hi all, I have been trying to set up Salesforce using Lakeflow Connect and followed the instructions in the docs: https://docs.databricks.com/aws/en/connect/managed-ingestion#sfdc
However, I run into an invalid_grant error. However, the login history on Salesforce sh...

[Attached screenshot: vedanth_0-1779009668052.png]
Latest Reply
GaneshI
New Contributor III
  • 0 kudos

Hi Vedanth, the invalid_grant error usually occurs due to authentication or OAuth configuration issues between Salesforce and Databricks Lakeflow Connect. Could you please verify the following points:
  • Ensure the Salesforce user account is not locked and...

Yannick_B
by New Contributor
  • 341 Views
  • 2 replies
  • 0 kudos

[DELTA_CREATE_EXTERNAL_TABLE_WITHOUT_TXN_LOG]

We are testing the Delta writer in our environment to create bronze tables. Recently, I needed to add one table to the notebook code and rerun the whole notebook, which failed with this error: [DELTA_CREATE_EXTERNAL_TABLE_WITHOUT_TXN_LOG] Y...

Latest Reply
balajij8
Contributor III
  • 0 kudos

@Yannick_B You are trying to register an external table pointing to a directory that does not contain the required Delta transaction log (the _delta_log folder), hence the error. When you run an external table command, Databricks generally expects th...

1 More Replies
GaneshI
by New Contributor III
  • 289 Views
  • 1 replies
  • 0 kudos

Does enabling Change Data Feed on a Delta table affect OPTIMIZE and ZORDER performance?

Does enabling Change Data Feed on a Delta table affect OPTIMIZE and ZORDER performance? After enabling CDF on several large Delta tables, our OPTIMIZE jobs are taking noticeably longer. Is this expected, and are there any tuning parameters to minimize...

Latest Reply
Sumit_7
Esteemed Contributor
  • 0 kudos

@GaneshI Yes, some overhead is expected. Databricks recommends predicate-based OPTIMIZE for large tables and Liquid Clustering over ZORDER.

ThiagoRosetti
by New Contributor
  • 248 Views
  • 1 replies
  • 0 kudos

Serverless Compute connectivity issues with .com.br domains vs. Classic Clusters Spark hangs

Hi everyone, I'm facing two specific issues in my Databricks Premium workspace (AWS - sa-east-1).
Serverless Connectivity Issue: When using Serverless compute, I can successfully call APIs ending in .com, but calls to .com.br domains fail with connecti...

Latest Reply
GaneshI
New Contributor III
  • 0 kudos

Hi there,
Great breakdown of the symptoms. These are actually two distinct issues likely sharing a common root cause in your VPC/network configuration. Let me address both:
Issue 1: Serverless Compute - .com.br DNS Resolution Failure
Root Cause
Serverle...

andytate
by New Contributor
  • 606 Views
  • 2 replies
  • 0 kudos

Lakebase not showing up

I am fairly new to Databricks and am learning it because a company I am working on is going to use it. One of the things they are going to use is Lakebase postgres so I thought I'd set it up on my personal account. First I don't see app switcher, sec...

Latest Reply
rdokala
New Contributor III
  • 0 kudos

If it is available, you would see it at Compute -> Lakebase, with tabs for Provisioned and Autoscaling. The Lakebase option is next to Apps. There is also a dotted-grid icon in the top right corner, just before your profile name; if you expan...

1 More Replies
GaneshI
by New Contributor III
  • 393 Views
  • 1 replies
  • 0 kudos

Resolved! What is the recommended approach to enforce row-level security in Unity Catalog for external BI tool

We connect Tableau and Power BI to our Databricks SQL warehouse via OAuth tokens. Do Unity Catalog row filters apply at the SQL layer regardless of the BI tool, or do we need additional enforcement at the warehouse level?

Latest Reply
Lu_Wang_ENB_DBX
Databricks Employee
  • 0 kudos

Unity Catalog row filters apply at the SQL/query layer, so if Tableau or Power BI is querying a Databricks SQL warehouse, the filters are enforced there — you do not need a separate warehouse-level row-filter feature. Row filters and column masks are...
