Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Miloud_G (New Contributor III)
  • 2247 Views
  • 1 reply
  • 1 kudos

Resolved! Connecting to Unity Catalog with Power BI

I created a group of Power BI users and granted consumer access permission to this group on Unity Catalog. I started a shared compute cluster (Standard_D4ds_v5). When trying to connect from Power BI, I can connect to tables only if I grant access to wor...

Latest Reply
radothede
Valued Contributor II
  • 1 kudos

Hi @Miloud_G, instead of granting full workspace access, you can grant the "Databricks SQL access" entitlement, which provides limited workspace access specifically for BI tools. Go to Workspace Settings → Identity and Access → Groups/Users/SPs Manage → Se...
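
A minimal sketch of granting that entitlement programmatically through the SCIM Groups API (workspace URL, token, and group ID are placeholders; "databricks-sql-access" is the entitlement value the UI toggle maps to):

import requests

HOST = "https://<workspace-url>"   # placeholder
TOKEN = "<personal-access-token>"  # placeholder
GROUP_ID = "<scim-group-id>"       # placeholder

# SCIM PATCH that adds the SQL access entitlement to the group.
resp = requests.patch(
    f"{HOST}/api/2.0/preview/scim/v2/Groups/{GROUP_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": [{
            "op": "add",
            "path": "entitlements",
            "value": [{"value": "databricks-sql-access"}],
        }],
    },
)
resp.raise_for_status()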

by smoortema (Contributor)
  • 1006 Views
  • 1 reply
  • 1 kudos

Resolved! Complex pipeline with many tasks and dependencies: orchestration in Jobs or in Notebook?

We need to set up a job that consists of several hundred tasks with many dependencies between them. We are considering two different directions: 1. Databricks job with tasks, with dependencies defined as code and deployed with Databricks asset b...

Latest Reply
radothede
Valued Contributor II
  • 1 kudos

Hi @smoortema To the best of my knowledge: Option 1) You can create jobs that contain up to 1000 tasks; however, it is recommended to split tasks into logical subgroups. Jobs with more than 100 tasks require API 2.2 and above. Jobs with a large number of tasks ...
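
For option 1, a minimal sketch of defining tasks and dependencies as code with the Databricks Python SDK (job name and notebook paths are illustrative; cluster settings are omitted):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Two illustrative tasks; "transform" runs only after "ingest" succeeds.
created = w.jobs.create(
    name="many-task-pipeline",
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Pipelines/ingest"),
        ),
        jobs.Task(
            task_key="transform",
            depends_on=[jobs.TaskDependency(task_key="ingest")],
            notebook_task=jobs.NotebookTask(notebook_path="/Pipelines/transform"),
        ),
    ],
)
print(created.job_id)

The same structure scales to hundreds of generated tasks, which is where defining the graph as code (or as an asset bundle) pays off over hand-editing.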

by Locomo_Dncr (New Contributor)
  • 1121 Views
  • 4 replies
  • 0 kudos

Time Travel vs Bronze historical archive

Hello, I am working on building a pipeline using the Medallion architecture. The source tables in this pipeline are overwritten each time the table is updated. In the bronze ingestion layer, I plan to append this new table to the current bronze table, addi...

Labels: Data Engineering, Medallion, Processing Time, Storage Costs, Time Travel
Latest Reply
MariuszK
Valued Contributor III
  • 0 kudos

Hi @Locomo_Dncr Time travel isn't recommended for storing historical data; it's for backup and audit purposes. You can store snapshot data or use SCD2 to keep history. "Databricks does not recommend using Delta Lake table history as a long-term backup solutio...
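
A minimal sketch of the snapshot-append pattern for the bronze layer (table names are illustrative):

from pyspark.sql import functions as F

# Read the latest full overwrite of the source and stamp it.
snapshot = (
    spark.read.table("source_db.raw_table")
    .withColumn("ingest_ts", F.current_timestamp())
)

# Append so bronze retains every historical version of the source.
snapshot.write.mode("append").saveAsTable("bronze.raw_table_history")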

3 More Replies
by jar (Contributor)
  • 1046 Views
  • 6 replies
  • 0 kudos

Understanding infra costs of Databricks compute

Hi. I can see in our Azure cost analysis tool that a not insignificant part of our costs comes from the managed Databricks RG deployed with the workspace, and that it relates particularly to VMs (so compute, I assume?) and storage, the latter of which, tho...

Latest Reply
jar
Contributor
  • 0 kudos

Thank you all for your replies. The issue is not getting an overview of costs - I already have that from the Cost Management Export function in Azure, and by using the system.billing tables in Databricks. The issue is understanding the relation betwe...
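
For tying DBUs back to spend, a hedged sketch of a query over the system billing tables (columns follow the documented system.billing schema; the join conditions are illustrative):

# Approximate spend per SKU and day by joining usage with list prices.
df = spark.sql("""
    SELECT u.usage_date,
           u.sku_name,
           SUM(u.usage_quantity) AS dbus,
           SUM(u.usage_quantity * lp.pricing.default) AS approx_cost
    FROM system.billing.usage u
    JOIN system.billing.list_prices lp
      ON u.sku_name = lp.sku_name
     AND u.usage_start_time >= lp.price_start_time
     AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
    GROUP BY u.usage_date, u.sku_name
    ORDER BY u.usage_date DESC
""")
display(df)

Note this covers the DBU side only; the VM and storage charges in the managed resource group are billed by Azure meters and only appear in the Azure cost exports.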

5 More Replies
by Sanjeeb2024 (New Contributor III)
  • 1587 Views
  • 4 replies
  • 1 kudos

Not able to get the location of the table in Databricks Free edition

Hi Team, I am exploring the Delta table properties and want to see the internal folder structure of the Delta table. I created a Delta table using Databricks Free Edition and when I executed the DESCRIBE EXTENDED command, the location field is coming...

Latest Reply
akshayt272
New Contributor II
  • 1 kudos

How can we see the versions of our Delta file then?
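
Version history is still visible through the table itself even when the storage location isn't exposed; a minimal sketch (table name is a placeholder):

# Lists every version, timestamp, and operation recorded for the table.
history = spark.sql("DESCRIBE HISTORY my_catalog.my_schema.my_table")
history.select("version", "timestamp", "operation").show(truncate=False)

# A specific version can then be read back with time travel.
v0 = spark.read.option("versionAsOf", 0).table("my_catalog.my_schema.my_table")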

3 More Replies
by parthSundarka (Databricks Employee)
  • 2125 Views
  • 4 replies
  • 6 kudos

Job Failures Observed In DBR 13.3 due to changes in pip package - aiosignal

Problem Statement: A few jobs running on DBR 13.3 have started failing with the error mentioned below - TypeError: TypeVarTuple.__init__() got an unexpected keyword argument 'default' ...

Latest Reply
pradr
New Contributor II
  • 6 kudos

Pinning aiosignal==1.3.2 worked for me. Are you sure the package you are building is using the pinned version?
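
In a notebook, the pin looks like this (assuming the incompatibility enters via a transitive dependency, as described above):

%pip install aiosignal==1.3.2

# Then, in the next cell, restart Python so the pinned version is loaded.
dbutils.library.restartPython()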

3 More Replies
by vedant1611 (New Contributor)
  • 2450 Views
  • 1 reply
  • 1 kudos

Resolved! How can I connect databricks apps with Unity Catalog?

Hi, I'm working on building a Databricks app to streamline several team tasks through a user-friendly data application. One key feature I'd like to implement is integrating Databricks Apps with Unity Catalog. Specifically, I want to display a dropdown ...

Latest Reply
ximenesfel
Databricks Employee
  • 1 kudos

Hi @vedant1611, yes, it is possible to integrate Databricks Apps with Unity Catalog to achieve the functionality you're describing. You can build a Databricks app (such as one using Streamlit) that dynamically queries Unity Catalog to list schemas, ta...
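
A minimal Streamlit sketch, assuming the app enumerates Unity Catalog objects through the Databricks SDK (the catalog name is a placeholder):

import streamlit as st
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # Databricks Apps inject credentials via the environment

CATALOG = "main"  # placeholder catalog

# Dropdown of schemas in the catalog.
schemas = [s.name for s in w.schemas.list(catalog_name=CATALOG)]
schema = st.selectbox("Schema", schemas)

# Dropdown of tables in the selected schema.
tables = [t.name for t in w.tables.list(catalog_name=CATALOG, schema_name=schema)]
st.selectbox("Table", tables)

Unity Catalog permissions still apply: the app's identity only sees objects it has been granted access to.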

by noorbasha534 (Valued Contributor II)
  • 799 Views
  • 5 replies
  • 4 kudos

Resolved! OPTIMIZE in parallel with actual data load

Dear all, if I understand correctly, OPTIMIZE cannot run in parallel with the actual data load. We see 'concurrent update' errors in our environment when this happens, due to which we are unable to dedicate a maintenance window for the tables' health. And, I s...
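
One common mitigation (an assumption about the table layout, not a confirmed fix from this thread) is scoping OPTIMIZE to partitions the load no longer writes to, which narrows the window for conflicting commits:

# Compact only closed partitions, avoiding files the in-flight
# load is still appending to (illustrative partition column).
spark.sql("""
    OPTIMIZE my_schema.events
    WHERE event_date < current_date()
""")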

Latest Reply
noorbasha534
Valued Contributor II
  • 4 kudos

@MariuszK @szymon_dybczak thanks both. appreciate your support.

4 More Replies
by jollymon (New Contributor II)
  • 474 Views
  • 3 replies
  • 1 kudos

Resolved! Access notebook parameters from a bash cell

How can I access a notebook parameter from a bash cell (%sh)? For Python I use dbutils.widgets.get('param'), and for SQL I can use :param. Is there something similar for bash?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @jollymon, I believe there is no direct way to do this, but there are some workarounds. You can read the widgets in Python and set their values as environment variables; then you can use the shell to read those variables. Something like...
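
A minimal sketch of that workaround (the widget name 'param' comes from the question; the environment variable name is arbitrary):

import os

# Read the widget in Python and expose it to child processes.
os.environ["PARAM"] = dbutils.widgets.get("param")

# A later %sh cell can then read it:
#   %sh
#   echo "$PARAM"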

2 More Replies
by raypritha (New Contributor II)
  • 397 Views
  • 1 reply
  • 1 kudos

Resolved! Switch from Partner Academy to Customer Academy

I accidentally signed up for the partner academy when I should have signed up for the customer academy. How can I switch to the customer academy? My e-mail is the same as I use for this community platform.

Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @raypritha! Please raise a ticket with the Databricks Support Team. They’ll be able to assist you with switching to the Customer Academy.

by yvesbeutler (New Contributor III)
  • 704 Views
  • 2 replies
  • 5 kudos

Resolved! run_if dependencies configuration within YAML

Hi guys, I have a workflow with various Python wheel tasks and one job task to call another workflow. How can I prevent my original workflow from getting an unsuccessful state if the second workflow fails? These workflows are independent and shouldn't ...

Latest Reply
eniwoke
Contributor II
  • 5 kudos

Hi @yvesbeutler, here is a sample way I did it using Databricks asset bundles for notebook tasks:

resources:
  jobs:
    chained_jobs:
      name: chained-jobs
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: /W...

1 More Replies
by pgruetter (Contributor)
  • 30312 Views
  • 8 replies
  • 2 kudos

Resolved! How to use a Service Principal to connect Power BI to a Databricks SQL Warehouse

Hi all, I'm struggling to connect the Power BI service to a Databricks SQL Warehouse using a service principal. I'm mostly following this guide. I created a new app registration in the AAD and created a client secret for it. Now I'm particularly struggling wi...

Latest Reply
Lone
New Contributor II
  • 2 kudos

Hello All, after successfully adding a service principal to Databricks and generating a client ID and client secret, I plan to utilize these credentials for authentication when configuring Databricks as a data source in Power BI. Could you please clar...
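
For the token side of that setup, a hedged sketch of the client-credentials flow against Entra ID (tenant, client ID, and secret are placeholders; the GUID is the well-known Azure Databricks resource ID):

import requests

TENANT = "<tenant-id>"             # placeholder
CLIENT_ID = "<app-client-id>"      # placeholder
CLIENT_SECRET = "<client-secret>"  # placeholder

# Request a token scoped to the Azure Databricks resource.
resp = requests.post(
    f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
    },
)
token = resp.json()["access_token"]

This is mainly useful for verifying the principal's credentials work before wiring them into Power BI.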

7 More Replies
by HariPrasad1 (New Contributor II)
  • 4544 Views
  • 3 replies
  • 0 kudos

Jobs in Spark UI

Is there a way to get the URL where all the Spark jobs created in a specific notebook run can be found? I am creating an audit framework, and the requirement is to get the Spark jobs of a specific task or a notebook run so that we can d...

Latest Reply
eniwoke
Contributor II
  • 0 kudos

Hi @HariPrasad1, here is a way to get the job list (note: works for non-serverless clusters):

from dbruntime.databricks_repl_context import get_context

cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
workspaceUrl = spark.conf...

2 More Replies
by turagittech (Contributor)
  • 1133 Views
  • 1 reply
  • 0 kudos

Concatenating a row to be able to hash

Hi All, sometimes when loading data we want to only update a row based on changes in values: SCD1-type scenarios for a data warehouse. One approach is equivalency (A=A, B=B, etc.). Another is generating a hash of all rows of interest, which I believe is pretty common. ...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @turagittech, concat_ws() is generally the most practical and reliable option here. It handles mixed datatypes well and safely skips nulls. The only edge cases you'd typically run into are with complex or unsupported custom datatypes or if the sep...
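
A minimal sketch of the concat_ws-plus-hash pattern for SCD1 change detection (the column list and separator are illustrative):

from pyspark.sql import functions as F

cols = ["first_name", "last_name", "email"]  # illustrative tracked columns

# Build a row hash; concat_ws skips nulls, so choose a separator
# unlikely to occur in the data to avoid accidental collisions.
df = spark.table("staging.customers").withColumn(
    "row_hash", F.sha2(F.concat_ws("||", *cols), 256)
)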

by HoussemBL (New Contributor III)
  • 5177 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Asset Bundle deploy failure

Hello, I have successfully deployed a Databricks job that contains one task of type DLT using a Databricks Asset Bundle. The first deployment worked well. For this particular Databricks job, I clicked on "disconnect from source" to do some customization....

Latest Reply
thibault
Contributor III
  • 0 kudos

@Walter_C, should this property be set at the same level as name, catalog, channel? I'm getting an error at the schema validation (using the template from databricks bundle schema with databricks-cli v0.260.0), and the deployment does not succeed, du...

1 More Replies
