Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Pedro1
by New Contributor II
  • 3194 Views
  • 2 replies
  • 0 kudos

databricks_grants fails because it keeps track of a removed principal

Hi all, my Terraform script fails on a databricks_grants resource with the error: "Error: cannot update grants: Could not find principal with name DataUsers". The principal DataUsers no longer exists because it was previously deleted by Terraform. Bo...

Latest Reply
wkeifenheim-og
New Contributor II
  • 0 kudos

I'm here searching for a similar but different issue, so this is just a suggestion of something to try. Have you tried setting a depends_on argument within your databricks_grants block?

1 More Replies
pooja_bhumandla
by New Contributor III
  • 623 Views
  • 1 replies
  • 1 kudos

Deletion Vectors on Partitioned Tables

Are Deletion Vectors supported for partitioned delta tables in Databricks?

Latest Reply
paolajara
Databricks Employee
  • 1 kudos

Hi @pooja_bhumandla, yes, deletion vectors are supported for partitioned Delta tables in Databricks. They are part of a storage optimization that allows delete, update, and merge operations to mark existing rows as removed or changed without rewrit...

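To make the reply concrete, the core idea of a deletion vector can be sketched in plain Python (a toy illustration of the concept only, not Delta's on-disk format; all names here are hypothetical): a delete marks row positions in a side structure instead of rewriting the data file, and reads filter through that structure. In a partitioned table, each partition's data files simply carry their own vectors.

```python
# Toy illustration of the deletion-vector idea: instead of rewriting the
# data file on DELETE, record the deleted row positions in a side structure
# and filter them out at read time. (Not Delta's actual on-disk format.)
class ToyDataFile:
    def __init__(self, rows):
        self.rows = list(rows)          # immutable "data file"
        self.deletion_vector = set()    # positions marked as deleted

    def delete_where(self, predicate):
        # Mark matching positions as deleted; the rows list is untouched.
        for pos, row in enumerate(self.rows):
            if predicate(row):
                self.deletion_vector.add(pos)

    def read(self):
        # Reads skip any position present in the deletion vector.
        return [r for pos, r in enumerate(self.rows)
                if pos not in self.deletion_vector]

f = ToyDataFile([{"id": i, "part": i % 2} for i in range(6)])
f.delete_where(lambda r: r["id"] < 2)
print(f.read())     # rows with id 0 and 1 are hidden
print(len(f.rows))  # still 6: the underlying "file" was never rewritten
```

This is why deletion vectors speed up deletes and updates on large (including partitioned) tables: only the small vector is written, not the full data files.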
rcostanza
by New Contributor III
  • 1376 Views
  • 1 replies
  • 1 kudos

Resolved! DataFrame.localCheckpoint() and cluster autoscaling at odds with each other

I have a notebook where, at the beginning, I load several dataframes and cache them using localCheckpoint(). I run this notebook on an all-purpose cluster with autoscaling enabled, with a minimum of 1 worker and a maximum of 2. The cluster often autoscale...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @rcostanza, you're facing a common issue with autoscaling clusters and cached-data locality. There are several approaches to address this:
Preventing Downscaling During Execution
1. Disable Autoscaling Temporarily - You can disable autoscaling programm...

hpant
by New Contributor III
  • 1355 Views
  • 2 replies
  • 1 kudos

Is it possible to create an external volume using a Databricks asset bundle?

Is it possible to create an external volume using a Databricks asset bundle? I have this code from my databricks.yml file, which works perfectly fine for a managed volume:
resources:
  volumes:
    bronze_checkpoints_volume:
      catalog_name: ...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

bundle:
  name: my_azure_volume_bundle
resources:
  volumes:
    my_external_volume:
      catalog_name: main
      schema_name: my_schema
      name: my_external_volume
      volume_type: EXTERNAL
      storage_location: abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>...

1 More Replies
Ivan_Pyrog
by New Contributor
  • 1960 Views
  • 2 replies
  • 0 kudos

Azure Event Hub throws Timeout Exception: Timed out waiting for a node assignment. Call: describeTopics

Hello team, we are researching the streaming capabilities of our data platform and currently need to read data from EVH (Event Hub) with our Databricks notebooks. Unfortunately there seems to be an error somewhere, due to Timeout Exception: Tim...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@Ivan_Pyrog what's the full error message in the Spark driver log, and what is your Kafka broker version? I suspect you may actually be hitting a client-server incompatibility.

1 More Replies
kwasi
by New Contributor II
  • 22289 Views
  • 10 replies
  • 2 kudos

Kafka timeout

Hello, I am trying to read topics from a Kafka stream but I am getting the timeout error below. java.util.concurrent.ExecutionException: kafkashaded.org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: describeT...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

What's your Kafka broker version, and which Kafka client is in use (Spark's, kafka-python, confluent-kafka, ...)?

9 More Replies
himanshu_k
by New Contributor
  • 7103 Views
  • 3 replies
  • 0 kudos

Clarification Needed: Ensuring Correct Pagination with Offset and Limit in PySpark

Hi community, I hope you're all doing well. I'm currently engaged in a PySpark project where I'm implementing pagination-like functionality using the offset and limit functions. My aim is to retrieve data between a specified starting_index and ending_...

Latest Reply
Mathias_Peters
Contributor II
  • 0 kudos

Hi, did you find an answer to this question? I am having similar problems and a slow solution, which I need to improve upon. Thanks in advance.

2 More Replies
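The root cause the thread hints at can be shown with plain Python (a sketch of the principle, not the PySpark API): offset/limit pagination is only correct when every page is taken from the same deterministic ordering. If the order can change between reads, as it can for an unsorted distributed DataFrame, pages overlap or drop rows.

```python
import random

rows = [{"id": i} for i in range(10)]

def page(data, page_num, page_size):
    # offset/limit arithmetic: page_num is 0-based
    start = page_num * page_size
    return data[start:start + page_size]

# Deterministic ordering: pages tile the dataset exactly once.
ordered = sorted(rows, key=lambda r: r["id"])
pages = [page(ordered, n, 3) for n in range(4)]
ids = [r["id"] for p in pages for r in p]
print(ids)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: no gaps, no duplicates

# Non-deterministic ordering between reads (what an unsorted DataFrame
# may do across jobs): the same page numbers no longer tile the data.
shuffled_reads = []
for n in range(4):
    view = rows[:]
    random.shuffle(view)  # the order differs on every "read"
    shuffled_reads.extend(page(view, n, 3))
unstable_ids = sorted(r["id"] for r in shuffled_reads)
# unstable_ids will usually contain duplicates and miss some ids
```

In PySpark, the usual fix is to impose a total order before paging, for example df.orderBy("some_unique_key").offset(start).limit(page_size), or a row_number() window over a deterministic sort.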
Reza
by New Contributor III
  • 14331 Views
  • 11 replies
  • 6 kudos

Resolved! How can I search in a specific folder in Databricks?

There is a keyword search option in Databricks that searches for a command or word in the entire workspace. How can I search for a command in a specific folder or repository?

Latest Reply
Jensz007
New Contributor II
  • 6 kudos

@Atanu I agree with nelsoncardenas: the problem is not solved, and the answer currently only tells us to raise a feature request. Would it be possible to at least link the feature request raised by nelsoncardenas to this post/answer? ...

10 More Replies
nayan_wylde
by Esteemed Contributor II
  • 1063 Views
  • 3 replies
  • 0 kudos

Installing Maven packages on a UC-enabled Standard mode cluster

Curious if anyone has faced issues installing Maven packages on a UC-enabled cluster. Traditionally we used to install Maven packages from an Artifactory repo. I am trying to install the same package from a UC-enabled cluster (Standard mode). It worked whe...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @nayan_wylde Yes, this is a common challenge when transitioning to Unity Catalog (UC)-enabled clusters. The installation of Maven packages from Artifactory repositories does work differently in UC environments, but there are several approaches you c...

2 More Replies
PedroFaria2135
by New Contributor II
  • 2808 Views
  • 1 replies
  • 0 kudos

Resolved! How to add permissions to a Databricks Workflow deployed via Asset Bundle YAML?

Hey! I was deploying a new Databricks Workflow into my workspace via Databricks Asset Bundles. Currently, I have a very simple workflow, defined in a YAML file like this:
resources:
  jobs:
    example_job:
      name: example_job
      schedule:
        ...

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

Hi @PedroFaria2135, this can be done using the permissions key in the YAML file. Please refer to this document: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/reference#permissions
permissions:
  - level: CAN_VIEW
    group_name: te...

Sangamswadik
by New Contributor III
  • 3600 Views
  • 5 replies
  • 2 kudos

Resolved! Unable to see all-purpose compute

In the workspace, I can only see SQL warehouses and apps; I've attached a screenshot. I don't see an option to create all-purpose compute. Can you please tell me if there is a way to create one? Under the user entitlements page, look Identity and access >...

Latest Reply
Execute
New Contributor II
  • 2 kudos

Please let us know how you resolved this.

4 More Replies
karthikmani
by New Contributor
  • 971 Views
  • 1 replies
  • 1 kudos

Resolved! How to log the errors?

We have a notebook with a generic framework that we created to run for multiple tables every day. We want to log errors/successes/exceptions; any such errors need to be recorded in a log table so that we can troubleshoot based on the error log f...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

You can basically create some custom functions to log the events, write them to a data lake, and then use Structured Streaming to read the data from the data lake into a Delta table.
%scala
// Functions
def set_local_variables() = {
  // get the variables ...

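A minimal Python sketch of the same pattern (the Scala in the reply is truncated; the function and table names here are hypothetical): wrap each table's run in a helper that records a structured success/error row, then flush the accumulated rows to your log table, e.g. via spark.createDataFrame(log_rows).write.mode("append").saveAsTable(...).

```python
import traceback
from datetime import datetime, timezone

log_rows = []  # stand-in for the log table; in practice, write this out to Delta

def run_logged(table_name, fn):
    """Run fn() for one table, recording a structured success/error row."""
    started = datetime.now(timezone.utc).isoformat()
    try:
        fn()
        log_rows.append({"table": table_name, "status": "SUCCESS",
                         "started": started, "error": None})
    except Exception as e:
        # Capture the message and traceback so failures can be triaged later.
        log_rows.append({"table": table_name, "status": "ERROR",
                         "started": started,
                         "error": f"{e}\n{traceback.format_exc()}"})

run_logged("orders", lambda: None)      # succeeds
run_logged("customers", lambda: 1 / 0)  # fails; the error is recorded, not raised
print([(r["table"], r["status"]) for r in log_rows])
# [('orders', 'SUCCESS'), ('customers', 'ERROR')]
```

Because the wrapper swallows the exception after logging it, one failing table does not stop the loop over the remaining tables; adjust that behavior if a failure should abort the run.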
OODataEng
by New Contributor III
  • 2749 Views
  • 6 replies
  • 1 kudos

Liquid clustering performance issue

Hello, I have a table with approximately 300 million records. It weighs 3.4 GB and consists of 305 files. I wanted to enable liquid clustering for it and chose a date column as the clustering key. When I created a new table with the above details b...

Latest Reply
Yogesh_Verma_
Contributor II
  • 1 kudos

Hey @OODataEng, to create a new table in Databricks using the schema and data from an existing table, you can use the CREATE TABLE AS SELECT command. This command allows you to define a new table based on the results of a SELECT query executed on the...

5 More Replies
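The CTAS approach in the reply can be sketched as follows (table and column names are placeholders, and the statement is built as a string here, since there is no Spark session outside a notebook to run it): CREATE TABLE ... CLUSTER BY (...) AS SELECT copies the schema and data while enabling liquid clustering on the chosen key.

```python
def ctas_with_liquid_clustering(new_table, source_table, cluster_cols):
    """Build a CREATE TABLE AS SELECT statement that enables liquid
    clustering on the given columns. Table names are placeholders; in a
    notebook you would pass the result to spark.sql()."""
    cols = ", ".join(cluster_cols)
    return (f"CREATE TABLE {new_table} "
            f"CLUSTER BY ({cols}) "
            f"AS SELECT * FROM {source_table}")

stmt = ctas_with_liquid_clustering("sales_clustered", "sales", ["event_date"])
print(stmt)
# CREATE TABLE sales_clustered CLUSTER BY (event_date) AS SELECT * FROM sales
```

After large subsequent loads, running OPTIMIZE on the table incrementally clusters newly written data against the same key.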
JohanS
by New Contributor III
  • 6240 Views
  • 2 replies
  • 1 kudos

Resolved! WorkspaceClient authentication fails when running on a Docker cluster

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
ValueError: default auth: cannot configure default credentials ...
I'm trying to instantiate a WorkspaceClient in a notebook on a cluster running a Docker image, but authentication fails. T...

Latest Reply
kyle_scherer1_5
New Contributor II
  • 1 kudos

Any progress here? Same issue, over a year later

1 More Replies
OODataEng
by New Contributor III
  • 1203 Views
  • 2 replies
  • 0 kudos

Resolved! Git credentials for service principal running jobs

Hello, I have a permission issue when trying to access Azure DevOps and run a job using a service principal. I've read about the whole credentials topic, and indeed, when I create a PAT (Personal Access Token) through my personal user account, I can s...

Latest Reply
loui_wentzel
Contributor
  • 0 kudos

Using a PAT is how you authenticate as a user so that you can configure your Service Principal (SP). If you follow this link, there's a guide to the next steps (you're on step 3 now). This article explains a bit more about how to set up the SP in Azur...

1 More Replies