Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

MikeGo
by Contributor II
  • 7797 Views
  • 2 replies
  • 0 kudos

Why "Error: Invalid access to Org: xxx"

Hi team, I installed the Databricks CLI and ran "databricks auth login --profile xxx" successfully. I can also connect from VS Code to Databricks. "databricks clusters list -p xxx" also works. But when I tried to run "databricks bundle validate" I got "Error:...

Latest Reply
swhite
New Contributor II
  • 0 kudos

I just ran into this issue (in Azure Databricks) and found that it was caused by an incorrect `host` value specified in my databricks.yml file:

targets:
  dev:
    default: true
    mode: production
    workspace:
      host: https://adb-<workspace-id...
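For comparison, a minimal databricks.yml of this shape might look like the following; the bundle name and workspace URL here are hypothetical, and the key point is that `host` must match the workspace your CLI profile actually authenticates against:

```yaml
# Hypothetical minimal databricks.yml for a bundle.
# "Invalid access to Org" typically means this host does not match
# the workspace resolved from your auth profile.
bundle:
  name: my_bundle

targets:
  dev:
    default: true
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net
```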

afisl
by New Contributor II
  • 16938 Views
  • 8 replies
  • 5 kudos

Resolved! Apply Unity Catalog tags programmatically

Hello, I'm interested in the "Tags" feature of columns/schemas/tables of Unity Catalog (described here: https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/tags). I've been able to play with them by hand and would now lik...

Data Engineering
tags
unitycatalog
Latest Reply
Jiri_Koutny
Databricks Partner
  • 5 kudos

Hi, running ALTER TABLE SET TAGS works on views too!
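As a sketch, with hypothetical catalog/schema/table/column names, the same statements can be scripted:

```sql
-- Hypothetical names; the SET TAGS clause works on tables, views, and columns.
ALTER TABLE main.sales.orders SET TAGS ('domain' = 'sales');
ALTER TABLE main.sales.orders ALTER COLUMN email SET TAGS ('pii' = 'true');
ALTER VIEW main.sales.orders_v SET TAGS ('domain' = 'sales');
```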

Sadam97
by New Contributor III
  • 1043 Views
  • 3 replies
  • 0 kudos

GCE cluster chokes the secret API server.

Hi, we upgraded the GKE cluster to a GCE cluster as per the Databricks documentation. It works fine with one or two notebooks in a job. Our production job has more than 40 notebooks, and each notebook accesses the secret API; it seems like the secret API server ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Sadam97, this looks to be a known issue with NAT and GKE. I will share more details soon and we'll follow up offline.

DataGeek_JT
by New Contributor II
  • 4273 Views
  • 4 replies
  • 4 kudos

Is it possible to use Liquid Clustering on Delta Live Tables / Materialised Views?

Is it possible to use Liquid Clustering on Delta Live Tables? If it is available, what is the Python syntax for adding liquid clustering to a Delta Live Table / Materialised View, please?

Latest Reply
surajitDE
Contributor
  • 4 kudos

@dlt.table(
    name=table_name,
    comment="just_testing",
    table_properties={"quality": "gold", "mergeSchema": "true"},
    cluster_by=["test_id", "find_date"]  # Optimizes for queries filtering on these columns
)
def testing_table():
    return create_testing_table(df_fin...

IliaSinev
by Databricks Partner
  • 1420 Views
  • 2 replies
  • 0 kudos

Access mode for pool compute

Is there a way to set Access Mode: Shared on pool instances, similar to All Purpose or Job clusters? We are getting an error reading from a table with masking set up on a column: Failed to acquire a SAS token for list on /schema1/table1/_delta_log due...

Latest Reply
IliaSinev
Databricks Partner
  • 0 kudos

Hi @Brahmareddy, thanks for the reply. It seems that a higher Runtime version could help: https://learn.microsoft.com/en-us/azure/databricks/compute/access-mode-limitations#fine-grained-access-control-limitations-for-unity-catalog-dedicated-access-mode I...

chris_y_1e
by New Contributor II
  • 5004 Views
  • 5 replies
  • 0 kudos

Self-joins are blocked on remote tables

In our production Databricks workflow, we have been getting this error since yesterday in one of the steps: org.apache.spark.SparkException: Self-joins are blocked on remote tables. We haven't changed our workflow or made any configurations for the data...

Latest Reply
chris_y_1e
New Contributor II
  • 0 kudos

@TomRenish Yeah, we fixed it by changing it to use a shared compute. It is called "USER_ISOLATION" in the `job.yaml` file: data_security_mode: USER_ISOLATION
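For context, a hedged sketch of where that setting sits in a job cluster definition (runtime version, node type, and worker count are hypothetical):

```yaml
# Hypothetical job cluster snippet; USER_ISOLATION is the API name for
# the Standard (formerly "Shared") access mode.
job_clusters:
  - job_cluster_key: main
    new_cluster:
      spark_version: 15.4.x-scala2.12
      node_type_id: Standard_DS3_v2
      num_workers: 2
      data_security_mode: USER_ISOLATION
```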

Upendra_Dwivedi
by Databricks Partner
  • 935 Views
  • 1 reply
  • 0 kudos

Databricks-Sql-Connector

Hi, I am connecting to a Databricks SQL warehouse from VS Code and I am running the following code:

import os
from databricks import sql

host = 'adb-xxxxxxxxxxx.xx.azuredatabricks.net'
http_path = '/sql/1.0/warehouses/xxxxxxxxxxxxxx'
access_token = 'dapib...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @Upendra_Dwivedi, this is potentially a missing package in your local Python setup. Kindly check the troubleshooting steps here and let me know. If this didn't work, please share the output of the following commands: python ...
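For the missing-package case, a small stdlib-only check can confirm the diagnosis (a sketch; the PyPI package is `databricks-sql-connector`, which provides the `databricks.sql` module):

```python
def connector_available() -> bool:
    """Return True if the databricks-sql-connector package is importable."""
    try:
        import databricks.sql  # noqa: F401  # provided by databricks-sql-connector
        return True
    except ImportError:
        return False

if not connector_available():
    print("Missing package; try: pip install databricks-sql-connector")
```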

Abser786
by New Contributor II
  • 1643 Views
  • 1 reply
  • 0 kudos

enable dynamic resource allocation on job cluster

I have a Databricks job with two tasks that run either alone or both in parallel (controlled by an If conditional task). When they run in parallel, one task runs for a long time, but the same task finishes quickly when it runs alone. Particularly ...

Latest Reply
User16502773013
Databricks Employee
  • 0 kudos

Hello @Abser786, there is a difference between Dynamic Resource Allocation and the scheduler policy. Dynamic Resource Allocation means getting more compute as needed if the current compute is totally consumed; this can be achieved by the autoscaling feature/c...
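A hedged sketch of the autoscaling piece in a job cluster spec (runtime version, node type, and worker bounds are hypothetical):

```yaml
# Hypothetical job cluster with autoscaling: workers are added up to
# max_workers when the current compute is fully utilized.
new_cluster:
  spark_version: 15.4.x-scala2.12
  node_type_id: Standard_DS3_v2
  autoscale:
    min_workers: 2
    max_workers: 8
```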

filipniziol
by Esteemed Contributor
  • 5628 Views
  • 3 replies
  • 0 kudos

Any known issue with interactive Shared Cluster Driver Memory Cleanup

I am experiencing memory leaks on a Standard (formerly shared) interactive cluster:

1. We run jobs regularly on the cluster
2. After each job completes, driver memory usage continues to increase, suggesting resources aren't fully released
3. Eventually...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hello Team, I'll check internally if there is any known issue reported.

Vetrivel
by Databricks Partner
  • 1099 Views
  • 1 reply
  • 1 kudos

UC upgrade in Spark Streaming jobs

Could you kindly share the recommended approach for upgrading from HMS to UC for structured streaming jobs, ensuring seamless execution without any failures or data duplication? I would also appreciate insights into any best practices you have followed during ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Vetrivel, how are you doing today? As per my understanding, upgrading from Hive Metastore (HMS) to Unity Catalog (UC) for structured streaming jobs needs a careful approach to avoid failures or data duplication. The best way is to first pause all ...

Yutaro
by New Contributor III
  • 834 Views
  • 1 reply
  • 1 kudos

Resolved! How can I efficiently remove backslashes during a COPY INTO load in Databricks?

I’m using Databricks’ COPY INTO to load data from a CSV file into a Delta table. My input CSV looks like this:

column1(string),column2(string)
"[\,\,111\,222\,]","012\"34"

After running COPY INTO, my Delta table currently contains: column1(str...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Yutaro, you're doing great, and your question is very clear! In your case, the most efficient way to remove backslashes during the COPY INTO operation is to first load the raw CSV data into a temporary or staging Delta table, and then insert the cl...
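The cleanup itself is just a per-value backslash strip; a pure-Python sketch of the transformation (on the staging table this would be SQL replace(col, '\\', '')):

```python
def strip_backslashes(value: str) -> str:
    """Remove every backslash from a CSV field loaded as a raw string."""
    return value.replace("\\", "")

# Values from the post:
print(strip_backslashes(r"[\,\,111\,222\,]"))  # -> [,,111,222,]
print(strip_backslashes('012\\"34'))           # -> 012"34
```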

TimB
by New Contributor III
  • 1265 Views
  • 3 replies
  • 3 kudos

Adding dependencies to Serverless compute with concurrency slows processing right down

I am trying to run a job using the For Each task with many concurrent processes on serverless compute. To add dependencies to serverless jobs, it seems you have to add them to the notebook, rather than configure them on the tasks screen like you...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 3 kudos

Yeah, TimB. Keep going.

glevin
by New Contributor II
  • 3431 Views
  • 7 replies
  • 1 kudos

JDBC Connection query row limit

Anyone know how to increase the number of rows returned in a JDBC query? Currently we're receiving 1000 rows per query. Have tried adding a LIMIT 5000 to the end of the query, but no luck.

Latest Reply
glevin
New Contributor II
  • 1 kudos

Thanks all for your help. Looks like the bottleneck is the tool I'm using to make the connection (Appian). It limits JDBC responses to 1000 rows.

SaeedAsh
by New Contributor
  • 2594 Views
  • 3 replies
  • 0 kudos

How to Permanently Disable Serverless Compute in Azure Databricks?

Hi, I was wondering how to completely disable serverless compute in Azure Databricks. I am certain that it was disabled in my workspace before, but now it seems to be constantly available at the notebook level. Did Databricks release any recent updates...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hey @noorbasha534, I don't think there is a feature to enable/disable Databricks serverless compute at the workspace level. You can confirm this with your Databricks account executive team; they might have a solution for this.

Yutaro
by New Contributor III
  • 4097 Views
  • 5 replies
  • 5 kudos

Resolved! Partitioning vs. Clustering for a 50 TiB Delta Lake Table on Databricks

Hello everyone, I'm planning to create a Delta Lake table on Databricks with an estimated size of ~50 TiB. The table includes three date columns (year, month, and day) and most of my queries will filter on these fields. I'm trying to decide whether t...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 5 kudos

Hey Yutaro, thank you so much for the kind words; it honestly means a lot! I'm really glad the guidance helped and that you're feeling more confident moving forward. You're doing all the right things by asking the right questions and planning ahead. If...
