Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

dadrake3
by New Contributor II
  • 1108 Views
  • 1 replies
  • 1 kudos

Delta Live Tables Unity Catalog Insufficient Permissions

I am receiving the following error when I try to run my DLT pipeline with Unity Catalog enabled: ```raise Py4JJavaError( py4j.protocol.Py4JJavaError: An error occurred while calling o950.load. : org.apache.spark.SparkSecurityException: [INSUFFICIENT_P...

Latest Reply
dadrake3
New Contributor II
  • 1 kudos

I have also tried granting all permissions on the schema to myself and to all users, and neither helped.

mdelvaux
by New Contributor
  • 744 Views
  • 0 replies
  • 0 kudos

BigQuery as foreign catalog - full object structs

Hi - We have mounted BigQuery, hosting Google Analytics data, as a foreign catalog. When querying the tables, objects are returned as strings, with all keys obfuscated by "f" or "v", likely to avoid replicating object keys across all records and hence ...
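For context, the "f"/"v" keys match BigQuery's REST row encoding, where each row arrives as `{"f": [{"v": ...}, ...]}` with values listed in schema-field order (whether the foreign catalog returns exactly this shape should be verified against your data). Under that assumption, a string in this form can be parsed and the original keys restored from the known field names:

```python
import json

def decode_bq_row(row_json, field_names):
    """Map BigQuery's {"f": [{"v": ...}]} row encoding back to named keys.

    field_names must be in the same order as the table schema.
    Nested records would need their own field-name lists and recursion,
    omitted here for brevity.
    """
    row = json.loads(row_json) if isinstance(row_json, str) else row_json
    return {name: cell["v"] for name, cell in zip(field_names, row["f"])}
```

The field names can be read from the table schema in BigQuery, since only the per-record repetition of keys was stripped, not the schema itself.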

Neli
by New Contributor III
  • 1690 Views
  • 3 replies
  • 0 kudos

Decrease frequency of Databricks Asset Bundle API

We are using DABs for our deployment and to invoke workflows. Behind the scenes, it calls the API below to get the status of a workflow. Currently, it checks every few seconds. Is there a way to decrease this frequency from seconds to minutes?  GET /ap...

Latest Reply
Frank_Kennedy
New Contributor II
  • 0 kudos

Hey!  If the API is checking the job status too frequently, you might want to consider implementing a custom polling mechanism. Instead of relying on the default frequency, you can build a simple script or function that pauses for a longer interval b...
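The custom polling idea above could look something like this (a sketch, assuming a wrapper around the Jobs API run-status endpoint — the exact endpoint is truncated in the post; the terminal life-cycle states used here are assumptions based on the public Jobs API):

```python
import time

def wait_for_run(get_run_state, poll_interval_s=60, timeout_s=3600):
    """Poll a job run until it reaches a terminal state.

    get_run_state: a callable returning the run's life-cycle state string,
    e.g. a thin wrapper around GET /api/2.1/jobs/runs/get.
    poll_interval_s: seconds between checks -- raise this to poll in
    minutes instead of seconds.
    """
    terminal = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}
    waited = 0
    while waited < timeout_s:
        state = get_run_state()
        if state in terminal:
            return state
        time.sleep(poll_interval_s)  # back off between status checks
        waited += poll_interval_s
    raise TimeoutError("run did not finish within timeout")
```

Driving the polling yourself, instead of through DAB's built-in wait, is what lets you choose the interval.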

2 More Replies
thiagoawstest
by Contributor
  • 3940 Views
  • 1 replies
  • 0 kudos

Error Sent message larger than max

Hello, I'm receiving a large amount of data in a dataframe; when trying to write or display it, I receive the error below. How can I fix it, or where do I change the setting? SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated...

Latest Reply
fghedin
Databricks Partner
  • 0 kudos

Hi @Retired_mod, I'm facing the same error. Can you provide the full name of the Spark conf we have to change? Thank you
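The exact conf name isn't given in the thread. In open-source Spark Connect there is a `spark.connect.grpc.maxInboundMessageSize` setting with a 128 MB default; whether it applies on Databricks clusters is an assumption. Independent of any conf, the error means a single gRPC result message exceeded the limit, so fetching rows in smaller batches avoids it. A rough batch-sizing sketch (the 128 MB default is an assumption):

```python
DEFAULT_GRPC_MAX_BYTES = 128 * 1024 * 1024  # assumed Spark Connect default

def rows_per_batch(avg_row_bytes, max_message_bytes=DEFAULT_GRPC_MAX_BYTES,
                   safety=0.5):
    """How many rows fit in one result message, with headroom.

    avg_row_bytes could be estimated by collecting a tiny sample first;
    safety < 1 leaves room for encoding overhead.
    """
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    return max(1, int(max_message_bytes * safety // avg_row_bytes))
```

With a batch size in hand, `df.limit(n)` for inspection, or writing the full result to a table instead of pulling it to the client, sidesteps the limit entirely.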

costi9992
by Databricks Partner
  • 4421 Views
  • 3 replies
  • 2 kudos

Access Databricks API using IDP token

Hello, We have a Databricks account & workspace, provided by AWS with SSO enabled. Is there any way to access the Databricks workspace API (jobs/clusters, etc.) using a token retrieved from an identity provider? We can access the Databricks workspace API with A...

Latest Reply
fpopa
New Contributor II
  • 2 kudos

Hey - Costin and Anonymous user, have you managed to get this working? Do you have examples, by any chance? I'm also trying something similar, but I haven't been able to make it work. > authenticate and access the Databricks REST API by setting the Autho...
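The mechanics of the call itself are straightforward once you have a token the workspace accepts; whether a raw IdP token is honored depends on the workspace's token-federation setup, which is the unresolved part of this thread. A minimal sketch of the request side (stdlib only, endpoint is the public Clusters API):

```python
import json
import urllib.request

def auth_headers(bearer_token):
    """Databricks REST APIs take the token in a standard Bearer header."""
    return {"Authorization": f"Bearer {bearer_token}"}

def list_clusters(workspace_url, bearer_token):
    """Call GET /api/2.0/clusters/list on the given workspace.

    bearer_token: a PAT works; an IdP-issued token only works if the
    workspace is configured to accept it (assumption, not verified here).
    """
    req = urllib.request.Request(
        f"{workspace_url}/api/2.0/clusters/list",
        headers=auth_headers(bearer_token),
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If this returns 403 with an IdP token but succeeds with a PAT, the problem is token acceptance on the workspace side, not the request format.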

2 More Replies
csmcpherson
by Databricks Partner
  • 2498 Views
  • 2 replies
  • 0 kudos

AWS NAT (Network Address Translation) Automated On-demand Destruct / Create

Hi folks, Our company typically uses Databricks during a 12-hour block, but the AWS NAT for elastic compute is up 24 hours, and I'd rather not pay for those hours. I gather AWS Lambda and CloudWatch can be used to schedule/trigger NAT destruction...

Latest Reply
csmcpherson
Databricks Partner
  • 0 kudos

For interest, this is how I ended up solving the situation, with pointers from AWS support:

<< CREATE NAT >>

import boto3
import logging
from datetime import datetime

ec2 = boto3.client('ec2')
cloudwatch = boto3.client('logs')

def lambda_handler(even...
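Since the snippet above is truncated, here is a hedged, untested sketch of the same idea: one Lambda, fired by two EventBridge/CloudWatch schedules, that creates the NAT gateway at the start of the work window and deletes it at the end. The window hours, subnet ID, and Elastic IP allocation ID are placeholders; note that deleting a NAT gateway also leaves stale route-table entries that need updating, which this sketch does not do:

```python
from datetime import datetime, timezone

WORK_START_HOUR = 7   # assumed 12-hour window; adjust to your schedule
WORK_END_HOUR = 19

def desired_action(hour, start=WORK_START_HOUR, end=WORK_END_HOUR):
    """Pure helper: 'create' inside the work window, 'destroy' outside."""
    return "create" if start <= hour < end else "destroy"

def lambda_handler(event, context):
    import boto3  # imported lazily so the helper above has no AWS dependency
    ec2 = boto3.client("ec2")
    action = desired_action(datetime.now(timezone.utc).hour)
    if action == "create":
        ec2.create_nat_gateway(
            SubnetId="subnet-PLACEHOLDER",
            AllocationId="eipalloc-PLACEHOLDER",  # pre-allocated Elastic IP
        )
    else:
        gws = ec2.describe_nat_gateways(
            Filters=[{"Name": "state", "Values": ["available"]}]
        )
        for gw in gws.get("NatGateways", []):
            ec2.delete_nat_gateway(NatGatewayId=gw["NatGatewayId"])
```

Keeping the Elastic IP allocated between cycles avoids the public IP changing every morning.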

1 More Replies
NickLee
by New Contributor III
  • 1795 Views
  • 2 replies
  • 1 kudos

How to update num_workers dynamically in a job cluster

I am setting up a workflow with the UI. In the first task, a dynamic value for the next task's num_workers is calculated based on the actual data size. In the subsequent task, I'd like to use this calculated num_workers to update the job cluster's defau...

Latest Reply
NickLee
New Contributor III
  • 1 kudos

Wondering if anyone has had a similar experience? Thanks
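One approach that might work (a sketch, not verified on a running job cluster): have the second task read the computed value, e.g. from task values, and call the Clusters API `POST /api/2.0/clusters/resize` on its own cluster. The payload shape matches the public Clusters API; whether resizing is permitted mid-run for your cluster type is an assumption:

```python
import json
import urllib.request

def resize_payload(cluster_id, num_workers):
    """Build the body for POST /api/2.0/clusters/resize."""
    return {"cluster_id": cluster_id, "num_workers": int(num_workers)}

def resize_cluster(workspace_url, token, cluster_id, num_workers):
    """Resize a running cluster; cluster_id is available inside a task
    via the Spark conf (spark.databricks.clusterUsageTags.clusterId)."""
    body = json.dumps(resize_payload(cluster_id, num_workers)).encode()
    req = urllib.request.Request(
        f"{workspace_url}/api/2.0/clusters/resize",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

An alternative, if resizing mid-run turns out not to be allowed, is enabling autoscaling on the job cluster and letting min/max workers bound the range.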

1 More Replies
tramtran
by Contributor
  • 8763 Views
  • 3 replies
  • 5 kudos

Resolved! Driver: Out of Memory

Hi everyone, I have a streaming job with 29 notebooks that runs continuously. Initially, I allocated 28 GB of memory to the driver, but the job failed with a "Driver Out of Memory" error after 4 hours of execution. To address this, I increased the driv...

Latest Reply
xorbix_rshiva
Databricks MVP
  • 5 kudos

It looks like _source_cdc_time is the timestamp for when the CDC transaction occurred in your source system. This would be a good choice for a timestamp column for your watermark, since you would be deduping values according to the time the transacti...
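The watermark-plus-dedup idea in that reply can be illustrated in plain Python — a toy model of Spark's `withWatermark` + `dropDuplicates` semantics, not Spark itself: track the max event time seen, drop events older than (max - delay), and dedupe keys within the window. Bounding how late an event can be is exactly what lets Spark cap the dedup state instead of growing it (and the driver's memory) forever:

```python
def dedupe_with_watermark(events, delay):
    """events: iterable of (key, event_time) pairs in arrival order.

    Emits each key at most once. Events older than the current
    watermark (max event time seen minus delay) are dropped; in real
    Spark, keys are also evicted from state once the watermark passes
    them, which this toy version skips for simplicity.
    """
    max_time = float("-inf")
    seen = set()
    out = []
    for key, t in events:
        max_time = max(max_time, t)
        if t < max_time - delay:
            continue  # arrived too late: behind the watermark
        if key in seen:
            continue  # duplicate within the window
        seen.add(key)
        out.append((key, t))
    return out
```

In the actual job this would be `df.withWatermark("_source_cdc_time", "10 minutes").dropDuplicates([...])` (interval and key columns are placeholders to adjust).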

2 More Replies
alex-syk
by New Contributor II
  • 14783 Views
  • 1 replies
  • 1 kudos

Delta table and AnalysisException: [PATH_NOT_FOUND] Path does not exist

I am performing some tests with Delta tables. For each test, I write a Delta table to Azure Blob Storage. Then I manually delete the Delta table. After deleting the table and running my code again, I get this error: AnalysisException: [PATH_NOT_FOUN...

Latest Reply
kumar_ravi
New Contributor III
  • 1 kudos

Yes, it is weird. A workaround for this:

files = dbutils.fs.ls("s3 bucket or azure blob path")
file_paths = [file.path for file in files]
if target_path not in file_paths:
    dbutils.fs.mkdirs(target_path)

aschiff
by Contributor II
  • 732507 Views
  • 33 replies
  • 5 kudos

GC Driver Error

I am using a cluster in Databricks to connect to a Tableau workbook through the JDBC connector. My Tableau workbook has been unable to load due to resources not being available through the data connection. I went to look at the driver log for my clus...

32 More Replies
KosmaS
by New Contributor III
  • 9614 Views
  • 3 replies
  • 7 kudos

Resolved! Efficient caching/persisting

To cache/persist, an action needs to be triggered. I'm just wondering: will it make any difference if, after persisting some df, I use, for instance, take(5) instead of count()? Will it be a bit more effective, because of sending results from 5 partiti...

Latest Reply
Rishabh-Pandey
Databricks MVP
  • 7 kudos

Yes, take(5) will be more efficient in some ways. When you cache or persist a DataFrame in Spark, you are instructing Spark to store the DataFrame's intermediate data in memory (or on disk, depending on the storage level). This can significantly speed...
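One caveat worth illustrating: take(n) may compute only the partitions needed to produce n rows, so the cache can be left partially populated, whereas count() scans every partition and materializes the whole cache. The plain-Python toy below models lazy partitions to show the difference (a simplified model, not actual Spark scheduling):

```python
def make_partitions(data, num_parts):
    """Split data into lazy 'partitions' (thunks), recording evaluations."""
    size = max(1, len(data) // num_parts)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    evaluated = []

    def part(i, chunk):
        def compute():
            evaluated.append(i)  # record which partitions actually ran
            return chunk
        return compute
    return [part(i, c) for i, c in enumerate(chunks)], evaluated

def take(parts, n):
    """Stop evaluating partitions as soon as n rows are available."""
    out = []
    for p in parts:
        if len(out) >= n:
            break
        out.extend(p())
    return out[:n]

def count(parts):
    """Evaluates every partition."""
    return sum(len(p()) for p in parts)
```

So take(5) is cheaper to run, but if the goal of the action is to warm the cache fully, count() (or an equivalent full scan) is the one that does it.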

2 More Replies