Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mbaas
by New Contributor III
  • 2092 Views
  • 1 reply
  • 2 kudos

DLT Serverless costs

I recently started checking out serverless Delta Live Tables. In my understanding, serverless continuous jobs (with Auto Loader) would only do something when new files arrive. However, for 4 serverless pipelines running continuously, I spend in two ...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

Hi @mbaas, that does not sound right. So, were you able to compare the jobs and stages and spot the difference? Are there more tasks added, or from a compute perspective do you find no difference at all, only in cost? Also, it may be required to ...

rumfox
by New Contributor II
  • 4118 Views
  • 2 replies
  • 2 kudos

Maximum Number of Parameters in Databricks SQL Queries

Hello Databricks Community, I'm working with Databricks SQL and encountered an issue when passing a large number of parameters in a query. Specifically, I attempted to pass 493 parameters, but I received the following error message: BAD_REQUEST: Too m...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @rumfox, I would assume there is such a limit; the error is pretty clear. But the weird thing is, I cannot find any mention of it in the documentation.
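Until the limit is documented, a common client-side workaround is to split the parameter list into batches and run the query once per batch. A minimal sketch, assuming a hypothetical batch size of 250 and a `run_query` callable standing in for whatever executes the parameterized query (neither is a documented Databricks limit or API):

```python
def chunked(values, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(values), size):
        yield values[i:i + size]

def query_in_batches(run_query, values, batch_size=250):
    """Run a parameterized IN-list query once per batch and concatenate
    the results. `run_query` is any callable that accepts a list of
    parameters and returns a list of rows (a stand-in for the real
    query executor, which is not specified here)."""
    rows = []
    for batch in chunked(values, batch_size):
        rows.extend(run_query(batch))
    return rows

# Example with a stub executor that echoes each parameter as a row:
ids = list(range(493))
collected = query_in_batches(lambda batch: [(x,) for x in batch], ids)
# collected has one row per parameter, gathered across two batches
```

Batching only works cleanly for queries whose results can be concatenated (e.g. IN-list filters); aggregations would need to be re-aggregated afterwards.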

1 More Replies
dadrake3
by New Contributor II
  • 1108 Views
  • 1 reply
  • 1 kudos

Delta Live Tables Unity Catalog Insufficient Permissions

I am receiving the following error when I try to run my DLT pipeline with Unity Catalog enabled: ```raise Py4JJavaError( py4j.protocol.Py4JJavaError: An error occurred while calling o950.load. : org.apache.spark.SparkSecurityException: [INSUFFICIENT_P...

Latest Reply
dadrake3
New Contributor II
  • 1 kudos

I have also tried granting all permissions on the schema to myself and to all users, and neither helped.

mdelvaux
by New Contributor
  • 744 Views
  • 0 replies
  • 0 kudos

BigQuery as foreign catalog - full object structs

Hi - We have mounted BigQuery, hosting Google Analytics data, as a foreign catalog. When querying the tables, objects are returned as strings, with all keys obfuscated by "f" or "v", likely to avoid replicating object keys across all records and hence ...
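For context, this "f"/"v" shape matches BigQuery's own row encoding, where each row is `{"f": [...]}` (fields) and each cell `{"v": ...}` (value). Whether the foreign catalog returns exactly that shape is an assumption, but if it does, a plain-Python decoder given the schema's field names (the names below are made up for illustration) would look like:

```python
def decode_row(row, field_names):
    """Convert a BigQuery-style {'f': [{'v': ...}, ...]} record into a
    plain dict keyed by the schema's field names. Nested records (which
    are themselves {'f': [...]} structs) and repeated values are left
    as-is for brevity."""
    return {name: cell["v"] for name, cell in zip(field_names, row["f"])}

# Hypothetical Google Analytics-style record and schema:
raw = {"f": [{"v": "1234"}, {"v": "page_view"}]}
decoded = decode_row(raw, ["visitor_id", "event_name"])
# decoded == {"visitor_id": "1234", "event_name": "page_view"}
```

In Spark, the same idea could be applied by parsing the string column with a matching schema, but the pure-Python version shows the mapping most directly.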

Neli
by New Contributor III
  • 1690 Views
  • 3 replies
  • 0 kudos

Decrease frequency of Databricks Asset Bundle API

We are using DABs for our deployment and to invoke workflows. Behind the scenes, it calls the API below to get the status of the workflow. Currently, it checks every few seconds. Is there a way to decrease this frequency from seconds to minutes? GET /ap...

Latest Reply
Frank_Kennedy
New Contributor II
  • 0 kudos

Hey!  If the API is checking the job status too frequently, you might want to consider implementing a custom polling mechanism. Instead of relying on the default frequency, you can build a simple script or function that pauses for a longer interval b...
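A sketch of such a client-side wait loop, with the status fetcher injected so it can wrap a call like GET /api/2.1/jobs/runs/get (note this replaces the bundle CLI's own polling with your own loop rather than reconfiguring DABs, whose poll interval may not be tunable):

```python
import time

def wait_for_run(get_state, poll_interval_s=60, timeout_s=3600):
    """Poll a job run until it reaches a terminal life-cycle state.
    `get_state` is any callable returning the current state string,
    e.g. a thin wrapper around the Jobs API runs/get endpoint.
    Polling every `poll_interval_s` seconds instead of a few-second
    cadence cuts API traffic accordingly."""
    terminal = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state in terminal:
            return state
        time.sleep(poll_interval_s)
    raise TimeoutError("run did not finish within the timeout")

# Example with a stub that terminates on the third poll:
states = iter(["PENDING", "RUNNING", "TERMINATED"])
result = wait_for_run(lambda: next(states), poll_interval_s=0)
# result == "TERMINATED"
```

In practice the wrapper would authenticate with a PAT or OAuth token and extract `state.life_cycle_state` from the JSON response before returning it.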

2 More Replies
thiagoawstest
by Contributor
  • 3941 Views
  • 1 reply
  • 0 kudos

Error Sent message larger than max

Hello, I'm receiving a large amount of data in a dataframe; when trying to save or display it, I receive the error below. How can I fix it, or where do I change the setting? SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated...

Latest Reply
fghedin
Databricks Partner
  • 0 kudos

Hi @Retired_mod, I'm facing the same error. Can you provide the full name of the Spark conf we have to change? Thank you

costi9992
by Databricks Partner
  • 4421 Views
  • 3 replies
  • 2 kudos

Access Databricks API using IDP token

Hello, we have a Databricks account & workspace, provided by AWS, with SSO enabled. Is there any way to access the Databricks workspace API (jobs/clusters, etc.) using a token retrieved from an identity provider? We can access the Databricks workspace API with A...

Latest Reply
fpopa
New Contributor II
  • 2 kudos

Hey Costin and Anonymous user, have you managed to get this working? Do you have examples by any chance? I'm also trying something similar, but I haven't been able to make it work. > authenticate and access the Databricks REST API by setting the Autho...

2 More Replies
csmcpherson
by Databricks Partner
  • 2498 Views
  • 2 replies
  • 0 kudos

AWS NAT (Network Address Translation) Automated On-demand Destruct / Create

Hi folks, our company typically uses Databricks during a 12-hour block, but the AWS NAT for elastic compute is up 24 hours, and I'd rather not pay for those hours. I gather AWS Lambda and CloudWatch can be used to schedule/trigger NAT destruction...

Latest Reply
csmcpherson
Databricks Partner
  • 0 kudos

For interest, this is how I ended up solving the situation, with pointers from AWS support:

<< CREATE NAT >>
import boto3
import logging
from datetime import datetime

ec2 = boto3.client('ec2')
cloudwatch = boto3.client('logs')

def lambda_handler(even...
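As a hedged companion sketch of the two Lambda actions (the subnet and Elastic IP allocation IDs are placeholders, and `ec2` is expected to behave like a boto3 EC2 client, passed in so the logic can be exercised without AWS):

```python
def create_nat(ec2, subnet_id, allocation_id):
    """Create a NAT gateway and block until it is available.
    `ec2` should behave like a boto3 EC2 client; the IDs are
    placeholders for your private subnet and Elastic IP allocation."""
    resp = ec2.create_nat_gateway(SubnetId=subnet_id,
                                  AllocationId=allocation_id)
    nat_id = resp["NatGateway"]["NatGatewayId"]
    # boto3 ships a waiter for NAT gateway readiness.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])
    return nat_id

def destroy_nat(ec2, nat_id):
    """Delete the NAT gateway; hourly billing stops once it is gone."""
    ec2.delete_nat_gateway(NatGatewayId=nat_id)
```

In the real Lambda you would build the client with `boto3.client("ec2")` and, after each recreation, repoint the private route tables' 0.0.0.0/0 route at the new gateway ID, since the ID changes on every create.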

1 More Replies
NickLee
by New Contributor III
  • 1796 Views
  • 2 replies
  • 1 kudos

How to update num_workers dynamically in a job cluster

I am setting up a workflow with the UI. In the first task, a dynamic value for the next task's num_workers is calculated based on the actual data size. In the subsequent task, I'd like to use this calculated num_workers to update the job cluster's defau...

NickLee_0-1722018584496.png
Latest Reply
NickLee
New Contributor III
  • 1 kudos

Wondering if anyone has had a similar experience? Thanks
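One avenue worth testing (a hedged sketch, not a confirmed solution): have the first task publish the computed size, then call the Clusters API resize endpoint from the next task. Whether a running job cluster accepts `/api/2.0/clusters/resize` is an assumption to verify against your workspace; the HTTP call is injected here so the payload shape can be shown without credentials:

```python
def resize_cluster(post, cluster_id, num_workers):
    """Ask the Clusters API to scale a cluster to `num_workers`.
    `post` is any callable mimicking an authenticated HTTP POST,
    e.g. a thin wrapper over requests.post against
    https://<workspace>/api/2.0/clusters/resize."""
    return post("/api/2.0/clusters/resize",
                {"cluster_id": cluster_id, "num_workers": num_workers})

# Stub showing the payload the endpoint expects (cluster ID is made up):
sent = {}
resize_cluster(lambda path, body: sent.update({"path": path, **body}),
               "0123-abcdef", 8)
# sent now holds the path plus cluster_id and num_workers
```

If resize turns out not to apply to job clusters, an alternative is enabling autoscale on the job cluster and letting min/max workers bound the range instead of setting an exact count.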

1 More Replies
tramtran
by Contributor
  • 8764 Views
  • 3 replies
  • 5 kudos

Resolved! Driver: Out of Memory

Hi everyone, I have a streaming job with 29 notebooks that runs continuously. Initially, I allocated 28 GB of memory to the driver, but the job failed with a "Driver Out of Memory" error after 4 hours of execution. To address this, I increased the driv...

Latest Reply
xorbix_rshiva
Databricks MVP
  • 5 kudos

It looks like _source_cdc_time is the timestamp for when the CDC transaction occurred in your source system. This would be a good choice for a timestamp column for your watermark, since you would be deduping values according to the time the transacti...
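The role of the watermark in bounding dedup state can be illustrated in plain Python (Spark's `withWatermark` plus `dropDuplicates` does the equivalent at scale; this is a conceptual sketch, not the thread's actual pipeline):

```python
def dedupe_with_watermark(events, delay_s):
    """Drop duplicate keys, keeping dedup state only for events newer
    than (max event time seen - delay_s). Evicting old state is what
    bounds memory: without the watermark, the seen-keys map grows
    forever, which is one way a long-running dedup can exhaust the
    driver. `events` is an iterable of (key, event_time_s) pairs."""
    seen = {}                    # key -> event time still inside the watermark
    max_time = float("-inf")
    out = []
    for key, t in events:
        max_time = max(max_time, t)
        watermark = max_time - delay_s
        # Evict state that has fallen behind the watermark.
        seen = {k: v for k, v in seen.items() if v >= watermark}
        if t >= watermark and key not in seen:
            seen[key] = t
            out.append((key, t))
    return out

stream = [("a", 100), ("a", 101), ("b", 102), ("a", 500)]
# With a 60 s delay, the second "a" is dropped as a duplicate, while the
# "a" at t=500 passes because the earlier state has been evicted.
result = dedupe_with_watermark(stream, 60)
```

Choosing `_source_cdc_time` as the event-time column, as the reply suggests, makes the watermark track transaction time rather than arrival time.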

2 More Replies
alex-syk
by New Contributor II
  • 14787 Views
  • 1 reply
  • 1 kudos

Delta table and AnalysisException: [PATH_NOT_FOUND] Path does not exist

I am performing some tests with Delta tables. For each test, I write a Delta table to Azure Blob Storage, then manually delete it. After deleting the table and running my code again, I get this error: AnalysisException: [PATH_NOT_FOUN...

Capture.PNG
Latest Reply
kumar_ravi
New Contributor III
  • 1 kudos

Yes, it is weird. A workaround for this:
files = dbutils.fs.ls("s3 bucket or azure blob path")
file_paths = [file.path for file in files]
if target_path not in file_paths:
    dbutils.fs.mkdirs(target_path)
