Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

r0nald
by New Contributor II
  • 11863 Views
  • 5 replies
  • 1 kudos

UDF not working inside transform() & lambda (SQL)

Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a not overly bloated workaround?

%sql
create or replace function status_map(status int)
returns string
return map(10, "STATU...

Latest Reply
marcogrcr
New Contributor II
  • 1 kudos

Scoped variables in a transform() are not accessible to UDFs. However, you can work around this using explode():

# equivalent of: select transform(arr, e -> status_map(e.v1)) from s1
select collect_list(status_map(status_id)) from explode((select trans...

4 More Replies
seefoods
by Valued Contributor
  • 459 Views
  • 3 replies
  • 4 kudos

Resolved! Write logging errors for both PySpark and Python exceptions

Hello guys, Happy New Year and best wishes to all of us. I am catching both PySpark and Python exceptions, but I want to write these logged errors into a Delta table. Does anyone know the best practice for this? Thanks. Cordially,
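One common pattern (a sketch, not the thread's confirmed answer): catch the exception, flatten it into a structured record, and append that record to a Delta table. The function and table names here are hypothetical, and the Delta write itself is left as a comment because it needs a live Spark session.

```python
import traceback
from datetime import datetime, timezone

def build_error_record(exc: Exception, job_name: str) -> dict:
    """Turn a caught exception (PySpark or plain Python) into a flat record."""
    return {
        "job_name": job_name,
        "error_type": type(exc).__name__,
        "error_message": str(exc),
        "stack_trace": traceback.format_exc(),
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }

# Usage inside a job; the Delta write is commented out since it needs `spark`:
try:
    raise ValueError("bad input row")   # stand-in for the real workload
except Exception as exc:                # also catches PySpark exception subclasses
    record = build_error_record(exc, job_name="nightly_etl")
    # spark.createDataFrame([record]).write.mode("append").saveAsTable("logs.etl_errors")
```

Keeping the record flat (strings and one timestamp) makes the Delta table schema trivial and queryable.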

Latest Reply
seefoods
Valued Contributor
  • 4 kudos

Thanks a lot @szymon_dybczak 

2 More Replies
Digvijay_11
by Databricks Partner
  • 544 Views
  • 1 replies
  • 0 kudos

Resolved! A few queries on Auto Loader

How do we retrieve the filename and file path from the trigger and consume them in a Databricks notebook dynamically? If the same file is modified with no change in name but a change in data, will the trigger still work? If not, what is the workaround? In landing we are g...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Digvijay_11,
1. You can use the metadata column for that purpose: File metadata column - Azure Databricks | Microsoft Learn
2. With the default setting (cloudFiles.allowOverwrites = false), files are processed exactly once. When a file is appended to o...

ChristianRRL
by Honored Contributor
  • 449 Views
  • 2 replies
  • 3 kudos

Resolved! Serverless Compute Spark Version Flexibility?

Hi there, I'm wondering what determines the Serverless Compute Spark version. Is it based on the current DBR LTS? And is there a way to modify the Spark version for serverless compute? For example, when I check the Spark version for our serverless com...

Latest Reply
Databricks77
New Contributor III
  • 3 kudos

Serverless compute always runs on the latest runtime version. You cannot choose it as you can with standard compute.

1 More Replies
ChristianRRL
by Honored Contributor
  • 1046 Views
  • 4 replies
  • 10 kudos

Resolved! Testing Spark Declarative Pipeline in Docker Container > PySparkRuntimeError

Hi there, I saw via an announcement last year that Spark Declarative Pipelines (previously DLT) was being open sourced into Apache Spark, and I see this is now true as of Apache Spark 4.1: Spark Declarative Pipelines Programming Guide. I'm trying ...

Latest Reply
aleksandra_ch
Databricks Employee
  • 10 kudos

Hi @ChristianRRL, In addition to @osingh's answers, check out this old but good blog post about how to structure the pipeline's code to enable a dev and test cycle: https://www.databricks.com/blog/applying-software-development-devops-best-practices-d...

3 More Replies
Anish_2
by New Contributor III
  • 503 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks workflow design

Hello Team, I have a use case in which I want to trigger another DLT pipeline when one table succeeds in my parent DLT pipeline. I don't want to create a pipeline-to-pipeline dependency. Is there any way to create a table-to-pipeline dependency? Thank you, Anis...

Data Engineering
deltalivetable
workflowdesign
Latest Reply
Raman_Unifeye
Honored Contributor III
  • 0 kudos

@Anish_2 - TUT is the solution. In TUT, instead of the parent pipeline "pushing" a notification, the child job is "pulled" into action by a metadata change. Set it up as below: create a Databricks Job and add a Pipeline task pointing to your Secondary ...
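The setup above maps to the Jobs API's table-update trigger. Below is a minimal sketch of such a job payload built as a Python dict; the field names follow the Jobs API 2.1 trigger settings as I understand them, and the pipeline ID, job name, and table name are placeholders, so verify the exact schema against the current API reference.

```python
import json

# Sketch of a Jobs API payload: a child job with a Pipeline task, fired by a
# table update trigger on a table the parent pipeline writes.
job_settings = {
    "name": "child-pipeline-on-table-update",          # hypothetical job name
    "tasks": [
        {
            "task_key": "run_secondary_pipeline",
            "pipeline_task": {"pipeline_id": "<secondary-pipeline-id>"},
        }
    ],
    "trigger": {
        "table_update": {
            # Fire when this table (written by the parent pipeline) is updated.
            "table_names": ["main.sales.orders_silver"],  # placeholder table
        }
    },
}

payload = json.dumps(job_settings)  # body for the jobs/create REST call
```

This keeps the dependency on the table, not on the parent pipeline, which is exactly the "pulled into action by a metadata change" behavior described above.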

1 More Replies
NotCuriosAtAll
by New Contributor III
  • 577 Views
  • 2 replies
  • 3 kudos

Resolved! Cluster crashes occasionally but not all of the time

We have a small cluster (Standard D2ads v6) with 8 GB of RAM and 2 cores. This is an all-purpose cluster and, for some reason, the client demands we use this one for our ETL process. The ETL process is simple: the client drops parquet files in the b...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 3 kudos

Hi @NotCuriosAtAll, your cluster is undersized for this workload. This error is typical on the driver node with such high CPU consumption. You can check the article (and related solution) below: Job run fails with error message "Could not reach driver of clu...

1 More Replies
bsr
by New Contributor II
  • 1778 Views
  • 4 replies
  • 5 kudos

Resolved! DBR 17.3.3 introduced unexpected DEBUG logs from ThreadMonitor – how to disable?

After upgrading from DBR 17.3.2 to DBR 17.3.3, we started seeing a flood of DEBUG logs like this in job outputs:

```
DEBUG:ThreadMonitor:Logging python thread stack frames for MainThread and py4j threads:
DEBUG:ThreadMonitor:Logging Thread-8 (run) stac...
```
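Until a runtime fix lands, one possible workaround from user code is to raise the threshold of the noisy logger. This is a sketch: the logger name "ThreadMonitor" is taken from the message prefix above, and whether DBR's internal emitter actually routes through a Python logger of that name is an assumption to verify in your own jobs.

```python
import logging

# Raise the threshold so DEBUG records from the "ThreadMonitor" logger are
# dropped while WARNING and above still get through.
logging.getLogger("ThreadMonitor").setLevel(logging.WARNING)

# isEnabledFor(DEBUG) is now False for this logger.
suppressed = not logging.getLogger("ThreadMonitor").isEnabledFor(logging.DEBUG)
```

Running this once at the top of the job (before the workload starts) is enough, since `logging.getLogger` returns the same singleton logger everywhere in the process.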

Latest Reply
WAHID
New Contributor II
  • 5 kudos

@iyashk-DB We are currently using DBR 17.3 LTS, and the issue is still occurring. Do you know when the fix is expected to be applied? We need this information to decide whether we should wait for the fix or proceed with the workaround you propo...

3 More Replies
rijin-thomas
by New Contributor II
  • 644 Views
  • 4 replies
  • 3 kudos

MongoDB connector - connection timeout when trying to connect to AWS DocumentDB

I am on Databricks Runtime 14.3 LTS, Spark 3.5.0, Scala 2.12, and mongodb-spark-connector_2.12:10.2.0. Trying to connect to DocumentDB using the connector, all I get is a connection timeout. I tried using PyMongo, which works as expected, and I can ...

Latest Reply
Sanjeeb2024
Valued Contributor
  • 3 kudos

Hi @rijin-thomas - Can you please allow the CIDR block of the Databricks account VPC in the AWS DocumentDB security group (the executor connectivity issue stated by @bianca_unifeye)?

3 More Replies
tvdh
by New Contributor II
  • 251 Views
  • 1 replies
  • 1 kudos

Resolved! Tab navigation between fields in dashboards is random

Tab navigation between fields in published dashboards seems very random. I have a dashboard with multiple text input fields (mapped to query parameters / filters). I expect to move logically between them when pressing Tab (keyboard navigation), but I mo...

Latest Reply
Advika
Community Manager
  • 1 kudos

Hello @tvdh! You can share this as product feedback so it’s visible to the Databricks product team and can be tracked and prioritized.

Upendra_Dwivedi
by Databricks Partner
  • 3583 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks App OBO User Authorization

Hi All, We are using the on-behalf-of user authorization method for our app, and the x-forwarded-access-token is expiring after some time, so we have to redeploy our app to rectify the issue. I am not sure what the issue is or how we can keep the token aliv...

Latest Reply
jpt
New Contributor II
  • 1 kudos

I am confronted with a similar error. I am also using OBO user auth and have implemented accessing the token via st.context.headers.get('x-forwarded-access-token') for every query, and I do not save it in a cache. Still, after 1 hour, I am hit with the...

1 More Replies
Ved88
by Databricks Partner
  • 526 Views
  • 5 replies
  • 1 kudos

Databricks all-purpose cluster

Getting the below error while executing a notebook, and I can see the cluster has all libraries installed: "Failure starting repl. Try detaching and re-attaching the notebook."

Latest Reply
Ved88
Databricks Partner
  • 1 kudos

Hi, we are not using the Hive metastore anywhere, so I'm not sure why that host ((host=consolidated-westeuropec2-prod-metastore-0.mysql.database.azure.com)(port=3306)) is appearing in the driver log. Will I need to whitelist it? We have other use cases simi...

4 More Replies
csondergaardp
by New Contributor II
  • 532 Views
  • 2 replies
  • 2 kudos

Resolved! [PATH_NOT_FOUND] Structured Streaming uses wrong checkpoint location

I'm trying to run a simple example using Structured Streaming on a directory created as a Volume. The use case is purely educational; I am investigating various forms of triggers. Basic info: Catalog: "dev_catalog", Schema: "stream", Volume: "streamin...

Latest Reply
cgrant
Databricks Employee
  • 2 kudos

Your checkpoint code looks correct. What is the source of `df`? Is it `/Volumes/dev_catalog/default/streaming_basics/`? That path looks incorrect - add `stream` to it.
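The bug here is a silently wrong path segment (`default` where the schema `stream` belongs). A small sketch of building Volume paths from the catalog/schema/volume parts, so a segment can't be dropped or mistyped; the catalog, schema, and volume names come from the thread, while the `input` and `_checkpoint` subdirectories are hypothetical.

```python
from posixpath import join  # Volume paths are POSIX-style regardless of OS

# Build /Volumes/<catalog>/<schema>/<volume>/... from named parts so the
# schema segment ("stream" here, per the thread) can't be silently omitted.
catalog, schema, volume = "dev_catalog", "stream", "streaming_basics"
volume_root = join("/Volumes", catalog, schema, volume)

source_path = join(volume_root, "input")            # hypothetical input dir
checkpoint_path = join(volume_root, "_checkpoint")  # hypothetical checkpoint dir
```

Passing `source_path` to `readStream` and `checkpoint_path` to the `checkpointLocation` option then keeps both anchored to the same, correctly assembled volume root.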

1 More Replies
HarishKumarM
by New Contributor
  • 542 Views
  • 1 replies
  • 0 kudos

Resolved! Zerobus Connector Issue

I was trying to implement the example posted at the link below for the Zerobus connector, to test its functionality on my Free Edition workspace, but unfortunately I am getting the below error. Reference code: https://learn.microsoft.com/en-us/azure/databricks/...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @HarishKumarM, I did some digging and found some information to help you troubleshoot. What the error means: your workspace isn't currently enrolled in the Zerobus Ingest preview. Even though Zerobus is labeled a Public Preview, it's st...

RevanthV
by Contributor
  • 420 Views
  • 3 replies
  • 3 kudos

Resolved! Data validation with df writes using append mode

Hi Team, recently I came across a situation where I had to write a huge amount of data, and it took 6 hrs to complete. Later, when I checked the target data, I saw 20% of the total records were written incorrectly or corrupted, because the source data itself was corr...

Latest Reply
RevanthV
Contributor
  • 3 kudos

Hey @K_Anudeep, thanks a lot for tagging me in the GitHub issue. This is exactly what I want, a "validate and commit" feature, and I see you have already raised a PR for the same with a new option called . I will try this out and check if it satisfie...

2 More Replies