Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

brady_tyson
by New Contributor
  • 2280 Views
  • 1 reply
  • 0 kudos

Databricks Connect in VS Code: cannot find package installed on cluster

I am using Databricks Connect v2 to connect to a UC-enabled cluster. I have a package I built and installed as a wheel file on the cluster. When using VS Code to import and use the package, I get a module-not-found error when running cell by cel...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @brady_tyson just checking—are you still facing this issue with using your custom package and Databricks Connect? If so, here are a few questions to collect some data points about your setup:   Is Databricks Connect properly installed and configur...
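
For reference, a minimal Databricks Connect sketch of the setup being described (the package name is a placeholder, not from the thread); one common cause of this error is that the wheel is installed only on the cluster and not in the local Python environment that VS Code uses:

    # minimal sketch: assumes databricks-connect is installed locally and auth is configured
    from databricks.connect import DatabricksSession

    spark = DatabricksSession.builder.getOrCreate()

    # the custom wheel must also exist locally for a plain import to work, e.g.:
    #   pip install dist/my_package-0.1.0-py3-none-any.whl
    import my_package  # hypothetical package name

    spark.range(5).show()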

prakharcode
by New Contributor II
  • 1159 Views
  • 1 reply
  • 0 kudos

Problem with streaming jobs (foreachBatch) with USER_ISOLATION compute cluster

We have been trying to run a streaming job on an all-purpose compute (4 cores, 16 GB) in "USER_ISOLATION" mode, which Databricks recommends for Unity Catalog. The job reads CDC files produced by a table refreshed every hour and produces aro...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@prakharcode  Thank you for sharing the detailed information about your issue. Before diving into solutions, I want to confirm if this is still an ongoing problem you're facing. Regarding the difference in job performance between "NO_ISOLATION" mode ...
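
For context, a minimal foreachBatch skeleton of the kind of pipeline described in the post (file format, paths, and table names are assumptions):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    def upsert_cdc(batch_df, batch_id):
        # placeholder logic; the real job would merge the CDC batch into the target table
        batch_df.write.mode("append").saveAsTable("main.silver.cdc_target")

    (spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")                             # assumed file format
        .load("/Volumes/main/landing/cdc")                               # placeholder path
        .writeStream
        .foreachBatch(upsert_cdc)
        .option("checkpointLocation", "/Volumes/main/chk/cdc_target")    # placeholder path
        .trigger(availableNow=True)
        .start())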

sakuraDev
by New Contributor II
  • 2331 Views
  • 1 reply
  • 0 kudos

Why does Soda not initialize?

Hey everyone, I'm using Auto Loader with Soda. I'm new to both. The idea is to ingest with quality checks into my silver table for every batch in a continuous ingestion. I tried to configure Soda as str just like the docs show, but it seems that it keeps on t...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@sakuraDev is this still an ongoing issue? If so, could you please share the error stacktrace as a file attachment? Thanks.

zsh24
by New Contributor
  • 4448 Views
  • 3 replies
  • 0 kudos

Python worker exited unexpectedly (crashed)

I have a failing pipeline which results in the following failure: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2053.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2053.0 (TID 4594) (10.171.199.129 e...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@zsh24, just checking whether you were able to address the problem or need further guidance.

2 More Replies
bobbysidhartha
by New Contributor
  • 18015 Views
  • 2 replies
  • 0 kudos

How to merge data in parallel into partitions of a Databricks Delta table using PySpark/Spark Streaming?

I have a PySpark streaming pipeline which reads data from a Kafka topic; the data goes through various transformations and finally gets merged into a Databricks Delta table. In the beginning we were loading data into the Delta table by using the merge ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@bobbysidhartha: When merging data into a partitioned Delta table in parallel, it is important to ensure that each job only accesses and modifies the files in its own partition to avoid concurrency issues. One way to achieve this is to use partition...
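
As a hedged illustration of that partition-pruning idea (table and column names are assumptions, not from the thread), including the partition column in the merge condition keeps each concurrent job inside its own partition:

    from delta.tables import DeltaTable

    def merge_one_partition(spark, batch_df, partition_value):
        target = DeltaTable.forName(spark, "main.silver.events")  # placeholder target table
        (target.alias("t")
            .merge(
                batch_df.alias("s"),
                # pinning the partition column enables pruning and avoids cross-partition conflicts
                f"t.event_date = '{partition_value}' AND t.event_date = s.event_date AND t.id = s.id",
            )
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute())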

1 More Reply
JothyGanesan
by New Contributor III
  • 1174 Views
  • 3 replies
  • 0 kudos

DLT Merge tables into Delta

We are trying to load a Delta table from streaming tables using DLT. This target table needs a MERGE of 3 source tables. But when we use the DLT command with merge, it says merge is not supported. Is this related to the DLT version? Please help u...

Latest Reply
aayrm5
Honored Contributor
  • 0 kudos

Hey @JothyGanesan, please take a look at the Apply Changes API: https://docs.databricks.com/en/delta-live-tables/cdc.html This is the replacement for MERGE INTO in Databricks. Cheers!
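
For illustration, a minimal APPLY CHANGES sketch in a DLT pipeline (target, source, key, and sequence column names are placeholders):

    import dlt
    from pyspark.sql.functions import col

    dlt.create_streaming_table("customers_silver")      # placeholder target table

    dlt.apply_changes(
        target="customers_silver",
        source="customers_bronze_cdc",                   # placeholder streaming source
        keys=["customer_id"],                            # placeholder primary key
        sequence_by=col("event_timestamp"),              # placeholder ordering column
        stored_as_scd_type=1,
    )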

2 More Replies
Taja
by New Contributor II
  • 434 Views
  • 1 reply
  • 0 kudos

Delta Live Tables: large-scale use

Does anyone use Delta Live Tables at large scale in production pipelines? Are they satisfied with the product? Recently, I've started a PoC to evaluate DLT and noticed some concerns: excessive use of compute resources when you check the cluster m...

Latest Reply
aayrm5
Honored Contributor
  • 0 kudos

Hi @Taja, I agree that DLT pipelines don't accept a single-node cluster to begin with, but you can always choose the instance type for both your driver and the worker nodes. As far as `waiting for resources` time is concerned, I've seen that DLT takes...

NK_123
by New Contributor II
  • 1172 Views
  • 3 replies
  • 0 kudos

DELTA_INVALID_SOURCE_VERSION issue in Spark Structured Streaming

I am running a Structured Streaming query and getting this error on Databricks; the source table already has 2 versions (0, 1). It is still not able to find Query {'_id': UUID('fe7a563e-f487-4d0e-beb0-efe794ab4708'), '_runId': UUID('bf0e94b5-b6ce-42bb-9bc7-15...

Latest Reply
lukinkratas
New Contributor II
  • 0 kudos

Are you using checkpoints? If so, make sure the permissions to that location are OK; alternatively, delete all the checkpoints you have created in that location and try again. This was my case.
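
For context, a minimal sketch showing where the checkpoint location is set (source, sink, and path are placeholders); that location is what the query needs read/write access to, and what may need clearing if it points at a stale source version:

    # spark is the ambient SparkSession, as in a Databricks notebook
    (spark.readStream
        .table("main.bronze.source_table")                                # placeholder source
        .writeStream
        .option("checkpointLocation", "/Volumes/main/chk/silver_sink")    # placeholder checkpoint path
        .trigger(availableNow=True)
        .toTable("main.silver.sink_table"))                               # placeholder target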

2 More Replies
Akash_Wadhankar
by New Contributor III
  • 335 Views
  • 0 replies
  • 1 kudos

Data Engineering Journey on Databricks

For any new data engineering aspirant, it has always been difficult to know where to start the learning journey. I faced this challenge a decade ago. To help new aspirants, I created a series of Medium articles for new learners. I hope it brings mor...

robbe
by New Contributor III
  • 2860 Views
  • 3 replies
  • 1 kudos

Resolved! Get job ID from Asset Bundles

When using Asset Bundles to deploy jobs, how does one get the job ID of the resources that are created? I would like to deploy some jobs through asset bundles, get the job IDs, and then trigger these jobs programmatically outside the CI/CD pipeline us...

Latest Reply
nvashisth
New Contributor III
  • 1 kudos

Refer to this answer, which can be a solution to the above scenario: https://community.databricks.com/t5/data-engineering/getting-job-id-dynamically-to-create-another-job-to-refer-as-job/m-p/102860/highlight/true#M41252
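
One hedged way to do that lookup programmatically with the Databricks SDK (the job name and the trigger call are assumptions, not part of the linked answer):

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # credentials come from the environment or a config profile

    # find the job the bundle deployed via the name it was given (placeholder name)
    jobs = list(w.jobs.list(name="my_bundle_job"))
    if jobs:
        job_id = jobs[0].job_id
        w.jobs.run_now(job_id=job_id)  # trigger it outside the CI/CD pipeline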

2 More Replies
David_Billa
by New Contributor III
  • 603 Views
  • 1 reply
  • 0 kudos

Unable to convert to date from datetime string with AM and PM

Any help to understand why it's showing 'null' instead of the date value? It's showing null only for 12:00:00 AM; for any other values it shows the date correctly. TO_DATE("12/30/2022 12:00:00 AM", "MM/dd/yyyy HH:mm:ss a") AS tsDate

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @David_Billa, Can you try with: TO_TIMESTAMP("12/30/2022 12:00:00 AM", "MM/dd/yyyy hh:mm:ss a") AS tsDate The issue you are encountering with the TO_DATE function returning null for the value "12:00:00 AM" is likely due to the format string not ma...
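
A quick way to see the difference (minimal sketch): 'HH' is the 0-23 hour-of-day field, which cannot represent "12 ... AM", while 'hh' is the 1-12 clock hour that pairs with the AM/PM marker:

    from pyspark.sql import functions as F

    # spark is the ambient SparkSession, as in a Databricks notebook
    df = spark.createDataFrame([("12/30/2022 12:00:00 AM",)], ["raw"])
    (df.select(
         F.to_date(F.to_timestamp("raw", "MM/dd/yyyy hh:mm:ss a")).alias("tsDate"))
       .show())  # returns 2022-12-30, whereas the HH pattern yields null for 12:00:00 AM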

najmead
by Contributor
  • 27703 Views
  • 7 replies
  • 13 kudos

How to convert string to datetime with correct timezone?

I have a field stored as a string in the format "12/30/2022 10:30:00 AM". If I use the function TO_DATE, I only get the date part... I want the full date and time. If I use the function TO_TIMESTAMP, I get the date and time, but it's assumed to be UTC, ...

Latest Reply
Rajeev_Basu
Contributor III
  • 13 kudos

Use from_utc_timestamp(to_timestamp("<string>", <format>), <timezone>).
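
Spelled out in PySpark (minimal sketch; the timezone is just an example value):

    from pyspark.sql import functions as F

    # spark is the ambient SparkSession, as in a Databricks notebook
    df = spark.createDataFrame([("12/30/2022 10:30:00 AM",)], ["raw"])
    (df.select(
         F.from_utc_timestamp(
             F.to_timestamp("raw", "MM/dd/yyyy hh:mm:ss a"),
             "Australia/Sydney",                      # example timezone
         ).alias("local_ts"))
       .show(truncate=False))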

6 More Replies
Svish
by New Contributor III
  • 1208 Views
  • 3 replies
  • 0 kudos

Resolved! DLT: Schema mismatch error

Hi, I am encountering the following error when writing a DLT pipeline. Here is my workflow: read a bronze Delta table, check data quality rules, write clean records to a silver table with a defined schema. I use TRY_CAST for columns where there is a mismatch be...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Svish, you have one line that differs: JOB_CERTREP_CONTRACT_INT: string (nullable = true) vs. JOB_CERTREP_CONTRACT_NUMBER: string (nullable = true)
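
In other words, one side carries JOB_CERTREP_CONTRACT_INT while the other expects JOB_CERTREP_CONTRACT_NUMBER; a minimal sketch of a rename (direction and DataFrame name are assumptions):

    # rename the mismatched column before writing to the silver table
    clean_df = bronze_df.withColumnRenamed(
        "JOB_CERTREP_CONTRACT_INT", "JOB_CERTREP_CONTRACT_NUMBER"
    )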

2 More Replies
stevewb
by New Contributor II
  • 1136 Views
  • 2 replies
  • 1 kudos

Resolved! databricks bundle deploy fails when job includes dbt task and git_source

I am trying to deploy a dbt task as part of a Databricks job using Databricks Asset Bundles. However, there seems to be a clash when specifying a job that includes a dbt task, which causes a bizarre failure. I am using v0.237.0 of the CLI. Min...

Latest Reply
madams
Contributor II
  • 1 kudos

Thanks for providing that whole example, it was really easy to fiddle with.  I think I've found your solution.  Update the original two tasks on the job (if you want to keep them) like this: tasks: - task_key: notebook_task job...

1 More Reply
HoussemBL
by New Contributor III
  • 806 Views
  • 1 reply
  • 0 kudos

Resolved! Impact of deleting workspace on associated catalogs

Hello Community, I have a specific scenario regarding Unity Catalog and workspace deletion that I'd like to clarify. Current setup: two Databricks workspaces, W1 and W2; a single Unity Catalog instance; Catalog1 created in W1, shared and accessible in W2; Cata...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @HoussemBL  When you delete a Databricks workspace, it does not directly impact the Unity Catalog or the data within it. Unity Catalog is a separate entity that manages data access and governance across multiple workspaces. Here’s what happens in ...

