Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

zsh24
by New Contributor
  • 5770 Views
  • 3 replies
  • 0 kudos

Python worker exited unexpectedly (crashed)

I have a failing pipeline which results in the following failure: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2053.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2053.0 (TID 4594) (10.171.199.129 e...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@zsh24, just checking whether you were able to resolve the problem or need further guidance.

2 More Replies
bobbysidhartha
by New Contributor
  • 19519 Views
  • 2 replies
  • 0 kudos

How to merge data in parallel into partitions of a Databricks Delta table using PySpark/Spark Streaming?

I have a PySpark streaming pipeline which reads data from a Kafka topic. The data goes through various transformations and is finally merged into a Databricks Delta table. In the beginning we were loading data into the delta table by using the merge ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@bobbysidhartha: When merging data into a partitioned Delta table in parallel, it is important to ensure that each job only accesses and modifies the files in its own partition to avoid concurrency issues. One way to achieve this is to use partition...

1 More Replies
JothyGanesan
by New Contributor III
  • 2029 Views
  • 3 replies
  • 0 kudos

DLT Merge tables into Delta

We are trying to load a Delta table from streaming tables using DLT. This target table needs a MERGE of 3 source tables. But when we use the DLT command with MERGE, it says MERGE is not supported. Is this related to the DLT version? Please help u...

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hey @JothyGanesan, please take a look at the Apply Changes API: https://docs.databricks.com/en/delta-live-tables/cdc.html This is the replacement for MERGE INTO in DLT pipelines. Cheers!

2 More Replies
Taja
by New Contributor II
  • 675 Views
  • 1 reply
  • 0 kudos

Delta Live Tables: large use

Does anyone use Delta Live Tables at large scale in production pipelines? Are they satisfied with the product? Recently, I've started a PoC to evaluate DLT and noticed some concerns: - Excessive use of compute resources when you check the cluster m...

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hi @Taja, I agree that DLT pipelines don't accept a single-node cluster to begin with, but you can always choose the instance type for both your driver and the worker nodes. As far as `waiting for resources` time is concerned, I've seen that DLT takes...

NK_123
by New Contributor II
  • 1517 Views
  • 3 replies
  • 0 kudos

DELTA_INVALID_SOURCE_VERSION issue on Spark Structured Streaming

I am running Structured Streaming and getting this error on Databricks. The source table already has 2 versions (0, 1). It is still not able to find Query {'_id': UUID('fe7a563e-f487-4d0e-beb0-efe794ab4708'), '_runId': UUID('bf0e94b5-b6ce-42bb-9bc7-15...

Latest Reply
lukinkratas
New Contributor II
  • 0 kudos

Are you using checkpoints? If so, make sure the permissions on that location are OK. Alternatively, delete all the checkpoints you have created in that location and try again. This was my case.

2 More Replies
Akash_Wadhankar
by New Contributor III
  • 458 Views
  • 0 replies
  • 1 kudos

Data Engineering Journey on Databricks

For any new Data Engineering aspirant, it has always been difficult to know where to start the learning journey. I faced this challenge a decade ago. In order to help new aspirants, I created a series of Medium articles for new learners. I hope it brings mor...

robbe
by New Contributor III
  • 3680 Views
  • 3 replies
  • 1 kudos

Resolved! Get job ID from Asset Bundles

When using Asset Bundles to deploy jobs, how does one get the job ID of the resources that are created? I would like to deploy some jobs through asset bundles, get the job IDs, and then trigger these jobs programmatically outside the CI/CD pipeline us...

Latest Reply
nvashisth
New Contributor III
  • 1 kudos

Refer to this answer, which may be a solution to the scenario above: https://community.databricks.com/t5/data-engineering/getting-job-id-dynamically-to-create-another-job-to-refer-as-job/m-p/102860/highlight/true#M41252

2 More Replies
David_Billa
by New Contributor III
  • 1039 Views
  • 1 reply
  • 0 kudos

Unable to convert to date from datetime string with AM and PM

Any help to understand why it's showing 'null' instead of the date value? It's showing null only for 12:00:00 AM; for any other value it shows the date correctly: TO_DATE("12/30/2022 12:00:00 AM", "MM/dd/yyyy HH:mm:ss a") AS tsDate

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @David_Billa, can you try with: TO_TIMESTAMP("12/30/2022 12:00:00 AM", "MM/dd/yyyy hh:mm:ss a") AS tsDate. The issue you are encountering with the TO_DATE function returning null for the value "12:00:00 AM" is likely due to the format string not ma...
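The difference between the two patterns can be modelled in a few lines of plain Python. This is only a sketch of the pattern semantics, not Spark's parser: it assumes `HH` means hour-of-day (0-23), `hh` means clock-hour (1-12), and that a strict parser rejects an hour-of-day that contradicts the AM/PM marker, which would match the null seen only for 12:00:00 AM. Both function names are hypothetical.

```python
from typing import Optional


def resolve_hour_of_day(hour: int, marker: str) -> Optional[int]:
    """Model of `HH ... a`: the hour is already 0-23, so AM/PM is only a cross-check."""
    if not 0 <= hour <= 23:
        return None
    implied = "AM" if hour < 12 else "PM"
    # 12 read as hour-of-day means noon (PM), which contradicts an "AM" marker.
    return hour if implied == marker else None


def resolve_clock_hour(hour: int, marker: str) -> Optional[int]:
    """Model of `hh ... a`: the hour is 1-12 and AM/PM selects the half of the day."""
    if not 1 <= hour <= 12:
        return None
    return hour % 12 + (12 if marker == "PM" else 0)


print(resolve_hour_of_day(10, "AM"))  # 10: consistent, so other values parse fine
print(resolve_hour_of_day(12, "AM"))  # None: the conflicting case from the question
print(resolve_clock_hour(12, "AM"))   # 0: midnight, as intended
```

In this model, every morning hour except 12 is consistent under either pattern, which is why the problem only surfaces for midnight values.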

najmead
by Contributor
  • 31508 Views
  • 7 replies
  • 13 kudos

How to convert string to datetime with correct timezone?

I have a field stored as a string in the format "12/30/2022 10:30:00 AM". If I use the function TO_DATE, I only get the date part... I want the full date and time. If I use the function TO_TIMESTAMP, I get the date and time, but it's assumed to be UTC, ...

Latest Reply
Rajeev_Basu
Contributor III
  • 13 kudos

Use from_utc_timestamp(to_timestamp("<string>", <format>), <timezone>)
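The same two steps (parse the string, treat the result as UTC, then render it in a target zone) can be sketched in plain Python. This is an analogue using the standard library, not Spark code. The helper name `from_utc_string` and the fixed UTC+10 offset are assumptions for illustration; a real pipeline would pass a named time zone to the Spark function.

```python
from datetime import datetime, timedelta, timezone

RAW = "12/30/2022 10:30:00 AM"
# %I is the 12-hour clock and %p the AM/PM marker.
FORMAT = "%m/%d/%Y %I:%M:%S %p"


def from_utc_string(value: str, fmt: str, target: timezone) -> datetime:
    """Parse `value`, interpret it as UTC, and convert it to `target`."""
    parsed_utc = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return parsed_utc.astimezone(target)


# Hypothetical target zone: a fixed offset of UTC+10.
target_zone = timezone(timedelta(hours=10))
local = from_utc_string(RAW, FORMAT, target_zone)
print(local.isoformat())  # 2022-12-30T20:30:00+10:00
```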

6 More Replies
Svish
by New Contributor III
  • 1922 Views
  • 3 replies
  • 0 kudos

Resolved! DLT: Schema mismatch error

Hi, I am encountering the following error when writing a DLT pipeline. Here is my workflow: (1) read a bronze Delta table; (2) check data quality rules; (3) write clean records to a silver table with a defined schema. I use TRY_CAST for columns where there is mismatch be...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Svish, you have one line that differs: JOB_CERTREP_CONTRACT_INT: string (nullable = true) vs. JOB_CERTREP_CONTRACT_NUMBER: string (nullable = true)
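A mismatch like this is easy to miss by eye in a long schema. A minimal sketch of a column-name diff in plain Python is shown here. The `JOB_ID` column and the `diff_columns` helper are hypothetical, and the thread does not say which of the two names belongs to the declared schema, so the lists are simply called `schema_a` and `schema_b`.

```python
from typing import List, Tuple


def diff_columns(left: List[str], right: List[str]) -> Tuple[List[str], List[str]]:
    """Return (names only in left, names only in right), each sorted."""
    left_set, right_set = set(left), set(right)
    return sorted(left_set - right_set), sorted(right_set - left_set)


# JOB_ID is a made-up shared column; the other two names come from the reply.
schema_a = ["JOB_ID", "JOB_CERTREP_CONTRACT_INT"]
schema_b = ["JOB_ID", "JOB_CERTREP_CONTRACT_NUMBER"]

only_a, only_b = diff_columns(schema_a, schema_b)
print(only_a)  # ['JOB_CERTREP_CONTRACT_INT']
print(only_b)  # ['JOB_CERTREP_CONTRACT_NUMBER']
```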

2 More Replies
stevewb
by New Contributor III
  • 1604 Views
  • 2 replies
  • 1 kudos

Resolved! databricks bundle deploy fails when job includes dbt task and git_source

I am trying to deploy a dbt task as part of a Databricks job using Databricks Asset Bundles. However, there seems to be a clash that occurs when specifying a job that includes a dbt task that causes a bizarre failure. I am using v0.237.0 of the CLI. Min...

Latest Reply
madams
Contributor III
  • 1 kudos

Thanks for providing that whole example, it was really easy to fiddle with.  I think I've found your solution.  Update the original two tasks on the job (if you want to keep them) like this: tasks: - task_key: notebook_task job...

1 More Replies
HoussemBL
by New Contributor III
  • 1139 Views
  • 1 reply
  • 0 kudos

Resolved! Impact of deleting workspace on associated catalogs

Hello Community, I have a specific scenario regarding Unity Catalog and workspace deletion that I'd like to clarify. Current setup: two Databricks workspaces (W1 and W2); a single Unity Catalog instance; Catalog1, created in W1, shared and accessible in W2; Cata...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @HoussemBL, when you delete a Databricks workspace, it does not directly impact the Unity Catalog or the data within it. Unity Catalog is a separate entity that manages data access and governance across multiple workspaces. Here's what happens in ...

thisisthemurph
by New Contributor II
  • 848 Views
  • 1 reply
  • 1 kudos

Databricks dashboards across multiple Databricks instances

We have multiple Databricks instances, one per environment (Dev-UK, Live-UK, Live-EU, Live-US, etc.), and we would like to create dashboards to present stats on our data in each of these environments. Each of these environments also has a differently n...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Hello, as you have mentioned, you could create a Python script that uses the API call https://docs.databricks.com/api/workspace/lakeview/create to generate the dashboard for each environment. The process to create the visualizations will be comple...
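A rough sketch of what such a script might look like follows. It only builds the HTTP requests and does not send them. The workspace hosts, tokens, warehouse IDs, and the empty dashboard definition are all placeholders, and the endpoint path and body fields (`display_name`, `warehouse_id`, `serialized_dashboard`) reflect my reading of the Lakeview create API, so check them against the linked reference before relying on them.

```python
import json
import urllib.request
from typing import Dict

# Placeholder environments: hosts, tokens and warehouse IDs are made up.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "Dev-UK": {
        "host": "https://dev-uk.example.com",
        "token": "<dev-uk-token>",
        "warehouse_id": "<dev-uk-warehouse-id>",
    },
    "Live-UK": {
        "host": "https://live-uk.example.com",
        "token": "<live-uk-token>",
        "warehouse_id": "<live-uk-warehouse-id>",
    },
}


def build_create_request(env: Dict[str, str], display_name: str,
                         dashboard: dict) -> urllib.request.Request:
    """Build (but do not send) a dashboard-create request for one environment."""
    body = {
        "display_name": display_name,
        "warehouse_id": env["warehouse_id"],
        # The dashboard definition is passed as a JSON string inside the body.
        "serialized_dashboard": json.dumps(dashboard),
    }
    return urllib.request.Request(
        url=f"{env['host']}/api/2.0/lakeview/dashboards",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {env['token']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder definition; a real one would describe datasets and widgets.
template = {"pages": []}
requests_by_env = {
    name: build_create_request(env, f"Data stats ({name})", template)
    for name, env in ENVIRONMENTS.items()
}
for name, request in requests_by_env.items():
    print(name, request.get_method(), request.full_url)
```

Keeping one template and varying only the per-environment settings means that differently named catalogs or schemas could be substituted into the dashboard definition before it is serialized.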

dollyb
by Contributor II
  • 1566 Views
  • 5 replies
  • 0 kudos

Accessing Workspace / Repo file works in notebook, but not from job

In a notebook attached to a normal personal cluster I can successfully run %fs ls file:/Workspace/Repos/$userName/$repoName/$folderName, but when I run an init script on a UC volume that does the same thing, I get this error: ls: cannot access...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @dollyb, can you try with just "ls /Workspace/Repos/my_user_name@company.com/my_repo_name/my_folder_name"? I'm not sure dbutils will be useful in an init script; I will try to test it out.

4 More Replies
Monsem
by New Contributor III
  • 13397 Views
  • 8 replies
  • 5 kudos

Resolved! No Course Materials Widget below Lesson

Hello everyone, in my Databricks partner academy account there is no course material where it should be, under the lesson video. How can I resolve this problem? Does anyone else face the same problem? I had submitted a ticket to ask Databricks team bu...

Latest Reply
TheManOfSteele
New Contributor III
  • 5 kudos

I am still having this problem: I can't find the slides and DLC for Data Engineering with Databricks.

7 More Replies
