Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ceceliac
by New Contributor III
  • 411 Views
  • 7 replies
  • 0 kudos

inconsistent behavior with serverless sql: user is not an owner of table error with views

We get the following error with some basic views and not others when using serverless compute (from a notebook or from SQL Editor or from the Catalog Explorer).  Views are simple select * from table x and underlying schemas/tables are using managed m...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@ceceliac Just a quick check: if you rerun the same query after it has initially failed, does it go through or still fail? If it runs fine, wait another 10-15 minutes, rerun it, and share the outcome. So: 1. Run it once; it will fail. 2. Rerun it inm...

  • 0 kudos
6 More Replies
zsh24
by New Contributor
  • 1788 Views
  • 3 replies
  • 0 kudos

Python worker exited unexpectedly (crashed)

I have a failing pipeline which results in the following failure: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2053.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2053.0 (TID 4594) (10.171.199.129 e...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

@zsh24, just checking whether you were able to address the problem or need further guidance.

  • 0 kudos
2 More Replies
khishore
by Contributor
  • 4162 Views
  • 7 replies
  • 6 kudos

Resolved! i haven't received my certificate or the badge for Databricks Certified Data Engineer Associate

Hi @Lindsay Olson @Kaniz Fatma, I passed the Databricks Certified Data Engineer Associate exam on 29 October 2022 but haven't received my badge or certificate yet. Can you please help? Thanks

Latest Reply
gokul2
New Contributor III
  • 6 kudos

Hi @Lindsay Olson @Kaniz Fatma, I passed the Databricks Certified Data Engineer Associate exam on 29 October 2022 but haven't received my badge or certificate yet. Thanks, Gokul P

  • 6 kudos
6 More Replies
bobbysidhartha
by New Contributor
  • 15914 Views
  • 2 replies
  • 0 kudos

How to parallelly merge data into partitions of databricks delta table using PySpark/Spark streaming?

I have a PySpark streaming pipeline which reads data from a Kafka topic; the data goes through various transformations and is finally merged into a Databricks Delta table. In the beginning we were loading data into the Delta table by using the merge ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@bobbysidhartha: When merging data into a partitioned Delta table in parallel, it is important to ensure that each job only accesses and modifies the files in its own partition to avoid concurrency issues. One way to achieve this is to use partition...

  • 0 kudos
1 More Replies
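
A minimal sketch of one way to keep concurrent merges on disjoint partitions, assuming the approach is to put an explicit partition predicate into the merge condition (the reply above is truncated, so this may not be exactly what it goes on to describe). The helper `build_merge_condition`, the column names `event_date` and `id`, and the aliases `t` and `s` are all hypothetical:

```python
def build_merge_condition(partition_col, partition_values, key_cols,
                          target="t", source="s"):
    """Return a MERGE condition restricted to an explicit set of partitions.

    All names here are illustrative; they are not taken from the original post.
    """
    if not partition_values:
        raise ValueError("at least one partition value is required")
    if not key_cols:
        raise ValueError("at least one key column is required")
    # Quote each value as a SQL string literal, doubling embedded single quotes.
    literals = ", ".join(
        "'" + str(v).replace("'", "''") + "'" for v in partition_values
    )
    # Explicit partition filter on the target side of the merge.
    partition_filter = f"{target}.{partition_col} IN ({literals})"
    # Row-level match on the business keys.
    key_match = " AND ".join(f"{target}.{k} = {source}.{k}" for k in key_cols)
    return f"{partition_filter} AND {key_match}"


condition = build_merge_condition("event_date", ["2024-01-01"], ["id"])
print(condition)
# t.event_date IN ('2024-01-01') AND t.id = s.id
```

Each parallel job would build its condition from its own partition values and pass the resulting string as the merge condition, so that no two jobs name the same partition.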
FilipezAR
by New Contributor
  • 7273 Views
  • 2 replies
  • 1 kudos

Failed to create new KafkaAdminClient

I want to create connections to kafka with spark.readStream using the following parameters: kafkaParams = { "kafka.sasl.jaas.config": f'org.apache.kafka.common.security.plain.PlainLoginModule required username="{kafkaUsername}" password="{kafkaPa...

Latest Reply
john533
New Contributor III
  • 1 kudos

The error indicates a missing Kafka client dependency for Spark in Databricks. Ensure the correct Kafka connector library is attached to your Databricks cluster, such as org.apache.spark:spark-sql-kafka-0-10_2.12:x.x.x (replace x.x.x with your Spark ...

  • 1 kudos
1 More Replies
JothyGanesan
by New Contributor II
  • 377 Views
  • 3 replies
  • 0 kudos

DLT Merge tables into Delta

We are trying to load a Delta table from streaming tables using DLT. This target table needs a MERGE of 3 source tables. But when we use the DLT command with merge it says Merge is not supported. Is this anything related to DLT version? Please help u...

Latest Reply
RiyazAli
Valued Contributor II
  • 0 kudos

Hey @JothyGanesan, please take a look at the Apply Changes API: https://docs.databricks.com/en/delta-live-tables/cdc.html This replaces MERGE INTO in DLT pipelines. Cheers!

  • 0 kudos
2 More Replies
Taja
by New Contributor II
  • 162 Views
  • 1 replies
  • 0 kudos

Delta Live Tables: large use

Does anyone use Delta Live Tables at large scale in production pipelines? Are they satisfied with the product? Recently, I've started a PoC to evaluate DLT and noticed some concerns: - Excessive use of compute resources when you check the cluster m...

Latest Reply
RiyazAli
Valued Contributor II
  • 0 kudos

Hi @Taja, I agree that DLT pipelines don't accept a single-node cluster to begin with, but you can always choose the instance type for both your driver and worker nodes. As far as the `waiting for resources` time is concerned, I've seen that DLT takes...

  • 0 kudos
NK_123
by New Contributor II
  • 715 Views
  • 3 replies
  • 0 kudos

DELTA_INVALID_SOURCE_VERSION issue on spark structure streaming

I am running a Structured Streaming job and getting this error on Databricks. The source table already has 2 versions (0, 1). It is still not able to find  Query {'_id': UUID('fe7a563e-f487-4d0e-beb0-efe794ab4708'), '_runId': UUID('bf0e94b5-b6ce-42bb-9bc7-15...

Latest Reply
lukinkratas
New Contributor II
  • 0 kudos

Are you using checkpoints? If so, make sure the permissions on that location are OK. Alternatively, delete all the checkpoints you have created in that location and try again. This was my case.

  • 0 kudos
2 More Replies
Akash_Wadhankar
by New Contributor III
  • 156 Views
  • 0 replies
  • 1 kudos

Data Engineering Journey on Databricks

For any new Data Engineering aspirant, it has always been difficult to know where to start the learning journey. I faced this challenge a decade ago. To help new aspirants, I created a series of Medium articles for new learners. I hope it brings mor...

robbe
by New Contributor III
  • 1882 Views
  • 3 replies
  • 1 kudos

Resolved! Get job ID from Asset Bundles

When using Asset Bundles to deploy jobs, how does one get the job ID of the resources that are created? I would like to deploy some jobs through asset bundles, get the job IDs, and then trigger these jobs programmatically outside the CI/CD pipeline us...

Latest Reply
nvashisth
New Contributor III
  • 1 kudos

Refer to this answer, which can be a solution to the above scenario: https://community.databricks.com/t5/data-engineering/getting-job-id-dynamically-to-create-another-job-to-refer-as-job/m-p/102860/highlight/true#M41252

  • 1 kudos
2 More Replies
David_Billa
by New Contributor III
  • 166 Views
  • 1 replies
  • 0 kudos

Unable to convert to date from datetime string with AM and PM

Any help to understand why it's showing 'null' instead of the date value? It's showing null only for 12:00:00 AM; for any other value it shows the date correctly: TO_DATE("12/30/2022 12:00:00 AM", "MM/dd/yyyy HH:mm:ss a") AS tsDate

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @David_Billa, Can you try with: TO_TIMESTAMP("12/30/2022 12:00:00 AM", "MM/dd/yyyy hh:mm:ss a") AS tsDate The issue you are encountering with the TO_DATE function returning null for the value "12:00:00 AM" is likely due to the format string not ma...

  • 0 kudos
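
The difference between the two format strings above is the hour field: `HH` is a 24-hour field, while `hh` is the 12-hour field that pairs with the AM/PM marker. A minimal sketch of the same distinction using Python's standard library, as an analogy only: Python's `strptime` is not Spark's pattern parser, and where the question reports Spark returning null, Python silently ignores the AM/PM marker when the 24-hour field is used:

```python
from datetime import datetime

value = "12/30/2022 12:00:00 AM"

# %I is the 12-hour clock field, so the AM marker maps 12 to hour 0 (midnight).
twelve_hour = datetime.strptime(value, "%m/%d/%Y %I:%M:%S %p")

# %H is the 24-hour clock field; Python then ignores %p, so 12 stays 12 (noon).
twenty_four_hour = datetime.strptime(value, "%m/%d/%Y %H:%M:%S %p")

print(twelve_hour.hour)       # 0
print(twenty_four_hour.hour)  # 12
```

Either way, a 12-hour string with an AM/PM marker needs the 12-hour field to be read as intended.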
ramy
by New Contributor II
  • 1584 Views
  • 4 replies
  • 1 kudos

Getting JOB-ID dynamically to create another job to refer as job-task

I am trying to create a new job in Databricks Asset Bundles which refers to another job-task and passes parameters to it. However, the previous job is not created yet (or will be created using Databricks Asset Bundles in higher envs when deploying t...

Latest Reply
nvashisth
New Contributor III
  • 1 kudos

Hi Team, yes, referencing the variable ${resources.jobs.<job-name>.id} works within the same bundle, and this should be a solution. I tried it by creating multiple workflows and referencing them.

  • 1 kudos
3 More Replies
najmead
by Contributor
  • 21525 Views
  • 7 replies
  • 13 kudos

How to convert string to datetime with correct timezone?

I have a field stored as a string in the format "12/30/2022 10:30:00 AM". If I use the function TO_DATE, I only get the date part... I want the full date and time. If I use the function TO_TIMESTAMP, I get the date and time, but it's assumed to be UTC, ...

Latest Reply
Rajeev_Basu
Contributor III
  • 13 kudos

Use from_utc_timestamp(to_timestamp("<string>", <format>), <timezone>)

  • 13 kudos
6 More Replies
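
The suggestion above is to parse the string first and then shift it from UTC into the target timezone. A minimal sketch of that two-step idea using Python's standard library rather than Spark SQL; the helper name `parse_as_utc_then_shift` and the fixed `+10:00` offset are hypothetical, and a fixed offset is used only to keep the example independent of any timezone database:

```python
from datetime import datetime, timedelta, timezone


def parse_as_utc_then_shift(text: str, fmt: str, offset_hours: int) -> datetime:
    """Parse text as a UTC wall-clock time and re-express it at a fixed offset."""
    # Step 1: parse the string; the result carries no timezone yet.
    naive = datetime.strptime(text, fmt)
    # Step 2: declare that the parsed wall-clock time is UTC.
    as_utc = naive.replace(tzinfo=timezone.utc)
    # Step 3: convert the same instant to the target offset.
    target = timezone(timedelta(hours=offset_hours))
    return as_utc.astimezone(target)


local = parse_as_utc_then_shift(
    "12/30/2022 10:30:00 AM", "%m/%d/%Y %I:%M:%S %p", 10
)
print(local.isoformat())  # 2022-12-30T20:30:00+10:00
```

Note that this treats the parsed value as UTC and moves it into the target zone; if the string already represents local time, the conversion would need to go the other way.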
Svish
by New Contributor III
  • 503 Views
  • 3 replies
  • 0 kudos

Resolved! DLT: Schema mismatch error

Hi, I am encountering the following error when writing a DLT pipeline. Here is my workflow: 1. Read a bronze Delta table. 2. Check data quality rules. 3. Write clean records to a silver table with a defined schema. I use TRY_CAST for columns where there is mismatch be...

Latest Reply
filipniziol
Contributor III
  • 0 kudos

Hi @Svish, you have one line that differs: JOB_CERTREP_CONTRACT_INT: string (nullable = true) vs. JOB_CERTREP_CONTRACT_NUMBER: string (nullable = true)

  • 0 kudos
2 More Replies
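
A mismatch like the one spotted above is easy to miss by eye in a long schema. A minimal sketch of a column-name diff, assuming each schema is available as a list of `(name, type)` pairs; the helper `diff_columns`, the extra `ID` column, and the choice of which side is "expected" are hypothetical:

```python
def diff_columns(expected, actual):
    """Return (missing, unexpected) column names, preserving input order."""
    expected_names = [name for name, _ in expected]
    actual_names = [name for name, _ in actual]
    # Columns the declared schema has but the incoming data lacks.
    missing = [n for n in expected_names if n not in actual_names]
    # Columns the incoming data has but the declared schema lacks.
    unexpected = [n for n in actual_names if n not in expected_names]
    return missing, unexpected


# Hypothetical schemas; only the two JOB_CERTREP_* names come from the thread.
declared = [("ID", "string"), ("JOB_CERTREP_CONTRACT_NUMBER", "string")]
incoming = [("ID", "string"), ("JOB_CERTREP_CONTRACT_INT", "string")]

missing, unexpected = diff_columns(declared, incoming)
print(missing)     # ['JOB_CERTREP_CONTRACT_NUMBER']
print(unexpected)  # ['JOB_CERTREP_CONTRACT_INT']
```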
stevewb
by New Contributor II
  • 433 Views
  • 2 replies
  • 1 kudos

Resolved! databricks bundle deploy fails when job includes dbt task and git_source

I am trying to deploy a dbt task as part of a Databricks job using Databricks Asset Bundles. However, there seems to be a clash that occurs when specifying a job that includes a dbt task that causes a bizarre failure. I am using v0.237.0 of the CLI. Min...

Latest Reply
madams
Contributor
  • 1 kudos

Thanks for providing that whole example, it was really easy to fiddle with.  I think I've found your solution.  Update the original two tasks on the job (if you want to keep them) like this: tasks: - task_key: notebook_task job...

  • 1 kudos
1 More Replies
