Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

KKo
by Contributor III
  • 3 Views
  • 0 replies
  • 0 kudos

DDL script to upper environment

I have multiple databases created in Unity Catalog in a DEV Databricks workspace; I created them by running scripts from the Databricks UI/notebooks. Now I want those databases in the QA and PROD workspaces as well. What is the best way to run those DDLs in...

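One common pattern (a sketch, not the only answer) is to keep the DDL in parameterized scripts checked into a repo, and render them per target environment so the same script promotes from DEV to QA to PROD. All catalog and schema names below are hypothetical:

```python
# Sketch: render environment-specific DDL from one parameterized template.
# Catalog names and schema list are hypothetical placeholders.
DDL_TEMPLATE = "CREATE SCHEMA IF NOT EXISTS {catalog}.{schema};"

ENV_CATALOGS = {"dev": "dev_catalog", "qa": "qa_catalog", "prod": "prod_catalog"}

def render_ddl(env: str, schemas: list[str]) -> list[str]:
    """Return the DDL statements to run against the given environment."""
    catalog = ENV_CATALOGS[env]
    return [DDL_TEMPLATE.format(catalog=catalog, schema=s) for s in schemas]

if __name__ == "__main__":
    for stmt in render_ddl("qa", ["sales", "finance"]):
        print(stmt)
```

The rendered statements can then be executed by whatever deployment mechanism the workspace uses (a job, a bundle, or a CI pipeline).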
Bhavana_Y
by New Contributor
  • 9 Views
  • 0 replies
  • 0 kudos

Learning Path for Spark Developer Associate

Hello everyone, happy to be part of the Virtual Journey! I enrolled in the Associate Spark Developer path and completed the learning path in Databricks Academy. Can anyone please confirm whether completing the learning path is enough for obtaining the 50% off voucher for the certifi...

ckough
by New Contributor III
  • 54739 Views
  • 47 replies
  • 25 kudos

Resolved! Cannot sign in at databricks partner-academy portal

Hi there, I used my company email to register an account for customer-academy.databricks.com a while back. Now I need to create an account with partner-academy.databricks.com using my company email too. However, when I register at partner...

Latest Reply
cpelletier360
New Contributor
  • 25 kudos

Also facing the same issue. I will log a ticket.

46 More Replies
elliottatreef
by New Contributor
  • 63 Views
  • 3 replies
  • 1 kudos

Serverless environment not respecting environment spec on run_job_task

When running a job via a `run_job_task`, the job triggered is not using the specified serverless environment. I've configured my job to use serverless `environment_version` "3" with a dependency built into my workspace, but whenever I run the job, it...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

@elliottatreef Can you try to set the environment version on the source notebook and then trigger the job? On the notebook: Serverless -> Configuration -> Environment version drop-down. Then, in your job, make sure it's assigned to the serverless com...

2 More Replies
donlxz
by New Contributor III
  • 77 Views
  • 3 replies
  • 2 kudos

deadlock occurs with use statement

When issuing a query from Informatica using a Delta connection, the statement use catalog_name.schema_name is executed first. At that time, the following error appeared in the query history: Query could not be scheduled: (conn=5073499) Deadlock found w...

Latest Reply
donlxz
New Contributor III
  • 2 kudos

Hi @ManojkMohan, thank you for your response. I understand that adjustments are needed on the Informatica side, and I'll ask them to review the deadlock retry settings. Is there anything that can be changed or configured on the Databricks side to help w...

2 More Replies
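The retry-on-deadlock idea discussed in this thread is generic client-side behavior; a minimal sketch of it in plain Python (the exception type, delays, and retry count are placeholder assumptions, not Informatica or Databricks settings):

```python
import time

def run_with_retry(fn, retries=3, base_delay=0.1, retryable=(RuntimeError,)):
    """Call fn(); on a retryable error, back off exponentially and retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo: a call that deadlocks twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Deadlock found when trying to get lock")
    return "ok"

print(run_with_retry(flaky))
```

In practice the equivalent knobs live in the client tool's connection/retry configuration rather than in hand-written code.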
Mous92i
by New Contributor
  • 93 Views
  • 2 replies
  • 0 kudos

Liquid Clustering With Merge

Hello, I'm facing severe performance issues with a MERGE INTO on Databricks. merge_condition = """ source.data_hierarchy = target.data_hierarchy AND source.sensor_id = target.sensor_id AND source.timestamp = target.timestamp """ The target Delt...

Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hi @Mous92i, DFP is what pushes source filters down to the target to skip files. For MERGE/UPDATE/DELETE, DFP only works on Photon-enabled compute. If you're not on Photon, MERGE will scan everything. Enabling Liquid Clustering doesn't recluster past ...

1 More Replies
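For reference, the table-side settings the reply refers to look roughly like this in Databricks SQL. This is a sketch: the table name is hypothetical, and the clustering columns are taken from the merge keys in the question; OPTIMIZE ... FULL reclusters data written before the keys were set:

```sql
-- Cluster the target on the merge keys so file skipping can help MERGE.
ALTER TABLE target_table CLUSTER BY (data_hierarchy, sensor_id);

-- Recluster data written before clustering keys were set or changed.
OPTIMIZE target_table FULL;
```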
georgemichael40
by New Contributor III
  • 102 Views
  • 4 replies
  • 5 kudos

Resolved! Python Wheel in Serverless Job in DAB

Hey, I am trying to run a job with serverless compute that runs Python scripts. I need the paramiko package to get my scripts to work. I managed to get it working by doing: environments: - environment_key: default # Full documentation of this spec can be...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 5 kudos

Hi @georgemichael40, put your whl file in the volume and then you can reference it in the following way in your DAB file: dependencies: - "/Volumes/workspace/default/my_volume/hellopkg-0.0.1-py3-none-any.whl" https://docs.databricks.com/aws/en/compute/s...

3 More Replies
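Putting the two fragments from this thread together, the relevant bundle section might look like the sketch below (abridged; the volume path and wheel name come from the reply's example, and other required keys in the environment spec are omitted):

```yaml
environments:
  - environment_key: default
    spec:
      dependencies:
        - "/Volumes/workspace/default/my_volume/hellopkg-0.0.1-py3-none-any.whl"
```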
dndeng
by Visitor
  • 28 Views
  • 2 replies
  • 0 kudos

Query to calculate cost of task from each job by day

I am trying to find the cost per task in each job every time it was executed (daily), but I'm currently getting very large numbers due to duplicates. Can someone help me? WITH workspace AS ( SELECT account_id, workspace_id, workspace_name,...

Latest Reply
nayan_wylde
Honored Contributor III
  • 0 kudos

It seems the duplicates are caused by task_change_time from the job_tasks table. Even though the table definition says task_change_time is the time the task was last modified, it is capturing different times, and it is an SCD type 2 table. ...

1 More Replies
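The fix the reply points at, keeping only the latest SCD2 row per task before joining costs, is the same idea as this small stdlib sketch (field names are hypothetical stand-ins for the system-table columns):

```python
# Keep only the most recent record per (job_id, task_key), mirroring the SQL
# QUALIFY ROW_NUMBER() OVER (PARTITION BY job_id, task_key
#                            ORDER BY task_change_time DESC) = 1
def latest_per_task(rows):
    best = {}
    for row in rows:
        key = (row["job_id"], row["task_key"])
        if key not in best or row["task_change_time"] > best[key]["task_change_time"]:
            best[key] = row
    return list(best.values())

rows = [
    {"job_id": 1, "task_key": "t1", "task_change_time": "2025-01-01"},
    {"job_id": 1, "task_key": "t1", "task_change_time": "2025-03-01"},  # SCD2 duplicate
    {"job_id": 1, "task_key": "t2", "task_change_time": "2025-02-01"},
]
print(latest_per_task(rows))
```

Deduplicating before the cost join prevents each task's cost from being multiplied by the number of its change records.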
thib
by New Contributor III
  • 8552 Views
  • 5 replies
  • 3 kudos

Can we use multiple git repos for a job running multiple tasks?

I have a job running multiple tasks: Task 1 runs a machine learning pipeline from git repo 1. Task 2 runs an ETL pipeline from git repo 1. Task 2 is actually a generic pipeline and should not be checked into repo 1, and will be made available in another re...

Latest Reply
tors_r_us
New Contributor II
  • 3 kudos

Had this same problem. The fix was to have two workflows with no triggers, each pointing to the respective git repo, then set up a 3rd workflow with the appropriate triggers/schedule which calls the first 2 workflows. A workflow can run other workflows.

4 More Replies
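The three-workflow pattern from the reply, sketched as a jobs spec. The task keys and job IDs are hypothetical; `run_job_task` is the Jobs feature that lets one workflow call another:

```yaml
# Orchestrator workflow: holds the trigger/schedule and calls the two
# untriggered workflows, each checked out from its own git repo.
tasks:
  - task_key: run_ml_pipeline
    run_job_task:
      job_id: 111   # workflow pinned to git repo 1
  - task_key: run_etl_pipeline
    depends_on:
      - task_key: run_ml_pipeline
    run_job_task:
      job_id: 222   # workflow pinned to git repo 2
```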
shreya24
by New Contributor II
  • 1780 Views
  • 1 reply
  • 2 kudos

Geometry Type not converted into proper binary format when reading through Federated Catalog

Hi, when reading a geometry column from a SQL Server into Databricks through a foreign/federated catalog, the transformation of the geometry type to binary type is not in a proper format, or I am not able to find a way to decode that binary. For example, for p...

Latest Reply
AbhaySingh
Visitor
  • 2 kudos

Give this a shot. Create a view in SQL Server that converts geometry to Well-Known Text before federating:
-- Create view in SQL Server
CREATE VIEW dbo.vw_spatial_converted AS
SELECT
  id,
  location_name,
  location.STAsText() AS geom_wkt,
  location.STSrid AS sri...

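Once the view exposes WKT instead of the raw geometry type, decoding on the consumer side is plain string parsing. A minimal stdlib sketch for POINT values (function name and coordinates are illustrative, not part of any Databricks API):

```python
import re

def parse_wkt_point(wkt: str) -> tuple[float, float]:
    """Parse 'POINT (x y)' into an (x, y) tuple; raise ValueError otherwise."""
    m = re.fullmatch(r"POINT\s*\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)", wkt.strip())
    if not m:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(m.group(1)), float(m.group(2))

print(parse_wkt_point("POINT (-122.33 47.61)"))
```

For full geometry support (polygons, SRIDs, etc.) a dedicated WKT library is the better choice.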
databricksero
by New Contributor
  • 199 Views
  • 7 replies
  • 3 kudos

DLT pipeline fails with “can not infer schema from empty dataset” — works fine when run manually

Hi everyone, I'm running into an issue with a Delta Live Tables (DLT) pipeline that processes a few transformation layers (raw → intermediate → primary → feature). When I trigger the entire pipeline, it fails with the following error: can not infer sche...

Latest Reply
ManojkMohan
Honored Contributor
  • 3 kudos

@databricksero Explicit schema definition: when calling spark.createDataFrame(pdf_cleaned), explicitly provide the schema even if the DataFrame is empty. This spares Spark from inferring the types and prevents the "cannot infer schema from empty dataset" erro...

6 More Replies
chanukya-pekala
by Contributor III
  • 112 Views
  • 4 replies
  • 4 kudos

Resolved! Lost access to Databricks account console on Free Edition

Hi everyone, I'm having trouble accessing the Databricks account console and need some guidance. Background: I successfully set up Databricks Free Edition with Terraform using my personal account. I was able to access accounts.cloud.databricks.com to obta...

Latest Reply
chanukya-pekala
Contributor III
  • 4 kudos

I just double-checked: I was able to manage my personal workspace through Terraform without the account console. Thanks again.

3 More Replies
stevewb
by New Contributor III
  • 76 Views
  • 1 reply
  • 0 kudos

Errors in runtime 17 today

Anyone else getting a bunch of errors on runtime 17 today? A load of our pipelines that were running smoothly suddenly stopped working with driver crashes. I was able to get us running again by downgrading to runtime 16, but curious if anyone else hi...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@stevewb Driver crash is very generic. We may need to dig deeper here to understand the root cause. Can you raise a support ticket with us? 

manugarri
by New Contributor II
  • 18560 Views
  • 11 replies
  • 2 kudos

Fuzzy text matching in Spark

I have a list of client-provided data: a list of company names. I have to match those names with an internal database of company names. The client list can fit in memory (it's about 10k elements) but the internal dataset is on HDFS and we use Spark ...

Latest Reply
Edthehead
Contributor III
  • 2 kudos

You can refer to this article: Optimizing Large-Scale Fuzzy Matching with Apache Spark and Databricks | by Gavaragirijarani | Medium. As far as open-source libraries go, rapidfuzz is known to be faster than fuzzywuzzy.

10 More Replies
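For a feel of what the scoring does, here is a stdlib sketch using difflib's similarity ratio; rapidfuzz exposes similar ratio functions but is much faster at scale. The company names are made-up examples:

```python
from difflib import SequenceMatcher

def best_match(name: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate with the highest similarity ratio to name."""
    def score(c: str) -> float:
        return SequenceMatcher(None, name.lower(), c.lower()).ratio()
    best = max(candidates, key=score)
    return best, score(best)

internal = ["Acme Corporation", "Globex LLC", "Initech Inc"]
print(best_match("ACME Corp.", internal))
```

At Spark scale the usual approach is blocking (e.g. by name prefix or token) so each client name is only scored against a small candidate set, as the linked article describes.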
surajitDE
by New Contributor III
  • 75 Views
  • 2 replies
  • 0 kudos

Question on assigning email_notification_group to DLT Job Notifications?

Hi folks, I wanted to check if there's a way to assign an email notification group to a Delta Live Tables (DLT) job for notifications. I know that it's possible to configure Teams workflows and email notification groups for Databricks jobs, but in the ...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @surajitDE, at the moment DLT doesn't support linking existing email notification groups or Teams workflows directly. You can only add individual email addresses in the DLT UI. If you have a group email alias, you can use it as a single address so...

1 More Replies
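The single-alias workaround from the reply, sketched as a DLT pipeline notifications block (the alias address is hypothetical):

```yaml
notifications:
  - email_recipients:
      - data-eng-alerts@example.com   # group alias used as one address
    alerts:
      - on-update-failure
```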
