Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mkk1
by New Contributor
  • 1772 Views
  • 1 reply
  • 0 kudos

Joining tables across DLT pipelines

How can I join a silver table (S1) from a DLT pipeline (D1) to another silver table (S2) from a different DLT pipeline (D2)? #DLT #DeltaLiveTables

Latest Reply
JothyGanesan
New Contributor III
  • 0 kudos

@Mkk1 Did you get this completed? We are in a similar situation; how did you achieve this?

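One common pattern for this (an assumption on my part, not confirmed in the thread): if both pipelines publish their tables to Unity Catalog, pipeline D2 can read D1's silver table by its fully qualified name like any other table. A minimal SQL sketch; the catalog/schema (`main.silver`), the join key `id`, and the column `attribute` are all placeholders:

```sql
-- Inside pipeline D2: s2 is defined in this pipeline, while
-- main.silver.s1 is the table published by pipeline D1.
CREATE OR REFRESH MATERIALIZED VIEW s1_s2_joined AS
SELECT s2.*, s1.attribute
FROM LIVE.s2 AS s2
JOIN main.silver.s1 AS s1
  ON s1.id = s2.id;
```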
MAHANK
by New Contributor II
  • 4044 Views
  • 3 replies
  • 0 kudos

How to compare two Databricks notebooks in different folders? Note: we don't have Git set up

We would like to compare two notebooks that are in different folders; we have not yet set up a Git repo for them. What other options do we have to compare two notebooks? Thanks, Nanda

Latest Reply
arekmust
New Contributor III
  • 0 kudos

Then using Repos and Git (GitHub/Azure DevOps) is the way to go!

2 More Replies
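Without a Git repo, one workable option (an assumption on my part, not something stated in the thread) is to export both notebooks as source files, via the UI or `databricks workspace export`, and diff the exports locally. A minimal sketch using Python's standard `difflib`; the notebook contents here are stand-ins:

```python
import difflib
import os
import tempfile

def diff_notebooks(path_a: str, path_b: str) -> str:
    """Return a unified diff of two exported notebook source files."""
    with open(path_a) as fa, open(path_b) as fb:
        return "".join(difflib.unified_diff(
            fa.readlines(), fb.readlines(),
            fromfile=path_a, tofile=path_b))

# Demonstration with two tiny stand-in exports:
a = tempfile.NamedTemporaryFile("w", suffix=".py", delete=False)
a.write("print('hello')\nx = 1\n")
a.close()
b = tempfile.NamedTemporaryFile("w", suffix=".py", delete=False)
b.write("print('hello')\nx = 2\n")
b.close()

diff_text = diff_notebooks(a.name, b.name)
print(diff_text)
os.unlink(a.name)
os.unlink(b.name)
```

Notebooks exported in SOURCE format diff cleanly; DBC exports are archives and will not.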
MatthewMills
by Databricks Partner
  • 5808 Views
  • 3 replies
  • 7 kudos

Resolved! DLT Apply Changes Tables corrupt

Got a weird DLT error. Test harness using the new(ish) 'Apply Changes from Snapshot' functionality and DLT Serverless (Current channel), Azure Australia East region. Has been working for several months without issue, but within the last week these DLT table...

Data Engineering
Apply Changes From Snapshot
dlt
Latest Reply
Lakshay
Databricks Employee
  • 7 kudos

We have an open ticket on this issue. The issue is caused by the maintenance pipeline renaming the backing table. We expect the fix to be rolled out soon for this issue.

2 More Replies
shubham_007
by Contributor III
  • 1495 Views
  • 1 reply
  • 0 kudos

Urgent!! Need information/details and reference links on the two topics below:

Dear experts, I need urgent help and guidance, with reference links, on the topics below: steps for package installation with serverless in Databricks, and what are Delta Lake connectors with serverless? How to run Delta Lake queries outside...

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Seems like a duplicate: https://community.databricks.com/t5/data-engineering/urgent-need-information-details-and-reference-link-on-below-two/td-p/107260

data-grassroots
by New Contributor III
  • 9223 Views
  • 7 replies
  • 1 kudos

Resolved! Ingesting Files - Same file name, modified content

We have a data feed with files whose filenames stay the same but whose contents change over time (brand_a.csv, brand_b.csv, brand_c.csv, ...). COPY INTO seems to ignore the files when they change. If we set the force flag to true and run it, we end up w...

Latest Reply
data-grassroots
New Contributor III
  • 1 kudos

Thanks for the validation, Werners! That's the path we've been heading down (copy + merge). I still have some DLT experiments planned but - at least for this situation - copy + merge works just fine.

6 More Replies
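The copy + merge pattern this thread settles on can be sketched in two SQL statements. The table names, feed path, and key column below are placeholders for illustration, not from the thread:

```sql
-- 1. Force-load the re-delivered files into a staging table;
--    'force' makes COPY INTO reprocess files it has seen before.
COPY INTO staging_brands
FROM '/mnt/feed/brands/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true')
COPY_OPTIONS ('force' = 'true');

-- 2. Upsert into the target so changed rows update in place
--    instead of piling up as duplicates.
MERGE INTO brands AS t
USING staging_brands AS s
  ON t.brand_id = s.brand_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

In practice the staging table is truncated between loads so the MERGE only sees the latest delivery.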
peter_ticker
by New Contributor III
  • 13080 Views
  • 17 replies
  • 2 kudos

XML Auto Loader rescuedDataColumn Doesn't Rescue Array Fields

Hiya! I'm interested in whether anyone has a solution to the following problem. If you load XML using Auto Loader or otherwise, and set the schema such that a single value is assumed for a given xpath but the actual XML contains multiple values (i....

Latest Reply
Witold
Databricks Partner
  • 2 kudos

Let me rephrase it. You can't use Message as the rowTag, because it's the root element. rowTag implies that it's a tag within the root element, which might occur multiple times. Check the docs on reading and writing XML files; there you'll find exa...

16 More Replies
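The underlying mismatch is easy to show with plain Python (no Auto Loader involved); the XML below is invented for illustration:

```python
import xml.etree.ElementTree as ET

doc = """
<Messages>
  <Message>
    <Tag>a</Tag>
    <Tag>b</Tag>
  </Message>
</Messages>
"""

msg = ET.fromstring(doc).find("Message")

# A schema that assumes a single value at this xpath sees only
# the first occurrence...
first_only = msg.findtext("Tag")

# ...whereas an array-typed field keeps every occurrence.
all_values = [t.text for t in msg.findall("Tag")]

print(first_only, all_values)  # a ['a', 'b']
```

In Spark terms, the fix is declaring the field as an array type in the schema, or choosing a rowTag below the repeated element, which is what the docs pointer in the reply covers.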
evangelos
by New Contributor III
  • 7110 Views
  • 5 replies
  • 0 kudos

Resolved! Databricks asset bundles: name_prefix doesn't work with presets

Hello! I am deploying a Databricks workflow using bundles and want to attach the prefix "prod_" to the name of my job. My target uses `mode: production` and I follow the instructions in https://learn.microsoft.com/en-us/azure/databricks/dev-tools/b...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

To attach the prefix "prod_" to the name of your job in a Databricks workflow using bundles, you need to ensure that the name_prefix preset is correctly configured in your databricks.yml file: targets: prod: mode: production pres...

4 More Replies
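For reference, the preset the reply points at looks like this in `databricks.yml` (the target name is illustrative):

```yaml
targets:
  prod:
    mode: production
    presets:
      name_prefix: "prod_"
```

With this in place, `databricks bundle deploy -t prod` deploys the job with `prod_` prepended to its name.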
oakhill
by New Contributor III
  • 6744 Views
  • 3 replies
  • 1 kudos

How do we create a job cluster in Databricks Asset Bundles for use across different jobs?

When developing jobs on DABs, we use new_cluster to create a cluster for a particular job. I think it's a lot of lines of YAML when what I really need is a "small cluster" and a "big cluster" to reference for certain kinds of jobs. Tags would be on the...

Latest Reply
filipniziol
Esteemed Contributor
  • 1 kudos

Hi @oakhill, you can specify your job cluster configuration in your variables: variables: small_cluster_id: description: "The small cluster with 2 workers used by the jobs" type: complex default: spark_version: "15.4.x-scala2.12" ...

2 More Replies
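The reply's variable-based approach, as a fuller illustrative fragment; the Spark version, node type, worker count, and job names are assumptions, not recommendations:

```yaml
variables:
  small_cluster:
    description: "Small job cluster reused across jobs"
    type: complex
    default:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "Standard_DS3_v2"
      num_workers: 2

resources:
  jobs:
    nightly_load:
      tasks:
        - task_key: main
          new_cluster: ${var.small_cluster}
          notebook_task:
            notebook_path: ./notebooks/main
```

Jobs that need the bigger profile would reference a second complex variable (e.g. `${var.big_cluster}`) instead of repeating the cluster YAML.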
saniok
by New Contributor II
  • 2497 Views
  • 2 replies
  • 0 kudos

How to Handle Versioning in Databricks Asset Bundles?

Hi everyone, in our organization we are transitioning from defining Databricks jobs using the UI to managing them with asset bundles. Since asset bundles can be deployed across multiple workspaces—each potentially having multiple targets (e.g., stag...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @saniok, in the databricks.yml file you can include version information to manage different versions of your bundles. Example: bundle: name: my-bundle version: 1.0.0 resources: jobs: my-job: name: my-job ...

1 More Replies
Avinash_Narala
by Databricks Partner
  • 3898 Views
  • 7 replies
  • 3 kudos

Resolved! SQL Server to Databricks Migration

Hi, I want to build a Python function to migrate SQL Server tables to Databricks. Are there any guides or best practices on how to do so? It would be really helpful if there are. Regards, Avinash N

Latest Reply
filipniziol
Esteemed Contributor
  • 3 kudos

Hi @Avinash_Narala, if it is lift and shift, then try this: 1. Set up Lakehouse Federation to SQL Server. 2. Use CTAS statements to copy each table into Unity Catalog: CREATE TABLE catalog_name.schema_name.table_name AS SELECT * FROM sql_server_catalog_...

6 More Replies
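The reply's two steps, as an illustrative fragment; the foreign catalog (`sqlserver_cat`), schema, and table names are placeholders:

```sql
-- 1. With a Lakehouse Federation connection in place, SQL Server
--    tables appear under a foreign catalog (here: sqlserver_cat).
-- 2. Copy each table into Unity Catalog with CTAS:
CREATE TABLE main.migrated.customers AS
SELECT * FROM sqlserver_cat.dbo.customers;
```

A Python function could then loop over the foreign catalog's table listing and issue one CTAS statement per table.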
jeremy98
by Honored Contributor
  • 10751 Views
  • 22 replies
  • 1 kudos

Wheel package to install in a serverless workflow

Hi guys, what is the way, through Databricks Asset Bundles, to declare a new job definition with serverless compute associated with each task that composes the workflow, such that inside each notebook task definition it is possible to catch the dep...

Latest Reply
jeremy98
Honored Contributor
  • 1 kudos

Ping @Alberto_Umana 

21 More Replies
dbx-user7354
by New Contributor III
  • 8659 Views
  • 7 replies
  • 3 kudos

PySpark DataFrame orderBy only orders within partitions when using multiple workers

I came across a PySpark issue when sorting a DataFrame by a column. It seems like PySpark only orders the data within partitions when using multiple workers, even though it shouldn't. from pyspark.sql import functions as F import matplotlib.pyplot...

Latest Reply
Avinash_Narala
Databricks Partner
  • 3 kudos

Hi @dbx-user7354, orderBy() should perform a global sort, as shown in plot 2, but per your description it is sorting the data within partitions, which is the behavior of sortWithinPartitions(). To solve this, please try with the latest DBR...

6 More Replies
SwathiChidurala
by New Contributor II
  • 9255 Views
  • 2 replies
  • 3 kudos

Resolved! Delta format

Hi, I am a student learning Databricks. In the code below I tried to write data in Delta format to a gold layer. I authenticated using the service principal method to read, write, and execute data, and I assigned the Storage Blob Contributor role, but...

Latest Reply
Avinash_Narala
Databricks Partner
  • 3 kudos

Hi @SwathiChidurala, the error is because you don't have the folder trip_zone inside the gold folder. You can try removing trip_zone from the location, or adding the folder trip_zone inside the gold folder in ADLS, and then try again. If th...

1 More Replies
Abdurrahman
by New Contributor II
  • 2430 Views
  • 3 replies
  • 3 kudos

Move files from DBFS to Workspace Folders databricks

I want to move a zip file from DBFS to a workspace folder. I am using dbutils.fs.cp("dbfs file path", "workspace folder path") in a Databricks notebook and I am seeing the following error: ExecutionError: An error occurred while calling o455.cp. : jav...

Latest Reply
nick533
New Contributor III
  • 3 kudos

Permission denied appears to be the cause of the error message. To read from the DBFS path and write to the workspace folder, please make sure you have the required permissions. The following permissions may be required: the DBFS file path can be read...

2 More Replies
nhakobian
by Databricks Partner
  • 1195 Views
  • 1 reply
  • 0 kudos

Python Artifact Installation Error on Runtime 16.1 on Shared Clusters

I've run into an issue with no clear path to resolution. Due to various integrations we have in Unity Catalog, some jobs have to run in a shared cluster environment in order to authenticate properly to the underlying data resource. When setting up ...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

The 'Enable libraries and init scripts on shared Unity Catalog clusters' setting is deprecated in Databricks Runtime 16.0 and above. Please refer to the documentation here for details. Disabling this feature at the workspace level would pr...
