Data Engineering

Forum Posts

by Snoonan • Visitor

an hour ago

14 Views
2 replies
0 kudos

Unity catalog issues

Hi all, I have recently enabled Unity Catalog in my DBX workspace. I have created a new catalog with an external location on Azure data storage. I can create new schemas (databases) in the new catalog but I can't create a table. I get the below error wh...

Data Engineering

Latest Reply

daniel_sahal
Honored Contributor III

59m ago

0 kudos

@Snoonan Make sure that permissions are correct. Databricks Access Connector requires at least:
- Blob Data Reader on storage
- Blob Data Contributor on container

1 More Replies

by dashawn • Visitor

yesterday

26 Views
1 reply
0 kudos

DLT Pipeline Error Handling

Hello all. We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fail to load, the entire pipeline fails even when there are no depende...

Data Engineering

Delta Live Tables

Latest Reply

Kaniz
Community Manager

2 hours ago

0 kudos

Hi @dashawn, when data processing fails, manually investigating logs to understand the failures, cleaning up data, and determining the restart point can be time-consuming and costly. DLT provides features to handle errors more intelligently. By default,...

by Karlo_Kotarac • New Contributor II

Wednesday

43 Views
2 replies
0 kudos

Run failed with error message ContextNotFound

Hi all! Recently we've been getting lots of these errors when running Databricks notebooks: At that time we observed DRIVER_NOT_RESPONDING (Driver is up but is not responsive, likely due to GC.) log on the single-user cluster we use. Previously when thi...

Data Engineering

Latest Reply

jose_gonzalez
Moderator

yesterday

0 kudos

Are you able to get the full error stack trace from the driver's logs?

1 More Replies

by amde99 • New Contributor

a week ago

156 Views
1 reply
0 kudos

How can I throw an exception when a .json.gz file has multiple roots?

I have a situation where source files in .json.gz sometimes arrive with invalid syntax containing multiple roots separated by empty brackets []. How can I detect this and throw an exception? Currently the code runs and picks up only record set 1, and ...

Data Engineering

json

Latest Reply

daniel_sahal
Honored Contributor III

2 hours ago

0 kudos

@amde99 Changing the mode to FAILFAST should be able to help you with throwing an exception. https://spark.apache.org/docs/latest/sql-data-sources-json.html
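
Independently of the Spark parser mode, the file can also be checked before it is handed to Spark. The sketch below is only an illustration under stated assumptions: each file is expected to hold exactly one JSON document (not JSON Lines), it is small enough to read into memory, and the function name `load_single_root_json` is hypothetical. Python's standard `json.loads` rejects anything that follows the first complete top-level value, which is the "multiple roots" case described in the question.

```python
import gzip
import json


def load_single_root_json(path):
    """Parse a .json.gz file and raise ValueError unless it holds one JSON document."""
    # Assumption: the whole file fits in memory.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        text = handle.read()
    try:
        # json.loads fails with "Extra data" when a second root follows the first.
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"{path}: not a single well-formed JSON document "
            f"({err.msg} at character {err.pos})"
        ) from err
```

Calling this on each arriving file before the Spark read would stop the job on the first malformed file instead of silently loading only the first record set.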

by mvmiller • New Contributor III

yesterday

166 Views
1 reply
0 kudos

Workflow file arrival trigger - does it apply to overwritten files?

I am exploring the use of the "file arrival" trigger for a workflow for a use case I am working on. I understand from the documentation that it checks every minute for new files in an external location, then initiates the workflow when it detects a ...

Data Engineering

Latest Reply

Rajani
New Contributor III

yesterday

0 kudos

Hi @mvmiller, the "file arrival" trigger for a workflow considers the name of the file. When a file with the same name was overwritten, the workflow was not triggered. Hope I answered your question!

by dollyb • New Contributor III

Monday

106 Views
1 reply
0 kudos

Differences between Spark SQL and Databricks

Hello, I'm using a local Docker Spark 3.5 runtime to test my Databricks Connect code. However, I've come across a couple of cases where my code would work in one environment, but not the other. Concrete example: I'm reading data from BigQuery via spark....

Data Engineering

Latest Reply

daniel_sahal
Honored Contributor III

yesterday

0 kudos

@dollyb That's because when you've added another dependency on Databricks, it doesn't really know which one it should use. By default it's using the built-in com.google.cloud.spark.bigquery.BigQueryRelationProvider. What you can do is provide whole packag...

by HaripriyaP • New Contributor

Wednesday

61 Views
1 reply
0 kudos

Multiple Notebooks Migration from one workspace to another without using Git.

Hi all! I need to migrate multiple notebooks from one workspace to another. Is there any way to do it without using Git? Since manual import and export is difficult to do for multiple notebooks and folders, I need an alternate solution. Please reply as so...

Data Engineering

Latest Reply

daniel_sahal
Honored Contributor III

yesterday

0 kudos

@HaripriyaP You can use the Databricks CLI to export and import notebooks from one workspace to another. CLI documentation here: https://github.com/databricks/cli/blob/main/docs/commands.md#databricks-workspace-export---export-a-workspace-object
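
For whole folders, the same CLI offers directory-level commands. The following is a rough sketch rather than a tested migration script: it assumes the current Databricks CLI is installed, that `export-dir` and `import-dir` take a source path followed by a target path, and that two authentication profiles exist. The profile names, paths, and function names below are hypothetical, and it is worth confirming the exact flags with `databricks workspace --help` first.

```python
import subprocess


def build_migration_commands(workspace_path, local_dir, source_profile, target_profile):
    """Build the two CLI calls that copy a workspace folder via a local directory."""
    export_cmd = [
        "databricks", "workspace", "export-dir",
        workspace_path, local_dir,
        "--profile", source_profile,  # profile for the source workspace (hypothetical name)
    ]
    import_cmd = [
        "databricks", "workspace", "import-dir",
        local_dir, workspace_path,
        "--profile", target_profile,  # profile for the target workspace (hypothetical name)
    ]
    return [export_cmd, import_cmd]


def migrate(workspace_path, local_dir, source_profile, target_profile, dry_run=True):
    """Print the commands, and run them only when dry_run is False."""
    commands = build_migration_commands(
        workspace_path, local_dir, source_profile, target_profile
    )
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing command
    return commands
```

Running `migrate("/Shared/project", "./export", "source-ws", "target-ws")` with the default `dry_run=True` only prints the two commands, which makes it easy to review them before anything is copied.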

by HaripriyaP • New Contributor

Wednesday

78 Views
1 reply
0 kudos

Multiple Tables Migration from one workspace to another.

Hi all! I need to copy multiple tables, including their metadata, from one workspace to another. Is there any way to do it? Please reply as soon as possible.

Data Engineering

Latest Reply

shan_chandra
Honored Contributor III

yesterday

0 kudos

@HaripriyaP - Depending on your use case, either of the approaches below can be chosen: 1) DELTA CLONE (DEEP CLONE) to clone them to the new workspace. 2) Have the same cluster policy/instance profile of the old workspace to access them in the new worksp...

by bozhu • Contributor

08-22-2023 4:04:51 AM

925 Views
4 replies
0 kudos

Delta Live Tables Materialised View Column Comment Error

While the materialised view documentation says MVs support column comments, this does not seem to be the case for MVs created by DLT. For example, trying to add a comment to an MV created by DLT results in an error. Any ideas on when this will be fixed/supported?

Data Engineering

Latest Reply

bozhu
Contributor

09-17-2023 9:25:25 PM

0 kudos

Just to close the loop here: it seems DLT-generated MVs now support column comments.

3 More Replies

by Chinu • New Contributor III

09-13-2023 12:19:44 PM

986 Views
1 reply
0 kudos

How do I access DLT advanced configuration from a Python notebook?

Hi Team, I'm trying to get a DLT Advanced Configuration value from the Python DLT notebook. For example, I set "something": "some path" in Advanced configuration in DLT and I want to get the value from my DLT notebook. I tried "dbutils.widgets.get("some...

Data Engineering

Latest Reply

jose_gonzalez
Moderator

yesterday

0 kudos

The following docs will help. Please check the examples: https://docs.databricks.com/en/delta-live-tables/settings.html#parameterize-pipelines
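
As a minimal sketch of what the linked section describes: key-value pairs set in the pipeline's configuration are read in Python through the Spark configuration, that is `spark.conf.get(...)`, rather than through notebook widgets. The helper name `get_pipeline_setting` is hypothetical, the key "something" is taken from the question, and the plain dictionary only stands in for `spark.conf` so that the snippet runs outside a pipeline.

```python
def get_pipeline_setting(conf, key, default=None):
    """Return a pipeline configuration value, or `default` when the key is unset."""
    # `conf` is expected to be `spark.conf` inside a DLT notebook; anything with
    # a compatible `.get(key, default)` method works for local experiments.
    return conf.get(key, default)


# Inside a DLT notebook (where `spark` is predefined) the call would look like:
#   source_path = get_pipeline_setting(spark.conf, "something")

# Stand-in for local experimentation only:
fake_conf = {"something": "some path"}
source_path = get_pipeline_setting(fake_conf, "something")
```

Passing a default keeps the notebook importable when a key has not been set in a given pipeline.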

by jenshumrich • New Contributor III

a week ago

68 Views
1 reply
0 kudos

Filter not using partition

I have the following code:
spark.sparkContext.setCheckpointDir("dbfs:/mnt/lifestrategy-blob/checkpoints")
result_df.repartitionByRange(200, "IdStation")
result_df_checked = result_df.checkpoint(eager=True)
unique_stations = result_df.select("IdStation...

Data Engineering

Latest Reply

jose_gonzalez
Moderator

yesterday

0 kudos

Please check the physical query plan. Append .explain() to your existing call and look in the physical plan for any filters being pushed down in your query.

by toolhater • New Contributor II

a week ago

70 Views
1 reply
0 kudos

Installing dlt causing error

I'm trying to use the example in big book of engineering 2nd edition-final.pdf and I had an issue with the statement import dlt. So I created another cell and installed it and I noticed I was getting this error: "dataclass_transform() got an unexpected k...

Data Engineering

Latest Reply

jose_gonzalez
Moderator

yesterday

0 kudos

Could you get the full error stack trace, please?

by anish2102 • New Contributor

Sunday

84 Views
1 reply
0 kudos

PySpark operations slowness in cluster 14.3 LTS as compared to 13.3 LTS

In my notebook, I am performing a few join operations which take more than 30s on a 14.3 LTS cluster, whereas the same operations take less than 4s on a 13.3 LTS cluster. Can someone help me optimize PySpark operations like joins and withColum...

Data Engineering

clustr-14.3

spark-3.5

Latest Reply

jose_gonzalez
Moderator

yesterday

0 kudos

Check the physical query plans for both DBR 14.3 and 13.3 and compare them. If they differ, check the Spark UI to identify where the change occurred.

by Hubert-Dudek • Esteemed Contributor III

yesterday

408 Views
1 reply
0 kudos

Nulls in Merge

If you are going to handle any null values in your MERGE condition, better watch out for your syntax #databricks
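
The post does not spell out which syntax it means. One common pitfall, assumed here, is the comparison operator in the ON clause: in Spark SQL, `=` evaluates to NULL when either operand is NULL, so rows whose key is NULL never match, while the null-safe operator `<=>` treats two NULLs as equal. The small simulation below only mimics those two semantics in plain Python; the function names are hypothetical.

```python
def sql_equals(left, right):
    """Mimic SQL `=`: the result is NULL (None) when either side is NULL."""
    if left is None or right is None:
        return None
    return left == right


def null_safe_equals(left, right):
    """Mimic Spark SQL `<=>`: two NULLs are equal, NULL versus a value is False."""
    if left is None or right is None:
        return left is None and right is None
    return left == right


def matched_keys(source_keys, target_keys, compare):
    """Return the (source, target) key pairs an ON condition would match."""
    return [
        (s, t)
        for s in source_keys
        for t in target_keys
        if compare(s, t) is True  # NULL (None) counts as "no match", as in SQL
    ]


plain_matches = matched_keys([1, None], [1, None], sql_equals)
null_safe_matches = matched_keys([1, None], [1, None], null_safe_equals)
```

With `=` only the key 1 matches, so a NULL-keyed source row would fall into the "not matched" branch of a MERGE; with `<=>` the NULL keys match each other as well.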

Data Engineering

Latest Reply

jose_gonzalez
Moderator

yesterday

0 kudos

Thank you for sharing @Hubert-Dudek

by carlosancassani • New Contributor III

01-05-2024 4:31:16 AM

498 Views
2 replies
0 kudos

Update DeltaTable on column type ArrayType(): add element to array

Hi all, I need to perform an update on a Delta table, adding elements to a column of ArrayType(StringType()) which is initialized empty.

Before Update
Col_1 StringType() | Col_2 StringType() | Col_3 ArrayType()
Val | Val | [ ]

After Update
Col_1 StringType() | Col_2 Strin...

Data Engineering

deltatable

Update

Latest Reply

Kaniz
Community Manager

01-18-2024 2:36:02 AM

0 kudos

Hi @carlosancassani, It seems like you’re trying to append a string to an array column in a Delta table. The error you’re encountering is because you’re trying to assign a string value to an array column, which is not allowed due to type mismatch. To...

1 More Replies
