Good morning, I have a DLT process with a CDC incremental load, and I need to ingest the history since the CDC transactions only cover recent data. To do this I need to ingest data into the __databricks_internal catalog. In my case, as I am a full admin, I can do it, how...
I have a situation where source files in .json.gz sometimes arrive with invalid syntax: multiple JSON roots separated by empty brackets []. How can I detect this and throw an exception? Currently the code runs and picks up only record set 1, and ...
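A starting point for failing fast instead of silently loading partial data (a minimal sketch; the path is a placeholder, and whether the stray roots actually land in the corrupt-record column depends on reader options such as multiLine): read in PERMISSIVE mode with an explicit corrupt-record column and raise if anything lands there.

```python
from pyspark.sql import functions as F

df = (spark.read
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("/mnt/landing/*.json.gz"))  # hypothetical path

# Spark disallows querying only the corrupt-record column on a freshly
# read JSON DataFrame, so cache it first.
df.cache()

# The column only exists if schema inference actually saw corrupt rows.
if "_corrupt_record" in df.columns:
    bad = df.filter(F.col("_corrupt_record").isNotNull()).count()
    if bad > 0:
        raise ValueError(f"{bad} malformed JSON record(s) detected - aborting load")
```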
Hi all! Recently we've been getting lots of these errors when running Databricks notebooks: At that time we observed a DRIVER_NOT_RESPONDING (Driver is up but is not responsive, likely due to GC.) log on the single-user cluster we use. Previously when thi...
I am reaching out to bring attention to a performance issue we are encountering while processing XML files using Spark-XML, particularly with the configuration spark.read().format("com.databricks.spark.xml"). Currently, we are experiencing significant...
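One mitigation worth checking, since schema inference makes Spark-XML scan the files an extra time (a minimal sketch; the row tag, schema, and path are placeholders): supply an explicit schema so the inference pass is skipped entirely.

```python
from pyspark.sql.types import StructType, StructField, StringType

# Hypothetical schema matching the XML records.
schema = StructType([
    StructField("id", StringType(), True),
    StructField("name", StringType(), True),
])

df = (spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "record")          # hypothetical row tag
      .schema(schema)                      # skip the schema-inference pass
      .load("/mnt/landing/*.xml"))         # hypothetical path
```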
Hi, I'm trying to use Java SQL. I can see that the query on Databricks is executed properly. However, on my client I get an exception (see below). Versions: JDK: jdk-20.0.1 (also tried version 16, same results) https://www.oracle.com/il-en/java/technologies/...
Hi all, I have recently enabled Unity Catalog in my DBX workspace. I have created a new catalog with an external location on Azure storage. I can create new schemas (databases) in the new catalog, but I can't create a table. I get the below error wh...
@Snoonan First of all, check the networking tab on the storage account to see if it's behind a firewall. If it is, make sure that Databricks/Storage networking is properly configured (https://learn.microsoft.com/en-us/azure/databricks/security/network/...
I'm facing an issue while trying to run my job in Databricks with my notebooks located in GitLab. When I run the job under my personal user ID it works fine, because I added a GitLab token to my profile, so the job is able to pull the branch from the repository. But whe...
Hi @drag7ter, there might be a missing piece in the setup.
Ensure that you’ve correctly entered the Git provider credentials (username and personal access token) for your service principal. Confirm that you’ve selected the correct Git provider (GitLab...
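Because Git credentials are stored per identity, the GitLab PAT has to be registered for the service principal itself, not your user. A minimal sketch using the Databricks SDK for Python, authenticating as the service principal via OAuth (host, IDs, and secrets are placeholders):

```python
from databricks.sdk import WorkspaceClient

# Authenticate AS the service principal, since Git credentials are per-identity.
w = WorkspaceClient(
    host="https://<workspace-url>",          # placeholder
    client_id="<sp-application-id>",         # placeholder
    client_secret="<sp-oauth-secret>",       # placeholder
)

# Register the GitLab PAT for the service principal.
w.git_credentials.create(
    git_provider="gitLab",
    git_username="my-gitlab-user",           # placeholder
    personal_access_token="<gitlab-pat>",    # placeholder
)
```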
Hi, I'm looking for information on how to create/mount ephemeral storage on the Databricks driver node in Azure. Does anyone have experience working with ephemeral storage? Thanks,
Hi @cszczotka,
Azure Databricks allows you to mount cloud object storage to the Databricks File System (DBFS) to simplify data access patterns for users who are unfamiliar with cloud concepts.
Mounted data does not work with Unity Catalog, and Dat...
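For reference, a minimal sketch of the legacy DBFS mount pattern for ADLS Gen2 with an OAuth service principal (secret scope, tenant, container, and account names are placeholders). Note this mounts remote object storage; it is not the driver's local ephemeral disk, and mounts do not work with Unity Catalog:

```python
# OAuth configs for mounting an ADLS Gen2 container via a service principal.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get("scope", "sp-client-id"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("scope", "sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",  # placeholder
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs=configs,
)
```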
Hello all. We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fail to load, the entire pipeline fails, even when there are no depende...
Hi @dashawn,
When data processing fails, manual investigation of logs to understand the failures, data cleanup, and determining the restart point can be time-consuming and costly. DLT provides features to handle errors more intelligently. By default,...
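For example, expectations let a table keep loading while offending rows are dropped or merely recorded, rather than failing the whole update. A minimal sketch (table name and constraint are illustrative):

```python
import dlt
from pyspark.sql import functions as F

@dlt.table
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")  # drop bad rows, keep loading
def clean_orders():
    # Hypothetical source table. expect() alone would only record violations,
    # while expect_or_fail() would stop the update instead.
    return spark.read.table("raw_orders").withColumn(
        "ingested_at", F.current_timestamp()
    )
```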
I am exploring the use of the "file arrival" trigger for a workflow for a use case I am working on. I understand from the documentation that it checks every minute for new files in an external location, then initiates the workflow when it detects a ...
Hi @mvmiller, the "file arrival" trigger for a workflow considers the name of the file; when a file with the same name is overwritten, the workflow isn't triggered. Hope that answers your question!
Hello, I'm using a local Docker Spark 3.5 runtime to test my Databricks Connect code. However, I've come across a couple of cases where my code works in one environment but not the other. Concrete example: I'm reading data from BigQuery via spark....
@dollyb That's because when you've added another dependency on Databricks, it doesn't really know which one it should use. By default it uses the built-in com.google.cloud.spark.bigquery.BigQueryRelationProvider. What you can do is provide the whole packag...
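i.e., something along these lines (a minimal sketch; the table name is a placeholder), so the fully qualified provider class rather than the short bigquery alias determines which implementation is used:

```python
# Pin the exact relation provider class instead of the short "bigquery" alias,
# so there is no ambiguity about which connector implementation resolves.
df = (spark.read
      .format("com.google.cloud.spark.bigquery.BigQueryRelationProvider")
      .option("table", "my-project.my_dataset.my_table")  # placeholder
      .load())
```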
Hi all! I need to migrate multiple notebooks from one workspace to another. Is there any way to do it without using Git? Since manual import and export is difficult for multiple notebooks and folders, I need an alternate solution. Please reply as so...
@HaripriyaP You can use the Databricks CLI to export and import notebooks from one workspace to another. CLI documentation here: https://github.com/databricks/cli/blob/main/docs/commands.md#databricks-workspace-export---export-a-workspace-object
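If you'd rather script the migration, here is a minimal sketch with the Databricks SDK for Python, which wraps the same workspace export/import APIs the CLI uses (profile names and paths are placeholders; target parent folders must already exist or be created with workspace.mkdirs):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ExportFormat, ImportFormat, ObjectType

src = WorkspaceClient(profile="source-workspace")   # placeholder profiles
dst = WorkspaceClient(profile="target-workspace")

# Walk the source folder and copy every notebook to the same path on the target.
for obj in src.workspace.list("/Shared/my-project", recursive=True):
    if obj.object_type == ObjectType.NOTEBOOK:
        exported = src.workspace.export(obj.path, format=ExportFormat.SOURCE)
        dst.workspace.import_(
            obj.path,
            content=exported.content,   # already base64-encoded
            format=ImportFormat.SOURCE,
            language=obj.language,
            overwrite=True,
        )
```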
Hi all! I need to copy multiple tables from one workspace to another with metadata information. Is there any way to do it? Please reply as soon as possible.
@HaripriyaP - Depending on your use case, either of the below approaches can be chosen.
1) DELTA CLONE (DEEP CLONE) to clone them to the new workspace (see the sketch after this list).
2) Have the same cluster policy/instance profile as the old workspace to access them in the new worksp...
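For option 1, a minimal sketch of a deep clone (catalog, schema, and table names are placeholders; the target needs access to the source table, e.g. through a shared Unity Catalog metastore or appropriate storage credentials):

```python
# DEEP CLONE copies both the data files and the table metadata to the target,
# so the clone is independent of the source table afterwards.
spark.sql("""
    CREATE TABLE IF NOT EXISTS target_catalog.sales.orders
    DEEP CLONE source_catalog.sales.orders
""")
```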
While the materialized view docs say MVs support column comments, this does not seem to be the case for MVs created by DLT. For example, when trying to add a comment to an MV created by DLT, it errors: Any ideas on when this will be fixed/supported?