Data Engineering

Forum Posts

amar1995
by Visitor
  • 35 Views
  • 1 reply
  • 0 kudos

Performance Issue with XML Processing in Spark Databricks

I am reaching out to bring attention to a performance issue we are encountering while processing XML files using Spark-XML, particularly with the configuration spark.read().format("com.databricks.spark.xml"). Currently, we are experiencing significant...

Latest Reply
shan_chandra
Honored Contributor III
  • 0 kudos

@amar1995 - Can you try this streaming approach and see if it works for your use case (using autoloader) - https://kb.databricks.com/streaming/stream-xml-auto-loader

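The Auto Loader approach the linked KB article describes might look roughly like this. This is an illustrative sketch only: the paths, schema, row tag, and table name are placeholders, and it assumes a Databricks runtime where `spark` is provided and native XML ingestion via `cloudFiles` is available.

```python
# Hypothetical sketch: incrementally ingest XML with Auto Loader instead of
# a large batch spark-xml read. All paths/names below are placeholders.
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
])

df = (spark.readStream                      # `spark` is the Databricks-provided session
      .format("cloudFiles")
      .option("cloudFiles.format", "xml")   # assumes a runtime with native XML support
      .option("rowTag", "record")           # XML element treated as one row
      .schema(schema)
      .load("/mnt/raw/xml_input/"))         # placeholder input path

(df.writeStream
   .option("checkpointLocation", "/mnt/checkpoints/xml_ingest")  # placeholder
   .trigger(availableNow=True)              # drain the backlog, then stop
   .toTable("bronze.xml_records"))          # placeholder target table
```

Streaming with checkpoints also means only new files are read on each run, which is often where the batch performance pain goes away.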
andre_rizzatti
by Visitor
  • 170 Views
  • 3 replies
  • 0 kudos

Ingest __databricks_internal catalog - PERMISSION DENIED

Good morning, I have a DLT process with CDC incremental load, and I need to ingest the history, as CDC transactions are only recent. To do this, I need to ingest data in the __databricks_internal catalog. In my case, as I am full admin, I can do it, how...

[attachment: image.png]
Latest Reply
andre_rizzatti
  • 0 kudos

The tables do not have specific configuration, and the user who is receiving the error is in a group that has full permission on the INTERNAL catalog.

2 More Replies
Snoonan
by Visitor
  • 128 Views
  • 4 replies
  • 0 kudos

Unity catalog issues

Hi all, I have recently enabled Unity Catalog in my DBX workspace. I have created a new catalog with an external location on Azure data storage. I can create new schemas (databases) in the new catalog, but I can't create a table. I get the below error wh...

Latest Reply
daniel_sahal
Honored Contributor III
  • 0 kudos

@Snoonan First of all, check the networking tab on the storage account to see if it's behind a firewall. If it is, make sure that Databricks/storage networking is properly configured (https://learn.microsoft.com/en-us/azure/databricks/security/network/...

3 More Replies
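The first check in that reply can be done from a terminal. A sketch using the Azure CLI; the account and resource-group names are placeholders:

```shell
# Hypothetical check: is the storage account behind a firewall?
# <storage-account> and <resource-group> are placeholders.
az storage account show \
  --name <storage-account> \
  --resource-group <resource-group> \
  --query networkRuleSet.defaultAction \
  --output tsv
# "Deny" means only allowed networks can reach the account, so the
# Databricks data plane must be permitted via VNet rules or private endpoints.
```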
amde99
by New Contributor
  • 161 Views
  • 2 replies
  • 0 kudos

How can I throw an exception when a .json.gz file has multiple roots?

I have a situation where source files in .json.gz sometimes arrive with invalid syntax containing multiple roots separated by empty brackets []. How can I detect this and throw an exception? Currently the code runs and picks up only record set 1, and ...

Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

Schema validation should help here.

1 More Reply
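Besides Spark-side schema validation, a lightweight driver-side pre-check can reject such files before ingestion. This is an illustrative sketch, not library code: `count_json_roots` and `validate_single_root` are hypothetical helpers, and it assumes the files are small enough to read outside Spark.

```python
import gzip
import json


def count_json_roots(data: str) -> int:
    """Count top-level JSON values by repeatedly raw-decoding the string."""
    decoder = json.JSONDecoder()
    data = data.strip()
    idx, roots = 0, 0
    while idx < len(data):
        _, end = decoder.raw_decode(data, idx)  # raises ValueError on bad syntax
        roots += 1
        # skip whitespace between top-level values
        while end < len(data) and data[end] in " \t\r\n":
            end += 1
        idx = end
    return roots


def validate_single_root(path: str) -> None:
    """Raise ValueError if a .json.gz file holds more than one top-level value."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    roots = count_json_roots(text)
    if roots != 1:
        raise ValueError(f"{path}: expected 1 JSON root, found {roots}")
```

A file like `{"a": 1} [] {"b": 2}` counts as three roots (the empty bracket separator is itself a top-level value), so it would be rejected instead of silently yielding only record set 1.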
Karlo_Kotarac
by New Contributor II
  • 61 Views
  • 3 replies
  • 0 kudos

Run failed with error message ContextNotFound

Hi all! Recently we've been getting lots of these errors when running Databricks notebooks: At that time we observed a DRIVER_NOT_RESPONDING (Driver is up but is not responsive, likely due to GC.) log on the single-user cluster we use. Previously when thi...

[attachment: Karlo_Kotarac_0-1713422302017.png]
Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

You may also try to run the failing notebook on a job cluster.

2 More Replies
Kanti1989
by Visitor
  • 16 Views
  • 1 reply
  • 0 kudos

PySpark execution error

I am getting an error message when executing a simple piece of PySpark code. Can anyone help me with this?

[attachment: Kanti1989_0-1713522601530.png]
Latest Reply
Lakshay
Esteemed Contributor
  • 0 kudos

The error message says "system cannot find the file specified". Could you check in the error message which file it is complaining about?

amitca71
by Contributor II
  • 3124 Views
  • 5 replies
  • 4 kudos

Resolved! Exception when using Java SQL client

Hi, I am trying to use the Java SQL client. I can see that the query on Databricks is executed properly. However, on my client I get an exception (see below). Versions: JDK: jdk-20.0.1 (tried also with version 16, same results). https://www.oracle.com/il-en/java/technologies/...

Latest Reply
xebia
New Contributor
  • 4 kudos

I am using java 17 and getting the same error.

4 More Replies
drag7ter
by New Contributor II
  • 107 Views
  • 1 reply
  • 0 kudos

Configure Service Principal access to GitLab

I'm facing an issue while trying to run my job in Databricks with my notebooks located in GitLab. When I run the job under my personal user ID it works fine, because I added a GitLab token to my user ID profile and the job is able to pull the branch from the repository. But whe...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @drag7ter, There might be a missing piece in the setup. Ensure that you’ve correctly entered the Git provider credentials (username and personal access token) for your Service Principal. Confirm that you’ve selected the correct Git provider (GitLab...

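Registering the GitLab credential for the service principal can be done through the Git Credentials REST API. A sketch; the host, token, and username are placeholders, and the bearer token must belong to the service principal itself, not to your own user:

```shell
# Hypothetical sketch: store a GitLab PAT as the service principal's Git
# credential. $DATABRICKS_HOST and $SP_TOKEN are placeholders; the token
# must authenticate AS the service principal.
curl -X POST "$DATABRICKS_HOST/api/2.0/git-credentials" \
  -H "Authorization: Bearer $SP_TOKEN" \
  -d '{
        "git_provider": "gitLab",
        "git_username": "my-sp-bot",
        "personal_access_token": "<gitlab-pat>"
      }'
```

Once the credential exists for the service principal, jobs running as that principal can pull the repo just as your personal runs do.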
cszczotka
by New Contributor II
  • 323 Views
  • 1 reply
  • 0 kudos

Ephemeral storage how to create/mount.

Hi, I'm looking for information on how to create/mount ephemeral storage on the Databricks driver node in the Azure cloud. Does anyone have experience working with ephemeral storage? Thanks,

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @cszczotka, Azure Databricks allows you to mount cloud object storage to the Databricks File System (DBFS) to simplify data access patterns for users who are unfamiliar with cloud concepts. Mounted data does not work with Unity Catalog, and Dat...

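For scratch data that only needs to live as long as the cluster, the driver's attached ephemeral disk can be used directly rather than a mount. A sketch; the `/local_disk0` path is an assumption to verify on your runtime, and the directory and file names are placeholders (the fallback to the system temp dir just lets the sketch run anywhere):

```python
import os
import shutil
import tempfile

# On Azure Databricks the driver's ephemeral disk is typically exposed at
# /local_disk0 (an assumption -- verify on your runtime). Fall back to the
# system temp dir so this sketch also runs locally.
root = "/local_disk0" if os.path.isdir("/local_disk0") else tempfile.gettempdir()
scratch = os.path.join(root, "my_job_scratch")  # placeholder directory name
os.makedirs(scratch, exist_ok=True)

path = os.path.join(scratch, "intermediate.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 1024)  # stand-in for real intermediate data

# Ephemeral contents vanish when the cluster terminates, so copy anything
# worth keeping to durable storage (e.g. ABFSS) before shutdown.
print(shutil.disk_usage(scratch))
```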
dashawn
by Visitor
  • 36 Views
  • 1 reply
  • 0 kudos

DLT Pipeline Error Handling

Hello all. We are a new team implementing DLT and have set up a number of tables in a pipeline loading from S3 with UC as the target. I'm noticing that if any of the 20 or so tables fails to load, the entire pipeline fails, even when there are no depende...

Data Engineering
Delta Live Tables
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @dashawn, When data processing fails, manual investigation of logs to understand the failures, data cleanup, and determining the restart point can be time-consuming and costly. DLT provides features to handle errors more intelligently. By default,...

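One of the relevant DLT features is expectations, which let a table drop (or just flag) bad rows instead of failing the whole pipeline. A minimal sketch; the table name, column, source path, and format are illustrative assumptions, and it only runs inside a DLT pipeline where `dlt` and `spark` are provided:

```python
# Hypothetical sketch: keep the pipeline running by dropping invalid rows
# with a DLT expectation. Names and paths are placeholders.
import dlt


@dlt.table(name="orders_bronze")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop bad rows, record the count
def orders_bronze():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("s3://my-bucket/orders/"))  # placeholder source path
```

`expect_or_drop` quarantines failures as metrics in the event log; `expect` would merely record them, while `expect_or_fail` reproduces the stop-the-pipeline behavior.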
mvmiller
by New Contributor III
  • 179 Views
  • 1 reply
  • 0 kudos

Workflow file arrival trigger - does it apply to overwritten files?

I am exploring the use of the "file arrival" trigger for a workflow for a use case I am working on.  I understand from the documentation that it checks every minute for new files in an external location, then initiates the workflow when it detects a ...

Latest Reply
Rajani
New Contributor III
  • 0 kudos

Hi @mvmiller, the "file arrival" trigger for a workflow considers the name of the file; when a file with the same name was overwritten, the workflow didn't trigger. Hope I answered your question!

dollyb
by New Contributor III
  • 107 Views
  • 1 reply
  • 0 kudos

Differences between Spark SQL and Databricks

Hello, I'm using a local Docker Spark 3.5 runtime to test my Databricks Connect code. However, I've come across a couple of cases where my code works in one environment but not the other. Concrete example: I'm reading data from BigQuery via spark....

Latest Reply
daniel_sahal
Honored Contributor III
  • 0 kudos

@dollyb That's because when you've added another dependency on Databricks, it doesn't really know which one it should use. By default it uses the built-in com.google.cloud.spark.bigquery.BigQueryRelationProvider. What you can do is provide the whole packag...

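The disambiguation the reply describes, sketched below; the table name is a placeholder, the provider class is the one named in the reply, and `spark` is the Databricks-provided session:

```python
# Hypothetical sketch: name the provider class explicitly instead of the
# ambiguous short format "bigquery", so both environments resolve the same
# connector. Table name is a placeholder.
df = (spark.read
      .format("com.google.cloud.spark.bigquery.BigQueryRelationProvider")
      .option("table", "my_project.my_dataset.my_table")
      .load())
```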
HaripriyaP
by New Contributor
  • 62 Views
  • 1 reply
  • 0 kudos

Multiple Notebooks Migration from one workspace to another without using Git.

Hi all! I need to migrate multiple notebooks from one workspace to another. Is there any way to do it without using Git? Since manual import and export is difficult to do for multiple notebooks and folders, I need an alternate solution. Please reply as so...

Latest Reply
daniel_sahal
Honored Contributor III
  • 0 kudos

@HaripriyaP You can use the Databricks CLI to export and import notebooks from one workspace to another. CLI documentation here: https://github.com/databricks/cli/blob/main/docs/commands.md#databricks-workspace-export---export-a-workspace-object

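A sketch of that CLI round trip; the workspace paths and profile names are placeholders, and it assumes one configured CLI profile per workspace:

```shell
# Hypothetical sketch: copy a folder of notebooks between workspaces with
# the Databricks CLI. Paths and profile names are placeholders.
databricks workspace export-dir /Users/me/project ./project_backup --profile source-ws
databricks workspace import-dir ./project_backup /Users/me/project --profile target-ws
```

This preserves the folder hierarchy, which is the part that makes manual per-notebook import/export painful.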