Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ramana
by Valued Contributor
  • 119 Views
  • 3 replies
  • 0 kudos

Serverless Compute - Spark - Jobs failing with Max iterations (1000) reached for batch Resolution

Hello Community, we have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process, we face several challenges, and this is one of them. When we try to execute the existing jobs with Serverless Compute, if the ...

Latest Reply
K_Anudeep
Contributor
  • 0 kudos

Hello @Ramana, you're right that data volume doesn't change the logical plan, but your pattern (example: SELECT * from a wide table + 10–15 column transforms) can still exceed the analyzer's fixed iteration cap on Serverless, because each * expansion...

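A hedged sketch of the mitigation described in the reply above: project columns explicitly instead of using repeated SELECT *, and break a very deep plan by materializing an intermediate table so the analyzer starts from a fresh, shallow plan. All table and column names below are hypothetical.

from pyspark.sql import functions as F

src = spark.table("catalog.schema.wide_table")   # hypothetical source table
needed = ["id", "amount", "fx_rate"]             # hypothetical column subset

stage1 = (
    src.select(*needed)                          # explicit projection instead of SELECT *
       .withColumn("amount_usd", F.col("amount") * F.col("fx_rate"))
)

# Materializing truncates the logical plan the analyzer must iterate over.
stage1.write.mode("overwrite").saveAsTable("catalog.schema.stage1_tmp")
result = spark.table("catalog.schema.stage1_tmp").withColumn(
    "fee", F.col("amount_usd") * F.lit(0.1)
)

Materializing costs an extra write, but it caps how deep any single analyzed plan can grow.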
2 More Replies
mattstyl-ff
by New Contributor
  • 201 Views
  • 6 replies
  • 0 kudos

Error with AutoLoader pipeline ingesting from external location: LOCATION_OVERLAP

Hello, I am trying to use pipelines in Databricks to ingest data from an external location to the datalake using AutoLoader, and I am facing this issue. I have noticed other posts with similar errors, but in those posts, the error was related to the d...

Latest Reply
mattstyl-ff
New Contributor
  • 0 kudos

There is no table created yet. I tried deleting the pipeline and creating a new one, with new file names, but it still fails. I noticed that the same error happens if I try to read from the event log location using spark.read(). Example: path = "abfss://un...

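A hedged diagnostic sketch for LOCATION_OVERLAP errors like this one: list the Unity Catalog external locations visible to you and flag any whose URL overlaps the path the pipeline reads from. The target path below is hypothetical, and this assumes the SHOW output exposes name and url columns.

target = "abfss://container@account.dfs.core.windows.net/landing/"  # hypothetical path

for row in spark.sql("SHOW EXTERNAL LOCATIONS").collect():
    name, url = row["name"], row["url"]
    # Overlap in either direction can trigger LOCATION_OVERLAP.
    if target.startswith(url) or url.startswith(target):
        print(f"Possible overlap: {name} -> {url}")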
5 More Replies
tyhatwar785
by New Contributor
  • 73 Views
  • 1 reply
  • 1 kudos

Solution Design Recommendation on Databricks

Hi Team, we need to design a pipeline in Databricks to:
1. Call a metadata API (returns XML per keyword), parse, and consolidate into a combined JSON.
2. Use this metadata to generate dynamic links for a second API, download ZIPs, unzip, and extract spe...

Latest Reply
nikhilmohod-nm
New Contributor III
  • 1 kudos

Hi @tyhatwar785,
1. Should metadata and file download be separate jobs/notebooks or combined? Keep them in separate notebooks, but orchestrate them under a single Databricks Job for better error handling and retries (see the sketch below).
2. Cluster recommendations: start wit...

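A minimal sketch of that orchestration, assuming the Databricks Python SDK (databricks-sdk) is available; the job name, notebook paths, and retry counts are hypothetical. With no cluster spec, the tasks run on serverless compute where that is enabled.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
w.jobs.create(
    name="xml-metadata-and-zip-download",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="fetch_metadata",
            notebook_task=jobs.NotebookTask(notebook_path="/Pipelines/fetch_metadata"),  # hypothetical
            max_retries=2,
        ),
        jobs.Task(
            task_key="download_zips",
            depends_on=[jobs.TaskDependency(task_key="fetch_metadata")],
            notebook_task=jobs.NotebookTask(notebook_path="/Pipelines/download_zips"),  # hypothetical
            max_retries=2,
        ),
    ],
)

The depends_on edge means the download task only starts after the metadata task succeeds, and each task retries independently.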
PratikRudra
by New Contributor
  • 99 Views
  • 1 reply
  • 0 kudos

Unable to create table on external location

Currently trying to connect a table on an external location, and it fails with the error [UNAUTHORIZED_ACCESS] Unauthorized access: PERMISSION_DENIED: request not authorized SQLSTATE: 42501, which seems like a pretty straightforward error, but I am unable to find ...

Latest Reply
Khaja_Zaffer
Contributor
  • 0 kudos

Hello @PratikRudra, thank you for sharing the error. I think there is probably a component missing. Writing table metadata (for example, to the _delta_log directory) requires the CREATE EXTERNAL TABLE capability on the external location; this ...

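A minimal sketch of the corresponding grant, assuming Unity Catalog; the external location and group names are hypothetical, and it must be run by a principal allowed to manage the location.

spark.sql("""
    GRANT CREATE EXTERNAL TABLE, READ FILES, WRITE FILES
    ON EXTERNAL LOCATION `my_ext_location`
    TO `data_engineers`
""")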
sfishel18
by New Contributor
  • 13 Views
  • 1 reply
  • 0 kudos

GEOMETRY column type breaks all access to table from Spark

Hello, I have a Databricks table with a column using the new GEOMETRY type. When I try to access this table from a Spark workload, I am not able to describe the table or operate on any of its columns. My Spark config is the following, per the Databri...

Latest Reply
sfishel18
New Contributor
  • 0 kudos

I have filed an issue here: https://github.com/unitycatalog/unitycatalog/issues/1077. But I also wanted to ask for help here, since this is a Databricks-specific column type.

mosayed
by New Contributor
  • 281 Views
  • 4 replies
  • 6 kudos

Resolved! Databricks clusters unresponsive

Hello everyone, we are experiencing issues on one of our Databricks workspaces: notebooks and SQL queries are executing, but results are not returned to the UI. In the screenshots you can see examples where cells in notebooks and queries in a SQL wareho...

Latest Reply
mosayed
New Contributor
  • 6 kudos

Thanks a lot for the quick replies! It seems the issue was related to a faulty IPython version (or something similar inside the workspace). The problem resolved itself later the same day in the evening, and everything is working normally again now.

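In case it recurs, a quick way to check the active IPython version from a notebook and pin a known-good one; the pinned version below is hypothetical.

import IPython
print(IPython.__version__)

# To pin a known-good version in a notebook cell (hypothetical version):
# %pip install ipython==8.15.0
# dbutils.library.restartPython()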
3 More Replies
ToNiOZ45
by New Contributor II
  • 181 Views
  • 1 reply
  • 2 kudos

Resolved! New Scroll bar appears in cells with more than 300 lines

Hi, in a Databricks notebook I noticed a new behaviour which I'd like to deactivate. Essentially, when a cell reaches 300 lines or more, a new scroll bar within the cell appears. I'd rather have the cell displayed in full and keep using the page scrol...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @ToNiOZ45, unfortunately there's no option to disable it in the workspace settings. If you find this scroll bar frustrating, you can create a ticket in Databricks Ideas and suggest adding that option: Feedback | Databricks

Ramana
by Valued Contributor
  • 199 Views
  • 5 replies
  • 4 kudos

Serverless Compute - pySpark - Any alternative for rdd.getNumPartitions()

Hello Community, we have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process, we face several challenges, and this is one of them. When we read CSV or JSON files with multiLine=true, the load becomes sing...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @Ramana, yep, the RDD API is not supported on Serverless. As a workaround, you can obtain the number of partitions in the following way: use spark_partition_id and then count the distinct occurrences of each id (a runnable sketch follows below): from pyspark.sql.functions import spark_partition_id,...

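A runnable version of that workaround; the input path is hypothetical.

from pyspark.sql.functions import spark_partition_id

df = spark.read.option("multiLine", True).json("/Volumes/main/raw/files/")  # hypothetical path

# Each task stamps its partition id; counting distinct ids gives the partition count.
num_partitions = df.select(spark_partition_id().alias("pid")).distinct().count()
print(num_partitions)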
4 More Replies
Ramana
by Valued Contributor
  • 141 Views
  • 2 replies
  • 0 kudos

Serverless Compute - Python - Custom Emails via SMTP (smtplib.SMTP(host_name)) - Any alternative?

Hello Community, we have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process, we face several challenges, and this is one of them. We have several scenarios where we need to send an inline email via Pytho...

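For context, a minimal version of the smtplib pattern the post describes; the host and addresses are hypothetical, and the SMTP connection is the step the thread reports failing on Serverless.

import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Job status"
msg["From"] = "jobs@example.com"    # hypothetical sender
msg["To"] = "team@example.com"      # hypothetical recipient
msg.set_content("Inline status email sent from a Databricks job.")

# Opening the connection requires outbound SMTP connectivity from the compute.
with smtplib.SMTP("smtp.example.com", 25, timeout=30) as smtp:  # hypothetical host
    smtp.send_message(msg)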
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Ramana, what error do you get on Serverless? Could you provide the error message?

1 More Reply
minhhung0507
by Valued Contributor
  • 1467 Views
  • 2 replies
  • 0 kudos

Executors Getting FORCE_KILL After Migration to GCE – Resource Scaling Not Helping

Hi everyone, we're facing a persistent issue with our production streaming pipelines where executors are being forcefully killed with the following error: Executor got terminated abnormally due to FORCE_KILL. Screenshots attached for reference. Context: our pipelin...

Latest Reply
thomas-totter
New Contributor III
  • 0 kudos

@minhhung0507 wrote: We're facing a persistent issue with our production streaming pipelines where executors are being forcefully killed with the following error: Executor got terminated abnormally due to FORCE_KILL. I solved the issue in our case and al...

1 More Reply
thomas-totter
by New Contributor III
  • 339 Views
  • 4 replies
  • 4 kudos

NativeADLGen2RequestComparisonHandler: Error in request comparison (when running DLT)

For at least two weeks (but probably even longer), our DLT pipeline has been posting error messages to log4j (driver logs) like the one below. I tried both channels (preview, current), switched between serverless and classic compute, and started the pipeli...

Latest Reply
thomas-totter
New Contributor III
  • 4 kudos

I also tried the setting below (via spark.conf), but that didn't help either: spark.sql.legacy.timeParserPolicy: LEGACY (LEGACY_TIME_PARSER_POLICY - Azure Databricks - Databricks SQL | Microsoft Learn)

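For reference, the session-level form of that setting; in a DLT pipeline the same key/value can also be put into the pipeline's Spark configuration.

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")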
3 More Replies
TechExplorer
by New Contributor II
  • 834 Views
  • 3 replies
  • 1 kudos

Resolved! Unable to unpack or read rar file

Hi everyone, I'm encountering an issue with the following code when trying to unpack or read a RAR file in Databricks:

with rarfile.RarFile(s3_path) as rf:
    for file_info in rf.infolist():
        with rf.open(file_info) as file:
            file_c...

Latest Reply
Upendra_Dwivedi
Contributor
  • 1 kudos

Hi @Walter_C, I am also using the unrar utility, but the problem is that it is proprietary software; I am working for a client, and this license could cause issues. What is an alternative to unrar that would eliminate the risk of any legal-compliance issues?

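One hedged possibility: unar (from The Unarchiver) and bsdtar (libarchive) are permissively licensed extractors, and recent versions of the rarfile package can pick them up as backends when unrar is absent. The archive path is hypothetical, and the install step assumes a Debian-based image.

# Install a permissively licensed backend first, e.g. via an init script or %sh:
#   apt-get install -y unar
import rarfile

with rarfile.RarFile("/Volumes/main/raw/archive.rar") as rf:  # hypothetical path
    for file_info in rf.infolist():
        with rf.open(file_info) as f:
            data = f.read()  # process the member's bytes as needed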
2 More Replies
felix4572
by New Contributor II
  • 407 Views
  • 9 replies
  • 6 kudos

transformWithStateInPandas throws "Spark connect directory is not ready" error

Hello, we employ arbitrary stateful aggregations in our data processing streams on Azure Databricks, and would like to migrate from applyInPandasWithState to transformWithStateInPandas. We employ the Python API throughout our solution, and some of our...

Labels: Data Engineering, stateful processing, structured streaming, transformWithStateInPandas
Latest Reply
Advika
Databricks Employee
  • 6 kudos

Update: This is working fine with earlier DBR versions, but the issue seems to occur specifically with DBR 17.1.I’ve flagged this behaviour with the internal team for further investigation.

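For readers making the same migration, a hedged minimal skeleton of transformWithStateInPandas (Spark 4.x / recent DBR); the schema, grouping key, and state name are hypothetical, and the processor just keeps a running per-key count.

import pandas as pd
from pyspark.sql.streaming import StatefulProcessor, StatefulProcessorHandle
from pyspark.sql.types import StructType, StructField, StringType, LongType

class CountProcessor(StatefulProcessor):
    def init(self, handle: StatefulProcessorHandle) -> None:
        # Per-key running count kept in value state.
        self.count = handle.getValueState("count", StructType([StructField("count", LongType())]))

    def handleInputRows(self, key, rows, timerValues):
        total = self.count.get()[0] if self.count.exists() else 0
        total += sum(len(pdf) for pdf in rows)   # rows is an iterator of pandas DataFrames
        self.count.update((total,))
        yield pd.DataFrame({"key": [key[0]], "count": [total]})

    def close(self) -> None:
        pass

out_schema = StructType([StructField("key", StringType()), StructField("count", LongType())])
counts = (
    stream_df.groupBy("key")   # stream_df: a streaming DataFrame (hypothetical)
             .transformWithStateInPandas(
                 statefulProcessor=CountProcessor(),
                 outputStructType=out_schema,
                 outputMode="Update",
                 timeMode="None",
             )
)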
8 More Replies
lizou1
by New Contributor III
  • 1592 Views
  • 3 replies
  • 0 kudos

serverless environment v3 JavaPackage object is not callable

I ran into this issue when using serverless environment v3: 'JavaPackage' object is not callable. V2 works fine. Any ideas?

Latest Reply
lizou1
New Contributor III
  • 0 kudos

I went to the latest version 4 and this is no longer an issue. Thanks!

2 More Replies