Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by Divya_Bhadauria (New Contributor III) • 48 Views • 1 reply • 0 kudos

Does Databricks Runtime 7.3+ include built-in Hadoop S3 connector configurations?

I came across the KB article "S3 connection reset error", which says not to use the following Spark settings for the Hadoop S3 connector on DBR 7.3 and above: `spark.hadoop.fs.s3.impl com.databricks.s3a.S3AFileSystem`, `spark.hadoop.fs.s3n.impl com.data...`

Latest Reply
hasnat_unifeye (New Contributor) • 0 kudos

No, you don’t need to set those on DBR 7.3 and above. From 7.3+, Databricks already uses the newer Hadoop S3A connector by default, so those com.databricks.s3a.S3AFileSystem settings are not part of the default config and shouldn’t be added. If they are...
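To check an existing cluster config against the KB article's advice, a small script can flag the legacy overrides. This is a minimal sketch, assuming the Spark config is available as a plain dictionary of strings; `find_legacy_s3_overrides` is a hypothetical helper, not a Databricks API, and it only flags `fs.s3*.impl` keys set to the `com.databricks.s3a.S3AFileSystem` class named in the question.

```python
import re
from typing import Dict, List

# Matches spark.hadoop.fs.s3.impl, spark.hadoop.fs.s3n.impl and spark.hadoop.fs.s3a.impl
IMPL_KEY = re.compile(r"spark\.hadoop\.fs\.s3[an]?\.impl")
LEGACY_CLASS = "com.databricks.s3a.S3AFileSystem"


def find_legacy_s3_overrides(spark_conf: Dict[str, str]) -> List[str]:
    """Return (sorted) the config keys that pin an S3 scheme to the legacy class."""
    return sorted(
        key
        for key, value in spark_conf.items()
        if IMPL_KEY.fullmatch(key) and value.strip() == LEGACY_CLASS
    )


if __name__ == "__main__":
    # Hypothetical cluster config used only for illustration.
    conf = {
        "spark.hadoop.fs.s3.impl": "com.databricks.s3a.S3AFileSystem",
        "spark.sql.shuffle.partitions": "200",
    }
    print(find_legacy_s3_overrides(conf))
```

Any key it reports is a candidate for removal before running on DBR 7.3+.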

by Surya-Prathap (New Contributor) • 124 Views • 2 replies • 1 kudos

Output Not Displaying in Databricks Notebook on All-Purpose Compute Cluster

Hello all, I’m encountering an issue where output from standard Python commands such as print() or display(df) is not showing up correctly when running notebooks on an All-Purpose Compute cluster. Cluster details: Cluster Type: All-Purpose Compute; Runtime...

Latest Reply
Sahil_Kumar (Databricks Employee) • 1 kudos

Hi Surya, do you see this issue only with DBR 17.3 all-purpose clusters? Have you tried lower DBR versions? If not, please try and let me know. Also, from the Run menu, try “Clear state and outputs,” then re‑run the cell on the same cluster to rule out st...

by spd_dat (New Contributor III) • 3578 Views • 2 replies • 0 kudos

Can you default to `execution-count: none` when stripping notebook outputs?

When committing to a Git folder, IPYNB outputs are usually stripped unless this is allowed by an admin setting and toggled by `.databricks/commit_outputs`. This sets `{"execution-count": 0, ... }` within the IPYNB metadata. Is there a way to set it instead to...

Latest Reply
mark_ott (Databricks Employee) • 0 kudos

Databricks does not currently allow you to default to "execution_count": null (or "none") when stripping notebook outputs during a commit. The platform sets "execution_count": 0 as the default when outputs are stripped through its Git integration, ...
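If null execution counts matter (for example, to keep diffs identical to those produced by other tooling), one workaround is to post-process notebooks yourself, such as in a pre-commit hook run outside Databricks. This is a minimal sketch, assuming standard nbformat 4 JSON, where each code cell carries `outputs` and `execution_count`; `strip_notebook` and `strip_file` are hypothetical helpers, not part of the Databricks Git integration, and whether a later commit from Databricks would reset the value to 0 is not covered here.

```python
import json
from typing import Any, Dict


def strip_notebook(nb: Dict[str, Any]) -> Dict[str, Any]:
    """Clear outputs and set execution_count to None (JSON null) in every code cell."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None  # json.dump writes this as null
    return nb


def strip_file(path: str) -> None:
    """Rewrite an .ipynb file in place with outputs stripped."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")


if __name__ == "__main__":
    sample = {
        "cells": [
            {"cell_type": "code", "execution_count": 0,
             "outputs": [{"output_type": "stream"}], "source": ["print('hi')"], "metadata": {}},
            {"cell_type": "markdown", "source": ["# Title"], "metadata": {}},
        ],
        "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
    }
    print(json.dumps(strip_notebook(sample)["cells"][0]))
```

Markdown cells are left untouched because nbformat does not give them an `execution_count` field.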

by ayush667787878 (New Contributor) • 3334 Views • 2 replies • 1 kudos

Not able to install a library in the standard workspace, while it works in Community Edition. Please help

I am not able to install a library in the standard version, while in Community Edition I am able to add a library using Compute. How can I install it in standard Databricks the same way as in Community Edition?

Latest Reply
mark_ott (Databricks Employee) • 1 kudos

To install libraries in the normal (paid) version of Databricks, use the cluster management interface to add libraries to your compute resources. The process is similar to the Community Edition, but workspace policies and cluster access mode may rest...

by ask005 (New Contributor) • 2079 Views • 1 reply • 0 kudos

How to write ObjectId value using Spark connector 10.2.2

When updating records with the PySpark Mongo connector, how do I handle _id as an ObjectId? Versions: Spark 3.2.4, Scala 2.13, Spark Mongo Connector 2.12-10.2.2

Latest Reply
mark_ott (Databricks Employee) • 0 kudos

To write an ObjectId value using Spark Mongo Connector 10.2.2 in PySpark while updating records, you must convert the ObjectId string into a special format. The Spark Mongo Connector does not automatically recognize a string as an ObjectId; it will o...
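The reply's exact recommendation is truncated here, so the sketch below covers only the input-side step that any approach needs: checking that a string really is an ObjectId (12 bytes, written as 24 hexadecimal characters) and wrapping it in MongoDB Extended JSON form, `{"$oid": "..."}`. Whether connector 10.2.2 accepts that form on write, and which write option would control it, is an assumption to verify against the connector documentation; `to_extended_json_oid` is a hypothetical helper.

```python
import json
import re
from typing import Dict

# An ObjectId is 12 bytes, conventionally written as 24 hexadecimal characters.
OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def to_extended_json_oid(value: str) -> Dict[str, str]:
    """Validate a hex ObjectId string and wrap it as Extended JSON: {"$oid": "..."}."""
    if not OBJECT_ID_HEX.fullmatch(value):
        raise ValueError(f"not a valid ObjectId hex string: {value!r}")
    return {"$oid": value.lower()}


if __name__ == "__main__":
    # Hypothetical document; the _id value is made up for illustration.
    doc = {"_id": to_extended_json_oid("64B7F0C2A1B2C3D4E5F60718"), "status": "updated"}
    print(json.dumps(doc))
```

Rejecting malformed values early avoids silently writing them as plain strings, which is the failure mode the reply describes.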

by dndeng (New Contributor II) • 270 Views • 4 replies • 0 kudos

Query to calculate cost of task from each job by day

I am trying to find the cost per task in each job every time it was executed (daily), but I am currently getting very large numbers because of duplicates. Can someone help me? `WITH workspace AS ( SELECT account_id, workspace_id, workspace_name,...`

Latest Reply
mark_ott (Databricks Employee) • 0 kudos

You are seeing inflated cost numbers because your query groups by many columns—especially run_id, task_key, usage_start_time, and usage_end_time—without addressing possible duplicate row entries that arise from your joins, especially with the system....
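The reply attributes the inflated totals to duplicate rows produced by the joins. The effect is easy to reproduce: when one usage row matches several rows on the other side of a join (for example, several versions of the same job's metadata), its cost is summed once per match. The sketch below uses Python's built-in `sqlite3` with hypothetical miniature tables `usage` and `jobs` (not the real `system.*` schemas) and shows one common fix, keeping a single row per key with `ROW_NUMBER()` before joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical miniature tables; the real system tables have many more columns.
    CREATE TABLE usage (job_id TEXT, run_id TEXT, usage_date TEXT, cost REAL);
    CREATE TABLE jobs (job_id TEXT, name TEXT, change_time TEXT);

    INSERT INTO usage VALUES ('j1', 'r1', '2025-01-01', 10.0),
                             ('j1', 'r2', '2025-01-02', 5.0);
    -- The job was edited twice, so it has three metadata rows (one per version).
    INSERT INTO jobs VALUES ('j1', 'etl_v1', '2024-12-01'),
                            ('j1', 'etl_v2', '2024-12-15'),
                            ('j1', 'etl_v3', '2024-12-20');
    """
)

# Naive join: each usage row matches all three job rows, so every cost is tripled.
naive = conn.execute(
    """
    SELECT u.usage_date, SUM(u.cost)
    FROM usage u JOIN jobs j ON u.job_id = j.job_id
    GROUP BY u.usage_date ORDER BY u.usage_date
    """
).fetchall()

# Fix: keep only the latest metadata row per job before joining.
deduped = conn.execute(
    """
    WITH latest_jobs AS (
        SELECT job_id, name FROM (
            SELECT job_id, name,
                   ROW_NUMBER() OVER (PARTITION BY job_id ORDER BY change_time DESC) AS rn
            FROM jobs
        ) AS ranked
        WHERE rn = 1
    )
    SELECT u.usage_date, j.name, SUM(u.cost)
    FROM usage u JOIN latest_jobs j ON u.job_id = j.job_id
    GROUP BY u.usage_date, j.name ORDER BY u.usage_date
    """
).fetchall()

print(naive)    # each day's cost inflated 3x
print(deduped)  # one row per day at the true cost
```

A quick way to spot the same problem in a real query is to compare `SUM(cost)` on the usage table alone with the total after each join; any jump points at the join that fans out.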

by lmorrissey (New Contributor II) • 4021 Views • 1 reply • 0 kudos

GC Allocation Failure

There are a couple of related posts here and here. I'm seeing a similar issue with a long-running job: processes are in a "RUNNING" state and the cluster is active, but the stdout log shows the dreaded GC Allocation Failure. Env: I've set the following on the config:...

Latest Reply
mark_ott (Databricks Employee) • 0 kudos

A persistent "GC Allocation Failure" in Spark jobs, where processes are stuck in the RUNNING state even after attempts to clear cache and enforce GC, typically indicates ongoing memory pressure, possible data skew, or excessive memory use on the driv...

by itt (New Contributor II) • 4196 Views • 3 replies • 0 kudos

Graceful shutdown - stopping stream at the end of microbatch

I'm trying to create a system where I let Spark finish the current microbatch and tell it to stop after that one. The reason is that I don't want to recalculate a microbatch after "forcefully" stopping a stream. Is there a way Spark/Databricks...

Latest Reply
mark_ott (Databricks Employee) • 0 kudos

There is no built-in Spark or Databricks method to gracefully stop a Structured Streaming query specifically at the end of the current microbatch, but several community and expert discussions propose common strategies to achieve this: Official and Co...
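A strategy that comes up repeatedly is to signal the stream from outside (a flag, marker file, or control table) and act on that signal only at a microbatch boundary. The sketch below models that idea in plain Python with `threading.Event`; it does not use the Spark API, and `run_microbatches` and the batch function are hypothetical stand-ins. Wiring the same check into a real Structured Streaming job (for example, in a driver-side loop that inspects the query before calling `stop()`) is an assumption about your setup, not a documented Databricks feature.

```python
import threading
import time
from typing import Callable, List


def run_microbatches(
    batches: List[str],
    process_batch: Callable[[str], None],
    stop_requested: threading.Event,
) -> List[str]:
    """Run batches in order, checking for a stop request only between batches."""
    completed: List[str] = []
    for batch in batches:
        if stop_requested.is_set():
            break  # stop at a clean boundary; the next batch is never started
        process_batch(batch)  # a running batch is never interrupted
        completed.append(batch)
    return completed


if __name__ == "__main__":
    stop = threading.Event()  # thread-safe flag that another thread could also set

    def simulated_batch(batch: str) -> None:
        time.sleep(0.05)  # stand-in for real work
        if batch == "b2":
            stop.set()  # the stop request arrives while b2 is still running

    print(run_microbatches(["b1", "b2", "b3", "b4"], simulated_batch, stop))
```

In this run b2 still finishes and b3 is never started, which is the behavior the question asks for: no half-processed batch has to be recomputed on restart.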

by Austin1 (New Contributor) • 3832 Views • 1 reply • 0 kudos

VSCode Integration for Data Science Analysts

I'm probably not posting this in the right forum, but I can't find a good fit. This is a bit convoluted because we make things hard at work. I have access to a single LLM via VSCode (Amazon Q). Since I can't use that within Databricks but I want my team to...

Latest Reply
mark_ott (Databricks Employee) • 0 kudos

It’s a smart move to raise this question before investing lots of time—because with the Databricks VSCode extension, there are indeed specific limitations when it comes to accessing shared workspace folders that weren't originally created by the exte...

by thomas-totter (New Contributor III) • 1153 Views • 5 replies • 4 kudos

NativeADLGen2RequestComparisonHandler: Error in request comparison (when running DLT)

For at least two weeks (but probably even longer), our DLT pipeline has been posting error messages to log4j (driver logs) like the one below. I tried with both channels (preview, current), switched between serverless and classic compute and started the pipeli...

Latest Reply
mark_ott (Databricks Employee) • 4 kudos

The error message you are observing in your DLT pipeline logs, specifically `java.lang.NumberFormatException: For input string: "Fri, 29 Aug 2025 09:02:07 GMT"`, suggests that something in your pipeline (likely library or code respo...
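The value in that message is an HTTP-style (RFC 1123) date, so any code that hands it to a plain numeric parser fails; "For input string" is the wording Java's integer parsers use for this. As an illustration only, the Python sketch below shows the same contrast: a numeric parse of the string fails, while the standard library's `email.utils.parsedate_to_datetime` reads it as a timezone-aware date that can then be turned into epoch milliseconds. Which component in the DLT stack performs the numeric parse is not known from the thread, and `http_date_to_epoch_millis` is a hypothetical helper.

```python
from email.utils import parsedate_to_datetime

RAW = "Fri, 29 Aug 2025 09:02:07 GMT"  # value taken from the error message


def http_date_to_epoch_millis(value: str) -> int:
    """Parse an RFC 1123 / HTTP date string and return epoch milliseconds (UTC)."""
    return int(parsedate_to_datetime(value).timestamp() * 1000)


if __name__ == "__main__":
    # Treating the value as a number fails, much like the Java NumberFormatException.
    try:
        int(RAW)
    except ValueError as exc:
        print("numeric parse failed:", exc)

    print(parsedate_to_datetime(RAW).isoformat())  # 2025-08-29T09:02:07+00:00
    print(http_date_to_epoch_millis(RAW))
```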

by chinmay0924 (New Contributor III) • 868 Views • 4 replies • 1 kudos

mapInPandas not working in serverless compute

Using `mapInPandas` in serverless compute (Environment version 2) gives the following error: `Py4JError: An error occurred while calling o543.mapInPandas. Trace: py4j.Py4JException: Method mapInPandas([class org.apache.spark.sql.catalyst.expressions...`

Latest Reply
mark_ott (Databricks Employee) • 1 kudos

The error you are seeing when using mapInPandas in serverless compute with Environment version 2 is due to an incompatibility in the environment's supported Spark features. Specifically, Environment version 2 on serverless compute does not support ma...

by ChsAIkrishna (Contributor) • 4251 Views • 2 replies • 1 kudos

VNet gateway issues on Power BI connection

Team, we are getting frequent VNet gateway failures on a Power BI dataset using DAX (simple DAX, not complex), and on rerun it works. Is there any permanent fix for this? Error: {"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error...

Latest Reply
mark_ott (Databricks Employee) • 1 kudos

Frequent VNet gateway errors in Power BI related to “DM_GWPipeline_Gateway_MashupDataAccessError” and memory allocation issues often stem from resource limits, configuration problems, or inefficient modeling—even with simple DAX. No single “permanent...

by swapnilmd (New Contributor II) • 3852 Views • 2 replies • 0 kudos

How to handle "Error parsing WKT: Invalid coordinate value '180' found at position"

DBR version: 16.2. Spark config: `spark.databricks.geo.st.enabled true`. SQL query I am running: `%sql WITH points ( SELECT st_astext(st_point(30D, 10D)) AS point_geom UNION SELECT st_astext(st_point(10D, 90D)) AS point_geom UNION SELECT st_astext(st_point(4...`

Latest Reply
mark_ott (Databricks Employee) • 0 kudos

The error occurs because Databricks (based on GEOS/OGC standards) expects coordinates in Well-Known Text (WKT) that fall into valid ranges: Longitude (X, or first coordinate): $-180 \le X \le 180$; Latitude (Y, or second coordinate): $-90 \le Y \le 90$...
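A minimal pre-check can flag offending points before they reach the geospatial functions. This sketch assumes the ranges listed in the reply (inclusive bounds, longitude X first, latitude Y second); `check_lon_lat` is a hypothetical helper and does not reproduce the exact validation Databricks performs.

```python
from typing import Iterable, List, Tuple

LON_RANGE = (-180.0, 180.0)  # X, first coordinate
LAT_RANGE = (-90.0, 90.0)    # Y, second coordinate


def check_lon_lat(points: Iterable[Tuple[float, float]]) -> List[str]:
    """Return a message for every (x, y) pair that falls outside the valid ranges."""
    problems: List[str] = []
    for x, y in points:
        if not LON_RANGE[0] <= x <= LON_RANGE[1]:
            problems.append(f"longitude {x} out of range in POINT({x} {y})")
        if not LAT_RANGE[0] <= y <= LAT_RANGE[1]:
            problems.append(f"latitude {y} out of range in POINT({x} {y})")
    return problems


if __name__ == "__main__":
    # The first two points come from the question; the third is a hypothetical
    # example with longitude and latitude swapped.
    pts = [(30.0, 10.0), (10.0, 90.0), (90.0, 180.0)]
    for msg in check_lon_lat(pts):
        print(msg)
```

A common source of such errors is passing (latitude, longitude) where (longitude, latitude) is expected, so checking the argument order is a cheap first step.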

by mkEngineer (New Contributor III) • 5121 Views • 3 replies • 0 kudos

How to Version & Deploy Databricks Workflows with Azure DevOps (CI/CD)?

Hi everyone, I’m trying to set up versioning and CI/CD for my Databricks workflows using Azure DevOps and Git. While I’ve successfully versioned notebooks in a Git repo, I’m struggling with handling workflows (which define orchestration, dependencies,...

Latest Reply
mkEngineer (New Contributor III) • 0 kudos

My current approach is to manually copy/paste YAMLs across workspaces and version them with Git/Azure DevOps by saving them as DBFS files. The CD process is then handled using Databricks DBFS File Deployment by Data Thirst Ltd. While this ...

by KurtWang (New Contributor) • 3683 Views • 1 reply • 0 kudos

UCX error with `databricks labs ucx create-table-mapping`

Hi, I am using UCX for Unity Catalog migration and have reached the table migration step. When I run the command `databricks labs ucx create-table-mapping`, it returns the error message 'ERROR [src/databricks/labs/ucx.create-table-mapping] ValueError: Pleas...

Latest Reply
mark_ott (Databricks Employee) • 0 kudos

You are seeing two related errors during table migration with Databricks Labs UCX for Unity Catalog: for create-table-mapping, “ValueError: Please run as account-admin: databricks labs ucx sync-workspace-info”; for sync-workspace-info, “Error: entrypo...

