Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Suheb
by New Contributor
  • 18 Views
  • 1 reply
  • 0 kudos

How have you set up a governance structure (data access control, workspace management, cluster polic

If your company uses Databricks with many people, how do you manage security, organize teams, and control costs — and what tools do you use to make it all work smoothly?

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Please take a look here to get some initial ideas. https://medium.com/databricks-unity-catalog-sme/a-practical-guide-to-catalog-layout-data-sharing-and-distribution-with-databricks-unity-catalog-763e4c7b7351  

  • 0 kudos
him
by New Contributor III
  • 24990 Views
  • 14 replies
  • 10 kudos

I am getting the below error while making a GET request to a job in Databricks after successfully running it

{"error_code": "INVALID_PARAMETER_VALUE", "message": "Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}

Latest Reply
Octavian1
Contributor
  • 10 kudos

Hi @Debayan, I'd suggest also mentioning this explicitly in the documentation of the workspace client for get_run_output. One has to pay extra attention to the example run_id=run.tasks[0].run_id, otherwise it can be easily missed.
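
A minimal sketch of the per-task pattern described above, assuming a client object shaped like the Databricks SDK for Python `WorkspaceClient` (with `jobs.get_run` and `jobs.get_run_output`). The helper name `collect_task_outputs` is hypothetical and is not part of the SDK.

```python
from typing import Any, Dict


def collect_task_outputs(client: Any, parent_run_id: int) -> Dict[str, Any]:
    """Fetch the output of every task run under a multi-task job run.

    `client` is assumed to expose `jobs.get_run` and `jobs.get_run_output`,
    as the Databricks SDK WorkspaceClient does.
    """
    run = client.jobs.get_run(run_id=parent_run_id)
    outputs: Dict[str, Any] = {}
    # For multi-task jobs the parent run_id is rejected by get_run_output;
    # each task carries its own run_id, which is what must be passed instead.
    for task in run.tasks or []:
        outputs[task.task_key] = client.jobs.get_run_output(run_id=task.run_id)
    return outputs
```

With the SDK installed and authentication configured, this could be called as `collect_task_outputs(WorkspaceClient(), run_id)`. The result is keyed by `task_key`, so a specific task's output can be picked without relying on its position in `run.tasks`.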

  • 10 kudos
13 More Replies
toproximahk
by New Contributor
  • 116 Views
  • 3 replies
  • 0 kudos

Inquiry on GraphFrame Library Upgrade Timeline for Databricks Runtime for Machine Learning

Thanks to the Databricks community for maintaining such a valuable platform. I would like to inquire if there is a planned timeline for upgrading the GraphFrame library. We’ve noticed that the latest release on GitHub is v0.9.3, while the Databricks ...

Latest Reply
Sem-Sinchenko
  • 0 kudos

You can try adding the Maven dependency to your cluster manually ... For example, for Spark 3.5.x it will be like: io.graphframes:graphframes-spark3_2.12:0.10.0 and add a PyPI dependency graphframes-py. Adding Maven coordinates should download and install al...

  • 0 kudos
2 More Replies
alhuelamo
by New Contributor II
  • 10293 Views
  • 5 replies
  • 1 kudos

Getting non-traceable NullPointerExceptions

We're running a job that's issuing NullPointerException without traces of our job's code. Does anybody know what would be the best course of action when it comes to debugging these issues? The job is a Scala job running on DBR 11.3 LTS. In case it's rel...

Latest Reply
Amora
Visitor
  • 1 kudos

You could try enabling full stack traces and checking the Spark executor logs for hidden errors. NullPointerExceptions in Scala on DBR often come from lazy evaluations or missing schema fields during I/O. Reviewing your DataFrame transformations a...

  • 1 kudos
4 More Replies
Phani1
by Valued Contributor II
  • 4590 Views
  • 4 replies
  • 2 kudos

Convert EBCDIC (Binary) file format to ASCII

Hi Team, how can we convert EBCDIC (binary) file format to ASCII in Databricks? Are there any libraries for this in Databricks?
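
For plain character data, one possible starting point is Python's built-in `cp037` codec (EBCDIC US/Canada). This is only a sketch under assumptions: the code page of the actual file is not stated in the question, the function names are hypothetical, and mainframe records that contain binary or packed-decimal fields need a copybook-aware parser rather than a byte-for-byte decode.

```python
def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC-encoded character data into a Python string."""
    return raw.decode(codepage)


def ebcdic_to_ascii_bytes(raw: bytes, codepage: str = "cp037") -> bytes:
    """Re-encode EBCDIC character data as ASCII bytes.

    Characters that have no ASCII equivalent are replaced with '?'.
    """
    return ebcdic_to_text(raw, codepage).encode("ascii", errors="replace")


# "HELLO" encoded in code page 037
sample = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
print(ebcdic_to_ascii_bytes(sample))  # prints b'HELLO'
```

Other code pages shipped with Python (for example `cp500` or `cp1140`) can be passed as `codepage` if the source system uses a different EBCDIC variant.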

Latest Reply
amulight
Visitor
  • 2 kudos

Hi Phani1, were you able to do that successfully? Can you share the details and steps, please? Thanks.

  • 2 kudos
3 More Replies
67
by Visitor
  • 36 Views
  • 1 reply
  • 1 kudos

Simple integration to push data from third-party into a client's Databricks instance

Hi there, we have an industry data platform with multiple customers using it. We provide each customer with their own data every night via .csv. Some of our customers use Databricks, and import their data from us into it. We would like to offer a more...

Latest Reply
jeffreyaven
Databricks Employee
  • 1 kudos

You could use external volumes with a Cloudflare R2 bucket as an intermediary - you write the nightly data files to R2 (using S3-compatible API), and your customers create external volumes in their Databricks workspace pointing to their designated R2...

  • 1 kudos
bidek56
by New Contributor III
  • 38 Views
  • 0 replies
  • 0 kudos

Location of spark.scheduler.allocation.file

In DBR 16.4 LTS, I am trying to add the following Spark config:
spark.scheduler.allocation.file: file:/Workspace/init/fairscheduler.xml
But the all-purpose cluster is throwing this error: Spark error: Driver down cause: com.databricks.backend.daemon.dri...

Dhruv-22
by Contributor
  • 16 Views
  • 0 replies
  • 0 kudos

Reading empty json file in serverless gives error

I have a pipeline which puts JSON files in a storage location after reading a daily delta load. Today I encountered a case where the file was empty. I tried running the notebook manually using a serverless cluster (Environment version 4) and encountered...
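
One defensive option for the empty-file case is to filter out zero-byte files before handing paths to the reader. The sketch below assumes the files are reachable through a local-style path (for example a volume mount), which the post does not confirm; the helper name `non_empty_files` is hypothetical, and because the error message above is truncated, this may not address the exact failure.

```python
from pathlib import Path
from typing import Iterable, List


def non_empty_files(paths: Iterable[str]) -> List[str]:
    """Return only the paths that point at existing files larger than zero bytes."""
    kept = []
    for p in paths:
        path = Path(p)
        # Skip missing paths, directories, and zero-byte files.
        if path.is_file() and path.stat().st_size > 0:
            kept.append(str(path))
    return kept
```

The filtered list can then be passed to the JSON reader, and the read skipped entirely when the list comes back empty.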

GiriSreerangam
by New Contributor III
  • 72 Views
  • 2 replies
  • 1 kudos

Resolved! org.apache.spark.SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC]

Hi everyone, I am writing a small function that reads a CSV with Spark and writes into a table. I can execute this function within the notebook. But when I register the same function as a Unity Catalog function and call it from Playground, i...

Latest Reply
KaushalVachhani
Databricks Employee
  • 1 kudos

Hi @GiriSreerangam, You cannot use a Unity Catalog user-defined function (UDF) in Databricks to perform Spark read from a CSV and write to a table. Unity Catalog Python UDFs execute in a secure, isolated environment without access to the file system ...

  • 1 kudos
1 More Replies
a_user12
by New Contributor III
  • 37 Views
  • 0 replies
  • 0 kudos

Drop Delta Log seems not to be working

I have a Delta table where I set the following property: logRetentionDuration: "interval 1 days". I was doing some table operations and see files in the _delta_log folder such as 00000000000000000000.json 00000000000000000001.json 00000000000000000002.js...

dheeraj98
by New Contributor
  • 47 Views
  • 1 reply
  • 2 kudos

dbt Cloud + Databricks SQL Warehouse with microbatching (48h lookback) — intermittent failures

Hey everyone, I’m currently running an hourly dbt Cloud job (27 models with 8 threads) on a Databricks SQL Warehouse using the dbt microbatch approach, with a 48-hour lookback window. But I’m running into some recurring issues:
  • Jobs failing intermittently
  • O...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 2 kudos

Here are a few options you can try to see if they resolve your issue.
1. SQL Warehouse Tuning
  • Use a Serverless SQL Warehouse with Photon for faster spin-up and query execution. [docs.getdbt.com]
  • Size appropriately: Start with Medium or Large, and enable au...

  • 2 kudos
Oumeima
by New Contributor II
  • 1876 Views
  • 2 replies
  • 2 kudos

Resolved! I can't use my own .whl package in Databricks app with databricks asset bundles

I am building a Databricks app using Databricks Asset Bundles. I need to use a helpers package that I built as an artifact and use in other resources outside the app. The only way to use it is to have the built package inside the app source code f...

Latest Reply
stbjelcevic
Databricks Employee
  • 2 kudos

Hi @Oumeima , One potential way around this is to upload the wheel file into a Unity Catalog volume or workspace file. For the volume route, reference it directly in your app’s requirements.txt using an absolute /Volumes/<catalog>/<schema>/<volume>/....

  • 2 kudos
1 More Replies
tt_921
by New Contributor
  • 60 Views
  • 2 replies
  • 0 kudos

Databricks CLI binding storage credential to a workspace

In the documentation from Databricks it says to run the below for binding a storage credential to a workspace (after already completing step 1 to update the `isolation-mode` to be `ISOLATED`): databricks workspace-bindings update-bindings storage-cre...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

This appears to be a documentation inconsistency. The CLI implementation seems to:   1. Require binding_type to be explicitly specified (contradicting the docs)   2. Require it to be placed within each workspace object, not as a top-level parameter  ...
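
To illustrate the second point, here is a rough sketch of a JSON body with `binding_type` set inside each workspace entry. The field names `add` and `workspace_id` and the value `BINDING_TYPE_READ_WRITE` are assumptions based on the Unity Catalog workspace-bindings API and should be checked against your CLI version; the helper name `build_bindings_payload` is hypothetical.

```python
import json
from typing import Iterable


def build_bindings_payload(
    workspace_ids: Iterable[int],
    binding_type: str = "BINDING_TYPE_READ_WRITE",  # assumed enum value
) -> str:
    """Build a JSON body with binding_type placed inside each workspace entry."""
    entries = [
        {"workspace_id": workspace_id, "binding_type": binding_type}
        for workspace_id in workspace_ids
    ]
    # binding_type is deliberately NOT a top-level key here.
    return json.dumps({"add": entries})
```

The resulting string is the kind of value that would be supplied as the request body for the update-bindings call.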

  • 0 kudos
1 More Replies
