Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

KKo
by Contributor III
  • 312 Views
  • 1 reply
  • 0 kudos

On-prem MS SQL to Azure Databricks

Hi all, I need to ingest data from on-prem MS SQL tables into the Azure cloud using Databricks. For the ingest, I previously used notebooks and JDBC connectors to read the SQL tables and write them into Unity Catalog tables. Now I want to experiment with Databricks connectors f...
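For reference, a minimal sketch of the JDBC pattern the poster describes, assuming a Databricks notebook where `spark` and `dbutils` are predefined; host, secret scope, and table names are placeholders:

```python
# JDBC read from on-prem SQL Server into a DataFrame (placeholder values).
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<on-prem-host>:1433;databaseName=<db>")
    .option("dbtable", "dbo.source_table")
    .option("user", dbutils.secrets.get("sql-scope", "sql-user"))
    .option("password", dbutils.secrets.get("sql-scope", "sql-password"))
    .load()
)

# Land the result in a Unity Catalog table.
df.write.mode("overwrite").saveAsTable("main.bronze.source_table")
```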

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

This feature is good to go... I can't think of any disadvantages. Here is a guide: https://landang.ca/2025/01/31/simple-data-ingestion-from-sql-server-to-databricks-using-lakeflow-connect/

Suheb
by New Contributor
  • 18 Views
  • 1 reply
  • 0 kudos

How have you set up a governance structure (data access control, workspace management, cluster policies)?

If your company uses Databricks with many people, how do you manage security, organize teams, and control costs — and what tools do you use to make it all work smoothly?
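As one concrete lever among those the question lists, here is a hedged sketch of creating a cluster policy with the Databricks Python SDK; the policy name and rules are illustrative, not a recommendation:

```python
import json

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

policy = {
    # cap auto-termination so idle clusters stop burning budget
    "autotermination_minutes": {"type": "range", "maxValue": 60},
    # restrict node types to a known-cost instance family (example value)
    "node_type_id": {"type": "allowlist", "values": ["Standard_DS3_v2"]},
}

w.cluster_policies.create(name="team-cost-control", definition=json.dumps(policy))
```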

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Please take a look here to get some initial ideas. https://medium.com/databricks-unity-catalog-sme/a-practical-guide-to-catalog-layout-data-sharing-and-distribution-with-databricks-unity-catalog-763e4c7b7351  

him
by New Contributor III
  • 24991 Views
  • 14 replies
  • 10 kudos

I am getting the below error when making a GET request to a job in Databricks after successfully running it

"error_code": "INVALID_PARAMETER_VALUE",  "message": "Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}

Latest Reply
Octavian1
Contributor
  • 10 kudos

Hi @Debayan, I'd suggest also mentioning this explicitly in the documentation of the workspace client for get_run_output. One has to pay extra attention to the example run_id=run.tasks[0].run_id; otherwise it can easily be missed.
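A sketch of the pattern this reply points at, using the Databricks Python SDK: a multi-task run has no single output, so fetch each task's output by its own run_id (the job run_id below is a placeholder):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

run = w.jobs.get_run(run_id=123456789)
for task in run.tasks:
    # note: pass task.run_id, not the parent job run's run_id
    output = w.jobs.get_run_output(run_id=task.run_id)
    print(task.task_key, output.notebook_output)
```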

13 More Replies
toproximahk
by New Contributor
  • 118 Views
  • 3 replies
  • 0 kudos

Inquiry on GraphFrame Library Upgrade Timeline for Databricks Runtime for Machine Learning

Thanks to the Databricks community for maintaining such a valuable platform. I would like to inquire whether there is a planned timeline for upgrading the GraphFrame library. We've noticed that the latest release on GitHub is v0.9.3, while the Databricks ...

Latest Reply
Sem-Sinchenko
  • 0 kudos

You can try adding the Maven dependency to your cluster manually... For example, for Spark 3.5.x it would be io.graphframes:graphframes-spark3_2.12:0.10.0, plus the PyPI dependency graphframes-py. Adding the Maven coordinates should download and install al...
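Assuming the Maven coordinate and the graphframes-py package from this reply are installed on the cluster, a quick smoke test in a notebook (where `spark` is predefined) might look like this:

```python
from graphframes import GraphFrame

vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
edges = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()  # trivial query to confirm the JVM library is wired up
```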

2 More Replies
alhuelamo
by New Contributor II
  • 10299 Views
  • 5 replies
  • 1 kudos

Getting non-traceable NullPointerExceptions

We're running a job that's issuing NullPointerException without traces of our job's code. Does anybody know what would be the best course of action when it comes to debugging these issues? The job is a Scala job running on DBR 11.3 LTS. In case it's rel...

Latest Reply
Amora
Visitor
  • 1 kudos

You could try enabling full stack traces and checking the Spark executor logs for hidden errors. NullPointerExceptions in Scala on DBR often come from lazy evaluation or missing schema fields during I/O. Reviewing your DataFrame transformations a...
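A hedged illustration of the schema point above (the poster's job is Scala; Python is used here only for brevity): pinning an explicit schema makes missing fields surface as nulls you can inspect, rather than blowing up later inside a transformation. The schema and path are examples:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import LongType, StringType, StructField, StructType

schema = StructType([
    StructField("id", LongType(), nullable=False),
    StructField("payload", StringType(), nullable=True),
])

df = spark.read.schema(schema).json("/mnt/data/input/")  # placeholder path
# Count rows whose missing field could trigger an NPE downstream.
print(df.filter(F.col("payload").isNull()).count())
```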

4 More Replies
Phani1
by Valued Contributor II
  • 4598 Views
  • 4 replies
  • 2 kudos

Convert EBCDIC (Binary) file format to ASCII

Hi team, how can we convert the EBCDIC (binary) file format to ASCII in Databricks? Do we have any libraries in Databricks?
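A hedged starting point: Python's built-in codecs cover the common EBCDIC code pages (cp037 for US EBCDIC, cp500 for international), and for COBOL copybook layouts the open-source Cobrix Spark library is the usual heavier option. The path below is a placeholder:

```python
# binaryFiles yields (path, bytes) pairs; decode each payload from EBCDIC.
raw = spark.sparkContext.binaryFiles("/mnt/raw/ebcdic/*.dat")

decoded = raw.mapValues(lambda payload: payload.decode("cp037"))
for path, text in decoded.take(1):
    print(path, text[:100])
```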

Latest Reply
amulight
New Contributor
  • 2 kudos

Hi Phani1, were you able to do that successfully? Can you share the details and steps, please? Thanks.

3 More Replies
67
by New Contributor
  • 41 Views
  • 1 reply
  • 1 kudos

Simple integration to push data from a third party into a client's Databricks instance

Hi there, we have an industry data platform with multiple customers using it. We provide each customer with their own data every night via .csv. Some of our customers use Databricks and import their data from us into it. We would like to offer a more...

Latest Reply
jeffreyaven
Databricks Employee
  • 1 kudos

You could use external volumes with a Cloudflare R2 bucket as an intermediary - you write the nightly data files to R2 (using S3-compatible API), and your customers create external volumes in their Databricks workspace pointing to their designated R2...
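A sketch of the provider side of this suggestion: pushing the nightly file to a Cloudflare R2 bucket over its S3-compatible API. Endpoint, credentials, bucket, and key names are placeholders:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://<account-id>.r2.cloudflarestorage.com",
    aws_access_key_id="<r2-access-key-id>",
    aws_secret_access_key="<r2-secret-access-key>",
)

s3.upload_file("customer_123.csv", "nightly-drops", "customer_123/2025-01-01.csv")

# The customer's side would then read via their external volume, e.g.:
# spark.read.csv("/Volumes/<catalog>/<schema>/<volume>/customer_123/")
```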

bidek56
by New Contributor III
  • 43 Views
  • 0 replies
  • 0 kudos

Location of spark.scheduler.allocation.file

In DBR 16.4 LTS, I am trying to add the following Spark config: spark.scheduler.allocation.file: file:/Workspace/init/fairscheduler.xml. But the all-purpose cluster is throwing this error: Spark error: Driver down cause: com.databricks.backend.daemon.dri...
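An untested workaround sketch: materialize the pools file onto driver-local storage (for example from an init script) and point the scheduler at that local path rather than a /Workspace URI. Pool names and weights here are examples only:

```python
fairscheduler_xml = """<?xml version="1.0"?>
<allocations>
  <pool name="etl">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>1</minShare>
  </pool>
</allocations>
"""

# /local_disk0 is driver-local storage on Databricks clusters.
with open("/local_disk0/fairscheduler.xml", "w") as f:
    f.write(fairscheduler_xml)

# Corresponding cluster Spark config:
#   spark.scheduler.mode FAIR
#   spark.scheduler.allocation.file /local_disk0/fairscheduler.xml
```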

Dhruv-22
by Contributor
  • 16 Views
  • 0 replies
  • 0 kudos

Reading an empty JSON file in serverless gives an error

I have a pipeline that puts JSON files in a storage location after reading a daily delta load. Today I encountered a case where the file was empty. I tried running the notebook manually using a serverless cluster (environment version 4) and encountered...
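A hedged sketch: supplying an explicit schema skips schema inference, which is what typically fails on an empty JSON file. The schema and path below are examples:

```python
from pyspark.sql.types import StringType, StructField, StructType

schema = StructType([
    StructField("id", StringType()),
    StructField("value", StringType()),
])

df = spark.read.schema(schema).json("/Volumes/main/raw/daily/")  # placeholder
print(df.count())  # an empty file yields zero rows instead of raising
```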

GiriSreerangam
by New Contributor III
  • 79 Views
  • 2 replies
  • 1 kudos

Resolved! org.apache.spark.SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC]

Hi everyone, I am writing a small function with a Spark read from a CSV and a Spark write into a table. I could execute this function within the notebook. But when I register the same function as a Unity Catalog function and call it from Playground, i...

Latest Reply
KaushalVachhani
Databricks Employee
  • 1 kudos

Hi @GiriSreerangam, you cannot use a Unity Catalog user-defined function (UDF) in Databricks to perform a Spark read from a CSV and write to a table. Unity Catalog Python UDFs execute in a secure, isolated environment without access to the file system ...
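An illustration of the boundary this reply describes: a Unity Catalog Python UDF can only run self-contained logic, with no Spark session, file system, or table I/O. Catalog and schema names are placeholders:

```python
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.clean_name(s STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  return s.strip().title() if s else None
$$
""")
# The CSV-read-and-write logic itself belongs in a notebook or job task.
```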

1 More Replies
a_user12
by New Contributor III
  • 45 Views
  • 0 replies
  • 0 kudos

Dropping old Delta log files seems not to be working

I have a Delta table where I set the following property: logRetentionDuration: "interval 1 days". I was doing some table operations and see in the _delta_log folder files such as 00000000000000000000.json, 00000000000000000001.json, 00000000000000000002.js...
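A hedged note on why this can look broken: the table property is normally spelled with the delta. prefix, and old _delta_log JSON commits are only removed when a checkpoint is written (every 10 commits by default), so files can legitimately linger past the retention interval. The table name below is a placeholder:

```python
spark.sql("""
  ALTER TABLE main.default.my_table
  SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 1 days')
""")
```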

dheeraj98
by New Contributor
  • 48 Views
  • 1 reply
  • 2 kudos

dbt Cloud + Databricks SQL Warehouse with microbatching (48h lookback) — intermittent failures

Hey everyone, I'm currently running an hourly dbt Cloud job (27 models with 8 threads) on a Databricks SQL Warehouse using the dbt microbatch approach, with a 48-hour lookback window. But I'm running into some recurring issues: jobs failing intermittently; O...

Latest Reply
nayan_wylde
Esteemed Contributor
  • 2 kudos

Here are a few options you can try to see if they resolve your issue. 1. SQL Warehouse tuning: use a Serverless SQL Warehouse with Photon for faster spin-up and query execution [docs.getdbt.com]. Size appropriately: start with Medium or Large, and enable au...

Oumeima
by New Contributor II
  • 1876 Views
  • 2 replies
  • 2 kudos

Resolved! I can't use my own .whl package in a Databricks app with Databricks Asset Bundles

I am building a Databricks app using Databricks Asset Bundles. I need to use a helpers package that I built as an artifact and use in other resources outside the app. The only way to use it is to have the built package inside the app source code f...

Latest Reply
stbjelcevic
Databricks Employee
  • 2 kudos

Hi @Oumeima, one potential way around this is to upload the wheel file into a Unity Catalog volume or workspace file. For the volume route, reference it directly in your app's requirements.txt using an absolute /Volumes/<catalog>/<schema>/<volume>/...
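A sketch of the volume route from this reply, using the Databricks Python SDK; catalog, schema, volume, and wheel file names are placeholders:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Upload the built wheel into a Unity Catalog volume.
with open("dist/helpers-0.1.0-py3-none-any.whl", "rb") as f:
    w.files.upload(
        "/Volumes/main/default/wheels/helpers-0.1.0-py3-none-any.whl",
        f,
        overwrite=True,
    )

# Then requirements.txt can point at the absolute path:
# /Volumes/main/default/wheels/helpers-0.1.0-py3-none-any.whl
```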

1 More Replies
