We want to use Databricks Asset Bundles and deploy code changes and tests using GitHub Actions. We have seen lots of content online, but nothing concrete on how this is done at scale. So I'm wondering, if we have many changes and therefore man...
Today, while reading a Delta load, my notebook failed and I wanted to report a bug. The withColumns command does not tolerate an empty dictionary and gives the following error in PySpark: flat_tuple = namedtuple("flat_tuple", ["old_col", "new_col", "lo...
Hello @Dhruv-22,
I have tested this internally, and this seems to be a bug with the new serverless environment version 4.
As a solution, you can try switching the version to 3 as shown below and re-running the above code; it should then work.
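Independent of the environment version fix, the call site can also guard against the empty dictionary, since withColumns only needs to run when there is something to add or rename. A minimal sketch (the rename_map name is illustrative; in the original code it would be built from the flat_tuple entries):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(3).withColumnRenamed("id", "old_col")

# Hypothetical mapping of new column names to expressions; it may legitimately
# end up empty when there is nothing to rename.
rename_map = {}  # e.g. {"new_col": F.col("old_col").cast("string")}

# Some serverless environment versions fail on withColumns({}), so only call it
# when the dictionary is non-empty.
if rename_map:
    df = df.withColumns(rename_map)
```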
Hello, we are trying to evaluate a Databricks solution to extract the data from an existing Cloudera schema hosted on a physical server. We are using the Databricks serverless compute provided by the Databricks Express setup, and we assume we will not need t...
I work for a Databricks partner called Cirata. Our Data Migrator offering allows both data and metadata replication from Cloudera to be delivered to the Databricks environment, whether this is just delivering it to ADLS Gen2 object storage or to ...
I'm playing a little bit with the Databricks free environment and I'm super confused by the documentation vs. actual behavior. Maybe you could help me understand it better. For the workspace I can define a base environment which I can use in serverless ...
Hello @pepco,
Is it possible to use environments with notebook tasks?
Yes, but only in a very specific way.
Notebook tasks can use base environments, but you don’t attach them in the job’s YAML. You pick the base env in the notebook’s Environment sid...
Hi all, I need some help with this masking problem. If you create a view that uses a masking function based on a table, the user reading this view has to have read access to the underlying table. So theoretically, they can access unmasked data in the table. I would...
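One direction worth evaluating here (a hedged sketch, not a confirmed fix for your setup) is to attach the masking function to the table itself as a Unity Catalog column mask, so readers query the table directly and get masked values without needing access to an unmasked object; all names below are placeholders:

```python
# Assumes a Databricks notebook where `spark` is available and the user has
# privileges to create functions and alter the table. Names are placeholders.
spark.sql("""
CREATE OR REPLACE FUNCTION main.security.mask_ssn(ssn STRING)
RETURN CASE
  WHEN is_account_group_member('pii_readers') THEN ssn
  ELSE '***-**-****'
END
""")

# Attach the mask to the column; anyone outside pii_readers now sees the
# masked value even when reading the table directly.
spark.sql("""
ALTER TABLE main.hr.employees
ALTER COLUMN ssn SET MASK main.security.mask_ssn
""")
```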
Hi all, I need to ingest data from on-prem MS SQL tables into Azure using Databricks. For the ingest, I previously used notebooks and JDBC connectors to read the SQL tables and write to Unity Catalog tables. Now, I want to experiment with Databricks connectors f...
This feature is good to go... I can't think of any disadvantages. Here is a guide:
https://landang.ca/2025/01/31/simple-data-ingestion-from-sql-server-to-databricks-using-lakeflow-connect/
If your company uses Databricks with many people, how do you manage security, organize teams, and control costs — and what tools do you use to make it all work smoothly?
Please take a look here to get some initial ideas.
https://medium.com/databricks-unity-catalog-sme/a-practical-guide-to-catalog-layout-data-sharing-and-distribution-with-databricks-unity-catalog-763e4c7b7351
"error_code": "INVALID_PARAMETER_VALUE", "message": "Retrieving the output of runs with multiple tasks is not supported. Please retrieve the output of each individual task run instead."}
Hi @Debayan, I'd suggest also mentioning this explicitly in the documentation of the workspace client for get_run_output. One has to pay extra attention to the example run_id=run.tasks[0].run_id, otherwise it can easily be missed.
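For anyone landing here later, a minimal sketch of that pattern with the Databricks Python SDK (the run_id value is a placeholder):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# For a multi-task job run, get_run_output() on the parent run_id raises
# INVALID_PARAMETER_VALUE; fetch the output of each individual task run instead.
run = w.jobs.get_run(run_id=1234567890)  # placeholder parent run_id

for task in run.tasks:
    output = w.jobs.get_run_output(run_id=task.run_id)
    print(task.task_key, output.notebook_output)
```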
Hey Community, I'm facing this error. It says "com.databricks.pipelines.common.errors.deployment.DeploymentException: Communication lost with driver. Cluster 1030-205818-yu28ft9s was not reachable for 120 seconds". This issue occurred in producti...
Thanks to the Databricks community for maintaining such a valuable platform. I would like to inquire whether there is a planned timeline for upgrading the GraphFrames library. We've noticed that the latest release on GitHub is v0.9.3, while the Databricks ...
You can try adding the Maven dependency to your cluster manually ... For example, for Spark 3.5.x it will be io.graphframes:graphframes-spark3_2.12:0.10.0, and add a PyPI dependency graphframes-py. Adding the Maven coordinates should download and install al...
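Once the Maven coordinate and the graphframes-py package are installed on the cluster, a quick smoke test such as the following (a minimal sketch, assuming a Databricks notebook where `spark` is already defined) confirms the library is picked up:

```python
from graphframes import GraphFrame

# Tiny toy graph purely to verify the installation.
vertices = spark.createDataFrame(
    [("a", "Alice"), ("b", "Bob"), ("c", "Carol")], ["id", "name"]
)
edges = spark.createDataFrame(
    [("a", "b", "follows"), ("b", "c", "follows")], ["src", "dst", "relationship"]
)

g = GraphFrame(vertices, edges)
g.inDegrees.show()
g.pageRank(resetProbability=0.15, maxIter=5).vertices.show()
```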
We're running a job that's throwing a NullPointerException without any traces of our job's code. Does anybody know the best course of action when it comes to debugging these issues? The job is a Scala job running on DBR 11.3 LTS. In case it's rel...
You could try enabling full stack traces and checking the Spark executor logs for hidden errors. NullPointerExceptions in Scala on DBR often come from lazy evaluation or missing schema fields during I/O. Reviewing your DataFrame transformations a...
Hi there, we have an industry data platform with multiple customers using it. We provide each customer with their own data every night via .csv. Some of our customers use Databricks and import their data from us into it. We would like to offer a more...
You could use external volumes with a Cloudflare R2 bucket as an intermediary - you write the nightly data files to R2 (using the S3-compatible API), and your customers create external volumes in their Databricks workspace pointing to their designated R2...
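A rough sketch of what the consumer side could look like, assuming the Unity Catalog storage credential and external location for the R2 bucket are already in place; every name and path below is a placeholder, and the exact R2 location URL format should be checked against your account setup:

```python
# Assumes a Databricks notebook where `spark` is available.
# Create a volume over the customer's designated R2 prefix (placeholder path).
spark.sql("""
CREATE EXTERNAL VOLUME IF NOT EXISTS main.ingest.partner_dropzone
LOCATION 'r2://<bucket>@<account-id>.r2.cloudflarestorage.com/<customer-prefix>'
""")

# Read the nightly CSV drop from the volume and land it in a managed table.
df = spark.read.option("header", "true").csv(
    "/Volumes/main/ingest/partner_dropzone/2025-01-31/"
)
df.write.mode("append").saveAsTable("main.ingest.partner_data")
```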
In DBR 16.4 LTS, I am trying to add the following Spark config: spark.scheduler.allocation.file: file:/Workspace/init/fairscheduler.xml. But the all-purpose cluster is throwing this Spark error: Driver down cause: com.databricks.backend.daemon.dri...
Hi everyone, I am writing a small function with a Spark read from a CSV and a Spark write into a table. I can execute this function within the notebook. But when I register the same function as a Unity Catalog function and call it from the Playground, i...
Hi @GiriSreerangam, you cannot use a Unity Catalog user-defined function (UDF) in Databricks to perform a Spark read from a CSV and write to a table. Unity Catalog Python UDFs execute in a secure, isolated environment without access to the file system ...
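For contrast, a minimal sketch of what a Unity Catalog Python UDF is suited for: pure computation over its arguments, with no Spark session, file-system, or catalog access inside the body (function and catalog names are placeholders):

```python
# Assumes a Databricks notebook where `spark` is available.
# The $$ block runs in an isolated sandbox: it may transform its inputs,
# but it cannot call spark.read / spark.write or touch files.
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.clean_phone(raw STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
import re
return re.sub(r"[^0-9+]", "", raw) if raw else None
$$
""")

spark.sql("SELECT main.default.clean_phone(' +1 (555) 010-9999 ') AS phone").show()
```

The CSV-to-table logic itself would stay in regular notebook or job code, where the Spark session is available.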