Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

brickster_2018
by Databricks Employee
  • 4077 Views
  • 2 replies
  • 0 kudos

Resolved! Is the Spark driver a synonym for the Spark Master daemon?

If I understand correctly, the Spark driver is a master process. Is it the same as the Spark Master? I get confused between the Spark Master and the Spark driver.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

This is a common misconception. The Spark Master and the Spark driver are two independent, isolated JVMs running on the same instance. The Spark Master's responsibility is to ensure the Spark workers' daemons are up and running and to monitor their health. Als...

1 More Replies
koantek_user
by New Contributor
  • 1208 Views
  • 1 replies
  • 0 kudos

Lateral view explode in Databricks - need help

We are working on a Snowflake to Databricks migration and encountered Snowflake's LATERAL FLATTEN function, which we tried to convert to LATERAL VIEW EXPLODE in Databricks - but its output is a subset of LATERAL FLATTEN's. http...

Latest Reply
Brahmareddy
Valued Contributor III
  • 0 kudos

Hi AzureSnowflake, I see you're migrating from Snowflake to Databricks and running into some issues with the LATERAL FLATTEN function in Snowflake. Specifically, you're finding that LATERAL VIEW EXPLODE in Databricks isn't providing the full outpu...

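The gap the question describes can be sketched without Spark at all: Snowflake's LATERAL FLATTEN iterates both arrays and object keys and exposes metadata (index, key), while a plain EXPLODE emits only array elements. The plain-Python functions below are illustrative stand-ins, not a real API.

```python
def flatten(value):
    """Mimic Snowflake FLATTEN: rows of (index, key, value)."""
    if isinstance(value, list):
        return [(i, None, v) for i, v in enumerate(value)]
    if isinstance(value, dict):
        return [(None, k, v) for k, v in value.items()]
    return []

def explode(value):
    """Mimic Spark's explode() on an array: elements only, no index/key metadata."""
    return list(value) if isinstance(value, list) else []

data = {"tags": ["a", "b"], "attrs": {"color": "red"}}

# Arrays: explode loses the position (posexplode would keep it).
assert explode(data["tags"]) == ["a", "b"]
assert flatten(data["tags"]) == [(0, None, "a"), (1, None, "b")]

# Objects: explode over an array yields nothing here; FLATTEN still iterates keys.
assert explode(data["attrs"]) == []
assert flatten(data["attrs"]) == [(None, "color", "red")]
```

In Databricks SQL the usual way to close the gap is posexplode for the element index, and explode over a map (or map_entries) for key/value pairs - assuming the Snowflake query relied on FLATTEN's index/key output columns.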
Maatari
by New Contributor III
  • 487 Views
  • 1 replies
  • 0 kudos

Resolved! Pre-partitioning a Delta table to reduce shuffling of wide operations

Assuming I need to perform a groupBy, i.e. an aggregation, on a dataset stored in a Delta table: if the Delta table is partitioned by the field by which to group, can that have an impact on the shuffling that the groupBy would normally cause? As a connecte...

Latest Reply
Brahmareddy
Valued Contributor III
  • 0 kudos

Hi Maatari! How are you doing today? When you group data by a column in a Delta table, Spark typically has to shuffle the data to get all the same values together. But if your Delta table is already partitioned by that same column, the shuffling is muc...

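The intuition in the reply can be shown with a toy model: a groupBy needs every record to end up in the partition that "owns" its key, so the shuffle cost is roughly the number of records sitting in the wrong partition. The partitioner, layouts, and counts below are all illustrative, not how Spark measures this internally.

```python
NUM_PARTITIONS = 2

def owner(key):
    # Deterministic stand-in for a hash partitioner.
    return key % NUM_PARTITIONS

def records_moved(partitions):
    # A record must move if its current partition is not its owner's.
    return sum(
        1
        for pid, records in enumerate(partitions)
        for key, _ in records
        if owner(key) != pid
    )

# Layout A: data already laid out by the grouping key -> nothing moves.
pre_partitioned = [
    [(0, "x"), (2, "y")],   # partition 0 holds only even keys
    [(1, "z"), (3, "w")],   # partition 1 holds only odd keys
]

# Layout B: keys scattered across partitions -> half the records move.
scattered = [
    [(0, "x"), (1, "z")],
    [(2, "y"), (3, "w")],
]

assert records_moved(pre_partitioned) == 0
assert records_moved(scattered) == 2
```

One caveat worth keeping in mind: even with a favorable layout, Spark may still plan an exchange unless the optimizer can prove the distribution matches, so in practice table partitioning mainly helps via pruning and co-located reads rather than guaranteeing a shuffle-free groupBy.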
rameshybr
by New Contributor II
  • 647 Views
  • 1 replies
  • 1 kudos

Resolved! Databricks workflow - executing the workflow concurrently with different input parameters

How can I trigger a workflow concurrently (multiple times) with different input parameters? Please share your thoughts or any related articles.

Latest Reply
Doug-Leal
New Contributor III
  • 1 kudos

See the tutorials/articles below with all the steps to create a workflow and pass parameters to the job/workflow: https://docs.databricks.com/en/jobs/create-run-jobs.html https://docs.databricks.com/en/jobs/parameter-value-references.html https://docs.dat...

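Beyond the linked docs, the usual pattern is to call the Jobs API 2.1 run-now endpoint once per parameter set; each call returns its own run_id, and the runs execute in parallel as long as the job's max_concurrent_runs allows it. The sketch below only builds the request payloads - the workspace host, job id, and parameter names are hypothetical placeholders.

```python
import json

# Jobs API 2.1 endpoint for triggering a run (placeholder host).
RUN_NOW_URL = "https://<workspace-host>/api/2.1/jobs/run-now"

def run_now_payload(job_id, job_parameters):
    """Build the JSON body for one run-now call."""
    return json.dumps({"job_id": job_id, "job_parameters": job_parameters})

# One payload per parameter set; POSTing each (e.g. with `requests` plus a
# bearer token) yields a distinct run_id, so the runs proceed concurrently.
payloads = [
    run_now_payload(1234, {"region": region})
    for region in ("us-east", "eu-west", "ap-south")
]

assert json.loads(payloads[0]) == {"job_id": 1234, "job_parameters": {"region": "us-east"}}
assert len(payloads) == 3
```

Remember to raise max_concurrent_runs in the job settings first; with the default of 1, the extra triggers get queued or skipped instead of running in parallel.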
Rishabh-Pandey
by Esteemed Contributor
  • 1359 Views
  • 1 replies
  • 3 kudos

Key Advantages of Serverless Compute in Databricks

Serverless compute in Databricks offers several advantages, enhancing efficiency, scalability, and ease of use. Here are some key benefits: 1. Simplified Infrastructure Management - No Server Management: Users don't need to manage or configure servers or...

Latest Reply
Ashu24
Contributor
  • 3 kudos

Thanks for the clear explanation.

Prashanth24
by New Contributor III
  • 1242 Views
  • 4 replies
  • 4 kudos

Databricks worker node - would like to know the amount of memory per core

Under Databricks Compute and Worker nodes, we find different node types, as below:
Standard_D4ds_v5 => 16 GB Memory, 4 Cores
Standard_D8ds_v5 => 32 GB Memory, 8 Cores
In Databricks, each node will have one executor. I have the questions below: (1) How much ...

Latest Reply
Rishabh-Pandey
Esteemed Contributor
  • 4 kudos

3. If there is any background process, what are all those activities? Background processes in Databricks include several key activities: Cluster Management: Databricks manages the cluster's lifecycle, including starting, stopping, and scaling up or dow...

3 More Replies
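The raw memory-per-core ratio for the two node types in the question is simple arithmetic, sketched below. Note the executor never sees the full node memory: the OS, Databricks daemons, and off-heap overhead take a share first, and that split varies by runtime and instance type, so these figures are only the upper bound.

```python
# Node specs copied from the question.
node_types = {
    "Standard_D4ds_v5": {"memory_gb": 16, "cores": 4},
    "Standard_D8ds_v5": {"memory_gb": 32, "cores": 8},
}

# Raw GB per core (before any OS/daemon/overhead reservation).
per_core = {
    name: spec["memory_gb"] / spec["cores"]
    for name, spec in node_types.items()
}

assert per_core["Standard_D4ds_v5"] == 4.0   # 16 GB / 4 cores
assert per_core["Standard_D8ds_v5"] == 4.0   # 32 GB / 8 cores
```

Both the Dds_v5 sizes keep the same 4 GB-per-core ratio, which is why moving between them changes parallelism and total memory but not the memory available to each task slot.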
Pritam
by New Contributor II
  • 3662 Views
  • 4 replies
  • 1 kudos

Not able to create a job via the Jobs API in Databricks

I am not able to create jobs via the Jobs API in Databricks. Error=INVALID_PARAMETER_VALUE: Job settings must be specified. I simply copied the JSON file and saved it. Loaded the same JSON file and tried to create the job via the API, but got the above erro...

Latest Reply
rAlex
New Contributor III
  • 1 kudos

@Pritam Arya​ I had the same problem today. To use the JSON that you can get from the GUI for an existing job in a request to the Jobs API, use just the JSON that is the value of the settings key.

3 More Replies
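The fix in the reply comes down to one unwrap step: the JSON exported from the GUI (or returned by the jobs get endpoint) nests the job spec under a "settings" key, while jobs/create expects that spec at the top level. Posting the wrapper as-is triggers exactly the "Job settings must be specified" error. The job JSON below is an abbreviated, made-up example.

```python
import json

# Shape of the JSON copied from the GUI / returned by jobs get (abbreviated).
exported = {
    "job_id": 9876,
    "created_time": 1700000000000,
    "settings": {
        "name": "nightly-etl",
        "max_concurrent_runs": 1,
        "tasks": [],  # task definitions elided
    },
}

# Unwrap before POSTing to /api/2.1/jobs/create.
create_payload = json.dumps(exported["settings"])

assert json.loads(create_payload)["name"] == "nightly-etl"
assert "job_id" not in json.loads(create_payload)   # wrapper fields stay out
```

Dropping job_id and the other read-only wrapper fields also matters: create assigns a new job_id itself and rejects payloads that try to supply one.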
c-thiel
by New Contributor
  • 208 Views
  • 0 replies
  • 0 kudos

APPLY INTO: high date instead of NULL for __END_AT

I really like the APPLY INTO function for keeping track of changes and historizing them in SCD2. However, I am a bit confused that current records get an __END_AT of NULL. Typically, __END_AT should be a high date (i.e. 9999-12-31) or similar, so that a poin...

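A common workaround for the situation described above is to treat a NULL __END_AT as "still current" by coalescing it to a high date at query time, so BETWEEN-style point-in-time lookups work; in SQL this is just COALESCE(__END_AT, DATE'9999-12-31') in a view over the SCD2 table. A minimal sketch of the same logic, with column names mirroring the question's convention:

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)

def effective_end(end_at):
    """NULL (None) on the current row means open-ended -> high date."""
    return end_at if end_at is not None else HIGH_DATE

def version_active_on(row, as_of):
    """Point-in-time check against one SCD2 row."""
    return row["__START_AT"] <= as_of <= effective_end(row["__END_AT"])

history = [
    {"id": 1, "__START_AT": date(2023, 1, 1), "__END_AT": date(2023, 6, 30)},
    {"id": 1, "__START_AT": date(2023, 7, 1), "__END_AT": None},  # current row
]

assert version_active_on(history[0], date(2023, 5, 1))
assert not version_active_on(history[1], date(2023, 5, 1))
assert version_active_on(history[1], date(2024, 1, 1))   # current row matches any later date
```

Keeping the physical column NULL and coalescing in a view also avoids rewriting the current row when a new version eventually closes it.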
m997al
by Contributor III
  • 530 Views
  • 2 replies
  • 0 kudos

Cannot use Databricks REST API for secrets "get" inside bash script (byte format only)

Hi, I am trying to use Databricks-backed secret scopes inside Azure DevOps pipelines. I am almost successful. I can use the REST API to "get" a secret value back inside my bash script, but the value is in byte format, so it is unusable as a local var...

Latest Reply
m997al
Contributor III
  • 0 kudos

I wanted to add an addendum to this. So in Azure DevOps, when working with YAML files, you can use the Azure DevOps pipelines "Library" to load environment variables. When you look at those environment variables in the Azure DevOps pipeline librar...

1 More Replies
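The "byte format" in the original question is, as far as I know, base64: the secrets get endpoint (/api/2.0/secrets/get) returns the secret value base64-encoded in its value field, so it has to be decoded before use (in bash, roughly `jq -r .value | base64 --decode`). The sketch below mocks the API response rather than calling it; the key name and secret are made up.

```python
import base64
import json

# Mocked shape of a /api/2.0/secrets/get response (no real API call here).
mock_response = json.dumps({
    "key": "my-secret",
    "value": base64.b64encode(b"s3cr3t!").decode(),  # API returns base64 text
})

# Decode the value field to recover the usable plain string.
decoded = base64.b64decode(json.loads(mock_response)["value"]).decode()

assert decoded == "s3cr3t!"
```

If the decoded value still needs to live in a pipeline variable, remember to mark it secret in Azure DevOps so it is masked in logs.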
Prashanth24
by New Contributor III
  • 905 Views
  • 3 replies
  • 4 kudos

Min and max node counts for processing 5 TB of data

I need to ingest a full load of 5 TB of data, applying business transformations, and want to process it in 2-3 hours. What criteria need to be considered when selecting the min and max worker node counts for this full-load processing?

Latest Reply
joeharris76
New Contributor II
  • 4 kudos

Need more details about the workload to fully advise, but generally speaking:
  • use the latest generation of cloud instances
  • enable Unity Catalog
  • enable Photon
If the source data is raw CSV then the load should scale linearly. For example, if 64 nodes comp...

2 More Replies
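In the spirit of the linear-scaling point above, a rough sizing estimate is nodes = total data / (per-node throughput × time budget). The throughput figure below is a placeholder assumption, not a Databricks benchmark - measure it on a small sample of your own data and transformations first.

```python
import math

def nodes_needed(data_tb, hours_budget, tb_per_node_hour):
    """Linear-scaling estimate: round up to whole nodes."""
    return math.ceil(data_tb / (tb_per_node_hour * hours_budget))

# Hypothetical: one node processes ~0.1 TB/hour through the transformations.
assert nodes_needed(5.0, 2.0, 0.1) == 25   # tight 2-hour window
assert nodes_needed(5.0, 3.0, 0.1) == 17   # relaxed 3-hour window
```

For autoscaling, a reasonable starting point is to set the max near the tight-budget estimate and the min well below it, letting the cluster shrink once the heavy ingest stages finish.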
Maatari
by New Contributor III
  • 226 Views
  • 0 replies
  • 0 kudos

Reading a partitioned table in Spark Structured Streaming

Does the pre-partitioning of a Delta table have an influence on the number of "default" partitions of a DataFrame when reading the data? Put differently, using Spark Structured Streaming, when reading from a Delta table, is the number of DataFrame par...

Maatari
by New Contributor III
  • 188 Views
  • 0 replies
  • 0 kudos

Chaining stateful operators

I would like to do a groupBy followed by a join in Structured Streaming. I would read from two Delta tables in snapshot mode, i.e. latest snapshot. My question is specifically about chaining the stateful operators. groupBy is update mode; chaining grou...

Paxi
by New Contributor
  • 302 Views
  • 0 replies
  • 0 kudos

Maven libs often failed during installation

Dear Community, I have a Databricks compute where I added 2 Maven libs using a custom repository from Nexus (because of a company policy, Databricks cannot communicate with the public internet, so I must use a private Nexus repo using a firewall). Sin...

udi_azulay
by New Contributor II
  • 812 Views
  • 2 replies
  • 1 kudos

Variant type table within DLT

Hi, I have a table with the Variant type (preview), which works well on 15.3. When I try to run code that references this Variant type in a DLT pipeline, I get: com.databricks.sql.transaction.tahoe.DeltaUnsupportedTableFeatureException: [DELTA_UNSUPPORTED_F...

Latest Reply
thomas-totter
New Contributor II
  • 1 kudos

The Preview channel version is currently at 15.2, so we should be only one minor version increment away from Variant being available in DLT (at least I hope so...).

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group