Data Engineering

Forum Posts

JKR
by New Contributor III
  • 1638 Views
  • 2 replies
  • 1 kudos

Resolved! Got Failure: com.databricks.backend.common.rpc.SparkDriverExceptions$ReplFatalException error

The job is scheduled on an interactive cluster. It failed with the error below, but the next scheduled run succeeded. I want to know why this error occurred and how I can prevent it from happening again. How do I debug these types of errors? com.databricks.b...

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 1 kudos

@JKR Could you try setting the configurations below at the cluster level and retry the job?
spark.databricks.python.defaultPythonRepl pythonshell
spark.databricks.pyspark.py4j.pinnedThread false
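If the cluster is defined as JSON rather than through the UI, the same two settings would typically go under the cluster spec's `spark_conf` field. The sketch below only merges them into a spec dictionary; the cluster name and the pre-existing setting are hypothetical placeholders, and nothing here calls a Databricks API.

```python
import json

# The two settings suggested in the reply, as key/value pairs.
SUGGESTED_SPARK_CONF = {
    "spark.databricks.python.defaultPythonRepl": "pythonshell",
    "spark.databricks.pyspark.py4j.pinnedThread": "false",
}


def with_spark_conf(cluster_spec, extra_conf):
    """Return a copy of cluster_spec with extra_conf merged into spark_conf."""
    merged = dict(cluster_spec.get("spark_conf", {}))
    merged.update(extra_conf)
    return {**cluster_spec, "spark_conf": merged}


# Hypothetical cluster spec; only the spark_conf field matters here.
spec = {
    "cluster_name": "example-cluster",
    "spark_conf": {"spark.sql.shuffle.partitions": "200"},
}
updated = with_spark_conf(spec, SUGGESTED_SPARK_CONF)
print(json.dumps(updated["spark_conf"], indent=2))
```

The helper leaves the original spec untouched, so the change can be reviewed before the cluster is edited and restarted.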

1 More Replies
ivanychev
by Contributor
  • 1023 Views
  • 2 replies
  • 0 kudos

Mount Workspace to Docker container

Is there a way to mount the Workspace folder (WSFS) to the Docker container if I'm using Databricks Container Services for running a general-purpose cluster? If I create a cluster without a Docker image, the `!ls` command in a Databricks notebook return...

Data Engineering
Docker
Mount
Workspace
Latest Reply
User16539034020
Contributor II
  • 0 kudos

Hello: Thanks for contacting Databricks Support! I'm afraid that mounting the WSFS into a Docker container isn't directly supported. The Databricks workspace is a specialized environment and isn't directly analogous to a regular filesystem. W...

1 More Replies
Smitha1
by Valued Contributor II
  • 1664 Views
  • 9 replies
  • 3 kudos

Databricks Certified Associate Developer for Apache Spark 3.0

Databricks Certified Associate Developer for Apache Spark 3.0

Latest Reply
Shivam_Patil
New Contributor II
  • 3 kudos

Hey, I am looking for sample papers for the above exam other than the one provided by Databricks. Does anyone have any idea where to find them?

8 More Replies
abhaigh
by New Contributor III
  • 3452 Views
  • 2 replies
  • 0 kudos

Resolved! Azure Shared Clusters - Py4J Security Exception on non-whitelisted classes

Hi all. Having some fun trying to run a notebook on a UC-aware shared cluster - I keep on running into this error: py4j.security.Py4JSecurityException: Method public static org.apache.spark.sql.SparkSession org.apache.sedona.spark.SedonaContext....

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @abhaigh, Certainly! It seems you’re encountering a security issue related to the Py4J framework when running your notebook on a shared cluster. Let’s address this and explore potential solutions: Py4J Security Exception: The error message y...

1 More Replies
210573
by New Contributor
  • 6363 Views
  • 4 replies
  • 1 kudos

Internal error. Attach your notebook to a different cluster or restart the current cluster.

I started getting this error while running all the scripts. All the scripts were running fine before. I tried detaching and restarting, but nothing seems to work. Internal error. Attach your notebook to a different cluster or restart the current cluste...

Latest Reply
tieu_quyen
New Contributor II
  • 1 kudos

Hi @210573 (Customer), I got the same error. I tried restarting and creating a new cluster, but that did not work. What I did to fix the issue: instead of putting the code in a function, break it out to run line by line. I just want to see where the ...

3 More Replies
TaBorjaTa
by New Contributor II
  • 3366 Views
  • 1 replies
  • 2 kudos

Pytest imports of sibling modules when using Databricks for VSCode

Hello all, I am following the Databricks documentation on unit testing found here: Run tests with pytest for the Databricks extension for Visual Studio Code - Azure Databricks | Microsoft Learn. However, when taking it a step further I get an ImportEr...

Data Engineering
pytest
VSCode
Latest Reply
Trifa
New Contributor II
  • 2 kudos

Hello. Import errors happen often with pytest. To debug this error, you can add this to your "test_myfunction_test.py":
import sys
# printing all directories for
# interpreter to search
sys.path
sys.path is a built-in variable within the sys module. I...
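As a rough sketch of the debugging step described in this reply: print every entry of `sys.path`, then add the folder that holds the sibling modules if it is missing. The helper name `ensure_on_sys_path` and the `hypothetical_src_dir` folder are illustrative assumptions, not part of the Databricks extension or pytest.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(directory):
    """Prepend directory to sys.path if it is missing; return True if added."""
    resolved = str(Path(directory).resolve())
    if resolved in sys.path:
        return False
    sys.path.insert(0, resolved)
    return True


# Print every directory the interpreter searches when importing modules.
for entry in sys.path:
    print(entry)

# Hypothetical folder that would contain the sibling modules under test.
sibling_dir = Path.cwd() / "hypothetical_src_dir"
added = ensure_on_sys_path(sibling_dir)
print("added to sys.path:", added)
```

One common place for a call like this is a `conftest.py` next to the tests, since pytest imports that file before collecting test modules.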

AFox
by Contributor
  • 2199 Views
  • 7 replies
  • 0 kudos

databricks-connector: Error: Cluster MASKED is in unexpected state Pending.

Is there a way to make databricks-connector wait for the cluster to be running? Details: databricks-connector==13.1.0, and the Python minor version of the cluster and environment are both 3.10. If the cluster is not running this will start it, but any commands af...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @AFox , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 
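The accepted solution is not shown in this excerpt. One generic way to handle a cluster that is still in the Pending state is to poll its state until it reports RUNNING before sending commands. The sketch below assumes a caller-supplied `get_state` callable (hypothetical; it could wrap whatever call returns the cluster state in your setup) and treats `ERROR` as the only terminal failure, which is a simplification.

```python
import time


def wait_until_running(get_state, timeout_s=600.0, poll_s=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns "RUNNING" or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "RUNNING":
            return state
        if state == "ERROR":
            # Assumption: ERROR means the cluster will not come up on its own.
            raise RuntimeError("cluster entered state ERROR")
        if clock() >= deadline:
            raise TimeoutError(f"cluster still in state {state} after {timeout_s}s")
        sleep(poll_s)


# Demo with a fake state source standing in for a real cluster lookup.
fake_states = iter(["PENDING", "PENDING", "RUNNING"])
result = wait_until_running(lambda: next(fake_states), poll_s=0,
                            sleep=lambda seconds: None)
print(result)
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without real waiting.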

6 More Replies
AndyAtINX
by New Contributor III
  • 1511 Views
  • 4 replies
  • 1 kudos

Resolved! Error inviting user to workspace "Failed to add user: A user with email ... or username ... in different cases already exist in the account"

We have 3 workspaces - 1 old version in one AWS account, 2 latest versions in another. We are PAYG full edition, not using SSO. Our admins (existing DBX users in the `admins` group) can invite new users via the Admin Console from the 1 old and 1 new wo...

Latest Reply
Schneider-Elect
New Contributor II
  • 1 kudos

We are facing the same issue; we are on Azure. @AndyAtINX, do you mean that if a user exists in one workspace as abc@gmail.com, we should add the user to workspace2 as abc@gmail.com, not ABC@GMAIL.COM? If that is the case, we tried this and it's not working for us.

3 More Replies
OliverCadman
by New Contributor III
  • 3182 Views
  • 2 replies
  • 0 kudos

DUPLICATE: Missing 'DBAcademy DLT' as a Cluster Policy when creating Delta Live Tables pipeline

Good afternoon, I'm currently going through Module 4 of the Data Engineering Associate pathway, specifically lesson 4.1 - DLT UI Walkthrough. We are instructed to specify the Cluster Policy as 'DBAcademy DLT' when configuring the pipeline. However, th...

Data Engineering
Data engineer Associate
dlt
pipeline
pipeline configuration
Latest Reply
SeRo
New Contributor II
  • 0 kudos

The policy will be available after running the notebook /Users/<YOUR USER NAME>/Data Engineering with Databricks - v3.1.4/Includes/Workspace-Setup 

1 More Replies
Shankar
by New Contributor III
  • 2620 Views
  • 1 replies
  • 1 kudos

How are deletedFileRetentionDuration and logRetentionDuration associated with Vacuum?

I am trying to learn more about the Vacuum operation and came across two properties: delta.deletedFileRetentionDuration and delta.logRetentionDuration. So, let's say I have a delta table where a few records/files have been deleted. The delta.deletedFileRetent...

Data Engineering
delta
deltatables
vacuum
Latest Reply
dasiekr
New Contributor II
  • 1 kudos

No answers to this question? I also find the retention process for the underlying parquet files unclear. Can someone help here?
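As far as I understand the Delta Lake documentation, `delta.deletedFileRetentionDuration` is the age threshold VACUUM applies before it physically deletes data files that the table no longer references (default 7 days), while `delta.logRetentionDuration` controls how long transaction log history is kept (default 30 days) and is enforced when checkpoints are written rather than by VACUUM. The sketch below only builds the SQL text for setting both properties and running VACUUM; the table name is hypothetical and the helper is not a Databricks API.

```python
def retention_statements(table,
                         deleted_file_retention="interval 7 days",
                         log_retention="interval 30 days",
                         vacuum_retain_hours=None):
    """Build SQL that sets both retention properties and then vacuums the table."""
    set_props = (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.deletedFileRetentionDuration' = '{deleted_file_retention}', "
        f"'delta.logRetentionDuration' = '{log_retention}')"
    )
    vacuum = f"VACUUM {table}"
    if vacuum_retain_hours is not None:
        # An explicit RETAIN clause overrides the table property for this run.
        vacuum += f" RETAIN {vacuum_retain_hours} HOURS"
    return [set_props, vacuum]


# Hypothetical table name; the values shown are the documented defaults.
statements = retention_statements("main.default.events")
for statement in statements:
    print(statement)

# On a cluster, each statement could be passed to spark.sql(statement).
```

Time travel to an old version needs both the log entries and the data files for that version, so in practice the shorter of the two retention settings limits how far back a query can go.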

rbricks007
by New Contributor II
  • 1049 Views
  • 2 replies
  • 0 kudos

Resolved! Trying to use pivot function with pyspark for count aggregate

I'm trying this code but getting the following error:
testDF = (eventsDF
    .groupBy("user_id")
    .pivot("event_name")
    .count("event_name"))
TypeError: _api() takes 1 positional argument but 2 were given
Please guide how to fix...

Data Engineering
count
pivot
python
Latest Reply
Krishnamatta
New Contributor III
  • 0 kudos

Try this:
from pyspark.sql import functions as F
testDF = (eventsDF
    .groupBy("user_id")
    .pivot("event_name")
    .agg(F.count("event_name")))
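The TypeError comes from passing a column name to `count()` on grouped data, which accepts no arguments. Besides the `agg(F.count(...))` form in the reply, calling `count()` with no argument should also work. The sketch below wraps that variant; `pivot_event_counts`, `demo`, and the sample rows are illustrative names and data, not from the original post.

```python
def pivot_event_counts(events_df):
    """Return one row per user_id with a count column for each event_name."""
    # count() on grouped data takes no column argument; it counts the rows
    # that fall into each (user_id, event_name) cell.
    return events_df.groupBy("user_id").pivot("event_name").count()


def demo():
    """Run on a cluster or wherever pyspark is installed (not called here)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Hypothetical sample data standing in for eventsDF.
    events_df = spark.createDataFrame(
        [("u1", "login"), ("u1", "login"), ("u1", "click"), ("u2", "click")],
        ["user_id", "event_name"],
    )
    pivot_event_counts(events_df).show()
```

The two forms differ slightly: `F.count("event_name")` counts non-null values of that column, while `count()` counts rows.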

1 More Replies
rt-slowth
by Contributor
  • 531 Views
  • 1 replies
  • 0 kudos

Resolved! how to use dlt module in streaming pipeline

If anyone has example code for building a CDC live streaming pipeline generated by AWS DMS using import dlt, I'd love to see it. I'm currently able to see the parquet file starting with Load on the first full load to S3 and the cdc parquet file after ...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @rt-slowth, Certainly! Let’s explore how to create a Change Data Capture (CDC) live streaming pipeline using Delta Live Tables and AWS Database Migration Service (DMS). Delta Live Tables and AWS DMS: Delta Live Tables is an open-source storage ...
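A minimal sketch of what such a pipeline might look like, under several assumptions: the DMS parquet files are read with Auto Loader, every row (full load included) carries the DMS `Op` column with `D` marking deletes, and a timestamp column exists to order changes. The function name, the `_dms_raw` view suffix, and the example path, key, and column names are hypothetical. The `dlt` module and `spark` session are passed in as parameters because `dlt` is only importable inside a pipeline.

```python
def define_dms_cdc_pipeline(dlt, spark, source_path, target, keys, sequence_by):
    """Register a view over DMS parquet files and apply its changes to target."""
    source_view = f"{target}_dms_raw"

    @dlt.view(name=source_view)
    def dms_raw():
        # Auto Loader reads new parquet files incrementally as they land.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "parquet")
            .load(source_path)
        )

    # Target table that receives the upserts and deletes.
    dlt.create_streaming_table(name=target)
    dlt.apply_changes(
        target=target,
        source=source_view,
        keys=keys,
        sequence_by=sequence_by,       # assumed ordering column
        apply_as_deletes="Op = 'D'",   # assumes DMS marks deletes with Op = 'D'
        except_column_list=["Op"],     # keep the DMS marker out of the target
    )
    return source_view


# In a pipeline notebook (hypothetical path, table, key, and column names):
# import dlt
# define_dms_cdc_pipeline(dlt, spark, "s3://<bucket>/<prefix>/",
#                         "customers", ["id"], "dms_ts")
```

If the full-load files do not include an `Op` column, the schema of the two file sets differs and the source view would need to reconcile them first.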

alexiswl
by Contributor
  • 3107 Views
  • 4 replies
  • 0 kudos

Resolved! Create a UDF Table Function with DLT in UC

Hello, I am trying to generate a DLT but need to use a UDF Table Function in the process. This is what I have so far, everything works (without the CREATE OR REFRESH LIVE TABLE wrapper) ```sql CREATE OR REPLACE FUNCTION silver.portal.get_workflows_from_...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @alexiswl , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

3 More Replies