Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shan-databricks
by Databricks Partner
  • 3613 Views
  • 9 replies
  • 4 kudos

Resolved! Databricks Autoloader BadRecords path Issue

I have one file that has 100 rows, of which two rows are bad data and the remaining 98 rows are good data. But when I use the bad records path, it completely moves the file to the bad records path, which has good data as well, and it should move ...

Latest Reply
ShaileshBobay
Databricks Employee
  • 4 kudos

Why entire files go to badRecordsPath: when you enable badRecordsPath in Auto Loader or in Spark's file readers (with formats like CSV/JSON), here's what happens: Spark expects each data file to be internally well-formed with respect to the declared s...

8 More Replies
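A minimal pure-Python sketch of the row-level alternative usually suggested in threads like this: rather than badRecordsPath, which quarantines the whole source file, PERMISSIVE-style parsing keeps well-formed rows and sets malformed ones aside individually. The `split_good_and_bad` helper and the 3-column schema are illustrative, not Spark's API:

```python
import csv
import io

EXPECTED_COLUMNS = 3  # assumed schema width for this toy example

def split_good_and_bad(raw_csv: str, expected_columns: int = EXPECTED_COLUMNS):
    """Return (good_rows, bad_rows): rows matching the declared width are
    kept, malformed rows are quarantined individually (per-row, not per-file)."""
    good, bad = [], []
    for row in csv.reader(io.StringIO(raw_csv)):
        if len(row) == expected_columns:
            good.append(row)
        else:
            bad.append(row)
    return good, bad

sample = "1,alice,10\n2,bob\n3,carol,30\n4,dave,40,extra\n5,erin,50\n"
good, bad = split_good_and_bad(sample)
print(len(good), len(bad))  # 3 good rows kept, 2 bad rows quarantined
```

In Spark terms this corresponds to PERMISSIVE mode with a `_corrupt_record` column, which keeps the 98 good rows in the result instead of moving the whole file.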
yit
by Databricks Partner
  • 5241 Views
  • 8 replies
  • 4 kudos

Resolved! Schema evolution for JSON files with AutoLoader

I am using Auto Loader to ingest JSON files into a managed table. Auto Loader saves only the first-level fields as new columns, while nested structs are stored as values within those columns. My goal is to support schema evolution when loading new fi...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 4 kudos

@yit awesome. Glad that you got this solved. I look forward to the next problem. All the best, BS

7 More Replies
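What schema evolution has to do for nested JSON can be sketched in plain Python. This dict-based `infer_schema`/`merge_schema` pair is a hypothetical stand-in for what Auto Loader handles internally via options like `cloudFiles.schemaEvolutionMode`, not its actual implementation:

```python
def infer_schema(value):
    """Represent a JSON value's shape: dicts recurse, leaves keep the type name."""
    if isinstance(value, dict):
        return {k: infer_schema(v) for k, v in value.items()}
    return type(value).__name__

def merge_schema(current, incoming):
    """Union two inferred schemas, recursing into nested structs so that
    new nested fields are picked up, not just first-level columns."""
    if isinstance(current, dict) and isinstance(incoming, dict):
        merged = dict(current)
        for key, sub in incoming.items():
            merged[key] = merge_schema(merged[key], sub) if key in merged else sub
        return merged
    return current  # on a leaf type conflict, keep the existing type (simplification)

schema = infer_schema({"id": 1, "user": {"name": "a"}})
schema = merge_schema(schema, infer_schema({"id": 2, "user": {"name": "b", "age": 3}}))
print(schema)  # the nested new field user.age is picked up
```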
ZD
by New Contributor III
  • 2388 Views
  • 5 replies
  • 0 kudos

How to replace ${param} by :param

Hello, we previously used ${param} in our SQL queries: SELECT * FROM json.`${source_path}/file.json` However, this syntax is now deprecated. The recommended approach is to use :param instead. But when I attempt to replace ${param} with :param, I encounte...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @ZD Please try this syntax in your notebook for SQL:
%sql
declare _my_path = 'some_path';
select _my_path;

4 More Replies
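A toy illustration of the :param marker style. On Databricks the engine binds named markers itself (for example `spark.sql(query, args={...})` or widget/job parameters); this naive `bind_params` helper exists only to show the substitution idea and is not safe for real SQL:

```python
import re

def bind_params(query: str, params: dict) -> str:
    """Replace :name markers with single-quoted values (toy version:
    quotes are doubled for escaping, nothing else is validated)."""
    def sub(match):
        value = str(params[match.group(1)])
        return "'" + value.replace("'", "''") + "'"
    return re.sub(r":(\w+)", sub, query)

query = "SELECT * FROM read_files(:source_path)"
print(bind_params(query, {"source_path": "/Volumes/demo/raw"}))
# SELECT * FROM read_files('/Volumes/demo/raw')
```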
Johannes_E
by New Contributor III
  • 1350 Views
  • 2 replies
  • 1 kudos

Resolved! Job cluster has no permission to create folder in Unity Catalog Volume

Hello everybody, I want to run a job that collects some CSV files from an SFTP server and saves them to my Unity Catalog volume. While my personal cluster, defined as follows, has access to create folders on the volume, my job cluster doesn't. Defi...

Latest Reply
Johannes_E
New Contributor III
  • 1 kudos

Thank you, that helped although I had to use "SINGLE_USER" instead of "DATA_SECURITY_MODE_DEDICATED". According to the docs (https://docs.databricks.com/api/workspace/clusters/create) "SINGLE_USER" is an alias for "DATA_SECURITY_MODE_DEDICATED".

1 More Replies
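A sketch of the job-cluster spec implied by this thread's resolution, assuming the Clusters API field names; the runtime version, node type, and user principal below are placeholders, not values from the thread:

```python
import json

# data_security_mode "SINGLE_USER" (documented as an alias for the dedicated
# access mode) is what lets the job cluster access Unity Catalog volumes.
job_cluster_spec = {
    "spark_version": "15.4.x-scala2.12",          # placeholder runtime
    "node_type_id": "Standard_D4ds_v5",           # placeholder node type
    "num_workers": 1,
    "data_security_mode": "SINGLE_USER",          # required for UC volume access
    "single_user_name": "some.user@example.com",  # hypothetical principal
}
print(json.dumps(job_cluster_spec, indent=2))
```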
Aviral-Bhardwaj
by Esteemed Contributor III
  • 1878 Views
  • 4 replies
  • 2 kudos

Resolved! Not able to read data from volume and data is in JSON format

Not able to read data from a volume; the data is in JSON format. data = spark.read.json("/Volumes/mydatabricksaviral/datatesting/datavolume/mytest.json") display(data) Py4JJavaError: An error occurred while...

Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hi @Aviral-Bhardwaj, please double check:
- if the volume path is correct
- if you have READ VOLUME permission on this volume
- if your cluster has access to Unity Catalog
- if the JSON file exists

3 More Replies
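Part of that checklist can be automated before pointing `spark.read.json` at a volume path. A hedged sketch, assuming the file is newline-delimited JSON and the `/Volumes/...` path is visible as an ordinary file path (as it is from Databricks Python); the helper name is my own:

```python
import json
import os
import tempfile

def preflight_json(path: str) -> bool:
    """Return True if `path` exists and every non-empty line parses as JSON
    (a JSON-lines check; a single pretty-printed document would need json.load)."""
    if not os.path.isfile(path):
        return False
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    return False
    return True

# Demo against a temporary file standing in for /Volumes/.../mytest.json
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    fh.write('{"a": 1}\n{"a": 2}\n')
    demo_path = fh.name
print(preflight_json(demo_path))  # True
```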
Pratikmsbsvm
by Contributor
  • 770 Views
  • 1 replies
  • 0 kudos

Resolved! Low Level Design for Moving Data from Databricks A to Databricks B

Hello Techie, may someone please help me with the low-level design points we should consider while moving data from one Delta Lake instance to another? For example: service principal creation, IP whitelisting, any GitLab/DevOps relate...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Pratikmsbsvm Here's a brief low-level design checklist for Delta Lake to Delta Lake data migration:
1. Security & Authentication
- Create service principals for both environments
- Set up Azure Key Vault for credential management
- Configure IP white...

root92
by New Contributor
  • 1091 Views
  • 1 replies
  • 0 kudos

finishes execution in 6 seconds but the notebook still shows "waiting"

Issue: Although my SQL query completes execution in approximately 2-3 seconds, the notebook interface continues to show "waiting" for an extended period before displaying results. The only way to see the results of my cell execution is by refreshing the w...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @root92 This is a known Databricks interface issue, not related to query performance or account type. Most likely causes:
- WebSocket connection timeout between the browser and Databricks
- Browser memory issues with long-running notebook sessions
- Network pro...

noorbasha534
by Valued Contributor II
  • 1064 Views
  • 5 replies
  • 0 kudos

In-house built predictive optimization

Hello all, has anyone attempted to look at the internals of predictive optimization and built an in-house solution mimicking its functionality? I understood that there are no plans from Databricks to roll out this feature for external tables, and hence,...

Data Engineering
Delta Lake
Liquid clustering
predictive optimization
spark
Latest Reply
noorbasha534
Valued Contributor II
  • 0 kudos

@LinlinH thanks for the details. Can you please share any GitHub link where the community work is published, so I can verify whether any code can be re-used...

4 More Replies
William_Scardua
by Valued Contributor
  • 2494 Views
  • 2 replies
  • 0 kudos

Collecting Job Usage Metrics Without Unity Catalog

Hi, I would like to request assistance on how to collect usage metrics and job execution data for my Databricks environment. We are currently not using Unity Catalog, but I would still like to monitor and analyze usage. Could you please provide guidance...

Latest Reply
alsetr
Databricks Partner
  • 0 kudos

Hi @William_Scardua , were you able to collect the job metrics?

1 More Replies
Oliver_Angelil
by Valued Contributor II
  • 3844 Views
  • 2 replies
  • 0 kudos

Append-only table from non-streaming source in Delta Live Tables

I have a DLT pipeline where all tables are non-streaming (materialized views), except for the last one, which needs to be append-only and is therefore defined as a streaming table. The pipeline runs successfully on the first run. However, on the seco...

Latest Reply
nkarwa
New Contributor II
  • 0 kudos

@Oliver_Angelil - I was wondering if you found a solution? I have a similar use case. I want to create an archive table using DLT from a non-streaming source (MV). I would prefer a DLT solution. I was able to get it to work using a traditional merge approach (no...

1 More Replies
simonB2025
by New Contributor III
  • 2476 Views
  • 3 replies
  • 1 kudos

Resolved! Deploying Data Assets Bundle with VSCode Add-in

Deploying a bundle containing a pipeline that references a DLT notebook. In the YAML I am passing the relative path to the notebook from the repository root (where the YAML lives). Deploying says 'success', but when validating, the pipeline cannot find ...

Latest Reply
ilir_nuredini
Honored Contributor
  • 1 kudos

Hello @simonB2025 Could you share a snippet of your folder/project structure and where the notebook resides, so we can suggest to you the exact solution? Thank you! Best, Ilir

2 More Replies
Miloud_G
by New Contributor III
  • 2873 Views
  • 1 replies
  • 1 kudos

Resolved! connecting to unity catalog with power BI

I created a group of Power BI users and granted consumer access permission to this group on Unity Catalog. I started a shared compute cluster (Standard_D4ds_v5). When trying to connect from Power BI, I can connect to tables only if I grant access to wor...

Latest Reply
radothede
Valued Contributor II
  • 1 kudos

Hi @Miloud_G Instead of granting full workspace access, you can grant the "Databricks SQL access" entitlement, which provides limited workspace access specifically for BI tools. Go to Workspace Settings → Identity and Access → Groups/Users/SPs → Manage → Se...

smoortema
by Contributor
  • 1913 Views
  • 1 replies
  • 1 kudos

Resolved! Complex pipeline with many tasks and dependencies: orchestration in Jobs or in Notebook?

We need to set up a job that consists of several hundred tasks with many dependencies between them. We are considering two different directions: 1. Databricks job with tasks, with dependencies defined as code and deployed with Databricks Asset B...

Latest Reply
radothede
Valued Contributor II
  • 1 kudos

Hi @smoortema To my best knowledge: Option 1) You can create jobs that contain up to 1,000 tasks; however, it is recommended to split tasks into logical subgroups. Jobs with more than 100 tasks require API 2.2 and above. Jobs with a large number of tasks ...

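The subgrouping advice above can be sketched as a simple chunking helper. The names are illustrative, not a Databricks API; the limit of 100 mirrors the per-job task threshold mentioned in the reply:

```python
def chunk_tasks(task_names, max_tasks_per_job=100):
    """Split a flat task list into sub-job groups of at most max_tasks_per_job,
    so a several-hundred-task pipeline becomes a few logical sub-jobs."""
    return [
        task_names[i : i + max_tasks_per_job]
        for i in range(0, len(task_names), max_tasks_per_job)
    ]

tasks = [f"task_{i}" for i in range(350)]
groups = chunk_tasks(tasks)
print([len(g) for g in groups])  # [100, 100, 100, 50]
```

In a real asset bundle, each group would become its own job (or a job-level task triggering a sub-job), with dependencies expressed between groups rather than between all individual tasks.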
Locomo_Dncr
by New Contributor
  • 1836 Views
  • 4 replies
  • 0 kudos

Time Travel vs Bronze historical archive

Hello, I am working on building a pipeline using the Medallion architecture. The source tables in this pipeline are overwritten each time the table is updated. In the bronze ingestion layer, I plan to append this new table to the current bronze table, addi...

Data Engineering
Medallion
Processing Time
Storage Costs
Time Travel
Latest Reply
MariuszK
Valued Contributor III
  • 0 kudos

Hi @Locomo_Dncr Time travel isn't recommended for storing historical data; it's for backup and audit purposes. You can store snapshot data or use SCD2 to keep history. "Databricks does not recommend using Delta Lake table history as a long-term backup solutio...

3 More Replies
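The SCD2 alternative the reply recommends can be sketched with plain Python dicts: one row per version, with valid_from/valid_to/is_current columns. In practice this would be a MERGE on the Delta table; the function name and column names here are illustrative:

```python
from datetime import date

def scd2_upsert(history, key, new_value, as_of):
    """Close the current row for `key` if its value changed, then append
    the new version as the current row (classic SCD Type 2 behaviour)."""
    for row in history:
        if row["key"] == key and row["is_current"]:
            if row["value"] == new_value:
                return history  # no change, keep the existing current row
            row["is_current"] = False
            row["valid_to"] = as_of
    history.append({"key": key, "value": new_value, "valid_from": as_of,
                    "valid_to": None, "is_current": True})
    return history

hist = []
scd2_upsert(hist, "cust_1", {"city": "Oslo"}, date(2024, 1, 1))
scd2_upsert(hist, "cust_1", {"city": "Bergen"}, date(2024, 6, 1))
print(len(hist))  # two versions retained: the old one closed, the new one current
```

Unlike time travel, this keeps history as ordinary queryable rows, so it survives VACUUM and retention settings.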
jar
by Contributor
  • 1755 Views
  • 6 replies
  • 0 kudos

Understanding infra costs of Databricks compute

Hi. I can see in our Azure cost analysis tool that a not-insignificant part of our costs comes from the managed Databricks RG deployed with the workspace, and that it relates particularly to VMs (so compute, I assume?) and storage, the latter of which, tho...

Latest Reply
jar
Contributor
  • 0 kudos

Thank you all for your replies. The issue is not getting an overview of costs - I already have that from the Cost Management Export function in Azure, and by using the system.billing tables in Databricks. The issue is understanding the relation betwe...

5 More Replies