Hi databricks/spark experts! I have a piece of pandas-based 3rd-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code is executed on the driver node. I ran into out-of-memory problems and started exploring th...
Hi @wojciech_jakubo
1. JVM memory will not be utilized for Python-related activities.
2. In the image we can only see the storage memory. We also have execution memory, which would be the same size. Hence I came up with the executor memory to be of ...
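One common way to avoid driver-side OOM with pandas code is to push it down to the executors with `mapInPandas`, which hands the function an iterator of pandas DataFrames one Arrow batch at a time, so memory use is bounded by the batch size rather than the full dataset. A minimal sketch (the column name `amount` and the doubling transformation are made up for illustration, standing in for the 3rd-party pandas code):

```python
from typing import Iterator

import pandas as pd


def transform(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # Runs on the executors; each `pdf` is one batch of a partition.
    for pdf in batches:
        pdf["amount"] = pdf["amount"] * 2  # stand-in for the pandas-based code
        yield pdf


# Sketch of usage on a SparkSession (not run here):
# df = spark.range(10).withColumnRenamed("id", "amount")
# out = df.mapInPandas(transform, schema="amount long")
```

Because `transform` only sees plain pandas DataFrames, it can also be unit-tested locally without a cluster.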
Hello guys, I'm building a Python package that returns one row from a DF at a time inside the Databricks environment. To improve the performance of this package I used the multiprocessing library in Python; I have a background process whose whole purpose is to p...
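A background prefetcher like the one described can be built with a thread and a bounded queue rather than multiprocessing (a separate process cannot share the SparkSession, while a thread can). A sketch with illustrative names:

```python
import queue
import threading

_SENTINEL = object()


def prefetching_iterator(source_iter, buffer_size=8):
    """Wrap an iterator so upcoming rows are fetched by a background thread."""
    q = queue.Queue(maxsize=buffer_size)

    def worker():
        for row in source_iter:
            q.put(row)       # blocks when the buffer is full
        q.put(_SENTINEL)     # signal end of stream

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item


# With Spark this would typically wrap df.toLocalIterator() (sketch):
# for row in prefetching_iterator(df.toLocalIterator()):
#     process(row)
```

The bounded queue keeps the prefetcher from racing ahead and buffering the whole dataset on the driver.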
I'm using the Azure Event Hubs Connector (https://github.com/Azure/azure-event-hubs-spark) to connect to an Event Hub. When I install this library from Maven, everything works; I can access the library's classes using the JVM:

connection_string = "<connection_string>"
s...
Hi everyone! I have a question. For a project I need to establish a JDBC connection using spark.read. My question is: when is the connection closed? I will read multiple tables from that database, so if I could just create a conn...
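For what it's worth, `spark.read` with the JDBC source does not hold a long-lived connection on the driver: connections are opened per partition when the read actually executes and closed afterwards. So instead of reusing a connection object, the usual pattern is to reuse the connection options. A sketch with a placeholder URL and credentials:

```python
def jdbc_options(table: str) -> dict:
    """Shared JDBC options; only the table name varies per read.

    URL, credentials, and driver class are placeholders for illustration.
    """
    return {
        "url": "jdbc:postgresql://host:5432/mydb",
        "dbtable": table,
        "user": "reader",
        "password": "secret",
        "driver": "org.postgresql.Driver",
    }


# Sketch of usage on a SparkSession (not run here):
# df_a = spark.read.format("jdbc").options(**jdbc_options("schema.table_a")).load()
# df_b = spark.read.format("jdbc").options(**jdbc_options("schema.table_b")).load()
```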
I'm having trouble working on Databricks with data that we are not allowed to save or persist in any way. The data comes from an API (which returns a JSON response). We have a Scala package on our cluster that makes the queries (almost 6k queries...
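One option in this situation is to keep the API responses entirely in memory and build the DataFrame directly from them, so nothing is ever written to storage. A sketch (the helper name and the shape of the JSON are made up for illustration):

```python
import json


def responses_to_rows(json_responses):
    """Parse raw JSON response strings into plain dicts (one per row)."""
    rows = []
    for raw in json_responses:
        rows.append(json.loads(raw))
    return rows


# Sketch: hand the in-memory rows straight to Spark, nothing touches disk:
# df = spark.createDataFrame(responses_to_rows(all_responses))
```

Note this keeps all ~6k responses on the driver before parallelizing, which is fine for modest payloads but worth checking against driver memory.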
I have installed the score jar file on the cluster but I am getting an error while trying to use it.
Error: Py4JError: com.xxxxx.tips.TipsScore.score does not exist in the JVM
Code: score = spark._jvm.com.xxxx.tips.TipsScore.score
Adding these options

EXTRA_JAVA_OPTIONS = (
    '-Dcom.sun.management.jmxremote.port=9999',
    '-Dcom.sun.management.jmxremote.authenticate=false',
    '-Dcom.sun.management.jmxremote.ssl=false',
)

is enough in vanilla Apache Spark, but apparently it ...
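On Databricks, driver JVM flags are usually passed through the cluster's Spark configuration (cluster settings, Advanced options, Spark config) rather than a local environment variable. A sketch of the equivalent setting, reusing the same JMX flags as above (port and flags taken from the snippet, everything else assumed):

```
spark.driver.extraJavaOptions -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
```

`spark.driver.extraJavaOptions` is a standard Spark property; whether the JMX port is reachable additionally depends on the workspace's network configuration.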
I used Auto Loader with TriggerOnce = true and ran it for weeks on a schedule. Today it broke: "The metadata file in the streaming source checkpoint directory is missing. This metadata file contains important default options for the stream, so the stream..."
Hi dimoobraznii (Customer), this error comes up in streaming when someone changes the streaming checkpoint directory manually, or points one streaming type at the checkpoint of another streaming type. Please check if any changes were made to t...
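If the checkpoint metadata cannot be restored, the usual workaround is to restart the stream with a fresh checkpoint location (keeping in mind that Auto Loader will then treat the source as new and may reprocess files). A sketch using real Auto Loader (`cloudFiles`) option names but placeholder paths and table names:

```python
def autoloader_options(schema_location: str) -> dict:
    """Auto Loader options; option names are real, values are placeholders."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
    }


# Sketch of restarting with a *new* checkpoint directory (not run here):
# (spark.readStream.format("cloudFiles")
#      .options(**autoloader_options("/mnt/meta/schema_v2"))
#      .load("/mnt/landing/")
#      .writeStream
#      .option("checkpointLocation", "/mnt/meta/checkpoint_v2")  # fresh path
#      .trigger(once=True)
#      .toTable("target_table"))
```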