Hi databricks/spark experts! I have a piece of pandas-based 3rd-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code is executed on the driver node. I ran into out-of-memory problems and started exploring th...
Hi @wojciech_jakubo
1. JVM memory will not be utilized for Python-related activities.
2. In the image we can only see the storage memory. We also have execution memory, which would be the same size. Hence I came up with the executor memory to be of ...
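One common way to avoid driver-side OOM with pandas code is to push it down to the executors with `mapInPandas`, which hands the function an iterator of pandas DataFrames one Arrow batch at a time, so memory use is bounded by the batch size rather than the full dataset. A minimal sketch (the column name `amount` and the doubling transformation are made up for illustration, standing in for the 3rd-party pandas code):

```python
from typing import Iterator

import pandas as pd


def transform(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    # Runs on the executors; each `pdf` is one batch of a partition.
    for pdf in batches:
        pdf["amount"] = pdf["amount"] * 2  # stand-in for the pandas-based code
        yield pdf


# Sketch of usage on a SparkSession (not run here):
# df = spark.range(10).withColumnRenamed("id", "amount")
# out = df.mapInPandas(transform, schema="amount long")
```

Because `transform` only sees plain pandas DataFrames, it can also be unit-tested locally without a cluster.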
Hello guys, I'm building a Python package that returns one row from a DF at a time inside the Databricks environment. To improve the performance of this package I used the multiprocessing library in Python; I have a background process whose whole purpose is to p...
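A background prefetcher like the one described can be built with a thread and a bounded queue rather than multiprocessing (a separate process cannot share the SparkSession, while a thread can). A sketch with illustrative names:

```python
import queue
import threading

_SENTINEL = object()


def prefetching_iterator(source_iter, buffer_size=8):
    """Wrap an iterator so upcoming rows are fetched by a background thread."""
    q = queue.Queue(maxsize=buffer_size)

    def worker():
        for row in source_iter:
            q.put(row)       # blocks when the buffer is full
        q.put(_SENTINEL)     # signal end of stream

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item


# With Spark this would typically wrap df.toLocalIterator() (sketch):
# for row in prefetching_iterator(df.toLocalIterator()):
#     process(row)
```

The bounded queue keeps the prefetcher from racing ahead and buffering the whole dataset on the driver.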
I'm using the Azure Event Hubs Connector (https://github.com/Azure/azure-event-hubs-spark) to connect to an Event Hub. When I install this library from Maven, everything works; I can access the library's classes using the JVM:

connection_string = "<connection_string>"
s...
Hi everyone! I have a question. For a project I need to establish a JDBC connection using spark.read. My question is: when is the connection closed? I will read multiple tables from that database, so if I could just create a conn...
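For what it's worth, `spark.read` with the JDBC source does not hold a long-lived connection on the driver: connections are opened per partition when the read actually executes and closed afterwards. So instead of reusing a connection object, the usual pattern is to reuse the connection options. A sketch with a placeholder URL and credentials:

```python
def jdbc_options(table: str) -> dict:
    """Shared JDBC options; only the table name varies per read.

    URL, credentials, and driver class are placeholders for illustration.
    """
    return {
        "url": "jdbc:postgresql://host:5432/mydb",
        "dbtable": table,
        "user": "reader",
        "password": "secret",
        "driver": "org.postgresql.Driver",
    }


# Sketch of usage on a SparkSession (not run here):
# df_a = spark.read.format("jdbc").options(**jdbc_options("schema.table_a")).load()
# df_b = spark.read.format("jdbc").options(**jdbc_options("schema.table_b")).load()
```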
I'm having trouble working on Databricks with data that we are not allowed to save or persist in any way. The data comes from an API (which returns a JSON response). We have a Scala package on our cluster that makes the queries (almost 6k queries...
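One option in this situation is to keep the API responses entirely in memory and build the DataFrame directly from them, so nothing is ever written to storage. A sketch (the helper name and the shape of the JSON are made up for illustration):

```python
import json


def responses_to_rows(json_responses):
    """Parse raw JSON response strings into plain dicts (one per row)."""
    rows = []
    for raw in json_responses:
        rows.append(json.loads(raw))
    return rows


# Sketch: hand the in-memory rows straight to Spark, nothing touches disk:
# df = spark.createDataFrame(responses_to_rows(all_responses))
```

Note this keeps all ~6k responses on the driver before parallelizing, which is fine for modest payloads but worth checking against driver memory.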
I have installed the score jar file on the cluster but I am getting an error while trying to use it.
Error: Py4JError: com.xxxxx.tips.TipsScore.score does not exist in the JVM
Code: score = spark._jvm.com.xxxx.tips.TipsScore.score
Adding these options

EXTRA_JAVA_OPTIONS = (
    '-Dcom.sun.management.jmxremote.port=9999',
    '-Dcom.sun.management.jmxremote.authenticate=false',
    '-Dcom.sun.management.jmxremote.ssl=false',
)

is enough in vanilla Apache Spark, but apparently it ...
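On Databricks, driver JVM flags are usually passed through the cluster's Spark configuration (cluster settings, Advanced options, Spark config) rather than a local environment variable. A sketch of the equivalent setting, reusing the same JMX flags as above (port and flags taken from the snippet, everything else assumed):

```
spark.driver.extraJavaOptions -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
```

`spark.driver.extraJavaOptions` is a standard Spark property; whether the JMX port is reachable additionally depends on the workspace's network configuration.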
I used Auto Loader with TriggerOnce = true and ran it for weeks on a schedule. Today it broke: "The metadata file in the streaming source checkpoint directory is missing. This metadata file contains important default options for the stream, so the stream..."
Hi dimoobraznii (Customer), this error comes up in streaming when someone changes the streaming checkpoint directory manually, or points one streaming type at the checkpoint of another streaming type. Please check if any changes were made to t...
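If the checkpoint metadata cannot be restored, the usual workaround is to restart the stream with a fresh checkpoint location (keeping in mind that Auto Loader will then treat the source as new and may reprocess files). A sketch using real Auto Loader (`cloudFiles`) option names but placeholder paths and table names:

```python
def autoloader_options(schema_location: str) -> dict:
    """Auto Loader options; option names are real, values are placeholders."""
    return {
        "cloudFiles.format": "json",
        "cloudFiles.schemaLocation": schema_location,
    }


# Sketch of restarting with a *new* checkpoint directory (not run here):
# (spark.readStream.format("cloudFiles")
#      .options(**autoloader_options("/mnt/meta/schema_v2"))
#      .load("/mnt/landing/")
#      .writeStream
#      .option("checkpointLocation", "/mnt/meta/checkpoint_v2")  # fresh path
#      .trigger(once=True)
#      .toTable("target_table"))
```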