Job aborted due to stage failure: Task 1863 in stage 10.0 failed 4 times, most recent failure: Lost task 1863.3 in stage 10.0 (TID 2021) (10.0.4.7 executor 2): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed): Fatal Python erro

Ancil
Contributor II

I am getting the below error sometimes when I run my Databricks notebook from ADF.

If there is one executor node it works fine; if it increases to 2 or more, it sometimes fails on the same data.

Cluster detail: Driver: Standard_F4s_v2 · Workers: Standard_F4s_v2 · 1-8 workers · 11.2 (includes Apache Spark 3.3.0, Scala 2.12)

File "/databricks/python/lib/python3.9/site-packages/Levenshtein/__init__.py", line 343 in opcodes

 File "/databricks/python/lib/python3.9/site-packages/fuzzywuzzy/StringMatcher.py", line 45 in get_opcodes

 File "/databricks/python/lib/python3.9/site-packages/fuzzywuzzy/StringMatcher.py", line 58 in get_matching_blocks

 File "/databricks/python/lib/python3.9/site-packages/fuzzywuzzy/fuzz.py", line 47 in partial_ratio

 File "/databricks/python/lib/python3.9/site-packages/fuzzywuzzy/utils.py", line 47 in decorator

 File "/databricks/python/lib/python3.9/site-packages/fuzzywuzzy/utils.py", line 29 in decorator

 File "/databricks/python/lib/python3.9/site-packages/fuzzywuzzy/utils.py", line 38 in decorator

 File "/databricks/python/lib/python3.9/site-packages/my_package/my_function.py", line 30 in scrap_url

 File "/databricks/python/lib/python3.9/site-packages/my_package/my_function.py", line 124 in my_function

 File "<command-1514877556254536>", line 20 in my_function

 File "<command-1514877556254534>", line 7 in my_function_01

 File "/databricks/spark/python/pyspark/util.py", line 84 in wrapper

 File "/databricks/spark/python/pyspark/worker.py", line 130 in <lambda>

 File "/databricks/spark/python/pyspark/worker.py", line 591 in mapper

 File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 384 in init_stream_yield_batches

 File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 91 in dump_stream

 File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 391 in dump_stream

 File "/databricks/spark/python/pyspark/worker.py", line 885 in process

 File "/databricks/spark/python/pyspark/worker.py", line 893 in main

 File "/databricks/spark/python/pyspark/daemon.py", line 79 in worker

 File "/databricks/spark/python/pyspark/daemon.py", line 204 in manager

 File "/databricks/spark/python/pyspark/daemon.py", line 229 in <module>

 File "/usr/lib/python3.9/runpy.py", line 87 in _run_code

 File "/usr/lib/python3.9/runpy.py", line 197 in _run_module_as_main

Can anyone please help me with this? Is it a time-lag issue with a task?

Sometimes it works and sometimes it doesn't.

If I don't add the "Levenshtein" package, 3 tasks take 2 hours to complete.
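For context, the stack trace shows fuzzywuzzy's partial_ratio being invoked from inside a pandas UDF (note the pyspark/sql/pandas/serializers.py frames). A minimal sketch of that pattern follows; the column names, sample data, and null guard are assumptions for illustration, not the original code.

import pandas as pd
from fuzzywuzzy import fuzz
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import IntegerType

# Example input; "spark" is the session provided by the Databricks notebook.
df = spark.createDataFrame([("apple inc", "apple"), ("banana", None)],
                           ["name_a", "name_b"])

@pandas_udf(IntegerType())
def partial_ratio_udf(a: pd.Series, b: pd.Series) -> pd.Series:
    # fuzz.partial_ratio expects strings; coercing None to "" avoids feeding
    # bad input to the C extension, one plausible trigger for a hard worker
    # crash that only appears on some partitions of the data.
    return pd.Series([fuzz.partial_ratio(x or "", y or "")
                      for x, y in zip(a, b)])

scored = df.withColumn("score", partial_ratio_udf("name_a", "name_b"))

On the runtime point: without python-Levenshtein installed, fuzzywuzzy falls back to the pure-Python difflib.SequenceMatcher, which is far slower and is consistent with the 2-hour runtime observed without the package.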

8 REPLIES

Ancil
Contributor II

@Kaniz Fatma Please help me with this.

FelixLe
New Contributor II

My solution was updating Python. Updating to Python 3.10.9 solved my problem, which occurred when I tried to use SparkTrials() in hyperopt's fmin().

Error: org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
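For reference, the crash in that case surfaced through hyperopt's SparkTrials, which runs each trial as a Spark task on the executors. A minimal toy sketch of that pattern (the quadratic objective is an assumption for illustration):

from hyperopt import fmin, tpe, hp, SparkTrials

def objective(x):
    # Toy loss; a real objective would train and score a model here.
    return (x - 3) ** 2

# Each trial executes as a Spark task, so a broken executor-side Python
# environment shows up as "Python worker exited unexpectedly".
trials = SparkTrials(parallelism=2)

best = fmin(
    fn=objective,
    space=hp.uniform("x", -10, 10),
    algo=tpe.suggest,
    max_evals=20,
    trials=trials,
)
print(best)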

Anonymous
Not applicable

Hi @Ancil P A

Hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

The issue is not resolved. I trimmed the data length to 1 crore (10 million) records per run, and then it works.
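A hedged sketch of that workaround: cap the number of records fed to the fuzzy-matching stage per run. The table name is a hypothetical stand-in, and the UDF is the one from the sketch earlier in the thread.

BATCH_SIZE = 10_000_000  # roughly 1 crore records per run

batch = spark.table("source_table").limit(BATCH_SIZE)
scored = batch.withColumn("score", partial_ratio_udf("name_a", "name_b"))

Processing in bounded batches reduces per-task memory pressure, which is one common reason a Python worker gets killed and reported as "exited unexpectedly".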

Anonymous
Not applicable

Hi @Ancil P A

Help us build a vibrant and resourceful community by recognizing and highlighting insightful contributions. Mark the best answers and show your appreciation!

swethaNandan
New Contributor III

Hi @Ancil P A

Can you paste the complete stack trace from the failed task (from failed stage 10.0) and the code snippet that you are trying to run in the notebook? Also, do you think you can raise a Databricks support ticket for the same?

I have pasted all the logs available in Databricks.

Hi @Swetha Nandajan

Please find the full error log. I have a job running every hour; my notebook worked for 20 runs, and after that I am getting the below error.

I am creating a new job cluster for every run from ADF.

Cluster details: Driver: Standard_F32s_v2 · Workers: Standard_F32s_v2 · 1 worker · 11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12).

I tried the data that caused the error in my QA environment and it works, but in the testing environment I am getting the below error after 20 runs.

Error: attached as a file.

Please help me.
