10-30-2024 07:00 AM - edited 10-30-2024 07:01 AM
Background:
I've created a small function in a notebook that uses Splunk's splunk-sdk package. The original intention was to call Splunk to execute a search/query, but for the sake of simplicity while testing this issue, the function simply returns a property of the service object.
import splunklib.results as results

def get_useragent_string(hostname: str, srcports: int, destip: str, earliest: float):
    # `service` is the splunklib connection object created earlier in the notebook
    return "connected to %s" % service.info['host']
The function works fine when I call it from a Python cell in a notebook.
get_useragent_string(hostname="XXX", srcports=49738, destip="104.16.184.241", earliest=1730235533)
The function throws this exception when I call it from a SQL statement:
%sql
SELECT get_useragent_string('xxx', 49738, '104.16.184.241', 1730235533)
Any idea what is different between Python and SQL execution?
I think this is the salient part of the exception, but I don't understand why this would differ between SQL and Python execution.
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1472, in request
    scheme, host, port, path = _spliturl(url)
                               ^^^^^^^^^^^^^^
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1160, in _spliturl
    if host.startswith('[') and host.endswith(']'): host = host[1:-1]
       ^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'startswith'
The full exception is below:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 574.0 failed 4 times, most recent failure: Lost task 0.3 in stage 574.0 (TID 1914) (10.244.27.5 executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/root/.ipykernel/141000/command-3792261566929826-3142948785", line 17, in get_useragent_string
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/client.py", line 3199, in oneshot
return self.post(search=query,
^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/client.py", line 941, in post
return self.service.post(path, owner=owner, app=app, sharing=sharing, **query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 322, in wrapper
except HTTPError as he:
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 76, in new_f
val = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 816, in post
response = self.http.post(path, all_headers, **query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1315, in post
return self.request(url, message)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1338, in request
raise
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1472, in request
scheme, host, port, path = _spliturl(url)
^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1160, in _spliturl
if host.startswith('[') and host.endswith(']'): host = host[1:-1]
^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'startswith'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:551)
at org.apache.spark.sql.execution.python.BasePythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:115)
at org.apache.spark.sql.execution.python.BasePythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:98)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:507)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$5(UnsafeRowBatchUtils.scala:88)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$3(UnsafeRowBatchUtils.scala:88)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$1(UnsafeRowBatchUtils.scala:68)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$2(Collector.scala:214)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:211)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:199)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:161)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:134)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:155)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:102)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:1036)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:1039)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:926)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.$anonfun$failJobAndIndependentStages$1(DAGScheduler.scala:3998)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3996)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3910)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3897)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3897)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1758)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1741)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1741)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:4256)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4159)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:4145)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:55)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$runJob$1(DAGScheduler.scala:1404)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1392)
at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:3153)
at org.apache.spark.sql.execution.collect.Collector.$anonfun$runSparkJobs$1(Collector.scala:355)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:299)
at org.apache.spark.sql.execution.collect.Collector.$anonfun$collect$1(Collector.scala:384)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:381)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:122)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:131)
at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:94)
at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:90)
at org.apache.spark.sql.execution.qrc.InternalRowFormat$.collect(cachedSparkResults.scala:78)
at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$computeResult$1(ResultCacheManager.scala:552)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
at org.apache.spark.sql.execution.qrc.ResultCacheManager.collectResult$1(ResultCacheManager.scala:546)
at org.apache.spark.sql.execution.qrc.ResultCacheManager.$anonfun$computeResult$2(ResultCacheManager.scala:561)
at org.apache.spark.sql.execution.adaptive.ResultQueryStageExec.$anonfun$doMaterialize$1(QueryStageExec.scala:663)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1184)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$8(SQLExecution.scala:874)
at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$7(SQLExecution.scala:874)
at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$6(SQLExecution.scala:874)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$5(SQLExecution.scala:873)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$4(SQLExecution.scala:872)
at com.databricks.sql.transaction.tahoe.ConcurrencyHelpers$.withOptimisticTransaction(ConcurrencyHelpers.scala:57)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$3(SQLExecution.scala:871)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:195)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$2(SQLExecution.scala:870)
at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:97)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:855)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.$anonfun$run$1(SparkThreadLocalForwardingThreadPoolExecutor.scala:157)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.$anonfun$runWithCaptured$4(SparkThreadLocalForwardingThreadPoolExecutor.scala:113)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:112)
at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured$(SparkThreadLocalForwardingThreadPoolExecutor.scala:89)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:154)
at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.run(SparkThreadLocalForwardingThreadPoolExecutor.scala:157)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/root/.ipykernel/141000/command-3792261566929826-3142948785", line 17, in get_useragent_string
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/client.py", line 3199, in oneshot
return self.post(search=query,
^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/client.py", line 941, in post
return self.service.post(path, owner=owner, app=app, sharing=sharing, **query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 322, in wrapper
except HTTPError as he:
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 76, in new_f
val = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 816, in post
response = self.http.post(path, all_headers, **query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1315, in post
return self.request(url, message)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1338, in request
raise
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1472, in request
scheme, host, port, path = _spliturl(url)
^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-2b093fd4-a851-4495-b00b-d0a105a66366/lib/python3.11/site-packages/splunklib/binding.py", line 1160, in _spliturl
if host.startswith('[') and host.endswith(']'): host = host[1:-1]
^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'startswith'
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:551)
at org.apache.spark.sql.execution.python.BasePythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:115)
at org.apache.spark.sql.execution.python.BasePythonUDFRunner$$anon$2.read(PythonUDFRunner.scala:98)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:507)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:50)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$5(UnsafeRowBatchUtils.scala:88)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$3(UnsafeRowBatchUtils.scala:88)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.$anonfun$encodeUnsafeRows$1(UnsafeRowBatchUtils.scala:68)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:62)
at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$2(Collector.scala:214)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:211)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:199)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:161)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:134)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:155)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:102)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:1036)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:1039)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:926)
... 3 more
yesterday
When you run a Python function in a Python cell, it executes in the local Python environment of the notebook. However, when you call a Python function from a SQL cell, it runs as a UDF within the Spark execution environment.
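To see the difference concretely, here is a small probe (a sketch that assumes the usual spark session available in a Databricks notebook). The same function runs in the notebook's driver process when called directly, and inside a separate Python UDF worker when invoked from SQL:

import os

def where_am_i() -> str:
    # Hypothetical helper: report the id of the process it executes in
    return "pid=%d" % os.getpid()

print(where_am_i())                      # executes in the notebook's Python process
spark.udf.register("where_am_i", where_am_i, "string")
spark.sql("SELECT where_am_i()").show()  # executes inside a Spark Python UDF worker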
You need to register the function as a UDF explicitly if you want to use it from SQL cells. You can wrap it with pyspark.sql.functions.udf (called directly or used as a decorator) and then register the wrapped function with spark.udf.register:
from pyspark.sql.functions import udf

# Wrap the Python function as a UDF, then register it with Spark for SQL use
get_useragent_string_udf = udf(get_useragent_string, "string")
spark.udf.register("get_useragent_string", get_useragent_string_udf)
After registering the function as a UDF, you can call it from a SQL cell:
%sql
SELECT get_useragent_string('xxx', 49738, '104.16.184.241', 1730235533)
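If you prefer to skip the intermediate wrapper, spark.udf.register also accepts the plain Python function plus a return type; a sketch, assuming the function returns a string:

from pyspark.sql.types import StringType

# One-step alternative: wrap and register in a single call
spark.udf.register("get_useragent_string", get_useragent_string, StringType())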