07-20-2025 06:41 AM
Failed to create Python data source instance: Traceback (most recent call last):
File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
return self.loads(obj)
^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 617, in loads
return cloudpickle.loads(obj, encoding=encoding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'src'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/sql/worker/create_data_source.py", line 82, in main
data_source_cls = read_command(pickleSer, infile)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/worker_util.py", line 71, in read_command
command = serializer._read_with_length(file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 196, in _read_with_length
raise SerializationError("Caused by " + traceback.format_exc())
pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
File "/databricks/spark/python/pyspark/serializers.py", line 192, in _read_with_length
return self.loads(obj)
^^^^^^^^^^^^^^^
File "/databricks/spark/python/pyspark/serializers.py", line 617, in loads
return cloudpickle.loads(obj, encoding=encoding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@hiryucodes were you ever able to come up with a solution to this issue? I've been writing a custom data source/reader and trying to validate it within a notebook. I get the same error when I try to load the source by importing it from the asset bundle.
That said, I can make it work by defining everything in place. I'm not sure the issue is the Spark context; it looks more like an issue with how cloudpickle serializes the class (which is then promptly reloaded when the task is executed).
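
To illustrate what I suspect is happening, here is a minimal sketch that uses the standard-library `pickle` as a stand-in for cloudpickle. As far as I understand, cloudpickle pickles a class from an importable module by reference (module name plus class name), while a class defined directly in the notebook is pickled by value. That would explain why defining everything in place works. The package name `demo_src` and the class `MyDataSource` are hypothetical stand-ins for the bundle's `src` package, and the "driver"/"worker" split is only simulated by editing `sys.path`.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def simulate_worker_unpickle() -> str:
    """Pickle a class by reference, then unpickle where its module is missing."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical package standing in for the bundle's `src` directory.
        pkg = Path(tmp) / "demo_src"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("class MyDataSource:\n    pass\n")

        # "Driver": the package is importable, so the class is pickled
        # as a reference (module name + class name), not as code.
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        module = importlib.import_module("demo_src")
        payload = pickle.dumps(module.MyDataSource)

        # "Worker": same bytes, but the package is not on sys.path.
        sys.path.remove(tmp)
        del sys.modules["demo_src"]
        importlib.invalidate_caches()
        try:
            pickle.loads(payload)
        except ModuleNotFoundError as exc:
            return str(exc)
        return "loaded"


result = simulate_worker_unpickle()
print(result)
```

If this is the cause, the fix would be to make the package importable where the data source is created (for example by installing it as a wheel on the cluster). Another option that may be worth trying is `cloudpickle.register_pickle_by_value(module)`, which asks cloudpickle to ship the module's code by value. I have not verified either approach on Databricks.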