ModuleNotFoundError when running foreachBatch on serverless compute
05-29-2024 09:52 PM
I am using notebooks to do some transformations.
I install a new wheel:
%pip install --force-reinstall /Workspace/<my_lib>.whl
%restart_python
Then I successfully import the installed library:
from my_lib.core import test
However, when I run my code with foreachBatch, it raises ModuleNotFoundError: No module named 'my_lib'.
This is my code:
from my_lib.utils import clogs
logs = clogs.logs()
def _test(df, b):
    logs.add_logs('test')
mystream = spark.readStream \
    .table('my_table') \
    .writeStream \
    .format("delta") \
    .foreachBatch(_test) \
    .trigger(once=True) \
    .start()
mystream.awaitTermination()
It raises an error: ModuleNotFoundError: No module named 'my_lib'.
Please help.
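For anyone trying to reproduce this, here is a minimal diagnostic sketch, not a fix. It assumes the error comes from the Python process that actually executes the foreachBatch function not seeing the wheel, which the thread does not confirm. The helper names `module_available`, `describe_environment` and `_debug_batch` are made up for illustration; only `importlib.util.find_spec` and `sys.executable` are standard-library calls.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the import system of the current process can locate `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # For dotted names find_spec imports the parent package first and
        # raises ModuleNotFoundError (an ImportError) if the parent is missing.
        return False


def describe_environment(name: str) -> str:
    """One-line report: which interpreter is running and whether `name` is importable."""
    return f"python={sys.executable} {name}_importable={module_available(name)}"


def _debug_batch(df, batch_id):
    # Hypothetical foreachBatch handler: fail loudly with the report so the
    # information shows up in the streaming query's error message.
    if not module_available("my_lib"):
        raise RuntimeError(describe_environment("my_lib"))
```

Calling `describe_environment("my_lib")` in a notebook cell and passing `_debug_batch` to `.foreachBatch(...)` would then show whether the two places resolve the library differently. This only narrows down where the module is missing; it does not explain why.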
06-02-2024 10:00 PM
Thanks, @Retired_mod, for your response.
Today I re-ran my job without any changes. It no longer raises the ModuleNotFoundError for my_lib that I mentioned above, but it now raises Access Denied on my S3 bucket. I don't see anywhere to set my IAM role or instance profile on serverless, as I did with provisioned compute.