06-04-2025 10:31 PM
Hi EndreM,
How are you doing today? From what you've described, the error might be caused by a mismatch between the Python version of your Databricks Connect client (if you're running from something like PyCharm or VS Code) and the Python version of the Databricks runtime on the cluster. That may seem strange since the code executes on Databricks, but .collect() and similar actions run through Spark Connect can sometimes fail when the two environments don't match.

A simple workaround is to run the same code directly in a Databricks notebook instead of through your IDE, which ensures that the client and cluster environments are aligned.

Also, instead of .limit(1).collect(), try .head(1) or .sample(False, 0.0001).take(1), which can be lighter and may avoid scanning the whole dataset.

Let me know if you'd like help reviewing your streaming setup or tuning cluster memory. Happy to help you sort it out!
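If you want to rule the version mismatch in or out before changing anything else, a quick check like the one below might help. It is only a rough sketch: the helper names `major_minor` and `python_versions_match` are made up for illustration, and the cluster version string is a placeholder that you would replace with the Python version listed for your cluster's Databricks Runtime. It assumes that comparing major and minor versions is enough, which is my understanding of what Databricks Connect expects.

```python
import sys


def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '3.11.4'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def python_versions_match(client_version: str, cluster_version: str) -> bool:
    """True when both versions share the same major and minor numbers."""
    return major_minor(client_version) == major_minor(cluster_version)


# Python version of the local interpreter running your IDE / Databricks Connect client.
local_version = "{}.{}.{}".format(*sys.version_info[:3])

# Placeholder (assumption): replace with the Python version of your cluster's runtime.
cluster_version = "3.11.4"

if python_versions_match(local_version, cluster_version):
    print(f"OK: client {local_version} matches cluster {cluster_version} (major.minor)")
else:
    print(f"Mismatch: client {local_version} vs cluster {cluster_version}")
```

If this reports a mismatch, creating a local virtual environment with the cluster's Python minor version would be the first thing I'd try.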
Regards,
Brahma