Does anyone know how to use the Python logger in a Databricks Python job on serverless?
08-15-2024 02:55 AM
I'm trying to use a proper logging framework in Databricks jobs instead of print. I'm doing this by getting a Log4j logger through the JVM with spark._jvm.org.apache.log4j.LogManager.getLogger(__name__); however, because I'm running this on serverless, I get the following error:
[JVM_ATTRIBUTE_NOT_SUPPORTED] Directly accessing the underlying Spark driver JVM using the attribute '_jvm' is not supported on serverless compute. If you require direct access to these fields, consider using a single-user cluster. For more details on compatibility and limitations, check: https://learn.microsoft.com/azure/databricks/release-notes/serverless.html#limitations
---------------------------------------------------------------------------
08-17-2024 09:41 PM
Yes, direct access to the JVM is not allowed on Spark Connect or serverless compute. You can, however, use the Python logging framework to log to an output stream handler (or any other Handler):
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
stream_handler = logging.StreamHandler()
stream_handler.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
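A minimal sketch building on the snippet above. One detail that may matter here: `logging.StreamHandler()` with no argument writes to `sys.stderr`, while `print` writes to `sys.stdout`. Whether serverless job runs surface stderr the same way is not confirmed in this thread, so pointing the handler at `sys.stdout` is an assumption worth testing. The helper name `get_job_logger` is hypothetical; it also avoids attaching a second handler when a cell or task re-runs, which would otherwise print every message twice.

```python
import logging
import sys


def get_job_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Hypothetical helper: return a named logger that writes to stdout.

    Safe to call repeatedly; the handler is attached only the first time.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Re-running a notebook cell would otherwise add another handler
    # and duplicate every log line.
    if not logger.handlers:
        # Assumption: stdout is where the job run output shows print() text.
        handler = logging.StreamHandler(sys.stdout)
        handler.setLevel(level)
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    # Do not also pass records up to any root-logger handlers.
    logger.propagate = False
    return logger


logger = get_job_logger(__name__)
logger.info("task started")
```

Using a named logger rather than the root logger keeps third-party library records out of this handler, and `propagate = False` prevents the same record from being printed again by any handler the runtime may have installed on the root logger.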
08-20-2024 02:59 AM
I did the same thing; however, the logs don't show up in the output of the task runs, so I took a different approach with a similar structure:

import logging


class LoggerBuilder:
    def __init__(self, log_level: int = logging.INFO) -> None:
        # Configure the root logger with a handler that prints each record
        self.logger = logging.getLogger()
        self.logger.setLevel(log_level)
        formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        console_handler = PrintHandler()
        console_handler.setLevel(log_level)
        console_handler.setFormatter(formatter)
        self.logger.addHandler(console_handler)

    def get_logger(self) -> logging.Logger:
        return self.logger


class PrintHandler(logging.Handler):
    """Logging handler that writes records with print() instead of a stream."""

    def __init__(self) -> None:
        super().__init__()

    def emit(self, record: logging.LogRecord) -> None:
        # Apply the formatter configured in LoggerBuilder
        print(self.format(record))
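One thing that may be relevant to the asyncio crashes reported below (an assumption, not confirmed in this thread): `LoggerBuilder` attaches its handler to the root logger via `logging.getLogger()`, so it also receives records from every library logger that propagates to root, including `asyncio` and anything the notebook runtime itself logs. A sketch that keeps the print-based idea but scopes it to one named logger might look like this; it is a variant of the `PrintHandler` above, and `build_logger` and the logger name `my_job` are hypothetical.

```python
import logging


class PrintHandler(logging.Handler):
    """Send each formatted record to stdout via print()."""

    def emit(self, record: logging.LogRecord) -> None:
        try:
            print(self.format(record), flush=True)
        except Exception:
            # Standard Handler behavior: report the failure instead of raising.
            self.handleError(record)


def build_logger(name: str = "my_job", level: int = logging.INFO) -> logging.Logger:
    """Hypothetical helper: a named (non-root) logger using PrintHandler."""
    logger = logging.getLogger(name)  # named logger, not the root logger
    logger.setLevel(level)
    # Attach the handler only once, even if this is called again.
    if not any(isinstance(h, PrintHandler) for h in logger.handlers):
        handler = PrintHandler(level)
        handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    logger.propagate = False  # don't forward records to root handlers
    return logger


logger = build_logger()
logger.info("hello from the job")
```

Because nothing is added to the root logger, records from `asyncio`, `py4j`, or other libraries are left to whatever handlers the runtime already has, and only messages logged through `my_job` go through `print`.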
03-05-2025 09:00 AM
I get asyncio errors, and the notebook/Python process crashes with @mo_moattar's approach. This is something Databricks needs to provide guidance on; I am very unsure how to do logging on serverless.