02-28-2026 10:53 AM
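Hey @Kirankumarbs!
The function you pass to applyInPandas runs in separate Python worker processes on the executors, so your driver-side logging.basicConfig(...) doesn't apply there. In a fresh worker the root logger typically has no handlers and keeps its default WARNING level, so logger.info() records are filtered out before they reach any output. print(..., flush=True) works because the worker's stdout is captured and wired through.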
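To see the effect outside Spark, here is a minimal sketch that runs a snippet in a brand-new Python process, standing in for a worker that never saw the driver's basicConfig. A plain subprocess is only an analogy (it is not how Spark launches its workers), and `WORKER_CODE` and `run_fresh_worker()` are hypothetical names for illustration.

```python
import subprocess
import sys

# Code for a brand-new interpreter: a stand-in for a Python worker that
# never ran the driver's logging.basicConfig(...) call.
WORKER_CODE = """
import logging
logger = logging.getLogger("my.udf")
print("print reaches stdout", flush=True)
logger.info("info is dropped: no handlers, root level is WARNING")
logger.warning("warning reaches stderr via logging.lastResort")
"""


def run_fresh_worker():
    """Run WORKER_CODE in a separate process and return (stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    out, err = run_fresh_worker()
    print("stdout:", out)
    print("stderr:", err)
```

The print line shows up on stdout and the warning on stderr (via Python's last-resort handler), while the INFO message appears nowhere, which matches the print versus logger.info() behavior you saw.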
1. Best if you’re on DBR 18.x+: Python Worker Logs
If you can run DBR 18.0+ classic or 18.2+ shared/serverless, just enable Python worker logging and use normal logging in the UDF:
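```python
# Enable Python worker logging (DBR 18.0+ classic, 18.2+ shared/serverless)
spark.conf.set("spark.sql.pyspark.worker.logging.enabled", "true")

def my_group_fn(pdf):
    # Runs inside the Python worker; standard logging calls are captured
    import logging
    logger = logging.getLogger("my.udf")
    logger.setLevel(logging.INFO)
    logger.info(f"group size: {len(pdf)}")
    out_pdf = pdf  # placeholder: replace with your per-group transformation
    return out_pdf

# After the job has run, query the captured worker logs
logs = spark.tvf.python_worker_logs()
logs.filter("logger = 'my.udf'").select("level", "msg", "context").show(truncate=False)
```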
2. Works everywhere: configure logging inside the UDF
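Your setup_worker_logging() approach is a solid production pattern: because it is called inside the function, the configuration happens in each worker process, and the handler writes to sys.stdout, the same stream that print() already reaches. force=True (Python 3.8+) removes any handlers already on the root logger first, so the call takes effect even if something in the worker configured logging earlier.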
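```python
import logging
import sys

def setup_worker_logging(level=logging.INFO):
    # Configure the root logger in this worker process, writing to stdout
    logging.basicConfig(
        level=level,
        format='[%(levelname)s] [%(asctime)s] %(message)s',
        handlers=[logging.StreamHandler(sys.stdout)],
        force=True,  # drop any existing root handlers first (Python 3.8+)
    )

def grouped_fn(pdf):
    setup_worker_logging()
    logger = logging.getLogger(__name__)
    logger.info("visible now")
    out_pdf = pdf  # placeholder: replace with your per-group transformation
    return out_pdf
```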
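A quick way to sanity-check this pattern without a cluster is to run it in a fresh Python process and feed it a few fake groups. This is a sketch under assumptions: plain lists stand in for the pandas DataFrames that applyInPandas passes in, a subprocess stands in for a Spark worker, and `WORKER_CODE` and `run_worker()` are hypothetical names.

```python
import subprocess
import sys

# Script for a brand-new interpreter (a stand-in for a Spark Python worker).
# It reuses the setup_worker_logging() pattern; "groups" are plain lists here
# instead of pandas DataFrames so the sketch needs only the standard library.
WORKER_CODE = """
import logging
import sys

def setup_worker_logging(level=logging.INFO):
    logging.basicConfig(
        level=level,
        format='[%(levelname)s] [%(asctime)s] %(message)s',
        handlers=[logging.StreamHandler(sys.stdout)],
        force=True,
    )

def grouped_fn(rows):
    setup_worker_logging()
    logger = logging.getLogger(__name__)
    logger.info(f"group size: {len(rows)}")
    return rows

for group in ([1, 2], [3], [4, 5, 6]):
    grouped_fn(group)
"""


def run_worker():
    """Run WORKER_CODE in a separate process and return its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_worker():
        print(line)
```

Because force=True removes the old root handler before adding a new one, calling setup_worker_logging() once per group still leaves exactly one handler, so each group produces a single [INFO] line on stdout.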
I hope this helps!
Sarah