sarahbhord
Databricks Employee

Hey @Kirankumarbs !

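The function you pass to applyInPandas runs in separate Python worker processes on the executors, so your driver-side logging.basicConfig(...) never takes effect there. That's why print(..., flush=True) shows up (stdout from the worker is wired through) but logger.info() doesn't: with no handler configured in the worker, Python's logging falls back to its last-resort handler, which only emits WARNING and above.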
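If you want to see that behaviour without a cluster, here is a minimal sketch. It uses a fresh Python interpreter as a stand-in for a worker process that never saw your logging config. That is only an analogy (Spark launches its workers differently), and run_unconfigured_worker and WORKER_CODE are names I made up for the illustration:

```python
import subprocess
import sys

# Code executed in a brand-new interpreter, standing in for a worker
# process that never ran the driver's logging.basicConfig(...) call.
WORKER_CODE = """
import logging
logger = logging.getLogger("my.udf")
logger.setLevel(logging.INFO)
logger.info("from logger")        # no handler configured in this process
print("from print", flush=True)   # plain stdout
"""

def run_unconfigured_worker():
    """Run WORKER_CODE in a separate process and return (stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout, result.stderr

if __name__ == "__main__":
    out, err = run_unconfigured_worker()
    print("stdout:", repr(out))
    print("stderr:", repr(err))
```

The print line comes back on stdout, while the INFO record appears on neither stream, even though the logger's own level is INFO. The level was never the problem; the missing handler is.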

1. Best if you’re on DBR 18.x+: Python Worker Logs

If you can run DBR 18.0+ classic or 18.2+ shared/serverless, just enable Python worker logging and use normal logging in the UDF:

 
spark.conf.set("spark.sql.pyspark.worker.logging.enabled", "true")

def my_group_fn(pdf):
    import logging
    logger = logging.getLogger("my.udf")
    logger.setLevel(logging.INFO)
    logger.info(f"group size: {len(pdf)}")
    ...
    return out_pdf

# after the job
logs = spark.tvf.python_worker_logs()
logs.filter("logger = 'my.udf'").select("level", "msg", "context").show(truncate=False)

2. Works everywhere: configure logging inside the UDF

Your setup_worker_logging() approach is a solid production pattern:

import logging, sys

def setup_worker_logging(level=logging.INFO):
    logging.basicConfig(
        level=level,
        format='[%(levelname)s] [%(asctime)s] %(message)s',
        handlers=[logging.StreamHandler(sys.stdout)],
        force=True,
    )

def grouped_fn(pdf):
    setup_worker_logging()
    logger = logging.getLogger(__name__)
    logger.info("visible now")
    ...
    return out_pdf
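One optional refinement, as a sketch rather than a recommendation: since the function runs once per group, you can skip the reconfiguration when the current worker process has already been set up. This assumes Python workers are reused across tasks (which I believe is the default, but check your cluster config). HANDLER_NAME and setup_worker_logging_once are hypothetical names, and the guard looks at the root logger itself rather than a module-level flag, because logging state lives in the worker process:

```python
import logging
import sys

HANDLER_NAME = "udf-stdout"  # hypothetical marker for "our" handler

def setup_worker_logging_once(level=logging.INFO):
    """Configure root logging to stdout unless this process already did."""
    root = logging.getLogger()
    if any(h.get_name() == HANDLER_NAME for h in root.handlers):
        return  # already configured in this worker process
    handler = logging.StreamHandler(sys.stdout)
    handler.set_name(HANDLER_NAME)
    logging.basicConfig(
        level=level,
        format="[%(levelname)s] [%(asctime)s] %(message)s",
        handlers=[handler],
        force=True,  # drop whatever handlers were installed before
    )

def grouped_fn(pdf):
    # pdf is a pandas DataFrame in applyInPandas; the sketch only needs len().
    setup_worker_logging_once()
    logger = logging.getLogger("my.udf")
    logger.info("group size: %d", len(pdf))
    return pdf
```

Calling grouped_fn repeatedly in the same process leaves exactly one handler named udf-stdout on the root logger, so you don't tear down and rebuild handlers for every group.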

I hope this helps!

Sarah