Databricks Community

jm99 · ‎01-13-2023

Most Python examples show the structure of the foreachBatch function as:

def foreachBatchFunc(batchDF, batchId):
    # Register the micro-batch so the merge statement can refer to it by name
    batchDF.createOrReplaceTempView('viewName')

    (
        batchDF
            ._jdf.sparkSession()
            .sql(
                """
                << merge statement >>
                """
            )
    )

._jdf.sparkSession().sql() returns a Java object, not a PySpark DataFrame.

How do I get access to the result DataFrame containing the (affected, inserted, updated, deleted) row counts?

jm99 · ‎01-13-2023

Just found a solution...

You need to wrap the Java DataFrame (jdf) in a PySpark DataFrame:

from pyspark import sql

def batchFunc(batchDF, batchId):
    # Register the micro-batch so the merge statement can refer to it by name
    batchDF.createOrReplaceTempView('viewName')
    sparkSession = batchDF._jdf.sparkSession()

    # sql() on the Java session returns a Java DataFrame (a py4j object)
    resJdf = sparkSession.sql('merge statement')

    # Wrap the Java DataFrame so it can be used as a regular PySpark DataFrame
    resultDf = sql.DataFrame(resJdf, batchDF.sql_ctx)
    firstRow = resultDf.first()

    insertedRowCount = 0
    updatedRowCount = 0
    deletedRowCount = 0
    affectedRowCount = 0

    # Only read the metric columns that the merge result actually contains
    if firstRow:
        if 'num_affected_rows' in resultDf.columns:
            affectedRowCount += firstRow['num_affected_rows']
        if 'num_inserted_rows' in resultDf.columns:
            insertedRowCount += firstRow['num_inserted_rows']
        if 'num_updated_rows' in resultDf.columns:
            updatedRowCount += firstRow['num_updated_rows']
        if 'num_deleted_rows' in resultDf.columns:
            deletedRowCount += firstRow['num_deleted_rows']

    # do what you want with the counts!
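
For anyone landing here later: a possible variation on the solution above, as a rough sketch rather than a definitive implementation. It assumes PySpark 3.3 or later, where a DataFrame exposes a sparkSession property that returns a Python SparkSession, so sql() gives back a PySpark DataFrame without the manual wrapping step. The names MERGE_METRICS, extract_merge_counts and run_merge_with_counts are made up for illustration, and the metric column names are the ones used in the solution above.

```python
# Hypothetical helper names; the metric columns are those from the post above.
MERGE_METRICS = (
    "num_affected_rows",
    "num_inserted_rows",
    "num_updated_rows",
    "num_deleted_rows",
)


def extract_merge_counts(columns, first_row):
    """Return a dict of merge metrics, using 0 for anything missing."""
    counts = dict.fromkeys(MERGE_METRICS, 0)
    if first_row is None:  # DataFrame.first() returns None for an empty result
        return counts
    for name in MERGE_METRICS:
        if name in columns:
            counts[name] = first_row[name] or 0  # treat a null metric as 0
    return counts


def run_merge_with_counts(batch_df, merge_sql, view_name="viewName"):
    """Run the merge against the micro-batch and return the row counts.

    Assumption: PySpark 3.3+, where batch_df.sparkSession is available.
    """
    batch_df.createOrReplaceTempView(view_name)
    result_df = batch_df.sparkSession.sql(merge_sql)
    return extract_merge_counts(result_df.columns, result_df.first())
```

Inside a foreachBatch function this would be called as counts = run_merge_with_counts(batchDF, 'merge statement'). On older runtimes, extract_merge_counts(resultDf.columns, resultDf.first()) can still be applied to the wrapped resultDf from the solution above.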

ForeachBatch() - Get results from batchDF._jdf.sparkSession().sql('merge stmt')
