Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

ForeachBatch() - Get results from batchDF._jdf.sparkSession().sql('merge stmt')

jm99
New Contributor III

Most Python examples show the structure of the foreachBatch function as:

def foreachBatchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')

    (
        batchDF
            ._jdf.sparkSession()
            .sql(
                """
                <<  merge statement  >>
                """
            )
    )
._jdf.sparkSession().sql() returns a Java object, not a PySpark DataFrame.

How do you get access to the result DataFrame containing the affected, inserted, updated, and deleted row counts?
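
For context, here is a minimal sketch of how such a function is typically attached to a streaming query. The source path, checkpoint location, and variable names below are hypothetical placeholders, not from the original post:

streamingDF = (
    spark.readStream
        .format("delta")
        .load("/path/to/source_table")   # hypothetical source path
)

query = (
    streamingDF.writeStream
        .foreachBatch(foreachBatchFunc)  # called once per micro-batch with (batchDF, batchId)
        .option("checkpointLocation", "/path/to/checkpoint")  # hypothetical checkpoint path
        .start()
)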

1 ACCEPTED SOLUTION

Accepted Solutions

jm99
New Contributor III

Just found a solution...

You need to convert the Java DataFrame (jdf) into a PySpark DataFrame:

from pyspark import sql

def batchFunc(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')
    sparkSession = batchDF._jdf.sparkSession()

    # Returns a Java DataFrame (a py4j JavaObject), not a PySpark DataFrame
    resJdf = sparkSession.sql('merge statement')

    # Wrap the Java DataFrame in a PySpark DataFrame
    resultDf = sql.DataFrame(resJdf, batchDF.sql_ctx)
    firstRow = resultDf.first()

    insertedRowCount = 0
    updatedRowCount = 0
    deletedRowCount = 0
    affectedRowCount = 0

    if firstRow:
        if 'num_affected_rows' in resultDf.columns:
            affectedRowCount += firstRow['num_affected_rows']
        if 'num_inserted_rows' in resultDf.columns:
            insertedRowCount += firstRow['num_inserted_rows']
        if 'num_updated_rows' in resultDf.columns:
            updatedRowCount += firstRow['num_updated_rows']
        if 'num_deleted_rows' in resultDf.columns:
            deletedRowCount += firstRow['num_deleted_rows']

    # do what you want with the counts!
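
As a side note, on newer PySpark versions (3.3 and later, including recent Databricks runtimes) DataFrame.sparkSession is exposed as a Python property, so the _jdf wrapping can be avoided entirely. A minimal sketch under that assumption, reusing the same temp view and merge placeholder as above:

def batchFuncModern(batchDF, batchId):
    batchDF.createOrReplaceTempView('viewName')

    # batchDF.sparkSession is the PySpark SparkSession (PySpark 3.3+),
    # so .sql() returns a regular PySpark DataFrame directly.
    resultDf = batchDF.sparkSession.sql('merge statement')

    firstRow = resultDf.first()
    affectedRowCount = 0
    if firstRow and 'num_affected_rows' in resultDf.columns:
        affectedRowCount = firstRow['num_affected_rows']
    # handle the other metric columns the same way as above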


