Data Engineering

Forum Posts

alejandrofm
by Valued Contributor
  • 2877 Views
  • 7 replies
  • 8 kudos

Resolved! Pandas.spark.checkpoint() doesn't break lineage

Hi, I'm doing something simple in a Databricks notebook:

    spark.sparkContext.setCheckpointDir("/tmp/")
    import pyspark.pandas as ps
    sql = ("""select field1, field2 From table Where date>='2021-01-01'""")
    df = ps.sql(sql)
    df.spark.checkpoint()

That...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 8 kudos

If you need checkpointing, please try the code below. Thanks to persist, you will avoid reprocessing:

    df = ps.sql(sql).persist()
    df.spark.checkpoint()
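The persist-then-checkpoint idea in this reply can be sketched as a small helper. This is only an illustration under stated assumptions, not the poster's exact code: `persist_then_checkpoint` and `example` are hypothetical names, the sample data is a stand-in for the SQL query, and the sketch uses the `spark.persist()` accessor form from the pandas API on Spark. One point worth showing is that `spark.checkpoint()` returns a new DataFrame rather than changing the one it is called on, so the result has to be assigned.

```python
def persist_then_checkpoint(psdf, eager=True):
    """Cache a pandas-on-Spark DataFrame, then return a checkpointed copy.

    `psdf` is assumed to be a pyspark.pandas DataFrame (hypothetical helper).
    """
    # Persist first so writing the checkpoint can reuse the computed data
    # instead of running the query a second time.
    cached = psdf.spark.persist()
    # checkpoint() does not modify its input; it returns a new DataFrame.
    return cached.spark.checkpoint(eager=eager)


def example():
    # Illustrative only: needs a running Spark session, e.g. a Databricks notebook.
    import pyspark.pandas as ps
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/")  # directory used in the question
    df = ps.DataFrame({"field1": [1, 2], "field2": [3, 4]})  # stand-in for ps.sql(sql)
    df = persist_then_checkpoint(df)  # rebind the name to the checkpointed frame
    return df
```

Rebinding `df` to the returned value is the design choice that matters here: if the return value is discarded, later operations keep using the original DataFrame.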

6 More Replies
RohanB
by New Contributor III
  • 2557 Views
  • 8 replies
  • 3 kudos

Resolved! Spark Streaming - Checkpoint State EOF Exception

I have a Spark Structured Streaming job which reads from 2 Delta tables as streams, processes the data, and then writes to a 3rd Delta table. The job is being run with the Databricks service on GCP. Sometimes the job fails with the following exception...

Latest Reply
RohanB
New Contributor III
  • 3 kudos

Hi @Jose Gonzalez, do you require any more information regarding the code? Any idea what could be the cause of the issue? Thanks and regards, Rohan

7 More Replies