cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

alejandrofm
by Valued Contributor
  • 4507 Views
  • 7 replies
  • 9 kudos

Resolved! Pandas.spark.checkpoint() doesn't broke lineage

Hi, I'm doing some something simple on Databricks notebook:spark.sparkContext.setCheckpointDir("/tmp/")   import pyspark.pandas as ps   sql=("""select field1, field2 From table Where date>='2021-01.01""")   df = ps.sql(sql) df.spark.checkpoint()That...

  • 4507 Views
  • 7 replies
  • 9 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 9 kudos

If you need checkpointing, please try the below code. Thanks to persist, you will avoid reprocessing:df = ps.sql(sql).persist() df.spark.checkpoint()

  • 9 kudos
6 More Replies
RohanB
by New Contributor III
  • 3959 Views
  • 8 replies
  • 3 kudos

Resolved! Spark Streaming - Checkpoint State EOF Exception

I have a Spark Structured Streaming job which reads from 2 Delta tables in streams , processes the data and then writes to a 3rd Delta table. The job is being run with the Databricks service on GCP.Sometimes the job fails with the following exception...

  • 3959 Views
  • 8 replies
  • 3 kudos
Latest Reply
RohanB
New Contributor III
  • 3 kudos

Hi @Jose Gonzalez​ ,Do you require any more information regarding the code? Any idea what could be cause for the issue?Thanks and Regards,Rohan

  • 3 kudos
7 More Replies
Labels