Data Engineering

Forum Posts

alejandrofm
by Valued Contributor
  • 2878 Views
  • 7 replies
  • 8 kudos

Resolved! Pandas.spark.checkpoint() doesn't break lineage

Hi, I'm doing something simple in a Databricks notebook:
    spark.sparkContext.setCheckpointDir("/tmp/")
    import pyspark.pandas as ps
    sql = ("""select field1, field2 From table Where date>='2021-01-01'""")
    df = ps.sql(sql)
    df.spark.checkpoint()
That...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 8 kudos

If you need checkpointing, please try the code below. Thanks to persist, you will avoid reprocessing:
    df = ps.sql(sql).persist()
    df.spark.checkpoint()
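For reference, a minimal sketch of the persist-then-checkpoint pattern from this reply. It assumes pandas-on-Spark exposes both calls through the `spark` accessor (`psdf.spark.persist()` and `psdf.spark.checkpoint()`) and that a checkpoint directory has already been set. The helper names `persist_then_checkpoint` and `example_on_cluster` are hypothetical. One thing worth noting: `checkpoint()` returns a new DataFrame, so the result has to be kept; the frame it was called on still carries its original plan.

```python
def persist_then_checkpoint(psdf):
    """Cache a pandas-on-Spark DataFrame, then return its checkpointed copy.

    Assumes `psdf` is a pyspark.pandas DataFrame and that
    SparkContext.setCheckpointDir() has already been called.
    """
    # Cache first so the checkpoint job can reuse the computed partitions
    # instead of re-running the query.
    cached = psdf.spark.persist()
    # checkpoint() does not modify `cached` in place; it returns a new frame.
    return cached.spark.checkpoint()


def example_on_cluster(sql):
    """Usage sketch for a Spark cluster; not called here."""
    import pyspark.pandas as ps
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/")
    df = ps.sql(sql)
    # Rebind the name so later steps work on the checkpointed frame.
    return persist_then_checkpoint(df)
```

Comparing `df.spark.explain()` before and after the call should show whether the plan was actually truncated.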

6 More Replies
Maverick1
by Valued Contributor II
  • 2113 Views
  • 10 replies
  • 9 kudos

Resolved! Lineage between model and source code breaks on movement of source notebook. How to rectify it?

If there is a registered model and it is linked with a notebook, then the lineage breaks if you move the notebook to a different path or even pull/upload a new version of the notebook. This is not good because when someone is doing their development/testin...

Latest Reply
sean_owen
Honored Contributor II
  • 9 kudos

I also cannot reproduce this, with these exact steps (I think). After moving the notebook and moving it back, the link to it (and link to the revision) still works as expected. You are using MLflow built in to Databricks right?

9 More Replies
Srikanth_Gupta_
by Valued Contributor
  • 631 Views
  • 1 replies
  • 0 kudos
Latest Reply
craig_ng
New Contributor III
  • 0 kudos

Delta Live Tables offers built-in data lineage between tables and views defined in a pipeline, which allows for easier monitoring and simplified recovery.
