Data Engineering

Forum Posts

by alejandrofm • Valued Contributor

03-31-2022 7:39:01 AM

6350 Views
8 replies
9 kudos

Resolved! Pandas.spark.checkpoint() doesn't break lineage

Hi, I'm doing something simple in a Databricks notebook:

    spark.sparkContext.setCheckpointDir("/tmp/")
    import pyspark.pandas as ps
    sql = ("""select field1, field2 From table Where date>='2021-01-01'""")
    df = ps.sql(sql)
    df.spark.checkpoint()

That...

Latest Reply

annafina
New Contributor II

11-21-2024 6:34:04 AM

9 kudos

checkpoint() returns a checkpointed DataFrame, so you need to assign it to a new variable: checkpointedDF = df.checkpoint()
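A minimal sketch of the point above, written against the pandas-on-Spark accessor from the original question (df.spark.checkpoint()). The helper name with_truncated_lineage is hypothetical, and the usage comments assume an active Spark session, which a Databricks notebook provides as spark.

```python
def with_truncated_lineage(psdf, eager=True):
    """Return a checkpointed copy of a pandas-on-Spark DataFrame.

    checkpoint() does not modify `psdf` in place; it returns a new
    DataFrame, so the caller has to keep the return value.
    """
    return psdf.spark.checkpoint(eager=eager)


# Usage in a notebook (assumes `spark` exists, as it does on Databricks):
#   spark.sparkContext.setCheckpointDir("/tmp/")
#   df = ps.sql("select field1, field2 from table")
#   df.spark.checkpoint()             # result discarded, df is unchanged
#   df = with_truncated_lineage(df)   # rebinding df keeps the checkpointed frame
```

Rebinding the same name is a design choice: it guarantees that later cells cannot accidentally keep using the frame that was never checkpointed.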

by Maverick1 • Valued Contributor II

09-09-2021 11:05:22 PM

5226 Views
10 replies
9 kudos

Resolved! Lineage between model and source code breaks on movement of source notebook. How to rectify it?

If there is a registered model and it is linked with a notebook, then the lineage breaks if you move the notebook to a different path or even pull/upload a new version of the notebook. This is not good because when someone is doing their development/testin...

Latest Reply

sean_owen
Databricks Employee

09-15-2021 4:50:53 PM

9 kudos

I also cannot reproduce this, with these exact steps (I think). After moving the notebook and moving it back, the link to it (and the link to the revision) still works as expected. You are using the MLflow built into Databricks, right?

by Srikanth_Gupta_ • Databricks Employee

06-18-2021 10:38:37 AM

1465 Views
1 reply
0 kudos

How is data lineage achieved in Delta Lake, starting from source -> Bronze -> Silver -> Gold layers?

Latest Reply

craig_ng
New Contributor III

06-18-2021 10:52:25 AM

0 kudos

Delta Live Tables offers built-in data lineage between the tables and views defined in a pipeline, which allows for easier monitoring and simplified recovery.

Databricks Community
