Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Live Table Graph different with no change in Notebook code

Manjula_Ganesap
Contributor

I have DLT code that creates 40+ bronze tables. The tables are created on top of the latest Parquet files for each of those tables.

While executing the pipeline, I sometimes notice that the graph is different from the one I normally see. I do not understand why that happens.

Graph I expect to see:

[Screenshot: Manjula_Ganesap_0-1694001824330.png]

Graph I see sometimes:

[Screenshot: Manjula_Ganesap_1-1694001865082.png]

As you can see from the screenshots, only 2 tables were created in the second run, while all 40+ tables were created in the first run. The underlying Parquet files were the same in both cases.

3 REPLIES

Kaniz_Fatma
Community Manager

Hi @Manjula_Ganesap , 

• The pipeline graph represents the data flow and is built from table dependencies, so a change in those dependencies changes how the graph looks.
• Delta Live Tables resolves table dependencies itself and redraws the graph on each update.
• Delta Live Tables supports three dataset types: streaming tables, materialized views, and views (see the sketch after this list).
• Streaming tables are for append-only data sources and incremental processing.
• Materialized views precompute their results and are refreshed when their inputs change.
• Views compute their results when queried and are useful for intermediate transformations.
• A change in a table's dependencies or its dataset type can therefore change the graph.
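
A minimal sketch of the three dataset types, assuming hypothetical table names and landing paths (this is not your pipeline code); the decorator and read mode you choose determine how each node shows up and connects in the graph:

import dlt
from pyspark.sql import functions as F

# Streaming table: append-only source, processed incrementally (Auto Loader path is hypothetical)
@dlt.table(name="orders_bronze", comment="Streaming table over an append-only source")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("/mnt/landing/orders/")
    )

# Materialized view: batch read whose results are precomputed and refreshed when inputs change
@dlt.table(name="orders_daily", comment="Materialized view aggregating the bronze table")
def orders_daily():
    return dlt.read("orders_bronze").groupBy("order_date").count()

# View: computed only when queried, useful as an intermediate query
@dlt.view(comment="Intermediate view over the bronze table")
def orders_recent():
    return dlt.read("orders_bronze").where(F.col("order_date") >= "2023-01-01")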

Manjula_Ganesap
Contributor

@Kaniz_Fatma - Thank you for your response. There is no change in the table dependencies.

The code to create the individual raw tables looks like this. The input is always the same 40 tables, with only the underlying Parquet files changing. I can't understand why it creates 40 tables in the first run and then only 2 tables in the second run.

import dlt

def CreateTable(tableSchema, tableName, tableFilePath):
  # Each call registers one bronze table named test_dlt_<table> in the pipeline
  schemaTableName = 'test_dlt_' + tableName.lower()

  @dlt.table(
    name=schemaTableName,
    comment="Raw data capture for " + tableName,
    table_properties={
      "quality": "bronze",
      "pipelines.autoOptimize.managed": "true"
    }
  )
  def create_live_table():
    # Batch read of the latest Parquet files for this table
    return spark.read.format("parquet").load(tableFilePath)
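
For context, a hypothetical driver loop (the table names and paths below are placeholders, not the actual metadata) showing how the helper would be invoked so that each of the 40+ tables is registered during a single update:

table_configs = [
  ("orders", "/mnt/raw/orders/latest/"),
  ("customers", "/mnt/raw/customers/latest/"),
  # ... remaining tables
]

for table_name, file_path in table_configs:
  CreateTable(None, table_name, file_path)  # tableSchema is unused in the snippet above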

Hi @Manjula_Ganesap, the behaviour of your pipeline could be influenced by various factors, such as the state of your data, the specific operations you're performing, and the configuration of your environment.

From the provided information, here are a few points that might be relevant to your situation:

- Delta tables always return the most up-to-date information, so there is no need to call REFRESH TABLE manually after changes; this is handled automatically [source](https://docs.databricks.com/delta/best-practices.html).
- Delta tables track the set of partitions present in a table and update the list as data is added or removed, so there's no need to run ALTER TABLE [ADD|DROP] PARTITION or MSCK REPAIR TABLE [source](https://docs.databricks.com/delta/best-practices.html).
- Directly modifying, adding, or deleting Parquet data files in a Delta table can lead to lost data or table corruption [source](https://docs.databricks.com/delta/best-practices.html).
- If best practices for Delta tables are not followed, table statistics can differ even when the tables hold identical data; with different statistics, Spark can generate a different plan than it would if both tables had the same statistics [source](https://kb.databricks.com/delta/different-tables-with-same-data-generate-different-plans-when-used-i...).
 
However, it's impossible to provide a more accurate answer without more specific information about your code and the operations you're performing.
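
One way to start narrowing this down is to check what actually changed in the affected tables between the two runs; the table name below is a placeholder, and the commands are standard Delta commands run from a notebook:

# Operations recorded against the table across pipeline updates
spark.sql("DESCRIBE HISTORY test_dlt_orders") \
    .select("version", "timestamp", "operation").show(truncate=False)

# Current file count, size, and storage location of the table
spark.sql("DESCRIBE DETAIL test_dlt_orders") \
    .select("numFiles", "sizeInBytes", "location").show(truncate=False)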
