DLT CDC SCD-1 pipeline not showing stats when reading from parquet file

JVesely
New Contributor III

Hi,

I followed the tutorial here: https://docs.databricks.com/en/delta-live-tables/cdc.html#how-is-cdc-implemented-with-delta-live-tab...

The only change I made is that the data is not appended to a table and read from there, but is instead read from parquet files. In practice this means:

Original:

@dlt.view
def users():
  return spark.readStream.format("delta").table("cdc_data.users")

My code:

@dlt.view
def vcp_analyte_source():
  return (
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "parquet")
      .option("cloudFiles.schemaEvolutionMode", "none")
      .schema(vcp_analytes_schema)
      .load(vcp_analytes_data_path)
  )
This works: the hidden "_apply_changes_storage_" table is filled with data from the parquet files, and the resulting "gold" view returns the expected number of records.
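For reference, the apply_changes side of such a pipeline follows the tutorial's SCD type 1 pattern; a minimal sketch against the view above (the key, sequencing, and operation column names here are placeholders, not the real schema):

import dlt
from pyspark.sql.functions import col, expr

# Target streaming table that apply_changes maintains as SCD type 1.
dlt.create_streaming_table("vcp_analyte")

dlt.apply_changes(
  target = "vcp_analyte",
  source = "vcp_analyte_source",                    # the @dlt.view above
  keys = ["analyte_id"],                            # placeholder key column
  sequence_by = col("sequence_num"),                # placeholder ordering column
  apply_as_deletes = expr("operation = 'DELETE'"),  # as in the CDC tutorial
  except_column_list = ["operation", "sequence_num"],
  stored_as_scd_type = 1
)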

However, on the Delta Live Tables dashboard where the streaming tables are rendered (see attached file), the number of "upserted" and "deleted" records is not available, even though 2000 records have been ingested.

Is that a "feature" of working with parquet files, a known bug, or something I have to enable elsewhere? If this is how it is supposed to work, is there anywhere else to look for record ingestion statistics?
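(As a fallback, the pipeline event log itself can be queried for these counts. A minimal sketch, assuming the pipeline has a storage location configured; the system/events path and the details:flow_progress metric names follow the DLT monitoring docs, so treat them as assumptions rather than something verified against this pipeline:)

# Read the pipeline event log written under <storage location>/system/events
events = spark.read.format("delta").load("<pipeline-storage-location>/system/events")

# flow_progress events carry per-flow metrics, including the upsert/delete
# counts reported by apply_changes flows.
metrics = (
  events
    .filter("event_type = 'flow_progress'")
    .selectExpr(
      "timestamp",
      "origin.flow_name AS flow_name",
      "details:flow_progress.metrics.num_upserted_rows AS num_upserted_rows",
      "details:flow_progress.metrics.num_deleted_rows AS num_deleted_rows"
    )
    .where("num_upserted_rows IS NOT NULL OR num_deleted_rows IS NOT NULL")
)
metrics.show(truncate=False)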

Thank you!

 

1 ACCEPTED SOLUTION

JVesely
New Contributor III

My bad - waiting a bit and doing a proper screen refresh does show the numbers. 


