Data Engineering

Possible false positive warning on DLT pipeline

ipreston
New Contributor III

I have a DLT pipeline script that starts by reading metadata from a Delta table that describes the tables it should generate. Each record returned corresponds to one DLT table, so I call .collect() to bring the rows back to the driver as a list and then loop over them, invoking my DLT pipeline logic for each one. I don't call .collect() inside any dlt function or any function that has a dlt decorator on it. When I run the pipeline I get this warning:

 

Notebook:/<my path here>/dlt_meta_pipeline used `DataFrame.collect` function that will be deprecated soon. Please fix the notebook.

 

Delta Live Tables job fails when using collect() - Databricks

Based on the post above, I think I should only see this warning if I'm using collect() to return results that I want to instantiate as a DLT-managed table. Is this warning in error, or do I actually have to change something?
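For reference, here is a minimal sketch of the pattern described above, not the actual code. The names `register_tables`, `meta.table_specs`, and the `target`/`source` columns are hypothetical. The decorator and the reader are passed in as parameters so the loop can be exercised outside a pipeline; inside a pipeline they would presumably be `dlt.table` and `spark.read.table`.

```python
from typing import Callable, Iterable, List, Mapping


def register_tables(
    specs: Iterable[Mapping[str, str]],
    table_decorator: Callable[..., Callable],
    read_source: Callable[[str], object],
) -> List[str]:
    """Define one table per metadata record and return the table names."""

    def define_table(target: str, source: str) -> None:
        # A factory function binds target/source per call, so each table
        # function reads its own source rather than the loop's last value.
        @table_decorator(name=target)
        def _table():
            return read_source(source)

    names = []
    for spec in specs:
        define_table(spec["target"], spec["source"])
        names.append(spec["target"])
    return names


# Inside a pipeline notebook this would be driven roughly as follows.
# dlt and spark only exist in a Databricks pipeline, and the table name
# "meta.table_specs" with its columns is an assumption for illustration:
#
#   import dlt
#   rows = spark.read.table("meta.table_specs").collect()  # outside any dlt function
#   register_tables([r.asDict() for r in rows], dlt.table, spark.read.table)
```

The only .collect() call sits at module level, before any table function is defined, which is the situation the warning is being raised for.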


jose_gonzalez
Databricks Employee

Hi @ipreston,

Could you please share the DBR version, along with the full stack trace or message?

ipreston
New Contributor III

It's DLT Stable, so whatever DBR that uses under the hood. Here's the JSON of the warning, with UIDs etc. redacted:

{
  "id": "redacted",
  "sequence": {
    "data_plane_id": {
      "instance": "execution",
      "seq_no": redacted
    },
    "control_plane_seq_no": redacted
  },
  "origin": {
    "cloud": "Azure",
    "region": "canadacentral",
    "org_id": redacted,
    "pipeline_id": "redacted",
    "pipeline_type": "WORKSPACE",
    "pipeline_name": "redacted",
    "cluster_id": "redacted",
    "update_id": "redacted",
    "request_id": "redacted",
    "uc_resource_id": "redacted"
  },
  "timestamp": "2024-01-17T15:15:47.639Z",
  "message": "Notebook:/redacted/dlt_meta_pipeline used `DataFrame.collect` function that will be deprecated soon. Please fix the notebook.",
  "level": "WARN",
  "details": {
    "unsupported_operation": {
      "operation": "COLLECT_TO_DRIVER"
    }
  },
  "event_type": "unsupported_operation",
  "maturity_level": "STABLE"
}

ipreston
New Contributor III

The code is a fork of dlt-meta. Here's where it does the same operation: https://github.com/databrickslabs/dlt-meta/blob/2a93dd9ae42dfdb167b73629bd1acc8a256e7a49/src/dataflo...

jose_gonzalez
Databricks Employee

Thank you for sharing more details. In this case the message is logged at warning level ("level": "WARN"), so it should be fine as long as you are not getting a FATAL-level error.

ipreston
New Contributor III

Thanks for the reply. Based on that response, though, it seems like the warning itself is a bug in the DLT implementation. Per the docs: "However, you can include these functions outside of table or view function definitions because this code is run once during the graph initialization phase." Is there a way to report this issue upstream? I'm not concerned about my code failing as a result of this warning, but I'd like to avoid false-positive alerts in the pipeline, since they increase the risk that I'll miss an important warning amidst irrelevant ones like this.
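In the meantime, one possible stopgap is to triage the events myself using the fields visible in the JSON I posted (`level`, `event_type`, `details.unsupported_operation.operation`). This is only a rough sketch: it assumes the events have already been loaded as a list of dicts with that shape, and the function name and the sample events are made up for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def split_known_warnings(
    events: Iterable[Dict],
    known_operations: FrozenSet[str] = frozenset({"COLLECT_TO_DRIVER"}),
) -> Tuple[List[Dict], List[Dict]]:
    """Separate already-understood unsupported_operation warnings from the rest."""
    known, other = [], []
    for event in events:
        details = event.get("details") or {}
        operation = (details.get("unsupported_operation") or {}).get("operation")
        is_known = (
            event.get("level") == "WARN"
            and event.get("event_type") == "unsupported_operation"
            and operation in known_operations
        )
        (known if is_known else other).append(event)
    return known, other


# Made-up sample events; only the first mirrors the shape of the real warning.
sample_events = [
    {
        "level": "WARN",
        "event_type": "unsupported_operation",
        "details": {"unsupported_operation": {"operation": "COLLECT_TO_DRIVER"}},
    },
    {"level": "WARN", "event_type": "example_other", "details": {}},
]

known, other = split_known_warnings(sample_events)
print(len(known), "known warning(s),", len(other), "needing attention")
```

This only hides the noise on my side; it does nothing about whether the warning should be emitted in the first place.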

Bal
New Contributor II

Are you using take() or first() in your code? I was using collect() and got the same warning, but I have since changed to take() and still get the warning.
