Data Engineering

Possible false positive warning on DLT pipeline

ipreston
New Contributor II

I have a DLT pipeline script that starts by reading, from a Delta table, metadata on the tables it should generate. Each record returned corresponds to one DLT table to generate, so I use .collect() to bring the rows back as a list and then iterate over it, calling my DLT pipeline logic on each row. I don't use .collect() in any dlt functions or functions that have a dlt decorator on them. When I run the pipeline I get this warning:

 

Notebook:/<my path here>/dlt_meta_pipeline used `DataFrame.collect` function that will be deprecated soon. Please fix the notebook.

 

Delta Live Tables job fails when using collect() - Databricks

Based on the above post, I think I should only be seeing this warning if I'm using collect() to return results that I want to instantiate as a DLT-managed table. Is this warning in error, or do I actually have to change something?
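For readers following along, the pattern described above can be sketched roughly as follows. This is a minimal illustration and not the actual dlt-meta code: the column names `table_name` and `source_path`, the metadata table name `meta.dataflow_specs`, and the helper names are all hypothetical. The `dlt` and `spark` objects are passed in as arguments so the structure is visible without a running pipeline.

```python
def make_table(dlt, spark, table_name, source_path):
    """Register one DLT table. A factory is used so that each table
    function closes over its own table_name and source_path."""

    @dlt.table(name=table_name)
    def _load():
        # No collect() in here: the body only returns a DataFrame
        # for DLT to manage.
        return spark.read.format("delta").load(source_path)

    return _load


def define_tables(dlt, spark, metadata_rows):
    """Call the factory once per metadata row. This runs while the
    pipeline graph is being initialized, not inside a table function."""
    return [
        make_table(dlt, spark, row["table_name"], row["source_path"])
        for row in metadata_rows
    ]


# In the pipeline notebook, where `dlt` and `spark` exist, usage might be:
#   import dlt
#   rows = spark.read.table("meta.dataflow_specs").collect()  # outside any @dlt function
#   define_tables(dlt, spark, rows)
```

The only collect() call sits at the top level of the notebook, which is the case the question is about.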

5 REPLIES

jose_gonzalez
Moderator

Hi @ipreston,

Could you share the DBR version, please? Also, please share the full stack trace or warning message.

ipreston
New Contributor II

It's DLT Stable, so whatever DBR that's using under the hood. Here's the JSON of the warning, with UIDs etc. redacted:

{
  "id": "redacted",
  "sequence": {
    "data_plane_id": {
      "instance": "execution",
      "seq_no": redacted
    },
    "control_plane_seq_no": redacted
  },
  "origin": {
    "cloud": "Azure",
    "region": "canadacentral",
    "org_id": redacted,
    "pipeline_id": "redacted",
    "pipeline_type": "WORKSPACE",
    "pipeline_name": "redacted",
    "cluster_id": "redacted",
    "update_id": "redacted",
    "request_id": "redacted",
    "uc_resource_id": "redacted"
  },
  "timestamp": "2024-01-17T15:15:47.639Z",
  "message": "Notebook:/redacted/dlt_meta_pipeline used `DataFrame.collect` function that will be deprecated soon. Please fix the notebook.",
  "level": "WARN",
  "details": {
    "unsupported_operation": {
      "operation": "COLLECT_TO_DRIVER"
    }
  },
  "event_type": "unsupported_operation",
  "maturity_level": "STABLE"
}

ipreston
New Contributor II

The code is a fork of dlt-meta. Here's where it does the same operation: https://github.com/databrickslabs/dlt-meta/blob/2a93dd9ae42dfdb167b73629bd1acc8a256e7a49/src/dataflo...

jose_gonzalez
Moderator

Thank you for sharing more details. In this case the message is at warning level ("level": "WARN"), so it should be fine as long as you are not getting a FATAL-level error.

ipreston
New Contributor II

Thanks for the reply. Based on that response, though, it seems like the warning itself is a bug in the DLT implementation. Per the docs: "However, you can include these functions outside of table or view function definitions because this code is run once during the graph initialization phase." Is there a way to report this issue upstream? I'm not concerned about my code failing as a result of this warning, but I'd like to avoid false-positive alerts in the pipeline, since they increase the risk that I'll miss an important warning amid irrelevant ones like this.
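Until the warning is addressed, one possible stopgap is to triage events on the fields visible in the warning JSON above. The following is a minimal sketch, assuming the events have already been loaded as Python dicts shaped like that JSON (in the actual event log the `details` field may need parsing first). The function names, the allow-list, and the set of levels treated as alert-worthy are assumptions made for this illustration, not part of any Databricks API.

```python
# Operations treated as known noise. This allow-list is an assumption
# for the sketch; adjust it to whatever you have verified as benign.
KNOWN_BENIGN_OPERATIONS = {"COLLECT_TO_DRIVER"}

# Levels considered alert-worthy here (assumed set for illustration).
ALERT_LEVELS = {"WARN", "ERROR", "FATAL"}


def is_known_benign(event: dict) -> bool:
    """True for WARN-level unsupported_operation events whose operation
    is on the allow-list."""
    if event.get("level") != "WARN":
        return False
    if event.get("event_type") != "unsupported_operation":
        return False
    details = event.get("details") or {}
    operation = (details.get("unsupported_operation") or {}).get("operation")
    return operation in KNOWN_BENIGN_OPERATIONS


def events_needing_attention(events: list) -> list:
    """Keep alert-level events, dropping the known-benign warnings."""
    return [
        e for e in events
        if e.get("level") in ALERT_LEVELS and not is_known_benign(e)
    ]
```

Matching on `event_type` and `operation` rather than on the message text keeps the filter narrow, so any other warning still surfaces.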
