01-17-2024 09:27 AM
I have a DLT pipeline script that starts by reading metadata, from a Delta table, about the tables it should generate. Each record returned from that table describes one DLT table to generate, so I use .collect() to turn the rows into a list and then iterate over it, calling my DLT pipeline logic on each row. I don't use .collect() in any DLT functions or in functions that have a dlt decorator on them. When I run the pipeline I get this warning:
Notebook:/<my path here>/dlt_meta_pipeline used `DataFrame.collect` function that will be deprecated soon. Please fix the notebook.
Delta Live Tables job fails when using collect() - Databricks
Based on the above post, I think I should only be seeing this if I'm using collect() to return results that I want to instantiate as a DLT-managed table. Is this warning in error, or do I actually have to change something?
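For readers following along, the pattern described in the question can be sketched roughly as below. This is a minimal illustration, not the poster's actual code or the dlt-meta implementation. The function names `define_tables` and `define_one` and the column names `target_table` and `source_path` are hypothetical. The decorator and the reader are passed in as parameters so the sketch runs outside a pipeline; in a real DLT notebook the assumption is that `table_decorator` would be `dlt.table`, `read_source` would wrap `spark.read`, and `metadata_rows` would come from a `.collect()` call at the top level of the notebook.

```python
from typing import Any, Callable, Dict, Iterable, List


def define_tables(
    metadata_rows: Iterable[Dict[str, Any]],
    table_decorator: Callable[..., Callable],
    read_source: Callable[[str], Any],
) -> List[str]:
    """Register one table per metadata row and return the registered names.

    Assumed real-pipeline wiring (not taken from the thread):
      metadata_rows   = [r.asDict() for r in metadata_df.collect()]
      table_decorator = dlt.table
      read_source     = a function wrapping spark.read
    """
    registered: List[str] = []

    def define_one(name: str, source: str) -> None:
        # Defining the table inside a helper gives each generated function
        # its own closure, so it keeps the name and source of its own row
        # instead of the values from the last loop iteration.
        @table_decorator(name=name)
        def _table():
            return read_source(source)

        registered.append(name)

    for row in metadata_rows:
        # "target_table" and "source_path" are hypothetical column names.
        define_one(row["target_table"], row["source_path"])

    return registered
```

The point of the sketch is that the collected rows are only used to decide which tables to declare; the decorated functions themselves never call `.collect()`.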
01-17-2024 01:53 PM
Hi @ipreston,
Could you share the DBR version, please? Also, please share the full stack message.
01-17-2024 01:57 PM
It's DLT Stable, so whatever DBR that's using under the hood. Here's the JSON of the warning, with UIDs etc. redacted:
{
  "id": "redacted",
  "sequence": {
    "data_plane_id": {
      "instance": "execution",
      "seq_no": redacted
    },
    "control_plane_seq_no": redacted
  },
  "origin": {
    "cloud": "Azure",
    "region": "canadacentral",
    "org_id": redacted,
    "pipeline_id": "redacted",
    "pipeline_type": "WORKSPACE",
    "pipeline_name": "redacted",
    "cluster_id": "redacted",
    "update_id": "redacted",
    "request_id": "redacted",
    "uc_resource_id": "redacted"
  },
  "timestamp": "2024-01-17T15:15:47.639Z",
  "message": "Notebook:/redacted/dlt_meta_pipeline used `DataFrame.collect` function that will be deprecated soon. Please fix the notebook.",
  "level": "WARN",
  "details": {
    "unsupported_operation": {
      "operation": "COLLECT_TO_DRIVER"
    }
  },
  "event_type": "unsupported_operation",
  "maturity_level": "STABLE"
}
01-17-2024 02:02 PM
The code is a fork of dlt-meta. Here's where it does the same operation: https://github.com/databrickslabs/dlt-meta/blob/2a93dd9ae42dfdb167b73629bd1acc8a256e7a49/src/dataflo...
01-17-2024 04:14 PM
Thank you for sharing more details. In this case, the message is at warning level ("level": "WARN"), so it should be fine as long as you are not getting a FATAL-level error.
01-22-2024 08:01 AM
Thanks for the reply. Based on that response, though, it seems like the warning itself is a bug in the DLT implementation. Per the docs: "However, you can include these functions outside of table or view function definitions because this code is run once during the graph initialization phase." Is there a way to report this issue upstream? I'm not concerned about my code failing as a result of this warning, but I'd like to avoid false-positive alerts in the pipeline, since they increase the risk that I'll miss an important warning amidst irrelevant ones like this.
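Until the warning is addressed upstream, one possible workaround for the noise concern is to triage events before alerting on them. The sketch below is only an illustration based on the fields visible in the JSON posted earlier in this thread ("level", "event_type", and "details.unsupported_operation.operation"). The function names are hypothetical, and it is an assumption that events are available as Python dictionaries; "details" is accepted either as a nested object, as posted, or as a JSON string.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def is_collect_warning(event: Dict[str, Any]) -> bool:
    """Return True for the WARN-level COLLECT_TO_DRIVER event shown above."""
    details = event.get("details") or {}
    if isinstance(details, str):
        # Assumption: some exports may store "details" as a JSON string.
        details = json.loads(details)
    operation = (details.get("unsupported_operation") or {}).get("operation")
    return (
        event.get("level") == "WARN"
        and event.get("event_type") == "unsupported_operation"
        and operation == "COLLECT_TO_DRIVER"
    )


def split_events(
    events: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate the known collect warning from events that need review."""
    known: List[Dict[str, Any]] = []
    needs_review: List[Dict[str, Any]] = []
    for event in events:
        (known if is_collect_warning(event) else needs_review).append(event)
    return known, needs_review
```

Keeping the matched events in a separate list, rather than dropping them, means the warning stays visible if its level or wording changes later.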
05-08-2024 11:31 AM - edited 05-08-2024 12:00 PM
Are you using take() or first() in your code? I was using collect() and got the same warning, but I have since changed to take() and still get the warning.