08-05-2022 12:34 AM
DLT calls your functions several times: first with an empty DataFrame when building the execution graph, and then with the actual data when executing the pipeline.
But really, I believe you can do what you want using the DataFrame APIs, or in the worst case by resorting to Pandas UDFs, but you need to describe what you want to achieve. The reason to avoid RDDs is that an RDD implementation will be very slow: each row is evaluated in the Python interpreter, and Catalyst won't be able to perform any optimization.
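Since I don't know your actual transformation, here is only a rough sketch of the two options, using a made-up example: cleaning up a string column. The column name `name` and the helpers `clean_name` and `add_clean_name` are hypothetical, and the Pandas UDF path assumes PyArrow is installed on the cluster.

```python
import pandas as pd


def clean_name(names: pd.Series) -> pd.Series:
    """Vectorized cleanup: operates on a whole batch of values at once."""
    return names.str.strip().str.upper()


def add_clean_name(df):
    """Add two cleaned copies of the (hypothetical) string column `name`.

    `df` is assumed to be a Spark DataFrame.
    """
    # Imported lazily so clean_name can be tried out with pandas alone.
    from pyspark.sql import functions as F
    from pyspark.sql.functions import pandas_udf

    # Preferred: built-in column functions, which Catalyst can optimize.
    # Note that F.trim removes spaces only, while str.strip removes any
    # whitespace, so the two columns match only for space-padded input.
    with_builtin = df.withColumn("name_builtin", F.upper(F.trim(F.col("name"))))

    # Fallback: wrap the vectorized function as a scalar Pandas UDF.
    # It still runs in Python, but on batches rather than row by row,
    # unlike a df.rdd.map(...) implementation.
    clean_name_udf = pandas_udf(clean_name, returnType="string")
    return with_builtin.withColumn("name_pandas", clean_name_udf(F.col("name")))
```

Inside a DLT table function you would return something like `add_clean_name(source_df)`. Because it only builds a query plan, it behaves the same when DLT first calls your function with an empty DataFrame.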