While loading data from one layer to another using a PySpark window function, I noticed that some records are missing. This only happens with large data volumes, not with small ones. Has anyone come across this issue before?
I tried repartitioning and assigning each transformation's result to a new DataFrame variable, but records are still missing. Please let me know if you have any other suggestions.
I have a DataFrame with a key, effective date, and end date. I want to use a window function with lag to populate the previous end date, partitioning by the key and ordering by the effective date. But the output row count differs from the input.