I have a query with some grouping, which I run via spark.sql:

```python
skus = spark.sql('with cte as (select... group by all) select *, .. from cte group by all')
```

It displays the expected table.
I want to split this table into batches for processing, with `rows_per_batch` rows in each batch, identified by a `batch_id` column (see the sketch below).
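To make the intent concrete, here is a sketch of the kind of batch assignment I mean (illustrative only; `sku` as the ordering column and the exact expression are placeholders, not my literal code):

```python
from pyspark.sql import functions as F, Window

rows_per_batch = 1000  # my batch size parameter

# Number every row, then integer-divide to get a batch id.
# "sku" stands in for whatever column the real query orders by.
w = Window.orderBy('sku')
skus = skus.withColumn(
    'batch_id',
    F.floor((F.row_number().over(w) - 1) / rows_per_batch),
)
```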
Once I group on it, the `batch_id` column displays random garbage:

If I dump `batch_id` on its own, it displays the expected values 0 to 3; no big numbers like "1041204193" as above.
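That is, with roughly this call (assuming a plain select on the column):

```python
# Selecting only batch_id shows the expected values 0..3
skus.select('batch_id').show()
```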

If I do a select distinct, I get garbage again:
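i.e., roughly this (same assumption about the exact call):

```python
# A distinct over the same column brings the junk values back
skus.select('batch_id').distinct().show()
```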

The only workaround I have found so far (temporary, I hope) is to round-trip the original DataFrame through pandas and back:

```python
skus_pdf = skus.toPandas()              # collect the result to the driver as a pandas DataFrame
skus = spark.createDataFrame(skus_pdf)  # rebuild a Spark DataFrame from the materialized data
```

Once I include this, everything starts working and there are no junk numbers.

So why does the Spark DataFrame created from the query fail to aggregate correctly? I tried this on both serverless and dedicated compute, with the same outcome.

Can someone please advise?