While migrating to a production workload, I switched some queries to use RocksDB. I am concerned about its memory usage, though. Here is sample output from my streaming query: "stateOperators" : [ {
"operatorName" : "dedupeWithinWatermark",
"...
Thank you for the input. Is there any particular reason why deduplication within a watermark stores everything and not just the key needed for deduplication? The first record has to be written to the table anyway, and its content is irrelevant as it jus...
I had the same problem when starting with Databricks. As outlined above, it is the shuffle partitions setting that results in a number of files equal to the number of partitions. Thus, you are writing a low data volume but get taxed on the amount of write (a...
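
To put rough numbers on the file-count point, here is a minimal back-of-the-envelope sketch in plain Python. It assumes at most one output file per shuffle partition per micro-batch; the helper `files_written` and the one-batch-per-minute cadence are hypothetical, made up for illustration. The value 200 is Spark's default for `spark.sql.shuffle.partitions`.

```python
def files_written(shuffle_partitions: int, micro_batches: int) -> int:
    """Upper bound on output files, assuming one file per shuffle
    partition per micro-batch (an assumption for this sketch)."""
    return shuffle_partitions * micro_batches


default_partitions = 200  # Spark's default for spark.sql.shuffle.partitions
batches_per_hour = 60     # hypothetical: one micro-batch per minute

# With the default setting, a low-volume stream can still produce many small files.
print(files_written(default_partitions, batches_per_hour))  # 12000

# With a smaller partition count, the same stream writes far fewer files.
print(files_written(8, batches_per_hour))  # 480
```

In a Spark session the setting itself would be changed with something like `spark.conf.set("spark.sql.shuffle.partitions", "8")`; the value 8 is only an example, and the right number depends on data volume and cluster size.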