Hi @Arunsundar Muthumanickam,
When you say workload, I believe you are handling different volumes of data between the Dev and Prod environments. If you are using a Databricks cluster and do not have a clear idea of how the volumes will turn out in each environment, enabling cluster autoscaling with a minimum and maximum number of workers is a good choice, since more workers will be added as your workload (number of partitions) grows.
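As a rough sketch, an autoscaling cluster can be created through the Databricks Clusters REST API like below. The workspace URL, token, runtime version, node type, and worker counts are all placeholders, so please substitute your own values:

import requests

# Illustrative only: create an autoscaling cluster via the Clusters API 2.0.
cluster_spec = {
    "cluster_name": "dev-autoscaling-cluster",   # placeholder name
    "spark_version": "13.3.x-scala2.12",         # pick a runtime supported in your workspace
    "node_type_id": "Standard_DS3_v2",           # depends on your cloud provider
    "autoscale": {
        "min_workers": 2,   # lower bound for lighter Dev workloads
        "max_workers": 8    # upper bound for heavier Prod workloads
    }
}

response = requests.post(
    "https://<your-workspace-url>/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <your-token>"},
    json=cluster_spec,
)
print(response.json())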
If your workload has a shuffle phase (joins, groupBy, etc.), please check whether you can tweak the shuffle partition count (spark.sql.shuffle.partitions), or set it to auto so that the Spark optimizer can adjust it based on your partition sizes.
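For example, in a notebook cell (using the spark session object that Databricks notebooks provide; the value 200 is only illustrative, and "auto" relies on adaptive query execution being available on your runtime):

# Option 1: set an explicit shuffle partition count for a known data volume.
spark.conf.set("spark.sql.shuffle.partitions", "200")

# Option 2 (Databricks): let the optimizer choose the count at runtime.
spark.conf.set("spark.sql.shuffle.partitions", "auto")

# Make sure adaptive query execution is enabled so shuffle partitions can be coalesced.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")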
Below is some sample code showing how you can get the distribution of data across your partitions.
from pyspark.sql.functions import spark_partition_id, asc

# Tag each row with the ID of the partition it belongs to, then count rows per partition.
(df
    .withColumn("partitionId", spark_partition_id())
    .groupBy("partitionId")
    .count()
    .orderBy(asc("count"))
    .show())
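If the counts are very uneven (a few partitions holding most of the rows), you can repartition before the expensive stage. This is just a sketch; the partition count 64 and the column name "join_key" are placeholders for your own values:

# Spread the data more evenly ahead of a heavy shuffle stage.
df_balanced = df.repartition(64, "join_key")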