Hi Databricks Community. I need some suggestions on an issue. We are using Databricks Asset Bundles to deploy our forecasting repo, and AWS nodes to run the forecast jobs, with a workflow.yml that defines and triggers the jobs.
Below are the cluster settings being used, followed by a simplified sketch of how the job references the cluster. Let me know if anything needs to be changed or tuned. Tagging @Shua42, since you helped me before. Thanks in advance.
dev:
  resources:
    clusters:
      dev_cluster: &dev_cluster
        num_workers: 0
        kind: CLASSIC_PREVIEW
        is_single_node: true
        spark_version: 14.3.x-scala2.12
        node_type_id: r6i.4xlarge
        custom_tags:
          clusterSource: ts-forecasting-2
          ResourceClass: SingleNode
        data_security_mode: SINGLE_USER
        enable_elastic_disk: true
        enable_local_disk_encryption: false
        autotermination_minutes: 20
        docker_image:
          url: "*****.amazonaws.com/dev-databricks:retailforecasting-latest"
        aws_attributes:
          availability: SPOT
          instance_profile_arn: ****
          ebs_volume_type: GENERAL_PURPOSE_SSD
          ebs_volume_count: 1
          ebs_volume_size: 50
        spark_conf:
          spark.databricks.cluster.profile: singleNode
          spark.memory.offHeap.enabled: false
          spark.driver.memory: 4g
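
For context, the job side of the bundle points at this cluster roughly like the simplified sketch below. The job name, task key, and notebook path are placeholders rather than the real ones, and it assumes the job sits under the same resources block so the cluster reference (or the &dev_cluster anchor) resolves:

    jobs:
      forecast_job:
        name: dev-retail-forecasting
        tasks:
          - task_key: run_forecast
            # Reuse the all-purpose cluster defined above; alternatively the
            # &dev_cluster anchor could be merged into a job cluster spec.
            existing_cluster_id: ${resources.clusters.dev_cluster.id}
            notebook_task:
              notebook_path: ../notebooks/run_forecast.py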
Hi @harishgehlot_03
Good day!
May I know what the runtime was in the second case, using the r6i.4xlarge instance type?
Hi @Raghavan93513, could you let me know if there is any spark.conf I can set, or anything else, that would let the job use a larger share of the node's memory instead of capping itself? Note: this is a pandas workflow (we are not using Spark so far).
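
For example, would something along these lines make sense? The values below are guesses on my side, not validated. My understanding is that spark.driver.memory only sizes the driver JVM heap, while pandas runs in the Python process on the node and uses whatever RAM the OS has left, so it is not directly controlled by Spark configs:

        spark_conf:
          spark.databricks.cluster.profile: singleNode
          # Driver JVM heap; pandas memory is not governed by this setting,
          # it lives in the Python process outside the JVM.
          spark.driver.memory: 8g
          # Only relevant if/when data is pulled through Spark (e.g. toPandas()).
          spark.driver.maxResultSize: 8g

That said, if the limit is on the pandas/Python side rather than Spark, I am not sure any spark.conf change would actually help.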