Section 2 of the exam guide lists a number of topics pertaining to ETL with Apache Spark. It's safe to assume that most of these will be tested using PySpark syntax, so you can focus your preparation on the specific areas in that section.
You can use the system.query.history system table for this, which is now in public preview: https://docs.databricks.com/aws/en/admin/system-tables/query-history
It contains the full query history for SQL warehouses, including status, executor, query text, and more.
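For example, you could pull recent failures from the table with a quick PySpark call. A minimal sketch, assuming system tables are enabled in your workspace and you have SELECT access on the system catalog; the column names follow the linked docs page, but verify them against your workspace since the table is still in preview:

```python
# Inspect recent failed warehouse queries via system.query.history.
# Runs in a Databricks notebook, where `spark` is predefined.
recent_failures = spark.sql("""
    SELECT
        start_time,
        executed_by,        -- who ran the statement
        execution_status,   -- e.g. FINISHED, FAILED, CANCELED
        statement_text,     -- the query text itself
        total_duration_ms
    FROM system.query.history
    WHERE execution_status = 'FAILED'
      AND start_time >= current_timestamp() - INTERVAL 7 DAYS
    ORDER BY start_time DESC
    LIMIT 50
""")
recent_failures.show(truncate=False)
```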
You should review and practice common PySpark syntax around ingestion and transformation (see the sketch below). Additionally, you should review practice exams and questions. You can find some on Udemy, or use this online resource someone has created that has a number of practice questions.
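Here is a minimal ingest-transform-load sketch covering the kind of syntax worth drilling; the path, table, and column names are hypothetical placeholders:

```python
from pyspark.sql import functions as F

# Ingest: read raw CSV files with a header row
# (the path and schema are made up for illustration).
orders = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/Volumes/main/raw/orders/")
)

# Transform: derive a date column, filter, and aggregate --
# the bread-and-butter operations the exam section covers.
daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .filter(F.col("status") == "COMPLETED")
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Load: write the result out as a managed Delta table.
daily_revenue.write.mode("overwrite").saveAsTable("main.analytics.daily_revenue")
```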
I recently passed the DE Professional, and I'm happy to shed some light here.
1. There is a fair amount of overlap in the content, but I can't say for certain whether there are specific question overlaps. The questions follow a similar theme, but the professional exam goes into more depth.
It will depend on the transformations and how you're loading them. Assuming it's mostly in Spark, I recommend starting small with a job compute cluster with autoscaling enabled for cost efficiency. For daily loads (6 million records), a driver and 2 workers is a reasonable starting point; see the sketch below.
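As a rough sketch, that starting point expressed as a Jobs API new_cluster spec might look like the following; the runtime version and node type are placeholders, so pick ones that fit your cloud and workload:

```python
# Hypothetical job cluster spec for the Databricks Jobs API:
# a small autoscaling cluster as a cost-efficient starting point.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",  # placeholder LTS runtime
    "node_type_id": "i3.xlarge",          # placeholder instance type
    "autoscale": {
        "min_workers": 2,  # matches the driver + 2 workers baseline
        "max_workers": 4,  # headroom for heavier daily loads
    },
}
```

Watch the cluster metrics for the first few runs and resize from there.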