Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Solution Design Recommendation on Databricks

tyhatwar785
New Contributor

Hi Team,

We need to design a pipeline in Databricks to:

1. Call a metadata API (returns XML per keyword), parse, and consolidate into a combined JSON.

2. Use this metadata to generate dynamic links for a second API, download ZIPs, unzip, and extract specific HTML files into ADLS.

Looking for suggestions on:

Solution design – should metadata and file download be separate jobs/notebooks or combined?

Cluster recommendations – what type/size of cluster is suitable for this workload?

Parallelism – should we use Python async (aiohttp) or Spark parallelism for faster execution?

Best practices – retries, error handling, checkpointing for flaky APIs.

Would appreciate guidance on how to design this efficiently.

Thanks!

1 REPLY

nikhilmohod-nm
New Contributor III

Hi @tyhatwar785 


1. Should metadata and file download be separate jobs/notebooks or combined?
Keep them in separate notebooks, but orchestrate them as tasks under a single Databricks Job. This gives you better error handling and retries.
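
As a rough illustration of the two-notebook layout, here is a minimal sketch of a job definition shaped like a Databricks Jobs API 2.1 payload. The job name, task keys, notebook paths, and retry counts are all hypothetical placeholders, not values from your workspace; the cluster settings are left out on purpose.

```python
import json


def build_job_spec(metadata_notebook: str, download_notebook: str) -> dict:
    """Build a Jobs API 2.1-style payload with two dependent notebook tasks."""
    return {
        "name": "metadata-and-download-pipeline",  # hypothetical job name
        "tasks": [
            {
                # Step 1: call the metadata API and write the combined JSON.
                "task_key": "fetch_metadata",
                "notebook_task": {"notebook_path": metadata_notebook},
                "max_retries": 2,  # assumed value, tune for your API
                "min_retry_interval_millis": 60_000,
            },
            {
                # Step 2: runs only after step 1 succeeds.
                "task_key": "download_files",
                "depends_on": [{"task_key": "fetch_metadata"}],
                "notebook_task": {"notebook_path": download_notebook},
                "max_retries": 3,  # assumed value, tune for your API
                "min_retry_interval_millis": 120_000,
            },
        ],
    }


# Notebook paths below are placeholders.
job_spec = build_job_spec("/Pipelines/fetch_metadata", "/Pipelines/download_files")
print(json.dumps(job_spec, indent=2))
```

With this shape, a failure in the download task can be retried without re-running the metadata task.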

2. Cluster recommendations
Start with a general-purpose cluster (Standard_DS4_v2: 28 GB memory, 8 vCPUs) with autoscaling enabled.
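
For reference, a cluster definition along these lines could look like the sketch below. Only the node type comes from the recommendation above; the runtime version string and the worker counts are assumptions you would adjust to your workspace and workload.

```python
def build_cluster_spec(min_workers: int = 1, max_workers: int = 4) -> dict:
    """Build a cluster definition with autoscaling enabled."""
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "spark_version": "15.4.x-scala2.12",  # assumption: pick a runtime your workspace offers
        "node_type_id": "Standard_DS4_v2",    # 8 vCPUs, 28 GB memory
        "autoscale": {
            "min_workers": min_workers,  # assumed starting point
            "max_workers": max_workers,  # assumed ceiling
        },
    }


cluster_spec = build_cluster_spec()
```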

3. Parallelism
If all processing is inside Databricks

4. Best practices

Retries: Use Databricks Job-level retries and add custom retry logic in a UDF.

Error Handling: Use Python’s try/except with structured logging (logging library) for better observability.

Monitoring: Integrate with Databricks Lakehouse Monitoring or send metrics/logs
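
To make the retry and error-handling points concrete, here is a minimal sketch of custom retry logic with exponential backoff, try/except, and the standard logging library. It uses only the Python standard library; the helper names `with_retries` and `fetch_bytes`, the logger name, and the default attempt and delay values are hypothetical. The same function could be called from a plain notebook cell or from inside a UDF.

```python
import logging
import random
import time
import urllib.error
import urllib.request

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("pipeline.download")  # hypothetical logger name


def with_retries(func, *, max_attempts=4, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, back off exponentially and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                logger.error("giving up after %d attempts: %s", attempt, exc)
                raise
            # Exponential backoff plus jitter so parallel callers do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            logger.warning("attempt %d/%d failed (%s); retrying in %.1fs",
                           attempt, max_attempts, exc, delay)
            sleep(delay)


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

A call such as `with_retries(lambda: fetch_bytes(url), retry_on=(urllib.error.URLError, TimeoutError))` would then retry only network-level failures and let other exceptions surface immediately. Job-level retries stay useful as the outer safety net when a whole task fails.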
