De facto Standard for Databricks on AWS
02-13-2025 10:17 PM
Hello,
I am working on creating an architecture diagram for Databricks on AWS.
I would like to adopt the de facto standard used by enterprises. Based on my research, I have identified the following components:
- Network: Customer-managed VPC, Secure Cluster Connectivity (SCC)
- Data Storage: Delta Lake (S3)
- Data Catalog: Unity Catalog
- Data Pipeline: Fivetran (ETL), dbt (Data Transformation)
- Query Engine: Photon (SQL Acceleration)
- Security: IAM + Unity Catalog for RBAC (see the sketch after this list)
- Monitoring & Operations: AWS CloudWatch, Databricks Audit Logs
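To make the security piece concrete, here is a minimal sketch of Unity Catalog RBAC run from a Databricks notebook. The catalog, schema, and group names (`main`, `analytics`, `data_engineers`) are made up for illustration; `spark` is the SparkSession Databricks provides in every notebook.

```python
# Minimal Unity Catalog RBAC sketch (hypothetical names throughout).

# Create a governed namespace; requires the appropriate UC privileges.
spark.sql("CREATE CATALOG IF NOT EXISTS main")
spark.sql("CREATE SCHEMA IF NOT EXISTS main.analytics")

# Grant coarse-grained access to a workspace group.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_engineers`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA main.analytics TO `data_engineers`")
```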
If there are any other important aspects I should consider, please let me know.
Thank you!
02-14-2025 05:18 AM
I would not call it a 'standard', but rather one possible architecture. The great thing about the cloud is that you can complete the puzzle in many ways and make it as complex or as simple as you like.
Also, I would not consider Fivetran to be standard at most companies. It is pretty expensive, and there are plenty of lower-cost alternatives available (though they may be a tad more work).
For transformation, what about Databricks itself?
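Purely as an illustration, a dbt-style model can be a plain notebook cell; the table names below are made up:

```python
# Sketch of a Databricks-native transformation instead of dbt.
# Table names (main.raw.orders, main.analytics.daily_orders) are hypothetical.
from pyspark.sql import functions as F

orders = spark.read.table("main.raw.orders")

daily = (
    orders.where(F.col("status") == "completed")
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("n_orders"))
)

# Tables are Delta by default on Databricks.
daily.write.mode("overwrite").saveAsTable("main.analytics.daily_orders")
```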
You also need orchestration (or perhaps that is what you mean by pipelines).
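Databricks Workflows covers basic scheduling; a rough sketch with the Databricks SDK for Python follows (the job name, cluster id, notebook path, and cron expression are all placeholders):

```python
# Sketch: create a scheduled Databricks job with the Python SDK.
# All names, ids, and paths below are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # auth comes from the environment or a CLI profile

job = w.jobs.create(
    name="daily-orders-refresh",
    tasks=[
        jobs.Task(
            task_key="transform",
            existing_cluster_id="<cluster-id>",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/daily_orders"),
        )
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # every day at 02:00
        timezone_id="UTC",
    ),
)
print(f"Created job {job.job_id}")
```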
The whole machine learning part is skipped, so you might want to look into that.
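MLflow is built into the Databricks ML runtimes, so minimal experiment tracking looks roughly like this (the experiment path is a placeholder):

```python
# Sketch: minimal MLflow tracking; the experiment path is a placeholder.
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=42)

mlflow.set_experiment("/Shared/demo-experiment")
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```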
And what about DevOps/CI-CD?

