
Low Level Design for Moving Data from Databricks A to Databricks B

Pratikmsbsvm
Contributor

Hello Techies,

Could someone please help me with the low-level design points we should consider when moving data from one Delta Lake instance to another?

For example:

  1. Service principal creation.
  2. IP whitelisting.
  3. Any GitLab / DevOps related setup.

Diagram: [architecture diagram attached as Pratikmsbsvm_1-1753184957933.png]

I am trying to build a pipeline that brings data from A to B.

Please help.

 

Accepted Solution

lingareddy_Alva
Honored Contributor III

Hi @Pratikmsbsvm 

Here's a brief low-level design checklist for Delta Lake to Delta Lake data migration:

1. Security & Authentication
- Create service principals for both environments
- Set up Azure Key Vault for credential management
- Configure IP whitelisting and VNet peering
- Enable private endpoints for storage accounts
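
To illustrate the first point: once the service principal's credentials sit in a Key Vault-backed secret scope, the copy job can authenticate to the source storage account with no hard-coded secrets. A minimal sketch, where the scope name kv-migration, the key names, and the storage account sourceacct are all placeholders:

```python
# Pull service-principal credentials from a Key Vault-backed secret scope
# (scope and key names below are hypothetical)
sp_client_id = dbutils.secrets.get(scope="kv-migration", key="sp-client-id")
sp_secret = dbutils.secrets.get(scope="kv-migration", key="sp-client-secret")
tenant_id = dbutils.secrets.get(scope="kv-migration", key="tenant-id")

# Standard OAuth settings for reading an ADLS Gen2 account as a service principal
account = "sourceacct"  # hypothetical source storage account
spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", sp_client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", sp_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
```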

2. Databricks Setup
- Configure cross-workspace authentication
- Set up appropriate cluster sizing and auto-scaling
- Install required libraries and dependencies
- Configure Unity Catalog for governance
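
For the Unity Catalog point, governance on the target side usually starts with a landing catalog/schema and explicit grants. A rough sketch; the catalog, schema, and group names are made up:

```python
# Create a landing catalog/schema on the target workspace and grant the
# migration team access (all names here are hypothetical)
spark.sql("CREATE CATALOG IF NOT EXISTS migration_target")
spark.sql("CREATE SCHEMA IF NOT EXISTS migration_target.sales")
spark.sql("GRANT USE CATALOG ON CATALOG migration_target TO `data-engineers`")
spark.sql("GRANT USE SCHEMA, CREATE TABLE ON SCHEMA migration_target.sales TO `data-engineers`")
```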

3. Pipeline Design
- Implement incremental data loading (CDC/watermark-based)
- Set up proper error handling and retry logic
- Configure checkpointing for streaming jobs
- Implement data validation and quality checks
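
To make the incremental-load and checkpointing points concrete, here is one way to do it with Delta Change Data Feed and a Structured Streaming job in availableNow mode. The paths are placeholders, and the source table must have delta.enableChangeDataFeed = true:

```python
# Incremental copy from workspace A's storage to workspace B's storage using
# Change Data Feed; the checkpoint makes the job restartable.
# All paths below are hypothetical.
source_path = "abfss://data@sourceacct.dfs.core.windows.net/delta/orders"
target_path = "abfss://data@targetacct.dfs.core.windows.net/delta/orders"
checkpoint_path = "abfss://data@targetacct.dfs.core.windows.net/_checkpoints/orders"

(spark.readStream
     .format("delta")
     .option("readChangeFeed", "true")   # requires CDF enabled on the source table
     .load(source_path)
     .writeStream
     .format("delta")
     .option("checkpointLocation", checkpoint_path)
     .trigger(availableNow=True)         # process all pending changes, then stop
     .start(target_path))
```

In a real pipeline you would usually apply the change rows with foreachBatch and a MERGE rather than appending them directly; the sketch only shows the read/checkpoint mechanics.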

4. DevOps Integration
- GitLab CI/CD pipelines for automated deployment
- Infrastructure as Code (Terraform/ARM)
- Environment-specific configurations (dev/test/prod)
- Automated testing and rollback strategies
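
For environment-specific configuration, one lightweight pattern is to key every workspace- and catalog-specific value off a single CI/CD variable so the same pipeline code promotes cleanly from dev to prod. A trivial sketch; the URLs, catalog names, and the DEPLOY_ENV variable are invented:

```python
import os

# Hypothetical per-environment settings, selected by a CI/CD variable
CONFIGS = {
    "dev":  {"workspace_url": "https://adb-dev.azuredatabricks.net",  "catalog": "dev_catalog"},
    "test": {"workspace_url": "https://adb-test.azuredatabricks.net", "catalog": "test_catalog"},
    "prod": {"workspace_url": "https://adb-prod.azuredatabricks.net", "catalog": "prod_catalog"},
}

env = os.environ.get("DEPLOY_ENV", "dev")  # set by the GitLab pipeline
cfg = CONFIGS[env]
print(f"Deploying to {cfg['workspace_url']} / {cfg['catalog']}")
```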

5. Monitoring & Operations
- Azure Monitor integration
- Pipeline failure alerts
- Data freshness monitoring
- Performance metrics tracking
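
Data freshness can be checked from the Delta transaction log itself, for example by alerting when the target table has not been written within an SLA window. A sketch; the four-hour SLA and the table path are assumptions:

```python
import datetime
from delta.tables import DeltaTable

# Hypothetical target path and SLA
target_path = "abfss://data@targetacct.dfs.core.windows.net/delta/orders"
sla = datetime.timedelta(hours=4)

# history(1) returns the most recent commit recorded in the transaction log
last_commit = (DeltaTable.forPath(spark, target_path)
               .history(1)
               .select("timestamp")
               .collect()[0]["timestamp"])

# Rough comparison; assumes the driver clock and commit timestamps
# share a timezone
lag = datetime.datetime.now() - last_commit
if lag > sla:
    raise RuntimeError(f"Target table is stale: last write was {lag} ago")
```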

6. Data Considerations
- Schema evolution handling
- Partitioning strategy optimization
- Z-ordering for query performance
- Proper table versioning and retention policies
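
The last three bullets map directly onto table maintenance commands that can run as a scheduled job after each load. A sketch; the Z-order column and retention windows are examples, not recommendations:

```python
# Hypothetical table path and Z-order column
target_path = "abfss://data@targetacct.dfs.core.windows.net/delta/orders"

# Compact small files and co-locate data for the most common filter column
spark.sql(f"OPTIMIZE delta.`{target_path}` ZORDER BY (customer_id)")

# Control how long history (time travel) and deleted files are retained
spark.sql(f"""
    ALTER TABLE delta.`{target_path}` SET TBLPROPERTIES (
        'delta.logRetentionDuration' = 'interval 30 days',
        'delta.deletedFileRetentionDuration' = 'interval 7 days'
    )
""")

# Physically remove files older than the retention window (7 days = 168 hours)
spark.sql(f"VACUUM delta.`{target_path}` RETAIN 168 HOURS")
```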

7. Network & Compliance
- NSG rules for secure communication
- Data encryption in transit and at rest
- Audit logging and data lineage
- RBAC implementation
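
For the audit-logging bullet, note that the Delta transaction log already records who ran which operation on the table, which complements workspace-level audit logs. A quick sketch, reusing the hypothetical target_path from the earlier snippets:

```python
# Review the per-table audit trail kept in the Delta transaction log
history = spark.sql(f"DESCRIBE HISTORY delta.`{target_path}`")
(history
    .select("version", "timestamp", "userName", "operation", "operationParameters")
    .show(truncate=False))
```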

This covers the essential components for a production-ready Delta Lake migration pipeline.

 

LR

