Azure Databricks to GCP Databricks Migration
09-27-2024 08:04 PM
Hi Team, Can you provide your thoughts on moving Databricks from Azure to GCP? What services are required for the migration, and are there any limitations on GCP compared to Azure? Also, are there any tools that can assist with the migration? Please share any documents or insights you have on this.
01-03-2025 05:39 AM
Hi @Phani1, great catching you in the community!
The complexity of migration largely depends on how deeply integrated the workspace is with the Azure ecosystem. The greatest benefit comes when most of your enterprise's cloud services are already hosted on GCP.
Key Considerations:
- Compare the Azure and GCP pricing and discount programs carefully.
- Account for the time and effort required to redeploy the infrastructure.
1. Data
- Evaluate all existing Databricks integrations with Azure services; some connections may not work after migration and will require alternative solutions.
- Use Storage Transfer Service for a lift-and-shift migration, or Delta Sharing to clone data.
- Some tables will need temporary parallel pipelines during the migration (e.g., using Fivetran).
2. Notebooks and jobs
- Review the logic thoroughly, particularly connections to external services.
- Use the Databricks CLI or REST API to automate the migration.
3. Security and governance
- Entra ID, Azure's primary identity service, isn't available on GCP, so plan for a manual migration of identity and governance rules.
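To make the CLI/REST approach to notebook migration concrete, here is a minimal sketch of the request shapes for the Databricks Workspace API export and import endpoints (GET /api/2.0/workspace/export on the source workspace, POST /api/2.0/workspace/import on the target). The helper names are my own, and the actual HTTP calls, hosts, and tokens are deliberately left out:

```python
import base64

def export_params(path: str) -> dict:
    """Query parameters for the export call on the source workspace."""
    return {"path": path, "format": "SOURCE"}

def import_payload(path: str, content_b64: str, language: str = "PYTHON") -> dict:
    """JSON body for the import call on the target workspace."""
    return {
        "path": path,
        "format": "SOURCE",
        "language": language,
        "content": content_b64,   # base64-encoded notebook source
        "overwrite": True,
    }

# The export response returns base64 content, which can be passed
# straight through to the import body.
exported = base64.b64encode(b"print('hello from Azure')").decode()
body = import_payload("/Shared/migrated/etl_notebook", exported)
```

Looping this over a workspace listing (GET /api/2.0/workspace/list) gives a basic bulk migration, though jobs, clusters, and permissions each have their own endpoints and need separate handling.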
While most Azure services have GCP equivalents, migrating each service and setting up new credentials takes considerable time. Azure's more mature Databricks ecosystem also means GCP may need extra configuration or third-party tools to match functionality, and Databricks generally ships new features to Azure before GCP.
Technical Partnerships Lead | SunnyData
P: (598) 95-974-524
E: eliana.oviedo@sunnydata.ai
01-03-2025 05:51 AM
Hello Team,
Adding to @sunnydata's comments:
Moving Databricks from Azure to GCP involves several steps and considerations. Here are the key points:
- Services Required for Migration:
- Cloud Storage Data: Use GCP’s Storage Transfer Service (STS) to migrate data from Azure Blob Storage to Google Cloud Storage (GCS). This tool is recommended for its efficiency and throughput.
- External Hive Metastore: If you use an external Hive metastore, migrate it to Cloud SQL on GCP. This involves rewriting the tables' storage paths from their Azure locations to GCP ones.
- Workspace Asset Migration: This process is mostly manual. Tools like the Databricks migration tool (available on GitHub) can assist with migrating certain assets, but it is not recommended for cross-cloud migrations.
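For the metastore path rewrite mentioned above, the core operation is mapping each table's abfss:// location to a gs:// URI. A minimal sketch, assuming a simple one-to-one mapping from ADLS containers to GCS buckets (the mapping and function name are illustrative):

```python
from urllib.parse import urlparse

def rewrite_location(location: str, container_to_bucket: dict) -> str:
    """Rewrite abfss://<container>@<account>.dfs.core.windows.net/<path>
    to gs://<bucket>/<path>; any other URI passes through unchanged."""
    parsed = urlparse(location)
    if parsed.scheme != "abfss":
        return location
    container = parsed.netloc.split("@", 1)[0]
    return f"gs://{container_to_bucket[container]}{parsed.path}"

mapping = {"raw": "my-lake-raw"}
print(rewrite_location(
    "abfss://raw@myaccount.dfs.core.windows.net/tables/orders",
    mapping,
))  # gs://my-lake-raw/tables/orders
```

In practice you would run something like this over the LOCATION column of the metastore's table/partition records (or via ALTER TABLE ... SET LOCATION) after the files have landed in GCS.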
- Limitations on GCP Compared to Azure:
- Networking: GCP has different networking configurations and requirements compared to Azure. For example, VPC Service Controls and firewall rules need to be adapted.
- Service Parity: Ensure that the services used in Azure have equivalent services in GCP. Some services might have different configurations or limitations.
- Billing and Monitoring: GCP has a different billing and monitoring setup compared to Azure. Familiarize yourself with GCP’s billing and monitoring tools.
- Tools to Assist with Migration:
- Storage Transfer Service (STS): For migrating data from Azure Blob Storage to GCS.
- Databricks Migration Tool: Available on GitHub, this tool can help with migrating workspace assets, though it is more suited for intra-cloud migrations.
- Terraform: Can be used to manage infrastructure as code and facilitate the setup of resources in GCP.
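As a concrete starting point for STS, the sketch below builds the JSON body for the Storage Transfer Service transferJobs.create REST call with an Azure Blob source (authenticated via SAS token) and a GCS sink. Field names follow the public STS REST schema, but treat this as a hedged sketch; actually submitting the job (via the google-cloud-storage-transfer client or plain HTTPS) is left out:

```python
def sts_transfer_job(project_id: str, storage_account: str, container: str,
                     sas_token: str, gcs_bucket: str) -> dict:
    """JSON body for POST https://storagetransfer.googleapis.com/v1/transferJobs."""
    return {
        "projectId": project_id,
        "status": "ENABLED",
        "transferSpec": {
            "azureBlobStorageDataSource": {
                "storageAccount": storage_account,
                "container": container,
                "azureCredentials": {"sasToken": sas_token},
            },
            "gcsDataSink": {"bucketName": gcs_bucket},
        },
    }

job = sts_transfer_job("my-gcp-project", "myazureacct", "datalake",
                       "<SAS token>", "my-gcs-landing")
```

One job is created per container/bucket pair, and a schedule can be added to the body for incremental re-syncs during the cutover window.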
- Best Practices:
- Profile Each Workspace: Determine what will be rehosted, refactored, or retired.
- Plan the Migration: Based on the profiling information, plan the migration steps.
- Migrate Data and Metadata: Use recommended tools to migrate data, metadata, and workspace objects.
- Reconfigure Access Control: Reapply access control settings manually as there are no public APIs for this.
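After the data and metadata moves, a quick reconciliation pass catches silent gaps before cutover. A hypothetical sketch comparing per-table row counts collected from each side (in practice you would fill the dicts from `spark.table(name).count()` on each workspace):

```python
def reconcile(source_counts: dict, target_counts: dict) -> list:
    """Return (table, issue) pairs for tables that are missing on the
    target or whose row counts differ from the source."""
    issues = []
    for table, n_src in sorted(source_counts.items()):
        n_tgt = target_counts.get(table)
        if n_tgt is None:
            issues.append((table, "missing on GCP"))
        elif n_tgt != n_src:
            issues.append((table, f"count mismatch: {n_src} vs {n_tgt}"))
    return issues

azure = {"sales.orders": 1_000_000, "sales.items": 5_400_000}
gcp = {"sales.orders": 1_000_000}
print(reconcile(azure, gcp))  # [('sales.items', 'missing on GCP')]
```

Row counts are a coarse check; for critical tables, checksums or sampled row comparisons give stronger guarantees.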
- Additional Considerations:
- Egress Costs: Be aware of potential egress costs when transferring data between clouds.
- Security Approvals: Ensure security approvals are in place for creating resources in GCP.
- Service Accounts and IAM Roles: Set up appropriate service accounts and IAM roles in GCP.

