Administration & Architecture

DLT pipeline production deployment with AWS

satycse06
New Contributor

Hi,

We are currently working with PySpark for both ETL and data quality checks. We package our code as a wheel to deploy to production and manage versions for better stability.

We are running Databricks on AWS and using Azure DevOps pipelines for deployment.

Since it's not possible to convert a DLT pipeline into a wheel package, and Databricks does not currently support Git authentication with a service principal (SPN) on AWS, how can we promote the DLT pipeline from dev to prod while maintaining versioning and keeping the prod environment stable?

One option is to download the code/DLT pipeline into a volume and then run it, but downloading it every time before running would be somewhat complex.

The flow would be: wheel package >> DLT pipeline (pointing to the volume).

Is there another enterprise-grade solution for this?

Regards,

Satya

 

1 ACCEPTED SOLUTION


Ashwin_DSA
Databricks Employee

Hi @satycse06,

Have you considered Databricks Asset Bundles for this? This is exactly the type of problem they solve.

You can still keep your DLT code and a databricks.yml bundle file in Azure DevOps Git, use the bundle to declare your DLT pipeline as a resource with separate dev/prod targets and per-environment overrides, and then have an Azure DevOps pipeline check out that repository and run a bundle deployment against your Databricks on AWS workspace.
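As a rough sketch, a databricks.yml along these lines declares the pipeline once and overrides settings per target (all names, hosts, catalogs, and paths below are illustrative placeholders, not values from this thread):

```yaml
# databricks.yml -- hypothetical sketch; adjust names/hosts to your setup
bundle:
  name: etl_dlt_bundle

resources:
  pipelines:
    etl_pipeline:
      name: etl_pipeline_${bundle.target}   # resolves per target, e.g. etl_pipeline_prod
      catalog: main                          # placeholder Unity Catalog name
      target: etl_schema                     # schema the pipeline writes to
      libraries:
        - notebook:
            path: ./src/dlt_pipeline.py      # DLT source kept in the same Git repo

targets:
  dev:
    mode: development
    workspace:
      host: https://dev-workspace.cloud.databricks.com   # placeholder
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com  # placeholder
```

With targets defined this way, the same repo deploys to either environment just by switching the `-t dev` / `-t prod` flag on the CLI.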

That way, your PySpark/DQ logic can still be packaged as a wheel and stored in a volume or S3, and the DLT pipeline just points at that wheel path. This removes the manual steps and lets you promote from lower to higher environments through Git and CI/CD.

This is a recommended CI/CD pattern for jobs and DLT on Databricks today, and it works fine with Azure DevOps even when your Databricks workspace runs on AWS.
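A minimal Azure DevOps stage for this could look like the sketch below, assuming the Databricks CLI and service-principal OAuth credentials (DATABRICKS_HOST, DATABRICKS_CLIENT_ID, DATABRICKS_CLIENT_SECRET) are stored as pipeline secret variables; names here are illustrative:

```yaml
# azure-pipelines.yml -- hypothetical sketch, not a drop-in file
trigger:
  branches:
    include: [main]

pool:
  vmImage: ubuntu-latest

steps:
  - checkout: self

  # Install the Databricks CLI (official install script)
  - script: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
    displayName: Install Databricks CLI

  # Deploy the bundle to the prod target using SP OAuth credentials
  - script: databricks bundle deploy -t prod
    displayName: Deploy bundle to prod
    env:
      DATABRICKS_HOST: $(DATABRICKS_HOST)
      DATABRICKS_CLIENT_ID: $(DATABRICKS_CLIENT_ID)
      DATABRICKS_CLIENT_SECRET: $(DATABRICKS_CLIENT_SECRET)
```

Running `databricks bundle validate` in a pull-request stage before `deploy` is a common sanity check on the bundle configuration.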

Check this official documentation too. Very relevant to your requirement. 

You can also take a look at a relevant community post explaining this in even more detail. 

Hope this helps. Let me know if you have any specific questions.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

