Data Engineering

DQX usage outside Databricks

wkgcls
Visitor

Hello,
While evaluating data quality frameworks for PySpark pipelines, I came across DQX. It is available on PyPI (databricks-labs-dqx) and on GitHub, which is great for accessibility.

However, I'm trying to understand the licensing requirements. The license metadata on PyPI lists it as "Other/Proprietary", and the license file on GitHub states that the materials can only be used "in connection with your use of the Databricks Services."

I would like to understand:

  1. Does this mean DQX requires a Databricks platform agreement even when used in other Spark environments (e.g., AWS EMR, Dataproc, or standalone Spark clusters)?
  2. If a Databricks agreement is required, are there specific licensing options for using Labs projects outside the Databricks platform?
  3. Are there any recommended alternatives for data quality validation in non-Databricks PySpark environments that you'd suggest?

I want to ensure we're compliant with all licensing requirements before adopting any framework. Any clarification would be greatly appreciated.

Thanks in advance!
Kiran

Accepted Solution

ManojkMohan
Honored Contributor II

@wkgcls DQX requires a Databricks platform license for production use, and its license restricts usage to environments connected with Databricks Services. This means you cannot freely use DQX in external Spark environments such as AWS EMR.

To your questions:

1. Yes, DQX requires a Databricks platform agreement for legitimate usage.

2. No, there are currently no publicly documented licensing options that allow Databricks Labs projects, including DQX, to be used outside the Databricks platform.

3. Yes, there are several open-source, permissively licensed alternatives. One example is Great Expectations: a Python-based framework with Spark DataFrame support, configurable validation rules, and strong community adoption.
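As a side note on checking licenses before adopting a package: the "Other/Proprietary" label mentioned in the question comes from the metadata a package declares about itself, and that metadata can be read locally with Python's standard `importlib.metadata` module. The sketch below is only an illustration; the helper name `declared_license` is made up here, and the output reflects what the package author declared, not a legal reading of the license file, which remains the authoritative text.

```python
from importlib import metadata


def declared_license(dist_name):
    """Return the license fields an installed distribution declares.

    Hypothetical helper for illustration. Returns None when the
    distribution is not installed in the current environment.
    """
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    classifiers = meta.get_all("Classifier") or []
    return {
        # Free-text license field; may be missing or empty.
        "license": meta.get("License"),
        # Newer SPDX-style field; many packages do not set it.
        "license_expression": meta.get("License-Expression"),
        # Trove classifiers such as "License :: Other/Proprietary License".
        "classifiers": [c for c in classifiers if c.startswith("License ::")],
    }


if __name__ == "__main__":
    # Prints None unless databricks-labs-dqx is installed locally.
    print(declared_license("databricks-labs-dqx"))
```

Running the same helper over every dependency in a project gives a quick first pass, after which anything not clearly permissive can be read in full.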


2 REPLIES


wkgcls
Visitor

Thanks a lot for the quick response, @ManojkMohan! This was very helpful.
I'll keep this in mind.
