<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Data Quality with PySpark and Great Expectations on Databricks in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/data-quality-with-pyspark-and-great-expectations-on-databricks/m-p/129959#M617</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;:&amp;nbsp;Thanks for sharing the link. I will explore.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 28 Aug 2025 04:55:49 GMT</pubDate>
    <dc:creator>BR_DatabricksAI</dc:creator>
    <dc:date>2025-08-28T04:55:49Z</dc:date>
    <item>
      <title>Data Quality with PySpark and Great Expectations on Databricks</title>
      <link>https://community.databricks.com/t5/community-articles/data-quality-with-pyspark-and-great-expectations-on-databricks/m-p/128912#M569</link>
      <description>&lt;P&gt;Data governance is one of the most important pillars in any modern architecture. When building pipelines that process data at scale, ensuring data quality is not just a best practice—it is a critical necessity.&lt;/P&gt;&lt;P&gt;Tools like Great Expectations (GX) were created to fill this gap, allowing you to define automated, human-readable, and auditable validation rules.&lt;/P&gt;&lt;P&gt;In this article, I’ll show how to use PySpark + Great Expectations on Databricks, applying programmatic validations on Spark DataFrames in a simple and reusable way.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;Why Great Expectations?&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Great Expectations (GX) is an open-source framework that enables you to:&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Define expectations about data (e.g., non-null columns, correct data types, unique values).&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Create living documentation of data quality.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Integrate with different data engines: Pandas, Spark, SQLAlchemy.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Generate automated validation reports.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;In the context of Databricks + Spark, it’s a perfect match for validating distributed DataFrames before loading them into data lakes or data warehouses.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;For more details, see "&lt;/SPAN&gt;&lt;A href="https://greatexpectations.io/" target="_blank" 
rel="noopener"&gt;&lt;SPAN&gt;https://greatexpectations.io/&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt;".&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;Generic Validation Function&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Below is a generic function I wrote to encapsulate the main validations. It takes:&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;A Spark DataFrame.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;An expected schema (as a dictionary).&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Optional parameters, such as expected row count and column order validation.&lt;/SPAN&gt;&lt;SPAN&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="python"&gt;import great_expectations as gx
from pyspark.sql import DataFrame

def validate_with_gx(
    df: DataFrame,
    schema: dict,
    expected_row_count: int | None = None,
    check_ordered_columns: bool = True,
    enable_length_check: bool = False
) -&amp;gt; None:
    """
    Runs Great Expectations checks on a Spark DataFrame.
    """
    # 1) Build a transient GX context and Spark datasource
    context = gx.get_context()
    ds = context.data_sources.add_spark(name="spark_in_memory")
    asset = ds.add_dataframe_asset(name="df_asset")
    batch_def = asset.add_batch_definition_whole_dataframe("df_batch")
    batch = batch_def.get_batch(batch_parameters={"dataframe": df})

    # 2) Run expectations per schema
    from great_expectations import expectations as E
    results = []
    ordered_cols = []
    for col, props in schema.items():
        ordered_cols.append(col)

        if props.get("unique", False):
            results.append(batch.validate(E.ExpectColumnValuesToBeUnique(column=col)))
        if props.get("nullable", True) is False:
            results.append(batch.validate(E.ExpectColumnValuesToNotBeNull(column=col)))

        dtype = props.get("dtype")
        if dtype:
            results.append(batch.validate(E.ExpectColumnValuesToBeOfType(column=col, type_=dtype)))

        if enable_length_check:
            size = props.get("size")
            if size is not None:
                # Enforce length &amp;lt;= size (a VARCHAR-like upper bound)
                results.append(
                    batch.validate(
                        E.ExpectColumnValueLengthsToBeBetween(
                            column=col, min_value=None, max_value=int(size)
                        )
                    )
                )

    # 3) Table-level expectations
    if check_ordered_columns:
        results.append(batch.validate(E.ExpectTableColumnsToMatchOrderedList(column_list=ordered_cols)))
    if expected_row_count is not None:
        results.append(batch.validate(E.ExpectTableRowCountToEqual(value=int(expected_row_count))))

    # 4) Summarize results
    total = len(results)
    successes = sum(1 for r in results if getattr(r, "success", False))
    failures = total - successes

    print(f"[DQ] Expectations run: {total} | Passed: {successes} | Failed: {failures}")
    if failures &amp;gt; 0:
        for r in results:
            if not getattr(r, "success", False):
                cfg = getattr(r, "expectation_config", None)
                etype = getattr(cfg, "type", "unknown") if cfg else "unknown"
                kwargs = getattr(cfg, "kwargs", {}) if cfg else {}
                print(f"[DQ][FAIL] {etype} {kwargs}")
        raise RuntimeError("Data Quality validation failed.")
    else:
        print("[DQ] All checks passed ✔️")&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;Usage Example&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;We define an expected schema as a dictionary:&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;expected_schema = {
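    # Each key is a column; the value is its contract, matching the keys
    # validate_with_gx reads: "size" (max string length, enforced only when
    # enable_length_check=True), "dtype" (Spark type name), "unique", "nullable".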
    "id":         {"size": None, "dtype": "IntegerType",  "unique": True,  "nullable": False},
    "name":       {"size": 255,  "dtype": "StringType",   "unique": False, "nullable": False},
    "created_at": {"size": None, "dtype": "TimestampType","unique": False, "nullable": False},
}&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;Create&lt;/SPAN&gt;&lt;SPAN class=""&gt; a &lt;/SPAN&gt;&lt;SPAN class=""&gt;test&lt;/SPAN&gt; &lt;SPAN class=""&gt;DataFrame&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN class=""&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;demo_df = spark.createDataFrame(
    [(1, "Alice", "2024-01-01 10:00:00"),
     (2, "Bob",   "2024-01-01 11:00:00")],
    ["id", "name", "created_at"]
).selectExpr(
    "CAST(id AS INT) as id",
    "CAST(name AS STRING) as name",
    "to_timestamp(created_at) as created_at"
)

expected_rows = demo_df.count()
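
# Hypothetical failure sketch (illustrative, not part of the original example):
# duplicating the rows would violate the "unique" rule on "id" and the exact
# row-count check, so validate_with_gx would print the failed expectations
# and raise RuntimeError.
# bad_df = demo_df.union(demo_df)
# validate_with_gx(df=bad_df, schema=expected_schema, expected_row_count=expected_rows)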

validate_with_gx(
    df=demo_df,
    schema=expected_schema,
    expected_row_count=expected_rows,
    check_ordered_columns=True,
    enable_length_check=False
)&lt;/LI-CODE&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;Expected Output&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;If all expectations are met (the example above runs 7 column-level checks plus the ordered-columns and row-count checks):&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;[DQ] Expectations run: 9 | Passed: 9 | Failed: 0 
[DQ] All checks passed ✔️ &lt;/LI-CODE&gt;&lt;P&gt;&lt;SPAN&gt;If there are failures, the log will show details about the unmet expectation (e.g., null values or wrong data type).&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN&gt;When to Use on Databricks?&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;This type of validation is ideal for:&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ETL/ELT pipelines → validating intermediate tables before saving them to the Delta Lake.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Data mesh → enforcing data contracts across domains.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Governance → producing evidence of data quality for audits.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Monitoring → detecting schema breaks and anomalies early.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;Complete Expectations Reference&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;For the full catalog of expectations available in Great Expectations, consult the official index at "&lt;/SPAN&gt;&lt;A href="https://greatexpectations.io/expectations/" target="_blank" rel="noopener"&gt;&lt;SPAN&gt;https://greatexpectations.io/expectations/&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt;".&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;Summary List of Common Expectations&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Below is a concise list of frequently used expectations (names as used in GX’s V3 API). 
Use them as building blocks for your contracts:&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectTableRowCountToEqual&lt;/SPAN&gt;&lt;SPAN&gt; — Enforces an exact number of rows.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectTableRowCountToBeBetween&lt;/SPAN&gt;&lt;SPAN&gt; — Enforces a row count range.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectTableColumnsToMatchOrderedList&lt;/SPAN&gt;&lt;SPAN&gt; — Ensures the table has exactly these columns in this order.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectTableColumnCountToBeBetween&lt;/SPAN&gt;&lt;SPAN&gt; — Enforces a range of column count.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToNotBeNull&lt;/SPAN&gt;&lt;SPAN&gt; — Disallows nulls in a column.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToBeUnique&lt;/SPAN&gt;&lt;SPAN&gt; — Enforces uniqueness on a column.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectMulticolumnValuesToBeUnique&lt;/SPAN&gt;&lt;SPAN&gt; — Enforces uniqueness across multiple columns (composite key).&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToBeInSet&lt;/SPAN&gt;&lt;SPAN&gt; — Values must be in a given whitelist.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnDistinctValuesToBeInSet&lt;/SPAN&gt;&lt;SPAN&gt; — All distinct values are from a given set.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToBeBetween&lt;/SPAN&gt;&lt;SPAN&gt; — Numeric or 
datetime values fall within [min, max].&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValueLengthsToBeBetween&lt;/SPAN&gt;&lt;SPAN&gt; — String length bounds (often used with VARCHAR-like limits).&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToMatchRegex&lt;/SPAN&gt;&lt;SPAN&gt; — String values match a regular expression.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToMatchRegexList&lt;/SPAN&gt;&lt;SPAN&gt; — String values match at least one regex from a list.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToBeOfType&lt;/SPAN&gt;&lt;SPAN&gt; — Column has an expected data type (e.g., "StringType").&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToBeDateutilParseable&lt;/SPAN&gt;&lt;SPAN&gt; — Values can be parsed as dates.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToBeInTypeList&lt;/SPAN&gt;&lt;SPAN&gt; — Type belongs to an allowed set.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnMedianToBeBetween&lt;/SPAN&gt;&lt;SPAN&gt; / &lt;/SPAN&gt;&lt;SPAN&gt;ExpectColumnMeanToBeBetween&lt;/SPAN&gt;&lt;SPAN&gt; — Distribution sanity checks.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnQuantileValuesToBeBetween&lt;/SPAN&gt;&lt;SPAN&gt; — Quantiles fall within ranges.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnProportionOfUniqueValuesToBeBetween&lt;/SPAN&gt;&lt;SPAN&gt; — Cardinality sanity 
check.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnPairValuesToBeInSet&lt;/SPAN&gt;&lt;SPAN&gt; — Validates allowed combinations across two columns.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;ExpectColumnValuesToBeUniqueWithinRecord&lt;/SPAN&gt;&lt;SPAN&gt; — No duplicate values within a row (useful for wide tables with repeating fields).&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN&gt;Conclusion&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Integrating PySpark + Great Expectations within Databricks is a powerful way to boost data reliability.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;With just a few lines of code, we can:&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Validate schemas, columns, and types.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Ensure quality before persisting to the data lake.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Automate checks across critical pipelines.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Run data quality checks across batches of multiple tables, not just individual DataFrames.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;Persist validation results into Delta Tables, making them available for monitoring and visualization through dashboards such as Power BI or Databricks SQL.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 19 Aug 2025 21:18:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/data-quality-with-pyspark-and-great-expectations-on-databricks/m-p/128912#M569</guid>
      <dc:creator>WiliamRosa</dc:creator>
      <dc:date>2025-08-19T21:18:31Z</dc:date>
    </item>
    <item>
      <title>Re: Data Quality with PySpark and Great Expectations on Databricks</title>
      <link>https://community.databricks.com/t5/community-articles/data-quality-with-pyspark-and-great-expectations-on-databricks/m-p/129607#M606</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;&amp;nbsp;: Thanks for sharing the nice article on DQ. I would like to hear from you what other options exist if we don't want to rely on external libraries.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Aug 2025 12:16:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/data-quality-with-pyspark-and-great-expectations-on-databricks/m-p/129607#M606</guid>
      <dc:creator>BR_DatabricksAI</dc:creator>
      <dc:date>2025-08-25T12:16:14Z</dc:date>
    </item>
    <item>
      <title>Re: Data Quality with PySpark and Great Expectations on Databricks</title>
      <link>https://community.databricks.com/t5/community-articles/data-quality-with-pyspark-and-great-expectations-on-databricks/m-p/129646#M607</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/97026"&gt;@BR_DatabricksAI&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you very much for the question. If you can't access external libraries, there's the option of using Databricks' own DQ features. There's a really good post about it; I'll leave the link below:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.databricks.com/discover/pages/data-quality-management" target="_blank"&gt;https://www.databricks.com/discover/pages/data-quality-management&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Aug 2025 16:04:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/data-quality-with-pyspark-and-great-expectations-on-databricks/m-p/129646#M607</guid>
      <dc:creator>WiliamRosa</dc:creator>
      <dc:date>2025-08-25T16:04:51Z</dc:date>
    </item>
    <item>
      <title>Re: Data Quality with PySpark and Great Expectations on Databricks</title>
      <link>https://community.databricks.com/t5/community-articles/data-quality-with-pyspark-and-great-expectations-on-databricks/m-p/129959#M617</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179612"&gt;@WiliamRosa&lt;/a&gt;:&amp;nbsp;Thanks for sharing the link. I will explore.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 28 Aug 2025 04:55:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/data-quality-with-pyspark-and-great-expectations-on-databricks/m-p/129959#M617</guid>
      <dc:creator>BR_DatabricksAI</dc:creator>
      <dc:date>2025-08-28T04:55:49Z</dc:date>
    </item>
  </channel>
</rss>