The AnalysisException you're seeing in Databricks Community Edition is almost always caused by a mismatch between the JSON file's layout and Spark's default reader. By default, Spark expects JSON Lines (one JSON object per line). If your file is a single JSON document spanning multiple lines (for example a top-level array or a pretty-printed object), the default reader will fail; enable the `multiLine` option instead.
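A minimal sketch of the difference (the sample records and path are illustrative, not from your file):

```python
import json

# A single JSON array spanning multiple lines -- the layout that trips up
# Spark's default line-delimited JSON reader.
array_doc = """[
  {"id": 1, "name": "alice"},
  {"id": 2, "name": "bob"}
]"""

# One fix: rewrite the file as JSON Lines (one object per line),
# which Spark reads with no extra options.
records = json.loads(array_doc)
json_lines = "\n".join(json.dumps(r) for r in records)
print(json_lines)

# The other fix: keep the file as-is and tell Spark it is multi-line:
#   spark.read.option("multiLine", True).json("/path/to/file.json")
```

Either approach works; JSON Lines is usually preferred for large files because Spark can split it across workers, while `multiLine` files must be read whole.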
To build reusable data engineering components in Databricks, focus on modular design: package your logic as testable Python/Scala libraries instead of chaining notebooks with %run. Parameterize all notebooks using widgets for dynamic execution across environments.
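A small sketch of that pattern, assuming a hypothetical helper and environment names (`dev`/`staging`/`prod` are illustrative): the logic lives in an importable, unit-testable function, and the notebook only wires a widget value into it.

```python
# Importable, unit-testable logic -- no notebook or cluster required.
def qualified_table(env: str, table: str) -> str:
    """Build an environment-specific table name, e.g. 'dev_sales.orders'."""
    if env not in {"dev", "staging", "prod"}:
        raise ValueError(f"unknown environment: {env!r}")
    return f"{env}_{table}"

# In the notebook itself, the environment comes from a widget, so the same
# notebook runs unchanged in every environment:
#   dbutils.widgets.text("env", "dev")
#   env = dbutils.widgets.get("env")
#   df = spark.table(qualified_table(env, "sales.orders"))
print(qualified_table("dev", "sales.orders"))
```

Because `qualified_table` has no Databricks dependencies, it can be covered by ordinary pytest tests in CI before the notebook ever runs.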
Currently, DLT doesn’t natively support applying expectations or conditional logic based on aggregate metrics, such as row counts, within a single pipeline step. That’s why combining `dlt.expect_or_fail` with a row count over a DLT table doesn’t work as expected.
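One common workaround (a sketch, not an official DLT API) is to run the aggregate check as a separate validation step, for example a downstream task that counts the published table and fails the job explicitly. The function and threshold below are illustrative:

```python
# Illustrative aggregate check run outside the DLT expectation mechanism,
# e.g. as a downstream job task after the pipeline update finishes.
def validate_row_count(rows, minimum: int) -> int:
    """Fail loudly if the table has fewer rows than expected."""
    count = len(rows)  # in a real task: spark.table("catalog.schema.my_table").count()
    if count < minimum:
        raise ValueError(f"expected at least {minimum} rows, got {count}")
    return count
```

Row-level expectations like `dlt.expect_or_fail` still belong inside the pipeline; this pattern only covers checks that need to see the whole table at once.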
Ensuring annotation quality at scale is always a challenge! Here’s what’s worked for my teams:

- **Clear guidelines:** We invest time in detailed instructions and regular annotator training to avoid ambiguity.
- **Hybrid approach:** We use automated tools for high...
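One concrete way to quantify the ambiguity problem (not mentioned above, purely illustrative) is to double-label a sample and compute inter-annotator agreement; a sketch of Cohen's kappa for two annotators:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators on the same items, corrected for chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label at random.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

Low kappa on a double-labeled sample is a strong signal that the guidelines need tightening before scaling up annotation.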