Features | Community Edition | Free Edition
Notebooks | ✅ | ✅
MLflow | ✅ | ✅
Ingestion | ❌ | ✅
Jobs | ❌ | ✅
Pipelines | ❌ | ✅
Dashboards | ❌ | ✅
Genie | ❌ | ✅
Semantic Search | ❌ | ✅
Model Serving | ❌ | ✅
Model Evaluation | ❌ | ✅
Agents | ❌ | ✅
Unity Catalog | ❌ | ✅
Clean Rooms | ❌ | ❌
Lakebase | ❌ | (not yet)
Agent Bricks | ❌ | (not yet)
Enterprise Admin | ❌ | ❌
Classic Compute | ✅ | ❌
Serverless Compute | ❌ | ✅
GPUs | BYO* | ❌
* BYO = Bring Your Own
1 - Ingestion:
Databricks Ingestion refers to the process of bringing data from various sources into the Databricks Lakehouse Platform so it can be processed, analyzed, or stored. It is a critical first step in building data pipelines and analytics workflows on Databricks.
Definition:
Databricks Ingestion is the method of importing structured, semi-structured, or unstructured data from internal or external sources into a Databricks workspace, where it is typically written to Delta Lake tables for reliable, scalable processing and analytics.
Ingestion Methods:
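As a simple illustration, one common pattern is to read a source file and append it to a Delta table from a notebook. The sketch below assumes it runs in a Databricks notebook (where spark is predefined); the file path and table name are hypothetical.

# Minimal ingestion sketch: load a CSV file and append it to a Delta table.
# The source path and target table below are placeholders, not real assets.
df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/Volumes/main/raw/sales/sales_2024.csv"))  # hypothetical path

(df.write
   .format("delta")
   .mode("append")
   .saveAsTable("main.analytics.sales_raw"))            # hypothetical table name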
Resources:
2 - Jobs:
Databricks Jobs let you automate and schedule your data processing workflows. A job runs a notebook, JAR, Python script, or SQL query on a defined schedule or in response to an event, making Jobs essential for production data pipelines, ETL, machine learning, and more.
Definition:
A Databricks Job is a reusable, configurable task that executes code (e.g., a notebook or script) on the Databricks platform according to a defined schedule, trigger, or dependency.
Core Capabilities:
Feature | Description
Multi-task workflows | Run multiple tasks in a sequence with dependencies (DAG-style)
Scheduling | Run jobs on a cron-like schedule or at specific intervals
Triggers | Start jobs manually, on a schedule, or via API/webhook
Retries & alerts | Configure retry logic and failure notifications (email, webhooks, etc.)
Compute options | Run jobs on existing, new, or serverless clusters
Monitoring | Monitor job history, logs, and metrics in the UI or via the API
Parameters | Pass parameters to make jobs dynamic (e.g., different input paths/dates)
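To make the orchestration concrete, here is a rough sketch of creating a simple scheduled job through the Jobs REST API (version 2.1). The workspace URL, token, notebook path, and job name are placeholders; verify field names against the current Jobs API reference, and note that a compute specification (or serverless job compute) is needed in practice.

# Sketch: create a scheduled single-task job via the Databricks Jobs API 2.1.
import requests

host = "https://<your-workspace>.cloud.databricks.com"   # placeholder workspace URL
token = "<personal-access-token>"                        # never hard-code real tokens

job_spec = {
    "name": "daily-sales-etl",                           # hypothetical job name
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Workspace/etl/ingest_sales"},
            # add a cluster spec here, or rely on serverless job compute if enabled
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",         # every day at 06:00
        "timezone_id": "UTC",
    },
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
print(resp.json())                                       # returns the new job_id on success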
Supported Job Types:
Example Use Cases:
Resources:
3 - Pipelines:
In Databricks, pipelines are automated, scalable workflows that help you build, manage, and monitor end-to-end data processing tasks — such as ETL (Extract, Transform, Load), machine learning workflows, or streaming applications. There are two main types of pipelines in Databricks:
Definition:
A Delta Live Tables pipeline is a managed pipeline that lets you define and automate reliable, production-grade data transformations using declarative SQL or Python.
A Databricks Jobs pipeline (aka workflow) is a sequence of tasks (e.g., notebooks, scripts, or queries) that execute in order or in parallel, similar to an orchestrated DAG.
This is often what people mean when they refer to "pipelines" in Databricks Jobs.
Key Features:
Pipeline Type | Description
Delta Live Tables (DLT) | For declarative data transformations (SQL/PySpark), with auto-optimizations
Databricks Jobs (Workflows) | For custom task orchestration (any code/script/tool) in a flexible DAG
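For the Delta Live Tables flavor, a minimal declarative example looks roughly like the sketch below. It assumes the code runs inside a DLT pipeline (not a plain notebook); the source path, table names, and columns are hypothetical.

# Sketch: a two-table Delta Live Tables pipeline with one data-quality expectation.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw sales records loaded with Auto Loader")
def sales_raw():
    return (spark.readStream
            .format("cloudFiles")                  # Auto Loader incremental ingestion
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/raw/sales/"))     # hypothetical source path

@dlt.table(comment="Cleaned sales records")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # drop rows that fail the expectation
def sales_clean():
    return dlt.read_stream("sales_raw").where(col("region").isNotNull())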
4 - Dashboards:
Databricks Dashboards are visual tools within the Databricks platform that allow users to create, share, and view visualizations of data from notebooks or SQL queries. They are useful for presenting insights to stakeholders without needing them to dive into raw data or code.
What Can You Do with Databricks Dashboards?
Use Cases:
Key Features:
Feature | Description
Live data | Refresh dynamically from SQL queries or notebooks
Flexible layout | Easy layout customization
Access control | Workspace-level permission management
Embedding | Embeddable in apps or web portals (Enterprise only)
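Dashboards are built in the UI, but every visualization is backed by a query. As a hedged illustration, the snippet below shows the kind of query result a dashboard tile might be built on; the table and column names are hypothetical, and display() is the Databricks notebook helper.

# Sketch: a query whose result could back a dashboard visualization.
monthly_revenue = spark.sql("""
    SELECT date_trunc('month', order_date) AS month,
           region,
           SUM(amount)                      AS revenue
    FROM main.analytics.sales_raw            -- hypothetical table
    GROUP BY 1, 2
    ORDER BY 1
""")
display(monthly_revenue)   # in a notebook, this output can be added to a dashboard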
How to Create a Dashboard in Databricks:
Resources:
5 - Genie:
Databricks Genie is an AI-powered, no-code interface that lets users—especially business or non-technical teams—interact with data using natural language. Genie translates questions into SQL queries, runs them via Databricks SQL, and returns results along with visualizations.
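To make the translation step concrete, the snippet below shows the kind of SQL Genie might generate for a question such as "What was total revenue by region last quarter?". This is an illustration only: the table and column names are hypothetical, and the actual SQL depends on the datasets and instructions curated in the Genie space.

# Illustration: SQL of the sort Genie could produce, run here via spark.sql.
generated_sql = """
    SELECT region, SUM(amount) AS total_revenue
    FROM main.analytics.sales_raw                -- hypothetical table
    WHERE order_date >= DATE '2024-04-01'
      AND order_date <  DATE '2024-07-01'
    GROUP BY region
    ORDER BY total_revenue DESC
"""
display(spark.sql(generated_sql))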
How Genie “Thinks”:
How to Use Genie:
Benefits & Considerations:
Summary:
Genie bridges the gap between technical data assets and business users, offering an intelligent natural-language assistant for querying, charting, and exploring data—powered by SQL under the hood and enhanced with ongoing human-guided refinement.
Here’s a streamlined setup guide for Databricks AI/BI Genie, including environment requirements, step-by-step instructions, and best practices:
Summary Workflow:
Resources:
6 - Semantic Search:
Databricks Semantic Search, part of the Mosaic AI Vector Search feature, enables hybrid semantic+keyword search on text data stored in Delta tables. It uses vector embeddings plus traditional keyword search, combining results via Reciprocal Rank Fusion to deliver more meaningful and context-aware retrieval.
Core Features:
How It Works (Example Flow):
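A minimal sketch of querying an existing index with the databricks-vectorsearch client is shown below. The endpoint, index, and column names are placeholders, and the parameter names should be checked against the current client documentation.

# Sketch: query a Mosaic AI Vector Search index for similar documents.
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()
index = client.get_index(
    endpoint_name="vs_endpoint",                 # hypothetical endpoint
    index_name="main.search.docs_index",         # hypothetical index
)

results = index.similarity_search(
    query_text="How do I configure serverless compute?",
    columns=["doc_id", "title", "chunk_text"],   # hypothetical columns to return
    num_results=5,
    query_type="HYBRID",                         # combine semantic and keyword matching
)
print(results)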
Summary:
Databricks Semantic Search (via Mosaic AI Vector Search) provides enterprise-grade hybrid search within Delta tables—automatically syncing embeddings, managing indexing, and combining semantic and keyword matching to surface more relevant results. It's an essential feature for building retrieval-augmented systems, knowledge discovery tools, or intelligent search experiences.
Resources:
7 - Model Serving:
Databricks Model Serving is a fully managed, production-grade system for deploying and hosting machine learning models as REST APIs. It enables users to serve models built in Databricks or imported from outside, monitor performance, and integrate seamlessly with applications or inference workflows like Retrieval-Augmented Generation (RAG), dashboards, or batch jobs.
Feature | Description
Fully managed infrastructure | Databricks handles autoscaling, containerization, and endpoint management
REST API endpoints | Expose ML models as secure, scalable HTTP endpoints
Unity Catalog integration | Models are versioned and managed in Unity Catalog
Broad framework support | Supports MLflow, scikit-learn, PyTorch, XGBoost, TensorFlow, and Hugging Face
GPU & LLM serving | Ideal for serving LLMs, foundation models, or custom models on GPUs
Serverless endpoints | Serverless Model Serving endpoints offer low-latency, autoscaling compute without infrastructure management
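Once an endpoint is deployed, applications call it over HTTPS. The sketch below shows a minimal invocation; the workspace URL, token, endpoint name, and feature columns are placeholders.

# Sketch: call a Model Serving endpoint over REST.
import requests

host = "https://<your-workspace>.cloud.databricks.com"   # placeholder workspace URL
token = "<personal-access-token>"

payload = {"dataframe_records": [{"feature_a": 1.0, "feature_b": "US"}]}  # hypothetical features

resp = requests.post(
    f"{host}/serving-endpoints/my-model-endpoint/invocations",            # hypothetical endpoint
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
print(resp.json())   # model predictions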
8 - Model Evaluation:
Databricks Model Evaluation is a feature within the Databricks Machine Learning environment that helps data scientists and ML engineers assess the performance and quality of machine learning models. It provides a standardized, automated way to compute, log, and visualize evaluation metrics, making it easier to compare models, validate behavior, and ensure models meet required standards before deployment.
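As a hedged sketch of the workflow, MLflow's evaluation API can compute a standard set of metrics for a logged model; the model URI, evaluation data, and label column below are hypothetical.

# Sketch: evaluate a registered model with mlflow.evaluate.
import mlflow
import pandas as pd

eval_df = pd.DataFrame(
    {"feature_a": [0.2, 0.7, 0.5], "label": [0, 1, 1]}   # toy evaluation data
)

with mlflow.start_run():
    result = mlflow.evaluate(
        model="models:/main.ml.churn_model/1",   # hypothetical Unity Catalog model version
        data=eval_df,
        targets="label",
        model_type="classifier",                 # computes accuracy, F1, ROC AUC, etc.
    )
    print(result.metrics)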
Key Capabilities:
Resources:
9 - Agents:
Databricks Agents are a new capability introduced by Databricks as part of their AI and machine learning stack, designed to enable natural language interaction with data and workflows. They are essentially AI-powered assistants that can interpret user input (typically in plain English) and then act on it by writing code, querying data, or triggering workflows using the underlying data infrastructure in Databricks.
What Are Databricks Agents?
Key Features:
Feature | Description
Natural language querying | Users can ask data questions in plain English, and the agent generates SQL or PySpark to retrieve the answer
Data & asset access | Agents have access to Unity Catalog, Delta Tables, ML models, and notebooks
Context retrieval (RAG) | They use Retrieval Augmented Generation (RAG) by pulling in context from documentation, notebooks, or other resources
Tool use | You can define tools (functions, APIs, SQL snippets) the agent can use to complete tasks
Flexible interfaces | Interact via chat interfaces, dashboards, or embedded tools
Example Use Case:
Imagine a user types: "Show me the revenue trend for Q2 2024 for our top 5 performing regions."
A Databricks Agent could:
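It could, for example, interpret the question, generate and run a query over the relevant tables, and return the result as a chart. As a rough, hypothetical sketch of the underlying mechanics, the snippet below shows a single tool-calling turn against a Databricks-hosted LLM endpoint using the OpenAI-compatible client; the run_sql tool, the endpoint name, and the credentials are illustrative assumptions, not the actual Databricks Agent implementation.

# Sketch: one tool-calling turn with a Databricks-hosted LLM (OpenAI-compatible API).
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
    api_key="<personal-access-token>",
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_sql",                       # hypothetical tool the agent may call
        "description": "Run a SQL query against the lakehouse and return rows",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",   # example pay-per-token endpoint
    messages=[{"role": "user",
               "content": "Show me the revenue trend for Q2 2024 for our top 5 performing regions."}],
    tools=tools,
)
print(response.choices[0].message)   # either a direct answer or a run_sql tool call to execute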
Architecture Overview:
Databricks Agents are built using:
Resources:
10 - Unity Catalog:
Databricks Unity Catalog is a unified governance solution for all data and AI assets in the Databricks Lakehouse Platform. It provides fine-grained access control, data lineage, and centralized metadata management across workspaces, data, and compute environments—ensuring consistent security and compliance for your entire data estate.
Key Concepts of Unity Catalog:
Concept | Description
Catalog | A top-level container for schemas (databases). Think of it as a namespace for organizing data assets
Schema | A container within a catalog that holds tables, views, and functions
Tables & views | Structured data assets stored in Delta Lake or other formats
Data lineage | Automatically tracks how data flows across queries, jobs, and dashboards
Access control | Role-based and attribute-based access policies using ANSI SQL GRANT, REVOKE
Data classification | Helps label sensitive data (e.g., PII) and enforce compliance
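Governance is expressed in ANSI SQL. The statements below are a minimal sketch run from a notebook via spark.sql; the catalog, schema, table, and group names are placeholders.

# Sketch: Unity Catalog objects and privileges managed with SQL.
spark.sql("CREATE SCHEMA IF NOT EXISTS main.analytics")
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.analytics TO `data-analysts`")
spark.sql("GRANT SELECT ON TABLE main.analytics.sales_raw TO `data-analysts`")
spark.sql("REVOKE SELECT ON TABLE main.analytics.sales_raw FROM `interns`")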
Why Use Unity Catalog?
Security & Compliance Features:
Supported Assets:
Multi-Workspace & Cross-Cloud:
Resources:
Databricks Documentation: Unity Catalog Overview
11 - Serverless Compute:
Databricks Serverless Compute is a fully managed compute option that automatically provisions and scales compute resources for SQL, notebooks, dashboards, and jobs—without requiring users to manage clusters. It simplifies infrastructure management, reduces cost through automatic scaling, and improves performance by using optimized hardware.
Key Features of Databricks Serverless Compute:
Feature | Description
No cluster management | Users don't need to create, configure, or manage clusters manually
Auto-scaling | Automatically scales resources up or down based on the workload
Instant startup | Compute resources are ready in seconds, ideal for ad hoc queries or interactive notebooks
Optimized performance | Databricks manages compute infrastructure to ensure optimal performance, often using faster hardware and tuning
Per-second billing | You pay only for what you use, down to the second, making it cost-efficient
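One common way to use serverless compute is through a serverless SQL warehouse. The sketch below connects with the Databricks SQL connector for Python; the hostname, HTTP path, and token are placeholders for your workspace values.

# Sketch: run a query against a serverless SQL warehouse.
from databricks import sql

with sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",   # a serverless SQL warehouse
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT current_date() AS today")
        print(cursor.fetchall())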
Use Cases:
Benefits:
Security and Governance:
Resources:
We hope you find this information useful in your transition from Databricks Community Edition to Free Edition, and that you get to fully leverage the expanded capabilities it offers.