02-03-2026 05:09 AM
Hello Everyone,
I am creating a new college course on database design and SQL analytics and have decided to use Databricks as our platform. We are going to use the Free Edition so students do not need to pay for access. I'm wondering what solutions people have found for creating datasets and sharing them with students? From what I can tell, the Free Edition limits sharing directly via email and also limits Delta Shares. Is my only option to export to .csv files and then have the students create their own tables from the .csv files?
The same question goes for SQL editor scripts; I created some demos that I walked through in class, but I would like to share the editor files directly. Is that possible using the Free Edition? My current workaround is copying the SQL queries to a .txt file, and the students copy & paste from the .txt into their own SQL editor.
Hoping there might be some easier sharing opportunities that I'm missing in the Free Edition.
02-03-2026 07:59 AM
Hey @Drew_Prof ,
Short answer: With Databricks Free Edition, you can’t act as a Delta Sharing provider or use Marketplace to distribute data, and you don’t have access to account-level sharing features. Instead, the most reliable path is to distribute files (CSV/Parquet) and have students load them into their own workspaces using Unity Catalog volumes; for SQL, share .sql files or notebooks via a public Git repo or simple file upload/import. This keeps each student within their own Free Edition workspace and avoids quota/contention issues.
Recommended patterns that work well for a class
Datasets (tables)
Option A — Distribute files; students load into their own volume (recommended)
-- one-time setup
CREATE CATALOG IF NOT EXISTS workspace;
CREATE SCHEMA IF NOT EXISTS workspace.default;
CREATE VOLUME IF NOT EXISTS workspace.default.course_data;

-- after uploading orders.csv to the volume:
CREATE TABLE IF NOT EXISTS workspace.default.orders AS
SELECT * FROM read_files(
  '/Volumes/workspace/default/course_data/orders.csv',
  format => 'csv',
  header => true
);
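If you want to generate the CSV you hand out in the first place, a plain-Python script is enough; this is a sketch, and the column names, row count, and value ranges are illustrative rather than anything from the thread:

```python
import csv
import random
from datetime import date, timedelta

def make_orders_csv(path, n_rows=100, seed=42):
    """Write a small synthetic orders.csv suitable for upload to a volume."""
    rng = random.Random(seed)
    start = date(2024, 1, 1)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order_id", "customer_id", "order_date", "amount"])
        for i in range(1, n_rows + 1):
            writer.writerow([
                i,                                                      # order_id
                rng.randint(1, 20),                                     # customer_id
                (start + timedelta(days=rng.randint(0, 364))).isoformat(),  # order_date
                round(rng.uniform(5.0, 500.0), 2),                      # amount
            ])
    return path

make_orders_csv("orders.csv")
```

Because the generator is seeded, every student who runs it gets byte-identical data, which makes grading query results much easier.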
Option B — Provide a “bootstrap” notebook or SQL file
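As a sketch of what such a bootstrap could contain (the statement text mirrors the Option A setup; the runner is injected only so the logic can be exercised outside a notebook, where you would simply pass `spark.sql`):

```python
# Hypothetical one-time bootstrap cell for students.
# In a Databricks notebook: run_bootstrap(spark.sql)
BOOTSTRAP_SQL = [
    "CREATE SCHEMA IF NOT EXISTS workspace.default",
    "CREATE VOLUME IF NOT EXISTS workspace.default.course_data",
]

def run_bootstrap(run_sql):
    """Execute each setup statement via the supplied runner (e.g. spark.sql)."""
    for stmt in BOOTSTRAP_SQL:
        run_sql(stmt)
    return len(BOOTSTRAP_SQL)
```

Shipping this as a notebook means students run one cell instead of pasting several statements, and `IF NOT EXISTS` keeps it safe to re-run.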
Notes
SQL editor scripts and teaching materials
Option C — Public Git repository
Option D — Export/import notebooks (.dbc or source)
What not to rely on in Free Edition
If you really want “in‑platform” collaboration
Quick starter checklist you can reuse in your syllabus
Hope this helps, Louis.
02-03-2026 08:01 AM
One other point, and a quick win for course datasets on Free Edition.
Databricks Labs has a purpose‑built synthetic data toolkit: dbldatagen (Databricks Labs Data Generator). It’s open source and runs great on Free Edition with a simple notebook‑scoped install.
Install:
In a notebook cell: %pip install dbldatagen.
Highlights:
Works out of the box on Databricks runtimes and Community/Free Edition via %pip (no special cluster libs).
No extra Python dependencies beyond what supported Databricks runtimes already include.
You can expose the generated DataFrame as a view and consume from other languages (SQL, Scala, R).
Comes with plug‑in style standard datasets to jump‑start common examples.
Supports multi‑table generation with cross‑references — perfect for relational concepts (FKs, dimensions/facts).
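To see why cross-referenced tables matter for teaching FKs, here is a dependency-free Python sketch of the same idea (dbldatagen does this at Spark scale; the table and column names here are illustrative):

```python
import random

def make_related_tables(n_customers=50, n_orders=200, seed=7):
    """Generate a customers list and an orders list whose customer_id
    values always reference an existing customer (i.e. a valid FK)."""
    rng = random.Random(seed)
    customers = [
        {"customer_id": i, "name": f"cust_{i}"}
        for i in range(1, n_customers + 1)
    ]
    orders = [
        # customer_id is drawn only from the range of existing customers
        {"order_id": j, "customer_id": rng.randint(1, n_customers)}
        for j in range(1, n_orders + 1)
    ]
    return customers, orders
```

Because every order's customer_id is drawn from the existing key range, joins and referential-integrity exercises work out of the box.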
Copy/paste starter

%pip install dbldatagen

import dbldatagen as dg

# spec for a 10k-row customers table: templated strings plus a date range
dataspec = (
    dg.DataGenerator(spark, name="customers", rows=10_000)
    .withColumn("customer_id", "int", minValue=1, maxValue=10_000)
    .withColumn("name", "string", template=r"\w \w")
    .withColumn("email", "string", template=r"\w@\w.com")
    .withColumn("signup_date", "date", begin="2020-01-01", end="2024-12-31")
)

df = dataspec.build()
# overwrite so the cell is safe to re-run; use a fully qualified table name
df.write.mode("overwrite").saveAsTable("workspace.default.customers")
Cheers, Lou
02-03-2026 08:31 AM
Excellent information, thank you! The data generator was something I was not aware of so I will check that out.