Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

College Course Use - Sharing Data With Students

Drew_Prof
New Contributor II

Hello Everyone,

I am creating a new college course on database design and SQL analytics and have decided to use Databricks as our platform in the course. We are going to be using the Free Edition so students do not need to pay for access. I'm wondering what solutions people have found for creating datasets and sharing them with students? From what I can tell, the Free Edition limits sharing directly via email and also limits Delta Shares. Is my only option to export to .csv files and then have the students create their own tables from the .csv files?

The same question goes for SQL editor scripts: I created some demos that I walked through in class, but I would like to share the editor files directly. Is that possible using the Free Edition? My current workaround is copying the SQL queries to a .txt file, and the students copy and paste from the .txt into their own SQL editor.

Hoping there might be some easier sharing opportunities that I'm missing in the Free Edition.

1 ACCEPTED SOLUTION

Louis_Frolio
Databricks Employee

One more point, and a quick win for course datasets on Free Edition.

Databricks Labs has a purpose-built synthetic data toolkit: dbldatagen (Databricks Labs Data Generator). It's open source and runs well on Free Edition with a simple notebook-scoped install.

  • Works out of the box on Databricks runtimes and Community/Free Edition via %pip (no special cluster libraries).

  • No extra Python dependencies beyond what supported Databricks runtimes already include.

  • You can expose the generated DataFrame as a view and consume it from other languages (SQL, Scala, R).

  • Comes with plug-in style standard datasets to jump-start common examples.

  • Supports multi-table generation with cross-references, which is perfect for relational concepts (foreign keys, dimensions/facts).

Copy/paste starter

%pip install dbldatagen

import dbldatagen as dg

dataspec = (
    dg.DataGenerator(spark, name="customers", rows=10_000)
      .withColumn("customer_id", "int", minValue=1, maxValue=10_000)
      .withColumn("name", "string", template=r"\w \w")
      .withColumn("email", "string", template=r"\w@\w.com")
      .withColumn("signup_date", "date", begin="2020-01-01", end="2024-12-31")
)

df = dataspec.build()
df.write.saveAsTable("customers")
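If you want the multi-table, cross-reference idea without installing anything, the same relational shape can be sketched in plain Python and then uploaded as CSVs. This is a hypothetical stdlib-only fallback, not part of dbldatagen; the table and column names are assumptions chosen to match the starter above.

```python
import csv
import random

def make_course_tables(n_customers=100, n_orders=500, seed=42):
    """Generate a customers table and an orders table whose customer_id
    values are valid foreign keys into customers (hypothetical helper)."""
    rng = random.Random(seed)
    customers = [
        {"customer_id": i, "name": f"customer_{i}", "email": f"user{i}@example.com"}
        for i in range(1, n_customers + 1)
    ]
    orders = [
        {
            "order_id": j,
            # FK: every order references an existing customer_id
            "customer_id": rng.randint(1, n_customers),
            "amount": round(rng.uniform(5, 500), 2),
        }
        for j in range(1, n_orders + 1)
    ]
    return customers, orders

def write_csv(path, rows):
    """Write a list of dicts to CSV with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

customers, orders = make_course_tables()
write_csv("customers.csv", customers)
write_csv("orders.csv", orders)
```

The fixed seed keeps every student's generated data identical, which makes grading query results much easier.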

 

Cheers, Lou


3 REPLIES

Louis_Frolio
Databricks Employee

Hey @Drew_Prof,

Short answer: With Databricks Free Edition, you can't act as a Delta Sharing provider or use Marketplace to distribute data, and you don't have access to account-level sharing features. Instead, the most reliable path is to distribute files (CSV/Parquet) and have students load them into their own workspaces using Unity Catalog volumes; for SQL, share .sql files or notebooks via a public Git repo or simple file upload/import. This keeps each student within their own Free Edition workspace and avoids quota/contention issues.

Recommended patterns that work well for a class

Datasets (tables)

Option A - Distribute files; students load into their own volume (recommended)

  • You publish small-to-moderate CSV/Parquet files through your LMS or a public link (GitHub release, course site).
  • Students upload the files to a Unity Catalog volume in their own Free Edition workspace (Catalog > Volumes > Upload). Free Edition supports volumes; DBFS root is restricted.
  • Students create tables over those files. Example:

-- one-time setup
CREATE CATALOG IF NOT EXISTS workspace;
CREATE SCHEMA IF NOT EXISTS workspace.default;
CREATE VOLUME IF NOT EXISTS workspace.default.course_data;

-- after uploading orders.csv to the volume:
CREATE TABLE IF NOT EXISTS workspace.default.orders
USING CSV
OPTIONS (header true, inferSchema true)
LOCATION '/Volumes/workspace/default/course_data/orders.csv';

Option B - Provide a "bootstrap" notebook or SQL file

  • Ship a small notebook or .sql file that: (1) creates the volume, (2) gives students a step to upload files, (3) executes the CREATE TABLE commands. This minimizes copy/paste errors and standardizes table names.
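The bootstrap file itself can be generated from a list of dataset filenames so every student gets identical, typo-free table names. A minimal sketch of such a generator, assuming the workspace.default catalog/schema and course_data volume names used above (the helper names are hypothetical, not a Databricks API):

```python
# Hypothetical helper: emit a bootstrap .sql file so the volume and table
# names are standardized across the whole class.
BOOTSTRAP_HEADER = """\
-- one-time setup
CREATE CATALOG IF NOT EXISTS workspace;
CREATE SCHEMA IF NOT EXISTS workspace.default;
CREATE VOLUME IF NOT EXISTS workspace.default.course_data;
"""

def create_table_sql(table, filename,
                     volume="/Volumes/workspace/default/course_data"):
    """Emit a CREATE TABLE statement over a CSV uploaded to the volume."""
    return (
        f"CREATE TABLE IF NOT EXISTS workspace.default.{table}\n"
        f"USING CSV OPTIONS (header true, inferSchema true)\n"
        f"LOCATION '{volume}/{filename}';"
    )

def build_bootstrap(tables):
    """tables: dict mapping table name -> uploaded CSV filename."""
    stmts = [create_table_sql(t, f) for t, f in tables.items()]
    return BOOTSTRAP_HEADER + "\n" + "\n\n".join(stmts) + "\n"

script = build_bootstrap({"customers": "customers.csv", "orders": "orders.csv"})
# write it out and post the file to the LMS alongside the datasets:
# open("bootstrap.sql", "w").write(script)
```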

Notes

  • Favor Parquet where possible to cut file size and speed up loads (especially useful under small-warehouse limits).
  • Avoid relying on external HTTP downloads from within the workspace; Free Edition outbound access is allowlisted and may not include arbitrary hosts.

SQL editor scripts and teaching materials

Option C - Public Git repository

  • Put .sql files and notebooks in a public GitHub repo.
  • Students either:
    • Use Git folders (if enabled for their Free Edition workspace) to clone the repo; or
    • Download files from GitHub and use "Upload" in the Databricks Workspace or SQL Editor Files to import .sql files or notebooks.
      This is the simplest way to share SQL editor content without depending on workspace invites. (Git folders are generally available in Databricks; if they're not visible in a student's Free Edition workspace, file upload still works.)

Option D - Export/import notebooks (.dbc or source)

  • Export notebooks as .dbc or source files and post them to the LMS.
  • Students import via Workspace > Import; then they can open the SQL cells in the editor.

What not to rely on in Free Edition

  • Delta Sharing as a provider or Marketplace distribution: provider objects are created at the account/metastore layer and Free Edition does not expose the account console/APIs; Marketplace provider access is explicitly disallowed.
  • Single shared instructor workspace for the whole class: one tiny SQL warehouse plus fair-use quotas will bottleneck and may shut compute down for the day if exceeded.

If you really want "in-platform" collaboration

  • You can add a small number of collaborators to a single workspace and co-edit notebooks/SQL files in real time, but keep groups small and time-boxed to avoid quotas.
  • For larger cohorts, stick with each student's own Free Edition workspace + file/Git distribution.

Quick starter checklist you can reuse in your syllabus

  • Provide download links for datasets (CSV/Parquet) and a bootstrap SQL file/notebook that:
    1. Creates catalog/schema/volume
    2. Instructs students to upload files
    3. Runs CREATE TABLE … USING CSV/Parquet LOCATION '/Volumes/…'
  • Host all SQL editor examples in a public GitHub repo as .sql files; add a README with "Upload into SQL Editor Files" instructions.
  • Keep file sizes modest and table counts reasonable to respect Free Edition limits.
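To hand the whole bundle (datasets plus bootstrap file) to students as one LMS download, a short stdlib packaging script is enough. A sketch under assumed filenames (the placeholder files and bundle name below are illustrative, not from this thread):

```python
import zipfile
from pathlib import Path

def package_course_bundle(out_path, files):
    """Bundle dataset files plus the bootstrap SQL into one zip for the LMS."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as z:
        for f in files:
            # arcname flattens directories so students see plain filenames
            z.write(f, arcname=Path(f).name)
    return out_path

# Demo with placeholder files (names are assumptions):
Path("orders.csv").write_text("order_id,amount\n1,9.99\n")
Path("bootstrap.sql").write_text("-- setup statements here\n")
bundle = package_course_bundle("course_week1.zip", ["orders.csv", "bootstrap.sql"])
```

One zip per course week keeps downloads small and maps cleanly onto the syllabus checklist above.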

Hope this helps, Louis.


Drew_Prof
New Contributor II

Excellent information, thank you!  The data generator was something I was not aware of so I will check that out.