Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Prakash Hinduja ~ How do I create an empty DataFrame in Databricks - are there multiple ways?

prakashhinduja2
New Contributor

Hello, I'm Prakash Hinduja, an Indian-born financial advisor and consultant based in Geneva, Switzerland (Swiss). My career is focused on guiding high-net-worth individuals and business leaders through the intricate world of global investment and wealth management. Leveraging my strong background in international finance, I craft bespoke strategies that have led clients to affectionately call me the Prakash Hinduja net worth booster.

I'm trying to create an empty DataFrame in Databricks and was wondering if there are multiple ways to do it, especially with or without a predefined schema. What approaches have worked best for you? Appreciate any tips!

Regards

Prakash Hinduja, Geneva, Switzerland (Swiss)

 

2 REPLIES

ilir_nuredini
Honored Contributor

Hello,

There are a few ways to define an empty Spark DataFrame; here are some of them:

1. Create an empty DataFrame with a schema

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), True)
])

empty_df = spark.createDataFrame([], schema)

2. Create an empty DataFrame without specifying any columns

empty_df_without_cols = spark.createDataFrame([], StructType([]))

3. Create an empty RDD, then convert it to a DataFrame (just FYI, this option won't work in the Free Edition because of the serverless compute)

schema = StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), True)
])

emptyRDD = spark.sparkContext.emptyRDD()
empty_df1 = emptyRDD.toDF(schema)

Hope that helps.

Best, Ilir

ManojkMohan
Contributor III

Best Practices from Experience:

  • Use a predefined schema if you know your column types upfront; it prevents errors when appending new data.
  • For ad-hoc exploration, createDataFrame([], StructType([])) works fine (toDF needs at least one row of data for type inference).
  • Always check printSchema(); it helps avoid silent type issues later in transformations (see the snippet below).
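A minimal sketch of that check (assuming a SparkSession named spark, as Databricks notebooks provide):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), True)
])

empty_df = spark.createDataFrame([], schema)

# Verify the schema before any transformations run against it
empty_df.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- age: integer (nullable = true)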

Possible Scenarios:

1. Without a predefined schema (completely empty)
Pros:
Quick and simple.
Useful for placeholder DataFrames.
Cons:
Columns and types aren't defined, so adding data later can be cumbersome.

2. With a predefined schema
This is the more common and safer approach, especially if you plan to append data later.
Pros:
Ensures consistent column types.
Easy to append rows later using unionByName (see the sketch below).
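A minimal sketch of that append pattern (assuming a SparkSession named spark; the example row is made up):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField('name', StringType(), True),
    StructField('age', IntegerType(), True)
])

empty_df = spark.createDataFrame([], schema)
new_rows = spark.createDataFrame([('Alice', 30)], schema)

# unionByName matches columns by name rather than position,
# so it stays correct even if column order differs
combined_df = empty_df.unionByName(new_rows)
combined_df.show()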

3. Using spark.createDataFrame with an empty RDD
This is essentially the same as the above, but it is sometimes preferred in pure Spark setups.
Pros:
Works well in Spark-heavy pipelines.

4. Using toDF on an empty RDD
If you only want to define column names, note that Spark cannot infer types from an empty RDD, so in practice every column is declared as StringType (see the sketch below)
Pros:
Lightweight if you don't care about strict types.
Cons:
All columns end up as StringType, so type conversions may be needed later.
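Since type inference fails on truly empty data, a DDL schema string with every column declared as STRING is the closest working equivalent. A minimal sketch (assuming a SparkSession named spark; the column names are made up):

# Names only, all columns as strings, no StructType boilerplate
empty_strings_df = spark.createDataFrame([], "name STRING, age STRING")
empty_strings_df.printSchema()
# root
#  |-- name: string (nullable = true)
#  |-- age: string (nullable = true)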
