10-12-2022 01:30 PM
I am currently taking the Data Engineering with Databricks course and have run into an error. I have also attempted this with my own data and hit a similar error. In the lab, we are using Auto Loader to read a Spark stream of CSV files saved in DBFS. The answer for this lab is:
# ANSWER
customers_checkpoint_path = f"{DA.paths.checkpoints}/customers"
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", customers_checkpoint_path)
    .load("/databricks-datasets/retail-org/customers/")
    .createOrReplaceTempView("customers_raw_temp"))

This results in an error message:
java.lang.UnsupportedOperationException: Schema inference is not supported for format: csv. Please specify the schema.
It seems that when using CSV, a predefined schema is required. I attempted this with my personal Databricks data and had to create a schema first, then add that schema to my stream:
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("Test1", StringType(), True),
    StructField("Test2", StringType(), True),
    StructField("Test3", StringType(), True)])

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", source_format)
    .option("header", "True")
    .schema(schema)
    .load(data_source))

Is this the best solution for this error, or is there a way for Auto Loader to infer the schema as shown in the solution to the Databricks lab?
10-12-2022 01:46 PM
After a bit more research, it looks like I was using a cluster with an outdated DBR. I updated to 11.1 and no longer received the error.
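If anyone else wants to confirm which runtime their notebook is attached to before retrying, a quick sketch; it assumes the DATABRICKS_RUNTIME_VERSION environment variable that Databricks sets in notebook sessions.

import os

# Prints something like "11.1" on a cluster running DBR 11.1
# (key name is an assumption based on the variable Databricks exposes in notebooks).
print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))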
10-16-2022 12:04 PM
Yes, it was improved recently 🙂
10-12-2022 04:02 PM
As a small aside, you don't need the third argument in the StructFields; nullable already defaults to True.
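In other words, something like this should behave identically, since StructField's nullable parameter defaults to True:

from pyspark.sql.types import StructType, StructField, StringType

# Equivalent to passing True explicitly as the third argument.
schema = StructType([
    StructField("Test1", StringType()),
    StructField("Test2", StringType()),
    StructField("Test3", StringType())])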