No, that's the location of the schema hints (which work together with schema inference). Specifying a schema location does not turn off schema inference, which is what I wanted. In fact, schemaLocation is a required option _unless_ the schema is passed explicitly, as I showed.
You can enforce the schema, or use "cloudFiles.schemaHints" to override the inference for specific columns.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("rescuedDataColumn", "_rescued_data")  # makes sure that you don't lose data
    .schema(<schema>)  # provide a schema here for the files
    .load(<path>)
)
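To make the difference between the two approaches in this thread concrete, here is a minimal sketch of how the options differ. The helper `autoloader_options`, its parameters, and the example path are hypothetical and only for illustration; the option keys `cloudFiles.format`, `cloudFiles.schemaLocation`, and `cloudFiles.schemaHints` are the ones mentioned above. It assumes, as stated earlier in the thread, that a schema location is only needed when the schema is inferred.

```python
def autoloader_options(file_format, schema_location=None, schema_hints=None,
                       explicit_schema=False):
    """Build an Auto Loader options dict (hypothetical helper, not a Databricks API)."""
    opts = {"cloudFiles.format": file_format}

    if explicit_schema:
        # The schema will be passed with .schema(...), so nothing is inferred
        # and no schema location is needed.
        return opts

    # Inference path: the inferred schema has to be stored somewhere.
    if schema_location is None:
        raise ValueError(
            "cloudFiles.schemaLocation is required when the schema is inferred"
        )
    opts["cloudFiles.schemaLocation"] = schema_location

    if schema_hints:
        # Hints override the inferred type only for the listed columns.
        opts["cloudFiles.schemaHints"] = schema_hints
    return opts


# Inference with hints (the path and column names are made up).
inferred = autoloader_options(
    "csv",
    schema_location="/tmp/schemas/events",
    schema_hints="id BIGINT, ts TIMESTAMP",
)

# Explicit schema: no schema location, no hints.
enforced = autoloader_options("csv", explicit_schema=True)
```

Either dict could then be passed to the reader with `spark.readStream.format("cloudFiles").options(**opts)`, adding `.schema(<schema>)` before `.load(<path>)` in the explicit case.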