Hello @ChristianRRL,
No, read_files is not a native Spark function; it's a Databricks SQL wrapper that lets you read files easily using SQL syntax.
The main advantage is that it adds several Databricks-specific capabilities on top of Spark's basic file reader, such as schema inference, schema hints, rescued data handling, and partition discovery.
For example:
SELECT * FROM read_files(
  's3://my-bucket/path/',
  format => 'json',
  schemaHints => 'user_id STRING, event_time TIMESTAMP'
);
is roughly equivalent to:
spark.read.format("json").load("s3://my-bucket/path/")
but with the extra Databricks logic for schema management and ingestion governance.
Regarding schemaHints, it works the same way as in Auto Loader: it lets you override or enforce specific column types while leaving the rest of the schema inferred automatically. Docs
In open-source Spark, spark.read only lets you either fully define a schema or infer it entirely (Docs). Databricks added schemaHints to this built-in function in its Databricks Runtime, giving you that middle ground: you pin down specific column types while the rest of the schema is still inferred.
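Conceptually, you can think of schemaHints as a merge of the inferred schema with your overrides. Here is a minimal plain-Python sketch of that idea (an illustration only, not Databricks' actual implementation; the function name and dict shapes are assumptions):

```python
# Illustrative sketch of schema-hint semantics: hinted columns override the
# inferred type, all other columns keep whatever inference produced.
def apply_schema_hints(inferred: dict, hints: dict) -> dict:
    final = dict(inferred)   # start from the fully inferred schema
    final.update(hints)      # hinted columns win; un-hinted ones are untouched
    return final

inferred = {"user_id": "BIGINT", "event_time": "STRING", "payload": "STRING"}
hints = {"user_id": "STRING", "event_time": "TIMESTAMP"}

print(apply_schema_hints(inferred, hints))
# {'user_id': 'STRING', 'event_time': 'TIMESTAMP', 'payload': 'STRING'}
```

Note how `payload` keeps its inferred type: that is exactly what distinguishes schemaHints from passing a full schema, which would require you to spell out every column.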
Hope this helps, 🙂
Isi