- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-10-2026 04:22 AM
You can read a JSON file into relational columns in Databricks by explicitly binding a schema and choosing the right parsing mode.
PySpark (recommended when you already have a schema JSON)
import json
from pyspark.sql.types import StructType
schema_json = json.loads(dbutils.fs.head(
"file:/Workspace/path/receipt.schema.json", 1000000))
schema = StructType.fromJson(schema_json)
df = (spark.read
.schema(schema)
.option("multiLine", "true") # if JSON is multi-line
.option("mode", "PERMISSIVE")
.option("columnNameOfCorruptRecord", "_corrupt_record")
.json("file:/Workspace/path/receipt.json"))
display(df)Error handling
PERMISSIVE → keeps rows, bad records go to _corrupt_record
DROPMALFORMED → drops bad rows
FAILFAST → fails on first error
Databricks SQL
SELECT
from_json(content,
schema_of_json(
(SELECT content
FROM read_files('file:/Workspace/path/receipt.schema.json', format => 'text')
)
) AS parsed
FROM read_files('file:/Workspace/path/receipt.json', format => 'text');Then select fields from parsed to get relational columns.
Notes
Use file:/Workspace/... paths for workspace files (best for small dev/test data).
For production, use DBFS Volumes or cloud storage.
Docs
Workspace files: https://docs.databricks.com/files/workspace-interact.html
JSON parsing & error modes: https://docs.databricks.com/sql/language-manual/functions/from_json.html
read_files() (SQL): https://docs.databricks.com/sql/language-manual/functions/read_files.html