Re: Reading JSON file to columns as relational ?

bianca_unifeye · ‎02-10-2026

You can read a JSON file into relational columns in Databricks by explicitly binding a schema and choosing the right parsing mode.

PySpark (recommended when you already have a schema JSON)

import json
from pyspark.sql.types import StructType

schema_json = json.loads(dbutils.fs.head(
    "file:/Workspace/path/receipt.schema.json", 1000000))
schema = StructType.fromJson(schema_json)

df = (spark.read
      .schema(schema)
      .option("multiLine", "true")  # if JSON is multi-line
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("file:/Workspace/path/receipt.json"))

display(df)

Error handling

PERMISSIVE → keeps rows, bad records go to _corrupt_record
DROPMALFORMED → drops bad rows
FAILFAST → fails on first error

Databricks SQL

SELECT
  from_json(content,
    schema_of_json(
      (SELECT content
       FROM read_files('file:/Workspace/path/receipt.schema.json', format => 'text')
      )
  ) AS parsed
FROM read_files('file:/Workspace/path/receipt.json', format => 'text');

Then select fields from parsed to get relational columns.

Notes

Use file:/Workspace/... paths for workspace files (best for small dev/test data).
For production, use DBFS Volumes or cloud storage.

Docs

Workspace files: https://docs.databricks.com/files/workspace-interact.html
JSON parsing & error modes: https://docs.databricks.com/sql/language-manual/functions/from_json.html
read_files() (SQL): https://docs.databricks.com/sql/language-manual/functions/read_files.html