Databricks Community

ozmike · ‎02-25-2025

Hi I'm in a data bricks notebook and want to select from a web site that returns json.

For example this web site http://ergast.com/api/f1/2004/1/results.json

will return some JSON. (example only)

Can i do the following or Do you need to use python. (please give an example of py code if so)

Will DB notebooks allow you to do this with out opening anything up.

%sql
select * from json.`http://ergast.com/api/f1/2004/1/results.json`

`

Stefan-Koch · ‎02-25-2025

Hi @ozmike

You could do this as example with wget on shell:

%sh
wget -P /Volumes/demo/raw/files/ergast http://ergast.com/api/f1/2004/1/results.json

Then you can access the json file in a volume.

The better way is to use python:

%pip install wget

import wget

url = "http://ergast.com/api/f1/2004/1/results.json"
destination_path = "/Volumes/demo/raw/files/my_results.json"
wget.download(url, out=destination_path)

Then you see the content again in the volume:

Read the json content into a dataframe:

df = spark.read.json(destination_path)
display(df)

You may want to flatten the strucure:

from pyspark.sql.functions import col, explode

df_flat = df.select(
    col("MRData.RaceTable.Races.season").alias("season"),
    col("MRData.RaceTable.Races.round").alias("round"),
    col("MRData.RaceTable.Races.url").alias("race_url"),
    col("MRData.RaceTable.Races.raceName").alias("race_name"),
    col("MRData.RaceTable.Races.Circuit.circuitId").alias("circuit_id"),
    col("MRData.RaceTable.Races.Circuit.url").alias("circuit_url"),
    col("MRData.RaceTable.Races.Circuit.circuitName").alias("circuit_name"),
    col("MRData.RaceTable.Races.Circuit.Location.lat").alias("circuit_lat"),
    col("MRData.RaceTable.Races.Circuit.Location.long").alias("circuit_long"),
    col("MRData.RaceTable.Races.Circuit.Location.locality").alias("circuit_locality"),
    col("MRData.RaceTable.Races.Circuit.Location.country").alias("circuit_country"),
    explode("MRData.RaceTable.Races.Results").alias("result")
).select(
    col("season"),
    col("round"),
    col("race_url"),
    col("race_name"),
    col("circuit_id"),
    col("circuit_url"),
    col("circuit_name"),
    col("circuit_lat"),
    col("circuit_long"),
    col("circuit_locality"),
    col("circuit_country"),
    col("result.number").alias("driver_number"),
    col("result.position").alias("driver_position"),
    col("result.positionText").alias("driver_position_text"),
    col("result.points").alias("driver_points"),
    col("result.Driver.driverId").alias("driver_id"),
    col("result.Driver.permanentNumber").alias("driver_permanent_number"),
    col("result.Driver.code").alias("driver_code"),
    col("result.Driver.url").alias("driver_url"),
    col("result.Driver.givenName").alias("driver_given_name"),
    col("result.Driver.familyName").alias("driver_family_name"),
    col("result.Driver.dateOfBirth").alias("driver_dob"),
    col("result.Driver.nationality").alias("driver_nationality"),
    col("result.Constructor.constructorId").alias("constructor_id"),
    col("result.Constructor.url").alias("constructor_url"),
    col("result.Constructor.name").alias("constructor_name"),
    col("result.Constructor.nationality").alias("constructor_nationality"),
    col("result.grid").alias("grid_position"),
    col("result.laps").alias("laps"),
    col("result.status").alias("status")
)

display(df_flat)

May, you adapt the script and dynamically flatten the structure.

ozmike · ‎02-25-2025

Hi Thanks
Can you do a select from this volume you have created ( from SQL not PY )?

SELECT * FROM JSON.'volume:xyz'

Also how do you get shell access from the GUI ? or - is this an admin thing.

Stefan-Koch · ‎02-25-2025

yes, you could use SQL to read the data from the volume:

%sql
SELECT * FROM json.`/Volumes/demo/raw/files/ergast/my_results.json`

You still have to flatten the result.

The thing with the shell was just an example. Wouldn't recommend that, just use python

Databricks Community

Databrick select from web address that returns JSON

Photos

Join Us as a Local Community Builder!

Business Intelligence in the Era of AI

🚀 Monthly Databricks Get Started Days – Accelerate Your Learning Journey! 🚀

Databricks Community Champion - March 2025 - Takuya Omi

Intelligent Data Warehousing: AI/BI for Self-service Analytics

Get Started With Lakehouse Architecture | Pass a quiz to earn your certificate completion.