Databrick select from web address that returns JSON
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-25-2025 06:47 PM - edited 02-25-2025 06:49 PM
%sql
select * from json.`http://ergast.com/api/f1/2004/1/results.json`
`- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-25-2025 09:33 PM
Hi @ozmike
You could do this as example with wget on shell:
%sh
wget -P /Volumes/demo/raw/files/ergast http://ergast.com/api/f1/2004/1/results.json
Then you can access the json file in a volume.
The better way is to use python:
%pip install wget
import wget
url = "http://ergast.com/api/f1/2004/1/results.json"
destination_path = "/Volumes/demo/raw/files/my_results.json"
wget.download(url, out=destination_path)
Then you see the content again in the volume:
Read the json content into a dataframe:
df = spark.read.json(destination_path)
display(df)
You may want to flatten the strucure:
from pyspark.sql.functions import col, explode
df_flat = df.select(
col("MRData.RaceTable.Races.season").alias("season"),
col("MRData.RaceTable.Races.round").alias("round"),
col("MRData.RaceTable.Races.url").alias("race_url"),
col("MRData.RaceTable.Races.raceName").alias("race_name"),
col("MRData.RaceTable.Races.Circuit.circuitId").alias("circuit_id"),
col("MRData.RaceTable.Races.Circuit.url").alias("circuit_url"),
col("MRData.RaceTable.Races.Circuit.circuitName").alias("circuit_name"),
col("MRData.RaceTable.Races.Circuit.Location.lat").alias("circuit_lat"),
col("MRData.RaceTable.Races.Circuit.Location.long").alias("circuit_long"),
col("MRData.RaceTable.Races.Circuit.Location.locality").alias("circuit_locality"),
col("MRData.RaceTable.Races.Circuit.Location.country").alias("circuit_country"),
explode("MRData.RaceTable.Races.Results").alias("result")
).select(
col("season"),
col("round"),
col("race_url"),
col("race_name"),
col("circuit_id"),
col("circuit_url"),
col("circuit_name"),
col("circuit_lat"),
col("circuit_long"),
col("circuit_locality"),
col("circuit_country"),
col("result.number").alias("driver_number"),
col("result.position").alias("driver_position"),
col("result.positionText").alias("driver_position_text"),
col("result.points").alias("driver_points"),
col("result.Driver.driverId").alias("driver_id"),
col("result.Driver.permanentNumber").alias("driver_permanent_number"),
col("result.Driver.code").alias("driver_code"),
col("result.Driver.url").alias("driver_url"),
col("result.Driver.givenName").alias("driver_given_name"),
col("result.Driver.familyName").alias("driver_family_name"),
col("result.Driver.dateOfBirth").alias("driver_dob"),
col("result.Driver.nationality").alias("driver_nationality"),
col("result.Constructor.constructorId").alias("constructor_id"),
col("result.Constructor.url").alias("constructor_url"),
col("result.Constructor.name").alias("constructor_name"),
col("result.Constructor.nationality").alias("constructor_nationality"),
col("result.grid").alias("grid_position"),
col("result.laps").alias("laps"),
col("result.status").alias("status")
)
display(df_flat)
May, you adapt the script and dynamically flatten the structure.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-25-2025 09:42 PM
Hi Thanks
Can you do a select from this volume you have created ( from SQL not PY )?
SELECT * FROM JSON.'volume:xyz'
Also how do you get shell access from the GUI ? or - is this an admin thing.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
02-25-2025 09:52 PM
yes, you could use SQL to read the data from the volume:
%sql
SELECT * FROM json.`/Volumes/demo/raw/files/ergast/my_results.json`
You still have to flatten the result.
The thing with the shell was just an example. Wouldn't recommend that, just use python

