Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Creating API links from a URL and a list of IDs in a saved DataFrame

KayCon86
New Contributor

I have 106,000+ APIs I need to call, so instead of calling them one by one I would like to create a loop. I already have the list of location IDs, which I retrieved from their API locations list. Each ID goes at the end of the URL to get more information on that location, because the locations list itself is limited.

E.g. I want it to build all 106,000 API links from the 'IdColumn' of my loaded list:

www.apilink/24582

www.apilink/24563 ...

Please see the code below. If anyone could help, it would be much appreciated.

from pyspark.sql.types import StructField, StructType, StringType, DataType, Row
Idlist = spark.read.load("loadedfile.paquet")
locid = Idlist.select('IdColumn')
LookUppy = str('https://apilink/locations/') + str(Idlist['IdColumn'])
print(LookUppy)

I get this as the output:

https://apilink/locations/<'IdColumn'>


daniel_sahal
Esteemed Contributor

@Kay Connolly

Please check the example below:

data = [{"ID": 1},
        {"ID": 2},
        {"ID": 3},
        {"ID": 4}
        ]
df = spark.createDataFrame(data)
 
for row in df.rdd.collect():
    print("https://apilink/locations/"+str(row["ID"]))
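To go from printing the URLs to actually calling them, the loop body can make one HTTP request per ID. Below is a minimal stdlib sketch, assuming the endpoint answers a plain GET and that `https://apilink/locations/` (the placeholder used in the question) is the real base URL. The helper names and the `fetch` parameter are hypothetical: `fetch` defaults to a `urllib` GET but can be swapped for any HTTP client, or for a stub when testing.

```python
import urllib.request
from typing import Callable, Dict, Iterable, Tuple

# Placeholder base URL taken from the question; replace with the real endpoint.
BASE_URL = "https://apilink/locations/"


def build_urls(ids: Iterable) -> Dict[object, str]:
    """Map each location ID to its API URL (hypothetical helper)."""
    return {loc_id: f"{BASE_URL}{loc_id}" for loc_id in ids}


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Plain GET with urllib; returns the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(
    ids: Iterable, fetch: Callable[[str], bytes] = http_get
) -> Tuple[Dict[object, bytes], Dict[object, str]]:
    """Call the API once per ID, keeping successes and failures separate."""
    results, failures = {}, {}
    for loc_id, url in build_urls(ids).items():
        try:
            results[loc_id] = fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            failures[loc_id] = str(exc)
    return results, failures
```

With the DataFrame from the example above, usage would look like `ids = [row["ID"] for row in df.collect()]` followed by `results, failures = fetch_all(ids)`. Keeping failures in a separate dict makes it easy to retry only the IDs that did not succeed.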


Anonymous
Not applicable

Hi @Kay Connolly,

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!

Anonymous
Not applicable

@Kay Connolly:

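It looks like you are trying to concatenate a Python string with a Column object. Calling str() on a Column returns a description of the column expression, not the values in each row, which is why your output shows the column name instead of the IDs. Instead, build the URL as a column expression so that Spark evaluates it for every row. Here's a modified code snippet that should work: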

from pyspark.sql.functions import col, concat, lit
 
Idlist = spark.read.load("loadedfile.paquet")
locid = Idlist.select('IdColumn')
 
# Cast IdColumn to string and prepend the base URL.
# lit() is required so the URL is treated as a string literal, not as a column name.
lookup_urls = locid.withColumn('url', concat(lit('https://apilink/locations/'), col('IdColumn').cast('string')))
 
# Show the resulting URLs without truncating them
lookup_urls.show(truncate=False)

This creates a new column called url that contains the complete API link for each location ID in your DataFrame. You can then collect this column and use it to make the API calls in a loop.
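With around 106,000 URLs, calling them strictly one after another can take a long time, because each call spends most of its time waiting on the network. One common approach is to collect the url column to the driver (106,000 short strings fit comfortably in memory) and issue the requests from a bounded thread pool with a simple retry and exponential backoff. This is a sketch, assuming the API tolerates a few concurrent requests. `max_workers`, `retries`, `backoff` and the helper names are illustrative choices, not values from this thread, so check the API's rate limits before picking them.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Plain GET with urllib; returns the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def get_with_retry(url: str, fetch: Callable[[str], bytes],
                   retries: int = 3, backoff: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Try fetch(url) up to `retries` times (assumed >= 1), doubling the wait after each failure."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:  # URLError, HTTPError and timeouts are OSError subclasses
            if attempt == retries - 1:
                raise  # out of attempts: let the caller record the failure
            sleep(backoff * (2 ** attempt))


def fetch_concurrently(urls: List[str], fetch: Callable[[str], bytes] = http_get,
                       max_workers: int = 8, retries: int = 3, backoff: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Fetch every URL using at most `max_workers` threads; return (results, failures)."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(get_with_retry, url, fetch, retries, backoff, sleep): url
            for url in urls
        }
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except OSError as exc:
                failures[url] = str(exc)
    return results, failures
```

With the lookup_urls DataFrame from the snippet above, usage could be `urls = [row["url"] for row in lookup_urls.select("url").collect()]` and then `results, failures = fetch_concurrently(urls)`. A thread pool is used here because the work is I/O-bound. Distributing the calls across the cluster would be an alternative, but it makes it harder to control the total request rate against a single API.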
