Data Engineering

Creating API links from a URL and a list of IDs in a saved DataFrame

KayCon86
New Contributor

I have more than 106,000 API calls to make, so instead of making them one by one I would like to create a loop. I already have the list of location IDs, which I pulled from the API's locations list. Because that list only returns limited detail, each ID needs to be appended to the end of the URL to get more information about that location.

For example, I want to build all 106,000 API links from the 'IdColumn' column of my loaded list:

www.apilink/24582

www.apilink/24563 ....

Please see the code below. Any help would be much appreciated.

from pyspark.sql.types import StructField, StructType, StringType, DataType, Row

Idlist = spark.read.load("loadedfile.paquet")

locid = Idlist.select('IdColumn')

LookUppy = str('https://apilink/locations/') + str(Idlist['IdColumn'])

print(LookUppy)

I get this as the output:

https://apilink/locations/<'ldColumn'>;

3 REPLIES

daniel_sahal
Esteemed Contributor

@Kay Connolly​ 

Please check the below example:

data = [{"ID": 1},
        {"ID": 2},
        {"ID": 3},
        {"ID": 4}
        ]
df = spark.createDataFrame(data)
 
# collect() pulls every row to the driver; one URL is then printed per row
for row in df.collect():
    print("https://apilink/locations/" + str(row["ID"]))


Anonymous
Not applicable

Hi @Kay Connolly​ 

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!

Anonymous
Not applicable

@Kay Connolly​ :

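It looks like you are concatenating a Python string with a Column object. Calling str() on a Column returns a description of the column, not the values in each row, which is why the output shows the column name instead of an ID. Instead, build the URL as a column expression so that Spark evaluates it for every row. Here's a modified code snippet that should work: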

from pyspark.sql.functions import concat_ws, lit
 
Idlist = spark.read.load("loadedfile.paquet")
locid = Idlist.select('IdColumn')
 
# Cast IdColumn to string and append it to the base URL.
# lit() is required: a bare Python string passed to concat_ws is treated as a column name.
lookup_urls = locid.withColumn('url', concat_ws('', lit('https://apilink/locations/'), locid.IdColumn.cast('string')))
 
# Show the resulting URLs without truncating them
lookup_urls.show(truncate=False)

This should create a new column called url that contains the complete API links for each location ID in your dataframe. You can then use this column to make the API calls in a loop.
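As a rough illustration of the "API calls in a loop" step, here is a minimal sketch in plain Python. It is not tested against your API: the helper names build_urls, http_get and fetch_all are made up for this example, the base URL is the placeholder from this thread, and the thread pool size is an arbitrary assumption you would tune to whatever rate limit the API has.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

BASE_URL = "https://apilink/locations/"  # placeholder base URL from this thread


def build_urls(ids, base_url=BASE_URL):
    """Return one URL per location ID (hypothetical helper)."""
    return [f"{base_url}{loc_id}" for loc_id in ids]


def http_get(url, timeout=10):
    """Fetch one URL and return the body as text (standard library only)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_all(urls, fetch=http_get, max_workers=8):
    """Call fetch(url) for every URL using a small thread pool.

    Returns a dict of url -> (body, error). On success error is None;
    on failure body is None and error holds the exception message.
    """
    def safe_fetch(url):
        try:
            return url, (fetch(url), None)
        except Exception as exc:  # keep going if a single call fails
            return url, (None, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe_fetch, urls))


# In a notebook the IDs could come from the DataFrame, for example:
# ids = [row["IdColumn"] for row in locid.collect()]
# results = fetch_all(build_urls(ids))
```

Collecting 106,000 IDs to the driver should be fine memory-wise, but the calls themselves will take a while, so it is probably worth trying this on a small slice of the IDs first and checking the error entries before running the full list.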
