Data Engineering

Forum Posts

DievanB
by New Contributor
  • 1231 Views
  • 1 replies
  • 0 kudos

pyspark: How to run selenium in UDF

Hi all, I am building a web scraper to get the prices of certain EANs from the Amazon website, and I use Selenium to get the product links. I wrote the following function to get the product links based on an EAN: def getProductLinkAmazonPY(EAN): st...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

UDFs are serialized on the driver and then executed on the executors, so I don't think this will be possible with Selenium.
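To illustrate the serialization point, here is a minimal sketch using only the standard library. It assumes plain `pickle` as a stand-in for Spark's own serializer, and `FakeBrowser`, `can_ship`, and `lookup_with_local_browser` are hypothetical names invented for this example, not Selenium or Spark APIs. It shows that an object holding live OS-level state does not survive pickling, while plain data does.

```python
import pickle
import threading


class FakeBrowser:
    """Hypothetical stand-in for a live browser session (not Selenium)."""

    def __init__(self):
        # The lock stands in for OS-level state (sockets, processes)
        # that cannot be pickled.
        self._lock = threading.Lock()

    def get_link(self, ean):
        # Placeholder URL; a real scraper would query the site here.
        return f"https://example.invalid/product/{ean}"


def can_ship(obj):
    """Return True if obj survives pickling, as it must to reach an executor."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def lookup_with_local_browser(ean):
    # The handle is created inside the function, so no live object
    # has to be serialized along with it.
    browser = FakeBrowser()
    return browser.get_link(ean)


print(can_ship(FakeBrowser()))   # a driver-side handle cannot be shipped
print(can_ship("8712345678906"))  # plain data can
```

Creating the handle inside the function sidesteps the pickling problem in this sketch, but whether that is enough for Selenium also depends on a browser and driver being available on every executor, which the sketch does not address.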

dtr
by New Contributor
  • 5077 Views
  • 1 replies
  • 0 kudos

PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers.

I am trying to write a function in Azure Databricks. I would like to call spark.sql inside the function, but it looks like I cannot use it on worker nodes. def SEL_ID(value, index): # some processing on value here ans = spark.sql("SELECT id FRO...

Latest Reply
MartinhoAzevedo
New Contributor II
  • 0 kudos

Hi there. I guess I'm a bit late, but do you remember if and how you fixed this issue? I'm getting the exact same problem. @dtr
