@Nick Studenski, can you try declaring the un and pw variables outside the scope of foreachPartition? Do it beforehand, so that you are just passing plain string variables into that function rather than the dbutils object.
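For example, a minimal sketch (the secret scope/key names and the connect() helper are hypothetical):

# read the credentials on the driver, before foreachPartition runs
un = dbutils.secrets.get(scope="my-scope", key="username")
pw = dbutils.secrets.get(scope="my-scope", key="password")

def write_partition(rows):
    # only the plain string values un and pw are captured in the closure,
    # not the non-serializable dbutils object
    conn = connect(un, pw)  # hypothetical connection helper
    for row in rows:
        conn.write(row)
    conn.close()

df.foreachPartition(write_partition)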
Got it - how about using a unionAll? I believe this code snippet does what you'd want:

from pyspark.sql import Row

array = [Row(value=1), Row(value=2), Row(value=3)]
df = sqlContext.createDataFrame(sc.parallelize(array))

array2 = [Row(value=4), Row(value=5), Row(value=6)]
df2 = sqlContext.createDataFrame(sc.parallelize(array2))

combined = df.unionAll(df2)
1) Use sc.parallelize to create the table.
2) Register it as a temporary table.
3) You can keep adding insert statements into this table. Note that Spark SQL supports inserting from other tables, so you might need to create temporary tables to insert from (see the sketch after this list).
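A minimal sketch of that flow, assuming a HiveContext-backed sqlContext and hypothetical table/view names (events, staging):

from pyspark.sql import Row

# 1) create a DataFrame from an RDD
staging = sqlContext.createDataFrame(sc.parallelize([Row(value=1), Row(value=2)]))

# 2) register it as a temporary table (createOrReplaceTempView in Spark 2.x+)
staging.registerTempTable("staging")

# 3) insert into a persistent table by selecting from the temporary table
sqlContext.sql("CREATE TABLE IF NOT EXISTS events (value INT)")
sqlContext.sql("INSERT INTO TABLE events SELECT value FROM staging")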
You can use Python libraries in Spark. I suggest using fuzzywuzzy for computing the string similarities.
Then you just need to join the client list with the internal dataset. If you wanted to make sure you tried every single client record against the internal dataset, you could use a cross (cartesian) join, as in the sketch below.
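A hedged sketch of that approach, assuming hypothetical DataFrames clients and internal that each have a name column (crossJoin requires Spark 2.1+; on older versions a join() with no condition gives the cartesian product):

from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType
from fuzzywuzzy import fuzz  # pip install fuzzywuzzy

# wrap the similarity function as a UDF so it runs on the workers
similarity = F.udf(lambda a, b: fuzz.token_sort_ratio(a, b), IntegerType())

# compare every client name against every internal name, keep strong matches
pairs = clients.crossJoin(internal.withColumnRenamed("name", "internal_name"))
matches = (pairs
           .withColumn("score", similarity(F.col("name"), F.col("internal_name")))
           .filter(F.col("score") >= 90))

The score threshold (90 here) is just an illustration; you'd tune it against your own data.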