@Nick Studenski, can you try declaring the un and pw variables outside the scope of foreachPartition? Do it beforehand, so that you are just passing plain string variables into that function rather than the dbutils object.
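For example, a minimal sketch (the secret scope/key names and the connect() helper are hypothetical):

# read the credentials on the driver, before foreachPartition runs
un = dbutils.secrets.get(scope="my-scope", key="username")
pw = dbutils.secrets.get(scope="my-scope", key="password")

def write_partition(rows):
    # only the plain string values un and pw are captured in the closure,
    # not the non-serializable dbutils object
    conn = connect(un, pw)  # hypothetical connection helper
    for row in rows:
        conn.write(row)
    conn.close()

df.foreachPartition(write_partition)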
Got it - how about using a unionAll? I believe this code snippet does what you'd want:

from pyspark.sql import Row

array = [Row(value=1), Row(value=2), Row(value=3)]
df = sqlContext.createDataFrame(sc.parallelize(array))

array2 = [Row(value=4), Row(value=5), Row(value=6)]
df2 = sqlContext.createDataFrame(sc.parallelize(array2))

combined = df.unionAll(df2)
1) Use sc.parallelize to create the table.
2) Register it as a temporary table.
3) You can keep adding insert statements into this table. Note that Spark SQL supports inserting from other tables, so you might need to create temporary tables to insert from (see the sketch after this list).
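A minimal sketch of that flow, assuming a HiveContext-backed sqlContext and hypothetical table/view names (events, staging):

from pyspark.sql import Row

# 1) create a DataFrame from an RDD
staging = sqlContext.createDataFrame(sc.parallelize([Row(value=1), Row(value=2)]))

# 2) register it as a temporary table (createOrReplaceTempView in Spark 2.x+)
staging.registerTempTable("staging")

# 3) insert into a persistent table by selecting from the temporary table
sqlContext.sql("CREATE TABLE IF NOT EXISTS events (value INT)")
sqlContext.sql("INSERT INTO TABLE events SELECT value FROM staging")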
You can use Python libraries in Spark. I suggest using fuzzywuzzy for computing the string similarities.
Then you just need to join the client list with the internal dataset. If you wanted to make sure you tried every single client record against the internal dataset, you could use a cross (cartesian) join, as in the sketch below.
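A hedged sketch of that approach, assuming hypothetical DataFrames clients and internal that each have a name column (crossJoin requires Spark 2.1+; on older versions a join() with no condition gives the cartesian product):

from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType
from fuzzywuzzy import fuzz  # pip install fuzzywuzzy

# wrap the similarity function as a UDF so it runs on the workers
similarity = F.udf(lambda a, b: fuzz.token_sort_ratio(a, b), IntegerType())

# compare every client name against every internal name, keep strong matches
pairs = clients.crossJoin(internal.withColumnRenamed("name", "internal_name"))
matches = (pairs
           .withColumn("score", similarity(F.col("name"), F.col("internal_name")))
           .filter(F.col("score") >= 90))

The score threshold (90 here) is just an illustration; you'd tune it against your own data.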