<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: I want to use databricks workers to run a function in parallel on the worker nodes in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12242#M7081</link>
    <description>&lt;P&gt;You guys are not getting the point: I am making API calls in a function and want to store the results in a dataframe, and I want multiple processes to run this task in parallel.&lt;/P&gt;&lt;P&gt;How do I create a UDF and use it in a dataframe when the task is calling an API repeatedly and storing the JSON payload in BLOB storage? The examples you gave me are for performing calculations etc. Please advise ASAP.&lt;/P&gt;</description>
    <pubDate>Mon, 01 Nov 2021 13:49:53 GMT</pubDate>
    <dc:creator>HamzaJosh</dc:creator>
    <dc:date>2021-11-01T13:49:53Z</dc:date>
    <item>
      <title>I want to use databricks workers to run a function in parallel on the worker nodes</title>
      <link>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12236#M7075</link>
      <description>&lt;P&gt;I have a function that makes API calls. I want to run this function in parallel so that the workers in my Databricks cluster are used. I have tried&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;with ThreadPoolExecutor() as executor:&lt;/P&gt;&lt;P&gt;&amp;nbsp;results = executor.map(getspeeddata, alist)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;but this does not make use of the workers and runs everything on the driver. How do I make my function run in parallel across the cluster?&lt;/P&gt;</description>
      <pubDate>Wed, 27 Oct 2021 22:27:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12236#M7075</guid>
      <dc:creator>HamzaJosh</dc:creator>
      <dc:date>2021-10-27T22:27:38Z</dc:date>
    </item>
    <item>
      <title>Re: I want to use databricks workers to run a function in parallel on the worker nodes</title>
      <link>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12238#M7077</link>
      <description>&lt;P&gt;Hi, please create a UDF (user-defined function) and then run it directly from a dataframe.&lt;/P&gt;&lt;P&gt;I have a dataframe with a url column; the UDF loads the responses into a new column as a structured object, which is then flattened.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 08:52:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12238#M7077</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-10-28T08:52:36Z</dc:date>
    </item>
    <item>
      <title>Re: I want to use databricks workers to run a function in parallel on the worker nodes</title>
      <link>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12239#M7078</link>
      <description>&lt;P&gt;You want to make sure the Spark framework is used, and not just plain Python/Scala.&lt;/P&gt;&lt;P&gt;So a UDF is the way to go.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 08:59:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12239#M7078</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-10-28T08:59:43Z</dc:date>
    </item>
    <item>
      <title>Re: I want to use databricks workers to run a function in parallel on the worker nodes</title>
      <link>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12240#M7079</link>
      <description>&lt;P&gt;Thanks Hubert and werners for responding.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please give me some URLs that show how I can create UDFs. Do I still need to use a thread pool? How do I make it run in parallel after using a UDF?&lt;/P&gt;&lt;P&gt;I am a newbie and need more than just "create a UDF". Please help.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Oct 2021 13:44:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12240#M7079</guid>
      <dc:creator>HamzaJosh</dc:creator>
      <dc:date>2021-10-28T13:44:51Z</dc:date>
    </item>
    <item>
      <title>Re: I want to use databricks workers to run a function in parallel on the worker nodes</title>
      <link>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12241#M7080</link>
      <description>&lt;P&gt;Hi @Hamza Josh​,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here are some links that might help you better understand how to create and run UDFs:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;UDFs in Python &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/udf-python.html#user-defined-functions---python" alt="https://docs.databricks.com/spark/latest/spark-sql/udf-python.html#user-defined-functions---python" target="_blank"&gt;here&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;Pandas UDFs &lt;A href="https://docs.databricks.com/spark/latest/spark-sql/pandas-function-apis.html#pandas-function-apis" alt="https://docs.databricks.com/spark/latest/spark-sql/pandas-function-apis.html#pandas-function-apis" target="_blank"&gt;here&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;More docs &lt;A href="https://databricks.com/blog/2020/05/20/new-pandas-udfs-and-python-type-hints-in-the-upcoming-release-of-apache-spark-3-0.html" alt="https://databricks.com/blog/2020/05/20/new-pandas-udfs-and-python-type-hints-in-the-upcoming-release-of-apache-spark-3-0.html" target="_blank"&gt;here&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Sat, 30 Oct 2021 00:12:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12241#M7080</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-10-30T00:12:20Z</dc:date>
    </item>
    <item>
      <title>Re: I want to use databricks workers to run a function in parallel on the worker nodes</title>
      <link>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12242#M7081</link>
      <description>&lt;P&gt;You guys are not getting the point: I am making API calls in a function and want to store the results in a dataframe, and I want multiple processes to run this task in parallel.&lt;/P&gt;&lt;P&gt;How do I create a UDF and use it in a dataframe when the task is calling an API repeatedly and storing the JSON payload in BLOB storage? The examples you gave me are for performing calculations etc. Please advise ASAP.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Nov 2021 13:49:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12242#M7081</guid>
      <dc:creator>HamzaJosh</dc:creator>
      <dc:date>2021-11-01T13:49:53Z</dc:date>
    </item>
    <item>
      <title>Re: I want to use databricks workers to run a function in parallel on the worker nodes</title>
      <link>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12243#M7082</link>
      <description>&lt;P&gt;I think we do get the point. But the thing is:&lt;/P&gt;&lt;P&gt;if you want to distribute the work to the workers, you have to use the Spark framework.&lt;/P&gt;&lt;P&gt;So a UDF is the way to go (as UDFs are part of Spark).&lt;/P&gt;&lt;P&gt;Plain Python code will only execute on the driver.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, Spark is lazily evaluated, meaning data is only queried/written when you apply an action.&lt;/P&gt;&lt;P&gt;That is pretty important.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So in the end you will have to create a UDF.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/jamesshocking/Spark-REST-API-UDF-Scala" alt="https://github.com/jamesshocking/Spark-REST-API-UDF-Scala" target="_blank"&gt;https://github.com/jamesshocking/Spark-REST-API-UDF-Scala&lt;/A&gt; is an example in Scala, but the same principles apply to PySpark.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Nov 2021 08:33:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/12243#M7082</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2021-11-02T08:33:43Z</dc:date>
    </item>
    <item>
      <title>Re: I want to use databricks workers to run a function in parallel on the worker nodes</title>
      <link>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/153466#M53962</link>
      <description>&lt;P&gt;Hi Hubert,&lt;/P&gt;&lt;P&gt;I have the same problem. We are calling 40-50 different APIs, currently running sequentially. Now, after creating the UDF and a dataframe with the URL column, how do we pass the credentials (username and password)?&lt;BR /&gt;&lt;BR /&gt;Do we need to broadcast the credentials so they are available on every worker?&lt;/P&gt;</description>
      <pubDate>Sun, 05 Apr 2026 18:02:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/i-want-to-use-databricks-workers-to-run-a-function-in-parallel/m-p/153466#M53962</guid>
      <dc:creator>mordex</dc:creator>
      <dc:date>2026-04-05T18:02:46Z</dc:date>
    </item>
  </channel>
</rss>

