<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: how to parallel n number of process in databricks in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/how-to-parallel-n-number-of-process-in-databricks/m-p/128840#M3896</link>
    <description>&lt;P&gt;Kindly share an example of Auto Loader that calls the Python script n times for n txt files.&lt;/P&gt;</description>
    <pubDate>Tue, 19 Aug 2025 11:21:56 GMT</pubDate>
    <dc:creator>jitenjha11</dc:creator>
    <dc:date>2025-08-19T11:21:56Z</dc:date>
    <item>
      <title>how to parallel n number of process in databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/how-to-parallel-n-number-of-process-in-databricks/m-p/128815#M3893</link>
      <description>&lt;P&gt;Requirement: I have a volume into which a random number of txt files arrive from MQ at random times. In my workspace I have a Python script. I have also created a job that is triggered automatically when a new file arrives in the volume.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need something in the middle that runs the script n times when n files arrive in the volume. There is only one Python script, but it should be called once per file, so n calls for n files. I do not want to use multithreading or multiprocessing inside the Python script for this. Is there any other way to do it?&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am attaching a flow chart to help explain my requirement.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 19 Aug 2025 07:32:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/how-to-parallel-n-number-of-process-in-databricks/m-p/128815#M3893</guid>
      <dc:creator>jitenjha11</dc:creator>
      <dc:date>2025-08-19T07:32:07Z</dc:date>
    </item>
    <item>
      <title>Re: how to parallel n number of process in databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/how-to-parallel-n-number-of-process-in-databricks/m-p/128821#M3894</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/170182"&gt;@jitenjha11&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;You have a couple of options to handle this scenario:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Batch Processing:&lt;/STRONG&gt;&lt;BR /&gt;Once the n text files have arrived in the volume, you can read them in batches, process the required data, and then move the processed files to an archive directory.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Iterative Processing:&lt;/STRONG&gt;&lt;BR /&gt;Alternatively, you can loop through the directory in the volume using &lt;STRONG&gt;dbutils&lt;/STRONG&gt;&amp;nbsp;commands, read the text files one by one, and process them sequentially.&lt;/P&gt;&lt;P&gt;However, I’d recommend using &lt;STRONG&gt;Auto Loader&lt;/STRONG&gt;&amp;nbsp;here. It’s more reliable since it automatically handles file detection and provides fault tolerance using checkpointing. Auto Loader reads files in micro-batches, ensuring that each file is processed exactly once. You can then run your script on each micro-batch, for example by using the &lt;STRONG&gt;foreachBatch()&lt;/STRONG&gt; function.&lt;/P&gt;&lt;P&gt;Let me know if you have any questions.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Aug 2025 08:10:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/how-to-parallel-n-number-of-process-in-databricks/m-p/128821#M3894</guid>
      <dc:creator>MujtabaNoori</dc:creator>
      <dc:date>2025-08-19T08:10:36Z</dc:date>
    </item>
    <item>
      <title>Re: how to parallel n number of process in databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/how-to-parallel-n-number-of-process-in-databricks/m-p/128840#M3896</link>
      <description>&lt;P&gt;Kindly share an example of Auto Loader that calls the Python script n times for n txt files.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Aug 2025 11:21:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/how-to-parallel-n-number-of-process-in-databricks/m-p/128840#M3896</guid>
      <dc:creator>jitenjha11</dc:creator>
      <dc:date>2025-08-19T11:21:56Z</dc:date>
    </item>
    <item>
      <title>Re: how to parallel n number of process in databricks</title>
      <link>https://community.databricks.com/t5/administration-architecture/how-to-parallel-n-number-of-process-in-databricks/m-p/129081#M3926</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/170182"&gt;@jitenjha11&lt;/a&gt;, you can do it the way &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/132810"&gt;@MujtabaNoori&lt;/a&gt;&amp;nbsp;described, but you have to call the process twice.&lt;BR /&gt;Sample reference code is below.&lt;/P&gt;&lt;P&gt;1st process: iterate through the directories and read the files in each one with Auto Loader.&lt;/P&gt;&lt;PRE&gt;directories = ["/mnt/data/src1", "/mnt/data/src2"]

for directory in directories:
    # Read files using Auto Loader
    df = spark.readStream.format("cloudFiles") \
        .option("cloudFiles.format", "csv") \
        .load(directory)

    # Process the data (e.g., write to a Delta table);
    # each stream needs its own checkpoint location
    df.writeStream.format("delta") \
        .option("checkpointLocation", f"&amp;lt;&amp;lt;Location&amp;gt;&amp;gt;") \
        .start(f"/mnt/delta/&amp;lt;&amp;lt;location&amp;gt;&amp;gt;")&lt;/PRE&gt;&lt;P&gt;2nd process: call the external Python script once per directory.&lt;/P&gt;&lt;PRE&gt;import subprocess

directories = ["/mnt/data/src1", "/mnt/data/src2"]

for directory in directories:
    # Call the external Python script with the directory as its argument
    subprocess.run(["python", "process_data.py", directory])&lt;/PRE&gt;&lt;P&gt;I would suggest using a workflow, which gives you the flexibility to run the process in a for each loop: when new files arrive, you pass each new file name as a parameter and call the second notebook.&amp;nbsp;&lt;BR /&gt;Please go through the link below; it might help.&amp;nbsp;&lt;BR /&gt;&lt;A href="https://medium.com/@luijk.r/for-each-in-databricks-workflows-f1f1af3d2417" target="_blank"&gt;For Each In Databricks Workflows. One For Each, Each For All! | by René Luijk | Medium&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Aug 2025 07:35:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/how-to-parallel-n-number-of-process-in-databricks/m-p/129081#M3926</guid>
      <dc:creator>BR_DatabricksAI</dc:creator>
      <dc:date>2025-08-21T07:35:32Z</dc:date>
    </item>
  </channel>
</rss>

