<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Rename the file in Databricks is so hard.How to make it simpler in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/71948#M34443</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106383"&gt;@Philospher1425&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="du-bois-dark-typography css-eyq5p0"&gt;Allow me to clarify that dbutils.fs serves as an interface to submit commands to your cloud provider storage. As such, the speed of copy operations is determined by the cloud provider and is beyond Databricks' control.&lt;/DIV&gt;
&lt;DIV class="du-bois-dark-typography css-eyq5p0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="du-bois-dark-typography css-eyq5p0"&gt;That being said, you may find that using dbutils.fs.mv results in a faster process, as it is a move operation rather than a copy operation. However, please note that this is not a Databricks-specific issue, but rather a characteristic of the filesystem.&lt;/DIV&gt;
&lt;DIV class="du-bois-dark-typography css-eyq5p0"&gt;&amp;nbsp;&lt;/DIV&gt;</description>
    <pubDate>Thu, 06 Jun 2024 19:48:02 GMT</pubDate>
    <dc:creator>raphaelblg</dc:creator>
    <dc:date>2024-06-06T19:48:02Z</dc:date>
    <item>
      <title>Rename the file in Databricks is so hard.How to make it simpler</title>
      <link>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/71881#M34424</link>
      <description>&lt;P&gt;Hi Community,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My requirement is simple: I need to drop files into Azure Data Lake Storage Gen2 from Databricks.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But when I use&lt;/P&gt;&lt;P&gt;df.coalesce(1).write.csv("url to gen 2/stage/")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;it creates a part-*.csv file, and I need to rename it to a custom name.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have tried a workaround using&lt;/P&gt;&lt;P&gt;dbutils.fs.cp()&lt;/P&gt;&lt;P&gt;It worked, but I have thousands of batch files to transfer like that, each with a custom name. Every time I do the .cp() operation it creates a new job, and it takes a lot of time compared to pushing the part .csv file directly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any other workaround? I can't use other libraries such as Pandas; I am not allowed to.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please help me.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jun 2024 11:39:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/71881#M34424</guid>
      <dc:creator>Philospher1425</dc:creator>
      <dc:date>2024-06-06T11:39:24Z</dc:date>
    </item>
    <item>
      <title>Re: Rename the file in Databricks is so hard.How to make it simpler</title>
      <link>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/71948#M34443</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106383"&gt;@Philospher1425&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="du-bois-dark-typography css-eyq5p0"&gt;Allow me to clarify that dbutils.fs serves as an interface to submit commands to your cloud provider storage. As such, the speed of copy operations is determined by the cloud provider and is beyond Databricks' control.&lt;/DIV&gt;
&lt;DIV class="du-bois-dark-typography css-eyq5p0"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="du-bois-dark-typography css-eyq5p0"&gt;That being said, you may find that using dbutils.fs.mv results in a faster process, as it is a move operation rather than a copy operation. However, please note that this is not a Databricks-specific issue, but rather a characteristic of the filesystem.&lt;/DIV&gt;
&lt;DIV class="du-bois-dark-typography css-eyq5p0"&gt;&amp;nbsp;&lt;/DIV&gt;</description>
      <pubDate>Thu, 06 Jun 2024 19:48:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/71948#M34443</guid>
      <dc:creator>raphaelblg</dc:creator>
      <dc:date>2024-06-06T19:48:02Z</dc:date>
    </item>
    <item>
      <title>Re: Rename the file in Databricks is so hard.How to make it simpler</title>
      <link>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/71950#M34444</link>
      <description>&lt;P&gt;That's why I asked for an alternative workaround. I have tried mv; it doesn't reduce the number of jobs either.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Spark could add this one small thing: when we write something like df.write(/filename.csv), it should write a file with the given name instead of creating a folder. It seems silly that this has not been done to this day. I just want an alternative (without file operations, please), as they add to the run time. If it's not possible, just leave it; I will move on.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jun 2024 19:55:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/71950#M34444</guid>
      <dc:creator>Philospher1425</dc:creator>
      <dc:date>2024-06-06T19:55:57Z</dc:date>
    </item>
    <item>
      <title>Re: Rename the file in Databricks is so hard.How to make it simpler</title>
      <link>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/71951#M34445</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106383"&gt;@Philospher1425&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;The problem is that, in order to generate a single .csv file, you have to coalesce your dataset to one partition and lose all the parallelism that Spark provides. While this might work for small datasets, such a pattern will certainly lead to memory issues on larger datasets.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;If you think that the pattern you described is a good and valid idea, please submit it to&amp;nbsp;&lt;A href="https://github.com/apache/spark" target="_blank"&gt;https://github.com/apache/spark&amp;nbsp;&lt;/A&gt;or the&amp;nbsp;&lt;A class="reference external" href="https://ideas.databricks.com/?_ga=2.211524176.491101347.1717700898-704541007.1707435477" target="_blank" rel="noopener"&gt;Databricks Ideas Portal.&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jun 2024 20:10:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/71951#M34445</guid>
      <dc:creator>raphaelblg</dc:creator>
      <dc:date>2024-06-06T20:10:40Z</dc:date>
    </item>
    <item>
      <title>Re: Rename the file in Databricks is so hard.How to make it simpler</title>
      <link>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/72810#M34611</link>
      <description>&lt;P&gt;Yep, I totally agree with you:&amp;nbsp;&lt;SPAN&gt;dbutils.fs.mv is much faster and is the best way to rename files.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jun 2024 15:36:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/rename-the-file-in-databricks-is-so-hard-how-to-make-it-simpler/m-p/72810#M34611</guid>
      <dc:creator>Shivanshu_</dc:creator>
      <dc:date>2024-06-12T15:36:06Z</dc:date>
    </item>
  </channel>
</rss>

