<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: process mongo table to delta table databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/process-mongo-table-to-delta-table-databricks/m-p/125132#M47348</link>
    <description>&lt;P&gt;What if you do not update the Delta table for every incoming micro-batch, but instead only do so every 15 minutes, every hour, or at whatever interval suits you?&lt;BR /&gt;That way you can keep ingesting in a streaming fashion, while the actual update to the Delta table is handled in a more batch-like way, so the overhead of the merge becomes less of an issue.&lt;/P&gt;</description>
    <pubDate>Mon, 14 Jul 2025 09:02:34 GMT</pubDate>
    <dc:creator>-werners-</dc:creator>
    <dc:date>2025-07-14T09:02:34Z</dc:date>
    <item>
      <title>process mongo table to delta table databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/process-mongo-table-to-delta-table-databricks/m-p/124872#M47283</link>
      <description>&lt;P&gt;Hello everyone,&lt;BR /&gt;&lt;BR /&gt;I have a MongoDB collection that is 67 GB in size. I use streaming to ingest it, but copying all of the data to a Delta table is very slow.&lt;BR /&gt;Does anyone have a solution for this? I am using the MongoDB connector v10.5.&lt;BR /&gt;&lt;BR /&gt;This is my code:&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;pipeline_mongo_sec&lt;/SPAN&gt; &lt;SPAN class=""&gt;=&lt;/SPAN&gt; &lt;SPAN class=""&gt;[&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;    &lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;        &lt;SPAN class=""&gt;"$unwind"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"$data"&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;    &lt;SPAN class=""&gt;},&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;    &lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;        &lt;SPAN class=""&gt;"$project"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;"_id"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt;&lt;SPAN class=""&gt;0&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;"point"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;{&lt;/SPAN&gt; &lt;/SPAN&gt;
&lt;SPAN class=""&gt;                &lt;SPAN class=""&gt;"$toUpper"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"$point"&lt;/SPAN&gt; &lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;},&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;"since"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;{&lt;/SPAN&gt; &lt;/SPAN&gt;
&lt;SPAN class=""&gt;                &lt;SPAN class=""&gt;"$dateFromString"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                    &lt;SPAN class=""&gt;"dateString"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"$since"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                    &lt;SPAN class=""&gt;"timezone"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"UTC"&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                &lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;},&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;"date"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                &lt;SPAN class=""&gt;"$dateFromString"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                    &lt;SPAN class=""&gt;"dateString"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"$data.date"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                    &lt;SPAN class=""&gt;"timezone"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"UTC"&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                &lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;},&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;"label"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"$data.label"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;"measure_type"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"$data.measure"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;"tariff"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"$data.cost"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;"unit"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"$data.unit"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;"value"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                &lt;SPAN class=""&gt;"$toDecimal"&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt; &lt;SPAN class=""&gt;"$data.value"&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;        &lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;/SPAN&gt;
            
&lt;SPAN class=""&gt;    &lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;]&lt;/SPAN&gt;&lt;/SPAN&gt;

&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;PRE&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;spark&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;read&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;format&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;"mongo"&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;option&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;"spark.mongodb.input.uri"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt; &lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;mongo_uri&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;option&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;"database"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt; &lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;mongo_database&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;option&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;"collection"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt; &lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;collection&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;option&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;"partitioner"&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt; &lt;SPAN class=""&gt;"MongoSamplePartitioner"&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;SPAN class=""&gt;&lt;SPAN class=""&gt;def&lt;/SPAN&gt; &lt;SPAN class=""&gt;_write_to_output&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;,&lt;/SPAN&gt; &lt;SPAN class=""&gt;df&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt; &lt;SPAN class=""&gt;-&amp;gt;&lt;/SPAN&gt; &lt;SPAN class=""&gt;bool&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt;&lt;/SPAN&gt;

&lt;SPAN class=""&gt;        &lt;SPAN class=""&gt;try&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;if&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;len&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;partition_columns&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt; &lt;SPAN class=""&gt;&amp;gt;&lt;/SPAN&gt; &lt;SPAN class=""&gt;0&lt;/SPAN&gt;&lt;SPAN class=""&gt;):&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                &lt;SPAN class=""&gt;print&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;f&lt;/SPAN&gt;&lt;SPAN class=""&gt;"With partitioning: &lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;SPAN class=""&gt;','&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;join&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;partition_columns&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN class=""&gt;"&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                &lt;SPAN class=""&gt;df&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;write&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;partitionBy&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;*&lt;/SPAN&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;partition_columns&lt;/SPAN&gt;&lt;SPAN class=""&gt;).&lt;/SPAN&gt;&lt;SPAN class=""&gt;mode&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;"overwrite"&lt;/SPAN&gt;&lt;SPAN class=""&gt;).&lt;/SPAN&gt;&lt;SPAN class=""&gt;parquet&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;output_path&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;else&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;                &lt;SPAN class=""&gt;df&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;write&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;mode&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;write_mode&lt;/SPAN&gt;&lt;SPAN class=""&gt;).&lt;/SPAN&gt;&lt;SPAN class=""&gt;parquet&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;output_path&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;
            
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN class=""&gt;True&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;        &lt;SPAN class=""&gt;except&lt;/SPAN&gt; &lt;SPAN class=""&gt;Exception&lt;/SPAN&gt; &lt;SPAN class=""&gt;as&lt;/SPAN&gt; &lt;SPAN class=""&gt;e&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;print&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;f&lt;/SPAN&gt;&lt;SPAN class=""&gt;"❌ Write error for &lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;mongo_database&lt;/SPAN&gt;&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;SPAN class=""&gt;self&lt;/SPAN&gt;&lt;SPAN class=""&gt;.&lt;/SPAN&gt;&lt;SPAN class=""&gt;collection&lt;/SPAN&gt;&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN class=""&gt; : &lt;/SPAN&gt;&lt;SPAN class=""&gt;{&lt;/SPAN&gt;&lt;SPAN class=""&gt;str&lt;/SPAN&gt;&lt;SPAN class=""&gt;(&lt;/SPAN&gt;&lt;SPAN class=""&gt;e&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;SPAN class=""&gt;}&lt;/SPAN&gt;&lt;SPAN class=""&gt;"&lt;/SPAN&gt;&lt;SPAN class=""&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;SPAN class=""&gt;            &lt;SPAN class=""&gt;return&lt;/SPAN&gt; &lt;SPAN class=""&gt;False&lt;/SPAN&gt;&lt;/SPAN&gt;
&lt;/PRE&gt;</description>
      <pubDate>Fri, 11 Jul 2025 08:15:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/process-mongo-table-to-delta-table-databricks/m-p/124872#M47283</guid>
      <dc:creator>seefoods</dc:creator>
      <dc:date>2025-07-11T08:15:53Z</dc:date>
    </item>
    <item>
      <title>Re: process mongo table to delta table databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/process-mongo-table-to-delta-table-databricks/m-p/125132#M47348</link>
      <description>&lt;P&gt;What if you do not update the Delta table for every incoming micro-batch, but instead only do so every 15 minutes, every hour, or at whatever interval suits you?&lt;BR /&gt;That way you can keep ingesting in a streaming fashion, while the actual update to the Delta table is handled in a more batch-like way, so the overhead of the merge becomes less of an issue.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Jul 2025 09:02:34 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/process-mongo-table-to-delta-table-databricks/m-p/125132#M47348</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2025-07-14T09:02:34Z</dc:date>
    </item>
  </channel>
</rss>

