<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: 🇪🇸 Why the DataFrame is the most important data object in distributed processing in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/por-qu%C3%A9-el-dataframe-es-el-objeto-de-datos-m%C3%A1s-importante-en-el/m-p/149147#M1028</link>
    <description>&lt;P&gt;That's such a great idea. Can't wait for another post &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 24 Feb 2026 10:03:36 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2026-02-24T10:03:36Z</dc:date>
    <item>
      <title>🇪🇸 Why the DataFrame is the most important data object in distributed processing</title>
      <link>https://community.databricks.com/t5/community-articles/por-qu%C3%A9-el-dataframe-es-el-objeto-de-datos-m%C3%A1s-importante-en-el/m-p/149140#M1025</link>
      <description>&lt;P&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":spain:"&gt;🇪🇸&lt;/span&gt; En este video, creado como recordatorio para mi mala memoria a largo plazo, explico de forma sencilla: &lt;/SPAN&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Qué es un DataFrame &lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Cómo se distribuye en particiones &lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Cómo se ejecuta en un cluster (driver y workers) &lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Qué ocurre en un shuffle&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Relación entre particiones, jobs, stages, shuffle y tasks&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Por qué es la pieza clave en Databricks y Apache Spark&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;span class="lia-unicode-emoji" title=":united_kingdom:"&gt;🇬🇧&lt;/span&gt; Do you know why the DataFrame is the most important data object in distributed processing?&lt;/P&gt;&lt;P&gt;In this video, created as a reminder for my poor long-term human memory, I explain in a simple way:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; What a DataFrame is&lt;/LI&gt;&lt;LI&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; How it's distributed across partitions&lt;/LI&gt;&lt;LI&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; How it runs in a cluster 
(driver and workers)&lt;/LI&gt;&lt;LI&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; What happens during a shuffle&lt;/LI&gt;&lt;LI&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; How&amp;nbsp;&lt;SPAN&gt;partitions, jobs, stages, shuffles and tasks are related&lt;/SPAN&gt;&lt;/LI&gt;&lt;LI&gt;&lt;span class="lia-unicode-emoji" title=":white_heavy_check_mark:"&gt;✅&lt;/span&gt; Why it's the key component in Databricks and Apache Spark&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;SPAN&gt;For now, the video is available only in Spanish; an English version may follow later.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;div class="video-embed-center video-embed"&gt;&lt;iframe class="embedly-embed" src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FekFa9sRCbT0%3Ffeature%3Doembed&amp;amp;display_name=YouTube&amp;amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DekFa9sRCbT0&amp;amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FekFa9sRCbT0%2Fhqdefault.jpg&amp;amp;type=text%2Fhtml&amp;amp;schema=youtube" width="600" height="337" scrolling="no" title="🚀 DataFrames: La Base del Procesamiento Distribuido en Databricks y Spark ⚡ #databricks #spark" frameborder="0" allow="autoplay; fullscreen; encrypted-media; picture-in-picture;" allowfullscreen="true"&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Feb 2026 08:38:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/por-qu%C3%A9-el-dataframe-es-el-objeto-de-datos-m%C3%A1s-importante-en-el/m-p/149140#M1025</guid>
      <dc:creator>Coffee77</dc:creator>
      <dc:date>2026-02-24T08:38:30Z</dc:date>
    </item>
    <item>
      <title>Re: 🇪🇸 Why the DataFrame is the most important data object in distributed processing</title>
      <link>https://community.databricks.com/t5/community-articles/por-qu%C3%A9-el-dataframe-es-el-objeto-de-datos-m%C3%A1s-importante-en-el/m-p/149142#M1026</link>
      <description>&lt;P&gt;Thanks for sharing&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/179536"&gt;@Coffee77&lt;/a&gt;&amp;nbsp;!&lt;/P&gt;</description>
      <pubDate>Tue, 24 Feb 2026 08:45:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/por-qu%C3%A9-el-dataframe-es-el-objeto-de-datos-m%C3%A1s-importante-en-el/m-p/149142#M1026</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2026-02-24T08:45:41Z</dc:date>
    </item>
    <item>
      <title>Re: 🇪🇸 Why the DataFrame is the most important data object in distributed processing</title>
      <link>https://community.databricks.com/t5/community-articles/por-qu%C3%A9-el-dataframe-es-el-objeto-de-datos-m%C3%A1s-importante-en-el/m-p/149144#M1027</link>
      <description>&lt;P&gt;Recently, I have been creating some "self-reminder" videos to help my poor long-term human memory &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt; and maybe to help others. Understanding the internals of DataFrames, how partitions relate to jobs, stages, shuffles and tasks, and how transformations or aggregations are executed in a cluster is something that can make your project fail or succeed.&lt;/P&gt;&lt;P&gt;In my current project, I had to deal with complex scenarios that combine the processing of small, medium and large DataFrames loaded from input files in the same all-purpose cluster, with challenging concurrency requirements (up to 50-70 concurrent jobs/pipelines), very complex DAGs, the same platform/code for all inputs, and even ad-hoc AI-generated transformations. Only after fully understanding and monitoring what was going on in the background were we able to make those pipelines work with acceptable performance on small files and very good performance on large or very large files compared with the legacy platform. All of this while keeping CPU and memory levels stable (especially on the driver node) and trying not to increase our hardware/cluster costs too much.&lt;/P&gt;&lt;P&gt;I'll write a post about it when possible to help in similar scenarios and get feedback from brilliant Databricks experts like you &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Feb 2026 09:05:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/por-qu%C3%A9-el-dataframe-es-el-objeto-de-datos-m%C3%A1s-importante-en-el/m-p/149144#M1027</guid>
      <dc:creator>Coffee77</dc:creator>
      <dc:date>2026-02-24T09:05:19Z</dc:date>
    </item>
    <item>
      <title>Re: 🇪🇸 Why the DataFrame is the most important data object in distributed processing</title>
      <link>https://community.databricks.com/t5/community-articles/por-qu%C3%A9-el-dataframe-es-el-objeto-de-datos-m%C3%A1s-importante-en-el/m-p/149147#M1028</link>
      <description>&lt;P&gt;That's such a great idea. Can't wait for another post &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Feb 2026 10:03:36 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/por-qu%C3%A9-el-dataframe-es-el-objeto-de-datos-m%C3%A1s-importante-en-el/m-p/149147#M1028</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2026-02-24T10:03:36Z</dc:date>
    </item>
  </channel>
</rss>

