<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic GraphFrames and DLT in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/graphframes-and-dlt/m-p/83613#M36976</link>
    <description>&lt;P&gt;I am trying to run a DLT pipeline that uses GraphFrames, which is included in the standard ML runtime image. I use it successfully on my job compute clusters, but I run into problems when I try to use it in a DLT pipeline. Here are my overrides to the standard job compute policy:&lt;/P&gt;&lt;PRE&gt;{
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts-ml"
  },
  "cluster_type": {
    "type": "allowlist",
    "defaultValue": "all-purpose",
    "values": [
      "all-purpose",
      "job",
      "dlt"
    ]
  }
}&lt;/PRE&gt;&lt;P&gt;However, when I run the DLT pipeline, I get the following error:&lt;/P&gt;&lt;PRE&gt;ModuleNotFoundError: No module named 'graphframes',None,Map(),Map(),List(),List(),Map())&lt;/PRE&gt;&lt;P&gt;As far as I know, GraphFrames is not pip-installable. The primary installation instructions use Maven coordinates, because the Python package relies on underlying Java/Scala code.&lt;/P&gt;&lt;P&gt;Will DLT pipelines support GraphFrames?&lt;/P&gt;&lt;P&gt;See also this related but unresolved &lt;A href="https://community.databricks.com/t5/data-engineering/workaround-for-graphframes-not-working-on-delta-live-table/m-p/20190#M13616" target="_self"&gt;question&lt;/A&gt;.&lt;/P&gt;</description>
    <pubDate>Tue, 20 Aug 2024 14:06:45 GMT</pubDate>
    <dc:creator>lprevost</dc:creator>
    <dc:date>2024-08-20T14:06:45Z</dc:date>
    <item>
      <title>GraphFrames and DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/graphframes-and-dlt/m-p/83613#M36976</link>
      <description>&lt;P&gt;I am trying to run a DLT pipeline that uses GraphFrames, which is included in the standard ML runtime image. I use it successfully on my job compute clusters, but I run into problems when I try to use it in a DLT pipeline. Here are my overrides to the standard job compute policy:&lt;/P&gt;&lt;PRE&gt;{
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts-ml"
  },
  "cluster_type": {
    "type": "allowlist",
    "defaultValue": "all-purpose",
    "values": [
      "all-purpose",
      "job",
      "dlt"
    ]
  }
}&lt;/PRE&gt;&lt;P&gt;However, when I run the DLT pipeline, I get the following error:&lt;/P&gt;&lt;PRE&gt;ModuleNotFoundError: No module named 'graphframes',None,Map(),Map(),List(),List(),Map())&lt;/PRE&gt;&lt;P&gt;As far as I know, GraphFrames is not pip-installable. The primary installation instructions use Maven coordinates, because the Python package relies on underlying Java/Scala code.&lt;/P&gt;&lt;P&gt;Will DLT pipelines support GraphFrames?&lt;/P&gt;&lt;P&gt;See also this related but unresolved &lt;A href="https://community.databricks.com/t5/data-engineering/workaround-for-graphframes-not-working-on-delta-live-table/m-p/20190#M13616" target="_self"&gt;question&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 20 Aug 2024 14:06:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/graphframes-and-dlt/m-p/83613#M36976</guid>
      <dc:creator>lprevost</dc:creator>
      <dc:date>2024-08-20T14:06:45Z</dc:date>
    </item>
    <item>
      <title>Re: GraphFrames and DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/graphframes-and-dlt/m-p/87116#M37381</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;, is there any chance I can get a definitive answer to this question? I know I can use &lt;CODE&gt;%pip install&lt;/CODE&gt; in DLT pipelines, but GraphFrames requires a Maven-style install because it relies on underlying Java/Scala modules (JAR files). A related question: is there a plan for DLT to support the ML runtime, which has GraphFrames preinstalled?&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Sun, 01 Sep 2024 19:26:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/graphframes-and-dlt/m-p/87116#M37381</guid>
      <dc:creator>lprevost</dc:creator>
      <dc:date>2024-09-01T19:26:55Z</dc:date>
    </item>
    <item>
      <title>Re: GraphFrames and DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/graphframes-and-dlt/m-p/91294#M38133</link>
      <description>&lt;P&gt;Crickets...&lt;/P&gt;</description>
      <pubDate>Sat, 21 Sep 2024 18:25:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/graphframes-and-dlt/m-p/91294#M38133</guid>
      <dc:creator>lprevost</dc:creator>
      <dc:date>2024-09-21T18:25:46Z</dc:date>
    </item>
  </channel>
</rss>

