<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Koalas dropna in DLT in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/koalas-dropna-in-dlt/m-p/21960#M14998</link>
    <description>&lt;P&gt;Greetings!&lt;/P&gt;&lt;P&gt;I've been trying out DLT for a few days, but I'm running into an unexpected issue when using the Koalas &lt;CODE&gt;dropna&lt;/CODE&gt; function in my pipeline.&lt;/P&gt;&lt;P&gt;My goal is to drop every column that contains only null/NA values before the table is written.&lt;/P&gt;&lt;P&gt;My current code is:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt
import databricks.koalas as ks  # monkey-patches to_koalas() onto Spark DataFrames

@dlt.table(name="silver_table")
def silver():
    df = (dlt
          .read("bronze_table")
          .to_koalas()
          .dropna(axis=1, how="all")
         )
    return df&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Running the pipeline, I get the following error message:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;org.apache.spark.sql.AnalysisException: 
You are trying to create an external table [...]
from `dbfs:/pipelines/[...]` using Databricks Delta, but there is no transaction log present at
`dbfs:/pipelines/[...]/_delta_log`. Check the upstream job to make sure that it is writing using
format("delta") and that the path is the root of the table.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Am I missing something, or can &lt;CODE&gt;dropna&lt;/CODE&gt; not be used in DLT for some reason?&lt;/P&gt;&lt;P&gt;Thanks a lot!&lt;/P&gt;</description>
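A hedged sketch of the stated goal (dropping every all-null column) that stays in plain PySpark. The DLT Python reference describes `@dlt.table` functions as returning a Spark DataFrame, while `.to_koalas().dropna(...)` returns a Koalas DataFrame, which is a plausible (unconfirmed) cause of the error above. The helper names `drop_all_null_columns` and `columns_with_values` are hypothetical, not part of DLT or Koalas. One behavioral difference: `F.count` skips only nulls, so a float column holding nothing but NaN is kept here, whereas Koalas `dropna` treats NaN as missing.

```python
from typing import Dict, List


def columns_with_values(non_null_counts: Dict[str, int]) -> List[str]:
    """Return the names of columns that hold at least one non-null value."""
    return [name for name, count in non_null_counts.items() if count > 0]


def drop_all_null_columns(df):
    """Hypothetical Spark-only stand-in for Koalas dropna(axis=1, how="all").

    Takes and returns a pyspark.sql.DataFrame, so the result can be returned
    directly from a @dlt.table function.
    """
    # Imported lazily so columns_with_values() can be used without Spark.
    from pyspark.sql import functions as F

    # F.count(col) ignores nulls, so a count of 0 marks an all-null column.
    # Backtick quoting keeps names containing dots from being parsed as fields.
    counts = df.select(
        [F.count(F.col(f"`{c}`")).alias(c) for c in df.columns]
    ).first().asDict()
    keep = columns_with_values(counts)
    return df.select([F.col(f"`{c}`") for c in keep])
```

Inside the pipeline, `silver()` could then simply `return drop_all_null_columns(dlt.read("bronze_table"))`. The per-column counts cost one extra aggregation job over the bronze table before the selection is applied.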
    <pubDate>Thu, 28 Apr 2022 09:44:41 GMT</pubDate>
    <dc:creator>Thefan</dc:creator>
    <dc:date>2022-04-28T09:44:41Z</dc:date>
    <item>
      <title>Koalas dropna in DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/koalas-dropna-in-dlt/m-p/21960#M14998</link>
      <description>&lt;P&gt;Greetings!&lt;/P&gt;&lt;P&gt;I've been trying out DLT for a few days, but I'm running into an unexpected issue when using the Koalas &lt;CODE&gt;dropna&lt;/CODE&gt; function in my pipeline.&lt;/P&gt;&lt;P&gt;My goal is to drop every column that contains only null/NA values before the table is written.&lt;/P&gt;&lt;P&gt;My current code is:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import dlt
import databricks.koalas as ks  # monkey-patches to_koalas() onto Spark DataFrames

@dlt.table(name="silver_table")
def silver():
    df = (dlt
          .read("bronze_table")
          .to_koalas()
          .dropna(axis=1, how="all")
         )
    return df&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Running the pipeline, I get the following error message:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;org.apache.spark.sql.AnalysisException: 
You are trying to create an external table [...]
from `dbfs:/pipelines/[...]` using Databricks Delta, but there is no transaction log present at
`dbfs:/pipelines/[...]/_delta_log`. Check the upstream job to make sure that it is writing using
format("delta") and that the path is the root of the table.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Am I missing something, or can &lt;CODE&gt;dropna&lt;/CODE&gt; not be used in DLT for some reason?&lt;/P&gt;&lt;P&gt;Thanks a lot!&lt;/P&gt;</description>
      <pubDate>Thu, 28 Apr 2022 09:44:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/koalas-dropna-in-dlt/m-p/21960#M14998</guid>
      <dc:creator>Thefan</dc:creator>
      <dc:date>2022-04-28T09:44:41Z</dc:date>
    </item>
  </channel>
</rss>

