Koalas dropna in DLT

Thefan — Thu, 28 Apr 2022 09:44:41 GMT

Greetings!

I've been trying out DLT for a few days, but I'm running into an unexpected issue when I use Koalas' dropna in my pipeline.

My goal is to drop all columns that contain only null/NA values before the table is written.

My current code is:

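  import dlt
  import databricks.koalas  # makes .to_koalas() available on Spark DataFrames

  @dlt.table(name="silver_table")
  def silver():
      # Read the upstream DLT table, convert it to a Koalas DataFrame,
      # then drop every column whose values are all null/NA.
      df = (
          dlt.read("bronze_table")
          .to_koalas()
          .dropna(axis=1, how="all")
      )
      return df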
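For reference, the column-wise rule that dropna(axis=1, how="all") applies can be sketched in plain Python. This is only an illustration of the semantics, not Koalas or DLT code: the row-of-dicts layout, the helper names and the sample data are hypothetical, and it assumes every row has the same keys.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (the "null/NA" values in question)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_all_missing_columns(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop every column that is missing in ALL rows (axis=1, how="all").

    Hypothetical helper; assumes all rows share the keys of the first row.
    """
    if not rows:
        return rows
    columns = list(rows[0].keys())
    # A column survives as soon as a single row holds a non-missing value.
    keep = [c for c in columns if not all(is_missing(row.get(c)) for row in rows)]
    return [{c: row.get(c) for c in keep} for row in rows]


# Hypothetical sample data: "b" is entirely null/NaN, "c" is only partly null.
rows = [
    {"a": 1, "b": None, "c": 3.0},
    {"a": 2, "b": float("nan"), "c": None},
]
print(drop_all_missing_columns(rows))
# → [{'a': 1, 'c': 3.0}, {'a': 2, 'c': None}]
```

Note that deciding which columns to drop requires looking at every row first, so on a distributed DataFrame this implies a full pass over the data before the output schema is known.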

When I run the pipeline, I get the following error message:

org.apache.spark.sql.AnalysisException: 
You are trying to create an external table [...]
from `dbfs:/pipelines/[...]` using Databricks Delta, but there is no transaction log present at
`dbfs:/pipelines/[...]/_delta_log`. Check the upstream job to make sure that it is writing using
format("delta") and that the path is the root of the table.

Am I missing something, or is the dropna function not usable in DLT for some reason?

Thanks a lot!
