Greetings!
I've been trying out DLT (Delta Live Tables) for a few days, but I'm running into an unexpected issue when using the Koalas dropna function in my pipeline.
My goal is to drop every column that contains only null/NA values before the table is written.
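To make the goal concrete, here is a minimal sketch of the behaviour I expect, written in plain pandas rather than Koalas or DLT. The column names and values are made up purely for illustration:

```python
import pandas as pd

# Hypothetical sample data: "empty" holds only nulls, "value" holds some nulls.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "value": [10.0, None, 30.0],
    "empty": [None, None, None],
})

# axis=1 targets columns; how="all" drops a column only if every value is null.
result = df.dropna(axis=1, how="all")
print(list(result.columns))
```

Only the "empty" column should disappear; "value" stays because it still has non-null entries, and no rows are removed.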
My current code is:
import dlt

@dlt.table(name=f"silver_table")
def silver():
    # Read the bronze table, convert it to Koalas, and drop all-null columns.
    df = (dlt
        .read(f"bronze_table")
        .to_koalas()
        .dropna(axis=1, how="all")
    )
    return df
When I run the pipeline, I get the following error message:
org.apache.spark.sql.AnalysisException:
You are trying to create an external table [...]
from `dbfs:/pipelines/[...]` using Databricks Delta, but there is no transaction log present at
`dbfs:/pipelines/[...]/_delta_log`. Check the upstream job to make sure that it is writing using
format("delta") and that the path is the root of the table.
Am I missing something, or is the dropna function not usable in DLT for some reason?
Thanks a lot!