<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DLT failure: ABFS does not allow files or directories to end with a dot in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/dlt-failure-abfs-does-not-allow-files-or-directories-to-end-with/m-p/42910#M27447</link>
    <description>Community thread: a Delta Live Tables pipeline fails with "ABFS does not allow files or directories to end with a dot" when writing partitioned tables to Azure storage.</description>
    <pubDate>Thu, 31 Aug 2023 02:37:33 GMT</pubDate>
    <dc:creator>scvbelle</dc:creator>
    <dc:date>2023-08-31T02:37:33Z</dc:date>
    <item>
      <title>DLT failure: ABFS does not allow files or directories to end with a dot</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-failure-abfs-does-not-allow-files-or-directories-to-end-with/m-p/42863#M27435</link>
      <description>&lt;P&gt;My DLT pipeline, outlined below, generically cleans identifier tables. After successfully creating the initial streaming tables from the append-only sources, it fails when trying to create the second set of cleaned tables with the following:&lt;BR /&gt;&lt;BR /&gt;It's clearly a generated file name, so I don't know what's causing there to be a dot at the end.&lt;BR /&gt;&lt;BR /&gt;An empty file?&lt;BR /&gt;An empty column?&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to abfss://xx@xx.dfs.core.windows.net/xx/tables/xx...-b53c-4a7c-be49-35ca3d4f9a50. at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:905) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:515) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$3(FileFormatWriter.scala:316)&lt;/SPAN&gt;&lt;BR /&gt;...&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Caused by: java.lang.IllegalArgumentException: ABFS does not allow files or directories to end with a dot. at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.trailingPeriodCheck(AzureBlobFileSystem.java:737) at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:372)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2023 11:47:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-failure-abfs-does-not-allow-files-or-directories-to-end-with/m-p/42863#M27435</guid>
      <dc:creator>scvbelle</dc:creator>
      <dc:date>2023-08-30T11:47:11Z</dc:date>
    </item>
    <item>
      <title>Re: DLT failure: ABFS does not allow files or directories to end with a dot</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-failure-abfs-does-not-allow-files-or-directories-to-end-with/m-p/42910#M27447</link>
      <description>&lt;P&gt;UPDATE:&amp;nbsp;&lt;/P&gt;&lt;P&gt;See the code below (for some reason I couldn't post it yesterday).&lt;BR /&gt;I decided to revert to an earlier version of the DLT pipeline, which defined each table one by one rather than programmatically and had previously run without issue. That ran fine again.&lt;BR /&gt;I added a few extra tables and ran it before signing off last night, then started the job that updates the source tables.&lt;BR /&gt;&lt;BR /&gt;Today I ran the DLT pipeline again (the source tables had new rows appended in the meantime), and the second of the three tables/steps that I added failed with the same error.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from dataclasses import dataclass
from typing import List, Optional

import dlt

# (spark is available as a global in the Databricks/DLT runtime)


@dataclass
class IdTable:
    schema_name: str
    table_name: str
    email_col_name: str
    idno_col_name: Optional[str] = None
    surname_col_name: Optional[str] = None
    table_user_id_col: Optional[str] = None


def generate_raw_table(source_path: str, target_name_base: str):
    @dlt.table(
        name=f"{target_name_base}_raw" ,
        comment=f"Raw records from bronze layer for {source_path}"
    )
    def create_live_raw_table():
        return (spark.readStream.table(source_path))
 
 
def generate_cleaned_bronze_table(target_name_base: str, id_table: IdTable):
    @dlt.table(
        name= f"{target_name_base}_clean"
        , comment=f"Cleaned records from raw bronze views for {target_name_base}_raw"
        # , partition_cols=["insert_time", "ref", "table_uid"]
    )
    def create_live_clean_table():
        df_raw = spark.readStream.table(f"LIVE.{target_name_base}_raw")
        df_clean = df_raw.withColumn(
            # ... column-cleaning logic elided ...
        )
        return df_clean


id_tables: List[IdTable] = [
    IdTable(
        schema_name="source_name",
        table_name="acc_user_account",
        idno_col_name="identification_nr",
        surname_col_name="surname",
        email_col_name="email_address",
        table_user_id_col="user_id"
        ),

]

for table in id_tables:
    source_path: str = f"bronze_raw.{table.schema_name}.{table.table_name}"
    target_name_base: str = f"{table.schema_name}_{table.table_name}"

    generate_raw_table(source_path=source_path, target_name_base=target_name_base)
    generate_cleaned_bronze_table(target_name_base=target_name_base, id_table=table)&lt;/LI-CODE&gt;&lt;P&gt;The downstream code, which worked on both runs:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;@dlt.table(
  comment = "UUIDs for where name AND id AND email match",
  partition_cols=["ref", "deduped_uid_1"],
)
def deduped_1_name_id_email():
    q: str = generate_dedup_query(
        source_table='deduped_0_per_table', 
        matching_cols=['ref'], 
        source_id_col='deduped_uid_0',
        output_id_col='deduped_uid_1'
        )
    return spark.sql(q)&lt;/LI-CODE&gt;&lt;P&gt;The table that worked on the first run, then errored on the second:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;@dlt.table(
  comment = "UUIDs for where surname AND id NOT email match",
  partition_cols=["idno", "surname", "deduped_uid_2"],
)
def deduped_2_name_id():
    q: str = generate_dedup_query(
        source_table='deduped_1_name_id_email', 
        matching_cols=['idno', "surname"], 
        source_id_col='deduped_uid_1',
        output_id_col='deduped_uid_2',
        where_clause="WHERE idno is not null AND surname is not null"
        )
    print(q)
    return spark.sql(q)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;BR /&gt;My only thought is that it's the values themselves causing the error, but during my debugging yesterday I tried adding a terminating column (containing a string like 'row_end') to no effect.&lt;BR /&gt;I've tried storing the tables under different schemas and deleting and recreating the pipeline, with no change.&lt;/P&gt;</description>
      <pubDate>Thu, 31 Aug 2023 02:37:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-failure-abfs-does-not-allow-files-or-directories-to-end-with/m-p/42910#M27447</guid>
      <dc:creator>scvbelle</dc:creator>
      <dc:date>2023-08-31T02:37:33Z</dc:date>
    </item>
    <item>
      <title>Re: DLT failure: ABFS does not allow files or directories to end with a dot</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-failure-abfs-does-not-allow-files-or-directories-to-end-with/m-p/43117#M27493</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/70238"&gt;@scvbelle&lt;/a&gt;&lt;BR /&gt;The error you're seeing is an IllegalArgumentException raised because Azure Blob File System (ABFS) does not allow files or directories to end with a dot. It is thrown by the trailingPeriodCheck method in the AzureBlobFileSystem class of the ABFS client, indicating that this naming rule has been violated. To resolve it, ensure that none of your files or directories in ABFS end with a dot; the rule applies to all ABFS operations, including creating, renaming, and moving files or directories.&lt;BR /&gt;Can you check whether any of the source files or directories end with a dot?&lt;/P&gt;
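&lt;P&gt;Something along these lines could help scan for offending names (a rough sketch, assuming a Databricks notebook where dbutils is available; the root path is a placeholder you would replace with your own container path):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Rough sketch: recursively list a storage path and report any file or
# directory whose name ends with a dot.
def find_trailing_dot_paths(root: str) -&gt; list:
    offenders = []
    for info in dbutils.fs.ls(root):
        name = info.name.rstrip("/")  # directory names are listed with a trailing "/"
        if name.endswith("."):
            offenders.append(info.path)
        if info.isDir():
            offenders.extend(find_trailing_dot_paths(info.path))
    return offenders


# Placeholder path: substitute your own storage account and container.
print(find_trailing_dot_paths("abfss://container@account.dfs.core.windows.net/path"))&lt;/LI-CODE&gt;</description>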
      <pubDate>Fri, 01 Sep 2023 21:56:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-failure-abfs-does-not-allow-files-or-directories-to-end-with/m-p/43117#M27493</guid>
      <dc:creator>Priyanka_Biswas</dc:creator>
      <dc:date>2023-09-01T21:56:44Z</dc:date>
    </item>
    <item>
      <title>Re: DLT failure: ABFS does not allow files or directories to end with a dot</title>
      <link>https://community.databricks.com/t5/data-engineering/dlt-failure-abfs-does-not-allow-files-or-directories-to-end-with/m-p/43347#M27502</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/39246"&gt;@Priyanka_Biswas&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks so much for responding!&lt;BR /&gt;I dove into the metastore and saw that, because of my partitioning, directories are created using actual values from the table. On checking the source data, I see that one of the values does indeed end with a period, so adding a sanitisation step to remove punctuation (a good idea in any case) should solve it. Obvious in retrospect to check this.&lt;/P&gt;&lt;P&gt;A bit of an awkward situation for the engine to deal with: I guess it's not trivial to automatically sanitise during the directory creation process, as it would break lookups if the directory name no longer matched the table value.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Because partition columns shouldn't generally be random long strings (containing punctuation), I guess it's not usually much of a problem.&lt;BR /&gt;&lt;BR /&gt;Because this is a limitation of the engine, though, it would be good if the exception and its solution were mentioned in the documentation.&lt;BR /&gt;&lt;BR /&gt;Thanks again!&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
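&lt;P&gt;For reference, the kind of sanitisation step I have in mind (a minimal sketch, not the exact code from my pipeline; the column list and the regex are assumptions about my own data):&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql import functions as F


# Minimal sketch: strip trailing whitespace/punctuation from the columns used
# for partitioning, so the directory names derived from their values can
# never end with a dot.
def sanitise_partition_cols(df, cols):
    for c in cols:
        df = df.withColumn(c, F.regexp_replace(F.col(c), r"[\s.,;:]+$", ""))
    return df


# e.g. inside create_live_clean_table(), before returning:
# df_clean = sanitise_partition_cols(df_clean, ["idno", "surname"])&lt;/LI-CODE&gt;</description>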
      <pubDate>Mon, 04 Sep 2023 08:55:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/dlt-failure-abfs-does-not-allow-files-or-directories-to-end-with/m-p/43347#M27502</guid>
      <dc:creator>scvbelle</dc:creator>
      <dc:date>2023-09-04T08:55:18Z</dc:date>
    </item>
  </channel>
</rss>

