Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT failure: ABFS does not allow files or directories to end with a dot

scvbelle
New Contributor III

My DLT pipeline, outlined below, generically cleans identifier tables. It successfully creates the initial streaming tables from the append-only sources, but then fails when trying to create the second, cleaned tables with the following:

It's clearly a generated file name, so I don't know what's causing there to be a dot at the end.

An empty file?
An empty column?

org.apache.spark.SparkException: [TASK_WRITE_FAILED] Task failed while writing rows to abfss://xx@xx.dfs.core.windows.net/xx/tables/xx...-b53c-4a7c-be49-35ca3d4f9a50.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:905)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:515)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$3(FileFormatWriter.scala:316)
...

Caused by: java.lang.IllegalArgumentException: ABFS does not allow files or directories to end with a dot.
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.trailingPeriodCheck(AzureBlobFileSystem.java:737)
    at shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:372)


 

1 ACCEPTED SOLUTION

Accepted Solutions

Priyanka_Biswas
Valued Contributor

Hi @scvbelle 
The error you're seeing is an IllegalArgumentException, raised because the Azure Blob File System (ABFS) driver does not allow file or directory names to end with a dot. It is thrown by the trailingPeriodCheck method in the AzureBlobFileSystem class of the ABFS client, which indicates that this naming rule has been violated. To solve the issue, make sure that none of your files or directories in ABFS end with a dot. This rule applies to all operations involving ABFS, including creating, renaming, and moving files or directories.
Can you check whether any of the source files end with a dot?


3 REPLIES

scvbelle
New Contributor III

UPDATE: 

See the code below (for some reason I couldn't post it yesterday).
I decided to revert to an earlier version of the DLT pipeline, which defined each table one by one rather than programmatically and had run without issue. That ran fine.
I added a few extra tables and ran it before signing off last night, then started the job that updates the source tables.

Today I ran the DLT pipeline again (with new rows added to the source tables), and the second of the three tables/steps that I added failed with the same error.

from dataclasses import dataclass
from typing import List, Optional

import dlt


@dataclass
class IdTable:
    schema_name: str
    table_name: str
    email_col_name: str
    idno_col_name: Optional[str] = None
    surname_col_name: Optional[str] = None
    table_user_id_col: Optional[str] = None


def generate_raw_table(source_path: str, target_name_base: str):
    @dlt.table(
        name=f"{target_name_base}_raw",
        comment=f"Raw records from bronze layer for {source_path}",
    )
    def create_live_raw_table():
        return spark.readStream.table(source_path)


def generate_cleaned_bronze_table(target_name_base: str, id_table: IdTable):
    @dlt.table(
        name=f"{target_name_base}_clean",
        comment=f"Cleaned records from raw bronze views for {target_name_base}_raw",
        # partition_cols=["insert_time", "ref", "table_uid"],
    )
    def create_live_clean_table():
        df_raw = spark.readStream.table(f"LIVE.{target_name_base}_raw")
        df_clean = df_raw.withColumn(
            # ...
        )
        return df_clean


id_tables: List[IdTable] = [
    IdTable(
        schema_name="source_name",
        table_name="acc_user_account",
        idno_col_name="identification_nr",
        surname_col_name="surname",
        email_col_name="email_address",
        table_user_id_col="user_id"
        ),

]

for table in id_tables:
    source_path: str = f"bronze_raw.{table.schema_name}.{table.table_name}"
    target_name_base: str = f"{table.schema_name}_{table.table_name}"

    generate_raw_table(source_path=source_path, target_name_base=target_name_base)
    generate_cleaned_bronze_table(target_name_base=target_name_base, id_table=table)

The downstream code, which worked on both runs:

@dlt.table(
  comment = "UUIDs for where name AND id AND email match",
  partition_cols=["ref", "deduped_uid_1"],
)
def deduped_1_name_id_email():
    q: str = generate_dedup_query(
        source_table='deduped_0_per_table', 
        matching_cols=['ref'], 
        source_id_col='deduped_uid_0',
        output_id_col='deduped_uid_1'
        )
    return spark.sql(q)

The table that worked on the first run, then errored on the second:

@dlt.table(
  comment = "UUIDs for where surname AND id NOT email match",
  partition_cols=["idno", "surname", "deduped_uid_2"],
)
def deduped_2_name_id():
    q: str = generate_dedup_query(
        source_table='deduped_1_name_id_email', 
        matching_cols=['idno', "surname"], 
        source_id_col='deduped_uid_1',
        output_id_col='deduped_uid_2',
        where_clause="WHERE idno is not null AND surname is not null"
        )
    print(q)
    return spark.sql(q)

 
My only thought is that it's the values themselves causing the error, but during my debugging yesterday I tried adding a terminating column (containing a string like 'row_end') to no effect.
I've tried storing the tables under different schemas and deleting and recreating the pipeline, with no change.
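One way to test the hypothesis that the values themselves are at fault is to scan the columns of interest for values that end with a dot. The following is only a minimal sketch in plain Python: the helper name `find_trailing_dot_values` and the sample rows are made up for illustration, and in the real pipeline the same check would have to be run against the actual source table.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# A row is modelled as a plain dict of column name -> string value (or None).
Row = Dict[str, Optional[str]]


def find_trailing_dot_values(
    rows: Iterable[Row], columns: List[str]
) -> List[Tuple[str, str]]:
    """Return (column, value) pairs whose value ends with a dot."""
    hits: List[Tuple[str, str]] = []
    for row in rows:
        for col in columns:
            value = row.get(col)
            # Nulls are skipped; only non-null strings can end with a dot.
            if value is not None and value.endswith("."):
                hits.append((col, value))
    return hits


# Hypothetical sample data, not taken from the real tables.
sample_rows: List[Row] = [
    {"idno": "1001", "surname": "Smith"},
    {"idno": "1002", "surname": "Jones Jr."},
    {"idno": None, "surname": "Nkosi"},
]

print(find_trailing_dot_values(sample_rows, ["idno", "surname"]))
```

Any pair reported by a check like this is a candidate for the value that ends up in a path the ABFS client rejects.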

Hi @Priyanka_Biswas 

Thanks so much for responding!
I dove into the metastore and saw that because of my partitioning (?), directories are created using actual values from the table. On checking the source data, I see that indeed one of the values ends with a period, so adding a sanitisation step to remove punctuation (a good idea in any case) should solve it. Obvious in retrospect to check this.
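A sanitisation step along these lines could look like the sketch below. It is an illustration under stated assumptions, not the code actually used in the pipeline: it only strips trailing dots and whitespace rather than all punctuation, the names `sanitise_partition_value` and `partition_dir` are hypothetical, and `partition_dir` is a simplified stand-in for the `column=value` directory naming (the real writer may escape some characters).

```python
import re
from typing import Optional

# Matches any run of dots and whitespace at the very end of a string.
_TRAILING_DOTS = re.compile(r"[.\s]+$")


def sanitise_partition_value(value: Optional[str]) -> Optional[str]:
    """Strip trailing dots/whitespace so the value is safe as a directory name."""
    if value is None:
        return None
    return _TRAILING_DOTS.sub("", value)


def partition_dir(column: str, value: str) -> str:
    """Simplified illustration of a partition directory name (column=value)."""
    return f"{column}={value}"


raw = "Jones Jr."  # hypothetical value ending with a period
clean = sanitise_partition_value(raw)

print(partition_dir("surname", raw).endswith("."))    # the problematic case
print(partition_dir("surname", clean).endswith("."))  # after sanitising
```

In a PySpark pipeline, the same pattern could be applied with `pyspark.sql.functions.regexp_replace` on the column before it is used in `partition_cols`, so that the stored value and the directory name stay in agreement.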

A bit of an awkward situation to deal with for the engine: I guess it's not trivial to automatically sanitise during the directory creation process as it would mess with lookups if the directory name doesn't match the table value. 

Because partition columns generally shouldn't be long, random strings (containing punctuation), I guess it's not usually much of a problem.

Because this is a limitation of the engine, it would be good if the exception and its solution were mentioned in the documentation, though.

Thanks again! 
