
New delta log folder is not getting created

Coders
New Contributor II

I have the following code, which reads a stream of data, processes it in foreachBatch, and writes it to the provided path:

// The type of profile is assumed; the parameter was cut off in the original post.
public static void writeToDatalake(SparkSession session, Configuration config, Dataset<Row> data, Entity entity, String profile) throws TimeoutException, StreamingQueryException {
    String writePath = getWritePath(config, entity, profile);
    log.info(writePath);
    data.writeStream()
        .outputMode(OutputMode.Update())
        .format("delta")
        // Each micro-batch is handed to the function returned by mergeData.
        .foreachBatch(mergeData(session, config, entity, writePath))
        // The checkpoint is stored underneath the table path itself.
        .option("checkpointLocation", writePath + CHECKPOINT_PATH)
        .trigger(Trigger.Once())
        .start()
        .awaitTermination();
}


In the merge function, I check whether a Delta table exists at the path. If it does not, I create a new one and then write the data; if it does, I write the data directly, as shown below.

I use foreachBatch to run a custom function (mergeData) on each batch of data generated by the streaming query. mergeData is responsible for merging the current batch with the existing data in the Delta table.

public static VoidFunction2<Dataset<Row>, Long> mergeData(SparkSession session, Configuration config, Entity entity, String writePath) {
    return (data, batchid) -> {
        boolean exists = DeltaTable.isDeltaTable(writePath);
        if (!exists) {
            // No Delta table at writePath yet: create an empty one with the batch schema.
            Dataset<Row> emptyDF = session.createDataFrame(new ArrayList<>(), data.schema());
            emptyDF.write()
                .format("delta")
                .mode(SaveMode.Overwrite)
                .partitionBy(
                    JavaConverters.asScalaBuffer(config.getReadBlobStorage().get(entity.getSource()).getPartition()))
                .save(writePath);
            log.info("created Empty df");
            emptyDF.unpersist();
        }
        // Write the batch whether or not the table was just created, as described above.
        writeData(writePath, entity, data, config);
        data.unpersist();
    };
}

This should create the _delta_log folder in the blob store if it doesn't exist, but it does not when I run it on Databricks. Instead, I end up with the error shown below:

> com.databricks.sql.transaction.tahoe.DeltaAnalysisException:
> Incompatible format detected. You are trying to write to
> `abfss://<path>/`
> using Delta, but there is no transaction log present. Check the
> upstream job to make sure that it is writing using format("delta") and
> that you are trying to write to the table base path.

1 REPLY

Kaniz
Community Manager

Hi @Coders, it seems you're encountering an issue while writing data to Delta Lake in Azure Databricks. The error message says that the target path is being written to as Delta but contains no transaction log. Let's troubleshoot this together.

  1. Check the Path:

    • First, verify the writePath you’re using. Make sure it points to the correct location in your Azure Blob Storage.
    • The error message suggests that the path is abfss://<path>/. Ensure that it’s a valid path.
  2. Transaction Log:

    • The error indicates that there’s no transaction log present at the destination path.
    • Delta Lake requires a transaction log for its operations.
    • Check if there’s existing data at the specified path. If there is, ensure that it’s in the Delta format (i.e., has a transaction log).
    • If you’re writing to a new location, proceed to the next step.
  3. Format Check:

    • Confirm that your upstream job is writing data using the Delta format (format("delta")).
    • Ensure that you’re trying to write to the base path of the table.
    • To disable the format check, you can set the configuration property spark.databricks.delta.formatCheck.enabled to false.
  4. Example Solution:

    • If the existing data at the path is plain Parquet rather than Delta, explicitly specify .write.format("parquet") instead of .format("delta").
    • Note that this writes a Parquet dataset, not a Delta table, so no transaction log is created at the path.
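
To make steps 1 and 2 concrete, here is a minimal sketch in plain Java of the check they describe: does the table base path contain a _delta_log folder, and if not, does it already contain anything else? It is only an illustration under stated assumptions. DeltaPathCheck, PathState and classify are hypothetical names (not part of Delta Lake or Databricks), and a local directory stands in for the abfss:// location. One thing worth checking in the question's code is that the checkpoint is written to writePath + CHECKPOINT_PATH, so the base path may already hold a checkpoint folder before any table exists; whether that is what triggers the error here is an assumption, not something the post confirms.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class DeltaPathCheck {

    /** Possible states of a candidate table base path. */
    enum PathState { MISSING_OR_EMPTY, DELTA_TABLE, NON_DELTA_CONTENT }

    /**
     * Classifies a local directory by looking for a _delta_log subdirectory.
     * Hypothetical helper for illustration only; not a Delta Lake API.
     */
    static PathState classify(Path basePath) {
        if (!Files.isDirectory(basePath)) {
            return PathState.MISSING_OR_EMPTY;
        }
        if (Files.isDirectory(basePath.resolve("_delta_log"))) {
            return PathState.DELTA_TABLE;
        }
        try (Stream<Path> children = Files.list(basePath)) {
            // Any other child (data files, a checkpoint folder, ...) makes the path non-empty.
            return children.findAny().isPresent()
                    ? PathState.NON_DELTA_CONTENT
                    : PathState.MISSING_OR_EMPTY;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("table");
        System.out.println(classify(table)); // empty directory

        Files.createDirectory(table.resolve("checkpoint"));
        System.out.println(classify(table)); // only a checkpoint folder, no transaction log

        Files.createDirectory(table.resolve("_delta_log"));
        System.out.println(classify(table)); // transaction log present
    }
}
```

If the real path falls into the NON_DELTA_CONTENT case, it may be worth trying a checkpoint location outside the table base path, or clearing the leftover content, before touching the format check setting.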

Remember that Delta Lake provides powerful features like ACID transactions, schema evolution, and time travel.

If you encounter any further issues, feel free to ask for more assistance! 😊
