cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

New delta log folder is not getting created

Coders
New Contributor II

I have following code which reads the stream of data and process the data in the foreachBatch and writes to the provided path as shown below.

public static void writeToDatalake(SparkSession session, Configuration config, Dataset<Row> data, Entity entity, throws TimeoutException, StreamingQueryException {
String writePath = getWritePath(config, entity, profile);
log.info(writePath);
data.writeStream()
.outputMode(OutputMode.Update))
.format(source:"delta")
.foreachBatch(mergeData(session, config, entity, writePath))
.option("checkpointLocation", writePath + CHECKPOINT_PATH)
.trigger(Trigger.Once())
start()
.awaitTermination();
}


In this merge function, I'm checking if the table exists in the path, if not I'm creating new one and writing the data. If it exists, I'm straight forwardly writing as shown below.

I have used foreachBatch to specify a custom function (mergeData) to be executed for each batch of data generated by the streaming query. The mergeData function is responsible for merging the current batch of data with the existing data in the Delta Lake.

public static VoidFunction2<Dataset<Row>, Long> mergeData(SparkSession session, Configuration config, Entity entity {
return (data, batchid) -> {
boolean exists = DeltaTable.isDeltaTable(writePath);
if (lexists) {
Dataset<Row> emptyDF = session.createDataFrame(new ArrayList<>(), data.schema());
emptyDF.write()
.format("delta")
.mode(SaveMode.Overwrite)
.partitionBy(
JavaConverters.asScalaBuffer(config.getReadBlobStorage.get(entity.getSource).getPartition())
.save(writePath);
log.info("created Empty df");
emptyDF.unpersist();
writeData(writePath, entity, data, config);
data.unpersist();
}
};
}

This should create deltalog folder in blob store if doesn't exists. But it's not when I run this in databricks. I'm ending up with error as shown below:

> com.databricks.sql.transaction.tahoe.DeltaAnalysisException:
> Incompatible format detected. You are trying to write to
> `abfss://<path>/`
> using Delta, but there is no transaction log present. Check the
> upstream job to make sure that it is writing using format("delta") and
> that you are trying to write to the table base path.

0 REPLIES 0

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group