
New delta log folder is not getting created

Coders
New Contributor II

I have the following code, which reads a stream of data, processes each micro-batch in foreachBatch, and writes to the provided path as shown below.

public static void writeToDatalake(SparkSession session, Configuration config, Dataset<Row> data, Entity entity, String profile) throws TimeoutException, StreamingQueryException {
    String writePath = getWritePath(config, entity, profile);
    log.info(writePath);
    // Run the streaming query once, handing each micro-batch to mergeData
    data.writeStream()
        .outputMode(OutputMode.Update())
        .format("delta")
        .foreachBatch(mergeData(session, config, entity, writePath))
        .option("checkpointLocation", writePath + CHECKPOINT_PATH)
        .trigger(Trigger.Once())
        .start()
        .awaitTermination();
}
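For context, the read side isn't included above; a minimal sketch of how the streaming Dataset handed to writeToDatalake might be produced looks like this (the Delta source format and the input path are placeholders, not my exact setup):

// Sketch only: the actual streaming source and its options are not shown in this post.
Dataset<Row> data = session.readStream()
        .format("delta")                      // placeholder source format
        .load("abfss://<source-path>/");      // placeholder input path

writeToDatalake(session, config, data, entity, profile);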


In this merge function, I check whether a Delta table already exists at the path. If it doesn't, I create a new one and then write the data; if it does, I simply write the data, as shown below.

I have used foreachBatch to specify a custom function (mergeData) that is executed for each micro-batch of data generated by the streaming query. The mergeData function is responsible for merging the current batch with the data already in the Delta Lake table.

public static VoidFunction2<Dataset<Row>, Long> mergeData(SparkSession session, Configuration config, Entity entity, String writePath) {
    return (data, batchId) -> {
        boolean exists = DeltaTable.isDeltaTable(writePath);
        if (!exists) {
            // No Delta table at the path yet: create an empty, partitioned one using the batch schema
            Dataset<Row> emptyDF = session.createDataFrame(new ArrayList<>(), data.schema());
            emptyDF.write()
                .format("delta")
                .mode(SaveMode.Overwrite)
                .partitionBy(JavaConverters.asScalaBuffer(
                    config.getReadBlobStorage().get(entity.getSource()).getPartition()))
                .save(writePath);
            log.info("created Empty df");
            emptyDF.unpersist();
        }
        // Merge the current micro-batch into the table at writePath
        writeData(writePath, entity, data, config);
        data.unpersist();
    };
}
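writeData itself isn't included here; in rough terms it merges the batch into the Delta table at writePath, along the lines of the following simplified sketch (the merge key "id" is only an example, not my actual column):

// Sketch only: the real writeData differs in details; the merge condition below is assumed.
public static void writeData(String writePath, Entity entity, Dataset<Row> batch, Configuration config) {
    DeltaTable target = DeltaTable.forPath(writePath);
    target.as("t")
          .merge(batch.as("s"), "t.id = s.id")   // assumed merge key
          .whenMatched().updateAll()
          .whenNotMatched().insertAll()
          .execute();
}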

This should create the _delta_log folder in blob storage if it doesn't exist, but it isn't created when I run this in Databricks. I end up with the error shown below:

> com.databricks.sql.transaction.tahoe.DeltaAnalysisException:
> Incompatible format detected. You are trying to write to
> `abfss://<path>/`
> using Delta, but there is no transaction log present. Check the
> upstream job to make sure that it is writing using format("delta") and
> that you are trying to write to the table base path.
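For reference, one quick way to confirm whether the transaction log was ever written is to check for the _delta_log directory directly (path handling here is simplified):

// Sketch only: checks whether the Delta transaction log directory exists at the write path.
org.apache.hadoop.fs.Path logPath = new org.apache.hadoop.fs.Path(writePath + "/_delta_log");
org.apache.hadoop.fs.FileSystem fs = logPath.getFileSystem(session.sparkContext().hadoopConfiguration());
log.info("_delta_log exists: " + fs.exists(logPath));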
