Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Fed
by New Contributor III
  • 6694 Views
  • 1 reply
  • 0 kudos

Setting checkpoint directory for checkpointInterval argument of estimators in pyspark.ml

Tree-based estimators in pyspark.ml have an argument called checkpointInterval: checkpointInterval = Param(parent='undefined', name='checkpointInterval', doc='set checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

@Federico Trifoglio: If sc.getCheckpointDir() returns None, it means that no checkpoint directory is set in the SparkContext. In this case, the checkpointInterval argument will indeed be ignored. To set a checkpoint directory, you can use the SparkC...
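A minimal sketch of the approach the reply describes, assuming a toy dataset, GBTClassifier as the tree-based estimator, and a hypothetical /tmp/spark-checkpoints directory (none of these specifics come from the thread):

from pyspark.sql import SparkSession
from pyspark.ml.classification import GBTClassifier
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

# Without a checkpoint directory on the SparkContext, checkpointInterval is silently ignored.
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")  # hypothetical path

# Tiny toy dataset so the snippet is self-contained.
df = spark.createDataFrame(
    [(0.0, Vectors.dense([0.0, 1.0])),
     (1.0, Vectors.dense([1.0, 0.0]))],
    ["label", "features"],
)

# With the directory set, checkpointInterval=10 checkpoints the internal
# node-id cache every 10 iterations during training.
gbt = GBTClassifier(maxIter=20, checkpointInterval=10)
model = gbt.fit(df)

print(spark.sparkContext.getCheckpointDir())  # now returns the directory instead of None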

LearnerShahid
by New Contributor II
  • 5466 Views
  • 6 replies
  • 4 kudos

Resolved! Lesson 6.1 of Data Engineering. Error when reading stream - java.lang.UnsupportedOperationException: com.databricks.backend.daemon.data.client.DBFSV1.resolvePathOnPhysicalStorage(path: Path)

Below function executes fine: def autoload_to_table(data_source, source_format, table_name, checkpoint_directory): query = (spark.readStream.format("cloudFiles").option("cloudFiles.format", source_format).option("cloudFile...

I have verified that the source data exists.
Latest Reply
Anonymous
Not applicable
  • 4 kudos

Auto Loader is not supported on Databricks Community Edition.
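For context, a sketch of the Auto Loader pattern the lesson's helper follows on a cluster where Auto Loader is available; the options beyond the truncated excerpt (cloudFiles.schemaLocation, checkpointLocation) are assumptions based on standard cloudFiles usage, not taken from the thread:

# `spark` is the SparkSession a Databricks notebook provides.
def autoload_to_table(data_source, source_format, table_name, checkpoint_directory):
    # Incrementally ingest files from data_source into a Delta table via Auto Loader.
    query = (spark.readStream
                  .format("cloudFiles")
                  .option("cloudFiles.format", source_format)
                  .option("cloudFiles.schemaLocation", checkpoint_directory)  # assumed continuation
                  .load(data_source)
                  .writeStream
                  .option("checkpointLocation", checkpoint_directory)
                  .table(table_name))
    return query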
