Schema change and OpenSearch
a week ago
Let me be crystal clear: schema changes and OpenSearch do not fit well together. However, the data pushed to OpenSearch is processed first and always has the same schema. The problem is that Spark reads a CDC feed, which is subject to schema changes because the source table may change.
I attempted to solve the issue by providing the mergeSchema and schemaTrackingLocation options. As I understand it, Spark uses these settings together with the checkpoint data.
But it is not working; the code keeps failing with:
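For reference, here is a minimal sketch of how schemaTrackingLocation is typically wired into a Delta streaming read. It is not necessarily the exact code in use here. The option names readChangeFeed and schemaTrackingLocation are real Delta reader options, but the `_schema_log` subdirectory, the function names and the `write_batch` callback (standing in for the OpenSearch writer) are hypothetical. The sketch assumes the schema log should live under the stream's checkpoint directory. mergeSchema is left out because, as far as I know, it applies to Delta writes rather than to the streaming read.

```python
from typing import Callable, Dict


def cdc_read_options(checkpoint_dir: str) -> Dict[str, str]:
    """Build reader options for a Delta CDC stream with schema tracking."""
    base = checkpoint_dir.rstrip("/")
    return {
        "readChangeFeed": "true",
        # Assumption: the schema log is kept in a subdirectory of the checkpoint.
        "schemaTrackingLocation": f"{base}/_schema_log",
    }


def start_cdc_stream(spark, source_table: str, checkpoint_dir: str,
                     write_batch: Callable):
    """Start the stream; `write_batch(df, batch_id)` is a hypothetical sink."""
    reader = spark.readStream.format("delta")
    # The options must be set on the reader, not on the writer.
    for key, value in cdc_read_options(checkpoint_dir).items():
        reader = reader.option(key, value)
    return (
        reader.table(source_table)
        .writeStream
        .option("checkpointLocation", checkpoint_dir)
        .foreachBatch(write_batch)
        .start()
    )
```

The point of the sketch is only where the option is attached: on `readStream`, with the same checkpoint directory then reused on `writeStream`.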
com.databricks.sql.transaction.tahoe.DeltaStreamingColumnMappingSchemaIncompatibleException: Streaming read is not supported on tables with read-incompatible schema changes (e.g. rename or drop or datatype changes).
Please provide a 'schemaTrackingLocation' to enable non-additive schema evolution for Delta stream processing.
The error above is thrown when the following schema change is detected. Please note that the source table has delta.columnMapping enabled in ID mode. This makes the diff larger; however, only a new field has been added, which is an additive change.
@@ -178,39 +249,64 @@
"type": "integer",
"nullable": true,
"metadata": {
- "comment": "Day extraction from `action_ts`"
+ "comment": "Day extraction from `action_ts`",
+ "delta.columnMapping.id": 27,
+ "delta.columnMapping.physicalName": "day"
}
},
{
"name": "merchant_shared_request_id",
"type": "string",
"nullable": true,
- "metadata": {}
+ "metadata": {
+ "delta.columnMapping.id": 28,
+ "delta.columnMapping.physicalName": "merchant_shared_request_id"
+ }
},
{
"name": "merchant_nsid",
"type": "string",
"nullable": true,
- "metadata": {}
+ "metadata": {
+ "delta.columnMapping.id": 29,
+ "delta.columnMapping.physicalName": "merchant_nsid"
+ }
},
{
"name": "refunded_on_behalf_of",
"type": "string",
"nullable": true,
- "metadata": {}
- },
+ "metadata": {
+ "delta.columnMapping.id": 30,
+ "delta.columnMapping.physicalName": "refunded_on_behalf_of"
+ }
+ },
{
"name": "payment_provider_to_merchant",
"type": "string",
"nullable": true,
- "metadata": {}
+ "metadata": {
+ "delta.columnMapping.id": 31,
+ "delta.columnMapping.physicalName": "payment_provider_to_merchant"
+ }
},
{
"name": "idempotency",
"type": "string",
"nullable": true,
- "metadata": {}
+ "metadata": {
+ "delta.columnMapping.id": 34,
+ "delta.columnMapping.physicalName": "col-eeea8bdf-5e74-4088-8d9e-208fd9e55014"
+ }
+ },
+ {
+ "name": "payment_provider_operation_id",
+ "type": "string",
+ "nullable": true,
+ "metadata": {
+ "delta.columnMapping.id": 35,
+ "delta.columnMapping.physicalName": "col-a4b6a352-73cf-4af5-aae6-364c57d6a4cf"
+ }
}
]
}
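To separate the real change from the column-mapping noise in a diff like the one above, a small check along these lines could help. This is only a sketch: it assumes both schema versions are available as Spark schema JSON strings, it looks at top-level columns only, and the function names are hypothetical.

```python
import json
from typing import Dict, List, Tuple

COLUMN_MAPPING_PREFIX = "delta.columnMapping."


def _normalized_fields(schema_json: str) -> Dict[str, dict]:
    """Map column name -> field definition without column-mapping metadata."""
    fields = {}
    for field in json.loads(schema_json)["fields"]:
        metadata = {
            key: value
            for key, value in field.get("metadata", {}).items()
            if not key.startswith(COLUMN_MAPPING_PREFIX)
        }
        fields[field["name"]] = {**field, "metadata": metadata}
    return fields


def schema_changes(old_json: str, new_json: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) top-level column names."""
    old = _normalized_fields(old_json)
    new = _normalized_fields(new_json)
    added = [name for name in new if name not in old]
    removed = [name for name in old if name not in new]
    changed = [name for name in old if name in new and old[name] != new[name]]
    return added, removed, changed
```

If `removed` and `changed` come back empty, the change is additive as far as this check can tell. A rename would show up as one removed plus one added column.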
I can handle the schema change manually; however, it would be much better to handle it automatically.
Any ideas?

