Databricks Community

mmenjivar · ‎10-08-2024

We have been testing Streaming Tables in our pipelines and are getting different results depending on the streaming source.

For Streaming Tables reading from read_files, everything works as expected.
For Streaming Tables reading from read_kafka, we get contradictory results when executing in a SQL Warehouse:
- It works when selecting from read_kafka directly, as in the following block:

SELECT
*
FROM read_kafka(bootstrapServers => 'server',
subscribe => 'topic',
startingOffsets => 'earliest',
`kafka.sasl.mechanism` => 'SCRAM-SHA-512',
`kafka.security.protocol` => 'SASL_PLAINTEXT',
`kafka.sasl.jaas.config` => "kafkashaded.org.apache.kafka.common.security.scram.ScramLoginModule required username='user' password = 'pass';",
failOnDataLoss => 'false'
) limit 10;

- It doesn't work when I try to create a streaming table from the same query, running the script on the same SQL Warehouse:

CREATE OR REFRESH STREAMING TABLE u_marlonmenjivar.test as
SELECT
*
FROM stream read_kafka(bootstrapServers => 'server',
subscribe => 'topic',
startingOffsets => 'earliest',
`kafka.sasl.mechanism` => 'SCRAM-SHA-512',
`kafka.security.protocol` => 'SASL_PLAINTEXT',
`kafka.sasl.jaas.config` => "kafkashaded.org.apache.kafka.common.security.scram.ScramLoginModule required username='user' password = 'pass';",
failOnDataLoss => 'false'
) limit 10;

The error returned is:

terminated with exception: kafkashaded.org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeTopics

This error is the same for classic, pro, and serverless SQL endpoints.

When executed from a notebook, it fails with "Multipart table names is not supported". When I execute it without the schema in the table name, it doesn't fail, but it says:

To populate your table you must either: