We have been testing Streaming Tables in our pipelines, with different results depending on the streaming source:
- For Streaming Tables reading from read_files, everything works as expected.
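For reference, a minimal sketch of the read_files pattern that works for us (the table name, path, and format below are placeholders, not our real values):

CREATE OR REFRESH STREAMING TABLE u_marlonmenjivar.test_files AS
SELECT *
FROM STREAM read_files(
  '/Volumes/catalog/schema/landing/',
  format => 'json'
);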
- For Streaming Tables reading from read_kafka, we get inconsistent results when executing on a SQL Warehouse:
- It works when selecting from read_kafka, as in the following block:
SELECT *
FROM read_kafka(
  bootstrapServers => 'server',
  subscribe => 'topic',
  startingOffsets => 'earliest',
  `kafka.sasl.mechanism` => 'SCRAM-SHA-512',
  `kafka.security.protocol` => 'SASL_PLAINTEXT',
  `kafka.sasl.jaas.config` => "kafkashaded.org.apache.kafka.common.security.scram.ScramLoginModule required username='user' password = 'pass';",
  failOnDataLoss => 'false'
)
LIMIT 10;
- It doesn't work when I try to create a streaming table using the same query, run on the same SQL Warehouse:
CREATE OR REFRESH STREAMING TABLE u_marlonmenjivar.test AS
SELECT *
FROM STREAM read_kafka(
  bootstrapServers => 'server',
  subscribe => 'topic',
  startingOffsets => 'earliest',
  `kafka.sasl.mechanism` => 'SCRAM-SHA-512',
  `kafka.security.protocol` => 'SASL_PLAINTEXT',
  `kafka.sasl.jaas.config` => "kafkashaded.org.apache.kafka.common.security.scram.ScramLoginModule required username='user' password = 'pass';",
  failOnDataLoss => 'false'
)
LIMIT 10;
The error returned is: terminated with exception: kafkashaded.org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeTopics. The error is the same for classic, pro, and serverless SQL warehouses.
When executed from a notebook, it fails with 'Multipart table names is not supported'. When I execute it without the schema qualifier (sketched after the list below), it doesn't fail, but it says:
To populate your table you must either:
- Run an existing pipeline using the Delta Live Tables menu
- Create a new pipeline
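For clarity, the schema-less variant that produces this message is roughly the following (the Kafka connection options are the same as in the CREATE statement above and are abbreviated here):

CREATE OR REFRESH STREAMING TABLE test AS  -- note: no schema qualifier
SELECT *
FROM STREAM read_kafka(
  bootstrapServers => 'server',
  subscribe => 'topic',
  startingOffsets => 'earliest',
  -- kafka.sasl.* / kafka.security.protocol options identical to the statement above
  failOnDataLoss => 'false'
)
LIMIT 10;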
According to the documentation, the DLT pipeline should be created automatically. Any clue on what I'm doing wrong?