Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Structured Streaming with queue in separate storage account

ADuma
New Contributor III

Hello,

We are running a Structured Streaming job that consumes zipped JSON files arriving in our Azure Prod storage account. We use Auto Loader and have set up an Event Grid queue, which we pass to the streaming job via cloudFiles.queueName. We also pass a glob pattern to the load() function.

My goal is to have the same job running in our Test environment but consuming files that arrive in Prod. We do not want to copy over all files to Test but want to be able to test our jobs with all files that are arriving.

My problem is that if I pass the glob pattern for the Prod storage account, I get the following error:

[STREAM_FAILED] Query [id = xxx, runId = xxx] terminated with exception: The queue prod-queue-on-test doesn't exist SQLSTATE: XXKST

But if I pass the glob pattern for the Test storage account my Job does not process any files, probably because the messages from the queue do not match the glob pattern.

We're running more or less the following code.

 

 

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", data_format)
    .option("cloudFiles.schemaLocation", autoloader_schema_path)
    .option("checkpointLocation", stream_checkpoint_path)  # note: normally set on the writeStream side
    .option("cloudFiles.useNotifications", "true")
    .options(**{
        "cloudFiles.subscriptionId": subscription_id,
        "cloudFiles.clientId": client_id,
        "cloudFiles.clientSecret": kv_secret_key,
        "cloudFiles.tenantId": tenant_id,
        "cloudFiles.resourceGroup": f"rg-{stage}-xxx",
        "cloudFiles.queueName": queue_name,
    })
    .schema(input_schema)
    .load(path_glob_prefix, pathGlobFilter=path_glob_suffix)
)

 

 

1 REPLY

mark_ott
Databricks Employee

You are attempting to have your Test Databricks streaming job consume files that arrive in your Prod storage, using AutoLoader and EventGrid notifications, without physically copying the data or EventGrid queue to Test. The core challenge is that EventGrid queues, glob patterns, and notification messages are tightly coupled to the specific storage account context in which they're created and referenced. Your error indicates a misalignment between the queue or glob pattern and the actual storage context being scanned, which is an architectural limitation rather than a pure configuration issue.

Key Points and Root Cause

  • AutoLoader's cloudFiles.queueName must refer to an EventGrid queue that exists and is accessible from the Databricks environment running the job.

  • Queue and notification messages are scoped to the storage account and container for which they are created.

  • If you point your Test job at the Prod storage glob pattern but use a queue that does not exist (or is inaccessible from Test), you'll hit the exception:

    [STREAM_FAILED] Query ... terminated with exception: The queue prod-queue-on-test doesn't exist

    because the queue is not deployed, or not visible, to your Test workspace and resource context.

  • If you use the Test queue (which only knows about files arriving in the Test storage account), you won't see any messages for files in Prod.

Options for Testing Production Files from Test

You have limited options due to Azure security boundaries and how Databricks AutoLoader composes notifications:

1. Cross-Environment Access via Shared EventGrid

  • Expose the Prod EventGrid subscription/queue to the Test environment, making sure your Test Databricks workspace, network, and credentials have read/listen permissions on the Prod queue.

  • In practice, this means using the Prod queue name, but running the streaming job from the Test workspace, with access to the Prod EventGrid, Prod storage account, and appropriate RBAC/credentials.

  • Your subscription_id, client_id, client_secret, etc. in Test must grant access to the Prod resources.

  • This is often not recommended for strict test/prod isolation in large organizations but is technically possible if your security policy allows it.
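As a sketch of this option, the stage-dependent pieces can be isolated in one helper so the Test job is pointed at the Prod queue explicitly. The resource-group pattern (`rg-{stage}-xxx`, copied from the snippet in the question) and the queue naming scheme (`{stage}-ingest-queue`) are illustrative assumptions; substitute your real names.

```python
def prod_notification_options(target_stage: str, subscription_id: str,
                              client_id: str, client_secret: str,
                              tenant_id: str) -> dict:
    """Build the cloudFiles notification options for the given target stage.

    Calling this with target_stage="prod" from a Test workspace makes the
    stream listen on the Prod queue; the service principal behind client_id
    must have listen rights on that Prod resource group and queue.
    """
    return {
        "cloudFiles.useNotifications": "true",
        "cloudFiles.subscriptionId": subscription_id,
        "cloudFiles.clientId": client_id,
        "cloudFiles.clientSecret": client_secret,
        "cloudFiles.tenantId": tenant_id,
        # Assumed naming conventions -- adapt to your environment:
        "cloudFiles.resourceGroup": f"rg-{target_stage}-xxx",
        "cloudFiles.queueName": f"{target_stage}-ingest-queue",
    }

# Usage from the Test workspace (stream setup as in the question):
# opts = prod_notification_options("prod", subscription_id, client_id,
#                                  kv_secret_key, tenant_id)
# spark.readStream.format("cloudFiles").options(**opts)...
```

Keeping all stage-sensitive settings in one place also makes the temporary-access variant (option 4 below) a one-argument change.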

2. Using Manifest Files or File Lists

  • Instead of using notifications, generate a manifest or list of files from Prod, and feed that list to your Test job as input (e.g., with .load() and pathGlobFilter), operating in manual/periodic mode rather than true streaming.

  • This does NOT use EventGrid notifications but allows periodic ingestion from Prod without needing a queue or subscription.
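A minimal sketch of the manifest approach: filter the listed Prod paths with the same glob the streaming job uses as pathGlobFilter, then read them as a batch. `fnmatch` approximates Spark's glob semantics for simple suffix patterns like `*.json.gz` (an assumption; complex globs may differ), and the commented read is illustrative.

```python
from fnmatch import fnmatch

def select_from_manifest(manifest: list[str], glob_suffix: str) -> list[str]:
    """Keep only the paths whose file name matches the glob suffix,
    mirroring what pathGlobFilter would admit in the streaming job."""
    return [p for p in manifest if fnmatch(p.rsplit("/", 1)[-1], glob_suffix)]

# Periodic batch ingestion instead of notifications (illustrative):
# files = select_from_manifest(list_of_prod_paths, "*.json.gz")
# df = spark.read.schema(input_schema).json(files)
```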

3. Dedicated Test Storage with Mirrored Notifications

  • Mirror select files (or a sample) from Prod to Test storage; configure a Test EventGrid to mirror notifications.

  • This is the "copy files" solution you said you want to avoid.

4. Temporary Shared EventGrid for Integration Testing

  • Temporarily allow Test workspace access to the Prod EventGrid queue only for integration runs, removing access after validation.

Summary Table

| Scenario | Queue Name / Location | Storage Account | Can Process Prod Files? | Comments |
|---|---|---|---|---|
| Test job, Prod queue, Prod storage | Prod queue | Prod | Yes, if security allows | Security concerns; works if Test has keys |
| Test job, Test queue, Test storage | Test queue | Test | No | No Prod visibility |
| Test job, Test queue, Prod storage | Test queue | Prod | No | Queue has no Prod notifications |
| Test job, Prod queue, file list/manual | N/A | Prod | Yes, batch only | Not true streaming |

Best Practice

For robust test/prod separation, it's safest to copy a subset of files to Test and configure EventGrid notifications for that scope. If you must use real Prod files for integration, ensure your Test workspace credentials have temporary access to the Prod resources and use the actual queue name and storage path, matching your streaming logic precisely. Double-check RBAC, networking, and firewall rules.
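When running against the real Prod queue, it can help to fail fast before the stream starts by probing that the configured queue is reachable from the current workspace. The helper below only builds the queue endpoint URL; the commented probe assumes the azure-storage-queue package and a credential with queue read rights.

```python
def queue_url(storage_account: str, queue_name: str) -> str:
    """Build the Azure Storage queue endpoint URL for an account/queue pair."""
    return f"https://{storage_account}.queue.core.windows.net/{queue_name}"

# Fail fast before calling spark.readStream (assumes azure-storage-queue
# is installed and `cred` has Storage Queue Data Reader rights):
# from azure.storage.queue import QueueClient
# QueueClient.from_queue_url(queue_url("prodaccount", queue_name),
#                            credential=cred).get_queue_properties()
```

A probe like this turns the STREAM_FAILED error above into an explicit pre-flight failure, which is easier to diagnose in CI.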
