Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

ingestion pipeline configuration

Neelimak
Visitor

When trying to create an ingestion pipeline, the auto-generated cluster is hitting quota limit errors. The VM type it tries to use is not available in our region, and there seems to be no way to add a fallback to different VM types. Can you please help with how this can be resolved?

3 REPLIES

Ashwin_DSA
Databricks Employee

Hi @Neelimak,

For managed ingestion pipelines, the auto-generated cluster is just a classic jobs cluster whose shape is controlled by a compute policy, so you can override the VM type and add fallbacks.

Ask a workspace admin to create or edit a Job Compute policy (cluster_type = dlt) and pin it to VM sizes that actually exist in your region, for example:

 

{
  "cluster_type": { "type": "fixed", "value": "dlt" },
  "num_workers": { "type": "unlimited", "defaultValue": 1, "isOptional": true },
  "driver_node_type_id": { "type": "fixed", "value": "REGION_SUPPORTED_DRIVER" },
  "node_type_id": { "type": "fixed", "value": "REGION_SUPPORTED_WORKER" }
}
Then make sure the ingestion gateway uses this policy (via the policy family or explicit assignment).
 
To get automatic fallback to alternative VM types when that SKU hits capacity or quota, enable flexible node types for the workspace and, if needed, set driver_node_type_flexibility.alternate_node_type_ids and worker_node_type_flexibility.alternate_node_type_ids in the policy (a comma-separated list of compatible SKUs).
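As a rough sketch, extending the policy above with those flexibility attributes could look like this. The SKU names are placeholders, not recommendations; substitute node types that are actually compatible and available in your region, and verify the attribute paths against your workspace's policy schema:

```json
{
  "cluster_type": { "type": "fixed", "value": "dlt" },
  "node_type_id": { "type": "fixed", "value": "REGION_SUPPORTED_WORKER" },
  "worker_node_type_flexibility.alternate_node_type_ids": {
    "type": "fixed",
    "value": "ALTERNATE_WORKER_SKU_1,ALTERNATE_WORKER_SKU_2"
  },
  "driver_node_type_id": { "type": "fixed", "value": "REGION_SUPPORTED_DRIVER" },
  "driver_node_type_flexibility.alternate_node_type_ids": {
    "type": "fixed",
    "value": "ALTERNATE_DRIVER_SKU_1"
  }
}
```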

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

Thanks for this information, Ashwin.

Unfortunately, I currently do not see any way to force a policy on the automatically created ingestion gateway. I am an admin in my workspace, and I don't see any option, either in the policy tab or during pipeline creation, to specify a policy for the ingestion gateway.

The compute that is generated is locked and uses the default "Unrestricted" policy. It appears to be system-managed and can't be edited.

Ashwin_DSA
Databricks Employee

Hi @Neelimak,

I should've been a bit clearer. Internally, the ingestion gateway does run on a classic jobs cluster, and those clusters are generally governed by compute policies. However, for managed ingestion pipelines created via the Data Ingestion UI, the gateway compute is system-managed and is today always attached to the default Job Compute policy (often "Unrestricted"). There is currently no way in the UI, even as an admin, to swap the policy on that auto-generated gateway cluster, or to override its node type or flexible node types.

The docs explicitly flag custom gateway policies as "API only", i.e., available when you define the pipeline via the API or Asset Bundles rather than the UI wizard. In that API/Bundles scenario, you can control driver_node_type_id / node_type_id and flexible node types.

So, to be precise: for UI-created ingestion, the gateway compute is locked to the default policy, and you can't change the VM family or add fallbacks. Only when defining the pipeline via the API or Asset Bundles can you attach a custom Job Compute policy or an explicit cluster config and choose VM types and fallbacks.
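As an illustration only, an API-created gateway pipeline spec could look roughly like the following. This is a sketch, not a verified schema: the gateway_definition field names and the placeholder values are assumptions from our discussion, and you should check the exact request shape against the Pipelines API reference before using it. The clusters override is the part that pins node types and the policy:

```json
{
  "name": "my-ingestion-gateway",
  "gateway_definition": {
    "connection_id": "<connection-id>",
    "gateway_storage_catalog": "<catalog>",
    "gateway_storage_schema": "<schema>"
  },
  "clusters": [
    {
      "label": "default",
      "driver_node_type_id": "REGION_SUPPORTED_DRIVER",
      "node_type_id": "REGION_SUPPORTED_WORKER",
      "policy_id": "<custom-policy-id>"
    }
  ]
}
```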

Check these links for reference.

Reference 1

Reference 2

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***