Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

SOAP API - Connection

AlexPedurand
New Contributor

Hello

Our team has a workflow that performs the usual monthly tasks and is run on the first working day of each month.

Each of the ~20 users runs a clone of this workflow, most likely all around the same time but with different options. Because we don't have access to Job Compute, the workflows run on a few All-Purpose Compute clusters shared across users.

The first step of this workflow downloads data through a SOAP API (wrapped in an R package). Over the past two months, we have observed a significant degradation in the performance of this task: it now takes ~10 min instead of ~5 min, when it finishes at all.

It feels like the network can no longer handle the possibly concurrent calls to the API. Restarting a cluster and organizing the users in a queue solves the issue, but it is far from optimal.

Any recommendations for improvements here?

Thanks

 

1 REPLY

feiyun0112
Honored Contributor

Maybe you can acquire a lock before calling the SOAP API, so that only one workflow calls it at a time.

python - Using a Lock with redis-py - Stack Overflow
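
As a rough illustration of the lock idea, here is a minimal sketch using redis-py. It assumes a Redis server that every cluster can reach, which the original post does not mention. The helper name `call_with_lock`, the lock name `soap-api-lock`, and the timeout values are all hypothetical. Since the download is wrapped in an R package, the `fn` argument stands in for whatever ends up triggering that download.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_lock(
    client,
    fn: Callable[[], T],
    *,
    name: str = "soap-api-lock",   # hypothetical lock name shared by all users
    hold_timeout: float = 900.0,   # lock expires on its own after 15 min
    wait_timeout: float = 1800.0,  # give up waiting for the lock after 30 min
) -> T:
    """Run fn() while holding a named lock from a redis-py client.

    `client` is expected to be a redis.Redis instance (an assumption).
    """
    # timeout= makes the lock expire by itself, so a crashed notebook
    # cannot block the other users forever.
    lock = client.lock(name, timeout=hold_timeout)

    # acquire() returns False if the lock could not be obtained in time.
    if not lock.acquire(blocking=True, blocking_timeout=wait_timeout):
        raise TimeoutError(f"could not acquire lock {name!r} in {wait_timeout}s")
    try:
        return fn()
    finally:
        # If fn() ran longer than hold_timeout, the lock has already expired
        # and may belong to someone else, so release only if we still own it.
        if lock.owned():
            lock.release()


# Hypothetical usage (host name and download function are placeholders):
#
#   import redis
#   client = redis.Redis(host="redis.example.internal", port=6379)
#   data = call_with_lock(client, download_monthly_data)
```

This serializes the downloads in the same way as the manual queue, but without coordinating users by hand. If the API can tolerate a few concurrent callers, a small pool of lock names would be a possible variation.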
