
Informatica API data retrieval through Databricks Workflows or ADF: which is better?

rohit_kumar
New Contributor

rohit_kumar_0-1724760571211.png

 

 

The above set of activities took about 4 hours in ADF to explore and design, thanks to its ease of use, connections, and monitoring; it would probably have taken 4 days or more using Databricks Workflows.

 

The ADF integration runtime was up for under a minute to complete all these activities, with limited compute usage, probably lower than Workflows in this case since the work is not compute intensive.

 

Can there be a better solution through Databricks with similar ease of use and less compute?

 

The detailed steps performed in ADF are illustrated below:

 

 

  • The first step is the generation of the Session ID:

 

rohit_kumar_1-1724760571240.png

 

 

It requires a Web activity in the ADF pipeline with a POST method and a body incorporating the username and password; the Informatica login API endpoint is then hit.

 

For a real implementation, the body can be passed secret values from Key Vault. A random username and password have been placed in the body for demonstration purposes.

 

It generates an output set from which “orgUuid” and “icSessionId” are used for the next step, Access Token Generation.

 

rohit_kumar_2-1724760571264.png

 

 

rohit_kumar_3-1724760571282.png

 

 

These values are generated dynamically and change with every execution.
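
For comparison, the equivalent of this step from a Databricks notebook is only a few lines of Python. This is a minimal sketch, assuming the standard v2 login endpoint (the real URL is in the screenshot above); the credentials would come from a Databricks secret scope rather than literals:

```python
import requests

# Assumed Informatica Cloud login endpoint; substitute the URL from the screenshot above.
LOGIN_URL = "https://dm-us.informaticacloud.com/ma/api/v2/user/login"

# In Databricks, pull these from a secret scope instead of hard-coding, e.g.
#   username = dbutils.secrets.get(scope="informatica", key="username")
payload = {"username": "<user>", "password": "<password>"}

resp = requests.post(LOGIN_URL, json=payload, timeout=30)
resp.raise_for_status()
login = resp.json()

# "orgUuid" and "icSessionId" feed the next step (Access Token Generation).
org_uuid = login["orgUuid"]
ic_session_id = login["icSessionId"]
```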

 

  • The second step is Access Token Generation:

 

rohit_kumar_4-1724760571305.png

 

 

It requires another, subsequent Web activity in the ADF pipeline, with a POST method and the access token generation URL.

It takes two additional headers, “IDS-SESSION-ID” and “Cookie”, in a particular format; both are fetched dynamically from the previous Session ID generation activity.

 

rohit_kumar_5-1724760571329.png

 

 

rohit_kumar_6-1724760571349.png

 

 

The output of the Access Token Generation step comes in the format below, as “jwt_token”:

 

rohit_kumar_7-1724760571372.png
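
The notebook equivalent is a second POST carrying the session from step 1. A sketch with the token URL and the exact Cookie format left as placeholders, since both come from the screenshots:

```python
# Placeholder: take the access-token URL from the screenshot above.
TOKEN_URL = "<access-token-endpoint>"

token_headers = {
    "IDS-SESSION-ID": ic_session_id,
    # The exact Cookie format is in the screenshot; this value is an assumption.
    "Cookie": f"USER_SESSION={ic_session_id}",
}

resp = requests.post(TOKEN_URL, headers=token_headers, timeout=30)
resp.raise_for_status()
jwt_token = resp.json()["jwt_token"]  # used as the Bearer token from here on
```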

 

  • The third step will be Job ID Generation:

 

rohit_kumar_8-1724760571399.png

 

 

It requires another, subsequent Web activity in the ADF pipeline, with a POST method and the Job ID generation URL.

It has additional parameters, “from” and “size”, defined in the body.

It also takes two additional headers: “X-INFRA-ORG-ID”, fetched dynamically from the first activity (Session ID generation), and “Authorization”, in a particular format, fetched dynamically from the previous Token generation activity.

 

The dynamic headers are shown below:

rohit_kumar_9-1724760571454.png

 

 

rohit_kumar_10-1724760571486.png

 

 

 

 

The output of the Job ID Generation:

 

rohit_kumar_11-1724760571510.png

 

 

The output of the Job_ID_Generation step consists of jobId, trackingURI and outputUri, which are used in subsequent steps for tracking the job and getting the attachment.

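In a notebook this is one more POST. A sketch in the same vein; the endpoint, the from/size values, and the response key spellings are assumptions to verify against the screenshots:

```python
# Placeholder: take the Job ID generation URL from the screenshot above.
JOB_URL = "<job-id-endpoint>"

headers = {
    "X-INFRA-ORG-ID": org_uuid,              # from the Session ID generation step
    "Authorization": f"Bearer {jwt_token}",  # format assumed; match the screenshot
}
payload = {"from": 0, "size": 100}           # paging parameters; values assumed

resp = requests.post(JOB_URL, json=payload, headers=headers, timeout=30)
resp.raise_for_status()
job = resp.json()

# Key spellings assumed; check them against the actual response.
job_id = job["jobId"]
tracking_uri = job["trackingURI"]
output_uri = job["outputUri"]
```
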
  • The fourth step will be Track Job Status:

 

rohit_kumar_12-1724760571539.png

 

 

It requires another, subsequent Web activity, this time with a GET method and a dynamic URL to track the job status. The URL incorporates the Job ID generated dynamically in the previous step.

 

The step below depicts the format of the dynamic URL, concatenated with the Job ID fetched from the previous step.

 

rohit_kumar_13-1724760571567.png

 

 

 

It also takes the same two additional headers as the previous step: “X-INFRA-ORG-ID”, fetched dynamically from the Session ID generation activity, and “Authorization”, in a particular format, fetched dynamically from the Token generation activity.
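
In Python this is a single GET against the tracking URI, reusing the two headers built in the previous step; the status field name is an assumption:

```python
# trackingURI already embeds the job ID; otherwise concatenate the base
# tracking URL with job_id exactly as the dynamic URL in the screenshot does.
resp = requests.get(tracking_uri, headers=headers, timeout=30)
resp.raise_for_status()
status = resp.json().get("status")  # field name assumed; check the real response
```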

 

 

  • The fifth step is to wait and re-check after a certain interval (a polling sketch follows the screenshot below).

 

rohit_kumar_14-1724760571585.png

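ADF models this with Wait and Until activities; in a notebook it collapses into a small polling loop. The interval, retry count, and terminal status values below are assumptions to adjust:

```python
import time

# Poll until the job reports completion; give up after roughly 15 minutes.
for _ in range(30):
    resp = requests.get(tracking_uri, headers=headers, timeout=30)
    resp.raise_for_status()
    status = resp.json().get("status")
    if status in ("SUCCESS", "COMPLETED"):   # terminal values assumed
        break
    if status == "FAILED":
        raise RuntimeError(f"Informatica job {job_id} failed")
    time.sleep(30)                           # re-check every 30 seconds
else:
    raise TimeoutError(f"Job {job_id} did not finish in time")
```
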
  • The sixth step will be Get Attachment:

rohit_kumar_15-1724760571603.png

It requires another, subsequent Web activity, again with a GET method and a dynamic URL to retrieve the attachment. The URL incorporates the “outputUri” generated dynamically in the Job ID Generation step.

 

The step below depicts the format of the dynamic URL, concatenated with the outputUri fetched from the Job ID Generation step.

 

rohit_kumar_16-1724760571630.png

 

 

It also takes one additional header, “Authorization”, in a particular format, fetched dynamically from the Token generation activity, as used in the Job ID Generation and wait-and-check steps.

 

The output of the step, shown in the Response section below, is an unreadable xlsx attachment. We will download it in the next step.

 

rohit_kumar_17-1724760571652.png
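
The notebook version is a GET against the outputUri with only the Authorization header, keeping the response as raw bytes since the payload is a binary xlsx:

```python
# Download the attachment from the outputUri returned by Job ID Generation.
resp = requests.get(
    output_uri,
    headers={"Authorization": f"Bearer {jwt_token}"},  # format assumed, as above
    timeout=120,
)
resp.raise_for_status()
attachment_bytes = resp.content  # raw xlsx bytes, written to storage in the next step
```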

 

 

  • The seventh step will be Extract Data:

 

rohit_kumar_18-1724760571675.png

 

 

The last step is a Copy activity to fetch the data. It has a source dataset of type Binary, with a dynamic URL populated with the outputUri generated in the Job_ID_Generation step shown earlier.

 

The request method is GET, and it requires an additional Authorization header with the jwt token in the format shown in earlier steps.

 

rohit_kumar_19-1724760571702.png

 

 

The associated linked service is of HTTP type, designed with a dynamic URL that is provided at run time from previous steps, and with authentication type Anonymous, since authentication is handled at run time with the token.

 

rohit_kumar_20-1724760571722.png

 

 

We need to configure a Sink setting to dump the data in ADLS. The dataset designed here is of type Binary, to capture the attachment as is, with the dataset properties of ADLS name, container, file path and file name defined.

 

The associated linked service, of type ADLS, is shown below:

 

 

rohit_kumar_21-1724760571741.png

 

 

The file dumped at the end of the last step is shown below:

 

rohit_kumar_22-1724760571770.png

 

 

rohit_kumar_23-1724760571784.png

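For completeness, the Databricks-side equivalent of the Copy activity's Binary sink is just writing those bytes out to ADLS. A sketch assuming the container is mounted at /mnt/informatica (an abfss:// path with configured credentials would work the same way):

```python
# Land the attachment in ADLS, mirroring the ADF Binary sink.
# Path and file name are assumptions; adjust to your mount/storage layout.
target = "/dbfs/mnt/informatica/exports/report.xlsx"

with open(target, "wb") as f:
    f.write(attachment_bytes)
```

Chained into a single notebook task, these sketches reproduce the whole ADF pipeline; whether that matches ADF on ease of use and compute is exactly the trade-off asked about above.
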
1 REPLY

Brahmareddy
Valued Contributor II

Hi @rohit_kumar, how are you doing today?

To answer your subject-line question: if you're looking for flexibility and integration, Databricks Workflows might be better, since it offers native support for complex data transformations and seamless integration with Spark. However, Azure Data Factory (ADF) is ideal if you prefer a more visual and managed approach, especially for orchestrating data pipelines with multiple sources. ADF is also great for integrating with other Azure services. Ultimately, choose based on your team's familiarity and the complexity of the tasks.

Give it a try and let me know if it works.

Regards,

Brahma
