Data Engineering
Azure Data Factory and Photon

MartinH
New Contributor II

Hello, we have Databricks Python notebooks accessing Delta tables. These notebooks are scheduled and invoked by Azure Data Factory. How can I enable Photon on the linked services used to call Databricks?

If I specify a new job cluster, there does not seem to be any way to make that job cluster Photon-enabled.

There are JSON parameters under the Advanced tab, but I cannot tell whether they let me specify Photon.

Or, will I have to use an existing cluster?

Thanks

6 REPLIES 6

daniel_sahal
Honored Contributor III

@Martin Huige​ 

Currently I don't see a way to enable Photon through the ADF Databricks linked service.

What you can do instead is use the Jobs API to run your Databricks notebooks from ADF, rather than the linked service.
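A minimal sketch of that approach, assuming the Jobs 2.1 `runs/submit` endpoint; the host, token, notebook path, and node type are placeholders, and `runtime_engine: "PHOTON"` is what requests Photon on the one-time job cluster:

```python
import json
import urllib.request


def build_photon_run_payload(notebook_path):
    """Build a Jobs 2.1 runs/submit payload whose one-time job cluster
    requests the Photon runtime engine."""
    return {
        "run_name": "adf-photon-run",
        "tasks": [{
            "task_key": "etl",
            "notebook_task": {"notebook_path": notebook_path},
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",  # placeholder runtime
                "node_type_id": "Standard_DS3_v2",    # placeholder node type
                "num_workers": 2,
                "runtime_engine": "PHOTON",           # enable Photon
            },
        }],
    }


def submit_run(host, token, notebook_path):
    """POST the payload to <host>/api/2.1/jobs/runs/submit, return run_id."""
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/runs/submit",
        data=json.dumps(build_photon_run_payload(notebook_path)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["run_id"]
```

An ADF Web activity (or a small script like this) can then trigger the run and poll its status, bypassing the linked service's cluster settings entirely.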

I have found that if you use a job cluster from a pool in ADF, it creates a cluster per Databricks ADF activity, and you end up with more than one cluster running.

I have a shared compute cluster for ADF with Photon and Unity Catalog enabled and a fixed worker count. I start the Databricks cluster via the REST API before the ETL runs, saving 5 to 10 minutes of cluster start-up time.

The pipeline then runs the notebooks via the Databricks ADF activity, and once the ETL has finished it stops the cluster via the REST API.

It works well and gives you control over what gets spun up. You can also use spot instances to save resource costs.

API Reference : https://docs.databricks.com/api/workspace/clusters/ (start & terminate)
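A minimal sketch of those start/stop calls using only the standard library; the host, token, and cluster ID are placeholders, and the Clusters API paths (`/clusters/start` to start, `/clusters/delete` to terminate) are per the reference above:

```python
import json
import urllib.request

# "delete" in the Clusters API terminates the cluster; it does not remove it.
ENDPOINTS = {
    "start": "/api/2.0/clusters/start",
    "terminate": "/api/2.0/clusters/delete",
}


def build_cluster_request(host, token, cluster_id, action):
    """Build a POST request to start or terminate the given cluster."""
    return urllib.request.Request(
        host + ENDPOINTS[action],
        data=json.dumps({"cluster_id": cluster_id}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def cluster_action(host, token, cluster_id, action):
    """Send the request: action='start' before the ETL, 'terminate' after."""
    req = build_cluster_request(host, token, cluster_id, action)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `cluster_action(..., "start")` from a pre-ETL step and `cluster_action(..., "terminate")` from a final step gives the warm-cluster behaviour described above.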

Regards
Toby
https://thedatacrew.com

Anonymous
Not applicable

@Martin Huige​ :

To enable Photon for your Databricks Python workbooks that are scheduled/invoked by Azure Data Factory, you will need to use an existing Databricks cluster that is configured with Photon. At this time, creating a new job cluster in Azure Data Factory does not provide an option to enable Photon.

To use an existing Databricks cluster that is configured with Photon, you can specify the cluster ID in the Databricks Linked Service configuration in Azure Data Factory. To do this, follow these steps:

  1. In your Azure Data Factory pipeline, click on the Databricks activity that runs the Python workbook.
  2. Click on the "Settings" tab, and then click on the "Databricks Linked Service" dropdown.
  3. Select your Databricks Linked Service from the dropdown, or create a new one if you haven't already.
  4. In the "Cluster ID" field, enter the ID of the Databricks cluster that is configured with Photon. You can find the cluster ID in the Databricks workspace UI, or you can use the Databricks API to retrieve it.
  5. Save your changes and run your pipeline.

With this configuration, the Databricks activity in your Azure Data Factory pipeline will use the specified Databricks cluster, which is configured with Photon, to run your Python workbook.
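For reference, an existing-cluster linked service definition looks roughly like this sketch (the domain, token value, and cluster ID are placeholders):

```json
{
  "name": "AzureDatabricksLinkedService",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
      "accessToken": { "type": "SecureString", "value": "<access-token>" },
      "existingClusterId": "0123-456789-abc123"
    }
  }
}
```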

MartinH
New Contributor II

Thank you very much!

SteveG2
New Contributor II

You can enable Photon on a Databricks cluster via an ADF linked service. Simply set the cluster version in the linked service to a Photon-enabled version (e.g. 13.3.x-photon-scala2.12).
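In linked service JSON terms, that corresponds to something like the following sketch (property names as used by the ADF AzureDatabricks linked service; the domain, token, node type, and worker count are placeholders):

```json
{
  "name": "AzureDatabricksPhotonLinkedService",
  "properties": {
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://adb-1234567890123456.7.azuredatabricks.net",
      "accessToken": { "type": "SecureString", "value": "<access-token>" },
      "newClusterVersion": "13.3.x-photon-scala2.12",
      "newClusterNodeType": "Standard_DS3_v2",
      "newClusterNumOfWorker": "2"
    }
  }
}
```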

CharlesReily
New Contributor III

When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you would find the option in the advanced cluster configuration settings.
