
Can we pass Databricks output to Azure function body?

Yogi
New Contributor III

Hi,

Can anyone help me with Databricks and Azure Functions?

I'm trying to pass Databricks JSON output to an Azure Function body in an ADF job. Is it possible?

If yes, how?

If not, what is an alternative way to do the same?

1 ACCEPTED SOLUTION


AbhishekNarain_
New Contributor III

@Yogi​ You can now pass values back to ADF from a notebook.

There is a size limit, though: if you are passing a dataset larger than 2 MB, write it to storage instead and consume it directly with Azure Functions. You can pass the file path/reference from ADF to Azure Functions.
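For illustration, here is a minimal sketch of the notebook side (the payload fields are hypothetical); ADF then reads whatever string was passed to dbutils.notebook.exit from the notebook activity's runOutput:

import json

# Keep the payload small; the value returned via the notebook activity is limited to ~2 MB.
result = {"status": "ok", "row_count": 42}  # hypothetical payload
dbutils.notebook.exit(json.dumps(result))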


15 REPLIES

girivaratharaja
New Contributor III

Hi @Yogi​ Can you explain what you mean by Databricks JSON output? Are you referring to a DataFrame output written in JSON format?

Yogi
New Contributor III

Hi @girivaratharajan​, I'm reading data from a table and converting it to JSON in Databricks. The JSON object then has to be passed to the function.

I'm trying to achieve this in ADF, where I'm linking the components and trying to pass the Databricks JSON output to the function as an argument/function body.

Hope that was clear.

DonatienTessier
New Contributor III

Hi,

One way is to write the JSON with Databricks to an Azure storage account:

https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html

Then, the Azure Function can access the storage account:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob

You can orchestrate this with Azure Data Factory (https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook):

1) Call a Databricks activity that produces the JSON (see the sketch after this list)

2) If the first activity succeeds, call the Azure Function activity
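As a rough sketch of step 1 (the storage account, container, and secret scope names are placeholders), the notebook could write the DataFrame out as JSON like this:

# Configure access to the storage account (placeholder names; the key is read
# from a Databricks secret scope rather than hard-coded).
spark.conf.set(
    "fs.azure.account.key.<storage-account>.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-key"))

# Write the DataFrame as JSON; coalesce(1) keeps it to a single part file so
# the Azure Function only has to pick up one object.
df = spark.table("my_table")
(df.coalesce(1)
   .write.mode("overwrite")
   .json("wasbs://<container>@<storage-account>.blob.core.windows.net/output/my_data"))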

Yogi
New Contributor III

Thanks for the reply @Donatien Tessier​. Will I be able to pass the value to the function body using "dbutils.notebook.exit(return value)"?

IvanIvan_Grunev
New Contributor II

Hi @Yogi​  ,

Unfortunately, there is no way to receive Databricks output directly in Data Factory. We use Blob/Data Lake storage for it, or some database. If there is not a lot of data generated, you can try saving the data in the Databricks cluster's local database over a JDBC connection and then reading it with Data Factory.

Yogi
New Contributor III

Thanks @Ivan Ivan Grunev​. I was trying to use "dbutils.notebook.exit(return value)", which outputs JSON.

Can't it be consumed by the function?

We've tried to do that, but with no luck. That was a few months ago, though, so something may have changed since.

Yogi
New Contributor III

Thanks @Ivan Ivan Grunev​. If I understand correctly, Databricks processes the data and saves it locally, and then using ADF we read the JSON and pass it to the function?

@Yogi​, actually you can write it to a remote Blob or Data Lake store and then read it in Data Factory. Data Factory can read different formats; I would recommend Parquet, because JSON cannot handle all data types.

Databricks File System (DBFS) can handle some files locally, or you can mount a point to a Blob storage or a Data Lake. If you are using a Data Lake gen2, there is not yet an SDK for using it from Azure Functions.

First, you write the content of a DataFrame to Blob storage, and then you will be able to access the files from the Azure Function.
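If you go the mount route, a minimal sketch (the container, account, mount point, and secret names are placeholders) looks like this:

# Mount a Blob Storage container into DBFS so notebooks can read and write it
# like a local path (unmount later with dbutils.fs.unmount("/mnt/function-input")).
dbutils.fs.mount(
    source="wasbs://<container>@<storage-account>.blob.core.windows.net",
    mount_point="/mnt/function-input",
    extra_configs={
        "fs.azure.account.key.<storage-account>.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-key")})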

Yogi
New Contributor III

Thanks @Ivan Ivan Grunev​ and @Donatien Tessier​ for the help.

I'm just concerned about using too many resources. I did try putting the JSON output into local storage using the following command:

dbutils.fs.put("/FileStore/my-stuff/my-file.json", jsonobject)

I guess there's a command to overwrite the file too. I'm wondering whether I can access that output from ADF and pass it to the body of the function.
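(On the overwrite point: dbutils.fs.put does accept an overwrite flag, so, assuming the same jsonobject string, the write can be repeated safely.)

# Without overwrite=True, dbutils.fs.put raises an error if the file already exists.
dbutils.fs.put("/FileStore/my-stuff/my-file.json", jsonobject, overwrite=True)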

If this isn't going to work, then I'll have to use Blob storage. In that case, can I pass the blob data as JSON to the function?

DonatienTessier
New Contributor III

Hi,

In the end, I did exactly what you want.

You just have to write at the end of your notebook:

dbutils.notebook.exit(<json or string content>)

Then you set up a Notebook activity in Data Factory, and in the Azure Function activity you pass a string like this in the Body section:

string(activity('<name of notebook activity>').output.runOutput)

It works well with small data.

@Donatien Tessier​ Do you know if there is a way to pass an output from an embedded notebook (using dbutils.notebook.exit) to the parent notebook, before accessing it in Data Factory?

Yes, you can:

https://docs.databricks.com/user-guide/notebooks/notebook-workflows.html#example

You will get the return value just as you would from a function call.
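A small sketch of the parent side (the child notebook path and timeout are hypothetical):

# The parent runs the child and receives whatever string the child passed to
# dbutils.notebook.exit(); it can then re-exit that value for ADF to read.
child_result = dbutils.notebook.run("/Shared/child_notebook", 600)
dbutils.notebook.exit(child_result)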
