Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can we pass Databricks output to Azure function body?

Yogi
New Contributor III

Hi,

Can anyone help me with Databricks and Azure Functions?

I'm trying to pass Databricks JSON output to an Azure Function body in an ADF job. Is that possible?

If yes, how?

If no, what is the alternative for doing the same?

1 ACCEPTED SOLUTION

Accepted Solutions

AbhishekNarain_
New Contributor III

You can now pass values back to ADF from a notebook, @Yogi.

There is a size limit, though: if you are passing a dataset larger than 2 MB, write it to storage instead and have the Azure Function consume it directly from there. You can pass the file path/reference from ADF to the Azure Function.
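A minimal sketch of that pattern (the payload, path, and 2 MB check are illustrative; it assumes a Databricks notebook where dbutils is available and that /mnt/adf-output is a mount to the storage account):

import json

# Illustrative payload; in practice this comes from your table/dataframe.
payload = {"status": "ok", "rowCount": 12345}
json_text = json.dumps(payload)

if len(json_text.encode("utf-8")) < 2 * 1024 * 1024:
    # Small payload: return it directly to ADF as the notebook's exit value.
    dbutils.notebook.exit(json_text)
else:
    # Large payload: write it to storage and return only a reference,
    # so the Azure Function can read the file itself.
    dbutils.fs.put("/mnt/adf-output/result.json", json_text, overwrite=True)
    dbutils.notebook.exit(json.dumps({"outputPath": "/mnt/adf-output/result.json"}))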


15 REPLIES

girivaratharaja
New Contributor III

Hi @Yogi, can you explain what you mean by Databricks JSON output? Are you referring to a dataframe output written in JSON format?

Yogi
New Contributor III

Hi @girivaratharajan, I'm reading data from a table and converting it to JSON in Databricks. The JSON object then has to be passed to the function.

I'm trying to achieve this in ADF, where I'm linking the components and trying to pass the Databricks JSON output to the function as an argument/function body.

Hope that was clear.

DonatienTessier
Contributor

Hi,

One way is to write the JSON with Databricks to an Azure storage account:

https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html

Then, in the Azure Function, you can access the storage account:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob

You can orchestrate this with Azure Data Factory (https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook):

1) Call a Databricks activity that produces the JSON (sketched below)

2) If the first activity succeeds, call the Azure Function activity
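A sketch of step 1, assuming a Databricks notebook that writes a table out as JSON to a Blob Storage container; the account, container, secret scope, and key names are placeholders:

# Configure access to the storage account (placeholder names throughout).
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-key"),
)

df = spark.table("my_database.my_table")

# Write the JSON that the Azure Function activity will consume in step 2.
(df.coalesce(1)  # single output file, easier for the function to pick up
   .write.mode("overwrite")
   .json("wasbs://mycontainer@mystorageaccount.blob.core.windows.net/adf-output/"))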

Yogi
New Contributor III

Thanks for the reply @Donatien Tessier. Will I be able to pass the value to the function body using "dbutils.notebook.exit(return value)"?

IvanIvan_Grunev
New Contributor II

Hi @Yogi,

Unfortunately, there is no way to receive Databricks output in Data Factory directly. We use Blob/Data Lake storage for it, or some database. If there isn't a lot of data generated, you can try saving the data in a database local to the Databricks cluster over a JDBC connection and then reading it with Data Factory.

Yogi
New Contributor III

Thanks @Ivan Ivan Grunev. I was trying to use "dbutils.notebook.exit(return value)", which outputs JSON.

Can't it be consumed by the function?

We've tried to do that, but with no luck. That was a few months ago, though, and something may have changed since then.

Yogi
New Contributor III

Thanks @Ivan Ivan Grunev. If I understand correctly, Databricks processes the data and saves it locally, then using ADF we read the JSON and pass it to the function?

@Yogi, actually you can send it to a remote Blob or Data Lake and then read it in Data Factory. Data Factory can read different formats; I would recommend Parquet, because JSON cannot handle all data types.

Databricks File System (DBFS) can handle some files locally, or you can mount a point to a Blob storage or a Data Lake. If you are using a Data Lake gen2, there is not yet an SDK for using it from Azure Functions.

First, you write the content of a dataframe to a Blob storage, and then you will be able to access the files from the Azure Function.
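On the Azure Function side, a sketch of reading such a file back with the azure-storage-blob SDK (v12); the connection string, container, and blob names are placeholders:

import json
from azure.storage.blob import BlobClient

# Placeholder connection details; in a real function these would come from app settings.
blob = BlobClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="mycontainer",
    blob_name="adf-output/result.json",
)

data = json.loads(blob.download_blob().readall())
print(data)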

Yogi
New Contributor III

Thanks @Ivan Ivan Grunev and @Donatien Tessier for the help.

I'm just concerned about using too many resources. I did try putting the JSON output into local storage using the following command:

" dbutils.fs.put("/FileStore/my-stuff/my-file.json", jsonobject)"

I guess there's a command to overwrite the file too. I'm wondering if I can access the output from ADF and pass it to the body of the function.

If this isn't going to work, then I'll have to use Blob storage. In that case, can I pass the blob data as JSON to the function?
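On the overwrite question, dbutils.fs.put does take an overwrite flag; a minimal sketch:

# Without overwrite=True the call fails if the file already exists.
dbutils.fs.put("/FileStore/my-stuff/my-file.json", jsonobject, overwrite=True)

# Read it back to check the contents.
print(dbutils.fs.head("/FileStore/my-stuff/my-file.json"))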

DonatienTessier
Contributor

Hi,

I finally managed to do what you want.

You just have to write at the end of your notebook:

dbutils.notebook.exit(<json or string content>)

Then you set up a notebook activity in Data Factory, and in the Azure Function activity you pass a string like this in the Body section:

string(activity('<name of notebook activity>').output.runOutput)

It works well with small data.
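The notebook side of that recipe, sketched with an illustrative payload; the string passed to dbutils.notebook.exit is what surfaces as output.runOutput on the notebook activity:

import json

# Last cell of the notebook run by the ADF notebook activity.
result = {"rowCount": 42, "status": "succeeded"}  # illustrative content

# This string becomes activity('<name of notebook activity>').output.runOutput in ADF,
# which the Azure Function activity's Body expression wraps with string(...).
dbutils.notebook.exit(json.dumps(result))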

@Donatien Tessier, do you know if there is a way to pass an output from an embedded notebook (using dbutils.notebook.exit) to the parent notebook, before accessing it in Data Factory?

Yes, you can:

https://docs.databricks.com/user-guide/notebooks/notebook-workflows.html#example

You will get the return value just as you would from a function call.
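A sketch of that parent/child pattern; the child notebook name and timeout are placeholders:

# Parent notebook: run the child and capture whatever it passed to dbutils.notebook.exit.
child_output = dbutils.notebook.run("child_notebook", 600)

# Forward it on so ADF sees a single runOutput from the parent notebook activity.
dbutils.notebook.exit(child_output)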
