04-17-2019 04:50 AM
Hi,
Can anyone help me with Databricks and Azure Functions?
I'm trying to pass Databricks JSON output to an Azure Function body in an ADF job. Is it possible?
If yes, how?
If no, what is the alternative way to do the same?
09-10-2019 09:02 PM
@Yogi You can now pass values back to ADF from a notebook.
There is a size limit, though, so if you are passing a dataset larger than 2 MB, write it to storage instead and consume it directly with Azure Functions. You can pass the file path/reference through ADF --> Azure Functions.
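For the large-data case, a minimal sketch of the notebook side, assuming df already holds the result and the output path is a hypothetical mounted storage location:

# Write the dataset to storage rather than returning it inline
output_path = "/mnt/output/result-json"  # hypothetical mount point
df.coalesce(1).write.mode("overwrite").json(output_path)

# Return only the file reference; ADF reads it as the notebook's runOutput
# and can forward it to the Azure Function activity
dbutils.notebook.exit(output_path)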
04-17-2019 02:33 PM
Hi @Yogi Can you explain what you mean by Databricks JSON output? Are you referring to a DataFrame output written in JSON format?
04-18-2019 02:54 AM
Hi @girivaratharajan , I'm reading data from a table and converting it to JSON in Databricks. The JSON object then has to be passed to the function.
I'm trying to achieve this in ADF, where I'm linking the components and trying to pass the Databricks JSON output to the function as an argument/function body.
Hope that was clear.
04-18-2019 05:55 AM
Hi,
One way is to write the JSON with Databricks to an Azure storage account:
https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html
Then, in the Azure Function, you can access the storage account:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
You can orchestrate this with Azure Data Factory (https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook):
1) Call a Databricks activity that will produce the JSON (see the sketch after this list)
2) If the first activity is successful, call the Azure Function activity
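A minimal sketch of step 1, assuming a Blob storage account reachable with an account key kept in a secret scope (the account, container, scope, and table names below are all placeholders):

# Configure access to the storage account (per the azure-storage doc above)
spark.conf.set(
    "fs.azure.account.key.mystorageacct.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-key")
)

# Produce the JSON output that the Azure Function will pick up
df = spark.table("my_table")  # assumed source table
df.write.mode("overwrite").json(
    "wasbs://output@mystorageacct.blob.core.windows.net/result-json"
)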
04-18-2019 06:52 AM
Thanks for the reply @Donatien Tessier . Will I be able to pass the value to the function body using "dbutils.notebook.exit(<return value>)"?
04-18-2019 11:20 PM
Hi @Yogi ,
Unfortunately, there is no way to receive Databricks output directly in Data Factory. We use Blob/Data Lake storage for it, or some database. If there is not a lot of data generated, you can try saving the data to a local database on the Databricks cluster with a JDBC connection and then reading it with Data Factory.
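If you go the table route, a minimal sketch (table names are hypothetical); ADF or any other client could then read it over the cluster's JDBC/ODBC endpoint:

# Persist the result as a managed table on the cluster
df = spark.table("source_table")  # assumed source
df.write.mode("overwrite").saveAsTable("staging_adf_output")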
04-19-2019 12:12 AM
Thanks @Ivan Grunev . I was trying to use "dbutils.notebook.exit(<return value>)", which outputs JSON.
Can't it be consumed by the function?
04-19-2019 01:53 AM
We've tried to do that, but no luck. That was a few months ago, though, so something may have changed since.
04-19-2019 05:06 AM
Thanks @Ivan Grunev . If I understand it correctly, Databricks processes the data and saves it locally, then using ADF we read the JSON and pass it to the function?
04-19-2019 05:10 AM
@Yogi , actually you can write it to a remote Blob or Data Lake storage and then read it in Data Factory. Data Factory can read different formats; I would recommend Parquet, because JSON cannot handle all data types. A write sketch follows.
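A minimal sketch of the Parquet write, assuming a mounted output path (placeholder below); Parquet keeps the schema and column types intact:

# Parquet preserves column types, unlike plain JSON
df.write.mode("overwrite").parquet("/mnt/output/result-parquet")  # df assumed to exist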
04-19-2019 05:18 AM
Databricks File System (DBFS) can handle some files locally, or you can mount a point to a Blob storage or a Data Lake. If you are using Data Lake gen2, there is not yet an SDK for using it from Azure Functions.
First, you write the content of a DataFrame to a Blob storage, and then you will be able to access the files from the Azure Function. A mount sketch follows.
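A minimal mount sketch with dbutils.fs.mount; the account, container, and secret scope names are placeholders:

# Mount the Blob container under /mnt/output
dbutils.fs.mount(
    source="wasbs://output@mystorageacct.blob.core.windows.net",
    mount_point="/mnt/output",
    extra_configs={
        "fs.azure.account.key.mystorageacct.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-key")
    }
)

# Then write the DataFrame content through the mount point
df.write.mode("overwrite").json("/mnt/output/result-json")  # df assumed to exist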
04-19-2019 05:57 AM
Thanks @Ivan Grunev & @Donatien Tessier for the help.
I'm just concerned about using too many resources. I did try putting the JSON output into local storage using the following command:
dbutils.fs.put("/FileStore/my-stuff/my-file.json", jsonobject)
I guess there's a way to overwrite the file too. I'm wondering if I can access the output from ADF and pass it to the body of the function.
If this isn't going to work, then I'll have to use Blob storage. In that case, can I pass the blob data as JSON to the function?
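(For reference on the overwrite part: dbutils.fs.put takes an overwrite flag, so the same path can be replaced on each run; jsonobject is the string from the snippet above.)

dbutils.fs.put("/FileStore/my-stuff/my-file.json", jsonobject, overwrite=True)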
04-30-2019 01:24 AM
Hi,
Finally, I managed to do what you want.
You just have to write at the end of your notebook:
dbutils.notebook.exit(<json or string content>)
Then you set up a notebook activity in Data Factory, and in the Azure Function activity you pass a string like this in the Body section:
string(activity('<name of notebook activity>').output.runOutput)
It works well with small data.
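A minimal sketch of the notebook side, assuming a small result set (well under the run-output size limit) read from a hypothetical source table:

import json

# Collect the small result and serialize it to a JSON string
df = spark.table("my_table")  # assumed source table
payload = json.dumps([row.asDict() for row in df.collect()], default=str)

# This string becomes activity('<name of notebook activity>').output.runOutput
# in Data Factory, which the Azure Function activity can place in its Body
dbutils.notebook.exit(payload)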
08-22-2019 08:26 AM
@Donatien Tessier Do you know if there is a way to pass an output from an embedded notebook (using dbutils.notebook.exit) to the parent notebook, before accessing it in Data Factory?
09-02-2019 08:38 AM
Yes, you can:
https://docs.databricks.com/user-guide/notebooks/notebook-workflows.html#example
You will get the return value just as you would from a function call. A minimal sketch:
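In the parent notebook ("child_notebook" is a placeholder path and 60 is the timeout in seconds):

# Run the child and capture whatever it passed to dbutils.notebook.exit(...)
result = dbutils.notebook.run("child_notebook", 60)

# The parent can in turn exit with that value for Data Factory to pick up
dbutils.notebook.exit(result)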