04-17-2019 04:50 AM
Hi,
Can anyone help me with Databricks and Azure Functions?
I'm trying to pass Databricks JSON output to an Azure Function body in an ADF job. Is it possible?
If yes, how?
If no, what alternative is there to do the same?
09-10-2019 09:02 PM
@Yogi You can now pass values back to ADF from a notebook.
Though there is a size limit: if you are passing a dataset larger than 2 MB, write it to storage instead and consume it directly with Azure Functions. You can pass the file path/reference through ADF --> Azure Functions.
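A minimal sketch of that pattern, assuming a hypothetical mounted storage path (/mnt/output) and file name:

import json

# If the result exceeds the ~2 MB exit-value limit, write it to storage
# and return only the path/reference to ADF.
payload = json.dumps(rows)  # `rows` is whatever the notebook produced

output_path = "/mnt/output/result.json"  # hypothetical mounted storage path
dbutils.fs.put(output_path, payload, overwrite=True)

# ADF receives this string as activity('<notebook activity>').output.runOutput
dbutils.notebook.exit(output_path)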
04-17-2019 02:33 PM
Hi @Yogi, can you explain what you mean by Databricks JSON output? Are you referring to a DataFrame output written in JSON format?
04-18-2019 02:54 AM
Hi @girivaratharajan, I'm reading data from a table and converting it to JSON in Databricks. The JSON object then has to be passed to the function.
I'm trying to achieve this in ADF, where I'm linking the components and trying to pass the Databricks JSON output to the function as an argument/function body.
Hope that was clear.
04-18-2019 05:55 AM
Hi,
One way is to write the JSON with Databricks to an Azure storage account:
https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html
Then, the Azure Function can access the storage account:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
You can orchestrate this with Azure Data Factory (https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-databricks-notebook):
1) Call the Databricks activity that produces the JSON
2) If the first activity succeeds, call the Azure Function activity (a sketch of step 1 follows below)
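A minimal sketch of step 1, assuming illustrative names for the storage account ("mystorageacct"), container ("output"), and secret scope:

# Authenticate to the storage account (illustrative account/secret names)
spark.conf.set(
    "fs.azure.account.key.mystorageacct.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-key"))

# Write the DataFrame as JSON; the Azure Function can then read the
# blob(s) under this path through the blob storage binding.
df.write.mode("overwrite").json("wasbs://output@mystorageacct.blob.core.windows.net/result")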
04-18-2019 06:52 AM
Thanks for the reply @Donatien Tessier. Will I be able to pass the value to the function body using "dbutils.notebook.exit(return value)"?
04-18-2019 11:20 PM
Hi @Yogi,
Unfortunately, there is no way to receive Databricks output directly in Data Factory. We use Blob/Data Lake storage for it, or some database. If not much data is generated, you can try saving the data in the Databricks cluster's local database over a JDBC connection and then reading it with Data Factory.
04-19-2019 12:12 AM
Thanks @Ivan Ivan Grunev. I was trying to use "dbutils.notebook.exit(return value)", which outputs JSON.
Can't it be consumed by the function?
04-19-2019 01:53 AM
We tried that a few months ago with no luck, but something may have changed since then.
04-19-2019 05:06 AM
Thanks @Ivan Ivan Grunev. If I understand correctly, Databricks processes the data and saves it locally, and then ADF reads the JSON and passes it to the function?
04-19-2019 05:10 AM
@Yogi, actually you can send it to a remote Blob or Data Lake and then read it in Data Factory. Data Factory can read different formats; I would recommend Parquet, because JSON cannot handle all data types.
04-19-2019 05:18 AM
The Databricks File System (DBFS) can handle some files locally, or you can mount a blob storage container or a Data Lake. If you are using Data Lake gen2, there is not yet an SDK for using it from Azure Functions.
First, write the content of a DataFrame to blob storage; then you will be able to access the files from the Azure Function.
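For example, a minimal mount sketch (the account, container, mount point, and secret scope names are assumptions):

dbutils.fs.mount(
    source="wasbs://output@mystorageacct.blob.core.windows.net",
    mount_point="/mnt/output",
    extra_configs={
        "fs.azure.account.key.mystorageacct.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-key")})

# After mounting, writes under /mnt/output land in the blob container.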
04-19-2019 05:57 AM
Thanks @Ivan Ivan Grunev & @Donatien Tessier for the help.
I'm just concerned about using too many resources. I did try putting the JSON output into local storage with the following command:
dbutils.fs.put("/FileStore/my-stuff/my-file.json", jsonobject)
I guess there's an option to overwrite the file too. I'm wondering if I can access the output from ADF and pass it to the body of the function.
If this isn't going to work, then I'll have to use blob storage. In that case, can I pass the blob data as JSON to the function?
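It looks like the same command accepts an overwrite flag, so re-runs could replace the file in place:

# overwrite=True replaces the file instead of failing when it already exists
dbutils.fs.put("/FileStore/my-stuff/my-file.json", jsonobject, overwrite=True)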
04-30-2019 01:24 AM
Hi,
Finally, I managed to do what you want.
You just have to write this at the end of your notebook:
dbutils.notebook.exit(<json or string content>)
Then you set up a Notebook activity in Data Factory, and in the Azure Function activity you pass a string like this in the Body section:
string(activity('<name of notebook activity>').output.runOutput)
It works well with small data.
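A minimal sketch of the notebook side, with an illustrative payload:

import json

# Keep the exit value small; large payloads should go to storage instead.
result = {"status": "ok", "rowCount": df.count()}  # illustrative payload
dbutils.notebook.exit(json.dumps(result))

The string(...) expression above then yields that JSON string as the request body of the Azure Function activity.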
08-22-2019 08:26 AM
@Donatien Tessier Do you know if there is a way to pass the output from an embedded notebook (using dbutils.notebook.exit) to the parent notebook, before accessing it in Data Factory?
09-02-2019 08:38 AM
Yes, you can:
https://docs.databricks.com/user-guide/notebooks/notebook-workflows.html#example
You will get the return value just as you would from a function call.
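A minimal sketch of the parent notebook, where "child_notebook" is an illustrative path:

# run() blocks until the child finishes and returns whatever the child
# passed to dbutils.notebook.exit(...); 600 is the timeout in seconds.
child_result = dbutils.notebook.run("child_notebook", 600)

# The parent can then forward it to Data Factory itself:
dbutils.notebook.exit(child_result)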