<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: capture return value from databricks job to local machine  by CLI in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/capture-return-value-from-databricks-job-to-local-machine-by-cli/m-p/70553#M7300</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/90838"&gt;@pshuk&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;You could check the below CLI commands:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;get-run-output&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Returns the output and metadata of a single task run. The CLI command wraps this REST API endpoint:&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/api/workspace/jobs/getrunoutput" target="_blank"&gt;https://docs.databricks.com/api/workspace/jobs/getrunoutput&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;export-run&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;There's also the option of downloading the run output:&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/api/workspace/jobs/exportrun" target="_self"&gt;https://docs.databricks.com/api/workspace/jobs/exportrun&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;If you want to run notebooks locally and observe the notebook status live, as in the Databricks UI, you could also set up &lt;A href="https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html" target="_self"&gt;Databricks Connect&lt;/A&gt;&amp;nbsp;in your local IDE. More information at: &lt;A href="https://docs.databricks.com/en/dev-tools/databricks-connect/index.html#what-is-databricks-connect" target="_self"&gt;What is Databricks Connect?&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 23 May 2024 23:18:21 GMT</pubDate>
    <dc:creator>raphaelblg</dc:creator>
    <dc:date>2024-05-23T23:18:21Z</dc:date>
    <item>
      <title>capture return value from databricks job to local machine  by CLI</title>
      <link>https://community.databricks.com/t5/get-started-discussions/capture-return-value-from-databricks-job-to-local-machine-by-cli/m-p/70551#M7299</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I want to run Python code in a Databricks notebook and return its value to my local machine.&lt;/P&gt;
&lt;P&gt;Here is the summary:&lt;/P&gt;
&lt;P&gt;I upload files to volumes on Databricks and generate an MD5 checksum for the local file. Once the upload is finished, I create a Python script with that filename locally and upload it to my Databricks workspace. I already have a job that references that file, which I execute with the "databricks jobs" CLI command. The problem is that I cannot get the output of the Python code running on Databricks back to my local computer, which would close the loop. Can anyone point me in the right direction?&lt;/P&gt;
&lt;P&gt;Here is a snippet of the code:&lt;/P&gt;
&lt;PRE&gt;#!/usr/bin/env python3
import hashlib
import subprocess

from databricks.sdk import WorkspaceClient


def execute_dbcli(my_cmd):
    """Run a Databricks CLI command; return 1 on success, 0 on failure."""
    run_args = {"shell": True, "check": True, "capture_output": True, "text": True}
    try:
        subprocess.run(my_cmd, **run_args)
        flag = 1
    except subprocess.CalledProcessError:
        flag = 0
    return flag


def create_md5_file(md5_ip_file, md5_op_file, ip_file):
    """Copy the generic script, replacing the "ip_file" placeholder with the real filename."""
    search_text = "ip_file"
    target_text = ip_file
    # change in the python code locally
    with open(md5_ip_file, "r") as file:
        data = file.read()
    data = data.replace(search_text, target_text)
    with open(md5_op_file, "w") as file:
        file.write(data)


def check_md5(ip_file):
    """Return the MD5 hex digest of a local file."""
    md5 = hashlib.md5()
    with open(ip_file, "rb") as fip:
        md5.update(fip.read())
    return md5.hexdigest()


w = WorkspaceClient()

ip_file = "Upload_Summary.csv"
loc_md5 = check_md5(ip_file)

dbfs_loc = "dbfs:/Volumes/bgem_dev/wgs_live_data/test/"
my_cmd = f"databricks fs cp {ip_file} {dbfs_loc}{ip_file}"
flag_transfer = execute_dbcli(my_cmd)

if flag_transfer:
    print(f"File:{ip_file} transferred to Databricks successfully\n")
    print("let's work on MD5 checksum\n")

    db_md5_gen = "dbfs_md5_generic.py"
    db_md5_file = "dbfs_md5.py"
    create_md5_file(db_md5_gen, db_md5_file, ip_file)
    print("file ready to be transferred to Databricks for MD5 checksum\n")

    # if MD5 workspace is there, delete it.
    my_cmd = "/usr/local/bin/databricks workspace list /Workspace/Users/prs0223@baylorgenetics.com/MD5"
    flag_workspace = execute_dbcli(my_cmd)

    if flag_workspace:
        print("MD5 workspace exists, so delete it\n")
        my_cmd = "/usr/local/bin/databricks workspace delete /Workspace/Users/prs0223@baylorgenetics.com/MD5"
        flag_workspace_delete = execute_dbcli(my_cmd)

        if flag_workspace_delete:
            print("Workspace MD5 deleted, now transfer the MD5 file and recreate workspace\n")

    my_cmd = f"/usr/local/bin/databricks workspace import /Workspace/Users/prs0223@baylorgenetics.com/MD5 --file {db_md5_file} --language PYTHON"
    flag_workspace_create = execute_dbcli(my_cmd)

    if flag_workspace_create:
        print("MD5 workspace recreated\n")
        job_ID = 887420801374114
        my_cmd = f"/usr/local/bin/databricks jobs run-now {job_ID}"
        flag_job_run = execute_dbcli(my_cmd)

        if flag_job_run:
            print("job successful")
        else:
            print("job run not successful")
&lt;/PRE&gt;</description>
      <pubDate>Thu, 23 May 2024 21:03:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/capture-return-value-from-databricks-job-to-local-machine-by-cli/m-p/70551#M7299</guid>
      <dc:creator>pshuk</dc:creator>
      <dc:date>2024-05-23T21:03:08Z</dc:date>
    </item>
    <item>
      <title>Re: capture return value from databricks job to local machine  by CLI</title>
      <link>https://community.databricks.com/t5/get-started-discussions/capture-return-value-from-databricks-job-to-local-machine-by-cli/m-p/70553#M7300</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/90838"&gt;@pshuk&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;You could check the below CLI commands:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;get-run-output&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Returns the output and metadata of a single task run. The CLI command wraps this REST API endpoint:&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/api/workspace/jobs/getrunoutput" target="_blank"&gt;https://docs.databricks.com/api/workspace/jobs/getrunoutput&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
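&lt;P&gt;As a rough sketch of how get-run-output could close the loop from a local script: the code below triggers the job, reads the task run IDs from the JSON that the CLI prints, and asks for each task's output. It assumes the newer Databricks CLI (positional job ID, JSON output, run-now waiting for the run to finish), that the run payload lists its tasks under tasks[].run_id, and that the notebook hands back its value with dbutils.notebook.exit(...), which should then appear under notebook_output.result. The helper names (run_cli, task_run_ids, notebook_result, run_job_and_fetch_results) are made up for illustration.&lt;/P&gt;

```python
import json
import subprocess


def run_cli(args):
    """Run the Databricks CLI and return its stdout parsed as JSON."""
    done = subprocess.run(args, check=True, capture_output=True, text=True)
    return json.loads(done.stdout)


def task_run_ids(run):
    """Collect the run_id of every task in a job run payload."""
    return [task["run_id"] for task in run.get("tasks", [])]


def notebook_result(output):
    """Return notebook_output.result, or None if the task produced none."""
    return output.get("notebook_output", {}).get("result")


def run_job_and_fetch_results(job_id, cli="databricks", runner=run_cli):
    """Trigger the job, then fetch the output of each of its task runs.

    Assumption: run-now blocks until the run finishes and prints the run as JSON.
    """
    run = runner([cli, "jobs", "run-now", str(job_id), "--output", "json"])
    results = {}
    for run_id in task_run_ids(run):
        output = runner(
            [cli, "jobs", "get-run-output", str(run_id), "--output", "json"]
        )
        results[run_id] = notebook_result(output)
    return results
```

&lt;P&gt;With the job ID from the question this would be called as run_job_and_fetch_results(887420801374114). If the notebook exits with the remote checksum, comparing the returned value with the locally computed MD5 completes the check.&lt;/P&gt;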
&lt;P&gt;&lt;STRONG&gt;export-run&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;There's also the option of downloading the run output:&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://docs.databricks.com/api/workspace/jobs/exportrun" target="_self"&gt;https://docs.databricks.com/api/workspace/jobs/exportrun&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
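&lt;P&gt;For export-run, a minimal sketch of saving the exported run locally. It assumes the command prints a JSON document with a views list whose entries carry name, type and content fields, as described in the REST API reference linked above. The function name save_exported_views and the .html file extension are illustrative choices, not part of the CLI.&lt;/P&gt;

```python
import json
from pathlib import Path


def save_exported_views(export_json, target_dir):
    """Write each exported view's content to target_dir and return the paths."""
    payload = json.loads(export_json)
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, view in enumerate(payload.get("views", [])):
        # Fall back to a positional name and keep the file inside out_dir.
        name = (view.get("name") or f"view_{index}").replace("/", "_")
        path = out_dir / f"{name}.html"
        path.write_text(view.get("content", ""), encoding="utf-8")
        written.append(path)
    return written
```

&lt;P&gt;The JSON text would come from the captured stdout of a call such as "databricks jobs export-run RUN_ID", where RUN_ID is the task run to export.&lt;/P&gt;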
&lt;P&gt;If you want to run notebooks locally and observe the notebook status live, as in the Databricks UI, you could also set up &lt;A href="https://docs.databricks.com/en/dev-tools/databricks-connect/python/index.html" target="_self"&gt;Databricks Connect&lt;/A&gt;&amp;nbsp;in your local IDE. More information at: &lt;A href="https://docs.databricks.com/en/dev-tools/databricks-connect/index.html#what-is-databricks-connect" target="_self"&gt;What is Databricks Connect?&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2024 23:18:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/capture-return-value-from-databricks-job-to-local-machine-by-cli/m-p/70553#M7300</guid>
      <dc:creator>raphaelblg</dc:creator>
      <dc:date>2024-05-23T23:18:21Z</dc:date>
    </item>
  </channel>
</rss>

