<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Databricks job cli in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18568#M12343</link>
    <description>&lt;P&gt;An example:&lt;/P&gt;&lt;P&gt;The package name is test_wheel and the entry point is hello_world.&lt;/P&gt;&lt;P&gt;The package name refers to the folder in my project that contains the __init__.py, and the entry point is the method to call.&lt;/P&gt;&lt;P&gt;code.py (under test_wheel) contains a method named hello_world which just prints helloWorld. We import hello_world in __init__.py so that it is available at the root of the package.&lt;/P&gt;&lt;P&gt;In setup.py we include the test_wheel package. After building it, we upload the wheel as part of the job task. The job will print "helloWorld" in its logs.&lt;/P&gt;&lt;P&gt;In your case, in setup.py you could add test_wheel.code:hello_world for entry points.&lt;/P&gt;</description>
    <pubDate>Wed, 08 Jun 2022 17:53:37 GMT</pubDate>
    <dc:creator>Vivian_Wilfred</dc:creator>
    <dc:date>2022-06-08T17:53:37Z</dc:date>
    <item>
      <title>Databricks job cli</title>
      <link>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18563#M12338</link>
      <description>&lt;P&gt;Hey guys,&lt;/P&gt;&lt;P&gt;I'm trying to create a job via the Databricks CLI. The job uses a wheel file that I have already uploaded to DBFS, and from this package I exported the entry point needed for the job.&lt;/P&gt;&lt;P&gt;In the UI I can see that the job has been created, but when I try to run it I get an error saying that I need Manage access in order to install libraries on a cluster (cluster-scoped libraries).&lt;/P&gt;&lt;P&gt;My questions are:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Is there a way to create a job via the Databricks CLI and install packages in a notebook-scoped manner (without needing Manage access on the cluster)?&lt;/LI&gt;&lt;LI&gt;If, instead of using an existing cluster, I create a new cluster while creating the job, should there be any problem installing libraries that way?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;My job_config.json file:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;{
  "name": "test_databricks_cli_jobs",
  "tasks": [
    {
      "task_key": "Test_train_entrypoint",
      "description": "test print in train entrypoint",
      "depends_on": [],
      "existing_cluster_id": "Myicluster-id",
      "python_wheel_task": {
        "package_name": "testpack",
        "entry_point": "train",
        "parameters": ["Random", "This is a test message"]
      },
      "libraries": [
        {"whl": "/dbfs/FileStore/jars/test/testpack-0.0.1-py3-none-any.whl"}
      ]
    }
  ]
}&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Command to deploy the job:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;databricks jobs create --json-file job_config.json --version=2.1&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Hope someone can help me.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jun 2022 15:28:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18563#M12338</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2022-06-06T15:28:26Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks job cli</title>
      <link>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18565#M12340</link>
      <description>&lt;P&gt;Hey Kaniz, thanks for your answer.&lt;/P&gt;&lt;P&gt;I'm not sure you understood me, so I will try to make it clearer.&lt;/P&gt;&lt;P&gt;I'm trying to automate the ML training process for our developers using Databricks.&lt;/P&gt;&lt;P&gt;When a developer finishes their code, I package it into a wheel file and upload it to the Databricks file system. This package has an entry point, let's call it train for now.&lt;/P&gt;&lt;P&gt;I managed to create a job using the CLI with all the needed configuration, but when I try to run the job I get an error: cluster Manage access is needed to install cluster libraries.&lt;/P&gt;&lt;P&gt;All the code in the wheel file is .py files.&lt;/P&gt;&lt;P&gt;In the job_config.json file I declared the libraries needed for the job to run (i.e. the wheel file that has already been uploaded).&lt;/P&gt;&lt;P&gt;Is there a way to run the job without getting the Manage access error, i.e. to install the library just for the job scope and not for the cluster (like a notebook-scoped library)?&lt;/P&gt;&lt;P&gt;Hope it's clearer now. If not, let me know and I will try to explain better.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jun 2022 14:42:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18565#M12340</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2022-06-07T14:42:33Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks job cli</title>
      <link>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18566#M12341</link>
      <description>&lt;P&gt;Hi @orian hindi, adding the wheel package to the "libraries" section of the json file will always try to install the whl at the cluster level, which requires Manage access, irrespective of whether it is a job cluster or an existing interactive cluster. You cannot achieve it this way without the necessary permission on the cluster.&lt;/P&gt;&lt;P&gt;Have you tried installing the whl directly in the code/notebook that is attached to the job run?&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/libraries/notebooks-python-libraries.html#install-a-package-from-dbfs-with-pip" alt="https://docs.databricks.com/libraries/notebooks-python-libraries.html#install-a-package-from-dbfs-with-pip" target="_blank"&gt;https://docs.databricks.com/libraries/notebooks-python-libraries.html#install-a-package-from-dbfs-with-pip&lt;/A&gt;&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;%pip install /dbfs/mypackage-0.0.1-py3-none-any.whl&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;This will install the library just for the job run scope and not on the cluster.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 14:36:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18566#M12341</guid>
      <dc:creator>Vivian_Wilfred</dc:creator>
      <dc:date>2022-06-08T14:36:51Z</dc:date>
    </item>
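    <!--
      A minimal sketch of the notebook-scoped install Vivian suggests, assuming the job runs a
      notebook task, and reusing the wheel path and names from the job_config.json earlier in
      this thread (testpack, train, and the "Random" / "This is a test message" parameters).

      # first cell of the notebook attached to the job run: installs the wheel for this run only,
      # so no cluster-level Manage access is needed
      %pip install /dbfs/FileStore/jars/test/testpack-0.0.1-py3-none-any.whl

      # a later cell: import the package and call the function directly
      import testpack
      testpack.train("Random", "This is a test message")
    -->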
    <item>
      <title>Re: Databricks job cli</title>
      <link>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18567#M12342</link>
      <description>&lt;P&gt;Hey Vivian, thanks for the answer.&lt;/P&gt;&lt;P&gt;I got permission to create clusters for now. Instead of using an existing cluster, each job will be linked to a new cluster for its run, which solves the permission problem of installing libraries on the cluster (in config_job.json, instead of existing_cluster_id I passed the cluster spec to the new_cluster key).&lt;/P&gt;&lt;P&gt;After I managed to install the library I faced a new problem, and you might be able to help me with that.&lt;/P&gt;&lt;P&gt;I set an entry point named train; train is a function in my package that takes 2 params (name, message).&lt;/P&gt;&lt;P&gt;Does this entry point need to be set via the setup.py entry_points field, or should I export the function from my __init__ module, i.e. from .file import train?&lt;/P&gt;&lt;P&gt;When I try to export a function that doesn't take any params, it works fine just by exporting the function in the __init__ file (from .file import print_name).&lt;/P&gt;&lt;P&gt;Hope I explained my problem and you can help me.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 14:48:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18567#M12342</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2022-06-08T14:48:30Z</dc:date>
    </item>
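    <!--
      A sketch of the new_cluster approach Orianh describes: in the job JSON, the task's
      existing_cluster_id is replaced by a new_cluster spec so the wheel is installed on a fresh
      job cluster. The spark_version, node_type_id and num_workers values below are placeholders,
      not taken from the thread; the rest mirrors the job_config.json from the original post.

      {
        "name": "test_databricks_cli_jobs",
        "tasks": [
          {
            "task_key": "Test_train_entrypoint",
            "new_cluster": {
              "spark_version": "10.4.x-scala2.12",
              "node_type_id": "i3.xlarge",
              "num_workers": 1
            },
            "python_wheel_task": {
              "package_name": "testpack",
              "entry_point": "train",
              "parameters": ["Random", "This is a test message"]
            },
            "libraries": [
              {"whl": "/dbfs/FileStore/jars/test/testpack-0.0.1-py3-none-any.whl"}
            ]
          }
        ]
      }
    -->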
    <item>
      <title>Re: Databricks job cli</title>
      <link>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18568#M12343</link>
      <description>&lt;P&gt;An example:&lt;/P&gt;&lt;P&gt;The package name is test_wheel and the entry point is hello_world.&lt;/P&gt;&lt;P&gt;The package name refers to the folder in my project that contains the __init__.py, and the entry point is the method to call.&lt;/P&gt;&lt;P&gt;code.py (under test_wheel) contains a method named hello_world which just prints helloWorld. We import hello_world in __init__.py so that it is available at the root of the package.&lt;/P&gt;&lt;P&gt;In setup.py we include the test_wheel package. After building it, we upload the wheel as part of the job task. The job will print "helloWorld" in its logs.&lt;/P&gt;&lt;P&gt;In your case, in setup.py you could add test_wheel.code:hello_world for entry points.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jun 2022 17:53:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18568#M12343</guid>
      <dc:creator>Vivian_Wilfred</dc:creator>
      <dc:date>2022-06-08T17:53:37Z</dc:date>
    </item>
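    <!--
      A minimal layout matching the example above, assuming a plain setuptools build. The names
      (test_wheel, code.py, hello_world) come from the post; the version number and the build
      command are assumptions.

      # test_wheel/code.py
      def hello_world():
          print("helloWorld")

      # test_wheel/__init__.py
      from .code import hello_world   # expose the entry point at the root of the package

      # setup.py
      import setuptools

      setuptools.setup(
          name="test_wheel",
          version="0.0.1",
          packages=["test_wheel"],
      )

      # Build the wheel (for example with "python setup.py bdist_wheel") and upload it as the
      # library of the python_wheel_task, with package_name test_wheel and entry_point hello_world.
      # In setup.py an explicit entry point such as test_wheel.code:hello_world could also be
      # declared, as suggested in the post.
    -->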
    <item>
      <title>Re: Databricks job cli</title>
      <link>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18569#M12344</link>
      <description>&lt;P&gt;Hey @Vivian Wilfred,&lt;/P&gt;&lt;P&gt;When I set a function without any parameters as the entry point, everything works.&lt;/P&gt;&lt;P&gt;I get an error when I try to set a function that takes params as the entry point.&lt;/P&gt;&lt;P&gt;How does Databricks pass those parameters to the entry point?&lt;/P&gt;&lt;P&gt;My package structure is minimal:&lt;/P&gt;&lt;P&gt;- testpack&lt;/P&gt;&lt;P&gt;-- __init__.py&lt;/P&gt;&lt;P&gt;-- main.py&lt;/P&gt;&lt;P&gt;Sharing some code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;__init__.py file:

from .main import train

---
main.py file:

def train(name, message):
  print(f"{name} said {message}")

---
In setup.py I tried to add entry_points in a few ways that didn't work (not sure I'm correct):

setuptools.setup(
    ...,
    entry_points={
        'train': [
            'train=testpack.main:train'   # also tried 'train=main:train'
        ]
    }
)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Hope you can help me, thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Jun 2022 08:30:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18569#M12344</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2022-06-09T08:30:56Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks job cli</title>
      <link>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18572#M12347</link>
      <description>&lt;P&gt;Hey Kaniz, sorry for the late response.&lt;/P&gt;&lt;P&gt;I think I figured it out: Databricks passes parameters to the entry point via the command line.&lt;/P&gt;&lt;P&gt;There are two ways to set an entry point for a job:&lt;/P&gt;&lt;P&gt;1) Using an entry point in setup.py, as Vivian mentioned in the answer above.&lt;/P&gt;&lt;P&gt;2) Exporting the function from the package's __init__ file (e.g. from .main import func).&lt;/P&gt;&lt;P&gt;In both approaches the entry-point function must not take any parameters; the parameters are passed via the command line, so you can read them using argparse or from sys.argv.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jun 2022 10:45:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databrikcs-job-cli/m-p/18572#M12347</guid>
      <dc:creator>Orianh</dc:creator>
      <dc:date>2022-06-13T10:45:16Z</dc:date>
    </item>
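    <!--
      A sketch of the zero-argument entry point described in the post above: the values from the
      job's "parameters" list arrive on the command line, so the function reads them itself
      instead of declaring (name, message) arguments. Whether the job's parameters start at
      sys.argv[1] is an assumption here; scanning sys.argv or using argparse is safer in practice.

      # testpack/main.py
      import sys

      def train():
          # e.g. the job's parameters: ["Random", "This is a test message"]
          args = sys.argv[1:]
          name, message = args[0], args[1]
          print(f"{name} said {message}")

      # testpack/__init__.py
      from .main import train   # expose the zero-argument entry point at the package root
    -->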
  </channel>
</rss>

