Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
GitHub Actions workflow cannot find the Databricks Unity Catalog and its tables

sagarb
New Contributor

Context: I am running the train_model_py.py file stored in Databricks through GitHub Actions. The script reads Unity Catalog tables for pre-processing and works fine when run through the Databricks UI, but it errors out when run through GitHub Actions.

Setup Details:

  • Serverless compute on the Free-Tier Databricks workspace. More info is in the Readme.md of this repo.

Resolutions Tried:

  • Verified that the Host URL and Personal Access Token are as per the Databricks documentation.
  • Verified that Unity Catalog is enabled for the workspace.
  • Verified that Free-Tier serverless compute by default allows Unity Catalog.
  • Explicitly granted catalog permissions to my user (email address) and my personal access token.
  • Explicitly set up and enabled Unity Catalog in the notebook.
  • Tried providing the fully qualified table name (catalog.schema.tablename), but it generates a namespace error, i.e., it expects two-part names.

Interesting Observation:
Upon further investigation, I found that the GitHub Actions workflow can find the traditional Hive metastore (a.k.a. spark_catalog) tables. This is strange because I do not see this catalog or tables in the Databricks UI.

I want to be able to access the Unity Catalog and its tables when I run the file through the GitHub Actions workflow.
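(Editor's note: the observation above, that only the legacy spark_catalog is visible, is consistent with the workflow starting a local open-source Spark session on the GitHub Actions runner rather than connecting to the workspace. Below is a minimal, hedged sketch of attaching the script to serverless compute via Databricks Connect instead; it assumes `pip install databricks-connect` has run in the workflow and that DATABRICKS_HOST / DATABRICKS_TOKEN are exported from repository secrets, and `workspace` is the catalog name used in this thread.)

```python
# Hedged sketch: build the Spark session through Databricks Connect so that
# spark.sql() executes in the Databricks workspace (where Unity Catalog lives),
# instead of SparkSession.builder.getOrCreate(), which would start a local
# OSS Spark on the Actions runner and expose only spark_catalog.
# Assumptions: databricks-connect is installed; DATABRICKS_HOST and
# DATABRICKS_TOKEN are set as environment variables from repo secrets.
from databricks.connect import DatabricksSession

# serverless(True) targets the workspace's serverless compute.
spark = DatabricksSession.builder.serverless(True).getOrCreate()

spark.sql("USE CATALOG workspace")   # catalog name from this post
spark.sql("USE SCHEMA default")
print([c.name for c in spark.catalog.listCatalogs()])
```

With a session created this way, three-part names (catalog.schema.table) should also resolve, since the SQL is parsed by the workspace rather than by local Spark.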

2 REPLIES

Alberto_Umana
Databricks Employee

Hi @sagarb,

It sounds like a permissions or setup issue. What is the error you are hitting?

Since it doesn't recognize the catalog, the parser flags 'workspace' as an unexpected/extra word. 'workspace' is my Unity Catalog name. See below:

Error:

File "/home/runner/work/Wine_Quality_Prediction_Model/Wine_Quality_Prediction_Model/notebooks/train_model_py.py", line 65, in load_data
    spark.sql("USE CATALOG workspace;")
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/sql/session.py", line 1631, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/errors/exceptions/captured.py", line 185, in deco
    raise converted from None
pyspark.errors.exceptions.captured.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'workspace': extra input 'workspace'. (line 1, pos 12)

== SQL ==
USE CATALOG workspace;
------------^^^

I also printed which catalogs and schemas the workflow can see. This is the output in the GitHub Actions log:

spark version: 3.5.5
Current Schema: default
Catalogs: ['spark_catalog']
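(Editor's note: that printout can be turned into a simple check. The helper below is hypothetical, `has_unity_catalog` is a name invented here, but it rests on one assumption stated in this thread: a plain OSS Spark session lists only the legacy Hive metastore catalog, spark_catalog, so any other name in the list suggests the session is actually attached to a UC-enabled workspace.)

```python
# Hypothetical helper (name invented here): given the catalog names a Spark
# session reports, decide whether anything beyond the legacy Hive metastore
# catalog ('spark_catalog') is visible.
def has_unity_catalog(catalog_names):
    return any(name != "spark_catalog" for name in catalog_names)

# Fed from a live session it would be:
#   names = [c.name for c in spark.catalog.listCatalogs()]
print(has_unity_catalog(["spark_catalog"]))               # the Actions output above -> False
print(has_unity_catalog(["spark_catalog", "workspace"]))  # UC visible -> True
```

Since the GitHub Actions run prints only ['spark_catalog'], the check would come back False there, matching the symptom that Unity Catalog tables cannot be found from the workflow.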



 
