Machine Learning
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithms, model training, deployment, and more. Connect with ML enthusiasts and experts.
GitHub Actions workflow cannot find the Databricks Unity Catalog and its tables

sagarb
New Contributor

Context: I am running the train_model_py.py file stored in Databricks through GitHub Actions. The script reads Unity Catalog tables for pre-processing and works fine when run through the Databricks UI, but it errors out when run through GitHub Actions.

Setup Details:

  • Serverless compute on the Free-Tier Databricks workspace. More info is in the Readme.md of this repo.

Resolutions Tried:

  • Verified that the Host URL and Personal Access Token are as per the Databricks documentation.
  • Verified that Unity Catalog is enabled for the workspace.
  • Verified that Free-Tier serverless compute by default allows Unity Catalog.
  • Explicitly granted catalog permissions to my user (email address) and my personal access token.
  • Explicitly set up and enabled Unity Catalog in the notebook.
  • Tried providing the fully qualified table name (catalog.schema.tablename), but it generates a namespace error, i.e., it expects two-part names.

Interesting Observation:
Upon further investigation, I found that the GitHub Actions workflow can find the traditional Hive metastore (a.k.a. spark_catalog) tables. This is strange because I do not see this catalog or tables in the Databricks UI.

I want to be able to access the Unity Catalog and its tables when I run the file through the GitHub Actions workflow.
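(Editor's note: the observation above, that only the legacy spark_catalog is visible, is consistent with the workflow starting a local open-source Spark session on the GitHub Actions runner rather than connecting to the workspace. Below is a minimal, hedged sketch of attaching the script to serverless compute via Databricks Connect instead; it assumes `pip install databricks-connect` has run in the workflow and that DATABRICKS_HOST / DATABRICKS_TOKEN are exported from repository secrets, and `workspace` is the catalog name used in this thread.)

```python
# Hedged sketch: build the Spark session through Databricks Connect so that
# spark.sql() executes in the Databricks workspace (where Unity Catalog lives),
# instead of SparkSession.builder.getOrCreate(), which would start a local
# OSS Spark on the Actions runner and expose only spark_catalog.
# Assumptions: databricks-connect is installed; DATABRICKS_HOST and
# DATABRICKS_TOKEN are set as environment variables from repo secrets.
from databricks.connect import DatabricksSession

# serverless(True) targets the workspace's serverless compute.
spark = DatabricksSession.builder.serverless(True).getOrCreate()

spark.sql("USE CATALOG workspace")   # catalog name from this post
spark.sql("USE SCHEMA default")
print([c.name for c in spark.catalog.listCatalogs()])
```

With a session created this way, three-part names (catalog.schema.table) should also resolve, since the SQL is parsed by the workspace rather than by local Spark.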

2 REPLIES

Alberto_Umana
Databricks Employee

Hi @sagarb,

It sounds like a permissions or setup issue. What is the error you are hitting?

Since it doesn't recognize the catalog, the parser flags 'workspace' as an unexpected/extra word. 'workspace' is my Unity Catalog name. See below:

Error:

File "/home/runner/work/Wine_Quality_Prediction_Model/Wine_Quality_Prediction_Model/notebooks/train_model_py.py", line 65, in load_data
    spark.sql("USE CATALOG workspace;")
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/sql/session.py", line 1631, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery, litArgs), self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pyspark/errors/exceptions/captured.py", line 185, in deco
    raise converted from None
pyspark.errors.exceptions.captured.ParseException:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'workspace': extra input 'workspace'. (line 1, pos 12)

== SQL ==
USE CATALOG workspace;
------------^^^

I also printed which catalogs and schemas the workflow can see. This is the output in the GitHub Actions log:

spark version: 3.5.5
Current Schema: default
Catalogs: ['spark_catalog']
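(Editor's note: that printout can be turned into a simple check. The helper below is hypothetical, `has_unity_catalog` is a name invented here, but it rests on one assumption stated in this thread: a plain OSS Spark session lists only the legacy Hive metastore catalog, spark_catalog, so any other name in the list suggests the session is actually attached to a UC-enabled workspace.)

```python
# Hypothetical helper (name invented here): given the catalog names a Spark
# session reports, decide whether anything beyond the legacy Hive metastore
# catalog ('spark_catalog') is visible.
def has_unity_catalog(catalog_names):
    return any(name != "spark_catalog" for name in catalog_names)

# Fed from a live session it would be:
#   names = [c.name for c in spark.catalog.listCatalogs()]
print(has_unity_catalog(["spark_catalog"]))               # the Actions output above -> False
print(has_unity_catalog(["spark_catalog", "workspace"]))  # UC visible -> True
```

Since the GitHub Actions run prints only ['spark_catalog'], the check would come back False there, matching the symptom that Unity Catalog tables cannot be found from the workflow.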



 
