02-01-2023 03:24 AM
Hiya,
I'm trying to run `pyspark` with `databricks-connect==11.3.0b0`, but it's failing.
The trace I see is
```
File "/home/agagrins/databricks9/lib/python3.9/site-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/home/agagrins/databricks9/lib/python3.9/site-packages/pyspark/sql/utils.py", line 196, in deco
return f(*a, **kw)
File "/home/agagrins/databricks9/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o33.sql.
: org.apache.spark.SparkException: There is no Credential Scope.
at com.databricks.unity.UCSDriver$Manager.$anonfun$currentScopeId$1(UCSDriver.scala:94)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.unity.UCSDriver$Manager.currentScopeId(UCSDriver.scala:94)
at com.databricks.unity.UCSDriver$Manager.currentScope(UCSDriver.scala:97)
at com.databricks.unity.UnityCredentialScope$.currentScope(UnityCredentialScope.scala:100)
at com.databricks.unity.UnityCredentialScope$.getCredentialManager(UnityCredentialScope.scala:128)
at com.databricks.unity.CredentialManager$.getUnityApiTokenOpt(CredentialManager.scala:456)
at com.databricks.unity.UnityCatalogClientHelper$.getToken(UnityCatalogClientHelper.scala:34)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$getCatalog$1(ManagedCatalogClientImpl.scala:163)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:2904)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:25)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:23)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:77)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:2903)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getCatalog(ManagedCatalogClientImpl.scala:156)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.catalogExists(ManagedCatalogCommon.scala:94)
at com.databricks.sql.managedcatalog.PermissionEnforcingManagedCatalog.catalogExists(PermissionEnforcingManagedCatalog.scala:177)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.catalogExists(ManagedCatalogSessionCatalog.scala:384)
at com.databricks.sql.DatabricksCatalogManager.isCatalogRegistered(DatabricksCatalogManager.scala:104)
at org.apache.spark.sql.SparkServiceCatalogV2Handler$.catalogOperationV2(SparkServiceCatalogV2Handler.scala:58)
at com.databricks.service.SparkServiceImpl$.$anonfun$catalogOperationV2$1(SparkServiceImpl.scala:165)
```
I've tried to Google "There is no Credential Scope", but to no avail. Does anyone have a clue where to start looking?
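For reference, the failure happens as soon as I touch the catalog; a minimal sketch of the kind of script that triggers the trace (the SQL statement itself is illustrative):
```
# Minimal sketch reproducing the trace above; the query is illustrative --
# any statement that touches the catalog hits the same code path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # databricks-connect routes this to the remote cluster
spark.sql("SHOW TABLES").show()             # fails with "There is no Credential Scope"
```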
02-01-2023 03:28 AM
Where are you running this?
02-01-2023 03:32 AM
I'm starting the run locally with Python 3.9.1 under WSL, but the idea is to then run the job in Databricks on AWS.
02-05-2023 11:58 PM
Hello @Aigars Grins. Can you tell me a bit more about what you are trying to run via Databricks Connect? Generally, we recommend using dbx for local development over Databricks Connect.
Could you also provide more information on the type of compute you are connecting to, such as the runtime version and whether it is running on Unity Catalog or the legacy Hive metastore?
02-06-2023 05:23 AM
My understanding is that there are three main ways for me to work with Databricks: `databricks-connect`, `databricks-sql-connector`, and `dbx`. I'm trying out all three, for slightly different purposes, to see where each fits our workflows best.
02-06-2023 05:27 AM
As for the problem above, it seems to have gone away. I'm not sure why; it feels like I didn't do anything differently. But instead I'm now faced with a much more mundane situation.
Again, I'm here trying to make `databricks-connect` work.
I simply do
```
$ python3 -m venv ~/databricks11
$ . ~/databricks11/bin/activate
$ pip install --upgrade pip
$ pip install --upgrade setuptools
$ pip install databricks-connect==11.3.0b0
$ databricks-connect configure
$ databricks-connect test
```
My `.databricks-connect` looks like
```
{
  "host": "https://dbc-****.cloud.databricks.com",
  "token": "dapi****",
  "cluster_id": "0110-****",
  "port": "15001"
}
```
I also have some environment variables, just in case
```
DATABRICKS_ADDRESS=https://dbc-****.cloud.databricks.com
DATABRICKS_API_TOKEN=dapi****
DATABRICKS_CLUSTER_ID=0110-****
DATABRICKS_PORT=15001
```
But I get an error
```
23/02/03 11:47:17 ERROR SparkClientManager: Fail to get the SparkClient
java.util.concurrent.ExecutionException: com.databricks.service.SparkServiceConnectionException: Invalid token
To connect to a Databricks cluster, you must specify an API token.
API Token: The API token used to confirm your identity to Databricks
- Learn more about API tokens here: https://docs.databricks.com/api/latest/authentication.html#generate-a-token
- Get current value: spark.conf.get("spark.databricks.service.token")
- Set via conf: spark.conf.set("spark.databricks.service.token", <your API token>)
- Set via environment variable: export DATABRICKS_API_TOKEN=<your API token>
```
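For completeness, the error text says the same values can also be set in code; a sketch of that (same redacted placeholders as above), in case it makes a difference:
```
# Sketch based on the hints in the error message; host/token/cluster id are
# the same redacted placeholders as in .databricks-connect above.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.databricks.service.address", "https://dbc-****.cloud.databricks.com")
    .config("spark.databricks.service.token", "dapi****")
    .config("spark.databricks.service.clusterId", "0110-****")
    .getOrCreate()
)
```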
02-06-2023 05:28 AM
The cluster I'm connecting to runs "11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)"
02-06-2023 06:13 AM
Hmm, the connection info looks good to me. Can you try running against a cluster unconnected to Unity Catalog (put it in Access mode "No isolation shared") and see if you still get the error?
Lastly, as stated in the documentation, we recommend dbx over databricks-connect for local development. Is there anything specific you believe you can do with databricks-connect that you cannot achieve with dbx?
02-09-2023 01:28 AM
I tried creating a new cluster on 10.4, but that didn't get me anywhere either. The steps I followed were:
```
$ databricks clusters create --json-file cluster.json
```
Where `cluster.json` looks like
```
{
  "cluster_name": "test50",
  "spark_version": "10.4.x-scala2.12",
  "spark_conf": {
    "spark.databricks.service.client.enabled": true,
    "spark.databricks.service.server.enabled": true,
    "spark.speculation": true,
    "spark.sql.session.timeZone": "UTC"
  },
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "node_type_id": "i3.xlarge",
  "autoscale": {
    "min_workers": 1,
    "max_workers": 8
  },
  "autotermination_minutes": 10,
  "aws_attributes": {
    "first_on_demand": 0,
    "availability": "SPOT_WITH_FALLBACK",
    "zone_id": "eu-west-1b",
    "spot_bid_price_percent": 100
  },
  "enable_elastic_disk": false,
  "data_security_mode": "SINGLE_USER",
  "single_user_name": "****"
}
```
And then
```
$ python3 -m venv ~/databricks12
$ . ~/databricks12/bin/activate
$ pip install --upgrade pip
$ pip install --upgrade setuptools
$ pip install databricks-connect==10.4.18
$ databricks-connect test
```
And the result is as before
```
23/02/09 10:22:14 ERROR SparkServiceRPCClient: Failed to sync with the spark cluster. This could be a intermittent issue, please check your cluster's state and retry.
com.databricks.service.SparkServiceConnectionException: Invalid token
To connect to a Databricks cluster, you must specify an API token.
API Token: The API token used to confirm your identity to Databricks
- Learn more about API tokens here: https://docs.databricks.com/api/latest/authentication.html#generate-a-token
- Get current value: spark.conf.get("spark.databricks.service.token")
- Set via conf: spark.conf.set("spark.databricks.service.token", <your API token>)
- Set via environment variable: export DATABRICKS_API_TOKEN=<your API token>
```
02-09-2023 05:33 AM
I tried your exact code in my environment and it worked without issue.
Could it be something about the token you are using and its permissions? Is it the same token you use for the databricks CLI? What workspace permissions does the principal have?
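One quick way to rule the token itself in or out: call the REST API with it directly, e.g. (a sketch, using your redacted values):
```
$ curl -s -H "Authorization: Bearer dapi****" \
    "https://dbc-****.cloud.databricks.com/api/2.0/clusters/get?cluster_id=0110-****"
```
If that returns the cluster's JSON rather than a 403, the token and its permissions are fine and the problem is somewhere in the databricks-connect setup.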
02-09-2023 01:30 AM
I'm not sure how to test the "cluster unconnected to Unity Catalog (Access mode: No isolation shared)" suggestion. Could you provide a `cluster.json` with the corresponding settings?
02-09-2023 05:34 AM
Change the data_security_mode field in the cluster config to NO_ISOLATION. It's unlikely to be related to the issue you're facing; a configuration problem is more likely.
But it might be worth double-checking.
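Concretely, the only change in your `cluster.json` would be this line (and since `single_user_name` only applies to `SINGLE_USER` mode, I believe you can drop that field):
```
"data_security_mode": "NO_ISOLATION"
```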
02-09-2023 01:54 AM
Why then `databricks-connect` and not `dbx`? Well, I'm trying to get both to work.
I posted a related question about `dbx` https://community.databricks.com/s/feed/0D58Y00009qtFLrSAM.
My hope here is that `databricks-connect` can have a much quicker turnaround time, compared to `dbx`, since no new environments have to be set up.
02-09-2023 05:47 AM
I use the same token when working with `dbx`, and that works, so I suspect the token itself isn't the problem. I'll check the permissions.
02-09-2023 06:11 AM
These are the permissions on the cluster. Is that what you wanted?