02-01-2023 03:24 AM
Hiya,
I'm trying to run `pyspark` with `databricks-connect==11.3.0b0`, but it's failing.
The trace I see is
```
File "/home/agagrins/databricks9/lib/python3.9/site-packages/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/home/agagrins/databricks9/lib/python3.9/site-packages/pyspark/sql/utils.py", line 196, in deco
return f(*a, **kw)
File "/home/agagrins/databricks9/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o33.sql.
: org.apache.spark.SparkException: There is no Credential Scope.
at com.databricks.unity.UCSDriver$Manager.$anonfun$currentScopeId$1(UCSDriver.scala:94)
at scala.Option.getOrElse(Option.scala:189)
at com.databricks.unity.UCSDriver$Manager.currentScopeId(UCSDriver.scala:94)
at com.databricks.unity.UCSDriver$Manager.currentScope(UCSDriver.scala:97)
at com.databricks.unity.UnityCredentialScope$.currentScope(UnityCredentialScope.scala:100)
at com.databricks.unity.UnityCredentialScope$.getCredentialManager(UnityCredentialScope.scala:128)
at com.databricks.unity.CredentialManager$.getUnityApiTokenOpt(CredentialManager.scala:456)
at com.databricks.unity.UnityCatalogClientHelper$.getToken(UnityCatalogClientHelper.scala:34)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$getCatalog$1(ManagedCatalogClientImpl.scala:163)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:2904)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:25)
at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:23)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:77)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:2903)
at com.databricks.managedcatalog.ManagedCatalogClientImpl.getCatalog(ManagedCatalogClientImpl.scala:156)
at com.databricks.sql.managedcatalog.ManagedCatalogCommon.catalogExists(ManagedCatalogCommon.scala:94)
at com.databricks.sql.managedcatalog.PermissionEnforcingManagedCatalog.catalogExists(PermissionEnforcingManagedCatalog.scala:177)
at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.catalogExists(ManagedCatalogSessionCatalog.scala:384)
at com.databricks.sql.DatabricksCatalogManager.isCatalogRegistered(DatabricksCatalogManager.scala:104)
at org.apache.spark.sql.SparkServiceCatalogV2Handler$.catalogOperationV2(SparkServiceCatalogV2Handler.scala:58)
at com.databricks.service.SparkServiceImpl$.$anonfun$catalogOperationV2$1(SparkServiceImpl.scala:165)
```
I've tried to Google "There is no Credential Scope", but to no avail. Does anyone have a clue where to start looking?
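For reference, the failure happens as soon as I touch the catalog; a minimal sketch of the kind of script that triggers the trace (the SQL statement itself is illustrative):
```
# Minimal sketch reproducing the trace above; the query is illustrative --
# any statement that touches the catalog hits the same code path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # databricks-connect routes this to the remote cluster
spark.sql("SHOW TABLES").show()             # fails with "There is no Credential Scope"
```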
02-01-2023 03:28 AM
Where are you running this?
02-01-2023 03:32 AM
I'm starting the run locally with Python 3.9.1 under WSL, but the idea is to then run the job in Databricks on AWS.
02-05-2023 11:58 PM
Hello @Aigars Grins. Can you tell me a bit more about what you are trying to run via Databricks Connect? Generally, we recommend using dbx for local development over Databricks Connect.
Could you also provide more information on the type of compute you are connecting to, such as the runtime version and whether it is running on Unity Catalog or the legacy Hive metastore?
02-06-2023 05:23 AM
My understanding is that there are three main ways for me to work with Databricks: `databricks-connect`, `databricks-sql-connector`, and `dbx`. I'm trying out all three, for slightly different purposes, to see where each fits our workflows best.
02-06-2023 05:27 AM
As for the problem above, it seems to have gone away. I'm not sure why; it feels like I didn't do anything differently. But instead I'm now faced with a much more mundane situation.
Again, I'm here trying to make `databricks-connect` work.
I simply do
```
$ python3 -m venv ~/databricks11
$ . ~/databricks11/bin/activate
$ pip install --upgrade pip
$ pip install --upgrade setuptools
$ pip install databricks-connect==11.3.0b0
$ databricks-connect configure
$ databricks-connect test
```
My `.databricks-connect` looks like
```
{
  "host": "https://dbc-****.cloud.databricks.com",
  "token": "dapi****",
  "cluster_id": "0110-****",
  "port": "15001"
}
```
I also have some environment variables, just in case
```
DATABRICKS_ADDRESS=https://dbc-****.cloud.databricks.com
DATABRICKS_API_TOKEN=dapi****
DATABRICKS_CLUSTER_ID=0110-****
DATABRICKS_PORT=15001
```
But I get an error
```
23/02/03 11:47:17 ERROR SparkClientManager: Fail to get the SparkClient
java.util.concurrent.ExecutionException: com.databricks.service.SparkServiceConnectionException: Invalid token
To connect to a Databricks cluster, you must specify an API token.
API Token: The API token used to confirm your identity to Databricks
- Learn more about API tokens here: https://docs.databricks.com/api/latest/authentication.html#generate-a-token
- Get current value: spark.conf.get("spark.databricks.service.token")
- Set via conf: spark.conf.set("spark.databricks.service.token", <your API token>)
- Set via environment variable: export DATABRICKS_API_TOKEN=<your API token>
```
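For completeness, the error text says the same values can also be set in code; a sketch of that (same redacted placeholders as above), in case it makes a difference:
```
# Sketch based on the hints in the error message; host/token/cluster id are
# the same redacted placeholders as in .databricks-connect above.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.databricks.service.address", "https://dbc-****.cloud.databricks.com")
    .config("spark.databricks.service.token", "dapi****")
    .config("spark.databricks.service.clusterId", "0110-****")
    .getOrCreate()
)
```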
02-06-2023 05:28 AM
The cluster I'm connecting to runs "11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)"
02-06-2023 06:13 AM
Hmm, the connection info looks good to me. Can you try running against a cluster unconnected to Unity Catalog (put it in Access mode "No isolation shared") and see if you still get the error?
Lastly, as stated in the documentation, we recommend dbx over databricks-connect for local development. Is there anything specific you believe you can do with databricks-connect that you cannot achieve with dbx?
02-09-2023 01:28 AM
I tried creating a new cluster on 10.4, but that didn't get me anywhere either. The steps I followed were:
```
$ databricks clusters create --json-file cluster.json
```
Where `cluster.json` looks like
```
{
  "cluster_name": "test50",
  "spark_version": "10.4.x-scala2.12",
  "spark_conf": {
    "spark.databricks.service.client.enabled": true,
    "spark.databricks.service.server.enabled": true,
    "spark.speculation": true,
    "spark.sql.session.timeZone": "UTC"
  },
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "node_type_id": "i3.xlarge",
  "autoscale": {
    "min_workers": 1,
    "max_workers": 8
  },
  "autotermination_minutes": 10,
  "aws_attributes": {
    "first_on_demand": 0,
    "availability": "SPOT_WITH_FALLBACK",
    "zone_id": "eu-west-1b",
    "spot_bid_price_percent": 100
  },
  "enable_elastic_disk": false,
  "data_security_mode": "SINGLE_USER",
  "single_user_name": "****"
}
```
And then
```
$ python3 -m venv ~/databricks12
$ . ~/databricks12/bin/activate
$ pip install --upgrade pip
$ pip install --upgrade setuptools
$ pip install databricks-connect==10.4.18
$ databricks-connect test
```
And the result is as before
```
23/02/09 10:22:14 ERROR SparkServiceRPCClient: Failed to sync with the spark cluster. This could be a intermittent issue, please check your cluster's state and retry.
com.databricks.service.SparkServiceConnectionException: Invalid token
To connect to a Databricks cluster, you must specify an API token.
API Token: The API token used to confirm your identity to Databricks
- Learn more about API tokens here: https://docs.databricks.com/api/latest/authentication.html#generate-a-token
- Get current value: spark.conf.get("spark.databricks.service.token")
- Set via conf: spark.conf.set("spark.databricks.service.token", <your API token>)
- Set via environment variable: export DATABRICKS_API_TOKEN=<your API token>
```
02-09-2023 05:33 AM
I tried your exact code in my environment and it worked without issue.
Could it be something about the token you are using and its permissions? Is it the same token you use for the databricks CLI? What workspace permissions does the principal have?
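One quick way to rule the token itself in or out: call the REST API with it directly, e.g. (a sketch, using your redacted values):
```
$ curl -s -H "Authorization: Bearer dapi****" \
    "https://dbc-****.cloud.databricks.com/api/2.0/clusters/get?cluster_id=0110-****"
```
If that returns the cluster's JSON rather than a 403, the token and its permissions are fine and the problem is somewhere in the databricks-connect setup.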
02-09-2023 01:30 AM
I'm not sure how to test the "cluster unconnected to Unity Catalog (Access mode: No isolation shared)" suggestion. Could you provide a `cluster.json` with the corresponding settings?
02-09-2023 05:34 AM
Change the data_security_mode field in the cluster config to NO_ISOLATION. It's unlikely to be related to the issue you're facing; a configuration problem is more likely.
But it might be worth double-checking.
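Concretely, the only change in your `cluster.json` would be this line (and since `single_user_name` only applies to `SINGLE_USER` mode, I believe you can drop that field):
```
"data_security_mode": "NO_ISOLATION"
```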
02-09-2023 01:54 AM
Why then `databricks-connect` and not `dbx`? Well, I'm trying to get both to work.
I posted a related question about `dbx` https://community.databricks.com/s/feed/0D58Y00009qtFLrSAM.
My hope here is that `databricks-connect` can have a much quicker turnaround time, compared to `dbx`, since no new environments have to be set up.
02-09-2023 05:47 AM
I use the same token when working with `dbx`, and that works, so I suspect the token itself isn't the problem. I'll check the permissions.
02-09-2023 06:11 AM
These are the permissions on the cluster. Is that what you wanted?