
Running `pyspark` with `databricks-connect`

agagrins
New Contributor III

Hiya,

I'm trying to run `pyspark` with `databricks-connect==11.3.0b0`, but am failing.

The trace I see is

```
  File "/home/agagrins/databricks9/lib/python3.9/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/home/agagrins/databricks9/lib/python3.9/site-packages/pyspark/sql/utils.py", line 196, in deco
    return f(*a, **kw)
  File "/home/agagrins/databricks9/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o33.sql.
: org.apache.spark.SparkException: There is no Credential Scope.
    at com.databricks.unity.UCSDriver$Manager.$anonfun$currentScopeId$1(UCSDriver.scala:94)
    at scala.Option.getOrElse(Option.scala:189)
    at com.databricks.unity.UCSDriver$Manager.currentScopeId(UCSDriver.scala:94)
    at com.databricks.unity.UCSDriver$Manager.currentScope(UCSDriver.scala:97)
    at com.databricks.unity.UnityCredentialScope$.currentScope(UnityCredentialScope.scala:100)
    at com.databricks.unity.UnityCredentialScope$.getCredentialManager(UnityCredentialScope.scala:128)
    at com.databricks.unity.CredentialManager$.getUnityApiTokenOpt(CredentialManager.scala:456)
    at com.databricks.unity.UnityCatalogClientHelper$.getToken(UnityCatalogClientHelper.scala:34)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$getCatalog$1(ManagedCatalogClientImpl.scala:163)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.$anonfun$recordAndWrapException$1(ManagedCatalogClientImpl.scala:2904)
    at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException(ErrorDetailsHandler.scala:25)
    at com.databricks.managedcatalog.ErrorDetailsHandler.wrapServiceException$(ErrorDetailsHandler.scala:23)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.wrapServiceException(ManagedCatalogClientImpl.scala:77)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.recordAndWrapException(ManagedCatalogClientImpl.scala:2903)
    at com.databricks.managedcatalog.ManagedCatalogClientImpl.getCatalog(ManagedCatalogClientImpl.scala:156)
    at com.databricks.sql.managedcatalog.ManagedCatalogCommon.catalogExists(ManagedCatalogCommon.scala:94)
    at com.databricks.sql.managedcatalog.PermissionEnforcingManagedCatalog.catalogExists(PermissionEnforcingManagedCatalog.scala:177)
    at com.databricks.sql.managedcatalog.ManagedCatalogSessionCatalog.catalogExists(ManagedCatalogSessionCatalog.scala:384)
    at com.databricks.sql.DatabricksCatalogManager.isCatalogRegistered(DatabricksCatalogManager.scala:104)
    at org.apache.spark.sql.SparkServiceCatalogV2Handler$.catalogOperationV2(SparkServiceCatalogV2Handler.scala:58)
    at com.databricks.service.SparkServiceImpl$.$anonfun$catalogOperationV2$1(SparkServiceImpl.scala:165)
```

I've tried Googling "There is no Credential Scope", but to no avail. Does anyone have a clue where to start looking?
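
For context, a minimal sketch of the kind of script that hits this (the query is just a placeholder, not the real workload):

```
# Minimal sketch of the kind of script that fails, run locally in the
# virtualenv where databricks-connect is installed. The query is a placeholder.
from pyspark.sql import SparkSession

# With databricks-connect, getOrCreate() should attach to the remote cluster
# configured via ~/.databricks-connect instead of starting a local Spark.
spark = SparkSession.builder.getOrCreate()

# The traceback above is raised from the spark.sql(...) call (the `o33.sql`).
spark.sql("SELECT 1 AS x").show()
```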


sher
Valued Contributor II

Where are you running this?

agagrins
New Contributor III

I'm starting the run locally, with Python 3.9.1 under WSL, but the idea is then to run the job in Databricks on AWS.

sergiu
New Contributor III

Hello @Aigars Grins​. Can you tell me a bit more about what you are trying to run via Databricks Connect? Generally, we recommend using dbx for local development over Databricks Connect.

Could you also provide more information on what type of compute you are connecting to? Such as runtime and whether it is running on Unity Catalog or the legacy Hive Metastore?

agagrins
New Contributor III

My understanding is that there are three main ways for me to work with Databricks: `databricks-connect`, `databricks-sql-connector`, and `dbx`. I'm trying out all three, for slightly different purposes, to see which fits our workflows best where.

agagrins
New Contributor III

As for the problem above, it seems to have gone away. I'm not sure why; it doesn't feel like I did anything differently. But instead I'm now faced with a much more mundane situation.

Again, I'm here trying to make `databricks-connect` work.

I simply do

```
$ python3 -m venv ~/databricks11
$ . ~/databricks11/bin/activate
$ pip install --upgrade pip
$ pip install --upgrade setuptools
$ pip install databricks-connect==11.3.0b0
$ databricks-connect configure
$ databricks-connect test
```

My `.databricks-connect` looks like

```
{
  "host": "https://dbc-****.cloud.databricks.com",
  "token": "dapi****",
  "cluster_id": "0110-****",
  "port": "15001"
}
```

I also have some environment variables, just in case

```
DATABRICKS_ADDRESS=https://dbc-****.cloud.databricks.com
DATABRICKS_API_TOKEN=dapi****
DATABRICKS_CLUSTER_ID=0110-****
DATABRICKS_PORT=15001
```
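
To rule out a mix-up between the config file and the environment variables, here is a quick local sanity check (it only prints what is on disk and in the environment; it does not open a Spark session):

```
# Print the databricks-connect config file and the related env vars, to see
# exactly which values the client could be picking up. Purely local; no
# connection to Databricks is made here.
import json
import os
from pathlib import Path

cfg = Path.home() / ".databricks-connect"
print(json.loads(cfg.read_text()))

for var in ("DATABRICKS_ADDRESS", "DATABRICKS_API_TOKEN",
            "DATABRICKS_CLUSTER_ID", "DATABRICKS_PORT"):
    print(f"{var}={os.environ.get(var)}")
```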

But I get an error

```
23/02/03 11:47:17 ERROR SparkClientManager: Fail to get the SparkClient
java.util.concurrent.ExecutionException: com.databricks.service.SparkServiceConnectionException: Invalid token
To connect to a Databricks cluster, you must specify an API token.
API Token: The API token used to confirm your identity to Databricks
 - Learn more about API tokens here: https://docs.databricks.com/api/latest/authentication.html#generate-a-token
 - Get current value: spark.conf.get("spark.databricks.service.token")
 - Set via conf: spark.conf.set("spark.databricks.service.token", <your API token>)
 - Set via environment variable: export DATABRICKS_API_TOKEN=<your API token>
```
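
The error text itself suggests setting the token on the Spark conf, so the next thing I plan to try is bypassing `.databricks-connect` and passing the values straight into the session builder. A rough sketch (all values are placeholders; only `spark.databricks.service.token` is named in the error text, the other keys are my assumption about the matching address/cluster settings):

```
# Rough sketch: pass the connection details directly on the session builder
# instead of relying on ~/.databricks-connect. All values are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.databricks.service.address", "https://dbc-****.cloud.databricks.com")
    .config("spark.databricks.service.token", "dapi****")
    .config("spark.databricks.service.clusterId", "0110-****")
    .getOrCreate()
)

print(spark.range(3).collect())
```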

agagrins
New Contributor III

The cluster I'm connecting to runs "11.3 LTS (includes Apache Spark 3.3.0, Scala 2.12)"

sergiu
New Contributor III

Hmm, the connect info looks good to me. Can you try either of the following and see if you still get the error:

  • Run against a cluster unconnected to Unity Catalog (set its access mode to "No isolation shared")

OR

  • Try with an earlier runtime, like 10.4 (and the matching version of the connector, i.e. pip install -U "databricks-connect==10.4.*")

Lastly, as stated in the documentation, we recommend dbx over databricks-connect for local development. Is there anything specific you believe you can do with databricks-connect that you cannot achieve with dbx?

agagrins
New Contributor III

I tried creating a new cluster, on 10.4, but that didn't get me anywhere either. The steps I followed were:

```
$ databricks clusters create --json-file cluster.json
```

Where `cluster.json` looks like

```
{
  "cluster_name": "test50",
  "spark_version": "10.4.x-scala2.12",
  "spark_conf": {
    "spark.databricks.service.client.enabled": true,
    "spark.databricks.service.server.enabled": true,
    "spark.speculation": true,
    "spark.sql.session.timeZone": "UTC"
  },
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "node_type_id": "i3.xlarge",
  "autoscale": {
    "min_workers": 1,
    "max_workers": 8
  },
  "autotermination_minutes": 10,
  "aws_attributes": {
    "first_on_demand": 0,
    "availability": "SPOT_WITH_FALLBACK",
    "zone_id": "eu-west-1b",
    "spot_bid_price_percent": 100
  },
  "enable_elastic_disk": false,
  "data_security_mode": "SINGLE_USER",
  "single_user_name": "****"
}
```

And then

```
$ python3 -m venv ~/databricks12
$ . ~/databricks12/bin/activate
$ pip install --upgrade pip
$ pip install --upgrade setuptools
$ pip install databricks-connect==10.4.18
$ databricks-connect test
```

And the result is as before

```
23/02/09 10:22:14 ERROR SparkServiceRPCClient: Failed to sync with the spark cluster. This could be a intermittent issue, please check your cluster's state and retry.
com.databricks.service.SparkServiceConnectionException: Invalid token
To connect to a Databricks cluster, you must specify an API token.
API Token: The API token used to confirm your identity to Databricks
 - Learn more about API tokens here: https://docs.databricks.com/api/latest/authentication.html#generate-a-token
 - Get current value: spark.conf.get("spark.databricks.service.token")
 - Set via conf: spark.conf.set("spark.databricks.service.token", <your API token>)
 - Set via environment variable: export DATABRICKS_API_TOKEN=<your API token>
```

sergiu
New Contributor III

I tried your exact code on my environment and it worked without issue.

Could it be something about the token you are using and its permissions? Is it the same token you are using for the databricks CLI? What workspace permissions does the principal have?

agagrins
New Contributor III

I'm not sure how to test the "run against a cluster unconnected to Unity Catalog" suggestion. Could you provide a `cluster.json` with the corresponding settings?

sergiu
New Contributor III

Change the `data_security_mode` field in the cluster config to `NO_ISOLATION`. It's unlikely to be related to the issue you are facing; a configuration problem is more likely.

But it might be worth double checking.

agagrins
New Contributor III

Why then `databricks-connect` and not `dbx`? Well, I'm trying to get both to work.

I posted a related question about `dbx` https://community.databricks.com/s/feed/0D58Y00009qtFLrSAM.

My hope here is that `databricks-connect` can have a much quicker turnaround time, compared to `dbx`, since no new environments have to be set up.

agagrins
New Contributor III

I use the same token when working with `dbx`, and that works, so I suspect the token itself isn't the problem. I'll check the permissions.
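
In the meantime, a way to sanity-check the token outside of `databricks-connect` is to call the REST API with it directly. A rough sketch using only the standard library (host, token, and cluster id are placeholders):

```
# Rough sketch: verify that the PAT authenticates against the workspace by
# fetching the cluster's metadata over the REST API. All values are placeholders.
import json
import urllib.request

host = "https://dbc-****.cloud.databricks.com"
token = "dapi****"
cluster_id = "0110-****"

req = urllib.request.Request(
    f"{host}/api/2.0/clusters/get?cluster_id={cluster_id}",
    headers={"Authorization": f"Bearer {token}"},
)
with urllib.request.urlopen(req) as resp:
    info = json.load(resp)

# A 200 response means the token is accepted by the workspace; "state" also
# shows whether the cluster is actually RUNNING.
print(info.get("state"), info.get("spark_version"))
```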

agagrins
New Contributor III

These are the permissions on the cluster. Is that what you wanted?