Driver context not found for python spark for spark_submit_task using Jobs API submit run endpoint

umarkhan
New Contributor II

I am trying to run a multi-file Python job in Databricks without using notebooks. I have tried setting this up by:

  • creating a Docker image using the Databricks Runtime 10.4 LTS image as a base and adding the zipped Python application to it;
  • making a call to the runs submit endpoint with this payload:
{
    "tasks": [
        {
            "task_key": "test-run-8",
            "spark_submit_task": {
                "parameters": [
                    "--py-files",
                    "/app.zip",
                    "/app.zip/__main__.py"
                ]
            },
            "new_cluster": {
                "num_workers": 1,
                "spark_version": "11.1.x-scala2.12",
                "aws_attributes": {
                    "first_on_demand": 1,
                    "availability": "SPOT_WITH_FALLBACK",
                    "zone_id": "us-west-2a",
                    "instance_profile_arn": "<instance profile ARN>",
                    "spot_bid_price_percent": 100,
                    "ebs_volume_count": 0
                },
                "node_type_id": "i3.xlarge",
                "docker_image": {
                    "url": "<aws-account-number>.dkr.ecr.us-west-2.amazonaws.com/spark-app:0.1.17"
                }
            }
        }
    ]
}
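
For reference, a payload like this can be sent to the 2.1 runs/submit endpoint along the lines of the minimal sketch below. The DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, the run_name, and the trimmed cluster spec are placeholders for illustration, not my exact setup:

import os
import requests

# Placeholders: workspace URL and personal access token read from the environment.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

payload = {
    "run_name": "test-run-8",
    "tasks": [
        {
            "task_key": "test-run-8",
            "spark_submit_task": {
                "parameters": ["--py-files", "/app.zip", "/app.zip/__main__.py"]
            },
            "new_cluster": {
                # aws_attributes omitted here for brevity; same values as in the payload above.
                "num_workers": 1,
                "spark_version": "11.1.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "docker_image": {
                    "url": "<aws-account-number>.dkr.ecr.us-west-2.amazonaws.com/spark-app:0.1.17"
                }
            }
        }
    ]
}

# Submit the one-time run; the response body contains the run_id to poll for status.
resp = requests.post(
    f"{host}/api/2.1/jobs/runs/submit",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"run_id": 123456}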

The application reads a JSON file and loads it into a new Delta Lake table (a sketch of the kind of entry point involved follows the log excerpt below). Unfortunately this does not work as intended. Here is what I have found:

  • When I run the code in the application from a notebook, it works normally.
  • When running via the Jobs endpoint, I don't see the table at all in the Databricks UI.
  • When checking the S3 bucket, I do see a folder created for the database and some Parquet files for the table.
  • Running any query against this table fails with a "not found" error.
  • When checking the driver logs, I see the following:
...
22/08/19 02:38:28 WARN DefaultTableOwnerAclClient: failed to update the table owner when create/drop table.
java.lang.IllegalStateException: Driver context not found
...
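
For context, the entry point in the zip is doing roughly what the sketch below shows (the input path, database and table names here are made-up placeholders). As far as I understand, under a spark_submit_task the application has to create its own SparkSession, whereas a notebook gets a preconfigured spark object:

# __main__.py -- illustrative sketch only; paths and table names are placeholders.
from pyspark.sql import SparkSession


def main():
    # Under spark_submit_task the application builds its own session;
    # in a notebook the preconfigured `spark` object is used instead.
    spark = SparkSession.builder.appName("spark-app").getOrCreate()

    # Read the source JSON and write it out as a new Delta table.
    df = spark.read.json("s3://<bucket>/input/data.json")  # placeholder path
    spark.sql("CREATE DATABASE IF NOT EXISTS example_db")
    (
        df.write.format("delta")
        .mode("overwrite")
        .saveAsTable("example_db.example_table")  # placeholder table name
    )


if __name__ == "__main__":
    main()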

Some extra considerations:

  • I'd like to be able to use Docker images for our deployment if possible, since that matches our current CI/CD pattern.
  • Failing that, I'd be OK with a spark_python_task, but I have not been able to get this to work when the job has multiple Python files (a possible packaging approach is sketched after this list).
  • I want to avoid using notebooks for deploying applications.
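
On the spark_python_task point: one common way to handle a multi-file project is to build the shared code into a wheel, attach it as a task library, and point python_file at a thin entry script. The sketch below is untested for this setup; the DBFS paths and package name are placeholders, and the dict would replace the task object in the submit payload above:

# Hypothetical alternative task: multi-file code built into a wheel and attached as a library,
# with python_file pointing at a thin entry script. Paths and names are placeholders.
alternative_task = {
    "task_key": "test-run-python-task",
    "spark_python_task": {
        # Thin wrapper that does e.g. `from spark_app import main; main()`.
        "python_file": "dbfs:/FileStore/jobs/entrypoint.py",
        "parameters": []
    },
    "libraries": [
        {"whl": "dbfs:/FileStore/jobs/spark_app-0.1.17-py3-none-any.whl"}
    ],
    "new_cluster": {
        "num_workers": 1,
        "spark_version": "11.1.x-scala2.12",
        "node_type_id": "i3.xlarge"
    }
}

The wheel itself could be built in CI (for example with python -m build) and uploaded as part of the deploy step, which would keep the existing CI/CD pattern largely intact.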

Any help with understanding and fixing this error would be much appreciated.

Regards,

Umar

4 REPLIES

Kaniz
Community Manager

Hi @Umar Khan, have you enabled table access control for your workspace?

Here is the doc: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/access-control/table-acl

umarkhan
New Contributor II

Hello @Kaniz Fatma (Databricks), thanks for the response. No, table access control has not been enabled. As I understand it, this should allow anyone to access the table by default.

Also, in case it helps, we are using AWS.

Kaniz
Community Manager

Hi @Umar Khan, please enable table access control and let us know if that helps. If this answer helps you, please feel free to mark it as the best.

Here is the doc: https://docs.microsoft.com/en-us/azure/databricks/administration-guide/access-control/table-acl

Vidula
Honored Contributor

Hi @Umar Khan,

Hope all is well! Just wanted to check in to see whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
