
Driver context not found for Python Spark spark_submit_task using the Jobs API runs submit endpoint

umarkhan
New Contributor II

I am trying to run a multi-file Python job in Databricks without using notebooks. I have tried setting this up by:

  • creating a Docker image using the Databricks Runtime 10.4 LTS image as a base and adding the zipped Python application to it.
  • making a call to the runs submit endpoint with this payload:
{
    "tasks": {
        "task_key": "test-run-8",
        "spark_submit_task": {
            "parameters": [
                "--py-files",
                "/app.zip",
                "/app.zip/__main__.py"
            ]
        },
        "new_cluster": {
            "num_workers": 1,
            "spark_version": "11.1.x-scala2.12",
            "aws_attributes": {
                "first_on_demand": 1,
                "availability": "SPOT_WITH_FALLBACK",
                "zone_id": "us-west-2a",
                "instance_profile_arn": "<instance profile ARN>",
                "spot_bid_price_percent": 100,
                "ebs_volume_count": 0
            },
            "node_type_id": "i3.xlarge",
            "docker_image": {
                "url": "<aws-account-number>.dkr.ecr.us-west-2.amazonaws.com/spark-app:0.1.17"
            }
        }
    }
}
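For reference, the call itself is made roughly like this (a simplified sketch; the workspace URL and token are placeholders, and note that the 2.1 runs submit endpoint expects "tasks" to be an array of task objects):

import requests

# Minimal sketch of the runs submit call. The workspace URL and token are
# placeholders; the cluster spec is elided here for brevity.
DATABRICKS_HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

payload = {
    "run_name": "test-run-8",
    "tasks": [
        {
            "task_key": "test-run-8",
            "spark_submit_task": {
                "parameters": [
                    "--py-files",
                    "/app.zip",
                    "/app.zip/__main__.py",
                ]
            },
            "new_cluster": {
                # same new_cluster spec as in the payload above
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/runs/submit",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # response should contain the run_id of the submitted run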

The application reads a JSON file and loads it into a new Delta Lake table. Unfortunately this does not work as intended. Here is what I have found:

  • When I run the application code from a notebook, it works normally.
  • When running via the Jobs endpoint, I don't see the table at all in the Databricks UI.
  • When checking the S3 bucket, I do see a folder created for the database and some Parquet files for the table.
  • Running any query against this table fails with a not-found error.
  • When checking the driver logs, I see the following:
...
22/08/19 02:38:28 WARN DefaultTableOwnerAclClient: failed to update the table owner when create/drop table.
java.lang.IllegalStateException: Driver context not found
...
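For context, the core of the application is essentially the following (a simplified sketch; the S3 path, database, and table names are placeholders):

from pyspark.sql import SparkSession

# Simplified sketch of the application logic; the path and names are placeholders.
spark = SparkSession.builder.appName("json-to-delta").getOrCreate()

df = spark.read.json("s3://<bucket>/<prefix>/input.json")

# Create the target database if needed, then write a managed Delta table.
spark.sql("CREATE DATABASE IF NOT EXISTS my_db")
df.write.format("delta").mode("overwrite").saveAsTable("my_db.my_table")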

Some extra considerations:

  • I'd like to be able to use Docker images for our deployment if possible, since it matches our current CI/CD pattern.
  • Failing that, I'd be OK with a spark_python_task, but I have not been able to get this to work when I have multiple Python files (see the sketch after this list).
  • I want to avoid using notebooks for deploying applications.
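For the spark_python_task route, this is roughly the shape of the task I have been trying (a sketch only; the DBFS paths and wheel name are placeholders, and packaging the supporting modules as a wheel attached via libraries is just one approach I have seen suggested for multi-file applications):

# Sketch of a spark_python_task; DBFS paths and the wheel name are placeholders.
task = {
    "task_key": "test-run-python",
    "spark_python_task": {
        "python_file": "dbfs:/apps/app/main.py",
        "parameters": ["--input", "s3://<bucket>/<prefix>/input.json"],
    },
    # Supporting modules packaged as a wheel and attached as a cluster library.
    "libraries": [{"whl": "dbfs:/apps/app/app-0.1.0-py3-none-any.whl"}],
    "new_cluster": {
        "num_workers": 1,
        "spark_version": "11.1.x-scala2.12",
        "node_type_id": "i3.xlarge",
    },
}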

Any help with understanding and fixing this error would be much appreciated.

Regards,

Umar


umarkhan
New Contributor II

Hello @Kaniz Fatma (Databricks), thanks for the response. No, table access control has not been enabled. As I understand it, this should allow anyone to access the table by default.

Also, in case it helps, we are using AWS.

Vidula
Honored Contributor

Hi @Umar Khan

Hope all is well! Just wanted to check in to see if you were able to resolve your issue, and if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
