Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

curl: (26) Failed to open/read local data from file/application in DBFS

kavya08
New Contributor

Hi all,

I am trying to upload a Parquet file from S3 to DBFS using a curl command in an Airflow BashOperator that calls the Databricks REST API, as shown below:

databricks_load_task = BashOperator(
    task_id="upload_to_databricks",
    bash_command="""
    curl --location --request POST {{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_HOST')}}/api/2.0/dbfs/put \
        --header "Authorization: Bearer {{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_TOKEN')}}" \
        --form contents="@s3://bucket/test/file.parquet" \
        --form path="{{task_instance.xcom_pull(task_ids='get_creds', key='UPLOAD_PATH')}}" \
        --form overwrite="true"
    """,
)

The Parquet file stores a dataframe result. I am unable to upload the file; it gives me the error below:

curl: (26) Failed to open/read local data from file/application

When I replace the S3 path with literal text (--form contents="test text"), the upload works. Please help me with this.

#dbfs 

1 REPLY

Kaniz_Fatma
Community Manager

Hi @kavya08, there might be an issue with how the file path is specified in your curl command.

File Path Issue:

  • The --form contents="@s3://bucket/test/file.parquet" part of your curl command specifies the file to be uploaded. curl's @ prefix reads a local file from the machine running the command; it does not resolve s3:// URLs, which is why curl reports error 26 (it cannot open what it treats as a local path).
  • Download the object to the Airflow worker first (for example with aws s3 cp or boto3) and point --form contents at that local path; see the sketch after this list.
  • Ensure the file exists in the specified S3 bucket and the path is accessible.
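A minimal sketch of that two-step version of your task, assuming the AWS CLI is available on the Airflow worker (the /tmp/file.parquet path is just illustrative):

databricks_load_task = BashOperator(
    task_id="upload_to_databricks",
    bash_command="""
    # curl's @ prefix only reads local files, so pull the object out of S3 first.
    aws s3 cp s3://bucket/test/file.parquet /tmp/file.parquet

    curl --location --request POST {{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_HOST')}}/api/2.0/dbfs/put \
        --header "Authorization: Bearer {{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_TOKEN')}}" \
        --form contents="@/tmp/file.parquet" \
        --form path="{{task_instance.xcom_pull(task_ids='get_creds', key='UPLOAD_PATH')}}" \
        --form overwrite="true"
    """,
)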

Authentication and Permissions:

  • Verify that the Databricks token ({{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_TOKEN')}}) is valid and has permission to write to DBFS in the target workspace. (Access to S3 is handled by the worker's AWS credentials, not by this token.)
  • Confirm that the token has the appropriate scope for accessing DBFS; a quick sanity check is shown below.
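For instance, one quick way to confirm the token can reach DBFS at all is to list the DBFS root with the same credentials (the path / is just an example):

curl --header "Authorization: Bearer {{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_TOKEN')}}" \
    "{{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_HOST')}}/api/2.0/dbfs/list?path=/"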

Content-Type:

  • With --form contents=@<path>, curl reads and sends the contents of a local file; without the @ prefix, it sends the literal string. If you're uploading a Parquet file, the value after @ must be a readable local path.
  • The fact that --form contents="test text" works as a workaround confirms the API call itself is fine; the problem is curl failing to read the file it was pointed at.

Alternative Approach:

Instead of curl, consider using the Databricks Python SDK or the Boto3 library (for S3 operations) directly within your Airflow DAG.

For example, you can use the following Python code to upload a file from DBFS to an S3 bucket:

import boto3
from botocore.client import Config

ACCESS_KEY = 'YOUR_ACCESS_KEY'
SECRET_KEY = 'YOUR_SECRET_KEY'
AWS_BUCKET_NAME = "BUCKET_NAME"

s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY,
                    aws_secret_access_key=SECRET_KEY,
                    config=Config(signature_version='s3v4'))
s3.meta.client.upload_file('/dbfs/FileStore/filename.parquet', AWS_BUCKET_NAME, "filename.parquet")

Replace the placeholders (YOUR_ACCESS_KEY, YOUR_SECRET_KEY, BUCKET_NAME, and filename.parquet) with your actual values.
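Since your original goal is the reverse direction (S3 into DBFS), here is a minimal Python sketch of that flow, assuming boto3 and requests are available on the Airflow worker and reusing the same host, token, and upload path you already pull from XCom (the bucket, key, and /tmp path are illustrative):

import boto3
import requests

DATABRICKS_HOST = "https://YOUR_WORKSPACE_HOST"   # value from your get_creds task
DATABRICKS_TOKEN = "YOUR_TOKEN"                   # value from your get_creds task
LOCAL_PATH = "/tmp/file.parquet"

# 1. Download the object from S3 to the local filesystem of the worker.
boto3.client("s3").download_file("bucket", "test/file.parquet", LOCAL_PATH)

# 2. Upload the local file to DBFS through the same /api/2.0/dbfs/put endpoint your curl command calls.
with open(LOCAL_PATH, "rb") as f:
    response = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/dbfs/put",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        files={"contents": f},
        data={"path": "/FileStore/test/file.parquet", "overwrite": "true"},
    )
response.raise_for_status()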

 

Remember to adapt the solution based on your specific environment and requirements. 

If you continue to face issues, please ask for further assistance! 🚀
