
curl: (26) Failed to open/read local data from file/application in DBFS

kavya08
New Contributor

Hi all,

I am trying to upload a Parquet file from S3 to DBFS with an Airflow BashOperator curl command that calls the Databricks DBFS REST API, as shown below.

databricks_load_task = BashOperator(
        task_id="upload_to_databricks",
        bash_command = """
   
        curl --location --request POST {{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_HOST')}}/api/2.0/dbfs/put \
        --header "Authorization: Bearer {{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_TOKEN')}}" \
        --form contents="@s3://bucket/test/file.parquet"\
        --form path="{{task_instance.xcom_pull(task_ids='get_creds', key='UPLOAD_PATH')}}" \
        --form overwrite="true"
        """
)


The Parquet file stores a dataframe result. I am unable to upload the file; it fails with the error below:

curl: (26) Failed to open/read local data from file/application

When I replace the S3 path with plain text (--form contents="test text"), the upload works. Please help me with this.

#dbfs 

1 REPLY

Kaniz
Community Manager

Hi @kavya08, there might be an issue with how the file path is specified in your curl command.

File Path Issue:

  • The --form contents="@s3://bucket/test/file.parquet" part of your curl command tells curl which file to read and upload. The @ prefix only works with a path on the machine running curl; curl cannot fetch an object straight from an s3:// URL, which is why it fails with error 26 ("Failed to open/read local data").
  • Make sure the object exists in the S3 bucket, stage it on the Airflow worker first (for example with aws s3 cp), and point the @ at that local path; a sketch follows this list.
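
For illustration only, here is a minimal sketch of that approach, assuming the AWS CLI is installed on the Airflow worker and /tmp is writable; the S3 path and task id are taken from the snippet above:

from airflow.operators.bash import BashOperator

# The S3 object is copied to the worker's local disk first, so curl's @ reads a local file.
databricks_load_task = BashOperator(
    task_id="upload_to_databricks",
    bash_command="""
        aws s3 cp s3://bucket/test/file.parquet /tmp/file.parquet && \
        curl --location --request POST {{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_HOST')}}/api/2.0/dbfs/put \
        --header "Authorization: Bearer {{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_TOKEN')}}" \
        --form contents="@/tmp/file.parquet" \
        --form path="{{task_instance.xcom_pull(task_ids='get_creds', key='UPLOAD_PATH')}}" \
        --form overwrite="true"
    """,
)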

Authentication and Permissions:

  • Verify that the Databricks token ({{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_TOKEN')}}) is valid and can write to the target DBFS path; a quick sanity check is sketched after this list. (Reading from S3 is governed by the Airflow worker's AWS credentials, not by the Databricks token.)
  • Confirm that the token has the appropriate scope for accessing DBFS.
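
As a quick check, assuming the requests library is available where the DAG runs, listing the DBFS root with the same host and token should return HTTP 200; the host and token values below are placeholders:

import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
DATABRICKS_TOKEN = "dapi..."                                        # placeholder

# List the DBFS root; a 200 response means the host/token pair works for DBFS calls.
resp = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/dbfs/list",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    params={"path": "/"},
)
print(resp.status_code, resp.text)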

Content-Type:

  • With curl's --form syntax, contents=@<path> attaches the file found at that local path, while contents="test text" sends the literal string as the file body.
  • The fact that --form contents="test text" works confirms that authentication and the target path are fine, and that the problem is only that curl cannot read from the @s3:// source.

Alternative Approach:

Instead of curl, consider using the Databricks Python SDK or the Boto3 library (for S3 operations) directly within your Airflow DAG.

For example, you can use the following Python code to upload a file from DBFS to an S3 bucket:

import boto3
from botocore.client import Config

ACCESS_KEY = 'YOUR_ACCESS_KEY'
SECRET_KEY = 'YOUR_SECRET_KEY'
AWS_BUCKET_NAME = "BUCKET_NAME"

# S3 resource with explicit credentials and SigV4 signing
s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, config=Config(signature_version='s3v4'))

# Upload a file from the DBFS FUSE mount (/dbfs) to the S3 bucket
s3.meta.client.upload_file('/dbfs/FileStore/filename.parquet', AWS_BUCKET_NAME, "filename.parquet")

Replace the placeholders (YOUR_ACCESS_KEY, YOUR_SECRET_KEY, BUCKET_NAME, and filename.parquet) with your actual values.
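
For the direction the original question asks about (S3 into DBFS), a minimal sketch, assuming boto3 and requests are available on the Airflow worker, is to download the object to a local file and then post it through the same /api/2.0/dbfs/put multipart form fields that the curl command uses; the bucket, key, host, token, and DBFS path below are placeholders:

import boto3
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
DATABRICKS_TOKEN = "dapi..."                                        # placeholder
LOCAL_PATH = "/tmp/file.parquet"

# 1. Download the Parquet file from S3 to the local filesystem (uses the default AWS credential chain).
s3 = boto3.client("s3")
s3.download_file("bucket", "test/file.parquet", LOCAL_PATH)

# 2. Upload it to DBFS with the same multipart form fields as the curl command above.
with open(LOCAL_PATH, "rb") as f:
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/dbfs/put",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
        files={"contents": f},
        data={"path": "/FileStore/test/file.parquet", "overwrite": "true"},
    )
resp.raise_for_status()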


Remember to adapt the solution based on your specific environment and requirements. 

If you continue to face issues, please ask for further assistance! 🚀
