Hi @kavya08, there might be an issue with how the file path is specified in your curl command.
File Path Issue:
- The --form contents="@s3://bucket/test/file.parquet" part of your curl command is meant to attach the file being uploaded, but curl's @ prefix only reads files from the local filesystem; it cannot resolve an s3:// URI, so the Parquet file is never actually read.
- Make sure the object exists in the S3 bucket, then download it to the machine running the command and point curl at that local copy (see the sketch after this list).
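Because curl's @ prefix only reads local files, one option is to pull the object down to the machine running the command first and then point curl at the local copy. A minimal sketch, assuming boto3 and AWS credentials are available there; the bucket, key, and local path below are placeholders:

import boto3

# Placeholder bucket/key/local path; assumes AWS credentials are configured for boto3.
s3 = boto3.client('s3')
s3.download_file('bucket', 'test/file.parquet', '/tmp/file.parquet')
# /tmp/file.parquet can now be passed to curl as --form contents=@/tmp/file.parquet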
Authentication and Permissions:
- Verify that the Databricks token ({{task_instance.xcom_pull(task_ids='get_creds', key='DATABRICKS_TOKEN')}}) resolves to a real value at runtime and has permission to write to DBFS. Note that the token only authenticates against the Databricks REST API; access to the S3 object itself depends on your AWS credentials.
- Confirm that the token has the appropriate permissions for accessing DBFS (a quick sanity check is shown after this list).
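A quick way to rule out token problems is to call a read-only DBFS endpoint with the same token. A minimal sketch, assuming the requests library and a placeholder workspace URL:

import requests

DATABRICKS_HOST = 'https://<your-workspace>.cloud.databricks.com'  # placeholder
token = 'DATABRICKS_TOKEN'  # e.g. the value pulled from XCom in your DAG

resp = requests.get(
    f'{DATABRICKS_HOST}/api/2.0/dbfs/list',
    headers={'Authorization': f'Bearer {token}'},
    params={'path': '/FileStore'},
)
# 200 means the token is valid and can read DBFS; 401/403 points to token or permission issues
print(resp.status_code, resp.text)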
Content-Type:
- With --form contents=@<path>, curl reads and sends the bytes of the referenced local file, so the path must point to a file curl can actually open on the machine where the command runs.
- If --form contents="test text" works as a workaround, that confirms the API call and authentication are fine and the problem lies in how the file contents are being read (see the sketch after this list).
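If you want full control over what gets sent, you can bypass curl's multipart handling and post the file bytes yourself as base64 through the DBFS Put API. A minimal sketch, assuming a local copy of the file, the requests library, and placeholder host/paths; note that this JSON form of the API is limited to files of roughly 1 MB (larger files need the streaming create/add-block/close endpoints):

import base64
import requests

DATABRICKS_HOST = 'https://<your-workspace>.cloud.databricks.com'  # placeholder
token = 'DATABRICKS_TOKEN'

with open('/tmp/file.parquet', 'rb') as f:  # local copy of the Parquet file
    encoded = base64.b64encode(f.read()).decode('utf-8')

resp = requests.post(
    f'{DATABRICKS_HOST}/api/2.0/dbfs/put',
    headers={'Authorization': f'Bearer {token}'},
    json={'path': '/FileStore/test/file.parquet', 'contents': encoded, 'overwrite': True},
)
resp.raise_for_status()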
Alternative Approach:
Instead of curl, consider using the Databricks Python SDK or the Boto3 library (for S3 operations) directly within your Airflow DAG.
For example, you can use the following Python code to upload a file from DBFS to an S3 bucket:
import boto3
from botocore.client import Config

ACCESS_KEY = 'YOUR_ACCESS_KEY'
SECRET_KEY = 'YOUR_SECRET_KEY'
AWS_BUCKET_NAME = 'BUCKET_NAME'

# /dbfs/... paths are only available where DBFS is mounted (e.g. on a Databricks cluster)
s3 = boto3.resource('s3', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, config=Config(signature_version='s3v4'))
s3.meta.client.upload_file('/dbfs/FileStore/filename.parquet', AWS_BUCKET_NAME, 'filename.parquet')
Replace the placeholders (YOUR_ACCESS_KEY, YOUR_SECRET_KEY, BUCKET_NAME, and filename.parquet) with your actual values.
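Alternatively, if you prefer the Databricks Python SDK mentioned above, an upload in the other direction (a local file on the worker into DBFS) can look roughly like this. This is a sketch assuming the databricks-sdk package is installed and DATABRICKS_HOST / DATABRICKS_TOKEN are set in the environment; the paths are placeholders:

from databricks.sdk import WorkspaceClient

# Assumes databricks-sdk is installed and host/token are picked up from the environment.
w = WorkspaceClient()
with open('/tmp/file.parquet', 'rb') as f:  # placeholder local path
    w.dbfs.upload('/FileStore/test/file.parquet', f, overwrite=True)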
Remember to adapt the solution based on your specific environment and requirements.
If you continue to face issues, please reach out for further assistance!