Write data Frame into Azure Data Lake Storage
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-03-2018 07:00 AM
It happens that I am manipulating some data using Azure Databricks. Such data is in an Azure Data Lake Storage Gen1. I mounted the data into DBFS, but now, after transforming the data I would like to write it back into my data lake.
To mount the data I used the following:
configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential", "dfs.adls.oauth2.client.id": "<your-service-client-id>", "dfs.adls.oauth2.credential": "<your-service-credentials>", "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/<your-directory-id>/oauth2/token"}dbutils.fs.mount( source = "adl://<your-data-lake-store-account-name>.azuredatalakestore.net/<your-directory-name>", mount_point = "/mnt/<mount-name>", extra_configs = configs)
I want to write back a .csv file. For this task I am using the following line
dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").csv("adl://<your-data-lake-store-account-name>.azuredatalakestore.net/<your-directory-name>")
However, I get the following error:
IllegalArgumentException: u'No value for dfs.adls.oauth2.access.token.provider found in conf file.'
Any piece of code, suggestions that can help me? Or link that walks me through.
Thanks.
- Labels:
-
Azure data lake
-
Azure databricks
-
Pyspark
-
Python
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-04-2018 04:12 AM
Hi there,
Did you try writing to your mount point location?
- dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv("/mnt/<mount-name>")
There is a related post here to configure the appropriate Hadoop properties if you want to use ADL syntax.
cheers,
Andrew
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-29-2018 03:36 AM
I am new in Azure Data Bricks..and I am trying to write the Data frame in mounted ADLS file. But in below command
dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv("/mnt/<mount-name>")

