- 3429 Views
- 1 replies
- 1 kudos
I have a scenario where I need to read a PDF file from Azure Data Lake blob storage into Databricks, where the connection is done through AD access. Generating a SAS token has been restricted in our environment due to security issues. The below script ca...
Latest Reply
Hey @Punith raj​, I'm not sure about Azure, but in AWS there is a service known as AWS Transact. Please try to explore that one.
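A hedged sketch of reading such a file without a SAS token, assuming an Azure AD service principal (app registration) that has been granted access to the storage account; the account, container, tenant, and secret-scope names below are all placeholders:

```python
# Assumes a service principal with "Storage Blob Data Reader" on the account.
# All names (<storage-account>, <client-id>, etc.) are placeholders for illustration.
storage_account = "<storage-account>"
tenant_id = "<tenant-id>"

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
               "<client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
               dbutils.secrets.get(scope="<scope>", key="<key>"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# PDFs are binary, so read them with the binaryFile data source.
pdf_df = (spark.read.format("binaryFile")
          .load(f"abfss://<container>@{storage_account}.dfs.core.windows.net/path/to/file.pdf"))
```

This runs only on a Databricks cluster; the config keys are the standard ABFS OAuth settings, but the access pattern (binary read of a PDF) is an assumption about the asker's use case.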
- 1181 Views
- 2 replies
- 4 kudos
I'm retrieving two files from container1, transforming them and merging before writing to container2 within the same Storage Account in Azure. I'm mounting container1, unmounting it, and mounting container2 before writing. My code for writing the parqu...
Latest Reply
Pat • Honored Contributor III
Hi @Magnus Asperud​,
1. Mounting container1.
2. You should persist the data somewhere; creating a df doesn't mean that you are reading data from the container and will have it accessible after unmounting. Make sure to store this merged data somewhere. Not sure if th...
1 More Replies
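One hedged way to sequence this (mount names and paths are illustrative): materialize the merged result before unmounting container1, since a DataFrame is evaluated lazily and its plan still points at the mount.

```python
# Illustrative only: mount names and paths are placeholders.
df1 = spark.read.parquet("/mnt/container1/file1")
df2 = spark.read.parquet("/mnt/container1/file2")
merged = df1.unionByName(df2)

# Persist before unmounting; otherwise the lazy plan still references /mnt/container1
# and the later write would fail.
merged.write.mode("overwrite").parquet("/tmp/staging/merged")

dbutils.fs.unmount("/mnt/container1")
# ... mount container2 here ...

(spark.read.parquet("/tmp/staging/merged")
      .write.mode("overwrite")
      .parquet("/mnt/container2/output"))
```

Requires a Databricks cluster with both mounts available; the staging location under `/tmp` is an assumption.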
- 766 Views
- 2 replies
- 0 kudos
Hi All, how do I write to a folder in an Azure Data Lake container using Delta? When I run:
write_mode = 'overwrite'
write_format = 'delta'
save_path = '/mnt/container-name/folder-name'
df.write \
  .mode(write_mode) \
  .format(write_format) \
....
Latest Reply
Hi @Kamalen Reddy​, could you share the error message, please?
1 More Replies
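For reference, a typical completion of a Delta write like the one above ends with `.save()` on the mounted path; a minimal sketch, assuming the mount already exists and Delta Lake is available on the cluster:

```python
# Sketch only: the mount point must already exist and the cluster must support Delta.
write_mode = "overwrite"
write_format = "delta"
save_path = "/mnt/container-name/folder-name"

(df.write
   .mode(write_mode)
   .format(write_format)
   .save(save_path))
```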
- 951 Views
- 3 replies
- 3 kudos
Upfront, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help, so please feel free to point me in another direc...
Latest Reply
Hi @Marc Barnett​, just a friendly follow-up. Do you still need help, or did @Aashita Ramteke​'s response help you find the solution? Please let us know.
2 More Replies
by Bhanu1 • New Contributor III
- 2855 Views
- 4 replies
- 6 kudos
We have a development and a production data lake. Is it possible to have a production or development cluster access only respective mounts using init scripts?
Latest Reply
Hi @Bhanu Patlolla​, we haven't heard from you since the last response from @Werner Stinckens​ and @Hubert Dudek​, and I was checking back to see if you have a resolution yet. If you have a solution, please share it with the community, as it can be h...
3 More Replies
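One hedged approach, as an alternative to init scripts, is to key the allowed mounts off a per-cluster environment variable set in the cluster configuration. The variable name (`DEPLOY_ENV`) and the mount lists below are assumptions, not an official Databricks mechanism:

```python
import os

# Hypothetical mapping; adjust container names to match your dev/prod layout.
MOUNTS_BY_ENV = {
    "dev":  ["/mnt/dev-raw", "/mnt/dev-curated"],
    "prod": ["/mnt/prod-raw", "/mnt/prod-curated"],
}

def allowed_mounts(env=None):
    """Return the mount points a cluster in the given environment may use."""
    # DEPLOY_ENV is an assumed variable, e.g. set per cluster in its
    # environment settings; it defaults to "dev" here for safety.
    env = env or os.environ.get("DEPLOY_ENV", "dev")
    return MOUNTS_BY_ENV.get(env, [])
```

A notebook (or init logic) would then only mount the paths returned for its own environment, so a dev cluster never sees production mounts.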
- 2692 Views
- 8 replies
- 5 kudos
Hi All, we are facing one unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. For this we are trying to load around 15 million records, but it's not loading ...
Latest Reply
@Kaniz Fatma​ @Parker Temple​ I found the root cause: it's because of serialization. We are using a UDF to derive a column on the DataFrame; when we try to load data into the Delta table or write data into a Parquet file, we face the serialization issue ...
7 More Replies
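As the reply notes, Python UDFs force every row to be serialized between the JVM and Python workers. Where the derivation allows it, built-in column functions avoid that cost entirely; a hedged sketch (the column names and logic are illustrative, not the asker's actual UDF):

```python
from pyspark.sql import functions as F

# Instead of a Python UDF such as:
#   @F.udf("string")
#   def region(code):
#       return "EU" if code.startswith("4") else "US"
#   df = df.withColumn("region", region("code"))
# prefer built-in expressions, which run entirely in the JVM with no
# per-row Python serialization:
df = df.withColumn(
    "region",
    F.when(F.col("code").startswith("4"), "EU").otherwise("US"),
)
```

Requires a Spark session; shown here only to illustrate the UDF-vs-builtin trade-off behind the serialization issue.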
- 4526 Views
- 3 replies
- 4 kudos
I have a zip file on an SFTP location. I want to copy that file from the SFTP location, put it into Azure Data Lake, and unzip it there using a Spark notebook. Please help me solve this.
Latest Reply
Hi @heta desai​ , Did our suggestions help you?
2 More Replies
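A hedged sketch of the unzip step: fetch the archive bytes over SFTP (e.g. with paramiko's `SFTPClient`, not shown here) and extract them to the Data Lake through a mounted `/dbfs/...` path. Only the extraction helper is concrete; the SFTP call and all paths are assumptions.

```python
import io
import zipfile

def extract_zip_bytes(zip_bytes, target_dir):
    """Extract an in-memory zip archive into target_dir and return the member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        zf.extractall(target_dir)
        return zf.namelist()

# Usage sketch (the SFTP fetch is assumed, e.g. via a paramiko SFTPClient):
#   data = sftp.open("/remote/archive.zip").read()
#   extract_zip_bytes(data, "/dbfs/mnt/datalake/unzipped")
```

The `/dbfs/...` form of the mount path lets plain Python I/O (like `zipfile`) write into the lake from the driver node.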
- 1227 Views
- 2 replies
- 13 kudos
Does anyone know an alternative for AWS Macie in Azure? AWS Macie scans S3 buckets for files with sensitive data (personal addresses, credit cards, etc.). I would like to use a similar ready-made scanner for Azure Data Lake.
Latest Reply
Thank you, I checked and yes, it is definitely the way to go.
1 More Replies
- 9566 Views
- 4 replies
- 7 kudos
I wanted to mount ADLS Gen2 on Databricks and take advantage of the abfss driver, which should be better for large analytical workloads (is that even true in the context of DB?). Setting up OAuth is a bit of a pain, so I wanted to take the simpler approac...
Latest Reply
Hi @Fernando Mendez​, the document below will help you mount ADLS Gen2 using abfss: https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.html Could you please check if this helps?
3 More Replies
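For context, the mount described in that document boils down to passing OAuth settings to `dbutils.fs.mount`; a hedged sketch with placeholder names throughout (secrets should come from a secret scope, never be inlined):

```python
# Placeholders throughout; this is a sketch of the documented OAuth mount pattern.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="<scope>", key="<key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)
```

Note that a mount uses whatever credentials it was created with, regardless of who reads it later, which is part of why direct abfss access with OAuth is often preferred for large workloads.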
- 1995 Views
- 3 replies
- 0 kudos
I'm trying to run a Detectron2 model in Databricks and cannot figure out how to register my train, val and test datasets. My datasets live in an Azure data lake. I have tried the following with no luck. Any help is appreciated.
1) Specifying full p...
Latest Reply
Register your dataset. Optionally, register metadata for your dataset.
2 More Replies
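If the annotations are in COCO format, one hedged option is Detectron2's `register_coco_instances`, pointing it at the mount's local-style `/dbfs/...` path so ordinary Python file I/O works. All dataset names and paths below are assumptions:

```python
from detectron2.data.datasets import register_coco_instances

# Use the /dbfs/... form of the mount so Detectron2 can open files with plain
# Python I/O; dbfs:/ or abfss:// URIs will not work here. Paths are placeholders.
for split in ("train", "val", "test"):
    register_coco_instances(
        f"my_dataset_{split}",          # hypothetical dataset name
        {},                             # optional metadata
        f"/dbfs/mnt/datalake/annotations/{split}.json",
        f"/dbfs/mnt/datalake/images/{split}",
    )
```

Requires `detectron2` installed on the cluster and the lake mounted under `/mnt/datalake`; this is a sketch of the registration pattern, not the asker's exact setup.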
- 4870 Views
- 1 replies
- 0 kudos
I am storing Excel files in Azure Data Lake (Gen1). The filenames follow the same pattern, "2021-06-18T09_00_07ONR_Usage_Dataset", "2021-06-18T09_00_07DSS_Usage_Dataset", etc., depending on the date and time. I want to read all the files in th...
Latest Reply
If you are attempting to read all the files in a directory, you should be able to use a wildcard and filter using the extension. For example:
df = (spark
      .read
      .format("com.crealytics.spark.excel")
      .option("header", "True")
      .option("inferSchema", "tr...
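Put together, a hedged version of that pattern with a wildcard path might look like this; the mount path, the file extension, and the option values are assumptions:

```python
# Wildcard matches e.g. "2021-06-18T09_00_07ONR_Usage_Dataset.xlsx"; the mount
# path and extension are illustrative, and the cluster needs the
# com.crealytics:spark-excel library installed.
df = (spark.read
      .format("com.crealytics.spark.excel")
      .option("header", "True")
      .option("inferSchema", "True")
      .load("/mnt/datalake/*_Usage_Dataset*.xlsx"))
```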
- 8838 Views
- 12 replies
- 0 kudos
Hi, I have files hosted on an Azure Data Lake Store which I can connect to from Azure Databricks, configured as per the instructions here. I can read JSON files fine; however, I'm getting the following error when I try to read an Avro file. spark.read.format("c...
Latest Reply
Taras's answer is correct. Because spark-avro is based on the RDD APIs, the properties must be set in the hadoopConfiguration options.
Please note these docs for configuration using the RDD API: https://docs.azuredatabricks.net/spark/latest/data-sou...
11 More Replies
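In PySpark, that means setting the ADLS credentials on the SparkContext's Hadoop configuration rather than via `spark.conf`; a hedged sketch for ADLS Gen1 with the era-appropriate spark-avro package (all IDs and paths are placeholders):

```python
# Sketch only: credentials and paths are placeholders, and the cluster needs
# the spark-avro package of that era attached.
hconf = spark.sparkContext._jsc.hadoopConfiguration()

hconf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
hconf.set("dfs.adls.oauth2.client.id", "<application-id>")
hconf.set("dfs.adls.oauth2.credential",
          dbutils.secrets.get(scope="<scope>", key="<key>"))
hconf.set("dfs.adls.oauth2.refresh.url",
          "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

df = (spark.read.format("com.databricks.spark.avro")
      .load("adl://<account>.azuredatalakestore.net/path/to/file.avro"))
```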
- 10833 Views
- 2 replies
- 0 kudos
It happens that I am manipulating some data using Azure Databricks. Such data is in an Azure Data Lake Storage Gen1. I mounted the data into DBFS, but now, after transforming the data I would like to write it back into my data lake.
To mount the dat...
Latest Reply
I am new to Azure Databricks, and I am trying to write the DataFrame to a mounted ADLS file with the command below:
dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").csv("/mnt/<mount-name>")
1 More Replies
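A hedged variant of that write using the built-in CSV writer (the "com.databricks.spark.csv" format name dates from the external spark-csv package and is no longer needed); `coalesce(1)` is optional and only there to produce a single output file, at the cost of funneling the data through one task:

```python
# The output subdirectory name is illustrative; the mount placeholder is kept as-is.
(dfGPS.coalesce(1)
      .write
      .mode("overwrite")
      .option("header", "true")
      .csv("/mnt/<mount-name>/output"))
```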