- 5559 Views
- 1 reply
- 1 kudos
I have a scenario where I need to read a PDF file from Azure Data Lake blob storage into Databricks, where the connection is made through AD access. Generating a SAS token has been restricted in our environment due to security issues. The below script ca...
Latest Reply
Hey @Punith raj, not sure about Azure, but in AWS there is a service known as AWS Transact. Please try to explore that one.
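On Azure specifically, one SAS-free option is to authenticate to ADLS Gen2 with an Azure AD service principal and pull the PDF bytes through Spark's binaryFile source. This is only a minimal sketch, not the poster's script; every <placeholder>, the secret scope, and the path are hypothetical.
storage_account = "<storage-account>"

# Azure AD (service principal) auth for the abfss driver; no SAS token involved
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", "<application-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
               dbutils.secrets.get(scope="<scope>", key="<key>"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Read the PDF as raw bytes (binaryFile is available on recent Databricks runtimes)
pdf_df = (spark.read
          .format("binaryFile")
          .load(f"abfss://<container>@{storage_account}.dfs.core.windows.net/<path>/file.pdf"))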
- 2347 Views
- 2 replies
- 4 kudos
I'm retrieving two files from container1, transforming and merging them before writing to container2 within the same Storage Account in Azure. I'm mounting container1, unmounting it, and mounting container2 before writing. My code for writing the parqu...
Latest Reply
Pat • Honored Contributor III
Hi @Magnus Asperud, 1. mounting container1, 2. you should persist the data somewhere; creating a df doesn't mean that you are reading data from the container and have it accessible after unmounting. Make sure to store this merged data somewhere. Not sure if th...
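A rough sketch of that advice, assuming Parquet inputs and placeholder mount paths; the point is that the merged result is written out before container1 is unmounted.
# Read and merge while container1 is mounted (hypothetical paths)
df1 = spark.read.parquet("/mnt/container1/file1")
df2 = spark.read.parquet("/mnt/container1/file2")
merged = df1.unionByName(df2)

# Persist the merged data somewhere that survives the unmount, e.g. a DBFS staging path
merged.write.mode("overwrite").parquet("dbfs:/tmp/staging/merged")

dbutils.fs.unmount("/mnt/container1")
# ... mount container2 here, then copy the staged result into it ...
(spark.read.parquet("dbfs:/tmp/staging/merged")
      .write.mode("overwrite").parquet("/mnt/container2/output"))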
1 More Reply
- 1378 Views
- 2 replies
- 0 kudos
Hi All, how do I write to a folder in an Azure Data Lake container using Delta? When I run:
write_mode = 'overwrite'
write_format = 'delta'
save_path = '/mnt/container-name/folder-name'
df.write \
.mode(write_mode) \
.format(write_format) \
....
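For reference, a minimal complete version of that write might look as follows, assuming the mount point already exists; the save path is the same placeholder used above.
write_mode = 'overwrite'
write_format = 'delta'
save_path = '/mnt/container-name/folder-name'

# Write the DataFrame as a Delta table to the mounted folder
(df.write
   .mode(write_mode)
   .format(write_format)
   .save(save_path))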
Latest Reply
Hi @Kamalen Reddy, could you share the error message, please?
1 More Reply
- 1695 Views
- 2 replies
- 3 kudos
Upfront, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help so please feel free to point me in another direc...
Latest Reply
@Marc Barnett, Databricks’ Lakehouse architecture is the ideal data architecture for data-driven organizations. It combines the best qualities of data warehouses and data lakes to provide a single solution for all major data workloads and supports ...
1 More Reply
- 5422 Views
- 5 replies
- 5 kudos
Hi All, we are facing an unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load about 15 million rows into it, but it's not loading ...
Latest Reply
@Kaniz Fatma @Parker Temple I found the root cause: it's because of serialization. We are using a UDF to derive a column on the DataFrame; when we try to load the data into the Delta table or write it to a Parquet file, we hit a serialization issue ....
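To illustrate the pattern that reply describes, here is a hedged sketch of deriving a column with a Python UDF (which forces every row through Python serialization) versus a built-in function that stays in the JVM; the column names are hypothetical.
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# UDF version: each row is serialized to a Python worker and back
to_upper = F.udf(lambda s: s.upper() if s else None, StringType())
df_udf = df.withColumn("name_upper", to_upper("name"))

# Built-in version: no Python serialization overhead
df_builtin = df.withColumn("name_upper", F.upper(F.col("name")))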
4 More Replies
by Bhanu1 • New Contributor III
- 4694 Views
- 3 replies
- 6 kudos
We have a development and a production data lake. Is it possible to have a production or development cluster access only its respective mounts using init scripts?
Latest Reply
Yes, it is possible. Additionally, a mount is permanent and done in DBFS, so it is enough to run it one time. You can have, for example, the following configuration: in Azure you can have 2 Databricks workspaces, and the cluster in every workspace can have an env variable that is...
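One hedged way to wire that up: let a cluster-level environment variable decide which container gets mounted. Every name below is a placeholder, and the OAuth dict is abbreviated (the full set of keys is in the Databricks mounting docs).
import os

# Hypothetical cluster environment variable, e.g. ENVIRONMENT=dev or ENVIRONMENT=prod
env = os.environ.get("ENVIRONMENT", "dev")
container = "dev-data" if env == "dev" else "prod-data"

# Placeholder for the usual OAuth/service-principal settings
configs = {"fs.azure.account.auth.type": "OAuth"}  # plus the remaining keys from the docs

# Mount only the container matching this cluster's environment
dbutils.fs.mount(
    source=f"abfss://{container}@<storage-account>.dfs.core.windows.net/",
    mount_point=f"/mnt/{container}",
    extra_configs=configs,
)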
2 More Replies
- 6043 Views
- 1 reply
- 3 kudos
I have a zip file on an SFTP location. I want to copy that file from the SFTP location, put it into Azure Data Lake, and unzip it there using a Spark notebook. Please help me solve this.
Latest Reply
I would go with @Kaniz Fatma's approach: download the data in Data Factory and, once it is downloaded, trigger a Databricks Spark notebook on success. With Spark you can also read compressed data, so you may not even need a separate unzip step.
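For the unzip step itself, a small sketch assuming the data lake container is already mounted and therefore reachable through the local /dbfs fuse path; all paths and the CSV format are hypothetical.
import zipfile

zip_path = "/dbfs/mnt/datalake/incoming/archive.zip"   # hypothetical location of the copied file
extract_dir = "/dbfs/mnt/datalake/extracted/"

# Unzip on the driver using plain Python
with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(extract_dir)

# Spark can then read the extracted files (gzip/bz2 text files it can read without unzipping)
df = spark.read.option("header", "true").csv("dbfs:/mnt/datalake/extracted/")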
- 2439 Views
- 2 replies
- 13 kudos
Does anyone know an alternative for AWS Macie in Azure? AWS Macie scans S3 buckets for files with sensitive data (personal addresses, credit cards, etc.). I would like to use the same style of ready-made scanner for Azure Data Lake.
Latest Reply
Thank you, I checked and yes, it is definitely the way to go.
1 More Reply
- 15072 Views
- 3 replies
- 6 kudos
I wanted to mount ADLS Gen2 on Databricks and take advantage of the abfss driver, which should be better for large analytical workloads (is that even true in the context of DB?). Setting up OAuth is a bit of a pain, so I wanted to take the simpler approac...
Latest Reply
Hi @Fernando Mendez, the below document will help you mount ADLS Gen2 using abfss: https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.html Could you please check if this helps?
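For completeness, a hedged sketch of the abfss mount described in that document; every <placeholder> and the secret scope are hypothetical.
# OAuth (service principal) settings for the abfss driver
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope>", key="<key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the ADLS Gen2 container through abfss
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)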
2 More Replies
- 3459 Views
- 3 replies
- 0 kudos
I'm trying to run a Detectron2 model in Databricks and cannot figure out how to register my train, val and test datasets. My datasets live in an Azure data lake. I have tried the following with no luck. Any help is appreciated.
1) Specifying full p...
Latest Reply
Register your dataset. Optionally, register metadata for your dataset.
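One hedged possibility, assuming the annotations are in COCO format and the data lake container is mounted so the files are reachable through the local /dbfs fuse path; the dataset names and paths below are hypothetical.
from detectron2.data.datasets import register_coco_instances

# Register train/val sets from the mounted data lake (placeholder paths)
register_coco_instances(
    "my_train", {},
    "/dbfs/mnt/datalake/annotations/train.json",
    "/dbfs/mnt/datalake/images/train",
)
register_coco_instances(
    "my_val", {},
    "/dbfs/mnt/datalake/annotations/val.json",
    "/dbfs/mnt/datalake/images/val",
)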
2 More Replies
- 7186 Views
- 1 reply
- 0 kudos
I am storing Excel files in Azure Data Lake (Gen1). The filenames follow the same pattern, "2021-06-18T09_00_07ONR_Usage_Dataset", "2021-06-18T09_00_07DSS_Usage_Dataset", etc., depending on the date and time. I want to read all the files in th...
Latest Reply
If you are attempting to read all the files in a directory, you should be able to use a wildcard and filter using the extension. For example:
df = (spark
  .read
  .format("com.crealytics.spark.excel")
  .option("header", "True")
  .option("inferSchema", "tr...
- 13052 Views
- 12 replies
- 0 kudos
Hi, I have files hosted on an Azure Data Lake Store which I can connect to from Azure Databricks, configured as per the instructions here. I can read JSON files fine; however, I'm getting the following error when I try to read an Avro file: spark.read.format("c...
Latest Reply
Taras's answer is correct. Because spark-avro is based on the RDD APIs, the properties must be set in the hadoopConfiguration options.
Please note these docs for configuration using the RDD API: https://docs.azuredatabricks.net/spark/latest/data-sou...
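As an illustration of setting those properties from a notebook, a hedged sketch using the ADLS Gen1 configuration keys; all values are placeholders.
# Put the ADLS credentials on the Hadoop configuration so RDD-based readers can see them
hc = spark.sparkContext._jsc.hadoopConfiguration()
hc.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
hc.set("dfs.adls.oauth2.client.id", "<application-id>")
hc.set("dfs.adls.oauth2.credential", dbutils.secrets.get(scope="<scope>", key="<key>"))
hc.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Then read the Avro file (era-appropriate spark-avro package name)
df = spark.read.format("com.databricks.spark.avro").load("adl://<account>.azuredatalakestore.net/<path>/file.avro")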
11 More Replies
- 14571 Views
- 2 replies
- 0 kudos
It happens that I am manipulating some data using Azure Databricks. Such data is in an Azure Data Lake Storage Gen1. I mounted the data into DBFS, but now, after transforming the data I would like to write it back into my data lake.
To mount the dat...
Latest Reply
I am new to Azure Databricks, and I am trying to write the DataFrame to a mounted ADLS location, but with the below command:
dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv("/mnt/<mount-name>")
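A slightly cleaner hedged variant of that command: .csv() already implies the CSV format, so the explicit .format(...) call can be dropped, and writing to a sub-folder (placeholder below) keeps the mount root tidy.
(dfGPS.write
      .mode("overwrite")
      .option("header", "true")
      .csv("/mnt/<mount-name>/output"))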
1 More Reply