Data Engineering

Forum Posts

PunithRaj
by New Contributor
  • 3429 Views
  • 1 reply
  • 1 kudos

How to read a PDF file from Azure Datalake blob storage to Databricks

I have a scenario where I need to read a PDF file from "Azure Datalake blob storage to Databricks", where the connection is done through AD access. Generating a SAS token has been restricted in our environment due to security issues. The below script ca...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @Punith raj, not sure about Azure, but in AWS there is a service known as AWS Textract. Please try to explore that one.

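Since SAS tokens are restricted and access goes through Azure AD, a service-principal OAuth configuration is the usual alternative. Below is a minimal sketch, not the thread's confirmed answer, assuming a service principal with access to the storage account and the pypdf library installed on the cluster; all account, container, and credential values are placeholders.

import io
from pypdf import PdfReader

# OAuth (service principal) configuration for ABFS; values are placeholders.
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<client-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", "<client-secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Read the raw PDF bytes with the binaryFile source, then parse with pypdf.
df = spark.read.format("binaryFile").load(
    "abfss://<container>@<storage-account>.dfs.core.windows.net/path/to/file.pdf")
pdf_bytes = df.select("content").first()["content"]
reader = PdfReader(io.BytesIO(pdf_bytes))
text = "\n".join(page.extract_text() or "" for page in reader.pages)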
magnus778
by New Contributor III
  • 1181 Views
  • 2 replies
  • 4 kudos

Resolved! Error writing parquet to specific container in Azure Data Lake

I'm retrieving two files from container1, transforming and merging them before writing to container2 within the same Storage Account in Azure. I'm mounting container1, unmounting, and mounting container2 before writing. My code for writing the parqu...

Latest Reply
Pat
Honored Contributor III
  • 4 kudos

Hi @Magnus Asperud, 1) mounting container1; 2) you should persist the data somewhere: creating a df doesn't mean that you are reading data from the container and have it accessible after unmounting. Make sure to store this merged data somewhere. Not sure if th...

1 More Replies
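A sketch of the persist-before-unmount pattern the reply above describes, assuming a temporary DBFS folder as the intermediate store; mount names and paths are placeholders.

# Read and merge while container1 is mounted.
df1 = spark.read.parquet("/mnt/container1/file1")
df2 = spark.read.parquet("/mnt/container1/file2")
merged = df1.unionByName(df2)  # stand-in for the actual transformation/merge

# Persist the merged result somewhere that survives the unmount;
# a DataFrame alone is lazy and still points at the mounted files.
merged.write.mode("overwrite").parquet("dbfs:/tmp/merged_staging")

dbutils.fs.unmount("/mnt/container1")
# ... mount container2 here with dbutils.fs.mount(...) ...

# Re-read from staging and write the parquet to container2.
spark.read.parquet("dbfs:/tmp/merged_staging") \
    .write.mode("overwrite").parquet("/mnt/container2/output")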
nancy_g
by New Contributor III
  • 2630 Views
  • 6 replies
  • 5 kudos
Latest Reply
Rostislaw
New Contributor III
  • 5 kudos

Right now the feature seems to be publicly available. It is possible to schedule jobs with ADLS passthrough enabled and not have to provide service principal credentials. However, I ask myself how that works behind the scenes. When working interactive...

5 More Replies
KamKam
by New Contributor
  • 766 Views
  • 2 replies
  • 0 kudos

How to write to a folder in an Azure Data Lake container using Delta?

Hi All, how do you write to a folder in an Azure Data Lake container using Delta? When I run: write_mode = 'overwrite' write_format = 'delta' save_path = '/mnt/container-name/folder-name' df.write \ .mode(write_mode) \ .format(write_format) \ ...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Kamalen Reddy, could you share the error message please?

1 More Replies
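For reference, a minimal sketch of the Delta write the question describes, assuming the container is already mounted at the placeholder path /mnt/container-name.

write_mode = "overwrite"
write_format = "delta"
save_path = "/mnt/container-name/folder-name"  # placeholder mount path

# Write the DataFrame as a Delta table into the mounted folder.
(df.write
   .mode(write_mode)
   .format(write_format)
   .save(save_path))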
MarcJustice
by New Contributor
  • 951 Views
  • 3 replies
  • 3 kudos

Is the promise of a data lake simply about data science, data analytics, and data quality, or can it also be an integral part of core transaction processing?

Upfront, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help, so please feel free to point me in another direc...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Marc Barnett, just a friendly follow-up: do you still need help, or did @Aashita Ramteke's response help you find the solution? Please let us know.

2 More Replies
Bhanu1
by New Contributor III
  • 2855 Views
  • 4 replies
  • 6 kudos

Resolved! Is it possible to mount different Azure Storage Accounts for different clusters in the same workspace?

We have a development and a production data lake. Is it possible to have a production or development cluster access only its respective mounts using init scripts?

Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @Bhanu Patlolla, we haven't heard from you since the last response from @Werner Stinckens and @Hubert Dudek, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be h...

3 More Replies
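DBFS mounts are workspace-wide rather than cluster-scoped, so one common workaround (a sketch, not necessarily the resolution from this thread) is to grant access per cluster through cluster-scoped Spark configuration backed by a secret scope instead of mounts; account, scope, and key names below are placeholders.

# Set on the dev cluster only (via the cluster's Spark config or a notebook),
# so this cluster can reach the dev storage account and nothing else.
storage_account = "devdatalake"  # placeholder; use "proddatalake" on the prod cluster
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get(scope="dev-scope", key="storage-account-key"),  # placeholders
)

# Access data directly over abfss instead of a shared mount point.
df = spark.read.parquet(
    f"abfss://data@{storage_account}.dfs.core.windows.net/some/path")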
Development
by New Contributor III
  • 2692 Views
  • 8 replies
  • 5 kudos

Delta table with 130 columns taking a long time to load

Hi All, we are facing an unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table which has around 135 columns and is also PARTITIONED BY. We are trying to load a data volume of 15 million records, but it's not loading ...

Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma @Parker Temple I found the root cause: it's because of serialization. We are using a UDF to derive a column on the DataFrame; when we try to load data into the Delta table or write data into a Parquet file, we face a serialization issue ...

7 More Replies
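Since the root cause was UDF serialization, the standard fix is to replace the Python UDF with built-in Spark SQL functions, which keep the work inside the JVM instead of serializing every row out to Python. A minimal sketch under that assumption; the derived-column logic here is invented for illustration.

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Slow: a Python UDF forces row-by-row serialization between JVM and Python.
@F.udf(returnType=StringType())
def derive_flag(amount):
    return "HIGH" if amount is not None and amount > 1000 else "LOW"

df_slow = df.withColumn("flag", derive_flag(F.col("amount")))

# Faster: the same logic with built-in column expressions, fully optimizable.
df_fast = df.withColumn(
    "flag",
    F.when(F.col("amount") > 1000, F.lit("HIGH")).otherwise(F.lit("LOW")),
)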
hetadesai
by New Contributor II
  • 4526 Views
  • 3 replies
  • 4 kudos

Resolved! How to download a zip file from an SFTP location, put it into Azure Data Lake, and unzip it there?

I have a zip file on an SFTP location. I want to copy that file from the SFTP location, put it into Azure Data Lake, and unzip it there using a Spark notebook. Please help me solve this.

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @heta desai, did our suggestions help you?

2 More Replies
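One possible approach (a sketch, not the thread's accepted answer): pull the file over SFTP with paramiko, land it in the lake via a mount, and unzip it through the /dbfs fuse path. Host, credentials, and paths are placeholders, and paramiko is assumed to be installed on the cluster.

import zipfile
import paramiko

# 1) Download the zip from SFTP onto the driver's local disk.
transport = paramiko.Transport(("sftp.example.com", 22))  # placeholder host
transport.connect(username="<user>", password="<password>")
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.get("/remote/path/data.zip", "/tmp/data.zip")
sftp.close()
transport.close()

# 2) Copy it into the data lake (assumes a mount at /mnt/datalake).
dbutils.fs.cp("file:/tmp/data.zip", "dbfs:/mnt/datalake/raw/data.zip")

# 3) Unzip in place; zipfile needs a local-style path, hence /dbfs.
with zipfile.ZipFile("/dbfs/mnt/datalake/raw/data.zip") as zf:
    zf.extractall("/dbfs/mnt/datalake/raw/unzipped/")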
Hubert-Dudek
by Esteemed Contributor III
  • 1227 Views
  • 2 replies
  • 13 kudos

Resolved! something like AWS Macie to perform scans on Azure Data Lake

Does anyone know an alternative to AWS Macie in Azure? AWS Macie scans S3 buckets for files with sensitive data (personal addresses, credit cards, etc.). I would like to use the same style of ready-made scanner for Azure Data Lake.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 13 kudos

Thank you, I checked and yes, it is definitely the way to go.

1 More Replies
FMendez
by New Contributor III
  • 9566 Views
  • 4 replies
  • 7 kudos

Resolved! How can you mount an Azure Data Lake (gen2) using abfss and Shared Key?

I wanted to mount an ADLS Gen2 on Databricks and take advantage of the abfss driver, which should be better for large analytical workloads (is that even true in the context of DB?). Setting up OAuth is a bit of a pain, so I wanted to take the simpler approac...

Latest Reply
User16753724663
Valued Contributor
  • 7 kudos

Hi @Fernando Mendez, the below document will help you to mount ADLS Gen2 using abfss: https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.html Could you please check if this helps?

3 More Replies
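A minimal sketch of a Shared Key mount over abfss, assuming the account key is kept in a secret scope and that the runtime in use accepts an account key for abfss mounts; account, container, scope, and key names are placeholders.

storage_account = "<storage-account>"  # placeholder
container = "<container>"              # placeholder

dbutils.fs.mount(
    source=f"abfss://{container}@{storage_account}.dfs.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs={
        # Shared Key auth: the storage account access key, read from a secret scope.
        f"fs.azure.account.key.{storage_account}.dfs.core.windows.net":
            dbutils.secrets.get(scope="<scope>", key="<key-name>"),
    },
)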
SarahDorich
by New Contributor II
  • 1995 Views
  • 3 replies
  • 0 kudos

How to register datasets for Detectron2

I'm trying to run a Detectron2 model in Databricks and cannot figure out how to register my train, val and test datasets. My datasets live in an Azure data lake. I have tried the following with no luck. Any help is appreciated. 1) Specifying full p...

Latest Reply
Thurman
New Contributor II
  • 0 kudos

Register your dataset. Optionally, register metadata for your dataset.

2 More Replies
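A sketch of dataset registration with Detectron2's COCO helper, assuming the annotations are in COCO JSON format and the lake is reachable through the /dbfs fuse mount; all paths and dataset names are placeholders.

from detectron2.data.datasets import register_coco_instances

# Detectron2 uses ordinary file I/O, so point it at /dbfs paths
# rather than abfss:// or adl:// URIs.
for split in ["train", "val", "test"]:
    register_coco_instances(
        name=f"my_dataset_{split}",
        metadata={},
        json_file=f"/dbfs/mnt/datalake/detectron2/{split}/annotations.json",
        image_root=f"/dbfs/mnt/datalake/detectron2/{split}/images",
    )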
User16765131552
by Contributor III
  • 4870 Views
  • 1 reply
  • 0 kudos

Read Excel files and append to make one DataFrame in Databricks from Azure Data Lake without specific file names

I am storing Excel files in Azure Data Lake (Gen1). The filenames follow the same pattern, "2021-06-18T09_00_07ONR_Usage_Dataset", "2021-06-18T09_00_07DSS_Usage_Dataset", etc., depending on the date and time. I want to read all the files in th...

Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

If you are attempting to read all the files in a directory, you should be able to use a wildcard and filter using the extension. For example: df = (spark .read .format("com.crealytics.spark.excel") .option("header", "True") .option("inferSchema", "tr...

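A completed version of that wildcard pattern (a sketch; the com.crealytics.spark.excel library is assumed to be installed, and the mount path and filename pattern are placeholders).

# One read over every workbook matching the shared suffix; the wildcard
# absorbs the varying date/time prefix in the filenames.
df = (spark.read
      .format("com.crealytics.spark.excel")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/mnt/datalake/usage/*_Usage_Dataset*"))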
microamp
by New Contributor II
  • 8838 Views
  • 12 replies
  • 0 kudos

Azure Data Lake Config Issue: No value for dfs.adls.oauth2.access.token.provider found in conf file.

Hi, I have files hosted on an Azure Data Lake Store which I can connect to from Azure Databricks, configured as per the instructions here. I can read JSON files fine; however, I'm getting the following error when I try to read an Avro file: spark.read.format("c...

Latest Reply
User16301467523
New Contributor II
  • 0 kudos

Taras's answer is correct. Because spark-avro is based on the RDD APIs, the properties must be set in the hadoopConfiguration options. Please note these docs for configuration using the RDD API: https://docs.azuredatabricks.net/spark/latest/data-sou...

11 More Replies
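A sketch of setting those ADLS Gen1 OAuth properties on the Hadoop configuration, where RDD-based sources such as spark-avro actually read them; all credential values are placeholders.

hconf = spark.sparkContext._jsc.hadoopConfiguration()

# RDD-based readers do not see spark.conf session settings, so the
# dfs.adls.* properties must go into the Hadoop configuration.
hconf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
hconf.set("dfs.adls.oauth2.client.id", "<client-id>")        # placeholder
hconf.set("dfs.adls.oauth2.credential", "<client-secret>")   # placeholder
hconf.set("dfs.adls.oauth2.refresh.url",
          "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

df = spark.read.format("com.databricks.spark.avro").load(
    "adl://<account>.azuredatalakestore.net/path/to/file.avro")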
juan_perez
by New Contributor
  • 10833 Views
  • 2 replies
  • 0 kudos

Write a DataFrame into Azure Data Lake Storage

I am manipulating some data using Azure Databricks. The data is in Azure Data Lake Storage Gen1. I mounted the data into DBFS, but now, after transforming the data, I would like to write it back into my data lake. To mount the dat...

Latest Reply
PawanShukla
New Contributor III
  • 0 kudos

I am new to Azure Databricks, and I am trying to write the DataFrame to a mounted ADLS location with the below command: dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv("/mnt/<mount-name>")

1 More Replies
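The command above mixes the legacy com.databricks.spark.csv format with the built-in .csv() writer; on current Spark the built-in writer alone is enough. A minimal sketch, keeping the mount name as a placeholder and writing to an illustrative subfolder.

# .csv(path) already implies the CSV format, so no .format(...) is needed.
(dfGPS.write
      .mode("overwrite")
      .option("header", "true")
      .csv("/mnt/<mount-name>/output"))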