cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

PunithRaj
by New Contributor
  • 3195 Views
  • 1 replies
  • 1 kudos

How to read a PDF file from Azure Datalake blob storage to Databricks

I have a scenario where I need to read a pdf file from "Azure Datalake blob storage to Databricks", where connection is done through AD access.Generating the SAS token has been restricted in our environment due to security issues. The below script ca...

  • 3195 Views
  • 1 replies
  • 1 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Hey @Punith raj​ ,Not sure about Azure but in AWS there is one service known as AWS Transact Please try to explore that onces

  • 1 kudos
magnus778
by New Contributor III
  • 1051 Views
  • 2 replies
  • 4 kudos

Resolved! Error writing parquet to specific container in Azure Data Lake

I'm retrieving two files from container1, transforming them and merging before writing to a container2 within the same Storage Account in Azure. I'm mounting container1, unmouting and mounting countainer2 before writing. My code for writing the parqu...

  • 1051 Views
  • 2 replies
  • 4 kudos
Latest Reply
Pat
Honored Contributor III
  • 4 kudos

Hi @Magnus Asperud​ ,1 mounting container12 you should persist the data somewhere, creating df doesnt mean that you are reading data from container and have it accessible after unmounting. Make sure to store this merged data somewhere. Not sure if th...

  • 4 kudos
1 More Replies
nancy_g
by New Contributor III
  • 2389 Views
  • 6 replies
  • 5 kudos
  • 2389 Views
  • 6 replies
  • 5 kudos
Latest Reply
Rostislaw
New Contributor III
  • 5 kudos

Right now the feature seems to be public available. It is possible to schedule jobs with ADLS passthough enabled and do not have to provide service principal credentials.However I ask myself how that works behind the scenses. When working interactive...

  • 5 kudos
5 More Replies
KamKam
by New Contributor
  • 682 Views
  • 2 replies
  • 0 kudos

How to write to a folder in a Azure Data Lake container using Delta?

Hi All,How to write to a folder in a Azure Data Lake container using Delta?When I run:write_mode = 'overwrite' write_format = 'delta' save_path = '/mnt/container-name/folder-name'   df.write \ .mode(write_mode) \ .format(write_format) \ ....

  • 682 Views
  • 2 replies
  • 0 kudos
Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Kamalen Reddy​ ,Could you share the error message please?

  • 0 kudos
1 More Replies
MarcJustice
by New Contributor
  • 868 Views
  • 3 replies
  • 3 kudos

Is the promise of a data lake simply about data science, data analytics and data quality or can it also be an integral part of core transaction processing also?

Upfront, I want to let you know that I'm not a veteran data jockey, so I apologize if this topic has been covered already or is simply too basic or narrow for this community. That said, I do need help so please feel free to point me in another direc...

  • 868 Views
  • 3 replies
  • 3 kudos
Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Marc Barnett​ , Just a friendly follow-up. Do you still need help, or @Aashita Ramteke​ 's response help you to find the solution? Please let us know.

  • 3 kudos
2 More Replies
Bhanu1
by New Contributor III
  • 2598 Views
  • 4 replies
  • 6 kudos

Resolved! Is it possible to mount different Azure Storage Accounts for different clusters in the same workspace?

We have a development and a production data lake. Is it possible to have a production or development cluster access only respective mounts using init scripts?

  • 2598 Views
  • 4 replies
  • 6 kudos
Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @Bhanu Patlolla​ ​, We haven’t heard from you on the last response from @Werner Stinckens​ and @Hubert Dudek​  and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be h...

  • 6 kudos
3 More Replies
Development
by New Contributor III
  • 2381 Views
  • 8 replies
  • 5 kudos

Delta Table with 130 columns taking time

Hi All,We are facing one un-usual issue while loading data into Delta table using Spark SQL. We have one delta table which have around 135 columns and also having PARTITIONED BY. For this trying to load 15 millions of data volume but its not loading ...

  • 2381 Views
  • 8 replies
  • 5 kudos
Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma​ @Parker Temple​  I found an root cause its because of serialization. we are using UDF to drive an column on dataframe, when we are trying to load data into delta table or write data into parquet file we are facing  serialization issue ....

  • 5 kudos
7 More Replies
hetadesai
by New Contributor II
  • 4349 Views
  • 3 replies
  • 4 kudos

Resolved! How to download zip file from SFTP location and put that file into Azure Data Lake and unzip there ?

I have zip file on SFTP location. I want to copy that file from SFTP location and put it into Azure Data lake and want to unzip there using spark notebook. Please help me to solve this.

  • 4349 Views
  • 3 replies
  • 4 kudos
Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @heta desai​ , Did our suggestions help you?

  • 4 kudos
2 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 1091 Views
  • 2 replies
  • 13 kudos

Resolved! something like AWS Macie to perform scans on Azure Data Lake

Does anyone know alternative for AWS Macie in Azure?AWS Macie scan S3 buckets for files with sensitive data (personal address, credit card etc...).I would like to use the same style ready scanner for Azure Data Lake.

  • 1091 Views
  • 2 replies
  • 13 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 13 kudos

thank you, I checked and yes it is definitely the way to go

  • 13 kudos
1 More Replies
FMendez
by New Contributor III
  • 9069 Views
  • 4 replies
  • 7 kudos

Resolved! How can you mount an Azure Data Lake (gen2) using abfss and Shared Key?

I wanted to mount a ADLG2 on databricks and take advantage on the abfss driver which should be better for large analytical workloads (is that even true in the context of DB?).Setting an OAuth is a bit of a pain so I wanted to take the simpler approac...

  • 9069 Views
  • 4 replies
  • 7 kudos
Latest Reply
User16753724663
Valued Contributor
  • 7 kudos

Hi @Fernando Mendez​ ,The below document will help you to mount the ADLS gen2 using abfss:https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-get-started.htmlCould you please check if this helps?

  • 7 kudos
3 More Replies
SarahDorich
by New Contributor II
  • 1850 Views
  • 3 replies
  • 0 kudos

How to register datasets for Detectron2

I'm trying to run a Detectron2 model in Databricks and cannot figure out how to register my train, val and test datasets. My datasets live in an Azure data lake. I have tried the following with no luck. Any help is appreciated. 1) Specifying full p...

  • 1850 Views
  • 3 replies
  • 0 kudos
Latest Reply
Thurman
New Contributor II
  • 0 kudos

Register your dataset Optionally, register metadata for your dataset.

  • 0 kudos
2 More Replies
User16765131552
by Contributor III
  • 4600 Views
  • 1 replies
  • 0 kudos

Read excel files and append to make one data frame in Databricks from azure data lake without specific file names

I am storing excel files in Azure data lake (gen 1). They follow filenames follow the same pattern "2021-06-18T09_00_07ONR_Usage_Dataset", "2021-06-18T09_00_07DSS_Usage_Dataset", etc. depending on the date and time. I want to read all the files in th...

  • 4600 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

If you are attempting to read all the files in a directory you should be able to use a wild card and filter using the extension. For example: df = (spark .read .format("com.crealytics.spark.excel") .option("header", "True") .option("inferSchema", "tr...

  • 0 kudos
microamp
by New Contributor II
  • 8297 Views
  • 12 replies
  • 0 kudos

Azure Data Lake Config Issue: No value for dfs.adls.oauth2.access.token.provider found in conf file.

Hi,I have files hosted on an Azure Data Lake Store which I can connect from Azure Databricks configured as per instructions here.I can read JSON files fine, however, I'm getting the following error when I try to read an Avro file.spark.read.format("c...

  • 8297 Views
  • 12 replies
  • 0 kudos
Latest Reply
User16301467523
New Contributor II
  • 0 kudos

Taras's answer is correct. Because spark-avro is based on the RDD APIs, the properties must be set in the hadoopConfiguration options. Please note these docs for configuration using the RDD API: https://docs.azuredatabricks.net/spark/latest/data-sou...

  • 0 kudos
11 More Replies
juan_perez
by New Contributor
  • 10436 Views
  • 2 replies
  • 0 kudos

Write data Frame into Azure Data Lake Storage

It happens that I am manipulating some data using Azure Databricks. Such data is in an Azure Data Lake Storage Gen1. I mounted the data into DBFS, but now, after transforming the data I would like to write it back into my data lake. To mount the dat...

  • 10436 Views
  • 2 replies
  • 0 kudos
Latest Reply
PawanShukla
New Contributor III
  • 0 kudos

I am new in Azure Data Bricks..and I am trying to write the Data frame in mounted ADLS file. But in below command dfGPS.write.mode("overwrite").format("com.databricks.spark.csv").option("header","true").csv("/mnt/<mount-name>")

  • 0 kudos
1 More Replies
Labels