Data Engineering

Forum Posts

az38
by New Contributor II
  • 3206 Views
  • 2 replies
  • 3 kudos

load files filtered by last_modified in PySpark

Hi, community! What do you think is the best way to load from Azure ADLS (actually, the filesystem doesn't matter) into a df only the files modified after some point in time? Is there any function like input_file_name() but for last_modified to use it in a w...

Latest Reply
venkatcrc
New Contributor III
  • 3 kudos

_metadata will provide the file modification timestamp. I tried it on dbfs but am not sure about ADLS. https://docs.databricks.com/ingestion/file-metadata-column.html

1 More Replies
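
For reference, a minimal sketch of the _metadata approach from the reply above. The storage account, container and cutoff value are placeholders, and it assumes a Databricks notebook where spark is already defined.

# Minimal sketch: keep only input files modified after a cutoff, via the
# hidden _metadata column (placeholders throughout; the notebook provides `spark`).
from pyspark.sql import functions as F

cutoff = "2023-01-01T00:00:00"  # placeholder cutoff

df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("abfss://container@storageaccount.dfs.core.windows.net/input/")
    .select("*", "_metadata")  # _metadata must be referenced explicitly
    .where(F.col("_metadata.file_modification_time") >= F.lit(cutoff).cast("timestamp"))
)
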
Krish1
by New Contributor II
  • 2340 Views
  • 4 replies
  • 0 kudos

Error while mounting ADLS in python using AccountKey

I'm using the code below with an account key to mount ADLS in Python, but I'm running into an error: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.lang.IllegalArgumentException: The String is not a valid Base64-encoded string. Can you pleas...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Krish Lam, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

3 More Replies
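
For context on the question above, a typical account-key mount is sketched below; all names are placeholders and the secret scope is assumed. The Base64 complaint usually means the value passed as the key is not the raw account key from the portal, for example a SAS token or a truncated copy.

# Sketch of a blob-endpoint mount with an account key (placeholder names).
# Assumes a Databricks notebook where `dbutils` is available and a secret
# scope "my-scope" exists; adjust to the dfs endpoint for abfss paths.
storage_account = "mystorageaccount"
container = "mycontainer"
account_key = dbutils.secrets.get("my-scope", "storage-account-key")

dbutils.fs.mount(
    source=f"wasbs://{container}@{storage_account}.blob.core.windows.net/",
    mount_point="/mnt/mycontainer",
    extra_configs={
        f"fs.azure.account.key.{storage_account}.blob.core.windows.net": account_key
    },
)
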
Josh_Stafford
by New Contributor II
  • 1007 Views
  • 2 replies
  • 1 kudos

Using dbutils.fs.ls on URI with square brackets results in error

Square brackets in ADLS are accepted, so why can't I list the files in the folder? I have tried escaping the square brackets manually, but then the escaped values are re-escaped from %5B to %255B and %5D to %255D. I get: URISyntaxException: Illegal ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Joshua Stafford: The URISyntaxException error you are encountering is likely because square brackets are reserved characters in URIs (Uniform Resource Identifiers) and need to be properly encoded when used in a URL. In this case, it ap...

1 More Replies
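
One possible workaround for the bracket-escaping issue above, offered only as a sketch, is to keep the brackets out of the URI by listing the parent folder and matching the name in Python; the paths below are placeholders.

# Sketch: list the parent folder (no brackets in this URI) and filter the
# results in Python instead of passing the bracketed path to dbutils.fs.ls.
parent = "abfss://container@storageaccount.dfs.core.windows.net/data/"  # placeholder
target = "[2023]"  # example folder name containing square brackets

matches = [entry for entry in dbutils.fs.ls(parent) if target in entry.name]
for entry in matches:
    print(entry.path, entry.size)
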
pankajBhatt
by New Contributor II
  • 887 Views
  • 2 replies
  • 1 kudos

Databricks not able to access latest files in Azure ADLS Gen1

I have mounted my path from Databricks to Azure ADLS Gen1 using an SPN as the service account. Until yesterday everything was OK, but today I can view all the older, deleted folders. I cannot see them in ADLS, but my Databricks dbutils.fs.ls() shows them....

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @pankaj bhatt, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

1 More Replies
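
The thread above ends with a follow-up rather than a confirmed fix; a common first step for stale mount listings, offered here only as an assumption, is to refresh the cluster's mount cache, as in this sketch (the mount point is a placeholder).

# Sketch: force the cluster to refresh its cached mount metadata, then re-list.
dbutils.fs.refreshMounts()
display(dbutils.fs.ls("/mnt/adls_gen1"))  # placeholder mount point
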
farooqurrehman
by New Contributor
  • 1143 Views
  • 3 replies
  • 2 kudos

Unable to connect/read files from ADLS Gen2 using account key

It gives the error [RequestId=5e57b66f-b69f-4e8b-8706-3fe5baeb77a0 ErrorClass=METASTORE_DOES_NOT_EXIST] No metastore assigned for the current workspace, when using the following code: spark.conf.set(  "fs.azure.account.key.mystorageaccount.dfs.core.windows.net", ...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Hi @Farooq ur rehman, just a friendly follow-up. Did any of the responses help you to resolve your question? If one did, please mark it as best. Otherwise, please let us know if you still need help.

2 More Replies
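
The usual account-key pattern for direct abfss access is sketched below with placeholder names. Note that the METASTORE_DOES_NOT_EXIST message appears to concern the workspace's Unity Catalog assignment rather than this storage setting, so the snippet alone may not resolve the thread above.

# Sketch of direct access with an account key (placeholders; notebook `spark`/`dbutils`).
storage_account = "mystorageaccount"
spark.conf.set(
    f"fs.azure.account.key.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get("my-scope", "storage-account-key"),  # assumed secret scope
)

df = spark.read.csv(
    f"abfss://mycontainer@{storage_account}.dfs.core.windows.net/path/data.csv",
    header=True,
)
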
riccostamendes
by New Contributor II
  • 1043 Views
  • 3 replies
  • 0 kudos

Reading an ADLS Gen2 csv file from Databricks using standard pandas

I have mounted the ADLS container where the data is, but I cannot read the files with pandas ('pd.read_csv') even though I have attached the prefix '/dbfs/' to the path. If I use 'spark.read.csv' instead, I have no problems. Does anyone know why this i...

Latest Reply
Debayan
Esteemed Contributor III
  • 0 kudos

Hi, When you are not able to read the files, what is the error you get?

2 More Replies
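
For the pandas-versus-Spark question above, the two access paths differ as in this sketch (mount and file names are placeholders). Whether the /dbfs FUSE path works for an external mount can also depend on the runtime version and cluster access mode, which may explain the failure.

# Sketch: Spark reads the mount via the dbfs:/ scheme, while pandas needs the
# /dbfs FUSE view of the same path (placeholder mount and file names).
import pandas as pd

sdf = spark.read.csv("dbfs:/mnt/mycontainer/data.csv", header=True)
pdf = pd.read_csv("/dbfs/mnt/mycontainer/data.csv")
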
databicky
by Contributor II
  • 9237 Views
  • 12 replies
  • 4 kudos
Latest Reply
FerArribas
Contributor
  • 4 kudos

Hi @Hubert Dudek, the Pandas API doesn't support the abfss protocol. You have three options: if you need to use pandas, you can write the Excel file to the local file system (dbfs) and then move it to ABFSS (for example with dbutils); write as csv directly in abfss...

11 More Replies
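
The first option in the reply above, writing locally and then moving the file with dbutils, might look roughly like this sketch; the file names and ABFSS path are placeholders, and openpyxl is assumed to be installed on the cluster.

# Sketch: write the Excel file to driver-local storage, then copy it to ABFSS.
import pandas as pd

pdf = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

local_path = "/tmp/report.xlsx"
pdf.to_excel(local_path, index=False)  # needs openpyxl on the cluster

dbutils.fs.cp(
    f"file:{local_path}",
    "abfss://container@storageaccount.dfs.core.windows.net/reports/report.xlsx",
)
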
APol
by New Contributor II
  • 1195 Views
  • 2 replies
  • 2 kudos

Read/Write concurrency issue

Hi. I assume that it could be a concurrency issue (a read thread from Databricks and a write thread from another system). From the start: I read 12-16 csv files (approximately 250 MB each) into a dataframe. df = spark.read.option("header", "False").opti...

Latest Reply
FerArribas
Contributor
  • 2 kudos

Hi @Anastasiia Polianska, I agree, it looks like a concurrency issue. Very possibly this concurrency problem is caused by an erroneous ETag in the HTTP call to the Azure Storage API (https://azure.microsoft.com/de-de/blog/managing-concurrency-in...

1 More Replies
kkumar
by New Contributor III
  • 792 Views
  • 2 replies
  • 2 kudos

ADLS Gen 2 Delta Tables memory allocation

If I mount a Gen2 (ADLS1) account to another Gen2 (ADLS2) account and create a Delta table on ADLS2, will it copy the data or just create something like an external table? I don't want to duplicate the data.

Latest Reply
Pat
Honored Contributor III
  • 2 kudos

Hi @keerthi kumar, so basically you can CREATE EXTERNAL TABLES on top of the data stored somewhere - in your case ADLS. The data won't be copied; it will stay where it is. By creating external tables you are actually storing the metadata in your metasto...

1 More Replies
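
The external-table approach described in the reply above can be sketched as follows; the database, table and path names are placeholders. Only metadata lands in the metastore, while the files stay where they are.

# Sketch: register an external Delta table over data already sitting in ADLS.
spark.sql("""
    CREATE TABLE IF NOT EXISTS my_db.my_external_table
    USING DELTA
    LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/delta/my_table'
""")
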
Chris_Konsur
by New Contributor III
  • 1231 Views
  • 3 replies
  • 3 kudos

Resolved! How to configure Autoloader in file notification mode to access Premium BlobStorage

First, I tried to configure Autoloader in file notification mode to access the Premium BlobStorage 'databrickspoc1' (PREMIUM, ADLS Gen2). I get this error: com.microsoft.azure.storage.StorageException: I checked my storage account -> N...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

When you created the premium account, did you choose "Premium account type" as "File shares"? It should be "Block blobs".

2 More Replies
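
A minimal Auto Loader file-notification sketch for the thread above, with placeholder paths. On Azure, notification setup additionally needs service-principal options (cloudFiles.clientId, clientSecret, tenantId, subscriptionId, resourceGroup), which are omitted here.

# Sketch: Auto Loader in file notification mode (placeholders throughout).
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.schemaLocation",
            "abfss://container@account.dfs.core.windows.net/_schemas/")
    .load("abfss://container@account.dfs.core.windows.net/landing/")
)

(stream.writeStream
    .option("checkpointLocation",
            "abfss://container@account.dfs.core.windows.net/_checkpoints/landing/")
    .trigger(availableNow=True)
    .toTable("bronze.landing"))
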
dataexplorer
by New Contributor III
  • 4221 Views
  • 6 replies
  • 5 kudos

Resolved! COPY INTO generating duplicate rows in Delta table

Hello everyone, I'm trying to bulk load tables from a SQL Server database into ADLS as parquet files and then load these files into Delta tables (raw/bronze). I had done a one-off history/base load, but my subsequent incremental loads (which had a d...

Latest Reply
dataexplorer
New Contributor III
  • 5 kudos

thanks for the guidance!

5 More Replies
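
COPY INTO skips files it has already loaded by path, so the duplicates in the thread above most likely came from the incremental export rewriting the same rows into new files; in that case a key-based MERGE, rather than a plain append, is the usual fix. A minimal COPY INTO sketch with placeholder names:

# Sketch: idempotent-by-file COPY INTO from an ADLS export folder (placeholders).
spark.sql("""
    COPY INTO bronze.my_table
    FROM 'abfss://container@account.dfs.core.windows.net/sql_export/my_table/'
    FILEFORMAT = PARQUET
""")
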
pgaddam
by New Contributor II
  • 1542 Views
  • 2 replies
  • 5 kudos

Error while mounting ADLS Gen 2 storage account to Az Databricks

Hello team, I am facing trouble while mounting a storage account onto my Databricks workspace. Some background on my setup: Storage Account - stgAcc1 - attached to vnet1 and its subnets; Databricks - databricks1 - attached to 'workers-vnet' and subnets - these were...

Latest Reply
Vidula
Honored Contributor
  • 5 kudos

Hi @Pranith Gaddam, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

1 More Replies
MattM
by New Contributor III
  • 1247 Views
  • 1 replies
  • 0 kudos

Unstructured Data - PDF and semi-structured data

I have a scenario where one source is unstructured PDF files and another source is semi-structured JSON files. I get files from these two sources on a daily basis into ADLS storage. What is the best way to load this into a medallion structure by s...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Matt M, please import this notebook and read this excellent Medallion Architecture article. Let us know how it goes. Thanks.

oussamak
by New Contributor II
  • 1699 Views
  • 2 replies
  • 3 kudos

How to install JAR libraries from ADLS? I'm having an error

I mounted the ADLS to my Azure Databricks resource and I keep on getting this error when I try to install a JAR from a container: Library installation attempted on the driver node of cluster 0331-121709-buk0nvsq and failed. Please refer to the followi...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Oussama KIASSI, the error message says: "Failure to initialize configuration. Invalid configuration value detected for fs.azure.account.key". You can't use the storage account access key to access data using the abfss protocol. You need to provide ...

1 More Replies
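
The service-principal (OAuth) configuration that the reply above points toward typically looks like the sketch below; the account, scope and secret names are placeholders. For installing cluster libraries from abfss, these settings usually belong in the cluster's Spark config rather than a notebook cell.

# Sketch: OAuth (service principal) access to ADLS Gen2 over abfss (placeholders).
storage_account = "mystorageaccount"
tenant_id = dbutils.secrets.get("my-scope", "tenant-id")
client_id = dbutils.secrets.get("my-scope", "client-id")
client_secret = dbutils.secrets.get("my-scope", "client-secret")

base = f"{storage_account}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{base}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{base}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{base}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{base}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{base}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
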
Ashley1
by Contributor
  • 1211 Views
  • 5 replies
  • 1 kudos

Resolved! Can ADLS be mounted in DBFS using only ADLS account key?

I realise this is not an optimal configuration, but I'm trying to pull together a POC and I'm not at the point that I wish to ask the AAD admins to create an application for OAuth authentication. I have been able to use direct references to the ADLS co...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Ashley Betts, thank you for posting your question. And you found the solution. This is awesome! Would you be happy to mark the answer as best so that other members can find the solution more quickly? Cheers!

4 More Replies