Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

databicky
by Contributor II
  • 17993 Views
  • 13 replies
  • 4 kudos
Latest Reply
FerArribas
Contributor

Hi @Hubert Dudek, the pandas API doesn't support the abfss protocol. You have three options: 1) If you need to use pandas, you can write the Excel file to the local file system (dbfs) and then move it to ABFSS (for example, with dbutils). 2) Write as CSV directly in abfss...

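The first option in the reply (write locally, then move the file to ABFSS) can be sketched as a small helper. This is a hypothetical illustration, not the poster's code: it writes a CSV with the standard library so it runs anywhere (with pandas you would call `df.to_excel(local_path)` or `df.to_csv(local_path)` instead), and the copy step is a pluggable function. On Databricks that function would presumably wrap `dbutils.fs.cp`, with the local source prefixed by `file:`; the helper name and the placeholder paths are assumptions.

```python
import csv
import os
import shutil
import tempfile
from typing import Callable, Iterable, Sequence


def write_locally_then_copy(
    rows: Iterable[Sequence[object]],
    filename: str,
    dest_dir: str,
    copy: Callable[[str, str], object] = shutil.copy,
) -> str:
    """Write `rows` to a local temporary file, then copy it to `dest_dir`.

    Hypothetical helper. The default `copy` is shutil.copy so the sketch runs
    anywhere; on Databricks it could wrap dbutils.fs.cp instead.
    """
    dest_path = os.path.join(dest_dir, filename)
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, filename)
        # Write with the stdlib csv module; with pandas this would be
        # df.to_excel(local_path) or df.to_csv(local_path).
        with open(local_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        # Copy before the temporary directory is cleaned up.
        copy(local_path, dest_path)
    return dest_path


# Hypothetical Databricks usage (container/account names are placeholders);
# the "file:" prefix marks the source as a file on the driver's local disk:
# write_locally_then_copy(
#     rows, "report.csv", "abfss://out@myaccount.dfs.core.windows.net/reports",
#     copy=lambda src, dst: dbutils.fs.cp(f"file:{src}", dst))
```

Keeping the copy step injectable lets the same write logic be tested locally and reused with whatever transfer mechanism the cluster provides.
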
az38
by New Contributor II
  • 7131 Views
  • 2 replies
  • 3 kudos

load files filtered by last_modified in PySpark

Hi, community! What do you think is the best way to load only the files modified after some point in time from Azure ADLS (actually, the filesystem doesn't matter) into a df? Is there any function like input_file_name() but for last_modified to use it in a w...

Latest Reply
venkatcrc
New Contributor III

_metadata will provide the file modification timestamp. I tried it on DBFS but am not sure about ADLS: https://docs.databricks.com/ingestion/file-metadata-column.html

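The `_metadata` column from the linked documentation includes a `file_modification_time` field, so in Spark you would select `_metadata.file_modification_time` and filter on it. Because that needs a Spark session, the block below is only a local-filesystem analogue using `os.stat`: it shows the same idea (keep files whose modification time is after a cutoff) on ordinary directories. The function name is hypothetical, and this does not settle whether `_metadata` behaves the same on ADLS, which the reply leaves unverified.

```python
import os
from datetime import datetime
from typing import List


def files_modified_after(root: str, cutoff: datetime) -> List[str]:
    """Return files under `root` whose modification time is strictly after `cutoff`.

    Local analogue of filtering on _metadata.file_modification_time in Spark.
    `cutoff` should be timezone-aware so .timestamp() is unambiguous.
    """
    cutoff_ts = cutoff.timestamp()
    matches = []
    # Walk the tree recursively and compare each file's mtime (seconds since epoch).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime > cutoff_ts:
                matches.append(path)
    return sorted(matches)
```
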
Krish1
by New Contributor II
  • 9013 Views
  • 4 replies
  • 0 kudos

Error while mounting ADLS in python using AccountKey

I'm using the code below with an account key to mount ADLS in Python, but I'm running into this error: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.lang.IllegalArgumentException: The String is not a valid Base64-encoded string. Can you pleas...

Latest Reply
Anonymous
Not applicable

Hi @Krish Lam, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers you...

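Storage account keys are Base64 strings, and the error says the value passed as the key is not valid Base64. A cheap pre-flight check is to strictly decode the value before mounting. The helper below is a hypothetical sketch: it only validates the encoding (catching things like a secret name passed instead of the secret value, or a trailing newline), not whether Azure accepts the key.

```python
import base64


def is_valid_account_key(key: str) -> bool:
    """Return True if `key` is non-empty, strict Base64, as an account key should be.

    Hypothetical pre-flight check to run before mounting; it only validates
    the encoding, not whether Azure accepts the key.
    """
    if not key:
        return False
    try:
        # validate=True rejects characters outside the Base64 alphabet,
        # including whitespace and newlines, instead of silently dropping them.
        base64.b64decode(key, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return True
```
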
Josh_Stafford
by New Contributor II
  • 2409 Views
  • 2 replies
  • 1 kudos

Using dbutils.fs.ls on URI with square brackets results in error

Square brackets are accepted in ADLS paths, so why can't I list the files in the folder? I have tried escaping the square brackets manually, but then the escaped values are re-escaped from %5B to %255B and %5D to %255D. I get: URISyntaxException: Illegal ...

Latest Reply
Anonymous
Not applicable

@Joshua Stafford: The URISyntaxException you are encountering is likely because square brackets are reserved characters in URIs (Uniform Resource Identifiers) and must be percent-encoded when used in a URL. In this case, it ap...

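The re-escaping the poster describes (%5B becoming %255B) is what happens when an already percent-encoded string is encoded a second time: the `%` itself is escaped as `%25`. The standard-library sketch below reproduces that and adds a hypothetical `encode_once` helper that decodes before encoding. It illustrates the mechanism only; whether `dbutils.fs.ls` accepts the resulting path is not established in this thread.

```python
from urllib.parse import quote, unquote


def encode_once(path: str) -> str:
    """Percent-encode `path` exactly once, keeping '/' as a separator.

    Existing escapes are decoded first, so '%5B' is not turned into '%255B'.
    Caveat: a literal '%' that is not part of an escape would be altered.
    """
    return quote(unquote(path), safe="/")


# Encoding an already-encoded path reproduces the %255B / %255D symptom.
print(quote("/data/[2023]/", safe="/"))      # /data/%5B2023%5D/
print(quote("/data/%5B2023%5D/", safe="/"))  # /data/%255B2023%255D/
print(encode_once("/data/%5B2023%5D/"))      # /data/%5B2023%5D/
```
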
pankajBhatt
by New Contributor II
  • 1752 Views
  • 1 replies
  • 1 kudos

Databricks not able to access latest files in Azure ADLS Gen1

I have mounted my path from Databricks to Azure ADLS Gen1 using an SPN as the service account. Until yesterday everything was OK, but today I can see all the older, deleted folders. I cannot see them in ADLS, but my Databricks dbutils.fs.ls() shows them....

Latest Reply
Anonymous
Not applicable

Hi @pankaj bhatt, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

farooqurrehman
by New Contributor
  • 2297 Views
  • 3 replies
  • 2 kudos

Unable to connect/read files from ADLS Gen2 using account key

It gives the error [RequestId=5e57b66f-b69f-4e8b-8706-3fe5baeb77a0 ErrorClass=METASTORE_DOES_NOT_EXIST] No metastore assigned for the current workspace. I'm using the following code: spark.conf.set("fs.azure.account.key.mystorageaccount.dfs.core.windows.net", ...

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Farooq ur rehman, just a friendly follow-up. Did any of the responses help you to resolve your question? If it did, please mark it as best. Otherwise, please let us know if you still need help.

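For reference, the config key in the question follows the pattern `fs.azure.account.key.<storage-account>.dfs.core.windows.net`, and files are then addressed with `abfss://<container>@<storage-account>.dfs.core.windows.net/<path>` URIs. The hypothetical helpers below only build those two strings; the Databricks calls are comments because they need a cluster, and the secret scope, key, container, and file names in them are placeholders. Note that the error message itself refers to a missing metastore assignment for the workspace, so a correct storage configuration alone may not clear it.

```python
def account_key_conf(storage_account: str) -> str:
    """Spark config key for account-key auth against the ADLS Gen2 (dfs) endpoint."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"


def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside a container."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


# On Databricks (not runnable here), the two would be combined roughly as:
# spark.conf.set(account_key_conf("mystorageaccount"),
#                dbutils.secrets.get(scope="my-scope", key="storage-key"))  # placeholder names
# df = spark.read.csv(abfss_uri("raw", "mystorageaccount", "sales/2023.csv"), header=True)
```
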
riccostamendes
by New Contributor II
  • 2259 Views
  • 3 replies
  • 0 kudos

Reading an ADLS Gen2 CSV file from db using standard pandas

I have mounted the ADLS container where the data is, but I cannot read the files with pandas ('pd.read_csv') even though I have added the prefix '/dbfs/' to the path. If I use 'spark.read.csv' instead, I have no problems. Does anyone know why this i...

Latest Reply
Debayan
Databricks Employee

Hi, when you are not able to read the files, what error do you get?

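Spark APIs take `dbfs:/...` paths, while local-file libraries such as pandas go through the `/dbfs` FUSE mount, so the same mounted file has two spellings. The hypothetical helper below maps one to the other and returns paths that already start with `/dbfs/` unchanged, so they are not prefixed twice. It assumes the cluster exposes the `/dbfs` mount, and it does not diagnose the poster's actual error, which the thread never states.

```python
def to_fuse_path(path: str) -> str:
    """Convert a DBFS path as used by Spark into the /dbfs FUSE path used by pandas.

    'dbfs:/mnt/x.csv' and '/mnt/x.csv' both become '/dbfs/mnt/x.csv';
    paths already under /dbfs/ are returned unchanged.
    """
    if path.startswith("/dbfs/"):
        return path
    if path.startswith("dbfs:/"):
        path = path[len("dbfs:"):]  # keep the leading '/'
    if not path.startswith("/"):
        # e.g. abfss:// URIs are not reachable through the FUSE mount this way
        raise ValueError(f"not a DBFS path: {path!r}")
    return "/dbfs" + path


# Hypothetical usage on a cluster with the FUSE mount (path is a placeholder):
# import pandas as pd
# df = pd.read_csv(to_fuse_path("dbfs:/mnt/raw/sales.csv"))
```
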
APol
by New Contributor II
  • 3027 Views
  • 2 replies
  • 2 kudos

Read/Write concurrency issue

Hi. I suspect this may be a concurrency issue (a read thread from Databricks and a write thread from another system). From the start: I read 12-16 CSV files (approximately 250 MB each) into a dataframe. df = spark.read.option("header", "False").opti...

Latest Reply
FerArribas
Contributor

Hi @Anastasiia Polianska, I agree, it looks like a concurrency issue. This problem is very possibly caused by an erroneous ETag in the HTTP call to the Azure Storage API (https://azure.microsoft.com/de-de/blog/managing-concurrency-in...

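To make the ETag idea from the reply concrete: Azure Storage supports optimistic concurrency, where a client sends the ETag it last saw as an If-Match condition and the service rejects the write (HTTP 412) if the blob has changed since. The toy in-memory store below simulates that behaviour in plain Python. It is a hypothetical illustration of the mechanism, not the Azure SDK, and it does not show what the poster's pipeline actually sends; the class and blob names are made up.

```python
import itertools
from typing import Dict, Optional, Tuple


class PreconditionFailed(Exception):
    """Conditional write rejected because the blob changed (HTTP 412 analogue)."""


class InMemoryBlobStore:
    """Toy store mimicking ETag-based optimistic concurrency (illustration only).

    Every write assigns a fresh ETag. A write that passes `if_match` succeeds
    only if the blob's current ETag still equals the one the caller read.
    """

    def __init__(self) -> None:
        self._blobs: Dict[str, Tuple[bytes, str]] = {}
        self._versions = itertools.count(1)

    def read(self, name: str) -> Tuple[bytes, str]:
        """Return (data, etag) for an existing blob."""
        return self._blobs[name]

    def write(self, name: str, data: bytes, if_match: Optional[str] = None) -> str:
        """Store `data` and return the new ETag, enforcing `if_match` if given."""
        if if_match is not None:
            current = self._blobs.get(name)
            if current is None or current[1] != if_match:
                raise PreconditionFailed(name)
        etag = f'"0x{next(self._versions):X}"'
        self._blobs[name] = (data, etag)
        return etag


# A reader captures the ETag, another system overwrites the blob, and the
# reader's conditional write with the stale ETag is rejected.
store = InMemoryBlobStore()
first = store.write("landing/part-0.csv", b"v1")
_, seen = store.read("landing/part-0.csv")
store.write("landing/part-0.csv", b"v2")  # concurrent writer
try:
    store.write("landing/part-0.csv", b"v1-edited", if_match=seen)
except PreconditionFailed:
    print("stale ETag rejected")
```
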
kkumar
by New Contributor III
  • 1645 Views
  • 2 replies
  • 2 kudos

ADLS Gen 2 Delta Tables memory allocation

If I mount one Gen2 account (ADLS 1) to another Gen2 account (ADLS2) and create a Delta table on ADLS2, will it copy the data or just create something like an external table? I don't want to duplicate the data.

Latest Reply
Pat
Honored Contributor III

Hi @keerthi kumar, basically you can CREATE EXTERNAL TABLES on top of data stored somewhere (in your case, ADLS). The data won't be copied; it will stay where it is. By creating external tables you are actually storing the metadata in your metasto...

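A minimal sketch of the external-table approach from the reply, written as a hypothetical Python helper that builds the DDL string (Databricks SQL accepts `CREATE TABLE ... USING DELTA LOCATION '...'`, and a table defined with an explicit location is external). It assumes Delta files already exist at the location; the table name and path in the usage comment are placeholders, and `table` is not validated or quoted, so only pass trusted names.

```python
def external_delta_table_ddl(table: str, location: str) -> str:
    """Return SQL that registers existing Delta files at `location` as a table.

    Because a LOCATION is given, the table is external: the metastore stores
    only metadata and the data files stay where they are.
    """
    if "'" in location:
        raise ValueError("location must not contain a single quote")
    return f"CREATE TABLE IF NOT EXISTS {table} USING DELTA LOCATION '{location}'"


# Hypothetical usage in a Databricks notebook (table and path names are placeholders):
# spark.sql(external_delta_table_ddl(
#     "bronze.events", "abfss://raw@adls1account.dfs.core.windows.net/events"))
```
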
Chris_Konsur
by New Contributor III
  • 2608 Views
  • 2 replies
  • 3 kudos

Resolved! to configure Autoloader in File notification mode to access the Premium BlobStorage

First, I tried to configure Autoloader in file notification mode to access the Premium Blob Storage account 'databrickspoc1' (Premium, ADLS Gen2). I get this error: com.microsoft.azure.storage.StorageException: I checked my storage account->N...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

When you created a premium account, have you chosen "Premium account type" as "File shares"? It should be "Block blobs".

dataexplorer
by New Contributor III
  • 8476 Views
  • 6 replies
  • 5 kudos

Resolved! COPY INTO generating duplicate rows in Delta table

Hello everyone, I'm trying to bulk load tables from a SQL Server database into ADLS as Parquet files and then load these files into Delta tables (raw/bronze). I did a one-off history/base load, but my subsequent incremental loads (which had a d...

Latest Reply
dataexplorer
New Contributor III

Thanks for the guidance!

pgaddam
by New Contributor II
  • 3168 Views
  • 2 replies
  • 5 kudos

Error while mounting ADLS Gen 2 storage account to Az Databricks

Hello Team, I am having trouble mounting a storage account onto my Databricks workspace. Some background on my setup: Storage account: stgAcc1, attached to vnet1 and its subnets. Databricks: databricks1, attached to 'workers-vnet' and subnets; these were...

Latest Reply
Vidula
Honored Contributor

Hi @Pranith Gaddam, does @Debayan Mukherjee's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? We'd love to hear from you. Thanks!

MattM
by New Contributor III
  • 2477 Views
  • 0 replies
  • 0 kudos

Unstructured Data - PDF and a semi-structured data

I have a scenario where one source is unstructured PDF files and another source is semi-structured JSON files. I get files from these two sources on a daily basis into ADLS storage. What is the best way to load this into a medallion structure by s...

Ashley1
by Contributor
  • 2829 Views
  • 5 replies
  • 1 kudos

Resolved! Can ADLS be mounted in DBFS using only ADLS account key?

I realise this is not an optimal configuration, but I'm trying to pull together a POC and I'm not yet at the point where I want to ask the AAD admins to create an application for OAuth authentication. I have been able to use direct references to the ADLS co...

Latest Reply
Anonymous
Not applicable

Hey there @Ashley Betts, thank you for posting your question, and you found the solution. This is awesome! Would you be happy to mark the answer as best so that other members can find the solution more quickly? Cheers!

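The thread is marked resolved, but the accepted answer is not shown here. One pattern used for account-key-only mounts targets the Blob endpoint (`wasbs://...blob.core.windows.net`) and passes the key through the `extra_configs` argument of `dbutils.fs.mount`. The hypothetical helper below only assembles those keyword arguments so it can run without a cluster; the container, account, mount, and secret names are placeholders, and whether the same approach works for an `abfss://` source is not confirmed here.

```python
from typing import Dict, Union


def blob_mount_args(
    container: str, storage_account: str, mount_name: str, account_key: str
) -> Dict[str, Union[str, Dict[str, str]]]:
    """Keyword arguments for dbutils.fs.mount using only a storage account key.

    Hypothetical helper targeting the Blob (wasbs) endpoint; it does not call
    dbutils itself, so it can be inspected or tested off-cluster.
    """
    return {
        "source": f"wasbs://{container}@{storage_account}.blob.core.windows.net",
        "mount_point": f"/mnt/{mount_name}",
        "extra_configs": {
            f"fs.azure.account.key.{storage_account}.blob.core.windows.net": account_key
        },
    }


# Hypothetical usage on Databricks (placeholder names; keep the key in a secret scope):
# dbutils.fs.mount(**blob_mount_args(
#     "raw", "mystorageaccount", "raw",
#     dbutils.secrets.get(scope="my-scope", key="storage-key")))
```
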