Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Christine
by Contributor II
  • 8616 Views
  • 9 replies
  • 5 kudos

Resolved! PySpark DataFrame empties after it has been saved to Delta Lake.

Hi, I am facing a problem that I hope to get some help understanding. I have created a function that is supposed to check whether the input data already exists in a saved Delta table; if not, it should run some calculations and append the new data to...

Latest Reply
SharathE
New Contributor III
  • 5 kudos

Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?

8 More Replies
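The "append only rows that aren't already in the target table" check described in the question is usually expressed as a left anti join (or a Delta MERGE) before the append. As a minimal, Spark-free sketch of that key-difference logic (the names `append_new_rows`, `existing_rows`, and `incoming_rows` are hypothetical, not from the thread):

```python
def append_new_rows(existing_rows, incoming_rows, key):
    """Return only the rows from incoming_rows whose key is not already
    present in existing_rows -- the same idea as a left anti join
    performed before appending to the Delta table."""
    existing_keys = {row[key] for row in existing_rows}
    return [row for row in incoming_rows if row[key] not in existing_keys]

existing = [{"id": 1, "v": 10}, {"id": 2, "v": 20}]
incoming = [{"id": 2, "v": 99}, {"id": 3, "v": 30}]
new_rows = append_new_rows(existing, incoming, "id")
# new_rows contains only the id=3 record
```

In PySpark the equivalent is typically `incoming_df.join(existing_df, "id", "left_anti")`, with the result written in append mode.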
iptkrisna
by New Contributor III
  • 1131 Views
  • 1 reply
  • 2 kudos

Jobs Data Pipeline Runtime Increased Significantly

Hi, I am facing an issue where one of my jobs has been taking much longer since a certain point in time. Previously it needed less than 1 hour to run a batch job that loads JSON data and does a truncate-and-load into a Delta table, but since June 2nd it has become so slow that...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @krisna math, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer for you. Thanks.

az38
by New Contributor II
  • 6598 Views
  • 2 replies
  • 3 kudos

load files filtered by last_modified in PySpark

Hi, community! What do you think is the best way to load into a DataFrame only the files from Azure ADLS (actually, the filesystem doesn't matter) that were modified after some point in time? Is there any function like input_file_name() but for last_modified, to use it in a w...

Latest Reply
venkatcrc
New Contributor III
  • 3 kudos

_metadata will provide the file modification timestamp. I tried it on DBFS but am not sure about ADLS. https://docs.databricks.com/ingestion/file-metadata-column.html

1 More Replies
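Besides the `_metadata` column mentioned in the reply, a common workaround is to list the files yourself, filter by modification time, and pass only the surviving paths to the reader. A minimal pure-Python sketch of that filtering step (the helper name `files_modified_after` is hypothetical; on Databricks you would list cloud storage rather than a local directory):

```python
import os

def files_modified_after(directory, cutoff_epoch):
    """Return paths of files in `directory` whose modification time is
    after cutoff_epoch (seconds since the epoch). The resulting list
    can then be handed to a reader, e.g. spark.read.json(paths)."""
    selected = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) > cutoff_epoch:
            selected.append(path)
    return sorted(selected)
```

Listing first also lets you log exactly which files a batch picked up, which helps when debugging incremental loads.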
guostong
by New Contributor III
  • 1877 Views
  • 3 replies
  • 1 kudos

Issues loading from ADLS in DLT

I am using DLT to load CSV files from ADLS; below is the SQL query in my notebook:

CREATE OR REFRESH STREAMING LIVE TABLE test_account_raw
AS SELECT * FROM cloud_files(
  "abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/",
  "csv",
  map("h...

Latest Reply
guostong
New Contributor III
  • 1 kudos

Thank you everyone, the problem is resolved: it went away once I had workspace admin access.

2 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 3886 Views
  • 3 replies
  • 3 kudos

Using R, how do we write a CSV file to, say, dbfs:/tmp?

Let us say I already have the data 'TotalData':

write.csv(TotalData, file='/tmp/TotalData.csv', row.names = FALSE)

I do not see any error from the above, but when I list files with %fs ls /tmp, I do not see any files written there. Why?

Latest Reply
Cedric
Databricks Employee
  • 3 kudos

Hi Thiam, thank you for reaching out to us. In this case it seems that you have written the file to the OS /tmp and then tried to fetch the same folder in DBFS.

Written >> /tmp/TotalData.csv
Reading >> /dbfs/tmp/TotalData.csv

Please try to execute write.csv wit...

2 More Replies
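The reply hinges on the distinction between DBFS URIs (dbfs:/...) and the local FUSE mount (/dbfs/...): a plain /tmp/... path lands on the driver's OS filesystem, not in DBFS. A tiny sketch of that path mapping (the helper name `dbfs_to_local` is hypothetical):

```python
def dbfs_to_local(path):
    """Map a DBFS URI such as 'dbfs:/tmp/TotalData.csv' to its local
    FUSE-mount path '/dbfs/tmp/TotalData.csv'. Paths without the
    'dbfs:/' scheme are returned unchanged: a plain '/tmp/...' refers
    to the driver's OS filesystem, not to DBFS."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return path

dbfs_to_local("dbfs:/tmp/TotalData.csv")  # -> '/dbfs/tmp/TotalData.csv'
```

So in R, writing to '/dbfs/tmp/TotalData.csv' is what makes the file show up under %fs ls /tmp.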
Hola1801
by New Contributor
  • 2016 Views
  • 3 replies
  • 3 kudos

Resolved! Float value changes when loaded with Spark? Full path?

Hello, I have created my table in Databricks, and at this point everything is perfect: I get the same values as in my CSV. For my column "Exposure" I have: 0 0,00 1 0,00 2 0,00 3 0,00 4 0,00 ... But when I load my fi...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Anis Ben Salem, how do you read your CSV file? Do you use Pandas or PySpark APIs? Also, how did you create your table? Could you share more details on the code you are trying to run?

2 More Replies
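Values like "0,00" in the question use a comma as the decimal separator, which default CSV parsing tends to leave as strings or misparse. One portable fix is to normalize the separator before casting to a number; a minimal pure-Python sketch of that conversion (the function name `parse_comma_decimal` is hypothetical, and it assumes '.' is only ever a thousands separator):

```python
def parse_comma_decimal(value):
    """Convert a comma-decimal string such as '0,00' or '1.234,56' to a
    float, assuming '.' (if present) is a thousands separator and ','
    is the decimal mark."""
    return float(value.replace(".", "").replace(",", "."))

parse_comma_decimal("1.234,56")  # -> 1234.56
```

In PySpark the equivalent is typically reading the column as a string, then applying `regexp_replace(col, ",", ".")` followed by `.cast("double")`.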