Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Christine
by Contributor II
  • 8616 Views
  • 9 replies
  • 5 kudos

Resolved! PySpark DataFrame empties after it has been saved to Delta Lake.

Hi, I am facing a problem that I hope to get some help understanding. I have created a function that is supposed to check whether the input data already exists in a saved Delta table; if not, it should run some calculations and append the new data to...

Latest Reply
SharathE
New Contributor III
  • 5 kudos

Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?

8 More Replies
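The "append only rows that aren't already in the target table" check described in the question is usually expressed as a left anti join (or a Delta MERGE) before the append. As a minimal, Spark-free sketch of that key-difference logic (the names `append_new_rows`, `existing_rows`, and `incoming_rows` are hypothetical, not from the thread):

```python
def append_new_rows(existing_rows, incoming_rows, key):
    """Return only the rows from incoming_rows whose key is not already
    present in existing_rows -- the same idea as a left anti join
    performed before appending to the Delta table."""
    existing_keys = {row[key] for row in existing_rows}
    return [row for row in incoming_rows if row[key] not in existing_keys]

existing = [{"id": 1, "v": 10}, {"id": 2, "v": 20}]
incoming = [{"id": 2, "v": 99}, {"id": 3, "v": 30}]
new_rows = append_new_rows(existing, incoming, "id")
# new_rows contains only the id=3 record
```

In PySpark the equivalent is typically `incoming_df.join(existing_df, "id", "left_anti")`, with the result written in append mode.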
iptkrisna
by New Contributor III
  • 1131 Views
  • 1 reply
  • 2 kudos

Jobs Data Pipeline Runtime Increased Significantly

Hi, I am facing an issue where one of my jobs has been taking much longer since a certain point in time. Previously it needed less than 1 hour to run a batch job that loads JSON data and does a truncate-and-load into a Delta table, but since June 2nd it has become so slow that...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @krisna math, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer for you. Thanks.

az38
by New Contributor II
  • 6598 Views
  • 2 replies
  • 3 kudos

load files filtered by last_modified in PySpark

Hi, community! What do you think is the best way to load into a DataFrame only the files from Azure ADLS (actually, the filesystem doesn't matter) that were modified after some point in time? Is there any function like input_file_name() but for last_modified, to use it in a w...

Latest Reply
venkatcrc
New Contributor III
  • 3 kudos

_metadata will provide the file modification timestamp. I tried it on DBFS but am not sure about ADLS. https://docs.databricks.com/ingestion/file-metadata-column.html

1 More Replies
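Besides the `_metadata` column mentioned in the reply, a common workaround is to list the files yourself, filter by modification time, and pass only the surviving paths to the reader. A minimal pure-Python sketch of that filtering step (the helper name `files_modified_after` is hypothetical; on Databricks you would list cloud storage rather than a local directory):

```python
import os

def files_modified_after(directory, cutoff_epoch):
    """Return paths of files in `directory` whose modification time is
    after cutoff_epoch (seconds since the epoch). The resulting list
    can then be handed to a reader, e.g. spark.read.json(paths)."""
    selected = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) > cutoff_epoch:
            selected.append(path)
    return sorted(selected)
```

Listing first also lets you log exactly which files a batch picked up, which helps when debugging incremental loads.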
guostong
by New Contributor III
  • 1877 Views
  • 3 replies
  • 1 kudos

Issues loading from ADLS in DLT

I am using DLT to load CSV files from ADLS; below is the SQL query in my notebook:

CREATE OR REFRESH STREAMING LIVE TABLE test_account_raw
AS SELECT * FROM cloud_files(
  "abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/",
  "csv",
  map("h...

Latest Reply
guostong
New Contributor III
  • 1 kudos

Thank you everyone, the problem is resolved: it went away once I had workspace admin access.

2 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 3886 Views
  • 3 replies
  • 3 kudos

Using R, how do we write a CSV file to, say, dbfs:/tmp?

Let us say I already have the data 'TotalData':

write.csv(TotalData, file='/tmp/TotalData.csv', row.names = FALSE)

I do not see any error from the above, but when I list files with %fs ls /tmp, I do not see any files written there. Why?

Latest Reply
Cedric
Databricks Employee
  • 3 kudos

Hi Thiam, thank you for reaching out to us. In this case it seems that you have written the file to the OS /tmp and then tried to fetch the same folder in DBFS.

Written >> /tmp/TotalData.csv
Reading >> /dbfs/tmp/TotalData.csv

Please try to execute write.csv wit...

2 More Replies
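The reply hinges on the distinction between DBFS URIs (dbfs:/...) and the local FUSE mount (/dbfs/...): a plain /tmp/... path lands on the driver's OS filesystem, not in DBFS. A tiny sketch of that path mapping (the helper name `dbfs_to_local` is hypothetical):

```python
def dbfs_to_local(path):
    """Map a DBFS URI such as 'dbfs:/tmp/TotalData.csv' to its local
    FUSE-mount path '/dbfs/tmp/TotalData.csv'. Paths without the
    'dbfs:/' scheme are returned unchanged: a plain '/tmp/...' refers
    to the driver's OS filesystem, not to DBFS."""
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    return path

dbfs_to_local("dbfs:/tmp/TotalData.csv")  # -> '/dbfs/tmp/TotalData.csv'
```

So in R, writing to '/dbfs/tmp/TotalData.csv' is what makes the file show up under %fs ls /tmp.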
Hola1801
by New Contributor
  • 2016 Views
  • 3 replies
  • 3 kudos

Resolved! Float value changes when loaded with Spark? Full path?

Hello, I have created my table in Databricks, and at this point everything is perfect: I get the same values as in my CSV. For my column "Exposure" I have: 0 0,00 1 0,00 2 0,00 3 0,00 4 0,00 ... But when I load my fi...

Latest Reply
jose_gonzalez
Databricks Employee
  • 3 kudos

Hi @Anis Ben Salem, how do you read your CSV file? Do you use Pandas or PySpark APIs? Also, how did you create your table? Could you share more details on the code you are trying to run?

2 More Replies
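Values like "0,00" in the question use a comma as the decimal separator, which default CSV parsing tends to leave as strings or misparse. One portable fix is to normalize the separator before casting to a number; a minimal pure-Python sketch of that conversion (the function name `parse_comma_decimal` is hypothetical, and it assumes '.' is only ever a thousands separator):

```python
def parse_comma_decimal(value):
    """Convert a comma-decimal string such as '0,00' or '1.234,56' to a
    float, assuming '.' (if present) is a thousands separator and ','
    is the decimal mark."""
    return float(value.replace(".", "").replace(",", "."))

parse_comma_decimal("1.234,56")  # -> 1234.56
```

In PySpark the equivalent is typically reading the column as a string, then applying `regexp_replace(col, ",", ".")` followed by `.cast("double")`.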