Data Engineering

Forum Posts

Christine
by Contributor
  • 4386 Views
  • 9 replies
  • 5 kudos

Resolved! PySpark DataFrame empties after it has been saved to Delta Lake.

Hi, I am facing a problem that I hope to get some help understanding. I have created a function that is supposed to check whether the input data already exists in a saved Delta table and, if not, it should run some calculations and append the new data to...

Latest Reply
SharathE
New Contributor II
  • 5 kudos

Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?

8 More Replies
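A minimal pure-Python sketch of the "append only new rows" logic described in the question above. In PySpark this check is usually expressed as a left_anti join against the existing Delta table (or a Delta MERGE), and caching or persisting the DataFrame before the write — in the spirit of the temp-view suggestion in the reply — avoids lazy re-evaluation after the save. All names here are hypothetical:

```python
# Sketch of idempotent-append logic: keep only incoming rows whose key is not
# already present in the saved table. In PySpark this would typically be
#   new_df = incoming_df.join(existing_df.select("id"), on="id", how="left_anti")
# followed by new_df.write.format("delta").mode("append").saveAsTable(...).

def rows_to_append(existing_keys, incoming_rows, key="id"):
    """Return the incoming rows whose key is not in existing_keys."""
    existing = set(existing_keys)
    return [row for row in incoming_rows if row[key] not in existing]

incoming = [{"id": 1, "value": 10}, {"id": 2, "value": 20}, {"id": 3, "value": 30}]
new_rows = rows_to_append([1, 2], incoming)
# Only the id=3 row survives; re-running the same batch appends nothing.
```

Running the same batch twice appends nothing the second time, which is the idempotency the question is after.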
iptkrisna
by New Contributor III
  • 523 Views
  • 1 reply
  • 2 kudos

Jobs Data Pipeline Runtime Increase Significantly

Hi, I am facing an issue where one of my jobs has been taking very long since a certain point in time. Previously it needed less than 1 hour to run a batch job that loads JSON data and does a truncate-and-load into a Delta table, but since June 2nd it has become so long that...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @krisna math, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

az38
by New Contributor II
  • 3584 Views
  • 2 replies
  • 3 kudos

load files filtered by last_modified in PySpark

Hi, community! What do you think is the best way to load into a df, from Azure ADLS (actually, the filesystem doesn't matter), only files modified after some point in time? Is there any function like input_file_name() but for last_modified, to use it in a w...

Latest Reply
venkatcrc
New Contributor III
  • 3 kudos

The _metadata column will provide the file modification timestamp. I tried it on DBFS but am not sure about ADLS. https://docs.databricks.com/ingestion/file-metadata-column.html

1 More Replies
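On Databricks, the _metadata column mentioned in the reply is the idiomatic filter, e.g. spark.read.load(path).where(col("_metadata.file_modification_time") > cutoff). As a generic filesystem-level sketch of the same idea (function and file names are hypothetical, not from the thread):

```python
import os
from datetime import datetime, timezone

def files_modified_after(directory, cutoff):
    """Return paths of files under `directory` whose modification time
    is after `cutoff` (a timezone-aware datetime)."""
    selected = []
    for entry in os.scandir(directory):
        if entry.is_file():
            mtime = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc)
            if mtime > cutoff:
                selected.append(entry.path)
    return selected

# The resulting path list could then be handed to spark.read.load(selected_paths).
```

This works against any mounted filesystem, but for cloud object stores the _metadata column avoids a separate listing pass.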
guostong
by New Contributor III
  • 989 Views
  • 3 replies
  • 1 kudos

Issues to load from ADLS in DLT

I am using DLT to load CSV in ADLS; below is my SQL query in the notebook:
CREATE OR REFRESH STREAMING LIVE TABLE test_account_raw
AS SELECT * FROM cloud_files(
"abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/",
"csv",
map("h...

Latest Reply
guostong
New Contributor III
  • 1 kudos

Thank you, everyone. The problem is resolved; it went away once I had workspace admin access.

2 More Replies
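The resolution above turned out to be permissions (workspace admin access), but for reference, the generic shape of a cloud_files (Auto Loader) streaming table definition in DLT SQL looks like the following; the container, storage account, and header option are placeholders, not the poster's actual values:

```sql
-- Illustrative only: replace <container> and <storage_account> with real values,
-- and make sure the pipeline's identity has read access to the path.
CREATE OR REFRESH STREAMING LIVE TABLE test_account_raw
AS SELECT *
FROM cloud_files(
  "abfss://<container>@<storage_account>.dfs.core.windows.net/test_csv/",
  "csv",
  map("header", "true")
);
```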
THIAM_HUATTAN
by Valued Contributor
  • 2242 Views
  • 5 replies
  • 4 kudos

Resolved! Using R, how do we write csv file to say dbfs:/tmp?

Let us say I already have the data 'TotalData':
write.csv(TotalData, file='/tmp/TotalData.csv', row.names = FALSE)
I do not see any error from the above. But when I list files below:
%fs ls /tmp
I do not see any files written there. Why?

Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @THIAM HUAT TAN, we haven't heard from you since the last response from @Cedric Law Hing Ping, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community, as it can be helpful to oth...

4 More Replies
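The usual explanation for the question above is that /tmp in R's write.csv refers to the driver node's local disk, while %fs ls /tmp lists dbfs:/tmp; writing through the /dbfs FUSE mount instead, e.g. write.csv(TotalData, file='/dbfs/tmp/TotalData.csv', row.names = FALSE), puts the file where %fs ls can see it. A small Python helper (hypothetical, for illustration) showing that path mapping:

```python
def dbfs_to_fuse(path):
    """Map a dbfs:/ URI to the /dbfs FUSE path that local file APIs
    (R's write.csv, Python's open, etc.) can write to on Databricks."""
    prefix = "dbfs:/"
    if not path.startswith(prefix):
        raise ValueError(f"not a DBFS path: {path}")
    return "/dbfs/" + path[len(prefix):]

# dbfs_to_fuse("dbfs:/tmp/TotalData.csv") -> "/dbfs/tmp/TotalData.csv"
```

The same mapping explains the symptom in reverse: a file written to plain /tmp lands on the driver's ephemeral disk and never appears under dbfs:/tmp.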
Hola1801
by New Contributor
  • 1060 Views
  • 3 replies
  • 3 kudos

Resolved! Float Value change when Load with spark? Full Path?

Hello, I have created my table in Databricks, and at this point everything is perfect: I get the same values as in my CSV. For my column "Exposure" I have:
0 0,00
1 0,00
2 0,00
3 0,00
4 0,00
...
But when I load my fi...

Latest Reply
jose_gonzalez
Moderator
  • 3 kudos

Hi @Anis Ben Salem, how do you read your CSV file? Do you use Pandas or PySpark APIs? Also, how did you create your table? Could you share more details on the code you are trying to run?

2 More Replies
Kaniz
by Community Manager
  • 1047 Views
  • 3 replies
  • 2 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

As @Kaniz Fatma wrote, you can use native functions for it:
df = spark.read.format("csv").option("header", "true").load("file.csv")
An alternative, really nice way is to use SQL syntax for that:
%sql CREATE TEMPORARY VIEW diamonds USING CSV OPTIONS (path "/...

2 More Replies