Data Engineering

Forum Posts

by Christine (Contributor)
  • 3844 Views
  • 8 replies
  • 5 kudos

Resolved! PySpark DataFrame empties after it has been saved to Delta Lake.

Hi, I am facing a problem that I hope to get some help understanding. I have created a function that is supposed to check if the input data already exists in a saved Delta table; if not, it should run some calculations and append the new data to...

Latest Reply by SharathE (New Contributor II)

Hi, I'm also having a similar issue. Does creating a temp view and reading it again after saving to a table work?

  • 5 kudos
7 More Replies
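
For context, a minimal sketch of the append-if-new pattern the question describes, assuming a Delta table named target_table keyed on an id column (both names are hypothetical; spark is the ambient Databricks session). Materializing the incoming DataFrame before the write avoids the lazy re-evaluation that can make it look empty afterwards:

    # Minimal sketch (hypothetical names): append only rows not already in the table.
    from pyspark.sql import DataFrame

    def append_if_new(new_df: DataFrame, table_name: str = "target_table") -> None:
        # Cache and force evaluation first; otherwise Spark may lazily
        # re-evaluate new_df against the table after the append.
        new_df = new_df.cache()
        new_df.count()  # force materialization before touching the target table
        existing_keys = spark.table(table_name).select("id")
        to_append = new_df.join(existing_keys, on="id", how="left_anti")
        if to_append.limit(1).count() > 0:
            to_append.write.format("delta").mode("append").saveAsTable(table_name)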
by iptkrisna (New Contributor III)
  • 467 Views
  • 1 reply
  • 2 kudos

Jobs Data Pipeline Runtime Increased Significantly

Hi, I am facing an issue where one of my jobs has been taking very long since a certain point in time. Previously it needed less than 1 hour to run a batch job that loads JSON data and does a truncate and load into a Delta table, but since June 2nd it has become so long that...

Latest Reply by Anonymous

Hi @krisna math, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 2 kudos
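
For reference, a minimal sketch of the truncate-and-load batch step the question describes, with a hypothetical JSON source path and target table name:

    # Minimal sketch (hypothetical path and table name).
    # mode("overwrite") replaces the table's contents: truncate + load in one step.
    df = spark.read.json("/mnt/raw/events/")
    df.write.format("delta").mode("overwrite").saveAsTable("events_delta")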
by az38 (New Contributor II)
  • 3199 Views
  • 2 replies
  • 3 kudos

load files filtered by last_modified in PySpark

Hi, community! What do you think is the best way to load into a df, from Azure ADLS (actually, the filesystem doesn't matter), only files modified after some point in time? Is there any function like input_file_name() but for last_modified, to use it in a w...

Latest Reply by venkatcrc (New Contributor III)

_metadata will provide the file modification timestamp. I tried it on DBFS, but I'm not sure about ADLS. https://docs.databricks.com/ingestion/file-metadata-column.html

  • 3 kudos
1 More Reply
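
Building on the reply, a minimal sketch of filtering by file modification time via the _metadata column (path and cutoff are hypothetical); newer Spark file sources also accept a modifiedAfter read option:

    # Minimal sketch (hypothetical path/cutoff): keep only recently modified files.
    from pyspark.sql.functions import col

    df = (spark.read.format("csv")
          .option("header", "true")
          .load("abfss://container@account.dfs.core.windows.net/data/")
          .select("*", "_metadata"))

    recent = df.where(col("_metadata.file_modification_time") > "2023-06-01")

    # Alternative: filter at read time with the modifiedAfter option.
    # spark.read.option("modifiedAfter", "2023-06-01T00:00:00").csv("...")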
by guostong (New Contributor III)
  • 897 Views
  • 3 replies
  • 1 kudos

Issues to load from ADLS in DLT

I am using DLT to load CSV from ADLS. Below is my SQL query in the notebook:

CREATE OR REFRESH STREAMING LIVE TABLE test_account_raw
AS SELECT * FROM cloud_files(
  "abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/",
  "csv",
  map("h...

Latest Reply by guostong (New Contributor III)

Thank you, everyone. The problem is resolved; it went away once I had workspace admin access.

  • 1 kudos
2 More Replies
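
For comparison, a minimal sketch of the same Auto Loader ingestion using DLT's Python API (the header option is an assumption, since the original map(...) is truncated):

    # Minimal sketch of the same cloud_files / Auto Loader ingestion in Python.
    import dlt

    @dlt.table(name="test_account_raw")
    def test_account_raw():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("header", "true")  # assumed; the original map(...) is truncated
            .load("abfss://my_container@my_storageaccount.dfs.core.windows.net/test_csv/")
        )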
by THIAM_HUATTAN (Valued Contributor)
  • 2012 Views
  • 5 replies
  • 4 kudos

Resolved! Using R, how do we write a csv file to, say, dbfs:/tmp?

Let us say I already have the data 'TotalData':

write.csv(TotalData, file='/tmp/TotalData.csv', row.names = FALSE)

I do not see any error from the above. When I list the files below:

%fs ls /tmp

I do not see any files written there. Why?

Latest Reply by Kaniz (Community Manager)

Hi @THIAM HUAT TAN, we haven't heard from you since the last response from @Cedric Law Hing Ping, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to oth...

  • 4 kudos
4 More Replies
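
The usual explanation for this behavior is that a bare /tmp path is the driver's local disk, while %fs ls /tmp lists dbfs:/tmp. A minimal Python illustration of the same path distinction (in R the fix is likewise to write to /dbfs/tmp/...):

    # Illustration: local disk vs. DBFS paths on a Databricks driver.
    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3]})
    df.to_csv("/tmp/TotalData.csv", index=False)       # driver-local; not visible to %fs ls /tmp
    df.to_csv("/dbfs/tmp/TotalData.csv", index=False)  # FUSE path that maps to dbfs:/tmp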
by Hola1801 (New Contributor)
  • 943 Views
  • 3 replies
  • 3 kudos

Resolved! Float value changes when loaded with Spark? Full path?

Hello, I have created my table in Databricks, and at this point everything is perfect: I get the same values as in my CSV. For my column "Exposure" I have:

0    0,00
1    0,00
2    0,00
3    0,00
4    0,00
...

But when I load my fi...

Latest Reply by jose_gonzalez (Moderator)

Hi @Anis Ben Salem, how do you read your CSV file? Do you use pandas or PySpark APIs? Also, how did you create your table? Could you share more details on the code you are trying to run?

  • 3 kudos
2 More Replies
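
A common cause of this symptom is a decimal-comma CSV ("0,00") parsing as a string or a wrong number. A minimal sketch of one workaround, assuming the column is named Exposure and a hypothetical file path:

    # Minimal sketch: read the decimal-comma column as a string,
    # swap the comma for a dot, then cast to double.
    from pyspark.sql.functions import regexp_replace, col

    df = (spark.read.format("csv")
          .option("header", "true")
          .load("/path/to/file.csv"))  # hypothetical path

    df = df.withColumn("Exposure",
                       regexp_replace(col("Exposure"), ",", ".").cast("double"))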
by Kaniz (Community Manager)
  • 934 Views
  • 3 replies
  • 2 kudos
Latest Reply by Hubert-Dudek (Esteemed Contributor III)

As @Kaniz Fatma wrote, you can use native functions for it:

df = spark.read.format("csv").option("header", "true").load("file.csv")

An alternative, really nice way is to use SQL syntax for that:

%sql
CREATE TEMPORARY VIEW diamonds USING CSV OPTIONS (path "/...

  • 2 kudos
2 More Replies
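
As a usage note, once the temporary view from the reply exists, it can be queried from Python in the same session:

    # Usage sketch: query the temporary view created in the reply.
    spark.table("diamonds").show(5)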