Data Engineering

Forum Posts

Bilal1
by New Contributor III
  • 16063 Views
  • 6 replies
  • 2 kudos

Resolved! Simply writing a dataframe to a CSV file (non-partitioned)

When writing a dataframe to a CSV file in PySpark, a folder is created containing a partitioned CSV file. I then have to rename this file in order to distribute it to my end user. Is there any way I can simply write my data to a CSV file, with the name ...

Latest Reply
Bilal1
New Contributor III
  • 2 kudos

Thanks for confirming that that's the only way

5 More Replies
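
A minimal sketch of the workaround confirmed in this thread, assuming a Databricks notebook where df and dbutils already exist; the paths and option values are illustrative:

    # Coalesce to a single partition so Spark writes exactly one part file.
    tmp_dir = "/mnt/output/_tmp_report"       # temporary folder (hypothetical path)
    final_path = "/mnt/output/report.csv"     # desired single-file name

    (df.coalesce(1)
       .write.mode("overwrite")
       .option("header", "true")
       .csv(tmp_dir))

    # Copy the lone part file to the final name, then remove the temp folder.
    part_file = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
    dbutils.fs.cp(part_file, final_path)
    dbutils.fs.rm(tmp_dir, recurse=True)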
William_Scardua
by Valued Contributor
  • 1830 Views
  • 3 replies
  • 1 kudos

Resolved! How to integrate pipeline with Dynatrace ?

Hi guys, do you know how I can integrate pipeline data with Dynatrace? Any ideas? Thank you

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @William Scardua, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

2 More Replies
alejandrofm
by Valued Contributor
  • 1508 Views
  • 2 replies
  • 2 kudos

Resolved! Lots of shuffle write on OPTIMIZE + ZORDER, is it normal?

Hi! I'm optimizing several TB of partitioned data with ZSTD level 9. The amount of shuffle write surprises me; it could make sense because of ZORDER, but I want to be sure that I'm not missing something. Here is some context: Could I be missing something...

[screenshots attached]
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Alejandro Martinez, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

1 More Replies
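
For context, a sketch of the operation under discussion: Z-ordering rewrites and clusters the data files, so substantial shuffle write is expected. The table and column names below are assumptions:

    # OPTIMIZE compacts files; ZORDER BY co-locates rows on the given columns,
    # which shuffles roughly the volume of data being rewritten.
    spark.sql("""
        OPTIMIZE events                      -- hypothetical table name
        WHERE event_date >= '2022-01-01'     -- optionally limit to recent partitions
        ZORDER BY (user_id)                  -- column(s) most used in filters
    """)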
RengarLee
by Contributor
  • 3949 Views
  • 11 replies
  • 3 kudos

Resolved! Databricks writes to Azure Data Explorer suddenly become slower

I write to Azure Data Explorer using Spark Streaming. One day, writes suddenly became slower, and a restart had no effect. I have a question about Spark Streaming to Azure Data Explorer. Q1: What should I do to get performance to recover? Figure 1 shows th...

Latest Reply
RengarLee
Contributor
  • 3 kudos

I'm so sorry, I just thought the issue wasn't resolved. Solution: set maxFilesPerTrigger and maxBytesPerTrigger, and enable autoOptimize. Reason: on the first day it processes larger files and then eventually processes smaller files. Detailed reason: B...

10 More Replies
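
A hedged sketch of the settings named in the accepted solution; the table names and limit values are assumptions:

    # Cap how much data each streaming micro-batch reads from the Delta source.
    stream_df = (spark.readStream
                   .format("delta")
                   .option("maxFilesPerTrigger", 100)    # illustrative limit
                   .option("maxBytesPerTrigger", "1g")   # illustrative limit
                   .table("source_table"))               # hypothetical source

    # Enable auto-optimize on the upstream Delta table to keep file sizes even.
    spark.sql("""
        ALTER TABLE source_table SET TBLPROPERTIES (
            'delta.autoOptimize.optimizeWrite' = 'true',
            'delta.autoOptimize.autoCompact' = 'true'
        )
    """)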
KamKam
by New Contributor
  • 697 Views
  • 2 replies
  • 0 kudos

How to write to a folder in an Azure Data Lake container using Delta?

Hi all, how do I write to a folder in an Azure Data Lake container using Delta? When I run: write_mode = 'overwrite' write_format = 'delta' save_path = '/mnt/container-name/folder-name' df.write \ .mode(write_mode) \ .format(write_format) \ ....

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Hi @Kamalen Reddy, could you share the error message please?

1 More Replies
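
Reformatted for readability, the snippet from the question appears to be the following; the trailing call is truncated in the post and is assumed to be .save(save_path):

    write_mode = 'overwrite'
    write_format = 'delta'
    save_path = '/mnt/container-name/folder-name'

    (df.write
       .mode(write_mode)
       .format(write_format)
       .save(save_path))   # assumed completion of the truncated snippet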
boskicl
by New Contributor III
  • 13131 Views
  • 5 replies
  • 10 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all. Background: I am having an issue today with Databricks using PySpark SQL and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that is the table which I am trying to write to a Delta table. The table ...

[attachments: filtering, job_info, spill_memory]
Latest Reply
Anonymous
Not applicable
  • 10 kudos

@Ljuboslav Boskic, there can be multiple reasons why the query is taking more time; during this phase, metadata look-up activity happens. Can you please check the below things: ensuring the tables are Z-ordered properly, and that the merge key (on ...

4 More Replies
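
A hedged sketch of the advice in the reply: include the Z-order/partition column in the join or merge condition so the "Filtering files for query" phase can prune files instead of scanning them all. Table and column names are assumptions:

    # The extra predicate on the clustered column lets Delta skip unrelated files.
    spark.sql("""
        MERGE INTO target t
        USING updates u
        ON t.id = u.id
           AND t.event_date = u.event_date   -- enables file pruning
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)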
Hubert-Dudek
by Esteemed Contributor III
  • 714 Views
  • 1 replies
  • 15 kudos

Resolved! Write to Azure Delta Lake - optimization request

The Databricks/Delta team could optimize some commands that write to Azure Blob Storage, as Azure displays this message:

[screenshot of the Azure message]
Latest Reply
Anonymous
Not applicable
  • 15 kudos

Hey there. Thank you for your suggestion. I'll pass this up to the team.

Data_Bricks1
by New Contributor III
  • 1905 Views
  • 7 replies
  • 0 kudos

Data from 10 blob containers with multiple hierarchical folders (per-day and per-hour folders) in each container to a Delta Lake table in Parquet format - incremental loading of the latest data only (inserts, no updates)

I am able to load data for a single container by hard-coding it, but I am not able to load from multiple containers. I used a for loop, but the dataframe loads only the last container's last folder records. One more issue is that I have to flatten the data; when I ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 0 kudos

For sure the function (def) should be declared outside the loop; move it after the library imports. The logic is a bit complicated, so you need to debug it using display(Flatten_df2) (or .show()) and validate the JSON after each iteration (using break or sleep, etc.).

6 More Replies
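
A hedged sketch of the loop fix implied by this thread: collect each container's dataframe and union them, rather than overwriting the variable on every iteration. The storage account, container names, and folder layout are assumptions:

    from functools import reduce
    from pyspark.sql import DataFrame

    containers = [f"container{i}" for i in range(1, 11)]   # hypothetical names
    frames = []
    for c in containers:
        # Day/hour folder hierarchy matched with wildcards (illustrative path).
        path = f"wasbs://{c}@myaccount.blob.core.windows.net/*/*/"
        frames.append(spark.read.parquet(path))

    # Union all containers instead of keeping only the last one.
    df_all = reduce(DataFrame.unionByName, frames)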
Anonymous
by Not applicable
  • 728 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III
  • 0 kudos

In this scenario, the best option would be to have a single readStream reading a source Delta table. Since checkpoint logs are controlled when writing to Delta tables, you would be able to maintain separate logs for each of your writeStreams. I would...

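
A minimal sketch of the pattern described in this reply, with a separate checkpoint location per writeStream; paths and table names are assumptions:

    # One streaming source fanned out to two sinks.
    src = spark.readStream.format("delta").table("source_table")   # hypothetical

    # Each sink keeps its own checkpoint, so progress is tracked independently.
    (src.writeStream.format("delta")
        .option("checkpointLocation", "/chk/sink_a")
        .toTable("sink_a"))

    (src.writeStream.format("delta")
        .option("checkpointLocation", "/chk/sink_b")
        .toTable("sink_b"))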
KiranRastogi
by New Contributor
  • 26745 Views
  • 2 replies
  • 1 kudos

Pandas dataframe to a table

I want to write a pandas dataframe to a table; how can I do this? The write command is not working, please help.

Latest Reply
amy_wang
New Contributor II
  • 1 kudos

Hey Kiran, just taking a stab in the dark, but do you want to convert the pandas DataFrame to a Spark DataFrame and then write out the Spark DataFrame as a non-temporary SQL table? import pandas as pd ## Create Pandas Frame pd_df = pd.DataFrame({u'20...

1 More Replies
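
A runnable sketch of the conversion suggested in the reply; the sample data and table name are illustrative:

    import pandas as pd

    # Build a small pandas DataFrame (illustrative data).
    pd_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

    # Convert to a Spark DataFrame, then persist it as a non-temporary table.
    spark_df = spark.createDataFrame(pd_df)
    spark_df.write.mode("overwrite").saveAsTable("my_table")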