Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

boskicl
by New Contributor III
  • 27746 Views
  • 6 replies
  • 10 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all, Background: I am having an issue today with Databricks using pyspark-sql and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that is the table which I am trying to write to a Delta table. The table ...

Latest Reply
timo199
New Contributor II

Even if I vacuum and optimize, it keeps getting stuck. Cluster type is r6gd.xlarge (min: 4, max: 6); driver type is r6gd.2xlarge.

5 More Replies
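A minimal PySpark sketch of the pattern described in the question and the maintenance commands mentioned in the reply; the table names, join key, and Z-ORDER column below are hypothetical, and spark is the session a Databricks notebook provides:

    # Join two source tables and write the result to a Delta table (names are placeholders).
    joined_df = (
        spark.table("source_a")
        .join(spark.table("source_b"), on="id", how="inner")
    )
    joined_df.write.format("delta").mode("overwrite").saveAsTable("target_table")

    # Maintenance commands the reply refers to; the Z-ORDER column is an assumption.
    spark.sql("OPTIMIZE target_table ZORDER BY (id)")
    spark.sql("VACUUM target_table")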
Bilal1
by New Contributor III
  • 29400 Views
  • 7 replies
  • 2 kudos

Resolved! Simply writing a dataframe to a CSV file (non-partitioned)

When writing a dataframe in PySpark to a CSV file, a folder is created and a partitioned CSV file is created. I then have to rename this file in order to distribute it to my end user. Is there any way I can simply write my data to a CSV file, with the name ...

Latest Reply
chris0706
New Contributor II

I know this post is a little old, but ChatGPT actually put together a very clean and straightforward solution for me (in Scala): // Set the temporary output directory and the desired final file path val tempDir = "/tmp/your_file_name" val finalOutputP...

6 More Replies
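The accepted answer is in Scala, but the same temp-directory-then-rename approach can be sketched in PySpark; the paths below are placeholders, and dbutils is assumed to be available (Databricks notebook):

    # Write a single part file to a temporary directory, then move it to the final name.
    temp_dir = "/tmp/your_file_name"          # hypothetical temporary location
    final_path = "/mnt/output/report.csv"     # hypothetical final destination

    (df.coalesce(1)                           # one partition -> one part-*.csv file
       .write.mode("overwrite")
       .option("header", "true")
       .csv(temp_dir))

    # Locate the generated part file, rename it, and clean up the temporary folder.
    part_file = [f.path for f in dbutils.fs.ls(temp_dir) if f.name.startswith("part-")][0]
    dbutils.fs.mv(part_file, final_path)
    dbutils.fs.rm(temp_dir, recurse=True)

Note that coalesce(1) forces all data through a single task, so this is only practical for outputs small enough to be handled by one worker.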
William_Scardua
by Valued Contributor
  • 4923 Views
  • 3 replies
  • 1 kudos

Resolved! How to integrate a pipeline with Dynatrace?

Hi guys, do you know how I can integrate a pipeline to send some data to Dynatrace? Any ideas? Thank you.

Latest Reply
Anonymous
Not applicable

Hi @William Scardua, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...

2 More Replies
alejandrofm
by Valued Contributor
  • 3707 Views
  • 2 replies
  • 2 kudos

Resolved! A lot of shuffle write on OPTIMIZE + ZORDER, is it normal?

Hi! I'm optimizing several TB of partitioned data with ZSTD level 9. The amount of shuffle write surprises me; it could make sense because of ZORDER, but I want to be sure that I'm not missing something. Here is some context: Could I be missing something...

Latest Reply
Anonymous
Not applicable

Hi @Alejandro Martinez, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best an...

1 More Replies
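Significant shuffle write is expected here: Z-ORDER re-clusters rows across files, so OPTIMIZE has to redistribute data before rewriting it. One way to bound the cost is to scope the command to recent partitions; the table, partition filter, and clustering column below are assumptions:

    # OPTIMIZE rewrites files, and ZORDER additionally re-clusters rows, which drives shuffle write.
    # Restricting the command with a partition predicate limits how much data is rewritten per run.
    spark.sql("""
        OPTIMIZE events
        WHERE event_date >= '2023-01-01'   -- hypothetical partition filter
        ZORDER BY (user_id)                -- hypothetical clustering column
    """)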
RengarLee
by Contributor
  • 8309 Views
  • 10 replies
  • 3 kudos

Resolved! Databricks writes to Azure Data Explorer suddenly become slower

I write to Azure Data Explorer using Spark Streaming. One day, writes suddenly became slower, and restarting had no effect. I have a question about Spark Streaming to Azure Data Explorer. Q1: What should I do to restore performance? Figure 1 shows th...

Latest Reply
RengarLee
Contributor

I'm so sorry, I just thought the issue wasn't resolved. Solution: set maxFilesPerTrigger and maxBytesPerTrigger, and enable auto optimize. Reason: on the first day it processes larger files and then eventually processes smaller files. Detailed reason: B...

9 More Replies
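A minimal sketch of the rate-limiting options named in the solution, applied on a Delta streaming source; the paths are placeholders, and the Azure Data Explorer sink configuration is omitted (a plain Delta sink stands in for it here):

    # Cap how much data each micro-batch reads so large early files don't overwhelm the sink.
    stream_df = (
        spark.readStream.format("delta")
        .option("maxFilesPerTrigger", 100)    # hypothetical per-batch file cap
        .option("maxBytesPerTrigger", "1g")   # hypothetical per-batch byte cap
        .load("/mnt/source/delta-table")      # hypothetical source path
    )

    query = (
        stream_df.writeStream
        .format("delta")                       # placeholder sink; the ADX connector would go here
        .option("checkpointLocation", "/mnt/checkpoints/adx-write")   # hypothetical checkpoint
        .start("/mnt/sink/delta-table"))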
KamKam
by New Contributor
  • 1304 Views
  • 2 replies
  • 0 kudos

How to write to a folder in an Azure Data Lake container using Delta?

Hi all, how do I write to a folder in an Azure Data Lake container using Delta? When I run: write_mode = 'overwrite' write_format = 'delta' save_path = '/mnt/container-name/folder-name' df.write \ .mode(write_mode) \ .format(write_format) \ ....

Latest Reply
jose_gonzalez
Databricks Employee

Hi @Kamalen Reddy, could you share the error message, please?

1 More Replies
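The snippet in the question appears to stop just before the final call; one way to complete it, keeping the mounted path from the question as a placeholder:

    write_mode = "overwrite"
    write_format = "delta"
    save_path = "/mnt/container-name/folder-name"   # mounted ADLS folder from the question

    (df.write
       .mode(write_mode)
       .format(write_format)
       .save(save_path))     # .save() writes the Delta files into that folder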
Hubert-Dudek
by Esteemed Contributor III
  • 1510 Views
  • 1 replies
  • 15 kudos

Resolved! Write to Azure Delta Lake - optimization request

The Databricks/Delta team could optimize some commands that write to Azure Blob Storage, as Azure displays this message:

(screenshot of the Azure message attached)
Latest Reply
Anonymous
Not applicable

Hey there. Thank you for your suggestion. I'll pass this up to the team.

Data_Bricks1
by New Contributor III
  • 3958 Views
  • 7 replies
  • 0 kudos

Data from 10 BLOB containers and multiple hierarchical folders (every-day and every-hour folders) in each container to a Delta Lake table in Parquet format - incremental loading of the latest data only (inserts, no updates)

I am able to load data for a single container by hard-coding, but not able to load from multiple containers. I used a for loop, but the data frame ends up holding only the last container's last folder record. One more issue is that I have to flatten the data, when I ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

For sure the function (def) should be declared outside the loop; move it after importing the libraries. The logic is a bit complicated, so you need to debug it using display(Flatten_df2) (or .show()) and validate the JSON after each iteration (using break or sleep, etc.).

6 More Replies
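A rough sketch of the structure the reply suggests: define the flatten function once, outside the loop, and accumulate each container's dataframe instead of overwriting a single variable. The container names, folder layout, flatten logic, and target path are all placeholders:

    from functools import reduce

    def flatten_df(df):
        # Placeholder for the JSON-flattening logic; declared once, before the loop.
        return df

    containers = [f"container{i}" for i in range(1, 11)]     # hypothetical container names
    frames = []
    for c in containers:
        raw = spark.read.json(f"/mnt/{c}/*/*/")              # hypothetical day/hour folder layout
        frames.append(flatten_df(raw))

    # Union every container's result rather than keeping only the last one.
    all_data = reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), frames)
    all_data.write.format("delta").mode("append").save("/mnt/delta/target")   # hypothetical target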
Anonymous
by Not applicable
  • 1854 Views
  • 1 replies
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor

In this scenario, the best option would be to have a single readStream reading a source Delta table. Since checkpoint logs are controlled when writing to Delta tables, you would be able to maintain separate logs for each of your writeStreams. I would...

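A minimal sketch of the pattern described in the reply: a single readStream over the source Delta table, fanned out to multiple writeStreams that each keep their own checkpoint location (all paths are placeholders):

    source = spark.readStream.format("delta").load("/mnt/delta/source")   # hypothetical source table

    # Each writeStream tracks its progress independently through its own checkpoint.
    q1 = (source.writeStream.format("delta")
          .option("checkpointLocation", "/mnt/checkpoints/sink_a")
          .start("/mnt/delta/sink_a"))

    q2 = (source.writeStream.format("delta")
          .option("checkpointLocation", "/mnt/checkpoints/sink_b")
          .start("/mnt/delta/sink_b"))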
KiranRastogi
by New Contributor
  • 37509 Views
  • 2 replies
  • 2 kudos

Pandas dataframe to a table

I want to write a pandas dataframe to a table; how can I do this? The write command is not working, please help.

Latest Reply
amy_wang
New Contributor II

Hey Kiran, just taking a stab in the dark, but do you want to convert the pandas DataFrame to a Spark DataFrame and then write out the Spark DataFrame as a non-temporary SQL table? import pandas as pd ## Create Pandas Frame pd_df = pd.DataFrame({u'20...

1 More Replies
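A short sketch of the conversion the reply suggests: build the pandas DataFrame, convert it to a Spark DataFrame, and save it as a non-temporary table; the sample data and table name are placeholders:

    import pandas as pd

    # Create a pandas DataFrame (placeholder data).
    pd_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

    # Convert to a Spark DataFrame and persist it as a managed table.
    spark_df = spark.createDataFrame(pd_df)
    spark_df.write.mode("overwrite").saveAsTable("my_table")   # hypothetical table name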