Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
https://www.databricks.com/notebooks/delta-lake-cdf.html

I am trying to understand the above article. Could someone explain the questions below to me? a) From SELECT * FROM table_changes('gold_consensus_eps', 2), why are the consensus_eps values of 2.1 and 2....
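For context, a minimal PySpark sketch of the query in question (assuming a Databricks environment where Change Data Feed is enabled on the table). Note that table_changes() adds metadata columns, and an UPDATE emits both a preimage and a postimage row, which is why the same record can surface two consensus_eps values:

```python
# Read the Change Data Feed starting at table version 2, as in the
# question above. table_changes() adds the metadata columns
# _change_type, _commit_version and _commit_timestamp.
changes = spark.sql("SELECT * FROM table_changes('gold_consensus_eps', 2)")

# An UPDATE produces two rows per changed record: the value before
# ('update_preimage') and after ('update_postimage') the change.
changes.select("consensus_eps", "_change_type", "_commit_version").show()
```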
Hi @THIAM HUAT TAN, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...
Hi, while creating an SQL notebook, I am struggling with extracting some values from a JSON array field. I need to create a view where a field would be an array with values extracted from a field like the one below; specifically, I need the `value` fi...
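Since the sample field is truncated above, here is a minimal PySpark sketch assuming the column holds a JSON array of {key, value} objects (the schema and column names are assumptions; adjust them to the real payload):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# Hypothetical input resembling the truncated example in the question.
df = spark.createDataFrame(
    [('[{"key": "a", "value": "1"}, {"key": "b", "value": "2"}]',)],
    ["json_col"],
)

# Assumed element schema for the JSON array.
schema = ArrayType(StructType([
    StructField("key", StringType()),
    StructField("value", StringType()),
]))

# Parse the JSON string into an array of structs, then keep only `value`.
df.select(
    F.transform(F.from_json("json_col", schema), lambda x: x["value"]).alias("values")
).show(truncate=False)  # -> [1, 2]
```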
I have a xlsx file which has a single column, percentage, with the values 30%, 40%, 50%, -10%, 0.00%, 0%, 0.10%, 110%, 99.99%, 99.98%, -99.99%, -99.98%. When I read this using Apache Spark, the output I get is:

+----------+
|percentage|
+----------+
|       0.3|
|       0.4|
|       0.5|
|      -0.1|
|       0.0|
| ...
Affirmative. This is how Excel stores percentages; what you see is just cell formatting. Databricks notebooks do not (yet?) have the possibility to format the output. But it is easy to use a BI tool on top of Databricks, where you can change the for...
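If the fractions need to be shown as percentages again inside Spark, one possible workaround (a sketch, not from the thread; df is assumed to be the dataframe read from the xlsx file above):

```python
from pyspark.sql import functions as F

# Rebuild percent strings from the fractions Excel actually stores.
formatted = df.withColumn(
    "percentage_str",
    F.format_string("%.2f%%", F.col("percentage") * 100)
)
formatted.show()  # 0.3 -> "30.00%", -0.1 -> "-10.00%"
```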
Show all distinct values per column in a dataframe. Problem statement: I want to see all the distinct values per column for my entire table, but a SQL query with a collect_set() on every column is not dynamic and too long to write. Use this code to show th...
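The code in the post is cut off above; a common shape of the dynamic approach looks like this (a sketch, assuming df is the dataframe in question):

```python
from pyspark.sql import functions as F

# Build one collect_set() aggregation per column from df.columns, so
# the query adapts to any table without hand-writing each column.
distinct_per_col = df.select([F.collect_set(c).alias(c) for c in df.columns])
distinct_per_col.show(truncate=False)
```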
We can set, for example, spark.conf.set('aaa.test.junk.config', 99999) and then run spark.conf.get('aaa.test.junk.config'), which will return the value. The problem occurs when incorrectly setting a property whose name closely resembles a real one: spark.conf.set('spark.sql....
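The example is cut off above; a hypothetical illustration of the failure mode (the property typo is invented for demonstration):

```python
# A typo in a real Spark property name is accepted silently and simply
# never takes effect; no error or warning is raised.
spark.conf.set("spark.sql.shufle.partitions", 300)  # note the typo: "shufle"
spark.conf.get("spark.sql.shuffle.partitions")      # still the previous value
```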
You would solve this just like we solve this problem for all loose string references: namely, create a constant that represents the key you want to ensure doesn't get mistyped. Naturally, if you type it wrong the first time, it will be...
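A minimal sketch of that suggestion (the constant name is hypothetical):

```python
# Keep the key in one constant so a typo can only exist in a single
# place and is easy to spot in review.
SHUFFLE_PARTITIONS = "spark.sql.shuffle.partitions"

spark.conf.set(SHUFFLE_PARTITIONS, 200)
assert spark.conf.get(SHUFFLE_PARTITIONS) == "200"  # conf values come back as strings
```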
What is the best practice for a delta pipeline with very high throughput to avoid the small-files problem and also reduce the need for frequent external OPTIMIZE runs?
The general practice is to enable only optimized writes and disable auto-compaction. This is because optimized writes introduce an extra shuffle step, which will increase the latency of the write operation. In addition to that, the auto-...
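A sketch of those settings as Delta table properties (the table name is hypothetical; on Databricks the same toggles also exist as session confs):

```python
# Delta table properties for the practice described above: optimized
# writes on, auto-compaction off.
spark.sql("""
    ALTER TABLE my_delta_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact' = 'false'
    )
""")
```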