cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

THIAM_HUATTAN
by Valued Contributor
  • 3105 Views
  • 6 replies
  • 6 kudos

Resolved! Delta Lake’s CDF Feature

https://www.databricks.com/notebooks/delta-lake-cdf.htmlI am trying to understand the above article. Could someone explain to be the below questions?a) From SELECT * FROM table_changes('gold_consensus_eps', 2)why is consensus_eps values of 2.1 and 2....

  • 3105 Views
  • 6 replies
  • 6 kudos
Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @THIAM HUAT TAN​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

  • 6 kudos
5 More Replies
vicusbass
by New Contributor II
  • 12858 Views
  • 3 replies
  • 1 kudos

How to extract values from JSON array field?

Hi,While creating an SQL notebook, I am struggling with extracting some values from a JSON array field. I need to create a view where a field would be an array with values extracted from a field like the one below, specifically I need the `value` fi...

  • 12858 Views
  • 3 replies
  • 1 kudos
Latest Reply
vicusbass
New Contributor II
  • 1 kudos

Maybe I didn't explain it correctly. The JSON snippet from the description is a cell from a row from a table.

  • 1 kudos
2 More Replies
sarvesh
by Contributor III
  • 3952 Views
  • 4 replies
  • 3 kudos

read percentage values in spark ( no casting )

I have a xlsx file which has a single column ;percentage30%40%50%-10%0.00%0%0.10%110%99.99%99.98%-99.99%-99.98%when i read this using Apache-Spark out put i get is,|percentage|+----------+| 0.3|| 0.4|| 0.5|| -0.1|| 0.0|| ...

  • 3952 Views
  • 4 replies
  • 3 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

Affirmative. This is how excel stores percentages. What you see is just cell formatting.Databricks notebooks do not (yet?) have the possibility to format the output.But it is easy to use a BI tool on top of Databricks, where you can change the for...

  • 3 kudos
3 More Replies
Artem_Yevtushen
by New Contributor III
  • 1946 Views
  • 1 replies
  • 2 kudos

Show all distinct values per column in dataframe Problem Statement:I want to see all the distinct values per column for my entire table, but a SQL que...

Show all distinct values per column in dataframeProblem Statement:I want to see all the distinct values per column for my entire table, but a SQL query with a collect_set() on every column is not dynamic and too long to write.Use this code to show th...

collect set table
  • 1946 Views
  • 1 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Artem Yevtushenko​ - This is great! Thank you for sharing!

  • 2 kudos
Anonymous
by Not applicable
  • 2210 Views
  • 2 replies
  • 0 kudos

Resolved! Is there a way to validate the values of spark configs?

We can set for example:spark.conf.set('aaa.test.junk.config', 99999) , and then run spark.conf.get("aaa.test.junk.config”) which will return a value.The problem occurs when incorrectly setting to a similar matching property.spark.conf.set('spark.sql....

  • 2210 Views
  • 2 replies
  • 0 kudos
Latest Reply
User16857281974
Contributor
  • 0 kudos

You would solve this just like we solve this problem for all lose string references. Namely, that is to create a constant that represents the key-value you want to ensure doesn't get mistyped.Naturally, if you type it wrong the first time, it will be...

  • 0 kudos
1 More Replies
User16826987838
by Contributor
  • 1126 Views
  • 1 replies
  • 0 kudos
  • 1126 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

delta.logRetentionDuration - 30 daysdelta.deletedFileRetentionDuration - 7 days

  • 0 kudos
User16783853501
by New Contributor II
  • 1932 Views
  • 3 replies
  • 0 kudos

best practice for optimizedWrites and Optimize

What is the best practice for a delta pipeline with very high throughput to avoid small files problem and also reduce the need for external OPTIMIZE frequently?  

  • 1932 Views
  • 3 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

The general practice in use is to enable only optimize writes and disable auto-compaction. This is because the optimize writes will introduce an extra shuffle step which will increase the latency of the write operation. In addition to that, the auto-...

  • 0 kudos
2 More Replies
Labels