Data Engineering

Forum Posts

THIAM_HUATTAN
by Valued Contributor
  • 1716 Views
  • 6 replies
  • 6 kudos

Resolved! Delta Lake’s CDF Feature

https://www.databricks.com/notebooks/delta-lake-cdf.html I am trying to understand the above article. Could someone explain the questions below to me? a) From SELECT * FROM table_changes('gold_consensus_eps', 2) why are the consensus_eps values of 2.1 and 2....

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @THIAM HUAT TAN, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

5 More Replies
vicusbass
by New Contributor II
  • 4932 Views
  • 3 replies
  • 0 kudos

How to extract values from JSON array field?

Hi, while creating an SQL notebook, I am struggling with extracting some values from a JSON array field. I need to create a view where a field would be an array with values extracted from a field like the one below; specifically, I need the `value` fi...

Latest Reply
vicusbass
New Contributor II
  • 0 kudos

Maybe I didn't explain it correctly. The JSON snippet from the description is a cell from a row from a table.

2 More Replies
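
The JSON from the question is truncated in this listing, so its real shape is unknown. As a rough sketch of the extraction logic only, assuming the cell holds a JSON array of objects that each may carry a `value` key (the sample string and the helper name `extract_values` are hypothetical):

```python
import json

# Hypothetical cell content: the real JSON from the post is not shown,
# so this shape (a list of objects with a "value" key) is an assumption.
cell = '[{"name": "a", "value": "x1"}, {"name": "b", "value": "x2"}, {"name": "c"}]'


def extract_values(json_text, key="value"):
    """Return `key` from every object in a JSON array string, skipping objects without it."""
    items = json.loads(json_text)
    return [item[key] for item in items if isinstance(item, dict) and key in item]


values = extract_values(cell)
print(values)
```

This is plain Python meant to show the per-cell logic; the view in the question would still need an equivalent expression on the SQL side.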
sarvesh
by Contributor III
  • 2735 Views
  • 4 replies
  • 3 kudos

read percentage values in spark ( no casting )

I have an xlsx file which has a single column:
percentage
30%
40%
50%
-10%
0.00%
0%
0.10%
110%
99.99%
99.98%
-99.99%
-99.98%
When I read this using Apache Spark, the output I get is:
|percentage|
+----------+
|       0.3|
|       0.4|
|       0.5|
|      -0.1|
|       0.0|
| ...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

Affirmative. This is how Excel stores percentages. What you see is just cell formatting. Databricks notebooks do not (yet?) have the possibility to format the output. But it is easy to use a BI tool on top of Databricks, where you can change the for...

3 More Replies
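
Since the stored value is a fraction and the percent sign is only formatting, the display form can be rebuilt as a presentation step. A minimal sketch in plain Python, assuming a fixed number of decimals is acceptable (the helper name `as_percent` is hypothetical, and this does not recover the per-cell formatting from the workbook, such as `0%` versus `0.00%`):

```python
def as_percent(fraction, decimals=2):
    """Format a fraction such as 0.3 as a percent string such as '30.00%'."""
    # The built-in '%' presentation type multiplies by 100 and appends '%'.
    return f"{fraction:.{decimals}%}"


for value in [0.3, 0.4, 0.5, -0.1, 0.0]:
    print(as_percent(value))
```

The result is a string, so it suits display only; calculations should keep using the numeric column.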
Artem_Yevtushen
by New Contributor III
  • 1156 Views
  • 1 replies
  • 2 kudos

Show all distinct values per column in dataframe

Problem Statement: I want to see all the distinct values per column for my entire table, but a SQL query with a collect_set() on every column is not dynamic and too long to write. Use this code to show th...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

@Artem Yevtushenko​ - This is great! Thank you for sharing!
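
The code from the post is cut off in this listing, so the following is not the author's version. It is one possible way to avoid typing `collect_set()` for every column by hand: generate the query from a list of column names. The function name, table name and column names are all hypothetical.

```python
def build_distinct_query(table, columns):
    """Build one SELECT that applies collect_set() to every given column."""
    if not columns:
        raise ValueError("at least one column is required")
    # Backticks guard column names that contain spaces or reserved words.
    selects = ", ".join(f"collect_set(`{c}`) AS `{c}`" for c in columns)
    return f"SELECT {selects} FROM {table}"


query = build_distinct_query("my_table", ["country", "status"])
print(query)
```

In a notebook, the column list could come from `spark.table("my_table").columns` and the resulting string could be passed to `spark.sql(query)`.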

Anonymous
by Not applicable
  • 1392 Views
  • 2 replies
  • 0 kudos

Resolved! Is there a way to validate the values of spark configs?

We can set, for example: spark.conf.set('aaa.test.junk.config', 99999), and then run spark.conf.get("aaa.test.junk.config"), which will return a value. The problem occurs when incorrectly setting to a similar matching property. spark.conf.set('spark.sql....

Latest Reply
User16857281974
Contributor
  • 0 kudos

You would solve this just like we solve this problem for all loose string references. Namely, that is to create a constant that represents the key-value you want to ensure doesn't get mistyped. Naturally, if you type it wrong the first time, it will be...

1 More Replies
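
The constant-based approach from the reply might look roughly like this. It is a sketch under assumptions: `set_known`, `KNOWN_KEYS` and `FakeConf` are hypothetical names, `FakeConf` only stands in for `spark.conf` so the example runs without a cluster, and `spark.sql.shuffle.partitions` is used purely as an example of a real key.

```python
# One constant per key, defined once and reused everywhere.
SHUFFLE_PARTITIONS = "spark.sql.shuffle.partitions"
KNOWN_KEYS = frozenset({SHUFFLE_PARTITIONS})


def set_known(conf, key, value):
    """Set a config value only if the key is one of the declared constants."""
    if key not in KNOWN_KEYS:
        raise KeyError(f"Unknown config key: {key!r}")
    conf.set(key, value)


class FakeConf:
    """Hypothetical stand-in for spark.conf; stores values in a dict."""

    def __init__(self):
        self.values = {}

    def set(self, key, value):
        self.values[key] = value

    def get(self, key):
        return self.values[key]


conf = FakeConf()
set_known(conf, SHUFFLE_PARTITIONS, "200")
print(conf.get(SHUFFLE_PARTITIONS))
```

A mistyped literal such as `"spark.sql.shuffle.partitons"` is rejected by `set_known` instead of being stored silently, and a mistyped constant name fails with a `NameError`.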
User16826987838
by Contributor
  • 758 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

delta.logRetentionDuration - 30 days
delta.deletedFileRetentionDuration - 7 days

User16783853501
by New Contributor II
  • 1127 Views
  • 3 replies
  • 0 kudos

best practice for optimizedWrites and Optimize

What is the best practice for a delta pipeline with very high throughput to avoid the small files problem and also reduce the need to run an external OPTIMIZE frequently?

Latest Reply
User16869510359
Esteemed Contributor
  • 0 kudos

The general practice in use is to enable only optimize writes and disable auto-compaction. This is because the optimize writes will introduce an extra shuffle step which will increase the latency of the write operation. In addition to that, the auto-...

2 More Replies
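
As an illustration of the setting described in the reply (optimized writes on, auto-compaction off), the table properties could be applied with a generated `ALTER TABLE` statement. This is a sketch that assumes the Databricks table properties `delta.autoOptimize.optimizeWrite` and `delta.autoOptimize.autoCompact`; the helper name and the table name `events` are hypothetical.

```python
def alter_table_properties(table, properties):
    """Build an ALTER TABLE ... SET TBLPROPERTIES statement from a dict."""
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({pairs})"


stmt = alter_table_properties(
    "events",  # hypothetical table name
    {
        "delta.autoOptimize.optimizeWrite": "true",
        "delta.autoOptimize.autoCompact": "false",
    },
)
print(stmt)
```

The statement is only built here, not executed; in a notebook it could be run with `spark.sql(stmt)`.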