Data Engineering

Help trying to calculate a percentage

DanVartanian
New Contributor II

The image below shows what my source data is (HAVE) and what I'm trying to get to (WANT).

I want to be able to calculate the percentage of bad messages (where formattedMessage = false) by source and date.

I'm not sure how to achieve this in Databricks SQL. Any help is appreciated.

[Image: HAVE (source data) and WANT (desired output) tables]
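The HAVE/WANT image did not survive, so here is a minimal Python sketch of the metric as the question describes it. For each (source, date), it computes the share of the message count that comes from rows where formattedMessage is false. Several details are assumptions: the row layout (source, date, formattedMessage, messageCount), the sample values, the helper name `bad_message_pct`, and expressing the result as a percentage (multiplied by 100) rather than as a fraction.

```python
from collections import defaultdict

# Hypothetical rows shaped like the HAVE table (layout and values are assumptions):
# (source, date, formattedMessage, messageCount)
rows = [
    ("app1", "2022-01-01", True, 90),
    ("app1", "2022-01-01", False, 10),
    ("app2", "2022-01-01", True, 30),
    ("app2", "2022-01-01", False, 20),
]


def bad_message_pct(rows):
    """Return {(source, date): % of messageCount where formattedMessage is False}."""
    total = defaultdict(int)
    bad = defaultdict(int)
    for source, date, formatted, count in rows:
        key = (source, date)
        total[key] += count          # all messages for this source/date
        if not formatted:
            bad[key] += count        # only the badly formatted ones
    # Groups with no bad rows come out as 0.0; empty totals are skipped
    return {key: 100.0 * bad[key] / total[key] for key in total if total[key]}


print(bad_message_pct(rows))
```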

Accepted Solution

-werners-
Esteemed Contributor III

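You could use a window function partitioned by source and date that sums messagecount. This gives you the total per source/date, repeated on every row.

Then filter on formattedmessage == false and divide messagecount by that sum. Apply the filter only after the window sum has been computed, for example in an outer query. A WHERE clause in the same query runs before the window function, so the total would then include only the bad messages.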
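Below is a minimal sketch of the window-function approach described above. It runs against an in-memory SQLite database so that it is self-contained, and SQLite needs to be version 3.25 or later for window functions. The table name `messages`, its columns, and the sample rows are hypothetical. Booleans are stored as 1/0 here, so in Databricks SQL the filter would read `formattedMessage = false`. The sketch also assumes one row per (source, date, formattedMessage).

```python
import sqlite3

# In-memory SQLite stands in for Databricks SQL; table/column names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "source TEXT, date TEXT, formattedMessage INTEGER, messagecount INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        ("app1", "2022-01-01", 1, 90),
        ("app1", "2022-01-01", 0, 10),
        ("app2", "2022-01-01", 1, 30),
        ("app2", "2022-01-01", 0, 20),
    ],
)

QUERY = """
SELECT source, date, 100.0 * messagecount / total AS pct_bad
FROM (
    -- Step 1: total per (source, date), repeated on every row
    SELECT source, date, formattedMessage, messagecount,
           SUM(messagecount) OVER (PARTITION BY source, date) AS total
    FROM messages
) AS t
-- Step 2: keep only bad messages, after the window sum has been computed
WHERE formattedMessage = 0
ORDER BY source, date
"""

result = conn.execute(QUERY).fetchall()
print(result)
```

The inner query attaches the per-(source, date) total to every row, and the outer WHERE then keeps only the bad-message rows. One consequence of this design is that a (source, date) with no bad messages drops out of the result entirely. If such groups should appear as 0%, an alternative to the window function is a GROUP BY with conditional aggregation, such as `100.0 * SUM(CASE WHEN formattedMessage = 0 THEN messagecount ELSE 0 END) / SUM(messagecount)`.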


3 Replies

Thank you so much

Thank you, I was able to get it working by following your instructions 😀
