Databricks Community

SivaPK · ‎11-23-2023

Hi,

How to get the distinct count from the below listed image,

keywords = column name

table = appCatalog

keywords (column)

"[""data"",""cis"",""mining"",""financial"",""pso"",""value""]"
"[""bzo"",""employee news"",""news""]"
"[""core.store"",""fbi""]"
"[""data"",""display"",""support""]"
"[""bTravel"",""bTravel Expense"",""non-travel-related expenses"",""travel expenses""]"

For me output should be like, 17 as result. While counting the distinct values it should result 17.

keyword name called 'data' is occurring 2 times in the result as shown above Row 1 and Row 4.

How to achieve the result?

Thanks in advance.

SivaPK · ‎11-23-2023

Hi Kaniz,

Thank you for your reply and time.

I have tried the above query in the SQL Editor. I have to display the distinct count of the keywords from the table.

datatype is string. I have uploaded the csv file as a data for the Table. How to use the previous comment code in SQL Editor in Databricks.?

Query:

select keywords, count(1) as Number_Occurences

from (

-- This subquery transforms the hashtags array into rows

select id

,keywords as Orignal_Keywords_Array

,explode(keywords) as keywords

from ngportal.applicationcatalog

order by id

)

group by keywords

order by count(1) desc

Error:

[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "explode(keywords)" due to data type mismatch: Parameter 1 requires the ("ARRAY" or "MAP") type, however "keywords" has the type "STRING"

Thank you.