Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Purpose and usage of Entity matching in Genie Spaces

michael365
New Contributor III

Hi all,

I'm trying to improve a Genie space and am experimenting with features like Format assistance and Entity matching for single columns.

Does anybody know how to use the Entity matching feature?

Documentation says:

"Entity matching: Entity matching provides curated lists of distinct values for up to 120 columns where users are likely to reference specific entries, such as states and product categories. This helps Genie match user terminology to actual data values. Each column can include up to 1,024 distinct values, each up to 127 characters in length. Entity matching data is stored in your workspace's storage bucket."

Where can I configure the 1,024 distinct values?

Thanks for your help 😊

1 REPLY

Ashwin_DSA
Databricks Employee

Hi @michael365,

You don’t configure those 1,024 distinct values manually anywhere. For each column with Entity matching turned on, Genie automatically scans the underlying table (up to 100M rows) and picks up to 1,024 distinct values (currently the most frequent ones) to store as the entity list in your workspace bucket. Here is a link and a snapshot for reference.

[Screenshot: Ashwin_DSA_0-1777537076972.png]

What you can configure is:

Which columns use entity matching: In the Genie space UI, go to Configure > Data > [table] > pencil icon on the column > Advanced, then set Entity matching to On or Off.

When to refresh the stored values: In the same column view, use ⋯ > Refresh prompt matching to rescan the data and update the stored value list (for example, after new values are added or formats change).

So the 1,024 cap is a system limit on how many values Genie samples and stores per column, not a place where you supply or edit a custom list.
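
If you want a rough idea of how a column compares with those limits before turning entity matching on, a small check like the one below may help. It is only a sketch under assumptions: `summarize_column` and its input list are hypothetical names for illustration, the two limits are the ones quoted from the documentation above, and it does not reproduce how Genie actually scans, selects, or handles over-length values.

```python
from collections import Counter

# Limits quoted from the documentation excerpt in this thread.
MAX_VALUES_PER_COLUMN = 1024
MAX_VALUE_LENGTH = 127


def summarize_column(values):
    """Compare a column's values with the entity matching limits.

    `values` is any iterable of column values (hypothetical input; in
    practice you would pull these from your own table). None is ignored.
    """
    counts = Counter(str(v) for v in values if v is not None)
    # most_common() orders by frequency, which approximates the
    # "most frequent values" selection described in the reply. How Genie
    # breaks ties is not stated in this thread.
    top = [value for value, _ in counts.most_common(MAX_VALUES_PER_COLUMN)]
    return {
        "distinct_values": len(counts),
        "exceeds_value_cap": len(counts) > MAX_VALUES_PER_COLUMN,
        "too_long": sorted(v for v in counts if len(v) > MAX_VALUE_LENGTH),
        "most_frequent": top,
    }


if __name__ == "__main__":
    sample = ["CA", "NY", "CA", "TX", "CA", "NY", None]
    summary = summarize_column(sample)
    print(summary["distinct_values"])
    print(summary["most_frequent"])
```

A column that reports `exceeds_value_cap` as true would, per the description above, have only part of its values stored, so it may be a weaker candidate for entity matching than a low-cardinality column such as states or product categories.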

Hope that clarifies. 

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***