I am trying to install the stanza library and create a UDF that generates NER tags for the chunk_text column of my DataFrame.
Cluster config: DBR 14.3 LTS, Spark 3.5.0, Scala 2.12
Code below:
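from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

def extract_entities(text):
    # Imported inside the function so the import runs on the executors
    import stanza
    # Note: this builds a new pipeline on every call
    nlp = stanza.Pipeline('en', processors='tokenize,ner', use_gpu=False)
    doc = nlp(text)
    entities = [(entity.text, entity.type) for sentence in doc.sentences for entity in sentence.ents]
    return entities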
# Register the UDF
entity_udf = udf(extract_entities, ArrayType(StructType([
    StructField("text", StringType(), True),
    StructField("type", StringType(), True)
])))
df = spark.sql("select * from datafabric_catalog.gen_ai.wiki limit 1")
df_with_entities = df.withColumn("entities", entity_udf(df["chunk_text"]))
It throws the following error:
from typing_extensions import Literal, Match, TypedDict
ImportError: cannot import name 'Match' from 'typing_extensions' (/databricks/python3/lib/python3.10/site-packages/typing_extensions.py)
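To help narrow this down, here is a small diagnostic sketch that can be run in a notebook cell. It assumes (not confirmed by the traceback) that the typing_extensions copy found at the path above is too old to export Match. The helper name describe_typing_extensions is made up for illustration; it only reports what is installed and does not change anything.

```python
import importlib.util
from importlib import metadata


def describe_typing_extensions():
    """Report whether typing_extensions is importable, its version and
    file location, and whether it exposes the name `Match`.

    Hypothetical helper for diagnosis only; it does not modify the environment.
    """
    spec = importlib.util.find_spec("typing_extensions")
    if spec is None:
        # The module cannot be found on this interpreter's path at all
        return {"installed": False, "version": None, "path": None, "has_match": False}

    import typing_extensions

    try:
        version = metadata.version("typing_extensions")
    except metadata.PackageNotFoundError:
        # Module is importable but has no distribution metadata
        version = None

    return {
        "installed": True,
        "version": version,
        "path": spec.origin,
        "has_match": hasattr(typing_extensions, "Match"),
    }


print(describe_typing_extensions())
```

If has_match comes back False, the path and version in the output show which copy of typing_extensions the interpreter is picking up. Note that this runs on the driver; the UDF itself executes on the workers, which may resolve a different copy.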