Hey @prasuanu1222, for 10,000 daily users and 100+ requests per minute, plain iframe embedding is probably not enough by itself. Databricks has workspace-level throughput limits for Genie. For iframe/UI access, the documented limit is around 20 questi...
Yes, we can build a continuous streaming pipeline using open source Spark. The main thing is to use Spark Structured Streaming, not a normal batch read. For Kafka streaming, we need to use spark.readStream, then write using writeStream, and keep the ...
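To make the readStream / writeStream flow above concrete, here is a minimal PySpark sketch. It is only an illustration under some assumptions: the function name, the Parquet sink, the `startingOffsets` value, and all parameter names are mine, not from the original question, and the job needs the `spark-sql-kafka-0-10` connector on the classpath.

```python
def start_kafka_to_parquet_stream(spark, bootstrap_servers, topic,
                                  output_path, checkpoint_path):
    """Start a continuous Kafka -> Parquet stream and return the StreamingQuery.

    `spark` is an existing SparkSession. All argument names are hypothetical.
    """
    # readStream (not read) gives an unbounded source that keeps polling Kafka.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "latest")  # assumption: only new messages
        .load()
    )

    # Kafka delivers key and value as binary, so cast them before writing.
    decoded = events.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "timestamp",
    )

    # The checkpoint location lets the query resume from its last committed
    # offsets after a restart instead of starting over.
    return (
        decoded.writeStream
        .format("parquet")
        .option("path", output_path)
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start()
    )
```

In a real job you would call `query = start_kafka_to_parquet_stream(...)` and then `query.awaitTermination()` so the driver stays alive while the stream runs.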
Genie Spaces do not expose a single fixed model choice for standard usage; Genie uses a compound AI system. For cost, I would split it into two areas. The first one is the SQL warehouse cost. Genie still has to run SQL against the warehouse attached to the...
Yes, I think the delay is likely coming from file discovery rather than from reading the Excel files. Even if only 10 files match in dev, Databricks still has to find them first. With "docs/ABC*/files/ABC*.xlsm", it can end up scanning a big chunk of the Sh...
I think a hash-based approach is worth trying here. Since you are deduping on 20 columns, distinct() has to shuffle and compare all those columns, which can become expensive at this scale. With a hash column, Spark still has to shuffle, but the compar...
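The hash idea can be shown in plain Python first. This is not Spark code, just a small sketch of the technique: collapse the dedup columns into one fixed-size digest and compare only that. The sentinel, separator, and function names are my own choices. In PySpark the rough equivalent would be to build the column with `sha2(concat_ws(...), 256)` and then call `dropDuplicates` on it, keeping in mind that `concat_ws` skips nulls, so nulls need an explicit placeholder (for example via `coalesce`) to avoid false matches.

```python
import hashlib

NULL_MARKER = "\x00NULL"  # hypothetical sentinel so None and "" hash differently
SEPARATOR = "\x1f"        # unit separator, unlikely to appear in real values


def row_hash(row, columns):
    """Return one SHA-256 hex digest covering the chosen columns of a row."""
    parts = [NULL_MARKER if row[c] is None else str(row[c]) for c in columns]
    return hashlib.sha256(SEPARATOR.join(parts).encode("utf-8")).hexdigest()


def dedupe_by_hash(rows, columns):
    """Keep the first row seen for each distinct hash of the chosen columns."""
    seen = set()
    unique = []
    for row in rows:
        digest = row_hash(row, columns)
        if digest not in seen:
            seen.add(digest)
            unique.append(row)
    return unique


rows = [
    {"id": 1, "name": "a", "city": None},
    {"id": 1, "name": "a", "city": None},  # exact duplicate of the first row
    {"id": 1, "name": "a", "city": ""},    # differs: empty string, not null
]
dedup_columns = ["id", "name", "city"]
unique_rows = dedupe_by_hash(rows, dedup_columns)
```

Without the explicit separator and null marker, rows such as `("ab", "c")` and `("a", "bc")`, or a null versus an empty string, could produce the same hash and be dropped as duplicates by mistake.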