@DataBrickator Perfect timing! I'm a heavy Databricks Notebooks user working on performance optimization and ML pipelines.
My daily workflows involve:
- EDA on 100M+ record Iceberg tables
- Real-time feature engineering with Spark Streaming
- Custom indexing strategies (zonemap + Bloom filters)
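For anyone unfamiliar with the last item, here is a minimal plain-Python sketch of how zonemap and Bloom filter checks can combine to skip files before a scan. It is an illustration only, not a production setup and not Spark or Iceberg code: `BloomFilter`, `FileStats`, `build_stats`, and `files_to_scan` are hypothetical names, and the `ts` and `user_id` columns are assumed for the example.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value: str):
        # Derive num_hashes bit positions by salting the value with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class FileStats:
    """Per-file metadata: min/max of ts (the zonemap) plus a Bloom filter on user_id."""
    name: str
    min_ts: int
    max_ts: int
    user_ids: BloomFilter = field(default_factory=BloomFilter)


def build_stats(name, rows):
    """rows: list of (ts, user_id) tuples belonging to one data file."""
    stats = FileStats(name, min(ts for ts, _ in rows), max(ts for ts, _ in rows))
    for _, user_id in rows:
        stats.user_ids.add(user_id)
    return stats


def files_to_scan(all_stats, ts_lo, ts_hi, user_id):
    """Names of files that could hold rows with ts in [ts_lo, ts_hi] and the given user_id."""
    return [
        s.name
        for s in all_stats
        if s.max_ts >= ts_lo and s.min_ts <= ts_hi   # zonemap: ranges must overlap
        and s.user_ids.might_contain(user_id)        # Bloom filter: ID possibly present
    ]


# Hypothetical example data: two files with disjoint ts ranges.
stats = [
    build_stats("file_a", [(0, "u1"), (99, "u3")]),
    build_stats("file_b", [(100, "u2"), (199, "u4")]),
]
candidates = files_to_scan(stats, ts_lo=0, ts_hi=50, user_id="u1")
```

The zonemap check rules out files whose `ts` range cannot overlap the query, and the Bloom filter check rules out most files that do not contain the requested ID. Because a Bloom filter can return false positives, a surviving file still has to be scanned to confirm the match.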
Key AI assistance challenges I face:
- AI tools suggesting pandas instead of Spark operations
- Lack of awareness about distributed computing costs
- Generic suggestions that ignore Databricks-specific optimizations
Currently documenting how AI assistance impacts my productivity; I have concrete examples where it saves 70% of time but requires manual optimization for scale.
Would love to discuss how AI could better understand Spark's execution model and suggest performance optimizations.
Filling out the form now. Excited to contribute to making AI assistance more effective for data scientists!
Question: Will Unity Catalog integration be part of the discussion?