We are excited to announce that we are open sourcing Unity Catalog, the industry’s first open source catalog for data and AI governance across clouds, data formats, and data platforms. Here are the most important pillars of the Unity Catalog vision:
- Open source API and implementation: It is built on an OpenAPI specification and an open source server implementation released under the Apache 2.0 license. It is also compatible with the Apache Hive metastore API and the Apache Iceberg REST catalog API.
- Multi-format support: It is extensible and supports Delta Lake, Apache Iceberg (via UniForm), Apache Parquet, CSV, and many other formats.
- Multi-engine support: Because its APIs are open, data cataloged in Unity Catalog can be read by virtually any compute engine.
- Multimodal: It supports all your data and AI assets, including tables, files, functions, and AI models.
- Vibrant ecosystem: This is a community effort, and we are extremely excited to be supported by Amazon Web Services, Microsoft Azure, Google Cloud, NVIDIA, Salesforce, DuckDB, LangChain, dbt Labs, Fivetran, Confluent, Unstructured, Onehouse, Immuta, Informatica, and many more.
The project is available on GitHub today as the first step in our journey to bring the Unity Catalog vision to open source. Unity Catalog is hosted at LF AI & Data, an umbrella foundation of the Linux Foundation that supports open source innovation in artificial intelligence (AI) and data. We are excited to work with the open source community there for many years to come to realize this vision.