I have a few questions for which I am looking for answers in the Databricks context.
1. Plagiarism and Originality
How do we address the issue of plagiarism, where similar code, solutions, or documentation may be generated for similar prompts globally, potentially leading to intellectual property concerns?
2. Data Governance and Compliance
What measures do we have in place to ensure compliance with data governance regulations such as GDPR, HIPAA, and other relevant policies, to protect user data and maintain transparency in data handling practices?
3. Handling Sensitive Data
How do we handle personally identifiable information (PII) and other sensitive data that may be shared through your platform, to prevent unauthorized access, misuse, or exposure?
4. Content Moderation and Bias
What steps do we take to detect and prevent the generation of unwanted or inappropriate content, including geo-political or biased viewpoints, in the context of documentation and explanations provided by your LLMs?
5. Decentralization and Vendor Lock-in
How do we mitigate the risk of centralized dependency on large language models (LLMs) and cloud providers, and what alternatives do you offer to ensure users have control over their data and models?
6. Secure Code Transmission and Sharing
What security measures do you have in place to protect proprietary code and sensitive information when it is transmitted and shared over APIs or the internet to public versions of LLMs?
7. Logging and Auditing
How do we log and store user prompts and responses, and what mechanisms do you have in place to enable auditing and scrutiny of user activity to detect ethical, unethical, or unlawful practices?
8. Sensitive Information Storage and Protection
What safeguards do you have in place to prevent users from storing sensitive information as prompts within LLMs, and how do you ensure that such information is not inadvertently shared or exposed in the public domain?
9. Data Utilization and Ownership
How do we address the concern that, according to the terms and conditions of public LLM usage, providers may store all prompts and responses for a duration and utilize them for training purposes, potentially compromising user data ownership and control?
10. Data Security and Control
How do we ensure the security of the data, prompts, and responses used for training, testing, and deployment of the trained/tuned model? Is that data fully under the control of the logged-in account?
11. Distilled/Child Model Updates and Pricing
Once a distilled model is created in our account/project, during training or post-deployment, will the trained model update the global version? Or, post-deployment, do the APIs need to be re-built to point to the custom model? Are there any pricing implications for these custom LLMs for prompt/response when called from an application?
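To make the re-pointing part of question 11 concrete: Databricks Model Serving addresses a deployed model by its endpoint name in the invocation URL, so switching client code from a base model to a custom/distilled one would typically mean changing only that name rather than rebuilding the API integration. A minimal sketch of the URL construction (the workspace URL and endpoint names below are placeholders, not real values):

```python
# Hypothetical illustration: workspace URL and endpoint names are placeholders.
def invocation_url(workspace_url: str, endpoint_name: str) -> str:
    """Build the Model Serving invocations URL for a given endpoint."""
    return f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations"

# The same client code targets a base vs. a custom (distilled) model
# simply by using a different endpoint name:
base_url = invocation_url("https://example.cloud.databricks.com", "base-llm")
custom_url = invocation_url("https://example.cloud.databricks.com", "my-distilled-llm")
```

Whether this endpoint-name swap is sufficient in practice, and what per-request pricing applies to the custom endpoint, is exactly what the question above asks Databricks to confirm.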
12. Hosting Custom Models
What is the cost impact of hosting this custom model post-training, testing, and deployment?