I have a few questions for which I am looking for answers in the Databricks context.
1. Plagiarism and Originality
How do we address the issue of plagiarism, where similar code, solutions, or documentation may be generated for similar prompts globally, potentially leading to intellectual property concerns?
2. Data Governance and Compliance
What measures do we have in place to ensure compliance with data governance regulations such as GDPR, HIPAA, and other relevant policies, to protect user data and maintain transparency in data handling practices?
3. Handling Sensitive Data
How do we handle personally identifiable information (PII) and other sensitive data that may be shared through your platform, to prevent unauthorized access, misuse, or exposure?
4. Content Moderation and Bias
What steps do we take to detect and prevent the generation of unwanted or inappropriate content, including geo-political or biased viewpoints, in the context of documentation and explanations provided by your LLMs?
5. Decentralization and Vendor Lock-in
How do we mitigate the risk of centralized dependency on large language models (LLMs) and cloud providers, and what alternatives do you offer to ensure users have control over their data and models?
6. Secure Code Transmission and Sharing
What security measures do you have in place to protect proprietary code and sensitive information when it is transmitted and shared over APIs or the internet to public versions of LLMs?
7. Logging and Auditing
How do we log and store user prompts and responses, and what mechanisms do you have in place to enable auditing and scrutiny of user activity to detect ethical, unethical, or unlawful practices?
8. Sensitive Information Storage and Protection
What safeguards do you have in place to prevent users from storing sensitive information as prompts within LLMs, and how do you ensure that such information is not inadvertently shared or exposed in the public domain?
9. Data Utilization and Ownership
How do we address the concern that, according to the terms and conditions of public LLM usage, providers may store all prompts and responses for a duration and utilize them for training purposes, potentially compromising user data ownership and control?
10. Data Security and Control
How do we ensure the security of the data, prompts, and responses used for training, testing, and deployment of the trained/tuned model? Is that data fully under the control of the logged-in account?
11. Distilled/Child Model Updates and Pricing
Once a distilled model is created in our account/project, during training or post-deployment, will the trained model update the global version? Or, post-deployment, do the APIs need to be re-built to point to the custom model? Are there any pricing implications for these custom LLMs for prompt/response when called from an application?
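To make the re-pointing part of question 11 concrete: Databricks Model Serving addresses a deployed model by its endpoint name in the invocation URL, so switching client code from a base model to a custom/distilled one would typically mean changing only that name rather than rebuilding the API integration. A minimal sketch of the URL construction (the workspace URL and endpoint names below are placeholders, not real values):

```python
# Hypothetical illustration: workspace URL and endpoint names are placeholders.
def invocation_url(workspace_url: str, endpoint_name: str) -> str:
    """Build the Model Serving invocations URL for a given endpoint."""
    return f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations"

# The same client code targets a base vs. a custom (distilled) model
# simply by using a different endpoint name:
base_url = invocation_url("https://example.cloud.databricks.com", "base-llm")
custom_url = invocation_url("https://example.cloud.databricks.com", "my-distilled-llm")
```

Whether this endpoint-name swap is sufficient in practice, and what per-request pricing applies to the custom endpoint, is exactly what the question above asks Databricks to confirm.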
12. Hosting Custom Models
What is the cost impact of hosting this custom model post-training, testing, and deployment?