Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Gen AI governance and compliance

ShankarM
New Contributor III

I have a few questions that I would like answered in the Databricks context.

1. Plagiarism and Originality

   How do we address the issue of plagiarism, where similar code, solutions, or documentation may be generated for similar prompts globally, potentially leading to intellectual property concerns?

2. Data Governance and Compliance

   What measures do we have in place to ensure compliance with data governance regulations such as GDPR, HIPAA, and other relevant policies, to protect user data and maintain transparency in data handling practices?

3. Handling Sensitive Data

   How do we handle personally identifiable information (PII) and other sensitive data that may be shared through your platform, to prevent unauthorized access, misuse, or exposure?

4. Content Moderation and Bias

   What steps do we take to detect and prevent the generation of unwanted or inappropriate content, including geo-political or biased viewpoints, in the context of documentation and explanations provided by your LLMs?

5. Decentralization and Vendor Lock-in

    How do we mitigate the risk of centralized dependency on large language models (LLMs) and cloud providers, and what alternatives do you offer to ensure users have control over their data and models?

6. Secure Code Transmission and Sharing

    What security measures do you have in place to protect proprietary code and sensitive information when transmitted and shared over APIs or the internet to public versions of LLMs?

7. Logging and Auditing

    How do we log and store user prompts and responses, and what mechanisms do you have in place to enable auditing and scrutiny of user activity to detect ethical, unethical, or unlawful practices?

8. Sensitive Information Storage and Protection

    What safeguards do you have in place to prevent users from storing sensitive information as prompts within LLMs, and how do you ensure that such information is not inadvertently shared or exposed in the public domain?

9. Data Utilization and Ownership

    How do we address the concern that, according to the terms and conditions of public LLM usage, providers may store all prompts and responses for a duration and utilize them for training purposes, potentially compromising user data ownership and control?

10. Data Security and Control

     How do we ensure the security of the data, prompts, and responses used for training, testing, and deployment of the trained/tuned model? Does the data remain under the control of the logged-in account?

11. Distilled/Child Model Updates and Pricing

      Once a distilled model is created in our account/project, will it update the global version, either during training or after deployment? Or, after deployment, do the APIs need to be rebuilt to point to the custom model? Are there any pricing implications for prompts and responses when these custom LLMs are called from an application?

12. Hosting Custom Models

     What is the cost impact of hosting this custom model post-training, testing, and deployment?

 

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @ShankarM, let’s address your questions related to Databricks. 😊

  1. Plagiarism and Originality: To mitigate plagiarism concerns, Databricks generates code, solutions, and documentation based on the specific prompts provided by users. While similar prompts may lead to similar content, the context and variations in input should result in unique outputs. However, it’s essential to review and validate generated content to ensure originality.

  2. Data Governance and Compliance: Databricks adheres to data governance regulations such as GDPR, HIPAA, and other relevant policies. Measures include access controls, encryption, and auditing. Users can configure workspace-level permissions and manage access to sensitive data.

  3. Handling Sensitive Data: Databricks provides features like access controls, encryption, and OAuth-based authentication. Users should follow best practices to handle personally identifiable information (PII) securely.

  4. Content Moderation and Bias: Databricks aims to prevent inappropriate content generation. While biases can exist, efforts are made to minimize them. Users can review and adjust generated content as needed.

  5. Decentralization and Vendor Lock-in: Databricks encourages decentralization by allowing users to create custom models. Users retain control over their data and models, reducing dependency on centralized providers.

  6. Secure Code Transmission and Sharing: Databricks ensures secure transmission by using HTTPS and OAuth tokens. Users should follow secure practices when sharing code or data.

  7. Logging and Auditing: Databricks logs user prompts and responses for auditing. Admins can monitor activity to detect ethical or unlawful practices.

  8. Sensitive Information Storage and Protection: Users should avoid storing sensitive information as prompts within LLMs. Databricks doesn’t intentionally expose such data.

  9. Data Utilization and Ownership: While Databricks may use prompts for model training, ownership and control remain with users. Review terms and conditions for clarity.

  10. Data Security and Control: Databricks ensures data security during training, testing, and deployment. Users have control within their logged-in accounts.

  11. Distilled/Child Model Updates and Pricing: Distilled models won’t automatically update the global version. Post-deployment, APIs may need reconfiguration. Pricing implications depend on usage.

  12. Hosting Custom Models: Custom model hosting costs vary based on resources used (e.g., compute, storage). Consider resource allocation and usage patterns.
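
To make points 3 and 8 a bit more concrete, here is a minimal sketch of masking obvious PII in a prompt before it is sent to any LLM endpoint. It is plain Python and does not call a Databricks API; the `PII_PATTERNS` table and the `redact_pii` helper are hypothetical names used only for illustration, and the three regular expressions are assumptions that cover only simple US-style formats. A real deployment would likely need a dedicated PII detection tool and review by a compliance team.

```python
import re

# Hypothetical patterns for illustration only; they catch simple cases
# (email addresses, US-style SSNs and phone numbers) and nothing more.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_pii(prompt: str) -> str:
    """Replace each match of a known pattern with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


if __name__ == "__main__":
    raw = "Contact jane.doe@example.com or 555-123-4567, SSN 123-45-6789"
    print(redact_pii(raw))
```

Running the redaction on the client side, before the request leaves your environment, means the unmasked values never reach the model or its request logs.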

Feel free to explore Databricks further, and if you have any more questions, I’m here to assist! 😊

 

ShankarM
New Contributor III

Thanks @Kaniz_Fatma 

I am looking for specific solutions on how each of the above points can be implemented in Databricks. What tools, frameworks, or functions can be used? I understand that it will depend on the use case, but if you could take one example and walk through it, that would help.
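
As one possible starting point for point 7 (logging and auditing), the sketch below shows the general shape of an audit record for a prompt/response pair, written as JSON lines. It uses only the Python standard library; `make_audit_record` and `append_audit_record` are hypothetical helper names, and the field names are assumptions rather than a Databricks schema. In a Databricks workspace such records would more likely be written to a governed table than to a local file, and that part is not shown here.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_audit_record(user: str, prompt: str, response: str) -> dict:
    """Build one audit entry; the SHA-256 digest lets reviewers detect later edits."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "prompt": prompt,
        "response": response,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }


def append_audit_record(path: str, record: dict) -> None:
    """Append the record as a single JSON line (append-only by convention)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    rec = make_audit_record("shankar", "Summarize this policy", "Here is a summary")
    append_audit_record("audit_log.jsonl", rec)
```

If prompts may contain sensitive data, it is worth redacting them before they are logged, or storing only the digest instead of the raw text.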
