Community Articles
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.

Forum Posts

SashankKotta
by Databricks Employee
  • 5361 Views
  • 8 replies
  • 6 kudos

Library Management via Custom Compute Policies and ADF Job Triggering

This guide is intended for those looking to install libraries on a cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. While many users rely on init scripts for library installation, it i...
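As a rough illustration of the approach the guide describes, a compute policy can carry the library list itself, so every cluster created under the policy gets the same packages without init scripts. A minimal sketch (package names, versions, and node types are illustrative, and the exact request shape should be checked against the Cluster Policies API):

```python
# Hypothetical cluster-policy payload that pins libraries at the policy
# level instead of using init scripts. Values are illustrative only.
import json

policy = {
    "definition": {
        "spark_version": {"type": "fixed", "value": "14.3.x-scala2.12"},
        "node_type_id": {"type": "allowlist", "values": ["Standard_DS3_v2"]},
    },
    # Libraries listed on the policy are installed automatically on any
    # cluster created under it.
    "libraries": [
        {"pypi": {"package": "great-expectations==0.18.12"}},
        {"maven": {"coordinates": "com.databricks:spark-xml_2.12:0.18.0"}},
    ],
}

payload = json.dumps(policy)  # request body for the policy-create endpoint
```
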

Latest Reply
Wojciech_BUK
Valued Contributor III
  • 6 kudos

Hi @hassan2, I had the same issue and found a solution. When I created the pool, I created it as on-demand (not spot), and the policy only worked when I removed the entire section "azure_attributes.spot_bid_max_price" from the policy. Looks like "azure_attributes.spot_bi...
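The point of the fix, sketched as two hypothetical policy fragments (attribute names follow the Azure attributes schema; values are illustrative): spot_bid_max_price only makes sense for spot capacity, so a policy targeting an on-demand pool drops that section entirely.

```python
# Illustrative policy fragments. spot_bid_max_price applies only to
# SPOT_AZURE availability; for an on-demand pool the entry is removed.
policy_for_spot = {
    "azure_attributes.availability": {"type": "fixed", "value": "SPOT_AZURE"},
    "azure_attributes.spot_bid_max_price": {"type": "fixed", "value": -1},
}

policy_for_on_demand = {
    "azure_attributes.availability": {"type": "fixed", "value": "ON_DEMAND_AZURE"},
    # no spot_bid_max_price entry: it conflicts with on-demand capacity
}
```
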

7 More Replies
WarrenO
by New Contributor III
  • 3500 Views
  • 1 reply
  • 1 kudos

Resolved! Log Custom Transformer with Feature Engineering Client

Hi everyone, I'm building a PySpark ML pipeline where the first stage is to fill nulls with zero. I wrote a custom class to do this since I cannot find a Transformer that will do this imputation. I am able to log this pipeline using the MLflow log model ...

Community Articles
Custom Transformer
feature engineering
MLflow
pipeline
pyspark
Latest Reply
koji_kawamura
Databricks Employee
  • 1 kudos

Hi @WarrenO, thanks for sharing that with the detailed code! I was able to reproduce the error, specifically the following: AttributeError: module '__main__' has no attribute 'CustomAdder' (File <command-1315887242804075>, line 3935: evaluator = ...
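This AttributeError is the classic symptom of pickling an object whose class is defined in `__main__`: the pickle records the class by module path, and a different process cannot resolve `__main__.CustomAdder`. The usual fix is to define the class in its own importable module. A stdlib-only sketch of that pattern (the CustomAdder name here just echoes the error message):

```python
# Why classes defined in __main__ break unpickling elsewhere, and the fix:
# put the class in an importable module so pickle records a resolvable path.
import importlib
import pathlib
import pickle
import sys
import tempfile

tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)

# Write the class into its own module instead of defining it inline.
pathlib.Path(tmp, "custom_adder.py").write_text(
    "class CustomAdder:\n"
    "    def __init__(self, n):\n"
    "        self.n = n\n"
    "    def add(self, x):\n"
    "        return x + self.n\n"
)
importlib.invalidate_caches()
custom_adder = importlib.import_module("custom_adder")

# The pickle now references custom_adder.CustomAdder, which any process
# with the module on its path can resolve.
payload = pickle.dumps(custom_adder.CustomAdder(5))
obj = pickle.loads(payload)
print(obj.add(2))  # 7
```
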

himoshi
by New Contributor II
  • 5404 Views
  • 3 replies
  • 0 kudos

Error code 403 - Invalid access to Org

I am trying to make a GET /api/2.1/jobs/list call in a Notebook to get a list of all jobs in my workspace but am unable to do so due to a 403 "Invalid access to Org" error message. I am using a new PAT and the endpoint is correct. I also have workspa...
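For context, a 403 "Invalid access to Org" on this endpoint usually means the workspace URL and the PAT belong to different workspaces (their org IDs don't match), so the first thing to verify is that both come from the same workspace. A small stdlib sketch (function names are illustrative) that keeps the workspace URL explicit:

```python
# Sketch of the jobs/list call with the workspace host made explicit, so
# the PAT and the URL can be checked against the same workspace. Names
# are illustrative, not an official client.
import json
import urllib.request


def build_jobs_list_request(workspace_url: str, token: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.1/jobs/list",
        headers={"Authorization": f"Bearer {token}"},
    )


def list_jobs(workspace_url: str, token: str) -> dict:
    # Raises urllib.error.HTTPError with code 403 on an org mismatch.
    with urllib.request.urlopen(build_jobs_list_request(workspace_url, token)) as resp:
        return json.load(resp)
```
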

Latest Reply
xmad2772
New Contributor II
  • 0 kudos

Hey did you make any progress on the error? I'm experiencing the same in my environment. Thanks! 

2 More Replies
yadvendra_ksh
by New Contributor II
  • 482 Views
  • 0 replies
  • 0 kudos

The Hidden Security Risks in Stored Procedure Migrations—What Databricks Exposed

Your stored procedure migration to DB isn't just a 'copy-paste' job - it's a security nightmare waiting to happen. We discovered our 'trusted' stored procedures had hidden access patterns that nearly compromised our entire data governance model. Here'...

yadvendra_ksh
by New Contributor II
  • 889 Views
  • 0 replies
  • 1 kudos

The Hidden Pitfalls of Snowflake to Databricks Migrations

Everyone's rushing their Snowflake to Databricks migration, and they're setting themselves up for failure. After leading multiple enterprise migrations to Databricks last quarter, here's what shocked me: the technical lift isn't the hard part. It's th...

Ajay-Pandey
by Esteemed Contributor III
  • 1723 Views
  • 1 reply
  • 1 kudos

📊 Simplifying CDC with Databricks Delta Live Tables & Snapshots 📊

In the world of data integration, synchronizing external relational databases (like Oracle, MySQL) with the Databricks platform can be complex, especially when Change Data Feed (CDF) streams aren’t available. Using snapshots is a powerful way to mana...

Latest Reply
BilalHaniff1
New Contributor II
  • 1 kudos

Hi Ajay, can apply changes into snapshot handle re-processing of an older snapshot? Use case: the source has delivered data on day T, T1, and T2. Consumers realise there is an error in the day T data and make a correction in the source. The source redel...

ChsAIkrishna
by Contributor
  • 1333 Views
  • 1 reply
  • 4 kudos

Consideration Before Migrating Hive Tables to Unity Catalog

Databricks recommends four methods to migrate Hive tables to Unity Catalog, each with its pros and cons. The choice of method depends on specific requirements. SYNC: a SQL command that migrates schemas or tables to Unity Catalog external tables. Howeve...

Latest Reply
Mantsama4
Valued Contributor
  • 4 kudos

This is a great solution! The post effectively outlines the methods for migrating Hive tables to Unity Catalog while emphasizing the importance of not just performing a simple migration but transforming the data architecture into something more robus...

MichTalebzadeh
by Valued Contributor
  • 4558 Views
  • 3 replies
  • 3 kudos

Resolved! Feature Engineering for Data Engineers: Building Blocks for ML Success

For a UK Government Agency, I made a comprehensive presentation titled "Feature Engineering for Data Engineers: Building Blocks for ML Success". I published an article about it on LinkedIn together with the relevant GitHub code. In summary, the code delve...

Community Articles
feature engineering
ML
python
Latest Reply
Mantsama4
Valued Contributor
  • 3 kudos

This is a fantastic post! The detailed explanation of feature engineering, from handling missing values to using Variational Autoencoders (VAEs) for synthetic data generation, provides invaluable insights for improving machine learning models. The ap...

2 More Replies
Harun
by Honored Contributor
  • 11123 Views
  • 3 replies
  • 6 kudos

Comprehensive Guide to Databricks Optimization: Z-Order, Data Compaction, and Liquid Clustering

Optimizing data storage and access is crucial for enhancing the performance of data processing systems. In Databricks, several optimization techniques can significantly improve query performance and reduce costs: Z-Order Optimize, Optimize Compaction...
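The intuition behind Z-Ordering is that it maps values from several columns onto a single space-filling curve, so rows that are close in multiple dimensions land in the same files and data skipping can prune on any of those columns. A toy bit-interleaving (Morton code) sketch of the idea, purely conceptual and not Databricks' actual implementation:

```python
# Conceptual illustration of the bit interleaving behind Z-ordering.
# Interleaving the bits of two column values yields one sort key that
# preserves locality in both dimensions.
def z_value(x: int, y: int, bits: int = 16) -> int:
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # even bit positions: x
        z |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions:  y
    return z


# Sorting by z_value keeps points that are near in (x, y) near in file
# order, so min/max file statistics stay tight on both columns.
points = [(3, 7), (3, 6), (12, 1), (2, 7)]
ordered = sorted(points, key=lambda p: z_value(*p))
```
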

Latest Reply
Mantsama4
Valued Contributor
  • 6 kudos

I also have the same question!

2 More Replies
Mantsama4
by Valued Contributor
  • 4569 Views
  • 0 replies
  • 0 kudos

How can Databricks AI/BI Genie, RAG, & LLMs seamlessly coexist with MS Copilot to drive innovation?

The future of enterprise productivity and analytics lies in the seamless integration of advanced tools like Databricks Genie AI/BI, RAG & LLMs and Microsoft Copilot. While each serves distinct purposes, their coexistence can unlock unparalleled value...

Mantsama4
by Valued Contributor
  • 1164 Views
  • 0 replies
  • 1 kudos

How Databricks Empowers Scalable Data Products Through Medallion Mesh Architecture?

Unlock the Power of Your Data: Solving Fragmentation and Governance Challenges! In today’s fast-paced, data-driven enterprises, fragmented data and governance issues create roadblocks to decision-making and innovation. Traditional architectures strugg...

Mantsama4
by Valued Contributor
  • 548 Views
  • 0 replies
  • 0 kudos

Rebuilding and Re-Platforming Your Databricks Lakehouse with Serverless Compute

Dear Databricks Community, In today’s fast-paced data landscape, managing infrastructure manually can slow down innovation, increase costs, and limit scalability. Databricks Serverless Compute solves these challenges by eliminating infrastructure over...


Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now