Knowledge Sharing Hub

by SumitSingh • Contributor

07-19-2024 8:25:47 AM

3593 Views
7 replies
11 kudos

From Associate to Professional: My Learning Plan to ace all Databricks Data Engineer Certifications

In today’s data-driven world, the role of a data engineer is critical in designing and maintaining the infrastructure that allows for the efficient collection, storage, and analysis of large volumes of data. Databricks certifications hold significan...

Latest Reply

sandeepmankikar
New Contributor III

03-12-2025 8:32:21 PM

11 kudos

As an additional tip for those working towards both the Associate and Professional certifications, I recommend avoiding a long gap between the two exams to maintain your momentum. If possible, try to schedule them back-to-back with just a few days in...

by Harun • Honored Contributor

06-08-2024 8:20:11 AM

5114 Views
2 replies
1 kudos

Optimizing Costs in Databricks by Dynamically Choosing Cluster Sizes

Databricks is a popular unified data analytics platform known for its powerful data processing capabilities and seamless integration with Apache Spark. However, managing and optimizing costs in Databricks can be challenging, especially when it comes ...

Latest Reply

kmacgregor
New Contributor II

03-12-2025 8:21:44 AM

1 kudos

How can this actually be used to choose a cluster pool for a Databricks workflow dynamically, that is, at run time? In other words, what can you actually do with the value of `selected_pool` other than printing it out?
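One way to act on a value like `selected_pool` is to pass it as `instance_pool_id` in the `new_cluster` block of a run request, so the choice is made when the run is created. The following is only a minimal sketch, assuming `selected_pool` holds an instance pool ID. The pool IDs, size thresholds, runtime version string, and helper names are hypothetical and do not come from the original post.

```python
# Hypothetical pool IDs and size thresholds, for illustration only.
POOLS = [
    (10, "pool-small"),    # inputs up to 10 GB
    (100, "pool-medium"),  # inputs up to 100 GB
]
DEFAULT_POOL = "pool-large"


def select_pool(input_gb: float) -> str:
    """Return the first pool whose size limit covers the input."""
    for limit_gb, pool_id in POOLS:
        if input_gb <= limit_gb:
            return pool_id
    return DEFAULT_POOL


def build_run_payload(notebook_path: str, input_gb: float, num_workers: int = 2) -> dict:
    """Build a request body for a one-time run that uses the chosen pool."""
    selected_pool = select_pool(input_gb)
    return {
        "run_name": f"dynamic-pool-{selected_pool}",
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                "new_cluster": {
                    # Placeholder runtime version; use one available in your workspace.
                    "spark_version": "15.4.x-scala2.12",
                    "instance_pool_id": selected_pool,
                    "num_workers": num_workers,
                },
            }
        ],
    }


payload = build_run_payload("/Shared/etl", input_gb=50)
print(payload["tasks"][0]["new_cluster"]["instance_pool_id"])
```

A dictionary like this could then be sent to the Jobs API `POST /api/2.1/jobs/runs/submit` endpoint by a small driver notebook or an external orchestrator, which is what makes the selection happen at run time rather than in the job definition.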

by yadvendra_ksh • New Contributor II

03-10-2025 10:58:58 PM

341 Views
0 replies
1 kudos

Migrating from MySQL to Databricks: Real-time Insights That Matter

We successfully migrated a client’s MySQL databases to Databricks using a dual approach that maintained 100% data integrity while enabling real-time analytics. After struggling with batch-based updates and analytics delays, we implemented: - One-time historica...

by SashankKotta • Databricks Employee

06-16-2024 12:35:28 AM

3953 Views
8 replies
6 kudos

Library Management via Custom Compute Policies and ADF Job Triggering

This guide is intended for those looking to install libraries on a cluster using a Custom Compute Policy and trigger Databricks jobs from an Azure Data Factory (ADF) linked service. While many users rely on init scripts for library installation, it i...

Latest Reply

Wojciech_BUK
Valued Contributor III

10-14-2024 4:31:45 AM

6 kudos

Hi @hassan2, I had the same issue and found a solution. When I created the pool, I created it as On-demand (not spot), and the policy only worked once I removed the entire "azure_attributes.spot_bid_max_price" section from the policy. Looks like "azure_attributes.spot_bi...

by WarrenO • New Contributor III

03-06-2025 12:34:55 PM

2660 Views
1 reply
1 kudos

Resolved! Log Custom Transformer with Feature Engineering Client

Hi everyone, I'm building a PySpark ML Pipeline where the first stage fills nulls with zero. I wrote a custom class to do this since I could not find a Transformer that performs this imputation. I am able to log this pipeline using MLflow log model ...

Custom Transformer

feature engineering

ML FLow

pipeline

pyspark

Latest Reply

koji_kawamura
Databricks Employee

03-07-2025 12:19:23 AM

1 kudos

Hi @WarrenO, thanks for sharing that with the detailed code! I was able to reproduce the error, specifically the following error: AttributeError: module '__main__' has no attribute 'CustomAdder' File <command-1315887242804075>, line 3935 evaluator = ...
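An error of this shape typically appears when a pickled object refers to a class that was defined in a notebook or script (`__main__`) and is later loaded in a process where that class does not exist. The sketch below uses only the standard library, not MLflow or the Feature Engineering client, and the `CustomAdder` body is a hypothetical stand-in. It shows that pickle stores a reference to the class (module name plus class name) rather than the class code itself.

```python
import pickle


class CustomAdder:
    """Hypothetical stand-in for a custom transformer class."""

    def __init__(self, amount: int = 0):
        self.amount = amount


payload = pickle.dumps(CustomAdder(amount=3))

# The payload names the defining module and the class, but carries no class code.
module_name = CustomAdder.__module__
print(module_name.encode() in payload)
print(b"CustomAdder" in payload)

# Loading succeeds here only because CustomAdder can still be found in that module.
restored = pickle.loads(payload)
print(restored.amount)
```

Under that assumption, moving the class into an importable module that is also available wherever the model is loaded is one common way to avoid this kind of lookup failure.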

by himoshi • New Contributor II

07-08-2024 4:38:35 PM

4272 Views
3 replies
0 kudos

Error code 403 - Invalid access to Org

I am trying to make a GET /api/2.1/jobs/list call in a Notebook to get a list of all jobs in my workspace but am unable to do so due to a 403 "Invalid access to Org" error message. I am using a new PAT and the endpoint is correct. I also have workspa...
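For reference, the call described above has roughly the following shape. This is a minimal sketch that builds the request without sending it; the host and token are placeholders, and the helper name is hypothetical. It does not diagnose the 403, it only makes the host and the `Authorization` header explicit so they can be compared with the workspace the PAT was issued in.

```python
import urllib.request


def build_jobs_list_request(host: str, token: str, limit: int = 20) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Jobs list endpoint."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/list?limit={limit}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Placeholder values; replace with the real workspace URL and a valid PAT.
request = build_jobs_list_request("https://example.cloud.databricks.com/", "dapi-REDACTED")
print(request.full_url)
print(request.get_method())
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON response body.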

Latest Reply

xmad2772
New Contributor II

03-03-2025 11:52:02 PM

0 kudos

Hey did you make any progress on the error? I'm experiencing the same in my environment. Thanks!

by yadvendra_ksh • New Contributor II

02-28-2025 4:51:16 AM

242 Views
0 replies
0 kudos

The Hidden Security Risks in Stored Procedure Migrations—What Databricks Exposed

Your stored procedure migration to Databricks isn't just a 'copy-paste' job; it's a security nightmare waiting to happen. We discovered our 'trusted' stored procedures had hidden access patterns that nearly compromised our entire data governance model. Here'...

by yadvendra_ksh • New Contributor II

02-18-2025 12:54:15 AM

318 Views
0 replies
1 kudos

The Hidden Pitfalls of Snowflake to Databricks Migrations

Everyone's rushing their Snowflake to Databricks migration, and they're setting themselves up for failure. After leading multiple enterprise migrations to Databricks last quarter, here's what shocked me: The technical lift isn't the hard part. It's th...

by Ajay-Pandey • Esteemed Contributor III

09-11-2024 10:37:19 PM

1167 Views
1 reply
1 kudos

📊 Simplifying CDC with Databricks Delta Live Tables & Snapshots 📊

In the world of data integration, synchronizing external relational databases (like Oracle, MySQL) with the Databricks platform can be complex, especially when Change Data Feed (CDF) streams aren’t available. Using snapshots is a powerful way to mana...
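The idea behind snapshot-based change capture can be illustrated in plain Python. This is a conceptual sketch only, not the Delta Live Tables API and not the author's code: it compares two full snapshots by a key column and derives the inserts, updates, and deletes between them. The function name and sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def diff_snapshots(previous: List[Row], current: List[Row], key: str) -> Dict[str, List[Row]]:
    """Derive inserts, updates, and deletes between two full snapshots."""
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    # Keys only in the newer snapshot are inserts.
    inserts = [row for k, row in curr_by_key.items() if k not in prev_by_key]
    # Keys in both snapshots whose rows differ are updates.
    updates = [
        row for k, row in curr_by_key.items()
        if k in prev_by_key and prev_by_key[k] != row
    ]
    # Keys only in the older snapshot are deletes.
    deletes = [row for k, row in prev_by_key.items() if k not in curr_by_key]
    return {"inserts": inserts, "updates": updates, "deletes": deletes}


day_1 = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Oslo"}]
day_2 = [{"id": 1, "city": "Delhi"}, {"id": 3, "city": "Lima"}]
changes = diff_snapshots(day_1, day_2, key="id")
print(changes)
```

A managed pipeline performs this kind of comparison at scale and tracks history; the sketch only shows why a sequence of snapshots is enough to recover change events when no change feed is available.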

Latest Reply

BilalHaniff1
New Contributor II

02-07-2025 6:12:45 AM

1 kudos

Hi Ajay, can apply changes into snapshot handle re-processing of an older snapshot? Use case: - Source has delivered data on day T, T1 and T2. - Consumers realise there is an error on the day T data, and make a correction in the source. The source redel...

by ChsAIkrishna • Contributor

11-27-2024 7:09:52 AM

879 Views
1 reply
4 kudos

Consideration Before Migrating Hive Tables to Unity Catalog

Databricks recommends four methods to migrate Hive tables to Unity Catalog, each with its pros and cons. The choice of method depends on specific requirements. SYNC: A SQL command that migrates schema or tables to Unity Catalog external tables. Howeve...

Latest Reply

Mantsama4
Contributor III

02-06-2025 7:51:29 PM

4 kudos

This is a great solution! The post effectively outlines the methods for migrating Hive tables to Unity Catalog while emphasizing the importance of not just performing a simple migration but transforming the data architecture into something more robus...

by MichTalebzadeh • Valued Contributor

08-03-2024 11:13:14 AM

2881 Views
3 replies
3 kudos

Resolved! Feature Engineering for Data Engineers: Building Blocks for ML Success

For a UK Government Agency, I gave a comprehensive presentation titled "Feature Engineering for Data Engineers: Building Blocks for ML Success". I wrote an article about it on LinkedIn together with the relevant GitHub code. In summary the code delve...

feature engineering

ML

python

Latest Reply

Mantsama4
Contributor III

02-06-2025 7:48:43 PM

3 kudos

This is a fantastic post! The detailed explanation of feature engineering, from handling missing values to using Variational Autoencoders (VAEs) for synthetic data generation, provides invaluable insights for improving machine learning models. The ap...

by Harun • Honored Contributor

06-25-2024 2:19:19 AM

6818 Views
3 replies
5 kudos

Comprehensive Guide to Databricks Optimization: Z-Order, Data Compaction, and Liquid Clustering

Optimizing data storage and access is crucial for enhancing the performance of data processing systems. In Databricks, several optimization techniques can significantly improve query performance and reduce costs: Z-Order Optimize, Optimize Compaction...
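As background for the Z-Order technique named above: a Z-order (Morton) curve maps several column values to a single sort key by interleaving their bits, so rows that are close in every column tend to land close together in storage. The following is a conceptual sketch for two non-negative integer columns. It is not how Databricks implements `OPTIMIZE ... ZORDER BY` internally, and the function name is hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x fills the even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y fills the odd bit positions
    return z


# Sort a small 2x2 grid of (x, y) points along the curve.
points = [(1, 1), (0, 1), (1, 0), (0, 0)]
ordered = sorted(points, key=lambda p: interleave_bits(*p))
print(ordered)
```

Sorting by the interleaved key instead of by one column at a time is what lets file-level min/max statistics stay selective for filters on either column.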

Latest Reply

Mantsama4
Contributor III

02-06-2025 7:39:44 PM

5 kudos

I also have the same question!

by Mantsama4 • Contributor III

02-05-2025 7:39:25 PM

659 Views
0 replies
0 kudos

How can Databricks AI/BI Genie, RAG, & LLMs seamlessly coexist with MS Copilot to drive innovation?

The future of enterprise productivity and analytics lies in the seamless integration of advanced tools like Databricks Genie AI/BI, RAG & LLMs and Microsoft Copilot. While each serves distinct purposes, their coexistence can unlock unparalleled value...

by Mantsama4 • Contributor III

02-05-2025 7:08:40 AM

409 Views
0 replies
1 kudos

How Databricks Empowers Scalable Data Products Through Medallion Mesh Architecture?

Unlock the Power of Your Data: Solving Fragmentation and Governance Challenges! In today’s fast-paced, data-driven enterprises, fragmented data and governance issues create roadblocks to decision-making and innovation. Traditional architectures strugg...

by Mantsama4 • Contributor III

02-02-2025 10:28:16 PM

317 Views
0 replies
0 kudos

Rebuilding and Re-Platforming Your Databricks Lakehouse with Serverless Compute

Dear Databricks Community, In today’s fast-paced data landscape, managing infrastructure manually can slow down innovation, increase costs, and limit scalability. Databricks Serverless Compute solves these challenges by eliminating infrastructure over...

by hari-prasad • Valued Contributor II

01-10-2025 11:24:29 AM

1359 Views
0 replies
3 kudos

Mapping Compliance Standards to Industries: A Comprehensive Guide

Brief Guideline: Mapping Compliance Standards to Industries. This guide provides a detailed mapping of various compliance standards to their respective industries, highlighting the specific sectors and descriptions for each standard. Understanding thes...
