- 4866 Views
- 1 replies
- 1 kudos
Resolved! Log Custom Transformer with Feature Engineering Client
Hi everyone, I'm building a PySpark ML Pipeline where the first stage is to fill nulls with zero. I wrote a custom class to do this since I cannot find a Transformer that will do this imputation. I am able to log this pipeline using MLflow log model ...
- 1 kudos
Hi @WarrenO, thanks for sharing that with the detailed code! I was able to reproduce the error, specifically the following error: AttributeError: module '__main__' has no attribute 'CustomAdder' File <command-1315887242804075>, line 3935 evaluator = ...
- 7417 Views
- 3 replies
- 0 kudos
Error code 403 - Invalid access to Org
I am trying to make a GET /api/2.1/jobs/list call in a Notebook to get a list of all jobs in my workspace but am unable to do so due to a 403 "Invalid access to Org" error message. I am using a new PAT and the endpoint is correct. I also have workspa...
- 0 kudos
Hey, did you make any progress on the error? I'm experiencing the same in my environment. Thanks!
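For anyone comparing notes on this thread, here is a minimal sketch of how a `GET /api/2.1/jobs/list` request is typically shaped, using only the Python standard library. The workspace host and token below are placeholders (assumptions, not values from the post), and the helper name `build_jobs_list_request` is hypothetical. The sketch only builds the request so the URL and `Authorization` header can be inspected before anything is sent; it does not diagnose the 403.

```python
import urllib.request


def build_jobs_list_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Jobs API list endpoint.

    `host` and `token` are placeholders supplied by the caller; this helper
    is illustrative and not part of any Databricks library.
    """
    # Normalise a trailing slash so the path is not doubled up.
    url = f"{host.rstrip('/')}/api/2.1/jobs/list"
    # Personal access tokens are passed as a Bearer token.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values, assumed for illustration only.
req = build_jobs_list_request("https://example.cloud.databricks.com/", "dapi-EXAMPLE")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`; printing the URL first makes it easy to confirm that the host is the workspace you expect.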
- 1021 Views
- 0 replies
- 0 kudos
The Hidden Security Risks in Stored Procedure Migrations—What Databricks Exposed
Your stored procedure migration to DB isn't just a 'copy-paste' job - it's a security nightmare waiting to happen. We discovered our 'trusted' stored procedures had hidden access patterns that nearly compromised our entire data governance model. Here'...
- 2900 Views
- 0 replies
- 1 kudos
The Hidden Pitfalls of Snowflake to Databricks Migrations
Everyone's rushing their Snowflake to Databricks migration, and they're setting themselves up for failure. After leading multiple enterprise migrations to Databricks last quarter, here's what shocked me: The technical lift isn't the hard part. It's th...
- 3433 Views
- 1 replies
- 1 kudos
📊 Simplifying CDC with Databricks Delta Live Tables & Snapshots 📊
In the world of data integration, synchronizing external relational databases (like Oracle, MySQL) with the Databricks platform can be complex, especially when Change Data Feed (CDF) streams aren’t available. Using snapshots is a powerful way to mana...
- 1 kudos
Hi Ajay, can apply changes into snapshot handle re-processing of an older snapshot? Use case: - Source has delivered data on day T, T1 and T2. - Consumers realise there is an error on the day T data, and make a correction in the source. The source redel...
- 2418 Views
- 1 replies
- 4 kudos
Consideration Before Migrating Hive Tables to Unity Catalog
Databricks recommends four methods to migrate Hive tables to Unity Catalog, each with its pros and cons. The choice of method depends on specific requirements. SYNC: A SQL command that migrates schema or tables to Unity Catalog external tables. Howeve...
- 4 kudos
This is a great solution! The post effectively outlines the methods for migrating Hive tables to Unity Catalog while emphasizing the importance of not just performing a simple migration but transforming the data architecture into something more robus...
- 7089 Views
- 3 replies
- 3 kudos
Resolved! Feature Engineering for Data Engineers: Building Blocks for ML Success
For a UK Government Agency, I made a comprehensive presentation titled "Feature Engineering for Data Engineers: Building Blocks for ML Success". I turned it into an article on LinkedIn together with the relevant GitHub code. In summary the code delve...
- 3 kudos
This is a fantastic post! The detailed explanation of feature engineering, from handling missing values to using Variational Autoencoders (VAEs) for synthetic data generation, provides invaluable insights for improving machine learning models. The ap...
- 26669 Views
- 3 replies
- 7 kudos
Comprehensive Guide to Databricks Optimization: Z-Order, Data Compaction, and Liquid Clustering
Optimizing data storage and access is crucial for enhancing the performance of data processing systems. In Databricks, several optimization techniques can significantly improve query performance and reduce costs: Z-Order Optimize, Optimize Compaction...
- 6740 Views
- 0 replies
- 0 kudos
How can Databricks AI/BI Genie, RAG, & LLMs seamlessly coexist with MS Copilot to drive innovation?
The future of enterprise productivity and analytics lies in the seamless integration of advanced tools like Databricks Genie AI/BI, RAG & LLMs and Microsoft Copilot. While each serves distinct purposes, their coexistence can unlock unparalleled value...
- 4095 Views
- 0 replies
- 1 kudos
How Databricks Empowers Scalable Data Products Through Medallion Mesh Architecture?
Unlock the Power of Your Data: Solving Fragmentation and Governance Challenges! In today’s fast-paced, data-driven enterprises, fragmented data and governance issues create roadblocks to decision-making and innovation. Traditional architectures strugg...
- 981 Views
- 0 replies
- 0 kudos
Rebuilding and Re-Platforming Your Databricks Lakehouse with Serverless Compute
Dear Databricks Community, In today’s fast-paced data landscape, managing infrastructure manually can slow down innovation, increase costs, and limit scalability. Databricks Serverless Compute solves these challenges by eliminating infrastructure over...
- 4734 Views
- 0 replies
- 3 kudos
Mapping Compliance Standards to Industries: A Comprehensive Guide
Brief Guideline: Mapping Compliance Standards to Industries. This guide provides a detailed mapping of various compliance standards to their respective industries, highlighting the specific sectors and descriptions for each standard. Understanding thes...
- 4632 Views
- 3 replies
- 0 kudos
Getting data from Databricks into Excel using Databricks Jobs API
If you have your data in Databricks but like to analyse it in Excel, you can use the Web API in Power Query. It allows you not just to query an existing table, but also to trigger the execution of a PySpark notebook using the Databricks Jobs API, and get the d...
- 0 kudos
Got it, yes you have specified the same in your message. Thanks for sharing.
- 3577 Views
- 1 replies
- 3 kudos
How to Grant Workspace Admin Permissions to an ID Using Parent Groups
Hello, There are several ways to grant Workspace Admin permissions in Databricks. While this may seem straightforward, I found it a bit confusing when I started using Databricks, so I’d like to share my experience. This guide is aimed at beginners. How...
- 907 Views
- 0 replies
- 0 kudos
Learn Data Engineering on Databricks step by step
For new aspiring Data Engineers, it has always been difficult to know where to start learning. With a decade of experience in Data Engineering, I have put together a series of articles that can help new aspirants. The list is a small attempt to help new Data E...