- 2218 Views
- 1 replies
- 1 kudos
Databricks Optimization Tips – What’s Your Secret?
When I first started working with Databricks, I was genuinely impressed by its potential. The seamless integration with Delta Lake, the power of PySpark, and the ability to process massive datasets at incredible speeds: it was truly impactful. Over tim...
- 1 kudos
1. Try to remove cache() and persist() from the DataFrame operations in the code base.
2. Fully avoid driver operations like collect() and take(): they bring data from the executors back to the driver, which adds heavy network I/O overhead.
3. Av...
- 863 Views
- 0 replies
- 0 kudos
Request for a guest post
Hi, I hope you're doing well. My name is Prasanna. C, Digital Marketing Strategist at Express Analytics, a company that understands consumer behavior and provides analytics solutions and services to businesses. Express Analytics primarily offers...
- 2150 Views
- 2 replies
- 1 kudos
Automatic Liquid Clustering and PO
I spent some time working out how to use automatic liquid clustering with DLT pipelines. I hope this helps you as well.
Enable Predictive Optimization
Use this code:
# Enabling Automatic Liquid Clustering on a new table
@dlt.table(cluster_by_auto=Tr...
- 1 kudos
Hi @Addy0_, thanks for sharing how to set it for an existing table. Unfortunately, I think ALTER cannot be used with materialized views and streaming tables defined in DLT pipelines. I was looking for something similar to @dlt.table(cluster_by_auto=True, ...
- 1133 Views
- 0 replies
- 1 kudos
Databricks Data Classification
I encourage you to try out a new beta feature in Databricks called Data Classification. It automatically classifies your catalog data and applies tags to it. Docs: https://docs.databricks.com/aws/en/lakehouse-monitoring/data-classification
- 1069 Views
- 0 replies
- 1 kudos
Strong Databricks Fundamental - Gen Z
Why Databricks is the Future of Data Analytics for Gen Z
In the fast-paced world of data analytics, staying ahead of the curve is crucial. For Gen Z, who are digital natives and always on the lookout for the latest tech trends, understanding the diffe...
- 4052 Views
- 1 replies
- 1 kudos
Post: Lakehouse Federation - Databricks
Lakehouse Federation - Databricks
In the world of data, innovation is constant. And the most recent revolution comes with Lakehouse Federation, a fusion between data lakes and data warehouses, taking data manipulation to a new level. This advancement...
- 1 kudos
Hey, quick question: can we use it in production? Our application server is SQL Server, and we are planning to use Lakehouse Federation so we can avoid creating and maintaining hundreds of workflows. As we have a small dataset, I am not too sure o...
- 1251 Views
- 0 replies
- 1 kudos
Hub Star Modeling 2.0 for Medallion Architecture
Excited to share my latest publication on arXiv! “Hub Star Modeling 2.0 for Medallion Architecture” https://arxiv.org/abs/2504.08788
This new version builds on the original Hub Star Modeling approach, published last year, and is now tailored for the Meda...
- 4374 Views
- 1 replies
- 6 kudos
Handling Complex Nested JSON in Databricks Using schemaHints
When I first got into managing schemas in Databricks, it took me a while to realize that a little planning up front could save me a ton of headaches later on.
I was working with deeply nested, constantly changing JSON files. At first,...
- 6 kudos
Great tip @genevive_mdonça! schemaHints help avoid issues with evolving JSON data, making data processing more reliable and easier to maintain. Thanks for sharing.
- 3429 Views
- 1 replies
- 0 kudos
Understanding Coalesce, Skewed Joins, and Why AQE Doesn't Always Intervene
In Spark, data skew can be the silent killer of performance. One wide partition pulling in 90% of the data? But even with AQE (Adaptive Query Execution) turned on in Databricks, skew isn't always identified automatically, and here's why.
What Is co...
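One documented reason AQE can leave a lopsided partition alone, which may or may not be the one the post goes on to give: in open-source Spark, the skew-join optimization only treats a shuffle partition as skewed when it is larger than both a multiple of the median partition size (`spark.sql.adaptive.skewJoin.skewedPartitionFactor`) and an absolute size threshold (`spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`). The sketch below restates that rule in plain Python. The function name, the sample partition sizes, and the setting values are illustrative assumptions, not confirmed Databricks defaults.

```python
from statistics import median

MB = 1024 * 1024


def skewed_partitions(sizes_bytes, factor, threshold_bytes):
    """Return the indexes of partitions the documented rule would flag.

    A partition is flagged only if it is larger than factor * median size
    AND larger than the absolute byte threshold.
    """
    med = median(sizes_bytes)
    return [
        i
        for i, size in enumerate(sizes_bytes)
        if size > factor * med and size > threshold_bytes
    ]


# Hypothetical shuffle partition sizes: one partition holds 90% of the data.
large = [900 * MB] + [10 * MB] * 10
flagged = skewed_partitions(large, factor=5, threshold_bytes=256 * MB)

# Same 90% skew on a small dataset: the big partition is only 90 MB,
# so it clears the relative test but not the absolute threshold.
small = [90 * MB] + [1 * MB] * 10
flagged_small = skewed_partitions(small, factor=5, threshold_bytes=256 * MB)

print(flagged, flagged_small)
```

Under these assumed settings, the first case is flagged and the second is not, even though the proportions are identical. That is one way a partition can look badly skewed in the Spark UI while AQE does nothing about it.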
- 0 kudos
@mark_ott, this question seems right up your alley. Care to comment?
- 2691 Views
- 0 replies
- 1 kudos
One solution to the [FAILED_READ_FILE.NO_HINT] "Error while reading file" error when using display() or SELECT
I got stuck with the above error when using `spark.read.table().display()` or when querying the table directly with %sql. While the display method is just one...
- 2829 Views
- 0 replies
- 1 kudos
Power BI to Databricks Semantic Layer Generator (DAX → SQL/PySpark)
Hi everyone! I've just released an open-source tool that generates a semantic layer in Databricks notebooks from a Power BI dataset using the Power BI REST API. I'm not an expert yet, but it gets the job done, and instead of using AtScale, dbt, or the PBI Sem...
- 1011 Views
- 0 replies
- 0 kudos
How to train a Convolutional Neural Network on Databricks with TensorFlow and Keras
Here is how I trained a lightweight Convolutional Neural Network (CNN) to detect pneumonia from chest X-ray images on Azure Databricks. I promise no LLMs, no hype, just real-world deep learning:
1. Built it with TensorFlow & Keras on Databricks
2...
- 2696 Views
- 0 replies
- 1 kudos
Real Lessons in Databricks Schema, Streaming, and Unity Catalog
Hey Databricks community,
I wanted to take a moment to share some things I've learned while working with Databricks in real projects, especially around schema management, Unity Catalog, Auto Loader, and streaming jobs. These are the kinds of small detai...
- 1229 Views
- 0 replies
- 1 kudos
Inclusion of special characters while saving or downloading as a csv
Hi all, I have data that looks like this: High Corona40% 50cl Pm £13.29. When I save it as a CSV, it gets converted into High Corona40% 50cl Pm £13.29 wherever the pound sign (£) appears. One thing to note here is that while displaying the data i...
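A symptom like this is often an encoding mismatch rather than corrupted data: the file is written as UTF-8, but the program that opens it decodes it with a legacy single-byte code page. The sketch below reproduces that effect in plain Python and shows one common workaround, writing the CSV with a UTF-8 byte-order mark. It assumes the reader is using Windows-1252, which the post does not confirm, and the variable names and the `product` column are illustrative.

```python
import csv
import io

text = "High Corona40% 50cl Pm £13.29"

# Encoding as UTF-8 turns the pound sign into two bytes (0xC2 0xA3).
utf8_bytes = text.encode("utf-8")

# A reader that assumes Windows-1252 (an assumption here) shows each of
# those two bytes as its own character.
misread = utf8_bytes.decode("cp1252")
print(misread)  # High Corona40% 50cl Pm Â£13.29

# Workaround sketch: build the CSV in memory, then encode it with
# "utf-8-sig", which prepends a byte-order mark that BOM-aware tools
# can use to detect UTF-8.
out = io.StringIO()
csv.writer(out).writerow(["product", text])
csv_bytes = out.getvalue().encode("utf-8-sig")
```

Whether the BOM helps depends on the program opening the file. If the viewer lets you pick the encoding on import, selecting UTF-8 there fixes the display without changing how the file is written.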
- 2063 Views
- 0 replies
- 1 kudos
Use Query Patterns to Suggest Indexes Dynamically
Hey folks,
Ever notice how a query that used to run super fast suddenly starts dragging? We've all been there. As data grows, those little inefficiencies in your SQL start showing up, and they show up hard. That's where something cool comes in: using...