Lakehouse, Lagers & Legends — Bangalore Meetup | December 13

Governing the Full Data Lifecycle: From Ingestion to Insight with Unity Catalog
Date: Saturday, December 13
Time: 12:00 PM – 3:00 PM
Location: Bengaluru, Karnataka (register to see the address)
Registration: Approval required
About Thi...

  • 191 Views
  • 1 replies
  • 2 kudos
Wednesday
Join us for another BrickTalk: Vibe-Coding Databricks Apps in Replit with Augusto!

BrickTalk: Vibe-Coding Databricks Apps in Replit Discover how to vibe-code Databricks Apps in Replit and accelerate your development workflow. Follow along as we demonstrate techniques for going from concept to demo in record time.  Join us for a han...

  • 1122 Views
  • 8 replies
  • 4 kudos
2 weeks ago
Celebrating Our First Brickster Champion: Louis Frolio

Our Champion program has always celebrated the customers who go above and beyond to engage, help others, and uplift the Community. Recently, we have seen remarkable participation from Bricksters as well—and their impact deserves recognition too. Begi...

  • 684 Views
  • 6 replies
  • 12 kudos
2 weeks ago
Big Book of Data Engineering - Get how-tos, code snippets and real-world examples

As data volume and complexity increase, engineers are left figuring out how to manage, monitor and maintain fragile pipelines while also handling fragmented tools. The Big Book of Data Engineering equips you with cutting-edge methods for building pip...

  • 1012 Views
  • 5 replies
  • 9 kudos
3 weeks ago
Level Up with Databricks Specialist Sessions

How to Register & Prepare If you're interested in advancing your skills with Databricks through a Specialist Session, here's a clear guide on how to register and what free courses you can take to prepare effectively. How to Begin Your Learning Path S...

  • 2645 Views
  • 2 replies
  • 8 kudos
10-02-2025
⭐ Set Up Spark with Hadoop Anywhere: A DBR-Aligned Local Spark+HDFS+Hive Stack on Docker ⭐

Hello Community, let me start off with a quick question: have you ever migrated your workloads from on-prem Spark to Databricks, encountered a bug, and thought, “I wish I could repro this locally to debug the issue without burning cluster hours...

  • 1848 Views
  • 12 replies
  • 17 kudos
3 weeks ago
🌟 Community Pulse: Your Weekly Roundup! November 21 – 27, 2025

A warm and vibrant week in the Databricks Community! Even with the Thanksgiving celebrations in full swing, the Community kept the spark alive, sharing insights, solving tricky questions, and dropping some solid technical gems. Here’s your weekly rou...

  • 360 Views
  • 6 replies
  • 6 kudos
Friday

Community Activity

MandyR
by Databricks Employee
  • 83 Views
  • 2 replies
  • 5 kudos

Recording! BrickTalks: Vibe Coding Databricks Apps in Replit

In this BrickTalk, Databricks Solutions Engineer Augusto Carneiro demonstrates how to vibe-code Databricks Apps directly in Replit, moving from concept to working demo and showing how to troubleshoot along the way. Watch the full walkthrough, learn t...

Latest Reply
Nidhig
Contributor

@MandyR No notifications for this training this time either.

steveKris
by Visitor
  • 23 Views
  • 2 replies
  • 2 kudos

Extract all users from Databricks Groups

Hey everyone, we are trying to get an overview of all users in our Databricks groups. We have tried to do so with the REST API as well as with SQL queries (using normal developer accounts as well as workspace administrator accounts). The pr...

Latest Reply
Raman_Unifeye
Contributor III

The UI shows all provisioned users, but the REST API and SQL expose only subsets, depending on whether you query at the account, workspace, or Unity Catalog level. To get a complete overview, you need to combine the account SCIM API, the workspace SCIM API, and the UC system tables.

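A minimal sketch of the "combine" step the reply describes, covering only the SCIM part. It assumes you call a SCIM Groups endpoint (for example the workspace endpoint /api/2.0/preview/scim/v2/Groups, or its account-level equivalent) with a token that is allowed to read groups, and that member IDs are consistent across the sources you merge. fetch_scim_groups and users_by_group are hypothetical helper names, and nested groups are not expanded.

```python
import json
import urllib.request


def fetch_scim_groups(groups_url, token, page_size=100):
    """Page through a SCIM Groups endpoint and return every group resource.

    groups_url is a placeholder for a full Groups URL, for example
    https://<workspace-host>/api/2.0/preview/scim/v2/Groups.
    """
    groups, start = [], 1
    while True:
        url = f"{groups_url}?startIndex={start}&count={page_size}"
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        resources = page.get("Resources", [])
        groups.extend(resources)
        start += len(resources)  # SCIM startIndex is 1-based
        if not resources or start > page.get("totalResults", 0):
            return groups


def users_by_group(*group_lists):
    """Merge group lists from several sources into {member id: details}.

    Each member is keyed by its SCIM "value" (its ID); the set of group
    display names it belongs to is accumulated across all sources.
    """
    users = {}
    for groups in group_lists:
        for group in groups:
            for member in group.get("members", []):
                entry = users.setdefault(
                    member["value"],
                    {"display": member.get("display", ""), "groups": set()},
                )
                entry["groups"].add(group.get("displayName", ""))
    return users
```

Keying on the member ID means a user who appears in both the account and workspace results is listed once with the union of their groups; if IDs differ between your sources, key on the display name or user name instead.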
abetogi
by New Contributor III
  • 1566 Views
  • 2 replies
  • 0 kudos

AI

At Chevron, we actively use Databricks to provide answers to business users. It was extremely interesting to see the LakehouseIQ initiatives, as they can speed up how quickly our users receive their answers and reports. Is there any documentation that I...

Latest Reply
Raman_Unifeye
Contributor III

All the docs are on the official Databricks pages. I believe the following will be helpful to begin with. LakehouseIQ: AI Engine for Your Business - https://www.databricks.com/blog/introducing-lakehouseiq-ai-powered-engine-uniquely-understands-your-business - Officia...

mordex
by Visitor
  • 25 Views
  • 3 replies
  • 0 kudos

Resolved! Why is Spark creating 5 jobs and 200 tasks?

I am trying to read 1,000 small CSV files, each 30 KB in size, which are stored in a Databricks volume. Below is the query I am running:
df = spark.read.options(header=True).csv('/path')
df.collect()
Why is it creating 5 jobs? Why do jobs 1-3 have 200 tasks, 4 ha...

Latest Reply
Raman_Unifeye
Contributor III

@mordex - yes, Spark caps the parallelism for file listing at 200 tasks, regardless of whether you have 1,000 or 10,000 files. It is controlled by spark.sql.sources.parallelPartitionDiscovery.parallelism. Run the command below to get its value: spark.c...

tarunnagar
by Contributor
  • 17 Views
  • 1 replies
  • 0 kudos

How to Connect Databricks with Web and Mobile Apps

Hi everyone, I’m exploring ways to leverage Databricks for building data-driven web and mobile applications and wanted to get some insights from this community. Databricks is great for processing large datasets, running analytics, and building machine...

Latest Reply
jameswood32
New Contributor III

To connect Databricks with web or mobile apps, a common approach is to expose your data or models through a lightweight API layer. Use Databricks SQL endpoints (now called SQL warehouses) or MLflow model serving to provide secure REST endpoints your app can call directly. ...

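A minimal sketch of the API-layer approach from the reply, assuming a SQL warehouse and the Databricks SQL Statement Execution API (POST /api/2.0/sql/statements). The host, token, and warehouse ID are placeholders, and build_statement_request and run_statement are hypothetical helper names. Parameters are passed by name (referenced as :name in the SQL) instead of being formatted into the statement.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, sql, params=None):
    """Build (but do not send) a Statement Execution API request.

    host, token and warehouse_id are placeholders for your own values;
    params is an optional dict of named parameters used as :name in sql.
    """
    body = {
        "statement": sql,
        "warehouse_id": warehouse_id,
        "wait_timeout": "30s",  # wait synchronously for up to 30 seconds
        "parameters": [
            {"name": k, "value": str(v)} for k, v in (params or {}).items()
        ],
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_statement(request):
    """Send the request and return the first chunk of result rows.

    If the statement is still running when wait_timeout expires, this sketch
    raises instead of polling the statement by its ID.
    """
    with urllib.request.urlopen(request) as resp:
        payload = json.load(resp)
    if payload.get("status", {}).get("state") != "SUCCEEDED":
        raise RuntimeError(f"statement not finished: {payload.get('status')}")
    return payload.get("result", {}).get("data_array", [])
```

Keeping this call on a server you control, rather than inside the mobile or browser client, avoids embedding a Databricks token in the app itself.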
Hubert-Dudek
by Esteemed Contributor III
  • 85 Views
  • 1 replies
  • 3 kudos

Databricks Advent Calendar 2025

With the first day of December comes the first window of our Databricks Advent Calendar. It’s a perfect time to look back at this year’s biggest achievements and surprises — and to dream about the new “presents” the platform may bring us next year. ...

Latest Reply
Advika
Databricks Employee

Fantastic kickoff to the Databricks Advent Calendar 2025! Appreciate you steering the series, @Hubert-Dudek!

nanditakrishnan
by New Contributor II
  • 27 Views
  • 1 replies
  • 0 kudos

Databricks Dashboard Optimization

I have trouble understanding why, for every report in a dashboard that refers to the same data source, the query re-runs each time. Ideally, I would want the queries that feed the dashboard's tables to run exactly once, and then have the filter...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @nanditakrishnan, there's already something like that in Databricks dashboards, but some conditions need to be met (e.g., queries need to share the same GROUP BY). One of the dataset optimization techniques that the Databricks team implemented is doing fo...

gokkul
by > New Contributor II
  • 24 Views
  • 1 replies
  • 0 kudos

Help me with a Databricks Streamlit application question

Hi Databricks Community, I have a question about a Databricks Streamlit application. The application takes input values from the user through the Streamlit UI. Now I want to store these input values in a Delta table in U...

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @gokkul, your app's service principal needs the proper permissions to write to the Unity Catalog table. You also need to use the Databricks SDK for Python to interact with UC objects (e.g., to read or save a table). You can get some inspiration from the following Databricks cookbo...

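A minimal sketch of the reply's suggestion, assuming the app uses the Databricks SDK for Python (databricks-sdk), that a SQL warehouse is available to the app, and that the app's service principal has already been granted write access on the target table. The table name, column names, warehouse ID, and the helpers build_insert and insert_row are hypothetical. User input is bound as named parameters rather than formatted into the SQL string, and the table and column names are validated because identifiers cannot be bound as plain parameters.

```python
import re

# Hypothetical three-level Unity Catalog name (catalog.schema.table).
_TABLE_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){2}$")
_COLUMN_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_insert(table, row):
    """Return (sql, params) for a parameterized single-row INSERT.

    Table and column names are checked against simple identifier patterns;
    the values themselves are bound as :named parameters.
    """
    if not _TABLE_RE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    cols = list(row)
    for col in cols:
        if not _COLUMN_RE.match(col):
            raise ValueError(f"unexpected column name: {col!r}")
    sql = (
        f"INSERT INTO {table} ({', '.join(cols)}) "
        f"VALUES ({', '.join(':' + c for c in cols)})"
    )
    params = [{"name": c, "value": str(v)} for c, v in row.items()]
    return sql, params


def insert_row(table, row, warehouse_id):
    """Execute the INSERT through the SDK's statement execution service."""
    # Imported lazily so build_insert works without the SDK installed.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.sql import StatementParameterListItem

    sql, params = build_insert(table, row)
    # Assumed to pick up the app's service principal credentials from the environment.
    w = WorkspaceClient()
    return w.statement_execution.execute_statement(
        statement=sql,
        warehouse_id=warehouse_id,
        parameters=[StatementParameterListItem(**p) for p in params],
        wait_timeout="30s",
    )
```

In a Streamlit callback, this could be called as insert_row("main.app.feedback", {"user_name": name, "score": score}, warehouse_id). Every value is sent as a STRING parameter here; for numeric or date columns, you may need to set each parameter's type or CAST in the SQL.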
tarunnagar
by Contributor
  • 136 Views
  • 7 replies
  • 3 kudos

How to Optimize Data Pipeline Development on Databricks for Large-Scale Workloads?

Hi everyone, I’m working on building and optimizing data pipelines in Databricks, especially for large-scale workloads, and I want to learn from others who have hands-on experience with performance tuning, architecture decisions, and best practices. I’...

Latest Reply
jameswood32
New Contributor III

Optimizing Databricks pipelines for large-scale workloads mostly comes down to smart architecture and efficient Spark practices. Key tips from real-world users:
• Use Delta Lake for ACID transactions, incremental updates, and schema enforcement.
• Partition...

KrishZ
by Contributor
  • 10875 Views
  • 5 replies
  • 4 kudos

How to use parallel processing with concurrent jobs in Databricks?

Question: It would be great if you could recommend how I should go about solving the problem below. I haven't been able to find much help online.
A. Background:
A1. I have to do text manipulation using Python (like concatenation, converting to a spaCy doc, getting verbs...

Latest Reply
Sangsha
Visitor

I have to process data for n devices, each of which sends data every 5 seconds. I have a similar scenario where I have to take the last 3 hours of data and process it for all the devices for some key parameters. Now if I am doing it sequentially ...

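For the per-device workload described in the latest reply, one common pattern is to run the independent per-device computations concurrently instead of one after another. A minimal sketch with Python's concurrent.futures, where process_device is a hypothetical stand-in for your own per-device logic. On a Databricks driver, threads mainly help when each task is I/O-bound or submits its own Spark work; CPU-bound pure-Python code is still limited by the GIL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_device(device_id, readings):
    """Hypothetical per-device job: summarize a window of readings."""
    return {
        "device": device_id,
        "count": len(readings),
        "max": max(readings) if readings else None,
    }


def process_all(readings_by_device, max_workers=8):
    """Run process_device for every device concurrently; collect results by ID."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(process_device, dev, vals): dev
            for dev, vals in readings_by_device.items()
        }
        for fut in as_completed(futures):
            # .result() re-raises any exception raised in the worker thread
            results[futures[fut]] = fut.result()
    return results
```

For very large device counts, a Spark-native alternative is to express the per-device work as a grouped operation (for example groupBy(...).applyInPandas(...)) so that Spark distributes it across the cluster rather than the driver.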
mtaraviya-QA
by New Contributor II
  • 392 Views
  • 3 replies
  • 3 kudos

How do I configure my interactive compute in Databricks to access files from an EFS file system?

I have an S3 account in which I have full administrator privileges. In that account, I have a Databricks workspace and an EFS file system set up. I created an interactive compute inside the Databricks workspace with the default config. How do I configure m...

Latest Reply
EllieFarrell
New Contributor

If you’re trying to mount EFS directly to an interactive cluster, you’ll usually need to handle it through init scripts, since EFS requires the NFS client to be installed and the file system to be mounted at cluster startup. One thing to double-check is whether your worksp...

cgrant
by Databricks Employee
  • 19350 Views
  • 4 replies
  • 6 kudos

What is the difference between OPTIMIZE and Auto Optimize?

I see that Delta Lake has an OPTIMIZE command and also table properties for Auto Optimize. What are the differences between these and when should I use one over the other?

Latest Reply
basit
New Contributor II

Is this still a valid answer in 2025? https://docs.databricks.com/aws/en/delta/tune-file-size#auto-compaction-for-delta-lake-on-databricks

radha_krishna
by New Contributor
  • 91 Views
  • 4 replies
  • 1 kudos

"ai_parse_document()" is not a full OCR engine ? It's not extracting text from high quality image

 I used "ai_parse_document()" to parse a PNG file that contains cat images and text. From the image, I wanted to extract all the cat names, but the response returned nothing. It seems that "ai_parse_document()" does not support rich image extraction....

Latest Reply
Raman_Unifeye
Contributor III

@szymon_dybczak - yes, as it relies on AI models, it can miss a few cases due to its non-deterministic nature. I have used it in anger with a vast number of PDFs, and it has worked pretty well in all those cases. I have not tried it with PNG...

Michael_Galli
by Contributor III
  • 14636 Views
  • 5 replies
  • 8 kudos

Resolved! Monitoring Azure Databricks in an Azure Log Analytics Workspace

Does anyone have experience with the mspnp/spark-monitoring library? Is this best practice, or are there better ways to monitor a Databricks cluster?

Latest Reply
vr
Valued Contributor

Interesting that Microsoft deleted this project. Was there any announcement as to when, why, and what to do now?

Ravikumashi
by Contributor
  • 3187 Views
  • 4 replies
  • 1 kudos

Resolved! Issue with Logging Spark Events to LogAnalytics after Upgrading to Databricks 11.3 LTS

We have recently been in the process of upgrading our Databricks clusters to version 11.3 LTS. As part of this upgrade, we have been working on integrating the logging of Spark events to LogAnalytics using the repository available at https://github.c...

Latest Reply
vr
Valued Contributor

Does anyone know why this repository was deleted? https://github.com/mspnp/spark-monitoring

Welcome to the Databricks Community!

Once you are logged in, you will be ready to post content, ask questions, participate in discussions, earn badges and more.

Spend a few minutes exploring Get Started Resources, Learning Paths, Certifications, and Platform Discussions.

Connect with peers through User Groups and stay updated by subscribing to Events. We are excited to see you engage!

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!


Latest from our Blog

How to perform Semantic Search in Databricks Lakebase

Introduction In today’s AI-native world, applications no longer rely on exact keyword matches—they understand meaning. This shift is powered by embeddings: numerical representations of text that captu...

  • 141 Views
  • 1 kudos
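The teaser's core idea, ranking text by how close its embedding is to the query's embedding rather than by keyword match, can be sketched in a few lines. This is a generic illustration, not the method from the linked post: the tiny 3-dimensional vectors below are made-up stand-ins for real embedding-model output, and ranking uses cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors standing in for real embeddings (illustrative only).
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.8, 0.2, 0.1],
    "office hours": [0.0, 0.2, 0.9],
}
print(top_k([0.85, 0.15, 0.05], docs))
```

A real system would replace the linear scan with a vector index, but the ranking criterion is the same: documents whose vectors point in a similar direction to the query's rank first, even when they share no keywords with it.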