Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

m997al
by Contributor III
  • 4629 Views
  • 3 replies
  • 0 kudos

Errors using Databricks Extension for VS Code on Windows

Hi - I am trying to get my VS Code (running on Windows) to work with the Databricks extension for VS Code. It seems like I can almost get this to work. Here is my setup: 1. Using Databricks Extension v2.4.0; 2. Connecting to Databricks cluster with ru...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

I guess it is very likely you are already using it, but if not, I would suggest using Git folders directly in Databricks in conjunction with VS Code, as you are doing now. There will be times when it is much faster/more straightforward to run noteboo...

2 More Replies
pooja_bhumandla
by New Contributor III
  • 304 Views
  • 3 replies
  • 1 kudos

Resolved! Seeking Insights on Liquid Clustering (LC) Based on Table Sizes

Hi all, I'm exploring Liquid Clustering (LC) and its effectiveness based on the size of the tables. Specifically, I'm interested in understanding how LC behaves with small, medium, and large tables, and the best practices for each, along with size range...

Latest Reply
pooja_bhumandla
New Contributor III
  • 1 kudos

Hi @bianca_unifeye, thank you for your response. My tables range in size from 1 KB to 5 TB. Given this, I'd love to hear your thoughts and experiences on whether Liquid Clustering (LC) would be a good fit in this scenario. Thanks in advance for shari...

2 More Replies
Charansai
by New Contributor III
  • 289 Views
  • 2 replies
  • 2 kudos

Pipelines not included in Databricks Asset Bundles deployment

Hi all, I'm working with Databricks Asset Bundles (DAB) to build and deploy Jobs and pipelines across multiple environments in Azure Databricks. I can successfully deploy Jobs using bundles. However, when I try to deploy pipelines, I notice that the bun...

Latest Reply
Coffee77
Contributor III
  • 2 kudos

As per this documentation, https://docs.databricks.com/aws/en/dev-tools/bundles/resources#pipeline, you should be able to do it with the latest CLI version. Check that you have that latest version. Here is a sample databricks.yml configuration file -> https://gi...
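
For reference, a minimal sketch of the pipeline resource shape that page describes (the bundle, pipeline, catalog, schema, and path names here are hypothetical placeholders, not taken from the thread):

bundle:
  name: my_bundle

resources:
  pipelines:
    my_pipeline:
      name: my-pipeline
      catalog: main          # hypothetical Unity Catalog catalog
      target: dev_schema     # hypothetical target schema
      libraries:
        - notebook:
            path: ../src/pipeline_notebook.py   # hypothetical notebook path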

1 More Replies
Charansai
by New Contributor III
  • 297 Views
  • 1 reply
  • 0 kudos

How to use serverless clusters in DAB deployments with Unity Catalog in private network?

Hi everyone, I'm deploying Jobs and Pipelines using Databricks Asset Bundles (DAB) in an Azure Databricks workspace configured with private networking. I'm trying to use serverless compute for some workloads, but I'm running into issues when Unity Cat...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

A lot of questions! Concerning the usage of serverless clusters in databricks.yml, and assuming you're using those clusters in jobs, you must define them in the job definition. Take a look here: https://github.com/databricks/bundle-examples/tree/main/know...
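
As a rough sketch of what that can look like (names are hypothetical, and this assumes serverless jobs are enabled in the workspace): in current bundle schemas, a notebook task that declares no cluster settings runs on serverless jobs compute.

resources:
  jobs:
    my_serverless_job:            # hypothetical job key
      name: my-serverless-job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/notebook.py   # hypothetical path
          # no new_cluster / job_cluster_key / existing_cluster_id here,
          # so the task falls back to serverless jobs compute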

intelliconnectq
by New Contributor III
  • 306 Views
  • 2 replies
  • 0 kudos

Resolved! Loading CSV from private S3 bucket

Trying to load a CSV file from a private S3 bucket. Please clarify the requirements to do this: Can I do it in Community Edition (if yes, then how)? How do I do it in the premium version? I have an IAM role, and I also have an access key & secret.

Latest Reply
Coffee77
Contributor III
  • 0 kudos

Assuming you have these prerequisites:
  • A private S3 bucket (e.g., s3://my-private-bucket/data/file.csv)
  • An IAM user or role with access (list/get) to that bucket
  • The AWS Access Key ID and Secret Access Key (client and secret)
The most straightforward w...
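
A minimal PySpark sketch of that approach, with placeholder values (in a premium workspace, an instance profile or a Unity Catalog external location is generally preferable to hardcoding keys):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder credentials: in practice, read these from a secret scope
# rather than hardcoding them in a notebook.
sc = spark.sparkContext
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "<AWS_ACCESS_KEY_ID>")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "<AWS_SECRET_ACCESS_KEY>")

# Hypothetical bucket/path matching the prerequisites above
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("s3a://my-private-bucket/data/file.csv")
)
df.show()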

1 More Replies
Hubert-Dudek
by Databricks MVP
  • 26719 Views
  • 14 replies
  • 12 kudos

Resolved! dbutils or other magic way to get notebook name or cell title inside notebook cell

Not sure it exists, but maybe there is some trick to get these directly from Python code: NotebookName, CellTitle. Just working on some logger script shared between notebooks, and it could make my life a bit easier.

Latest Reply
rtullis
New Contributor II
  • 12 kudos

I got the solution to work in terms of printing the notebook that I was running; however, what if you have notebook A that calls a function that prints the notebook name, and you run notebook B that %runs notebook A? I get notebook B's name when...
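
For readers landing here, the workaround usually traded in threads like this looks like the following; it relies on an unofficial dbutils API (so it may change between runtimes), and under %run the context belongs to the top-level notebook, which matches the behavior described above:

# Unofficial trick: pull the current notebook path from the dbutils context.
# dbutils is a Databricks notebook global; with %run, the context is that of
# the calling (top-level) notebook.
notebook_path = (
    dbutils.notebook.entry_point.getDbutils()
    .notebook()
    .getContext()
    .notebookPath()
    .get()
)
print(notebook_path)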

13 More Replies
kahrees
by New Contributor II
  • 425 Views
  • 3 replies
  • 4 kudos

Resolved! DATA_SOURCE_NOT_FOUND Error with MongoDB (Suggestions in other similar posts have not worked)

I am trying to load data from MongoDB into Spark. I am using the Community/Free version of Databricks, so my Jupyter notebook is in a Chrome browser. Here is my code:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .config("spar...

Latest Reply
K_Anudeep
Databricks Employee
  • 4 kudos

Hey @kahrees, good day! I tested this internally, and I was able to reproduce the issue. You're getting [DATA_SOURCE_NOT_FOUND] ... mongodb because the MongoDB Spark connector jar isn't actually on your cluster's classpath. On D...
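
A sketch of a working read once the connector jar is actually on the cluster: this assumes the MongoDB Spark connector is installed as a cluster library via Maven coordinates (the coordinate, URI, database, and collection below are illustrative, not from the thread):

# Assumes org.mongodb.spark:mongo-spark-connector_2.12:10.4.0 (or similar)
# is installed on the cluster as a Maven library. Setting spark.jars.packages
# from a notebook after the cluster has started is too late, which is why
# the session-level config alone fails.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read.format("mongodb")  # 10.x connector short name
    .option("connection.uri", "mongodb+srv://<user>:<password>@<host>/")  # placeholder
    .option("database", "my_db")              # placeholder
    .option("collection", "my_collection")    # placeholder
    .load()
)
df.show()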

2 More Replies
eyalholzmann
by New Contributor II
  • 385 Views
  • 3 replies
  • 2 kudos

Resolved! Does VACUUM on Delta Lake also clean Iceberg metadata when using Iceberg Uniform feature?

I'm working with Delta tables using the Iceberg Uniform feature to enable Iceberg-compatible reads. I'm trying to understand how metadata cleanup works in this setup. Specifically, does the VACUUM operation—which removes old Delta Lake metadata based ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Here's how to approach cleaning and maintaining Apache Iceberg metadata on Databricks, and how it differs from Delta workflows. First, know your table type: for Unity Catalog–managed Iceberg tables, Databricks runs table maintenance for you (predicti...
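
For the Delta side of the question, a standard VACUUM call looks like this (the table name is hypothetical; per the reply above, Iceberg metadata maintenance for Unity Catalog–managed tables is handled by Databricks itself):

# Removes data files no longer referenced by the Delta log and older than
# the retention window; it does not replace Iceberg snapshot expiration.
spark.sql("VACUUM main.analytics.events RETAIN 168 HOURS")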

2 More Replies
pooja_bhumandla
by New Contributor III
  • 307 Views
  • 1 reply
  • 1 kudos

Resolved! Should I enable Liquid Clustering based on table size distribution?

Hi everyone, I'm evaluating whether Liquid Clustering would be beneficial for the tables based on their sizes. Below is the size distribution of tables in my environment:
Size Bucket             Table Count
Large (> 1 TB)          3
Medium (10 GB – 1 TB)   284
Small (< 10 GB)         17,26...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @pooja_bhumandla. Based on your size distribution, enabling Liquid Clustering can provide meaningful gains, but you'll get the highest ROI by prioritizing your medium and large tables first and selectively applying it to small tables where q...
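
For reference, enabling Liquid Clustering on an existing Delta table is a one-line DDL plus an OPTIMIZE to recluster already-written data (table and column names here are hypothetical):

# Key the clustering on commonly filtered columns, then recluster.
spark.sql("ALTER TABLE main.sales.transactions CLUSTER BY (customer_id, event_date)")
spark.sql("OPTIMIZE main.sales.transactions")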

bidek56
by Contributor
  • 549 Views
  • 5 replies
  • 1 kudos

Resolved! Location of spark.scheduler.allocation.file

In DBR 16.4 LTS, I am trying to add the following Spark config: spark.scheduler.allocation.file: file:/Workspace/init/fairscheduler.xml. But the all-purpose cluster is throwing this error: Spark error: Driver down cause: com.databricks.backend.daemon.dri...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

Here are some solutions without using DBFS. Yes, there are solutions for using the Spark scheduler allocation file on Databricks without DBFS, but options are limited and depend on your environment and access controls. Alternatives to DBFS for schedu...
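
Wherever the allocation file ends up living, note that jobs opt into a pool at runtime with a thread-local property; a minimal sketch (the pool name must match one defined in fairscheduler.xml):

# Route jobs submitted from this thread to a named fair-scheduler pool.
# "fair_pool" is hypothetical and must exist in the file referenced by
# spark.scheduler.allocation.file.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "fair_pool")
df = spark.range(1_000_000)
print(df.count())  # this job runs in fair_pool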

4 More Replies
Yuki
by Contributor
  • 475 Views
  • 4 replies
  • 1 kudos

Resolved! Is there any way to run jobs from github actions and catch the results?

Hi all, Is there any way to run jobs from GitHub Actions and catch the results? Of course, I can do this if I use the API or CLI. But I found this action for notebooks: https://github.com/marketplace/actions/run-databricks-notebook. Compared to this, wri...

Latest Reply
Yuki
Contributor
  • 1 kudos

OK, thank you for your advice. I will consider using asset bundles for this.
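
For anyone finding this thread later, a rough sketch of that bundle-based approach in a GitHub Actions workflow (the job key, target name, and secret names are assumptions, not from the thread):

# Deploy the bundle and run a job; each step fails if the CLI exits
# non-zero, which surfaces a failed Databricks run in the Action.
jobs:
  run-databricks-job:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}    # hypothetical secret
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}  # hypothetical secret
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - run: databricks bundle deploy -t dev
      - run: databricks bundle run my_job -t dev   # hypothetical job key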

3 More Replies
Naveenkumar1811
by New Contributor III
  • 297 Views
  • 2 replies
  • 0 kudos

What is the Best Practice of Maintaining the Delta table loaded in Streaming?

Hi Team, We have our Bronze (append), Silver (append), and Gold (merge) tables loaded using Spark streaming continuously, with the trigger set to processing time (3 secs). We also run maintenance jobs on the tables, like OPTIMIZE and VACUUM, and we perform DELETE for som...

Latest Reply
Naveenkumar1811
New Contributor III
  • 0 kudos

Hi Mark, But the real problem is that our streaming job runs 365 days, 24x7, and we can't afford any further latency in our data flowing to the gold layer. We don't have any window to pause or slow our streaming, and we continuously get the data feed actually s...

1 More Replies
hidden
by New Contributor II
  • 190 Views
  • 1 reply
  • 0 kudos

DLT parameterization from job parameters

I have created a DLT pipeline notebook which creates tables based on a config file that holds the configuration of the tables that need to be created. Now what I want is to run my pipeline every 30 min for 4 tables from the config and every 3 hours...

Latest Reply
Coffee77
Contributor III
  • 0 kudos

Define "parameters" in job as usual and then, try to capture them in DLT by using similar code to this one:dlt.conf.get("PARAMETER_NAME", "PARAMETER_DEFAULT_VALUE")It should get parameter values from job if value exists, otherwise it'll set the defau...

santosh_bhosale
by New Contributor
  • 205 Views
  • 2 replies
  • 1 kudos

Issue with Unity Catalog on Azure

When I create a Databricks workspace on Azure and try to log in at https://accounts.azuredatabricks.net/, it redirects to my workspace. On the Azure subscription, I am the owner; I created this Azure subscription, and the Databricks workspace is also cr...

Latest Reply
Coffee77
Contributor III
  • 1 kudos

Clearly, you don't have "admin account" permissions. Try clicking the workspace drop-down and then check whether you can see and click "Manage Account" to confirm, but it is very likely you are not allowed access. You must be an Azure Global Adm...

1 More Replies
Allen123Maria_1
by New Contributor
  • 685 Views
  • 2 replies
  • 0 kudos

Resolved! Optimizing Azure Functions for Performance and Cost with Variable Workloads

Hey, everyone!! I use Azure Functions in a project where the workloads change a lot. Sometimes it's quiet, and other times we get a lot of traffic. Azure Functions is very scalable, but I've had some trouble with cold starts and keeping costs down. I'm ...

Latest Reply
susanrobert3
New Contributor II
  • 0 kudos

Hey!!! Cold starts on Azure Functions Premium can still bite if your instances go idle long enough, even with pre-warmed instances. What usually helps is bumping the `preWarmedInstanceCount` to at least 1 per plan (so there's always a warm worker), an...

1 More Replies
