Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

somedeveloper
by New Contributor III
  • 1336 Views
  • 2 replies
  • 1 kudos

Modifying size of /var/lib/lxc

Good morning,When running a library (sparkling water) for a very large dataset, I've noticed that during an export procedure the /var/lib/lxc storage is being used. Since the storage seems to be at a static 130GB of memory, this is a problem because ...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

Unfortunately, this is a setting that cannot be increased on the customer side.

1 More Replies
VikasSinha
by New Contributor
  • 8400 Views
  • 5 replies
  • 0 kudos

Which is better - Azure Databricks or GCP Databricks?

Which cloud hosting environment is best to use for Databricks? My question pins down to the fact that there must be some difference between the latency, throughput, result consistency & reproducibility between different cloud hosting environments of ...

Latest Reply
bidek56
Contributor
  • 0 kudos

@VikasSinha Databricks is not stable regardless of the cloud; jobs and clusters keep crashing. Use Polars or DuckDB instead.

4 More Replies
Danish11052000
by Contributor
  • 410 Views
  • 1 reply
  • 1 kudos

Resolved! Missing warehouse id/metadata for the system compute warehouse table

I ran the following queries for a specific warehouse_id = '54a93d2138433216': SELECT * FROM system.billing.usage WHERE usage_metadata.warehouse_id = '54a93d2138433216'; SELECT * FROM system.compute.warehouse_events WHERE warehouse_id = '54a93d213843321...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 1 kudos

@Danish11052000 This can happen when the warehouse is very old (say, created last year); you may not see its details in the system table. If the warehouse was deleted before the system table was created, it will not appear.

bianca_unifeye
by Databricks MVP
  • 954 Views
  • 1 reply
  • 2 kudos

Resolved! Webinars

Hi! My colleagues and I at Unifeye are hosting a series of regular webinars focused on Databricks content. In November, we’re running four sessions covering Geospatial, Governance, AI, and Delta Sharing, featuring Databricks architects as guest speak...

Latest Reply
Advika
Community Manager
  • 2 kudos

Hello @bianca_unifeye! Thanks for sharing! This looks like a great initiative. For better visibility and engagement, please go ahead and post about the webinar series directly in the Community as a post. Members can then register and join through your...

bianca_unifeye
by Databricks MVP
  • 393 Views
  • 0 replies
  • 1 kudos

Webinar: Geospatial Data Ingestion and Manipulation on Databricks

Geospatial Data Meets Databricks + Felt: Turning Coordinates into Business Insight. Most organisations capture huge volumes of spatial data — addresses, coordinates, routes, catchments — but struggle to operationalise it at scale. Traditional GIS tool...

Data Engineering
geospatial
webinar
Abdul_Alikhan
by New Contributor II
  • 2203 Views
  • 5 replies
  • 3 kudos

Resolved! In Databricks Free Edition, serverless compute is not working

I recently logged into the Databricks Free Edition, but serverless compute is not working. I'm receiving the error: 'An error occurred while trying to attach serverless compute. Please try again or contact support.'

Latest Reply
LonaOsmani
New Contributor III
  • 3 kudos

Hi @Abdul_Alikhan, I experienced the same yesterday when I imported some of my notebooks. I noticed that this error only appeared for imported notebooks, because the environment version was 1 by default. Changing the environment version to 2 solved th...

4 More Replies
FarhanM
by New Contributor II
  • 1071 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks Streaming: Recommended Cluster Types and Best Practices

Hi Community, I recently built some streaming pipelines (Autoloader-based) that extract JSON data from the Data Lake and, after parsing and logging, dump it into the Delta Lake bronze layer. Since these are streaming pipelines, they are supposed to r...

Latest Reply
bianca_unifeye
Databricks MVP
  • 1 kudos

When running streaming pipelines, the key is to design for stability and isolation, not to rely on restart jobs. The first thing to do is run your streams on Jobs Compute, not All-Purpose clusters. If available, use Serverless Jobs. Each pipeline shou...

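The advice above (one stream per job on Jobs Compute, with retries instead of a separate restart job) could be expressed in a Databricks Asset Bundles job definition. A minimal sketch, assuming the Jobs API 2.1 field names; the job name, notebook path, and node type are illustrative, not from the thread:

```yaml
# Hypothetical bundle fragment: one streaming pipeline per job,
# running on Jobs Compute with indefinite retries on failure.
resources:
  jobs:
    bronze_autoloader:
      name: bronze-autoloader               # illustrative name
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../src/bronze_ingest.py  # illustrative path
          max_retries: -1                   # -1 = retry indefinitely
          min_retry_interval_millis: 60000
          new_cluster:                      # isolated Jobs Compute cluster
            spark_version: 14.3.x-scala2.12
            node_type_id: Standard_DS3_v2
            num_workers: 2
```

Keeping each stream in its own job with its own cluster gives the isolation the reply recommends: one pipeline failing or restarting never disturbs another.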
brickster_2018
by Databricks Employee
  • 2815 Views
  • 2 replies
  • 0 kudos

Resolved! I do not have any Spark jobs running, but my cluster is not getting auto-terminated.

The cluster is idle and there are no Spark jobs running in the Spark UI. Still, my cluster is active and not getting terminated.

Latest Reply
brickster_2018
Databricks Employee
  • 0 kudos

A Databricks cluster is treated as active if any Spark or non-Spark operations are running on it. Even though there are no Spark jobs running on the cluster, it's possible to have some driver-specific application code running, marking th...

1 More Replies
fundat
by New Contributor III
  • 507 Views
  • 2 replies
  • 2 kudos

Resolved! Course - Introduction to Apache Spark

Hi, in the course Introduction to Apache Spark (Apache Spark Runtime Architecture, page 6 of 15), it says: "The cluster manager allocates resources and assigns tasks... Workers perform tasks assigned by the driver." Can you help me plea...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 2 kudos

Hi @fundat, perhaps the picture is useful here. Give this blog a read; I think it will answer some of your questions: https://medium.com/@knoldus/understanding-the-working-of-spark-driver-and-executor-4fec0e669399. All the best, BS

1 More Replies
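As an analogy only (nothing below is Spark code), the split the course describes, a driver that defines the work, a manager that assigns tasks, and workers that execute them, can be mimicked in plain Python with a thread pool:

```python
# Analogy only, not Spark: the "driver" partitions the data and submits
# tasks; the pool plays the cluster manager's role of assigning tasks
# to workers, which execute them and return results to the driver.
from concurrent.futures import ThreadPoolExecutor

def task(partition):
    # A "task" operates on a single partition of the data.
    return sum(partition)

def driver(data, n_workers=2):
    # The driver splits the data, submits tasks, and collects results.
    partitions = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(task, partitions))

print(driver(list(range(10))))  # 45
```

In real Spark the driver and executors are separate JVM processes on different machines, and the cluster manager (Standalone, YARN, or Kubernetes) owns the resource allocation; the sketch only mirrors the division of responsibilities.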
jigar191089
by New Contributor III
  • 6970 Views
  • 12 replies
  • 0 kudos

Multiple concurrent jobs using interactive cluster

Hi All, I have a notebook in Databricks. This notebook is executed from an Azure Data Factory pipeline with a Databricks notebook activity whose linked service is connected to an interactive cluster. When multiple concurrent runs of this pipeline are created, I...

Data Engineering
azure
Databricks
interactive cluster
Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @jigar191089, I did some digging and here are some ideas to think about. This smells like a shared-state/import-path issue on an interactive cluster under concurrency. What likely happened: your notebook imports Python modules from /dbf...

11 More Replies
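The shared-state diagnosis above suggests a common mitigation, sketched here in plain Python (the helper name and layout are mine, not from the thread): key every run's scratch location to a unique run ID, so concurrent pipeline runs can never write to the same path.

```python
# Sketch: per-run isolation for concurrent runs of the same notebook.
# Instead of writing to one shared, hard-coded path, each run derives
# its own scratch directory from a unique run ID.
import tempfile
import uuid
from pathlib import Path
from typing import Optional

def run_scoped_dir(base: str, run_id: Optional[str] = None) -> Path:
    # Each concurrent run gets its own subdirectory under `base`.
    run_id = run_id or uuid.uuid4().hex
    path = Path(base) / f"run_{run_id}"
    path.mkdir(parents=True, exist_ok=True)
    return path

base = tempfile.mkdtemp()
a = run_scoped_dir(base)
b = run_scoped_dir(base)
print(a != b)  # True: two concurrent runs never share a directory
```

In the ADF scenario, the run ID could come from a pipeline run parameter, so retries of the same run reuse one directory while parallel runs stay isolated.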
mkwparth
by Databricks Partner
  • 1095 Views
  • 2 replies
  • 1 kudos

Resolved! DLT | Communication lost with driver | Cluster was not reachable for 120 seconds

Hey Community, I'm facing this error. It says: "com.databricks.pipelines.common.errors.deployment.DeploymentException: Communication lost with driver. Cluster 1030-205818-yu28ft9s was not reachable for 120 seconds". This issue occurred in producti...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

This is actually a known intermittent issue in Databricks, particularly with streaming or Delta Live Tables (DLT) pipelines. This isn't a logical failure in your code; it's an infrastructure-level timeout between the Databricks control plane and the ...

1 More Replies
CaptainJack
by New Contributor III
  • 479 Views
  • 1 reply
  • 0 kudos

Pull workspace URL and workspace name using databricks-sdk / programmatically in a notebook

1. How could I pull the workspace URL (https://adb-XXXXX.XX.....net)? 2. How could I get the workspace name visible in the top right corner? I know that the easiest solution is dbutils.notebook.entry_point.... browserHostName, but unfortunately it is not working in job c...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Can you give this a shot? Not sure if you have a hard requirement of using the SDK: workspace_url = spark.conf.get('spark.databricks.workspaceUrl') Getting the name is trickier. You could potentially get it from tags if there is a tagging strategy in place...

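Building on the reply above: spark.conf.get('spark.databricks.workspaceUrl') returns the bare host name, and on Azure that host embeds the numeric workspace ID, which can be parsed out in plain Python. A sketch; the host below is hypothetical, not from the thread:

```python
# Sketch: extract the numeric workspace ID from an Azure Databricks
# host of the form adb-<workspace-id>.<n>.azuredatabricks.net.
import re

def workspace_id(host: str):
    m = re.match(r"adb-(\d+)\.\d+\.azuredatabricks\.net$", host)
    return m.group(1) if m else None

# Hypothetical host, as returned by spark.conf.get(...) on Azure:
print(workspace_id("adb-1234567890123456.7.azuredatabricks.net"))
# -> 1234567890123456
```

Note this only yields the ID, not the display name shown in the top-right corner; as the reply says, the name usually has to come from tags or an external mapping.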
deano2025
by New Contributor II
  • 1929 Views
  • 1 reply
  • 1 kudos

Databricks asset bundles CI/CD design for github actions

We want to use Databricks Asset Bundles and deploy code changes and tests using GitHub Actions. We have seen lots of content online, but nothing concrete on how this is done at scale. So I'm wondering: if we have many changes and therefore man...

Data Engineering
asset bundles
Latest Reply
AbhaySingh
Databricks Employee
  • 1 kudos

Have you read about the following approach before?

Repository Structure Options

1. Monorepo with Multiple Bundles

repo-root/
├── .github/
│   └── workflows/
│       ├── bundle-ci.yml
│       └── bundle-deploy.yml
├── bundles/
│   ├...

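To make the monorepo idea above concrete, here is a minimal, hypothetical databricks.yml for one bundle under bundles/; every name and host is a placeholder, not something from the thread, and the schema follows the Databricks Asset Bundles layout (bundle, targets, workspace.host):

```yaml
# Hypothetical databricks.yml for one bundle in a monorepo, e.g.
# bundles/etl_orders/databricks.yml. CI can detect which bundles/*
# directories changed and run `databricks bundle deploy -t <target>`
# only for those bundles.
bundle:
  name: etl_orders                     # placeholder bundle name

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://adb-1111111111111111.1.azuredatabricks.net  # placeholder
  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.2.azuredatabricks.net  # placeholder
```

Scoping each GitHub Actions job to one bundle directory keeps deploys independent, so many concurrent changes don't serialize behind a single monolithic deploy.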
JanFalta
by New Contributor
  • 497 Views
  • 1 reply
  • 0 kudos

Data Masking

Hi all, I need some help with this masking problem. If you create a view with a masking function based on a table, the user reading this view has to have read access to the underlying table. So theoretically, they can access unmasked data in the table. I would...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Are you on Unity Catalog? Databricks has a solution for this through Unity Catalog column masking (also called dynamic views or column-level security): https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/filters-and-mask...

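The Unity Catalog column-masking approach linked above attaches a mask function to the column itself, so readers of the table (or of any view over it) only ever see masked values, which addresses the concern about read access to the underlying table. Purely to illustrate the masking logic, here is a plain-Python sketch of one hypothetical rule (admins see everything, everyone else sees only the last four characters):

```python
# Sketch of mask logic only. In Unity Catalog this would be a SQL UDF
# attached with ALTER TABLE ... ALTER COLUMN ... SET MASK; the rule
# below is hypothetical.
def mask_ssn(value: str, is_admin: bool) -> str:
    if is_admin:
        return value
    return "***-**-" + value[-4:]

print(mask_ssn("123-45-6789", is_admin=False))  # ***-**-6789
print(mask_ssn("123-45-6789", is_admin=True))   # 123-45-6789
```

Because the mask is enforced at the column level by the governance layer, there is no unmasked path for non-privileged readers even if they query the table directly.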
bhawana-pandey
by Databricks Partner
  • 540 Views
  • 1 reply
  • 0 kudos

Looking for reference DABs bundle yaml and resources for Databricks app deployment (FastAPI redirect

Looking for example databricks.yml and bundle resources for deploying a FastAPI Databricks app using DABs from one environment to another. Deployment works but FastAPI redirects to localhost after deployment, though the homepage loads fine. Need refe...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

This is a great place to start: https://apps-cookbook.dev/resources/ Happy to answer specifics as they come after you've reviewed that resource. 
