Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Nirupam
by New Contributor III
  • 2715 Views
  • 1 reply
  • 2 kudos

Resolved! Access Mode: Dedicated (assigned to a group) VS Standard

Dedicated Access mode on Azure Databricks clusters provides the option to give access to a GROUP. Trying to understand the use case: when compared to Standard (formerly: Shared)? When compared to Dedicated (access given to a single user)? Ignoring - Languag...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Dedicated Access mode on Azure Databricks clusters is an upgraded feature that extends the capabilities of single-user access mode. This mode allows a compute resource to be assigned either to a single user or to a group. It offers secure sharing amo...

Pat
by Esteemed Contributor
  • 1296 Views
  • 1 reply
  • 0 kudos

Spark custom data sources - SQS streaming reader [DLT]

Hey, I’m working on pulling data from AWS SQS into Databricks using Spark custom data sources and DLT (see https://docs.databricks.com/aws/en/pyspark/datasources). I started with a batch reader/writer based on this example: https://medium.com/@zcking/...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

For your consideration: To address the challenge of passing message handles from executors back to the driver within the DataSourceStreamReader, consider the following approaches: Challenges in Spark Architecture 1. Executor Memory Isolation: Execut...
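To make the executor-to-driver handoff concrete, below is a minimal sketch of a custom streaming reader in the spirit of these suggestions, assuming PySpark 4.0's pyspark.sql.datasource API and boto3 on the cluster. The option name, the queue URL, and the pattern of emitting receipt handles as a column (so deletion can happen downstream, e.g. in foreachBatch, instead of shipping handles back to the driver) are illustrative assumptions, not the poster's actual code.

```
# Sketch of a PySpark custom streaming data source for SQS; the queueUrl
# option and schema are hypothetical. read() runs on executors, so receipt
# handles are emitted as row data rather than passed back to the driver.
import boto3
from pyspark.sql.datasource import DataSource, DataSourceStreamReader, InputPartition


class SQSStreamReader(DataSourceStreamReader):
    def __init__(self, options):
        self.queue_url = options["queueUrl"]
        self.batch = 0

    def initialOffset(self):
        return {"batch": 0}

    def latestOffset(self):
        # SQS offers no replayable offsets, so advance a synthetic counter.
        self.batch += 1
        return {"batch": self.batch}

    def partitions(self, start, end):
        # One partition per microbatch; a real reader might poll in parallel.
        return [InputPartition(0)]

    def read(self, partition):
        # Runs on an executor: poll SQS and emit (body, receipt_handle) rows.
        sqs = boto3.client("sqs")
        resp = sqs.receive_message(QueueUrl=self.queue_url, MaxNumberOfMessages=10)
        for msg in resp.get("Messages", []):
            yield (msg["Body"], msg["ReceiptHandle"])


class SQSDataSource(DataSource):
    @classmethod
    def name(cls):
        return "sqs"

    def schema(self):
        return "body STRING, receipt_handle STRING"

    def streamReader(self, schema):
        return SQSStreamReader(self.options)
```

Once registered with spark.dataSource.register(SQSDataSource), the stream can be read with spark.readStream.format("sqs").option("queueUrl", ...).load().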

Sadam97
by New Contributor III
  • 2367 Views
  • 2 replies
  • 1 kudos

Databricks Job Cluster became unreachable

We have production streaming jobs running on Job Clusters. We face cluster-related errors now and then; one such example is the error below: "Run failed with error message: Cluster became unreachable during run. Cause: Got invalid response: 404 /ERR_NGROK_3...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Here are some considerations: The errors experienced in your production streaming jobs—ERR_NGROK_3200 and Spark driver failed to start within 900 seconds—stem from distinct causes related to connectivity, underlying system constraints, and d...

1 More Replies
zmwaris1
by New Contributor II
  • 4674 Views
  • 3 replies
  • 2 kudos

Connect Databricks Delta table to Apache Kylin using JDBC

I am using Apache Kylin for Data Analytics and Databricks for data modelling and filtering. I have my final data in gold tables and I would like to integrate this data with Apache Kylin using JDBC where the gold table will be the Data Source. I would...
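Where Kylin's JDBC data source needs the Databricks connection details, a quick sanity check is to open the same JDBC URL directly. A sketch, assuming the Databricks JDBC driver jar is available locally and jaydebeapi is installed; the hostname, HTTP path, token, jar path, and table name are all placeholders.

```
# Hypothetical connectivity check using the same JDBC URL/driver a Kylin
# data source would use; every value below is a placeholder.
import jaydebeapi

url = (
    "jdbc:databricks://<server-hostname>:443;"
    "httpPath=<sql-warehouse-http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>"
)
conn = jaydebeapi.connect(
    "com.databricks.client.jdbc.Driver", url, jars="/path/to/DatabricksJDBC42.jar"
)
cur = conn.cursor()
cur.execute("SELECT * FROM <catalog>.<schema>.<gold_table> LIMIT 5")
print(cur.fetchall())
conn.close()
```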

Latest Reply
rpiotr
New Contributor III
  • 2 kudos

@Sidhant07 Also using Kylin4 and Sqoop, I am getting "Unsupported transaction isolation level: 2" when running sqoop list-tables.

2 More Replies
antonioferegrin
by New Contributor
  • 2914 Views
  • 6 replies
  • 1 kudos

FeatureEngineeringClient and Databricks Connect

Hello everyone, I want to use Databricks Connect to connect externally to my clusters and run code, and while Databricks Connect works without any issue, like this: ```from databricks.sdk.core import Config config = Config(cluster_id="XXXX") spark = S...
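The truncated snippet is presumably building a remote session; a minimal working equivalent looks like this (a sketch assuming databricks-connect is installed, with the cluster ID as a placeholder and authentication coming from the local Databricks config profile or environment):

```
# Minimal Databricks Connect session along the lines of the post's snippet;
# "XXXX" is a placeholder cluster ID.
from databricks.sdk.core import Config
from databricks.connect import DatabricksSession

config = Config(cluster_id="XXXX")
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
print(spark.range(3).count())  # quick smoke test against the remote cluster
```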

Latest Reply
leopoloc0
New Contributor II
  • 1 kudos

I just accomplished this by modifying the DatabricksClient: adding the feature_store_uri parameter to it and passing it to each call of get_host_creds. Let's see if Databricks releases this simple change soon...

5 More Replies
ankris
by New Contributor III
  • 8067 Views
  • 4 replies
  • 0 kudos

Can anyone provide support on Streamlit connectivity with a Databricks Delta table/SQL endpoint?

Can anyone provide support on Streamlit connectivity with a Databricks Delta table/SQL endpoint?
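For reference, a common pattern here is to query the table through a Databricks SQL warehouse with the databricks-sql-connector package and render the result in Streamlit. A minimal sketch, assuming `pip install streamlit databricks-sql-connector pandas`; the hostname, HTTP path, token, and table name are placeholders:

```
# Query a Delta table through a SQL warehouse endpoint and show it in
# Streamlit; all connection values below are placeholders.
import pandas as pd
import streamlit as st
from databricks import sql

st.title("Delta table viewer")

with sql.connect(
    server_hostname="<workspace-host>",
    http_path="<sql-warehouse-http-path>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM <catalog>.<schema>.<table> LIMIT 100")
        df = pd.DataFrame(
            cursor.fetchall(), columns=[c[0] for c in cursor.description]
        )

st.dataframe(df)
```

Run it with `streamlit run app.py`; for anything beyond a demo, keep the token in a secret and cache the query with st.cache_data.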

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @ananthakrishna raikar, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best...

3 More Replies
Abishrp
by Contributor
  • 2463 Views
  • 6 replies
  • 1 kudos

Issue in getting system.compute.warehouses table in some workspaces

In some workspaces I can see the system.compute.warehouses table, but in others it is not available. How can I enable it? Both are in the same account but assigned to different metastores.

(Screenshots attached: Abishrp_0-1737014513631.png, Abishrp_1-1737014632007.png)
Latest Reply
aranjan99
Contributor
  • 1 kudos

I disabled the compute schema and then enabled it again.
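For anyone hitting the same gap: that disable/re-enable workaround maps to the Unity Catalog system schemas API. A sketch with `requests`; the host, token, and metastore ID are placeholders.

```
# Disable and re-enable the `compute` system schema for a metastore;
# all values below are placeholders.
import requests

host = "https://<workspace-host>"
headers = {"Authorization": "Bearer <personal-access-token>"}
url = f"{host}/api/2.0/unity-catalog/metastores/<metastore-id>/systemschemas/compute"

requests.delete(url, headers=headers).raise_for_status()  # disable
requests.put(url, headers=headers).raise_for_status()     # re-enable
```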

5 More Replies
ViniciusFC54
by New Contributor
  • 665 Views
  • 1 reply
  • 0 kudos

Cluster policy "personal compute"

Hello! Can you help me? In the "Personal Compute" policy, how can I remove the permission for all users? I am facing this message when I go to edit: "All users can use the Personal Compute default policy to create compute. This is enabled at the accou...

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @ViniciusFC54! Did you try following the steps mentioned here: Managing access to the Personal Compute policy? It shows how admins can manage access to the Personal Compute policy and control who can use it.

mslow
by New Contributor II
  • 1170 Views
  • 2 replies
  • 0 kudos

Docker image for runtime 16.4 LTS with Scala 2.13

I'm trying to test a custom Python package with the new 16.4 LTS runtime, pulling the official Databricks Docker image from https://hub.docker.com/layers/databricksruntime/standard/16.4-LTS/images/sha256-604b73feeac08bc902ab16110218ffc63c1b24ac31f5f04a679...

Latest Reply
mslow
New Contributor II
  • 0 kudos

Thanks @Renu_. That makes sense; I understand that in the DBX workspaces you can choose between two 16.4 Spark versions when creating a compute. My confusion was with using the Docker image in a local environment. I pull it from the registry but then ...

1 More Replies
maasg
by New Contributor II
  • 1116 Views
  • 2 replies
  • 1 kudos

Displaying images hosted in a Unity Catalog volume in a notebook

I have an image dataset in a Volume in Unity Catalog. As part of exploratory analysis, I want to run some queries and display the resulting image set. Yet, including dynamic references to images in the Catalog doesn't seem to work on a Databricks N...
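One workaround that avoids exporting anything is to read the image bytes straight from the /Volumes path and render them inline. A sketch, with the volume path as a placeholder:

```
# Render a Volume-hosted image inline in a Databricks notebook without
# copying it out of Unity Catalog; the /Volumes path is a placeholder.
import base64

path = "/Volumes/<catalog>/<schema>/<volume>/images/sample.png"
with open(path, "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

# displayHTML is provided by the Databricks notebook runtime
displayHTML(f'<img src="data:image/png;base64,{encoded}" width="400">')
```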

Latest Reply
maasg
New Contributor II
  • 1 kudos

Hi @Walter_C, Thanks for your suggestion. Exporting the images (=> duplicating the storage) is a no-go in our case. We are talking about very large automotive datasets. Is there another option or should we consider moving this data out of the catalog...

1 More Replies
writetofaiz
by New Contributor II
  • 2827 Views
  • 3 replies
  • 0 kudos

Error: Unable to locate package winehq-stable

Hi, I am running a notebook which has a mix of Python and Scala code. The code below was working fine until May 27, 2025. However, since yesterday, I’ve been encountering the following error: "E: Unable to locate package winehq-stable". Could someone please...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

Add me at https://www.linkedin.com/in/aviralb/ and we will do live debugging.

2 More Replies
Pavankumar7
by New Contributor III
  • 2583 Views
  • 2 replies
  • 1 kudos

Resolved! Error: Community Edition Spark job skewness

I am trying to run a Spark job in Community Edition, but I noticed in the Spark UI that the whole dataset is being read on the driver node instead of on a worker node. Does the Community Edition not support worker nodes?

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

In Databricks Community Edition, the compute environment is set up as a single-node cluster. This means there is only one node, which serves both as the driver and the worker. Because of this, all data processing—including reading data—is performed o...

1 More Replies
Bram
by New Contributor II
  • 9293 Views
  • 9 replies
  • 1 kudos

Configuration spark.sql.sources.partitionOverwriteMode is not available.

Dear, In the current setup, we are using dbt as a modeling tool for our data lakehouse. For a specific use case, we want to use the insert_overwrite strategy, where dbt will replace all data for a specific partition: Databricks configurations | dbt Dev...

Latest Reply
hendrik
Databricks Employee
  • 1 kudos

An approach that works well when using a Databricks SQL Warehouse is to use the replace_where strategy - I've just tested this. It also works with partitioned tables: {{ config( materialized='incremental', incremental_strategy='replace_where', ...

8 More Replies
SteveC527
by New Contributor
  • 4737 Views
  • 6 replies
  • 1 kudos

Medallion Architecture and Databricks Assistant

I am in the process of rebuilding the data lake at my current company with Databricks and I'm struggling to find comprehensive best practices for naming conventions and structuring medallion architecture to work optimally with the Databricks assistan...

Latest Reply
suman23479
New Contributor II
  • 1 kudos

If we talk about the traditional data warehouse way of building the architecture, we can consider the Silver layer as a data mart with star-schema-style relations for dimensions and facts. Can we build an entire enterprise-scale DWH using Databricks? I see in pr...

5 More Replies
Splush
by New Contributor II
  • 787 Views
  • 1 reply
  • 0 kudos

JDBC Oracle Connection change Container Statement

Hey, I'm running into a weird issue while running the following code: def getDf(query, preamble_sql=None): jdbc_url = f"jdbc:oracle:thin:@//{host}:{port}/{service_name}" request = spark.read \ .format("jdbc") \ .o...
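For context, Spark's JDBC reader has a sessionInitStatement option that runs a statement once per opened connection, which is the usual home for an Oracle preamble such as an ALTER SESSION SET CONTAINER. Below is a sketch of the kind of function the post appears to be building, with the host, credentials, and names as placeholders (not the poster's actual code):

```
# Sketch of a JDBC read with a per-connection preamble; `spark` is the
# ambient Databricks notebook session and all connection values are
# placeholders.
host, port, service_name = "<host>", 1521, "<service-name>"
user, password = "<user>", "<password>"

def get_df(query, preamble_sql=None):
    jdbc_url = f"jdbc:oracle:thin:@//{host}:{port}/{service_name}"
    reader = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("query", query)
        .option("user", user)
        .option("password", password)
        .option("driver", "oracle.jdbc.OracleDriver")
    )
    if preamble_sql:  # e.g. "ALTER SESSION SET CONTAINER = my_pdb"
        reader = reader.option("sessionInitStatement", preamble_sql)
    return reader.load()
```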

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Here is something to consider: The issue you're experiencing likely stems from differences in behavior when accessing Oracle database objects via Spark JDBC versus other database clients like DBeaver. Specifically, Spark's JDBC interface may perform ...

