Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

oneill
by New Contributor II
  • 3818 Views
  • 3 replies
  • 0 kudos

SQL - Dynamic overwrite + overwrite schema

Hello, let's say we have an empty table S that represents the schema we want to keep: columns A, B, C, D, E. We have another table T, partitioned by column A, with a schema that depends on the file we have loaded into it, say columns A, B, C, F with rows (1, b1, c1, f1) and (2, b2, c2, f2). Now to make T have the same schema...

Latest Reply
oneill
New Contributor II
  • 0 kudos

Hi, thanks for the reply. I've already looked at the documentation on this point, which actually states that dynamic overwrite doesn't work with schema overwrite, while the instructions described above seem to indicate the opposite.

2 More Replies
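For readers landing on this thread: the Delta Lake documentation the reply cites states that dynamic partition overwrite cannot be combined with schema overwrite in the same write. A minimal Python sketch of the commonly suggested two-pass workaround; the table name `T` and the DataFrames are hypothetical:

```python
# Writer options from the thread. Per the Delta Lake docs cited in the
# reply, these two cannot be combined in a single overwrite, so a common
# workaround is one schema-aligning overwrite, then dynamic overwrites.
schema_pass = {"overwriteSchema": "true"}             # pass 1: align schema
dynamic_pass = {"partitionOverwriteMode": "dynamic"}  # later passes: per-partition

# Hypothetical usage on a Databricks cluster (df/df2 are loaded DataFrames):
# df.write.format("delta").mode("overwrite").options(**schema_pass).saveAsTable("T")
# df2.write.format("delta").mode("overwrite").options(**dynamic_pass).saveAsTable("T")
```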
andreapeterson
by Contributor
  • 402 Views
  • 1 reply
  • 0 kudos

Question about which tags appear in drop down

Hi there, I have a question regarding the appearance of tags in the drop-down when adding a tag to a resource (catalog, schema, table, or column level). When does a tag get populated in the drop-down? I noticed when I created a column-level tag, and wan...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hello @andreapeterson Yes, your understanding of Databricks tag behavior is correct. In Databricks Unity Catalog, tags follow a hierarchical inheritance pattern. Downward inheritance: tags applied at higher levels (catalog → schema → table) become ava...

sparklez
by New Contributor III
  • 1327 Views
  • 3 replies
  • 2 kudos

Resolved! Creating Cluster configuration with library dependency using DABS

I am trying to create a cluster configuration using DABS and defining library dependencies. My YAML file looks like this: resources: clusters: project_Job_Cluster: cluster_name: "Project Cluster" spark_version: "16.3.x-cpu-ml-scala2.12" node_type_id: ...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 2 kudos

Hi @sparklez You're encountering this issue because the libraries field is not valid in the cluster configuration. Libraries need to be specified at the job level, not the cluster level. Option 1: Job-Level Libraries (Recommended). Move the libraries sec...

2 More Replies
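A hedged sketch of what the reply suggests, with `libraries` attached to the task rather than the cluster. The resource keys, node type, notebook path, and package name are placeholders, not verified against the poster's bundle:

```yaml
resources:
  jobs:
    project_job:                          # placeholder job name
      name: "Project Job"
      job_clusters:
        - job_cluster_key: project_cluster
          new_cluster:
            spark_version: "16.3.x-cpu-ml-scala2.12"
            node_type_id: "Standard_DS3_v2"        # placeholder node type
      tasks:
        - task_key: main
          job_cluster_key: project_cluster
          libraries:                      # libraries attach to the task, not the cluster
            - pypi:
                package: "some-package"   # placeholder dependency
          notebook_task:
            notebook_path: "/Workspace/path/to/notebook"   # placeholder
```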
Pratikmsbsvm
by Contributor
  • 2315 Views
  • 5 replies
  • 7 kudos

Resolved! Migrating From Azure to Databricks

Hi Techie, could someone please help me with the pros and cons of migrating my real-time streaming solution from Azure to Databricks? Which components can I replace with Databricks, and what benefits would I get out of it? Current architecture attached. Many thanks

Latest Reply
vaibhavs120
Contributor
  • 7 kudos

I completely agree with @lingareddy_Alva on the costing part. One small point I would like to mention: we should only enable spot instances (60-90% cost savings) in development/non-critical (non-PROD) environments. This option works great and is indeed c...

4 More Replies
anil_reddaboina
by New Contributor II
  • 1098 Views
  • 2 replies
  • 0 kudos

Slow-running Spark job due to unknown Spark stages created by the Databricks compute cluster

Hi Team, recently we migrated our Spark jobs from a self-hosted Spark (YARN) cluster to Databricks. Currently we are using Databricks Workflows with job compute clusters and the Spark JAR task type, so when we run the job in Databricks...

Latest Reply
anil_reddaboina
New Contributor II
  • 0 kudos

Hey Brahma, thanks for your reply. As a first step I will disable the AQE config and test it. We are using node pools with the job compute cluster type so that it's not spinning up a new cluster for each job. I'm configuring the below two configs also, do ...

1 More Replies
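The reply's first step (disabling AQE for a test run) can be sketched as follows. This is the standard Spark conf key, which Databricks enables by default; whether turning it off actually helps is exactly what the poster is testing:

```python
# Conf for the AQE test mentioned in the reply: "false" turns off
# adaptive query execution for the session.
aqe_test_confs = {
    "spark.sql.adaptive.enabled": "false",
}

# Hypothetical usage on a live cluster:
# for key, value in aqe_test_confs.items():
#     spark.conf.set(key, value)
```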
chsoni12
by New Contributor II
  • 946 Views
  • 1 reply
  • 0 kudos

Legacy Autoscaling (Workflows) vs. Enhanced Autoscaling (DLT)

I conducted a proof of concept (POC) to compare the performance of the DLT pipeline and Databricks Workflow using the same workload, task, code, and cluster configuration. Both configurations were set with autoscaling enabled, with a minimum of 1 wor...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi chsoni12, how are you doing today? As per my understanding, that's a great observation, and it's awesome that you're testing performance and cost between DLT and regular workflows. The key difference here lies in how autoscaling works. DLT pipelin...

MohammadWasi
by New Contributor II
  • 3204 Views
  • 4 replies
  • 0 kudos

I can list files using dbutils but cannot read them in Databricks

I can list the file using dbutils but cannot read it in Databricks. PFB screenshot. I can see the file using dbutils.fs.ls, but when I try to read it using read_excel it shows an error like "FileNotFound...

Data Engineering
Databricks
Latest Reply
BenjaminJacquet
New Contributor II
  • 0 kudos

Hello @MohammadWasi, did you finally figure out what the problem was? I am encountering the exact same issue.

3 More Replies
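A likely cause worth noting for this thread: `dbutils.fs.ls` resolves DBFS URIs (`dbfs:/...`), while local-file libraries such as pandas' `read_excel` open paths through the local filesystem, where DBFS is mounted under `/dbfs`. A small sketch of the usual path translation; the file path is hypothetical:

```python
# dbutils.fs.ls understands "dbfs:/..." but pandas.read_excel needs the
# FUSE mount path "/dbfs/...", so translate before handing it to pandas.
dbfs_path = "dbfs:/FileStore/data/report.xlsx"   # hypothetical file
local_path = dbfs_path.replace("dbfs:/", "/dbfs/", 1)

# Hypothetical usage on a cluster where the file exists:
# import pandas as pd
# df = pd.read_excel(local_path)
```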
BMex
by New Contributor III
  • 535 Views
  • 1 reply
  • 0 kudos

Folders in Workflows/Jobs

It would be great if we could "group" Workflows/Jobs in Databricks using folders. This way, the Workflows list won't be cluttered with all Workflows/Jobs at the same root level.

Data Engineering
Folders
ideas
Workflows
Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @BMex! You can submit this as a feature request through the Databricks Ideas Portal. This helps the product team consider it for future improvements.

GeoPer
by New Contributor III
  • 1064 Views
  • 5 replies
  • 1 kudos

Resolved! Fails to use unity catalog in All purpose cluster

Hey there, today we cannot load/read data from Unity Catalog with the same cluster we used successfully yesterday (no changes in cluster configuration). The error, which persists, according to the cluster logs is: com.databricks.common.client.Databric...

Latest Reply
GeoPer
New Contributor III
  • 1 kudos

@Advika the issue is gone. Now, without any change, the all-purpose cluster has access to Unity Catalog again. Who knows what happened... Thanks again for your interest.

4 More Replies
mkwparth
by New Contributor III
  • 1483 Views
  • 2 replies
  • 1 kudos

Resolved! Intermittent Timeout Error While Waiting for Python REPL to Start in Databricks

Hi everyone,I’ve been encountering an error that says "Timeout while waiting for the Python REPL to start. Took longer than 60 seconds" during my work in Databricks. The issue seems to happen intermittently - sometimes the REPL starts without any pro...

Latest Reply
mkwparth
New Contributor III
  • 1 kudos

@Rohan2405 "If everything else is in place, increasing the REPL startup timeout in the cluster configuration may help accommodate slower setups." Can you please guide me on how to increase the REPL timeout in the cluster configuration? Like, I've added this conf...

1 More Replies
minhhung0507
by Valued Contributor
  • 1642 Views
  • 2 replies
  • 2 kudos

Spark Driver keeps restarting due to high GC pressure despite scaling up memory

I'm running into an issue where my Spark driver keeps pausing and eventually restarting due to excessive garbage collection (GC), even though I've already scaled up the cluster memory. Below is an example from the driver logs: Driver/192.168.231.23 pa...

Latest Reply
minhhung0507
Valued Contributor
  • 2 kudos

Thank you very much for your detailed analysis and helpful recommendations. We have reviewed your suggestions, and I'd like to share a quick update: we have already tried most of the mitigation strategies you mentioned, including increasing driver mem...

1 More Replies
ankit001mittal
by New Contributor III
  • 1898 Views
  • 1 reply
  • 0 kudos

Policy for DLT

Hi, I am trying to define a policy for our DLT pipelines, and I would like to pin a specific Spark version, as in the example below: { "spark_conf.spark.databricks.cluster.profile": { "type": "forbidden", "hidden": true }, "spark_ve...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @ankit001mittal The error you're encountering is because Delta Live Tables (DLT) has specific requirements and automatically manages certain cluster configurations, including the Spark version. DLT pipelines are designed to use optimized Spark ver...

RabahO
by New Contributor III
  • 8531 Views
  • 4 replies
  • 1 kudos

Dashboard always display truncated data

Hello, we're working with a serverless SQL cluster to query Delta tables and display some analytics in dashboards. We have some basic GROUP BY queries that generate around 36k rows, and they are executed without the LIMIT keyword. So in the data ...

Latest Reply
DougCorson1234
New Contributor II
  • 1 kudos

I also have this issue; 95% of our reporting goes to Excel from the display window. We need the full data shown so we can simply copy and paste into Excel, with no need to "Download", which causes unneeded files to pile up in the download folder. It also, as you s...

3 More Replies
petergriffin1
by New Contributor II
  • 1651 Views
  • 3 replies
  • 1 kudos

Resolved! Are you able to create an Iceberg table natively in Databricks?

I've been trying to create an Iceberg table natively in Databricks on a 16.4 cluster. I also have the Iceberg JAR file for Spark 3.5.2. Using a simple command such as: %sql CREATE OR REPLACE TABLE catalog1.default.iceberg( a INT ) USING iceberg...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Databricks supports creating and working with Apache Iceberg tables natively under specific conditions. Managed Iceberg tables in Unity Catalog can be created directly using Databricks Runtime 16.4 LTS or newer. The necessary setup requires enabling ...

2 More Replies
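For reference, the DDL from the question wrapped for `spark.sql`. Per the reply, it only succeeds as a Unity Catalog managed Iceberg table on DBR 16.4 LTS or newer with the feature enabled; the catalog and table names are the poster's:

```python
# The statement from the thread. Whether it runs depends on Unity Catalog
# managed Iceberg being available on the workspace, per the reply.
iceberg_ddl = """
CREATE OR REPLACE TABLE catalog1.default.iceberg (
  a INT
)
USING iceberg
"""

# Hypothetical usage on DBR 16.4 LTS+ with UC managed Iceberg enabled:
# spark.sql(iceberg_ddl)
```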
nadia
by New Contributor II
  • 2970 Views
  • 2 replies
  • 0 kudos

Resolved! Connecting Databricks to PostgreSQL

I use Databricks and I'm trying to connect to PostgreSQL via the following code: jdbcHostname = "xxxxxxx" jdbcDatabase = "xxxxxxxxxxxx" jdbcPort = "5432" username = "xxxxxxx" password = "xxxxxxxx" jdbcUrl = "jdbc:postgresql://{0}:{1}/{2}".format(jdbcHostname, jd...

Latest Reply
santhosh11
New Contributor II
  • 0 kudos

Can you tell me how you were able to connect to the Postgres database from Databricks? Do we have to whitelist IPs in Postgres?

1 More Replies
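For later readers, a minimal sketch of the JDBC setup the question describes. Host, database, and credentials are placeholders; and yes, on restricted networking the Postgres server must allow connections from the workspace's egress IPs, which is what the latest reply is asking about:

```python
# Building the JDBC URL the same way the question does, with placeholders.
jdbcHostname = "example-host.postgres.database.azure.com"  # placeholder
jdbcPort = 5432
jdbcDatabase = "mydb"                                      # placeholder
jdbcUrl = f"jdbc:postgresql://{jdbcHostname}:{jdbcPort}/{jdbcDatabase}"

connection_properties = {
    "user": "xxxx",                      # placeholder credentials
    "password": "xxxx",
    "driver": "org.postgresql.Driver",   # Postgres JDBC driver class
}

# Hypothetical usage (requires network access from the cluster to Postgres):
# df = spark.read.jdbc(url=jdbcUrl, table="public.my_table",
#                      properties=connection_properties)
```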
