Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GeoPer
by New Contributor III
  • 1079 Views
  • 5 replies
  • 1 kudos

Resolved! Fails to use Unity Catalog on an all-purpose cluster

Hey there, today we cannot load/read data from Unity Catalog with the same cluster we used successfully yesterday (no changes in cluster configuration). The error, which persists, according to the cluster logs is: com.databricks.common.client.Databric...

Latest Reply
GeoPer
New Contributor III
  • 1 kudos

@Advika the issue is gone. Now, without any change, the all-purpose cluster has access to Unity Catalog again. Who knows what happened... Thanks again for your interest.

4 More Replies
mkwparth
by New Contributor III
  • 1506 Views
  • 2 replies
  • 1 kudos

Resolved! Intermittent Timeout Error While Waiting for Python REPL to Start in Databricks

Hi everyone, I’ve been encountering an error that says "Timeout while waiting for the Python REPL to start. Took longer than 60 seconds" during my work in Databricks. The issue seems to happen intermittently - sometimes the REPL starts without any pro...

Latest Reply
mkwparth
New Contributor III
  • 1 kudos

@Rohan2405 "If everything else is in place, increasing the REPL startup timeout in the cluster configuration may help accommodate slower setups." Can you please guide me on how to increase the REPL timeout in the cluster configuration? Like, I've added this conf...
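For reference, cluster-level Spark properties are set under Compute > (your cluster) > Advanced options > Spark > Spark config, one key-value pair per line (or via the spark_conf block when creating the cluster through the API). A minimal sketch of that mechanism is below; please note the property name shown is only an assumed placeholder for the REPL startup timeout, not a documented key, so confirm the exact property with Databricks support or the docs before relying on it.

# Cluster > Advanced options > Spark > Spark config (one key-value pair per line).
# NOTE: the key below is an illustrative assumption, not a confirmed Databricks property.
spark.databricks.python.replStartupTimeout 120s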

1 More Replies
minhhung0507
by Valued Contributor
  • 1667 Views
  • 2 replies
  • 2 kudos

Spark Driver keeps restarting due to high GC pressure despite scaling up memory

I'm running into an issue where my Spark driver keeps pausing and eventually restarting due to excessive garbage collection (GC), even though I’ve already scaled up the cluster memory. Below is an example from the driver logs: Driver/192.168.231.23 pa...

Latest Reply
minhhung0507
Valued Contributor
  • 2 kudos

Thank you very much for your detailed analysis and helpful recommendations. We have reviewed your suggestions, and I’d like to share a quick update: we have already tried most of the mitigation strategies you mentioned — including increasing driver mem...
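For anyone hitting similar driver GC pauses, a minimal sketch of cluster Spark config entries that are commonly tried (the values are illustrative, not a recommendation, and G1 may already be the default collector on recent runtimes; on Databricks, driver heap size is mainly governed by the driver node type, so choosing a larger driver is usually the first lever). Reducing data pulled back to the driver, for example avoiding large collect() or toPandas() calls, typically matters more than any GC flag.

# Cluster > Advanced options > Spark > Spark config; illustrative values only.
spark.driver.extraJavaOptions -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35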

1 More Replies
ankit001mittal
by New Contributor III
  • 1906 Views
  • 1 replies
  • 0 kudos

Policy for DLT

Hi, I am trying to define a policy for our DLT pipelines and I would like to provide a specific Spark version, as in the example below: { "spark_conf.spark.databricks.cluster.profile": { "type": "forbidden", "hidden": true }, "spark_ve...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @ankit001mittal, the error you're encountering occurs because Delta Live Tables (DLT) has specific requirements and automatically manages certain cluster configurations, including the Spark version. DLT pipelines are designed to use optimized Spark ver...
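For reference, a sketch of a DLT-oriented cluster policy that leaves the runtime alone (since DLT selects and manages its own Spark version) while still constraining other attributes. The node type value is just an illustrative Azure SKU; adapt it to your cloud and workspace.

{
  "spark_conf.spark.databricks.cluster.profile": { "type": "forbidden", "hidden": true },
  "node_type_id": { "type": "allowlist", "values": ["Standard_DS3_v2"] },
  "autoscale.max_workers": { "type": "range", "maxValue": 5 }
}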

RabahO
by New Contributor III
  • 8554 Views
  • 4 replies
  • 1 kudos

Dashboard always displays truncated data

Hello, we're working with a serverless SQL cluster to query Delta tables and display some analytics in dashboards. We have some basic GROUP BY queries that generate around 36k rows, and they are executed without the "limit" keyword. So in the data ...

Latest Reply
DougCorson1234
New Contributor II
  • 1 kudos

I also have this issue; 95% of our reporting goes to Excel from the display window. We need the full data shown so we can simply copy and paste into Excel, with no need to "Download", which causes unneeded files to pile up in the download folder. It also, as you s...

3 More Replies
petergriffin1
by New Contributor II
  • 1664 Views
  • 3 replies
  • 1 kudos

Resolved! Are you able to create an Iceberg table natively in Databricks?

Been trying to create an Iceberg table natively in Databricks with the cluster on 16.4. I also have the Iceberg JAR file for Spark 3.5.2. Using a simple command such as: %sql CREATE OR REPLACE TABLE catalog1.default.iceberg( a INT ) USING iceberg...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Databricks supports creating and working with Apache Iceberg tables natively under specific conditions. Managed Iceberg tables in Unity Catalog can be created directly using Databricks Runtime 16.4 LTS or newer. The necessary setup requires enabling ...
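To make the reply concrete, a minimal notebook sketch, assuming Databricks Runtime 16.4 LTS or newer, a Unity Catalog schema you can write to, and that the managed Iceberg setup the reply mentions is enabled for your workspace; the three-level name below mirrors the original post and the table name is illustrative. No extra Iceberg JAR should be needed for a Unity Catalog managed Iceberg table.

# Create a managed Iceberg table in Unity Catalog (names are illustrative).
spark.sql("""
  CREATE OR REPLACE TABLE catalog1.default.iceberg_demo (
    a INT
  )
  USING iceberg
""")

spark.sql("INSERT INTO catalog1.default.iceberg_demo VALUES (1)")
display(spark.sql("SELECT * FROM catalog1.default.iceberg_demo"))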

2 More Replies
nadia
by New Contributor II
  • 2979 Views
  • 2 replies
  • 0 kudos

Resolved! Connection Databricks Postgresql

I use Databricks and I'm trying to connect to PostgreSQL via the following code: jdbcHostname = "xxxxxxx" jdbcDatabase = "xxxxxxxxxxxx" jdbcPort = "5432" username = "xxxxxxx" password = "xxxxxxxx" jdbcUrl = "jdbc:postgresql://{0}:{1}/{2}".format(jdbcHostname, jd...
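For anyone with the same question, a minimal sketch of the Spark JDBC read pattern the post is following, with placeholder host, credentials, and a hypothetical table name; it assumes the cluster can actually reach the Postgres host over the network (firewall rules or IP allowlisting may be needed, as the reply below asks) and uses the PostgreSQL JDBC driver bundled with the runtime.

# Placeholder connection values; replace with your own (ideally via secrets).
jdbc_hostname = "xxxxxxx"
jdbc_database = "xxxxxxxxxxxx"
jdbc_port = 5432
username = "xxxxxxx"
password = "xxxxxxxx"

jdbc_url = f"jdbc:postgresql://{jdbc_hostname}:{jdbc_port}/{jdbc_database}"

df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "public.my_table")   # hypothetical table name
      .option("user", username)
      .option("password", password)
      .option("driver", "org.postgresql.Driver")
      .load())

display(df)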

Latest Reply
santhosh11
New Contributor II
  • 0 kudos

Can you tell me how you were able to connect to the Postgres database from Databricks? Do we have to whitelist IPs in Postgres?

1 More Replies
dollyb
by Contributor II
  • 1576 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks Connect and DBR 16.4 LTS @ Scala 2.13

Hi there, we're running Scala jobs on Databricks and I was eager to finally upgrade to Scala 2.13. However, Databricks Connect 16.4.x doesn't handle Scala versioning, so all dependencies are tied to Scala 2.13. It's rather tedious to exclude all 2.12 ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Runtime 17.0 is out in beta right now and I expect it to GA in the near future. Keep an eye out for the runtime release notes. Once they are released you will be able to see what's present and hopefully (fingers crossed) your dependency issue will be...

3 More Replies
Akshay_Petkar
by Valued Contributor
  • 1473 Views
  • 3 replies
  • 3 kudos

Resolved! Understanding Serverless Compute Sharing Across Notebooks in Databricks

Hi Community, I am using Databricks serverless compute in notebooks. When I create multiple notebooks and choose Serverless as the compute, I noticed that I can select the same serverless cluster for all of them. This brings up a few questions: Is this ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

Could you clarify what you mean by “The driver detaches”? If the driver detaches, the cluster would typically fail. Are you using Spark for processing, or is this a pure Python workload? If you’re using pure Python, only the driver node is utilized, ...

2 More Replies
marko_rd
by New Contributor II
  • 879 Views
  • 2 replies
  • 2 kudos

Resolved! azure-storage-blob-changefeed - ModuleNotFoundError: No module named 'azure.storage.blob.changefeed'

Added the Python package azure-storage-blob-changefeed (https://pypi.org/project/azure-storage-blob-changefeed/). But trying to access it from the notebook like this: from azure.storage.blob.changefeed import ChangeFeedClient raises ModuleNotFoundError. Tr...

Latest Reply
jjaymez
New Contributor III
  • 2 kudos

Hi! Thanks for answering our question. We tried that and it didn't work, but it gave us the idea to uninstall and reinstall the latest version of azure-storage-blob; it worked after that. For serverless clusters, the azure-storage-blob package is coming w...
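A sketch of the reinstall sequence described above, run as notebook-scoped cells; the versions installed are whatever pip resolves as latest, so pin them if you need reproducibility.

# Cell 1: reinstall the blob packages at notebook scope.
%pip uninstall -y azure-storage-blob azure-storage-blob-changefeed
%pip install --upgrade azure-storage-blob azure-storage-blob-changefeed

# Cell 2: restart Python so the refreshed packages are importable.
dbutils.library.restartPython()

# Cell 3: verify the import that previously failed.
from azure.storage.blob.changefeed import ChangeFeedClient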

1 More Replies
smukhi
by New Contributor II
  • 7978 Views
  • 5 replies
  • 0 kudos

Encountering Error UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE

As of this morning we started receiving the following error message on a Databricks job with a single PySpark notebook task. The job has not had any code changes in 2 months. The cluster configuration has also not changed. The last successful run of ...

Latest Reply
EndreM
New Contributor III
  • 0 kudos

I'm facing the same issue. The error came after we switched to Unity Catalog, and I'm trying to replay a streaming job. The job both reads and writes. This problem has been open for more than 12 months and only new contributors have commented on it... The stackt...

4 More Replies
Naeem_K
by New Contributor III
  • 2927 Views
  • 4 replies
  • 1 kudos

Resolved! Data Engineer Associate Certificate and badge not yet received

@Kaniz Fatma I cleared the certification exam on 26th January 2023, but still haven't received the certificate. I took the exam with a different email ID, but I'm not receiving any emails from Databricks to that email ID. Kindly help me resolv...

Latest Reply
sekhar1
New Contributor II
  • 1 kudos

Hi @Naeem_K, I am also facing the same issue. Can you please tell me how to resolve this?

3 More Replies
athos
by New Contributor
  • 478 Views
  • 1 replies
  • 0 kudos

FeatureEngineeringClient and R

Hi! I'm trying to find a way to create a feature table from R and reticulate. Is it possible? Currently I haven't been able to get a PySpark DataFrame passed from R to the create_table() function. The code I'm trying to make work follows: inst...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Creating Databricks feature tables with the create_table() function is well documented for use with PySpark DataFrames. However, passing a PySpark DataFrame generated in R using sparklyr to the ...
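For reference, this is roughly the Python side that a reticulate call would need to reach: a minimal create_table sketch with a plain PySpark DataFrame. The catalog/schema/table and column names are illustrative; the sticking point in the R scenario is that df must arrive as a genuine PySpark DataFrame object, not a sparklyr or R handle.

from databricks.feature_engineering import FeatureEngineeringClient

# Illustrative feature DataFrame built directly in PySpark.
features_df = spark.createDataFrame(
    [(1, 0.5), (2, 0.7)],
    "customer_id INT, score DOUBLE",
)

fe = FeatureEngineeringClient()
fe.create_table(
    name="main.default.customer_features",  # hypothetical Unity Catalog table
    primary_keys=["customer_id"],
    df=features_df,
    description="Example feature table",
)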

fernandomendi
by New Contributor II
  • 2514 Views
  • 2 replies
  • 0 kudos

Row IDs for DLTs

Hi all, I have a DLT pipeline where I am reading from a cloud source and want to move data through some tables onto a final Gold layer table. I would like to use SQL to write my DLTs. I would also like to have a row_id for each row to identify each in...

Data Engineering
dlt
identity
row_id
Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @fernandomendi, in Delta Live Tables (DLT), if you want to assign a unique identifier to each row, enabling delta.enableRowTracking and selecting _metadata.row_id directly in your SQL query is a valid approach; just be sure to include it explicitly...
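To illustrate the mechanism the reply describes, a small sketch on a plain Delta table outside DLT, assuming a runtime that supports row tracking; the table name is illustrative. In DLT SQL the idea is the same: set the table property on the table and select _metadata.row_id explicitly.

# Enable row tracking on a Delta table and read the per-row id.
spark.sql("""
  CREATE OR REPLACE TABLE main.default.row_id_demo (
    value STRING
  )
  TBLPROPERTIES (delta.enableRowTracking = true)
""")

spark.sql("INSERT INTO main.default.row_id_demo VALUES ('a'), ('b')")

display(spark.sql("""
  SELECT _metadata.row_id AS row_id, value
  FROM main.default.row_id_demo
"""))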

1 More Replies
Thanapat_S
by Contributor
  • 29634 Views
  • 9 replies
  • 5 kudos

Resolved! Can I change the default of showing the first 1,000 rows to return all records when querying?

I have to query data to show in my dashboard, but it truncated the results, showing only the first 1,000 rows. In the dashboard view, there is no option to re-execute with maximum result limits. I don't want to switch back to standard view and clic...

Latest Reply
jngnyc
New Contributor II
  • 5 kudos

I found this explanation helpful: 

8 More Replies
