Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

M_S
by New Contributor II
  • 1015 Views
  • 2 replies
  • 2 kudos

DataFrame intermittently comes up empty during daily job execution

Hello, I have a daily ETL job that adds new records to a table for the previous day. However, from time to time, it does not produce any output. After investigating, I discovered that one table is sometimes loaded as empty during execution. As a resul...

Latest Reply
M_S
New Contributor II

Thank you very much, @Louis_Frolio, for such a detailed and insightful answer! All tables used in this processing are managed Delta tables loaded through Unity Catalog. I will try running it with spark.databricks.io.cache.enabled set to false just to ...
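For reference, the session-level toggle mentioned here can be applied from a notebook roughly as below; `disable_io_cache` is a hypothetical helper, and `spark` stands for the notebook's ambient SparkSession:

```python
def disable_io_cache(spark):
    """Turn off the Databricks disk (IO) cache for the current session only.

    Useful for ruling out stale cached file data when a managed Delta table
    intermittently reads back empty. `spark` is the notebook's SparkSession.
    """
    spark.conf.set("spark.databricks.io.cache.enabled", "false")
    # Return the effective value so the change can be verified in the notebook.
    return spark.conf.get("spark.databricks.io.cache.enabled")
```

This only affects the current session; it does not change the cluster-wide Spark configuration.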

1 More Replies
5UDO
by New Contributor II
  • 1986 Views
  • 6 replies
  • 4 kudos

Databricks warehouse table optimization

Hi everyone, I just started using Databricks and wanted to evaluate read speeds when using the Databricks warehouse. So I've generated a dataset of 100M records, which contains name, surname, date of birth, phone number, and an address. Dat...

Latest Reply
5UDO
New Contributor II

Hi Brahmareddy and AndrewN, thank you for your answers. I first need to apologize, as I mistakenly wrote that I got 270 ms by hashing the date of birth, surname, and name and then using Z-ordering. I actually achieved around 290 ms with hashing...
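For anyone following along, the Z-ordering step discussed in this thread is issued as a SQL statement; a small sketch of how one might build and run it from a notebook (`zorder_table` is a hypothetical wrapper, `spark` the ambient SparkSession, and the table/column names are illustrative):

```python
def build_zorder_sql(table, cols):
    # Build the OPTIMIZE ... ZORDER BY statement that co-locates rows
    # sharing values in the listed columns into the same data files.
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(cols)})"

def zorder_table(spark, table, cols):
    # Run the statement; on Databricks this rewrites the Delta table's files.
    return spark.sql(build_zorder_sql(table, cols))
```

Z-ordering helps most when queries filter on the listed columns, since data skipping can then prune whole files.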

5 More Replies
jtjohnson
by New Contributor II
  • 1124 Views
  • 4 replies
  • 0 kudos

API Definition File

Hello. We are in the process of connecting Azure APIM to the Databricks REST APIs. Is there an official definition file available for download? Any help would be greatly appreciated.

Latest Reply
jtjohnson
New Contributor II

Thank you for the feedback. The Postman collection would be ideal, but the link is no longer active.

3 More Replies
harika5991
by New Contributor II
  • 912 Views
  • 1 reply
  • 0 kudos

Unable to create a metastore for Unity Catalog as I don't have Account Admin rights

Hello guys, I just started learning Databricks. I created a Databricks workspace via the Azure Portal using the Trial (Premium - 14-Days Free DBUs) plan. The workspace name is `easewithdata-adb`. However, I do not currently see the option to create a Un...

Latest Reply
lingareddy_Alva
Honored Contributor III

Hi @harika5991 You're right about the root cause of your issue. Creating a Unity Catalog metastore requires Account Admin privileges, which are separate from just creating a workspace in Azure. These are options you can try: When you create a Databricks...

Louis_Frolio
by Databricks Employee
  • 4821 Views
  • 4 replies
  • 4 kudos

Resolved! What are your most impactful use cases for schema evolution in Databricks?

Data Engineers, Share Your Experiences with Delta Lake Schema Evolution! We're calling on all data engineers to share their experiences with the powerful schema evolution feature in Delta Lake. This feature allows for seamless adaptation to changin...

Latest Reply
Louis_Frolio
Databricks Employee

Outstanding!

3 More Replies
flashmav
by New Contributor II
  • 701 Views
  • 1 reply
  • 0 kudos

Resolved! ConcurrentDeleteDeleteException in liquid cluster table

I am doing a merge into a table in parallel via 2 jobs. The table is a liquid clustered table with the following properties: delta.enableChangeDataFeed=true, delta.enableDeletionVectors=true, delta.enableRowTracking=true, delta.feature.changeDataFeed=supported...

Latest Reply
Louis_Frolio
Databricks Employee

Hey @flashmav, keep in mind that operations in Delta Lake often occur at the file level rather than the row level. For example, if two sessions attempt to update data in the same file (even if they're not updating the same row), you may encounter a...
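A common mitigation for these file-level write conflicts is to retry the losing MERGE with exponential backoff. Below is a generic, plain-Python sketch (not a Databricks API; in a real job the exception type passed in would be Delta's ConcurrentDeleteDeleteException):

```python
import random
import time

def merge_with_retry(run_merge, conflict_exc, max_attempts=5, base_delay=1.0):
    """Retry a merge that may lose a file-level conflict to a concurrent writer.

    run_merge: zero-argument callable that performs the MERGE.
    conflict_exc: exception type raised on a concurrency conflict.
    Delays grow exponentially with jitter so both writers are unlikely
    to collide again on the retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_merge()
        except conflict_exc:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.0))
```

Partitioning the two jobs so their MERGE predicates touch disjoint clustering key ranges reduces how often the retry path is needed at all.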

SoniSole
by New Contributor II
  • 8970 Views
  • 6 replies
  • 6 kudos

Issue with Docker Image connection

Hello, I have created and pushed a Docker image to Azure Container Registry. I used that image to start a cluster in Databricks, but the cluster doesn't start, so when I try to run a Databricks job using that cluster, I get this error bel...

Latest Reply
jeremy98
Honored Contributor

We have the same issue right now. What is the problem?

5 More Replies
vgupta
by New Contributor II
  • 9718 Views
  • 6 replies
  • 4 kudos

DLT | Cluster terminated by System-User | INTERNAL_ERROR: Communication lost with driver. Cluster 0312-140502-k9monrjc was not reachable for 120 seconds

Dear Community, hope you are doing well. For the last couple of days I have been seeing very strange issues with my DLT pipeline: every 60-70 minutes it fails in continuous mode with the error INTERNAL_ERROR: Communication lost with driver. Clu...

Latest Reply
Rahiman
New Contributor II

We had a similar error for one of our DLT pipelines. It can sometimes be caused by compute size; we increased the compute size in our DLT pipeline but still saw this error while processing a very large file. We then added the below para...

5 More Replies
aswinvishnu
by New Contributor II
  • 1335 Views
  • 3 replies
  • 1 kudos

Exporting table to GCS bucket using job

Hi all. Use case: I want to send the result of a query to a GCS bucket location in JSON format. Approach: from my Java-based application I create a job, and that job runs a notebook. The notebook will have something like this: query = "SELECT * FR...
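Outside of Spark, the JSON output such a job would land in GCS is typically JSON Lines (one object per row). A minimal stdlib sketch of that shape, with the bucket path and field names purely illustrative:

```python
import json

def rows_to_json_lines(rows):
    # Serialize query-result rows (dicts) to JSON Lines, the same
    # row-per-line format Spark's DataFrameWriter.json() emits per part file.
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)

# In a real notebook the equivalent would be roughly (path is hypothetical):
#   spark.sql(query).write.mode("overwrite").json("gs://my-bucket/exports/")
```

For the notebook to write directly to `gs://` paths, the cluster needs GCS credentials configured (for example, a service account), which is where the suggestion below about secure access comes in.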

Latest Reply
LorelaiSpence
New Contributor II

Consider using GCS signed URLs or access tokens for secure access.

2 More Replies
Maverick1
by Valued Contributor II
  • 5496 Views
  • 6 replies
  • 6 kudos

How to infer the online feature store table via an MLflow registered model deployed to a SageMaker endpoint?

Can an MLflow registered model automatically infer the online feature store table, if that model is trained and logged via a Databricks Feature Store table and the table is pushed to an online feature store (like AWS RDS)?

Latest Reply
Janifer45
New Contributor II

Thanks for this

5 More Replies
BrianLind
by New Contributor II
  • 785 Views
  • 2 replies
  • 0 kudos

Need access to browse onprem SQL data

Our BI team has started using Databricks and would like to browse our local (on-prem) SQL database servers from within Databricks. I'm not sure if that's even possible. So far, I've set up Databricks Secure Cluster Connectivity (SCC), created a privat...

Latest Reply
Renu_
Valued Contributor II

Hi, based on what you've shared, it seems you've already completed many of the necessary steps. Just a few things to double-check as you move forward: SQL Warehouses used for BI tools need to run in Pro mode, not serverless, since only Pro or Classic ...

1 More Replies
muano_makhokha
by New Contributor II
  • 988 Views
  • 1 reply
  • 1 kudos

Resolved! Row filtering and column masking not working even when the requirements are met

I have been trying to use the row filtering and column masking feature to redact columns and filter rows based on the group a user is in. I have all the necessary permissions and I've used clusters with version 15.4 and higher. When I run the fo...

Latest Reply
Louis_Frolio
Databricks Employee

Here are some things to consider/try: The UnityCatalogServiceException error you are encountering, ABORTED.UC_DBR_TRUST_VERSION_TOO_OLD, generally indicates that the Databricks Runtime (DBR) version you are using no longer supports the operation, s...

meret
by New Contributor II
  • 803 Views
  • 1 reply
  • 0 kudos

Column Default Propagation

Hi, today I found a somewhat strange behavior when it comes to default values in columns. Apparently, column defaults are propagated to a new table when you select the column without any operation on it. This is a bit unexpected for me. Here a short...

Latest Reply
Louis_Frolio
Databricks Employee

The behavior you described regarding the propagation of default column values is expected and is tied to the specific usage of the delta.feature.allowColumnDefaults table property in Delta Lake. Here’s an explanation: Default Propagation Without Tra...

Dinesh6351
by New Contributor II
  • 740 Views
  • 2 replies
  • 3 kudos
Latest Reply
amos
New Contributor III

This error occurs when your Azure account exceeds the regional quota of available cores, preventing cluster creation in Databricks. It means the cluster tried to use more resources than are allowed in your region. 1. Review the configuration o...

1 More Replies
jonhieb
by New Contributor III
  • 2932 Views
  • 6 replies
  • 3 kudos

Resolved! [Databricks Asset Bundles] Triggering Delta Live Tables

I would like to know how to schedule a DLT pipeline using DABs. I'm trying to trigger a Delta Live Tables pipeline using Databricks Asset Bundles. Below is my YAML code: resources: pipelines: data_quality_pipelines: name: data_quality_pipeline...

Latest Reply
Walter_C
Databricks Employee

As of now, Databricks Asset Bundles do not support direct scheduling of DLT pipelines using cron expressions within the bundle configuration. Instead, you can achieve scheduling by creating a Databricks job that triggers the DLT pipeline and then sch...
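Sketched as a bundle fragment, the job-wrapper approach described above might look like this (the job name and cron expression are illustrative; the pipeline reference assumes a pipeline resource named `data_quality_pipelines` as in the question):

```yaml
resources:
  jobs:
    data_quality_schedule:
      name: data_quality_schedule
      schedule:
        quartz_cron_expression: "0 0 6 * * ?"   # daily at 06:00
        timezone_id: "UTC"
      tasks:
        - task_key: run_pipeline
          pipeline_task:
            pipeline_id: ${resources.pipelines.data_quality_pipelines.id}
```

The job carries the schedule; the pipeline itself stays unscheduled and is simply triggered by the job's pipeline task.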

5 More Replies
