Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by AanchalSoni, New Contributor II
  • 394 Views
  • 4 replies
  • 0 kudos

Azure not getting listed in create external location

Hi, I'm trying to create a pipeline using Azure; however, Azure is not listed in the drop-down of Catalog Explorer -> Create External Location. I'm using the Community version for practice. Please advise.

Latest Reply
nayan_wylde
Honored Contributor II
  • 0 kudos

@AanchalSoni Yes, Databricks Free Edition has this limitation: you cannot create custom external locations. It currently supports only S3, and that location is managed by Databricks. Look at the limitations below.

3 More Replies
by Dharinip, Contributor
  • 1039 Views
  • 2 replies
  • 0 kudos

Materialized Views Incremental Load

My question is: can materialized views be updated incrementally? For example, in my case we store the data in the Iron layer, and it gets flattened into separate tables in the Bronze and Silver layers. The required transformations happen from the Silver to the Gold layer...

Latest Reply
guptaharsh
New Contributor III
  • 0 kudos

I am using the DLT declarative framework to work with materialized views in the gold layer, and I am sharing sample code. Can someone tell me how we can do only an incremental refresh? This code is doing a full refresh daily, and I don't want to do agai...

1 More Replies
by SharathE, New Contributor III
  • 2071 Views
  • 4 replies
  • 1 kudos

Incremental refresh of materialized view in serverless DLT

Hello, every time I run a Delta Live Tables materialized view in serverless, I get a log of "COMPLETE RECOMPUTE". How can I achieve incremental refresh in serverless DLT pipelines?

Latest Reply
guptaharsh
New Contributor III
  • 1 kudos

I am using the DLT declarative framework to work with materialized views in the gold layer, and I am sharing sample code. Can someone tell me how we can do only an incremental refresh? This code is doing a full refresh daily, and I don't want to do agai...

3 More Replies
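For context on the question above, a DLT materialized view is just a function decorated with `@dlt.table` over a batch read; whether a refresh is incremental or a full recompute is decided by the engine, not by the code. Below is a minimal sketch, assuming a hypothetical `silver.sales` source table and column names; it only runs inside a Databricks DLT pipeline, where the `dlt` module and `spark` session are provided. On serverless pipelines, simple deterministic aggregations like this one are candidates for incremental refresh, and the pipeline event log records whether each refresh was incremental or a COMPLETE_RECOMPUTE.

```python
# Sketch of a gold-layer materialized view in a DLT pipeline.
# Table and column names (silver.sales, sale_date, amount) are hypothetical.
# Runs only inside a Databricks DLT pipeline, which provides `dlt` and `spark`.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Daily sales totals, maintained by DLT as a materialized view")
def gold_daily_sales():
    # A deterministic aggregation over a Delta source: on serverless
    # pipelines the engine may refresh this incrementally; check the
    # event log to see "incremental" vs. "COMPLETE_RECOMPUTE" per run.
    return (
        spark.read.table("silver.sales")
        .groupBy("sale_date")
        .agg(F.sum("amount").alias("total_amount"))
    )
```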
by mosayed, New Contributor II
  • 476 Views
  • 4 replies
  • 6 kudos

Resolved! Databricks clusters unresponsive

Hello everyone, we are experiencing issues on one of our Databricks workspaces: notebooks and SQL queries are executing, but results are not returned to the UI. In the screenshots you can see examples where cells in notebooks and queries in a SQL wareho...

Latest Reply
mosayed
New Contributor II
  • 6 kudos

Thanks a lot for the quick replies! It seems the issue was related to a faulty IPython version (or something similar inside the workspace). The problem resolved itself later the same day in the evening, and everything is working normally again now.

3 More Replies
by jv_v, Contributor
  • 3374 Views
  • 11 replies
  • 2 kudos

Resolved! Issue with Installing Remorph Reconcile Tool and Compatibility Clarification

I am currently working on a table migration project from a source Hive Metastore workspace to a target Unity Catalog workspace. After migrating the tables, I intend to write table validation scripts using the Remorph Reconcile tool. However, I am enc...

Latest Reply
Kvant
New Contributor II
  • 2 kudos

I would just like to mention that this error might not be due to Remorph or your Python version. I got a similar error message when trying to apply changes to the metastore grants through Terraform. It worked when I authenticate...

10 More Replies
by ToNiOZ45, New Contributor II
  • 327 Views
  • 1 replies
  • 2 kudos

Resolved! New Scroll bar appears in cells with more than 300 lines

Hi, in a Databricks notebook I noticed a new behaviour which I'd like to deactivate. Essentially, when a cell reaches 300 lines or more, a new scroll bar appears within the cell. I'd rather have the cell displayed in full and keep using the page scrol...

[Attachment: Databricks_Dual_Scrollbar.jpg]
Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @ToNiOZ45, unfortunately there's no option to disable it in the workspace settings. You can create a ticket in Databricks Ideas and suggest that they add this option if you find the scroll bar frustrating: Feedback | Databricks

by noorbasha534, Valued Contributor II
  • 255 Views
  • 4 replies
  • 0 kudos

Figure out stale tables/folders being loaded by auto-loader

Hello all, we have a pipeline that uses Auto Loader to load data from cloud object storage (ADLS) into a Delta table. We use directory listing at the moment, and there are around 20,000 folders to be verified in ADLS every 30 minutes to check for new data...

Latest Reply
noorbasha534
Valued Contributor II
  • 0 kudos

@szymon_dybczak ah, sorry, let me rephrase. I initially tried the command on the Delta table directly, which resulted in the error. Then I tried it on the checkpoint. That did give me results, though "discovered" was null for all the rows. Still, this does not s...

3 More Replies
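The checkpoint query discussed in this thread can be sketched with the `cloud_files_state` table-valued function, which exposes Auto Loader's file-discovery state. This is a sketch, not a definitive implementation: it assumes the Databricks-provided `spark` session, the checkpoint path is a placeholder, and it only runs on Databricks.

```python
# Sketch: inspect which files Auto Loader has discovered, to help spot
# stale folders. Assumes the Databricks-provided `spark` session; the
# checkpoint path below is a placeholder, not a real location.
checkpoint = "abfss://container@account.dfs.core.windows.net/pipeline/checkpoint"

state = spark.sql(
    f"SELECT path, discovery_time, commit_time FROM cloud_files_state('{checkpoint}')"
)

# Folders whose files show no recent discovery_time are candidates for
# pruning from the input path, shrinking the 20,000-folder listing.
state.orderBy("discovery_time", ascending=False).show(20, truncate=False)
```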
by Ramana, Valued Contributor
  • 329 Views
  • 5 replies
  • 4 kudos

Serverless Compute - pySpark - Any alternative for rdd.getNumPartitions()

Hello Community, we have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process we face several challenges, and this is one of them. When we read CSV or JSON files with multiLine=true, the load becomes sing...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 4 kudos

Hi @Ramana, yep, the RDD API is not supported on Serverless. As a workaround you can obtain the number of partitions in the following way: use spark_partition_id and then count the distinct occurrences of each id. from pyspark.sql.functions import spark_partition_id, ...

4 More Replies
by LJacobsen, New Contributor III
  • 648 Views
  • 1 replies
  • 0 kudos

Deploy asset bundle without recreating Lakeflow SQL gateway and DLT Pipeline

Hello all, I have a pre-existing Databricks Asset Bundle that deploys a workflow. I am starting to investigate Lakeflow Connect using a SQL Server connector. In my existing Databricks Asset Bundle, I added a pipeline YAML file that defines a gateway an...

Latest Reply
thomas-totter
New Contributor III
  • 0 kudos

@LJacobsen If I understand you correctly, you have manually created objects (in your case a DLT pipeline) that you now want to manage in your asset bundle. If that's the case, "bundle deployment bind" is your friend: https://learn.microsoft.com/en-us/a...

by Ramana, Valued Contributor
  • 206 Views
  • 2 replies
  • 0 kudos

Serverless Compute - Python - Custom Emails via SMTP (smtplib.SMTP(host_name)) - Any alternative?

Hello Community, we have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process we face several challenges, and this is one of them. We have several scenarios where we need to send an inline email via Pytho...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Ramana, what error do you get in serverless? Could you provide the error message?

1 More Replies
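For readers hitting the same question, the classic-compute pattern being migrated looks like the sketch below, built only from the standard library. The host, port, and addresses are placeholders, and the send step is factored out: on serverless compute, outbound SMTP connections may be blocked by network policy, so sending may require an HTTPS-based mail API or an approved egress path instead, which is worth checking against the actual error message.

```python
# Sketch of composing an inline email with the standard library.
# Host, port, and addresses are placeholders. On serverless compute,
# outbound SMTP may be blocked, so smtplib.SMTP(...) can fail on connect
# even though the same code works on classic compute.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Job status"          # placeholder subject
msg["From"] = "jobs@example.com"       # placeholder sender
msg["To"] = "team@example.com"         # placeholder recipient
msg.set_content("The nightly load finished successfully.")

def send(message: EmailMessage, host: str = "smtp.example.com", port: int = 587) -> None:
    # STARTTLS submission; authentication omitted for brevity.
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.send_message(message)

print(msg["Subject"])
```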
by minhhung0507, Valued Contributor
  • 1567 Views
  • 2 replies
  • 0 kudos

Executors Getting FORCE_KILL After Migration to GCE – Resource Scaling Not Helping

Hi everyone, we're facing a persistent issue with our production streaming pipelines where executors are being forcefully killed with the following error: "Executor got terminated abnormally due to FORCE_KILL". Screenshot for reference. Context: our pipelin...

[Attachments: minhhung0507_1-1750841419951.png, minhhung0507_2-1750841451453.png]
Latest Reply
thomas-totter
New Contributor III
  • 0 kudos

@minhhung0507 wrote: "We're facing a persistent issue with our production streaming pipelines where executors are being forcefully killed with the following error: Executor got terminated abnormally due to FORCE_KILL." I solved the issue in our case and al...

1 More Replies
by thomas-totter, New Contributor III
  • 394 Views
  • 4 replies
  • 4 kudos

NativeADLGen2RequestComparisonHandler: Error in request comparison (when running DLT)

For at least two weeks (but probably even longer) our DLT pipeline has been posting error messages to log4j (driver logs) like the one below. I tried both channels (preview, current), switched between serverless and classic compute, and started the pipeli...

Latest Reply
thomas-totter
New Contributor III
  • 4 kudos

I also tried the setting below (spark.conf), but that didn't help either: spark.sql.legacy.timeParserPolicy: LEGACY. See LEGACY_TIME_PARSER_POLICY - Azure Databricks - Databricks SQL | Microsoft Learn

3 More Replies
by ajaysh, New Contributor
  • 134 Views
  • 1 replies
  • 1 kudos

connect RabbitMQ hosted on AWS EC2 instance

We are facing an issue. We created a cluster "dev_cluster_catalog_support". When we use the compute policy "unrestricted" we are able to connect to RabbitMQ hosted on an AWS EC2 instance, but when we change the compute policy to "Shared Compute" we get er...

Latest Reply
BigRoux
Databricks Employee
  • 1 kudos

RabbitMQ brings me back to my IoT days. With that said, here is some helpful guidance: the issue occurs because Shared Compute cluster policies enforce stricter network access controls than Unrestricted policies. When using the Unrestricte...

by chirag_nagar, New Contributor
  • 940 Views
  • 3 replies
  • 1 kudos

Seeking Guidance on Migrating Informatica PowerCenter Workflows to Databricks using Lakebridge

Hi everyone, I hope you're doing well. I'm currently exploring options to migrate a significant number of Informatica PowerCenter workflows and mappings to Databricks. During my research I came across Lakebridge, especially its integration with BladeB...

Latest Reply
thelogicplus
Contributor
  • 1 kudos

@chirag_nagar You should definitely check out Travinto Tools; it's one of the most reliable and effective solutions for migrating Informatica (PowerCenter/IDMC) workloads to Databricks. I've been using it for the past three years during my time at Del...

2 More Replies
by Vamsi_S, New Contributor II
  • 351 Views
  • 3 replies
  • 0 kudos

Ingest data from SQL Server

I've been working on data ingestion from SQL Server to UC using Lakeflow Connect. Lakeflow Connect actually made the work easier when everything is right. I am trying to incorporate this with DAB, and this works fine with schema and table tags fo...

Latest Reply
Vamsi_S
New Contributor II
  • 0 kudos

I've been using the notebook style, as there are some transformations that need to be done on destination_table names because the source_table names have special characters. My requirement is, for example: I have a schema with 5 tables, 10 individual tables...

2 More Replies
