Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AnonymousK
by New Contributor II
  • 252 Views
  • 2 replies
  • 2 kudos

Why do you want to migrate from Azure Synapse Analytics or Azure Data Factory to Databricks?

It's a simple answer, bro. According to our analysis, Azure pipelines and notebooks batch process approximately 40% faster than Synapse Analytics. If you really want to optimise your pipelines and perform cost optimisations in your team, please migrate...

Latest Reply
amirabedhiafi
New Contributor III
  • 2 kudos

Hello! The only thing I can add here is that migrating from ASA or ADF to Databricks can make sense when your workloads need more scalable Spark-based processing, batch and streaming support, advanced transformation logic, lakehouse architecture or ML and AI ca...

  • 2 kudos
1 More Replies
mjedy78
by New Contributor II
  • 2527 Views
  • 5 replies
  • 1 kudos

Transition from partitioned table to Liquid clustered table

Hi all, I have a table called classes, which is already partitioned on three different columns. I want to create a Liquid Clustered Table, but as far as I understand from the documentation, and from Dany Lee and his team, it was not possible as of 2024 ...

Latest Reply
biancaorita
New Contributor II
  • 1 kudos

Is there a plan to implement a way to migrate to liquid clustering for an existing table that has traditional partitioning and that is quite large (over 4 TB)? Re-creating such tables from scratch is not always ideal.

  • 1 kudos
4 More Replies
SahilRana3097
by New Contributor
  • 297 Views
  • 1 replies
  • 0 kudos

Resolved! Databricks not able to create cluster with Amazon free trial version

Error: Cannot launch the cluster because the user specified an invalid argument. Instance ID: failed-2d901c0f-d88d-499a-a Internal error message: The VM launch request to AWS failed, please check your configuration. [details] InvalidParameterCombinati...

Latest Reply
DivyaandData
Databricks Employee
  • 0 kudos

The error is coming from AWS, not Databricks: your AWS account is restricted to Free Tier–eligible instance types, but the node type you picked in Databricks maps to an EC2 instance that is not Free Tier–eligible, so AWS rejects the launch request wi...

  • 0 kudos
AdrianLobacz
by Databricks Partner
  • 149 Views
  • 1 replies
  • 0 kudos

FileNotFoundError: [Errno 2] No such file or directory: '../00_configuration/prd/main_configuration.

Maybe someone has encountered this problem before? I’m running parallel loading for 10 objects using pool.map. Nine of them complete successfully, but one fails when trying to read a configuration file. The problem occurs occasionally and doesn’t foll...

Latest Reply
balajij8
Contributor III
  • 0 kudos

@AdrianLobacz You can read the configuration once and pass the object into your function instead of reading the same file multiple times. It eliminates the IO overhead and avoids hitting the FUSE layer. When the code triggers parallel processes, they...
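The suggestion in this reply (read the configuration once, then hand the parsed object to each worker) could look roughly like the sketch below. It is only an illustration under assumptions: the configuration is taken to be JSON, the key `target_schema` and the function `load_object` are hypothetical, and a thread pool stands in for whatever pool the original code uses.

```python
import json
from functools import partial
from multiprocessing.pool import ThreadPool


def load_config(path):
    """Read the configuration file exactly once, before any worker starts."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def load_object(config, object_name):
    """Hypothetical per-object loader: it uses the config it is given
    and never opens the configuration file itself."""
    return f"{object_name} -> {config['target_schema']}"


def run_parallel(config_path, object_names, workers=4):
    config = load_config(config_path)        # single file read
    worker = partial(load_object, config)    # bind the parsed config to the worker
    with ThreadPool(workers) as pool:
        # pool.map preserves the order of object_names in its result
        return pool.map(worker, object_names)
```

Because `load_object` is a module-level function and the config is a plain dict, the same `partial` should also work with a process-based `multiprocessing.Pool`, since both are picklable.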

  • 0 kudos
seefoods
by Valued Contributor
  • 283 Views
  • 1 replies
  • 0 kudos

Resolved! databricks autoloader source files

Hello, how can we handle this error when we use Auto Loader with spark.readStream? (com.databricks.sql.cloudfiles.errors.CloudFilesException) [CF_EMPTY_DIR_FOR_SCHEMA_INFERENCE] Cannot infer schema when the input path `/Volumes/default/landing/source/bund...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @seefoods, The error message seems to indicate that there are no files in the source path. You can either define the schema yourself and pass it to schema(...) so Auto Loader doesn’t need to infer anything, and as soon as files arrive, the stream will...
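A minimal sketch of the first option in this reply, passing an explicit schema so that nothing has to be inferred from an empty directory. The helper name `build_autoloader_stream`, the JSON file format, and the column names in the usage note are assumptions for illustration; `spark` is the session a Databricks notebook already provides.

```python
def build_autoloader_stream(spark, source_path, schema_ddl, file_format="json"):
    """Return a streaming DataFrame that reads source_path with a fixed schema.

    schema_ddl is a DDL string such as "id BIGINT, payload STRING";
    DataStreamReader.schema() accepts either a StructType or a DDL string.
    """
    return (
        spark.readStream
        .format("cloudFiles")                      # Auto Loader source
        .option("cloudFiles.format", file_format)  # format of the incoming files
        .schema(schema_ddl)                        # explicit schema, so no inference step
        .load(source_path)
    )
```

In a notebook this might be called as `df = build_autoloader_stream(spark, "<your source path>", "id BIGINT, payload STRING")`, with the path and columns replaced by your own.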

  • 0 kudos
IM_01
by Contributor III
  • 1041 Views
  • 8 replies
  • 0 kudos
Latest Reply
IM_01
Contributor III
  • 0 kudos

Hi @Ashwin_DSA, thanks for the response. I was thinking that if the results are precomputed using CUBE and persisted in a materialized view, that would retrieve results faster than a metric view. Could you please let me know if my understanding is correct?

  • 0 kudos
7 More Replies
Diehl
by New Contributor III
  • 439 Views
  • 1 replies
  • 1 kudos

Resolved! Auto Loader with ignoreMissingFiles and useManagedFileEvents fails on Classic Compute

Hi everyone, I am seeing an unexpected behavior with Auto Loader when using Managed File Events on Classic Compute. The error message itself seems inconsistent with the behavior I am seeing: [FAILED_READ_FILE.DBR_FILE_NOT_EXIST] Error while reading file...

Latest Reply
Diehl
New Contributor III
  • 1 kudos

Just sharing a solution in case anyone runs into the same issue. The error was caused by the cluster configuration including spark.master: "local[*]". After removing this setting, the error stopped occurring and the Auto Loader finished correctly. This...

  • 1 kudos
vg33
by New Contributor
  • 295 Views
  • 1 replies
  • 0 kudos

Resolved! Network Configuration

I have a Databricks workspace on AWS (serverless compute). I created a network policy with "Allow access to all destinations" enabled and attached it to my workspace. When I run a Python notebook and try to make an HTTP request or curl to any externa...

Latest Reply
Lu_Wang_ENB_DBX
Databricks Employee
  • 0 kudos

Most likely the egress policy change hasn’t actually taken effect on the serverless compute that’s running your notebook. Check these things in order: Verify the network policy itself (Account Console → Security → Networking → Context-based ingress ...

  • 0 kudos
mdee
by Databricks Partner
  • 334 Views
  • 2 replies
  • 1 kudos

Resolved! LDP Materialized View Incremental Refreshes - Changeset Size Thresholds

Is there any documentation available around the changeset size thresholds for materialized view incremental refreshes? Are these configurable at all? Are they constant or do the thresholds change depending on the number of rows/size of the material...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, On top of Pradeep's reply, which I'd recommend trying, I'd also suggest you raise a support ticket for this. They will potentially be able to tweak the settings in the backend (not guaranteed), but it may help. Thanks, Emma

  • 1 kudos
1 More Replies
MyProfile
by New Contributor
  • 338 Views
  • 1 replies
  • 0 kudos

Disable Public Network Access on Databricks Managed Storage Account - Deny Assignment

Issue Description: I am attempting to disable public network access on the Azure Databricks managed storage account. However, I am encountering the following error: Failed to save resource settings — access is denied due to a deny assignment created by...

Latest Reply
Sumit_7
Honored Contributor III
  • 0 kudos

@MyProfile This would be helpful, check once - https://learn.microsoft.com/en-us/answers/questions/1707749/managed-storage-accounts-compliance

  • 0 kudos
Raghu_Bindingan
by New Contributor III
  • 5754 Views
  • 5 replies
  • 2 kudos

Truncate delta live table and try to repopulate it in the pipeline

Has anyone attempted to truncate a Delta Live gold-level table that gets populated via a pipeline and then tried to repopulate it by starting the pipeline? I have a situation where I need to reprocess all data in my gold table, so I stopped the ...

Latest Reply
sanjivsingh
New Contributor II
  • 2 kudos

My blog on this: https://medium.com/@singh.sanjiv/truncate-and-load-streaming-live-table-8f840eb424d1

  • 2 kudos
4 More Replies
leopold_cudzik
by New Contributor II
  • 336 Views
  • 1 replies
  • 0 kudos

Resolved! Lakehouse sync tables over rolling history

Hi, we're exploring replacing one of the use cases we are running in our cloud provider with a Databricks pipeline. We have explored the possibility of subscribing to an Event Hub using SDP pipelines, feeding our IoT data into a Delta table where...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @leopold_cudzik, The pattern you are suggesting is feasible, but it’s much easier to manage if you separate history ingestion from the 7-day serving view instead of cleaning the streaming sink table in place. A common architecture on Databricks wo...

  • 0 kudos
kevinleindecker
by New Contributor III
  • 603 Views
  • 6 replies
  • 1 kudos

SQL Warehouse error: "Cannot read properties of undefined (reading 'data')" when querying system tab

Queries that previously worked started failing in SQL Warehouse (Dashboards) without any changes on our side. The query succeeds, but fails to render results with error: "Cannot read properties of undefined (reading 'data')" This happens with: - system.b...

Latest Reply
Esgario
New Contributor II
  • 1 kudos

Same problem here. I have previously reported this issue, and it had been resolved at the time. However, the problem has now reoccurred. When ingesting large tables (over 100k rows), the system is unable to properly render the data, preventing the tab...

  • 1 kudos
5 More Replies
PNC
by Databricks Partner
  • 542 Views
  • 4 replies
  • 0 kudos

Resolved! Materialized view creation fails

Hi, I have run into a problem when creating a materialized view. Here's the simple query I'm trying to run: %sql create or replace materialized view catalog.schema.mView_test as select * from catalog.schema.table limit 10; I'm getting the following error: Encoun...

Latest Reply
balajij8
Contributor III
  • 0 kudos

There are multiple requirements for materialized views. You can check below: You must use a Unity Catalog enabled pro or serverless SQL warehouse. To incrementally refresh a materialized view from Delta tables, the source tables must have row tracking e...

  • 0 kudos
3 More Replies
yit
by Databricks Partner
  • 1411 Views
  • 3 replies
  • 1 kudos

How to implement MERGE operations in Lakeflow Declarative Pipelines

Hey everyone, We’ve been using Autoloader extensively for a while, and now we’re looking to transition to full Lakeflow Declarative Pipelines. From what I’ve researched, the reader part seems straightforward and clear. For the writer, I understand that...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 1 kudos

Use APPLY CHANGES INTO (SQL) or dlt.apply_changes() (Python). This is the declarative replacement for foreachBatch MERGE logic in pipelines.

import dlt
from pyspark.sql.functions import col

@dlt.table(name="bronze_events")
def bronze_events():
    re...

  • 1 kudos
2 More Replies