Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

nakaxa
by New Contributor
  • 48366 Views
  • 5 replies
  • 1 kudos

Fastest way to write a Spark Dataframe to a delta table

I read a huge array with several columns into memory and convert it into a Spark DataFrame. When I write it to a Delta table using the following command, it takes forever (I have a driver with large memory and 32 workers): df_exp.write.m...

Latest Reply
ShawnRR
New Contributor II
  • 1 kudos

Out of interest, did you try seeing what happens if you break the steps down into something like: df.write().format("parquet").mode(SaveMode.Overwrite).save(parquetPath); followed by: spark.sql("CREATE TABLE my_delta_table USING DELTA LOCATION '...

  • 1 kudos
4 More Replies
JUMAN4422
by Databricks Partner
  • 303 Views
  • 1 reply
  • 0 kudos

Resolved! ABAC Policies Not Working on Metric Views

I wanted to check if ABAC (Attribute-Based Access Control) policies can be applied to metric views in Databricks. I have successfully applied ABAC policies on a fact table, and they are working as expected. However, when I query a metric view that use...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @JUMAN4422, yes, this is a limitation. You cannot apply ABAC policies directly to views. Since metric views are a special type of view (CREATE VIEW ... WITH METRICS), this limitation applies to them as well. ABAC requirements, quotas, and limita...

  • 0 kudos
kcyugesh
by New Contributor II
  • 396 Views
  • 2 replies
  • 0 kudos

Unity Catalog storage credential fails although same Access Connector works in another credential

In Azure Databricks Unity Catalog, I have two storage credentials that use the same connector_id / Azure Databricks Access Connector. One credential works and can access ADLS Gen2 successfully, but the other fails with: Failed to access cloud storag...

Latest Reply
zoe_unifeye
Databricks Partner
  • 0 kudos

Hi @kcyugesh, how are you getting on so far? It might also be worth checking the privileges associated with each credential to see if they differ. Secondly, check the credential type on the credential, as a managed identity in comparison to a service...

  • 0 kudos
1 More Replies
HariharaSam
by Databricks Partner
  • 28658 Views
  • 5 replies
  • 2 kudos

Parallel Processing of Databricks Notebook

I have a scenario where I need to run the same Databricks notebook multiple times in parallel. What is the best approach to do this?

Latest Reply
Akshay_Petkar
Valued Contributor
  • 2 kudos

Hi, you can use Databricks Jobs to run the same notebook multiple times in parallel. For this, you can create a Databricks Job for each activity, which allows you to execute the notebook concurrently with different parameters as needed. You can refer t...
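
As a rough illustration of the "one Job per activity" idea, here is a minimal sketch that only builds the job definitions as plain dictionaries. The helper `build_job_payloads`, the notebook path, and the activity names are hypothetical; the field names follow my understanding of the Jobs API 2.1 create request (`tasks`, `task_key`, `notebook_task`, `base_parameters`) and should be checked against the current docs. Compute settings are deliberately left out and would need to be added for your workspace.

```python
import json


def build_job_payloads(notebook_path, activities):
    """Build one job definition per activity (hypothetical helper).

    `activities` maps an activity name to the string parameters that the
    notebook should receive for that run.
    """
    payloads = []
    for name, params in activities.items():
        payloads.append(
            {
                "name": f"run-{name}",
                "tasks": [
                    {
                        "task_key": name,
                        "notebook_task": {
                            "notebook_path": notebook_path,
                            # Assumed field for notebook parameters.
                            "base_parameters": dict(params),
                        },
                    }
                ],
            }
        )
    return payloads


# Example values are made up for illustration only.
activities = {
    "sales": {"region": "emea"},
    "inventory": {"region": "apac"},
}
payloads = build_job_payloads("/Shared/example_notebook", activities)
print(json.dumps(payloads[0], indent=2))
```

Each payload could then be submitted separately, so the runs are independent of each other and can execute at the same time.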

  • 2 kudos
4 More Replies
faruko
by New Contributor III
  • 978 Views
  • 5 replies
  • 5 kudos

Resolved! Best practices for initial large-scale ingestion from on‑premises Oracle to Databricks

Hello everyone, I am responsible for designing and implementing a Lakehouse architecture in an industrial company. I am currently facing some challenges regarding the initial ingestion of data from our on‑premises Oracle database into Databricks. The dat...

Latest Reply
amirabedhiafi
Contributor
  • 5 kudos

Hi @faruko! My idea is to treat the initial load as a controlled batch backfill, then start the CDC pipeline afterwards from a clear cutoff point. You define a fixed cutoff timestamp or Oracle SCN for the initial snapshot and later load history in sma...
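
The reply is cut off here, so how the history is actually split is not shown. Assuming it is loaded in bounded time windows that all end at the fixed cutoff, the window planning could be sketched like this. The function `backfill_windows`, the dates, and the seven-day step are illustrative assumptions, not part of the reply.

```python
from datetime import datetime, timedelta


def backfill_windows(start, cutoff, step):
    """Yield half-open [lo, hi) windows that cover start..cutoff exactly once."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < cutoff:
        # Clamp the last window so nothing at or after the cutoff is loaded.
        hi = min(lo + step, cutoff)
        yield lo, hi
        lo = hi


# Hypothetical values: one month of history, loaded a week at a time.
cutoff = datetime(2024, 1, 1)
windows = list(backfill_windows(datetime(2023, 12, 1), cutoff, timedelta(days=7)))
for lo, hi in windows:
    print(f"load rows with {lo:%Y-%m-%d} <= ts < {hi:%Y-%m-%d}")
```

Because the windows are half-open and the last one stops at the cutoff, a CDC pipeline that starts from the same cutoff would neither miss nor repeat rows at the boundary.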

  • 5 kudos
4 More Replies
rohit8491
by New Contributor III
  • 11377 Views
  • 4 replies
  • 8 kudos

Azure Databricks Connectivity with Power BI Cloud - Firewall Whitelisting

Hi Support Team, we want to connect to tables in Azure Databricks via Power BI. We are able to connect via Power BI Desktop, but when we try to publish, the associated dataset does not refresh and throws an error from powerbi.com. It...

Latest Reply
LokeshChikuru
Databricks Partner
  • 8 kudos

What is the fix for this? Is this resolved for you?

  • 8 kudos
3 More Replies
cvh
by New Contributor III
  • 957 Views
  • 8 replies
  • 4 kudos

Resolved! Does Lakeflow Connect Have Any Change Tracking Diagnostics?

We have set up Change Tracking on multiple SQL Servers for Lakeflow Connect successfully in the past, but lately we are having lots of problems with a couple of servers. The latest utility script has been run and both lakeflowSetupChangeTracking and ...

Latest Reply
cvh
New Contributor III
  • 4 kudos

Problem resolved! Databricks Solutions Architect Casey Orr suggested that having 2 different versions of the ddl audit table and trigger in place was the issue, and he was right. As we have been using Lakeflow Connect for more than six months we have ...

  • 4 kudos
7 More Replies
ismaelhenzel
by Valued Contributor
  • 705 Views
  • 1 reply
  • 2 kudos

Resolved! Is there a way to natively mount external Iceberg REST Catalogs (e.g., BigLake) in Unity Catalog?

Hi everyone, I have been reviewing the documentation on integrating external Iceberg tables with Databricks. Currently, the only method I have found to read from an Iceberg REST catalog (specifically GCP BigLake in my case) is by explicitly passing th...

Latest Reply
amirabedhiafi
Contributor
  • 2 kudos

Hello @ismaelhenzel, AFAIK there is no documented native UC foreign catalog integration for a generic Iceberg REST catalog such as BigLake REST Catalog today. Databricks does support Iceberg in UC, including UC managed and foreign Iceberg tables, but the docu...

  • 2 kudos
jar
by Contributor
  • 2358 Views
  • 9 replies
  • 1 kudos

Databricks single user compute cannot write to storage

I've deployed unrestricted single user compute for each developer in our dev workspace and everything works fine except for writing to storage where the cell will continuously run but seemingly not execute anything. If I switch to an unrestricted sha...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Adding to @saurabh18cs's comments, also check whether any instance profile is attached to the cluster. What is the difference between the clusters: only the access mode?

  • 1 kudos
8 More Replies
shan-databricks
by Databricks Partner
  • 573 Views
  • 2 replies
  • 3 kudos

Lakeflow Connect: Data Ingestion from SQL Server to Databricks

We have a use case to ingest data from SQL Server into Databricks using Lakeflow Connect. There are 100 tables, and on a daily basis we will perform inserts, updates, and deletes based on CDC data. For this requirement, how can we enable multiple par...

Latest Reply
amirabedhiafi
Contributor
  • 3 kudos

Hello @shan-databricks! One additional point: I would also validate the expected load with the SQL Server DBA, because even if Lakeflow manages the parallelism internally, the source SQL Server still needs to handle those concurrent reads. For 100 tab...

  • 3 kudos
1 More Replies
Darshan137
by New Contributor II
  • 667 Views
  • 2 replies
  • 1 kudos

Transitioning from ADF to Databricks Workflows: Best Practices in a Multi-Workspace (dev-prod)

Hi Community, we have a data processing framework running on Azure Databricks with Unity Catalog, and we're evaluating options to consolidate our orchestration entirely within the Databricks ecosystem. CURRENT ARCHITECTURE: ~20 use cases, each containin...

Latest Reply
amirabedhiafi
Contributor
  • 1 kudos

Hello @Darshan137! A few things I will add to @Lu_Wang_ENB_DBX's answer, from what I did on a similar project. If ADF currently passes values such as environment, run date, catalog, schema, or business domain, define a clear parameter contract in Lakeflow Job...
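
One way to make such a parameter contract explicit is a small typed object that every task validates at startup. The sketch below is only an illustration: the class `JobParams`, the key names, and the allowed environments (`dev`, `prod`, taken from the thread title) are assumptions, and how the raw string values reach the task is left open.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: only the two environments named in the thread title exist.
ALLOWED_ENVIRONMENTS = {"dev", "prod"}
REQUIRED_KEYS = ("environment", "run_date", "catalog", "schema", "business_domain")


@dataclass(frozen=True)
class JobParams:
    """Hypothetical contract for the values the orchestrator passes to a task."""

    environment: str
    run_date: date
    catalog: str
    schema: str
    business_domain: str

    @classmethod
    def from_strings(cls, raw):
        """Validate raw string parameters and convert them to typed fields."""
        missing = [key for key in REQUIRED_KEYS if not raw.get(key)]
        if missing:
            raise ValueError(f"missing parameters: {', '.join(missing)}")
        if raw["environment"] not in ALLOWED_ENVIRONMENTS:
            raise ValueError(f"unknown environment: {raw['environment']}")
        return cls(
            environment=raw["environment"],
            run_date=date.fromisoformat(raw["run_date"]),
            catalog=raw["catalog"],
            schema=raw["schema"],
            business_domain=raw["business_domain"],
        )


# Example values are made up for illustration only.
params = JobParams.from_strings(
    {
        "environment": "dev",
        "run_date": "2024-05-01",
        "catalog": "dev_catalog",
        "schema": "sales",
        "business_domain": "retail",
    }
)
print(params)
```

Failing fast on a missing or malformed value keeps a misconfigured run from writing into the wrong catalog or schema.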

  • 1 kudos
1 More Replies
wschoi
by New Contributor III
  • 21455 Views
  • 17 replies
  • 17 kudos

How to fix plots and image color rendering on Notebooks?

I am currently running dark mode for my Databricks Notebooks, and am using the "new UI" released a few days ago (May 2023) and the "New notebook editor." Currently all plots (like matplotlib) are showing wrong colors. For example, denoting:```... p...

Latest Reply
griffen_kociela
New Contributor II
  • 17 kudos

Still a problem when using Plotly visualizations.

  • 17 kudos
16 More Replies
MikeGo
by Valued Contributor
  • 1419 Views
  • 7 replies
  • 2 kudos

Table update trigger and File Arrival trigger latency

Hi team, when using a table update or file arrival trigger, what latency can I expect for the trigger? Does Databricks poll the source on some schedule? If yes, is the polling free? Thanks

Latest Reply
MikeGo
Valued Contributor
  • 2 kudos

Hi @Ashwin_DSA, I appreciate the further clarification. Let's make this even clearer. "the trigger hands your job a parameter payload with the updated table list and the most recent commit version" This is a good thing but likely it cannot be used, ...

  • 2 kudos
6 More Replies
Avinash_Narala
by Databricks Partner
  • 743 Views
  • 2 replies
  • 2 kudos

Resolved! Data Loss in Incremental Batch Jobs Due to Latency in delta file write to blob

Hi everyone, I am facing a data consistency issue in my Databricks incremental pipeline where records are being skipped because of a time gap between when a record is processed and when the physical file is finalized in Azure Blob Storage (ABFS). Our A...
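
The kind of skip described above can be reproduced with a small simulation. The excerpt is truncated, so the pipeline's real watermark logic is not shown; this sketch assumes each run picks up visible records whose processing timestamp is newer than the last watermark and then advances the watermark to the run time. The names `Record` and `incremental_run` and all the minute values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    processed_at: int  # minute at which the record was stamped upstream
    visible_at: int  # minute at which its file was finalized in storage


def incremental_run(records, watermark, now):
    """Simulate one run under the assumed timestamp-watermark logic.

    Picks records that are already visible and whose processed_at lies in
    (watermark, now], then advances the watermark to `now`.
    """
    picked = [
        r for r in records
        if r.visible_at <= now and watermark < r.processed_at <= now
    ]
    return picked, now


records = [
    Record("a", processed_at=5, visible_at=6),
    Record("b", processed_at=8, visible_at=30),  # file lands well after processing
    Record("c", processed_at=25, visible_at=26),
]

first, watermark = incremental_run(records, watermark=0, now=10)
second, watermark = incremental_run(records, watermark=watermark, now=40)
loaded = {r.key for r in first + second}
skipped = {r.key for r in records} - loaded
print("loaded:", sorted(loaded), "skipped:", sorted(skipped))
```

Record `b` is stamped before the first run but only becomes visible after it, so the first run cannot see it and the second run filters it out because its timestamp is already behind the watermark.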

Latest Reply
balajij8
Contributor III
  • 2 kudos

You can handle it as below. Fix the Bronze Write: the 20+ minute commit gap suggests metadata contention or "Small File Issues" in the bronze Delta tables. You can optimize tables manually or enable Optimized Write and Auto Optimize if feasible. This...

  • 2 kudos
1 More Replies
AdrianLobacz
by Databricks Partner
  • 632 Views
  • 1 reply
  • 0 kudos

Best option for parallel processing

I faced some challenges in my projects related to parallel processing in Databricks. In many cases, the issue was not the volume of data itself, but the overall execution time. I was processing a relatively small number of objects, but each object re...

Latest Reply
balajij8
Contributor III
  • 0 kudos

The Driver was the bottleneck in the Thread Pool approach. By moving to Serverless Workflows, you can shift the orchestration weight to the Databricks Control Plane. Eliminate Driver Saturation: Serverless compute for Workflows natively handles task d...
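
For context, a driver-side thread pool of the kind this reply refers to might look roughly like the sketch below. The original poster's code is not shown in the excerpt, so this is an assumption about its general shape; `process_object` is a hypothetical stand-in for whatever work each object needs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_object(name):
    """Stand-in for the per-object work (hypothetical)."""
    return f"{name}: done"


def run_all(names, max_workers=4):
    """Process every object on a thread pool and collect results by name."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_object, name): name for name in names}
        for future in as_completed(futures):
            # result() re-raises any exception from the worker thread.
            results[futures[future]] = future.result()
    return results


results = run_all(["orders", "customers", "products"])
print(results)
```

All of the threads here live in one process on the driver, which is why this pattern can saturate the driver as the reply describes.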

  • 0 kudos