Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

VikasM
by New Contributor
  • 225 Views
  • 13 replies
  • 5 kudos

Resolved! PySpark AnalysisException: Ambiguous reference to field t when parsing nested JSON

I'm working on a personal data engineering project using Kafka, Spark Structured Streaming, and Docker. The application consumes two Kafka topics that originate from an external market-data websocket source: a trade stream, a candlestick (kline/OHLCV) st...

Latest Reply
balajij8
Contributor III
  • 5 kudos

When Spark Structured Streaming writes to file sinks, it generally uses a phased commit: it writes temporary files to the output directory, then writes metadata that references them, and finally commits by moving or renaming the temp files to their final names. You...
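
The write-then-rename idea in this reply can be illustrated outside Spark. The sketch below is plain Python and is not Spark's actual committer: `commit_file` and the file names are hypothetical, and it only shows why a reader never sees a half-written file when the last step is a rename.

```python
import os
import tempfile


def commit_file(output_dir: str, final_name: str, data: bytes) -> str:
    """Write data to a temp file in output_dir, then rename it into place."""
    os.makedirs(output_dir, exist_ok=True)
    # Phase 1: write to a temporary file in the same directory, so the
    # later rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=output_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Phase 2: publish the file under its final name in one step.
        final_path = os.path.join(output_dir, final_name)
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

This applies to local or POSIX-like filesystems. On object stores a rename is typically a copy followed by a delete rather than a single atomic step, so the same guarantee should not be assumed there.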

12 More Replies
Rupa0503
by New Contributor III
  • 67 Views
  • 2 replies
  • 1 kudos

Implementing Row Level Security using ABAC

I have to implement row-level security on single or multiple tables based on roles, and we don't want to create separate copies for users. How can I implement this, and what code can I use?

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hi @Rupa0503, Yes, you can do row-level security across one table or many in Unity Catalog without copying data per role. @balajij8 pointed you in the right architectural direction (ABAC with governed tags, a reusable row-filter function, and centr...

1 More Reply
gaurang033
by New Contributor II
  • 1932 Views
  • 3 replies
  • 2 kudos

How to access snapshots in Iceberg tables?

I have created an Iceberg table in Databricks and inserted a bunch of values into it. How do I list the snapshots and other metadata of the table? create table raw.landing.emp_ice(id int, name string) using iceberg. The following doesn't work: https://iceber...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

@gaurang033, I believe my solution gets you going in the right direction. Please give it a read and let me know. Cheers, Louis.

2 More Replies
Félix_banqi
by New Contributor
  • 90 Views
  • 3 replies
  • 0 kudos

Is there a way to deactivate Genie auto-correction?

Genie keeps breaking my code, sometimes making it almost impossible to write code. Sometimes it behaves normally, but sometimes it auto-corrects constantly, inserting unwanted code. Is there any way to fix it? I know it's a bug, but I also don't kno...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Félix_banqi, Sorry you are facing this issue. That definitely doesn’t sound like the intended experience. I would like to understand the issue better to give you a better steer. Is there an example you can share? In the meantime, given that you h...

2 More Replies
Dhivyadharshini
by New Contributor II
  • 133 Views
  • 2 replies
  • 1 kudos

Spark UI Troubleshooting: Data Skew vs Cluster Resource Bottlenecks

How can Spark UI metrics be used to distinguish data skew from insufficient cluster resources? When a Databricks job is slow, we usually look at Spark UI metrics such as task duration, shuffle read/write, spilled bytes, GC time, executor CPU utilizati...

Latest Reply
Vibiksha
New Contributor II
  • 1 kudos

A simple way to troubleshoot a slow Spark job using the Spark UI is:
Check task duration
  • A few very slow tasks → likely data skew.
  • Most tasks are slow → likely a cluster resource or execution issue.
Check Spark UI metrics
  • Large differences in shuffle read/task ...
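
The task-duration check in this reply can be turned into a rough heuristic. The sketch below is plain Python over a list of task durations copied from a stage's task table; `classify_stage` and the default `skew_ratio` of 5 are illustrative assumptions, not values taken from the thread or from Spark.

```python
from statistics import median


def classify_stage(task_seconds: list[float], skew_ratio: float = 5.0) -> str:
    """Label a stage from its task durations (illustrative threshold)."""
    if not task_seconds:
        raise ValueError("need at least one task duration")
    med = median(task_seconds)
    longest = max(task_seconds)
    # A few tasks far slower than the typical task suggests data skew.
    if med > 0 and longest / med >= skew_ratio:
        return "likely data skew"
    # Otherwise tasks are similar; if the stage is still slow, look at
    # cluster resources or execution settings.
    return "uniform tasks: check cluster resources or execution"
```

For example, `classify_stage([10, 11, 9, 10, 120])` flags skew because one task runs far longer than the median, while `classify_stage([60, 62, 58, 61])` does not. The threshold would need tuning per workload.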

1 More Reply
Pratikmsbsvm
by Contributor
  • 5111 Views
  • 2 replies
  • 1 kudos

Data Migration from SAP S/4HANA to Databricks

Could someone please help me design the migration of SAP S/4HANA to Databricks? How should this be designed, and what do we need to consider in the LLD? 1. How should the data be extracted, and with which tool? Near-real-time replication is required. 2. Each layer for Da...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Pratikmsbsvm, Here is an updated view of the options for moving SAP S/4HANA data into Databricks, including the SAP and Databricks partnership path that is now the recommended low-friction approach. I will cover the integration options first, the...

1 More Reply
DineshOjha
by New Contributor III
  • 90 Views
  • 1 reply
  • 0 kudos

Views in DR environment

Hi Team, We are currently using the Databricks deep clone feature to clone our tables to the Databricks DR environment. When we deploy our jobs, they run in production and the tables get cloned to DR. But the views don't get cloned, as deep clone doesn't ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @DineshOjha, This is expected behaviour. DEEP CLONE is designed for tables, so it works well for keeping Delta tables in sync to a DR environment. The public docs describe clone as creating a copy of a source table at a specific version, and they ...

emorgoch
by New Contributor III
  • 283 Views
  • 2 replies
  • 2 kudos

Resolved! Managing IPYNB cell timestamps in source control

We're in the process of converting our Databricks notebooks from .py files to .ipynb. We have disabled storing notebook output in source control at the workspace level. However, what we're discovering is that every cell in our notebooks has 3 time...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 2 kudos

Hi @emorgoch, Thanks for raising this. This appears to be a regression rather than expected behaviour. Internally, the issue has been identified around .ipynb handling in Git folders, and the intended fix is to stop serialising these execution timest...

1 More Reply
alejandro_jaram
by New Contributor
  • 189 Views
  • 3 replies
  • 0 kudos

DLT pipelines failing out of memory (serverless)

I have a Delta Live Tables (DLT) pipeline that runs weekly. Normally, it takes 8 minutes to complete, but since last Friday (June 19), it has been running for hours until it encounters an out-of-memory error. This pipeline is responsible for c...

Latest Reply
bala_sai
New Contributor
  • 0 kudos

I think this is more likely an incremental refresh issue than a generic serverless memory issue. Since the pipeline completes in around 20 minutes with a full refresh, but the normal weekly run runs for hours and then fails with OOM, I would first recom...

2 More Replies
yit337
by Contributor
  • 84 Views
  • 1 reply
  • 0 kudos

How to share history of streaming table with Open Sharing?

Databricks has a limitation with Open Sharing: the history of streaming tables cannot be shared. What are the best workarounds? How can history be provided to the recipient?

Latest Reply
balajij8
Contributor III
  • 0 kudos

You can follow the approach below. Share a standard Delta table: instead of sharing the streaming table directly, share a regular Delta table with history that captures the same data. You can create a standard Delta table from the streaming source and share it WITH ...

YoshikiFujiwara
by New Contributor II
  • 162 Views
  • 1 reply
  • 0 kudos

Unity Catalog External Location with Amazon S3 Access Points, session policy behavior and workarounds

Context: I'm working on integration patterns between enterprise NAS storage (Amazon FSx for NetApp ONTAP) and Databricks via S3 Access Points. S3 Access Points provide S3 API access to file data without copying, a common pattern for organizations with...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @YoshikiFujiwara , I took a look and have some meaningful feedback for you. Short version: your diagnosis is right, and what it points to is an unsupported path, not a mistake in your IAM setup. Amazon S3 Access Points are not a supported target ...

yanchr
by New Contributor II
  • 55 Views
  • 1 reply
  • 0 kudos

CHECKPOINT_RDD_BLOCK_ID_NOT_FOUND randomly appears

[CHECKPOINT_RDD_BLOCK_ID_NOT_FOUND] Checkpoint block not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. After switching from reliable checkpoint() to localCheckpoint() to ...

Latest Reply
balajij8
Contributor III
  • 0 kudos

This issue stems from the fundamental difference in how checkpoint and localCheckpoint handle data durability. Provisioning a larger cluster will not reliably solve the problem, because the issue is about executor lifecycle, not capacity. localCheckp...
