cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

brickster_2018
by Esteemed Contributor
  • 1504 Views
  • 1 replies
  • 0 kudos

Resolved! Why do I see data loss with Structured streaming jobs?

I have a Spark structured streaming job reading data from Kafka and loading it to the Delta table. I have some transformations and aggregations on the streaming data before writing to Delta table

  • 1504 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

The typical reason for data loss on a Structured streaming application is having an incorrect value set for watermarking. The watermarking is done to ensure the application does not develop the state for a long period, However, it should be ensured ...

  • 0 kudos
User16826994223
by Honored Contributor III
  • 820 Views
  • 1 replies
  • 1 kudos

Does Databricks have a data processing agreement?

Does Databricks have a data processing agreement?

  • 820 Views
  • 1 replies
  • 1 kudos
Latest Reply
User16826994223
Honored Contributor III
  • 1 kudos

Databricks offers a standalone data processing agreement to comply with certain data protection laws that contains our contractual commitments with respect to applicable data protection and privacy law. If your company determines that you require ter...

  • 1 kudos
User16826994223
by Honored Contributor III
  • 939 Views
  • 1 replies
  • 0 kudos

How do we manage data recency in Databricks

I want to know how databricks maintain data recency in databricks

  • 939 Views
  • 1 replies
  • 0 kudos
Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

When using delta tables in databricks, you have the advantage of delta cache which accelerates data reads by creating copies of remote files in nodes’ local storage using a fast intermediate data format. At the beginning of each query delta tables au...

  • 0 kudos
brickster_2018
by Esteemed Contributor
  • 1795 Views
  • 1 replies
  • 0 kudos

Resolved! Z-order or Partitioning? Which is better for Data skipping?

For Delta tables, among Z-order and Partioning which is recommended technique for efficient Data Skipping

  • 1795 Views
  • 1 replies
  • 0 kudos
Latest Reply
brickster_2018
Esteemed Contributor
  • 0 kudos

Partition pruning is the most efficient way to ensure Data skipping. However, choosing the right column for partitioning is very important. It's common to see choosing the wrong column for partitioning can cause a large number of small file problems ...

  • 0 kudos
RonanStokes_DB
by New Contributor III
  • 1093 Views
  • 0 replies
  • 1 kudos

Questions on Bronze / Silver / Gold data set layering

I have a DB-savvy customer who is concerned their silver/gold layer is becoming too expensive.  These layers are heavily denormalized, focused on logical business entities (customers, claims, services, etc), and maintained by MERGEs.  They cannot pre...

  • 1093 Views
  • 0 replies
  • 1 kudos
Labels