Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Anonymous
by Not applicable
  • 804 Views
  • 1 reply
  • 0 kudos

Delta - open source?

Delta is open source, but certain features such as OPTIMIZE and ZORDER are only available on the managed Databricks Runtime. So how open source is it really?

Latest Reply
User16826994223
Honored Contributor III
  • 0 kudos

Some features are added exclusively by Databricks on top of Delta, not by the community, so the company has the right to decide whether or not to open-source them.

User16789201666
by Databricks Employee
  • 1227 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16789201666
Databricks Employee
  • 0 kudos

There isn’t a problem with purging old data. When using Auto Loader, it will take newly added data into account.

User16789201666
by Databricks Employee
  • 1243 Views
  • 1 reply
  • 2 kudos

What is the best practice for generating jobs in an automated fashion?

Latest Reply
User16789201666
Databricks Employee
  • 2 kudos

There are several approaches here. You can write an automation script that programmatically accesses the Databricks APIs to generate configured jobs. You can also use the Databricks Terraform provider. The benefit of the latter approach is that Terr...

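The scripted approach from the reply above can be sketched with nothing but the standard library. This builds a Jobs API 2.1 create-job payload and shows the POST call; the workspace URL, token, notebook path, and cluster sizing below are placeholder assumptions, not values from the thread.

```python
import json
import urllib.request

# Hypothetical workspace URL and token -- replace with your own.
WORKSPACE_URL = "https://example.cloud.databricks.com"
TOKEN = "dapi-REDACTED"

def build_job_payload(name, notebook_path, spark_version, node_type):
    """Build a Jobs API 2.1 create-job payload for a single notebook task."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                "new_cluster": {
                    "spark_version": spark_version,
                    "node_type_id": node_type,
                    "num_workers": 2,
                },
            }
        ],
    }

def create_job(payload):
    """POST the payload to /api/2.1/jobs/create (sketch; not executed here)."""
    req = urllib.request.Request(
        f"{WORKSPACE_URL}/api/2.1/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_job_payload(
    "nightly-etl", "/Repos/etl/main", "13.3.x-scala2.12", "i3.xlarge"
)
print(payload["tasks"][0]["task_key"])
```

The Terraform provider route mentioned in the reply expresses the same job spec declaratively, which makes diffs and state tracking easier than ad-hoc API scripts.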
sajith_appukutt
by Honored Contributor II
  • 687 Views
  • 1 reply
  • 0 kudos

How can I reduce the risk of data exfiltration while using Databricks

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Databricks enterprise security and admin features allow customers to deploy Databricks in their own managed VPC/VNet. This gives them greater flexibility and control over the configuration of their deployment architecture. For Azure, follo...

Anonymous
by Not applicable
  • 1110 Views
  • 0 replies
  • 0 kudos

Newline characters mess up the table records

When creating tables from text files that contain newline characters in the middle of a line, the table records end up with null column values, because the embedded newline breaks the line into two different records and fills up ...

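A common fix is to quote the fields and let a multiline-aware parser handle them (in Spark's CSV reader this is the `multiLine` option, per the Databricks docs). The stdlib sketch below, with made-up data, shows the difference: a proper CSV parser keeps a quoted embedded newline inside one record instead of splitting it into two broken rows.

```python
import csv
import io

# A CSV file where the second field of the first record contains a newline.
# A naive line-by-line split would see 4 lines and produce broken records;
# a CSV parser that honors quoting keeps the field in one record.
raw = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

rows = list(csv.reader(io.StringIO(raw)))
print(len(rows))   # header + 2 records, not 4 lines
print(rows[1][1])  # the multi-line field survives intact
```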
jose_gonzalez
by Databricks Employee
  • 1361 Views
  • 1 reply
  • 0 kudos

How often should I vacuum my Delta table?

I would like to know how often I need to vacuum my Delta table to clean up old files.

Latest Reply
RonanStokes_DB
Databricks Employee
  • 0 kudos

The requirements for Vacuum will depend on your application needs and the rate of arrival of new data. Vacuuming removes old versions of data. If you need to be able to query earlier versions of data many months after the original ingest time, then i...

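The trade-off in the reply can be made concrete: Delta's `VACUUM ... RETAIN n HOURS` statement deletes files older than the retention window, which caps how far back time travel can go. A small helper (table name and retention period below are illustrative) turns a time-travel requirement in days into the statement to run:

```python
def vacuum_sql(table, retain_days):
    """Build a Delta VACUUM statement retaining `retain_days` of history.

    VACUUM takes the retention window in hours; files older than the
    window are removed, so earlier versions are no longer queryable.
    """
    hours = retain_days * 24
    return f"VACUUM {table} RETAIN {hours} HOURS"

# Keep one week of history (the 168-hour default window).
print(vacuum_sql("events", 7))  # VACUUM events RETAIN 168 HOURS
```

On Databricks this string would be passed to `spark.sql(...)` on a schedule matched to how far back you actually need to query.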
jose_gonzalez
by Databricks Employee
  • 2087 Views
  • 2 replies
  • 0 kudos

How to partition my Delta table?

I would like to follow best practices to partition my Delta table. Should I partition by unique ID or date?

Latest Reply
RonanStokes_DB
Databricks Employee
  • 0 kudos

Depending on the amount of data per partition, you may also want to consider partitioning by week, month, or quarter. The partitioning decision is often tied to the tiering model of data storage. For a Bronze ingest layer, the optimal partitioning is ...

1 More Replies
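The cardinality point behind the question is easy to see in a sketch: partitioning by a unique ID creates one partition per row, while a date-derived key keeps the partition count bounded, and coarser keys (month vs. day) bound it further. Counting distinct partition values over a year of synthetic dates:

```python
from datetime import date, timedelta

# One year of daily event dates (synthetic example data).
days = [date(2023, 1, 1) + timedelta(days=i) for i in range(365)]

daily_partitions = {d.isoformat() for d in days}                # yyyy-mm-dd
monthly_partitions = {f"{d.year}-{d.month:02d}" for d in days}  # yyyy-mm

print(len(daily_partitions))    # 365 partitions per year
print(len(monthly_partitions))  # 12 partitions per year
```

Partitioning by a unique ID would instead yield as many partitions as rows, which is why the replies steer toward date-based keys sized to keep each partition reasonably large.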
ruslan
by Databricks Employee
  • 1082 Views
  • 1 reply
  • 0 kudos

Does Spark Structured Streaming support `OutputMode.Update` for Delta tables?

Latest Reply
ruslan
Databricks Employee
  • 0 kudos

Nope, it's not supported, but you could use a MERGE statement inside of a foreachBatch streaming sink. Documentation on MERGE: https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-merge-into.html Documentation for arbitrary streaming ...

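The foreachBatch-plus-MERGE pattern from the reply can be sketched as below. The target table name, temp view name, and key column are assumptions for illustration; only the SQL builder runs here, while `upsert_batch` is the callback you would hand to `writeStream.foreachBatch(...)` on Databricks.

```python
def build_merge_sql(target, updates_view, key):
    """Build a Delta MERGE statement that upserts rows from a temp view."""
    return (
        f"MERGE INTO {target} t "
        f"USING {updates_view} s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )

def upsert_batch(batch_df, batch_id):
    """foreachBatch callback sketch (not executed here).

    Spark passes each micro-batch DataFrame to this function; registering
    it as a temp view lets the MERGE treat it as the update source.
    """
    batch_df.createOrReplaceTempView("updates")
    batch_df.sparkSession.sql(
        build_merge_sql("target_table", "updates", "id")
    )

print(build_merge_sql("target_table", "updates", "id"))
```

Wired up, this looks like `df.writeStream.foreachBatch(upsert_batch).start()`, which gives update-like semantics even though the Delta sink itself only supports append and complete modes.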
patputnam-db
by Databricks Employee
  • 1449 Views
  • 1 reply
  • 0 kudos

When should Change Data Feed be used?

I have a customer who has Change Data Capture data flowing into a Delta table. They would like to propagate these changes from this table into another table downstream. Is this a good application for Change Data Feed?

Latest Reply
patputnam-db
Databricks Employee
  • 0 kudos

CDF simplifies the process of identifying the set of records that are updated, inserted, or deleted with each version of a Delta table. It helps to avoid having to implement downstream 'custom' filtration to identify these changes. This makes it an i...

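The downstream read the reply describes uses the `table_changes` table-valued function (per the Databricks CDF docs; the table must have `delta.enableChangeDataFeed = true` set). The table name and starting version below are hypothetical; only the query-string builder runs here.

```python
def cdf_query(table, start_version):
    """Build a query over a Delta table's Change Data Feed.

    The result includes metadata columns such as _change_type and
    _commit_version, so downstream jobs can apply inserts, updates,
    and deletes without custom change detection.
    """
    return f"SELECT * FROM table_changes('{table}', {start_version})"

print(cdf_query("silver.orders", 5))
```

On Databricks the string would go to `spark.sql(...)`, with the start version checkpointed between runs so each run picks up only new changes.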
User16789201666
by Databricks Employee
  • 1625 Views
  • 1 reply
  • 1 kudos

How to make recursive calls in a Python/pandas UDF? For example, unzipping arbitrarily nested zip files.

There are zip files that contain many other zip files within them, nested many levels deep. How do you read/parse their content?

Latest Reply
User16789201666
Databricks Employee
  • 1 kudos

'tail-recurse' is a python API that can help.

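Independent of any helper library, a plain recursive function over the stdlib `zipfile` module handles arbitrary nesting; the same function body could be wrapped in a pandas UDF that takes the archive bytes per row. A self-contained sketch, building a two-level archive in memory to demonstrate:

```python
import io
import zipfile

def extract_nested(zip_bytes, results=None):
    """Recursively walk a zip archive, descending into any member that is
    itself a zip file, collecting (name, payload) pairs for plain files."""
    if results is None:
        results = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            data = zf.read(name)
            if name.lower().endswith(".zip"):
                extract_nested(data, results)  # recurse into inner archive
            else:
                results.append((name, data))
    return results

# Build a two-level test archive in memory: outer.zip contains inner.zip.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as zf:
    zf.writestr("deep.txt", "hello")
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as zf:
    zf.writestr("top.txt", "world")
    zf.writestr("inner.zip", inner.getvalue())

files = extract_nested(outer.getvalue())
print(sorted(name for name, _ in files))  # ['deep.txt', 'top.txt']
```

Recursion depth equals the nesting depth of the archives, so for realistic data Python's default recursion limit is not a concern.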
RonanStokes_DB
by Databricks Employee
  • 1252 Views
  • 0 replies
  • 1 kudos

Questions on Bronze / Silver / Gold data set layering

I have a DB-savvy customer who is concerned their silver/gold layer is becoming too expensive.  These layers are heavily denormalized, focused on logical business entities (customers, claims, services, etc), and maintained by MERGEs.  They cannot pre...

