Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ruslan
by Databricks Employee
  • 1234 Views
  • 1 reply
  • 0 kudos

Does Spark Structured Streaming support `OutputMode.Update` for Delta tables?

Latest Reply
ruslan
Databricks Employee
  • 0 kudos

Nope, it's not supported, but you could use a MERGE statement inside of a foreachBatch streaming sink. Documentation on MERGE: https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-merge-into.html. Documentation for arbitrary streaming ...
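The workaround would issue a MERGE from inside the foreachBatch handler against a temporary view of each micro-batch; a minimal sketch (the table name `target`, view name `batch_updates`, and key column `id` are illustrative, not from the thread):

```sql
-- Upsert one micro-batch into the Delta target.
-- `batch_updates` is a temp view registered from the batch DataFrame.
MERGE INTO target t
USING batch_updates s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
```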

patputnam-db
by Databricks Employee
  • 1743 Views
  • 1 reply
  • 0 kudos

When should Change Data Feed be used?

I have a customer with Change Data Capture data flowing into a Delta table. They would like to propagate these changes from this table into another table downstream. Is this a good application for using Change Data Feed?

Latest Reply
patputnam-db
Databricks Employee
  • 0 kudos

CDF simplifies the process of identifying the set of records that are updated, inserted, or deleted with each version of a Delta table. It helps to avoid having to implement downstream 'custom' filtration to identify these changes. This makes it an i...
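Once the feed is enabled, a downstream job can read just the changed rows between two table versions; a minimal sketch (the table name and version numbers are illustrative):

```sql
-- Enable the change feed on an existing Delta table.
ALTER TABLE source_table SET TBLPROPERTIES (delta.enableChangeDataFeed = true);

-- Read row-level changes between two commit versions; each row carries
-- a _change_type column (insert, update_preimage, update_postimage, delete).
SELECT * FROM table_changes('source_table', 2, 5);
```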

User16789201666
by Databricks Employee
  • 1775 Views
  • 1 reply
  • 1 kudos

How to make recursive calls to python/pandas UDF? For example, unzipping arbitrarily nested zip files.

There are files that are zip files and have many zip files within them, many levels. How do you read/parse the content?

Latest Reply
User16789201666
Databricks Employee
  • 1 kudos

'tail-recurse' is a python API that can help.
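For the nested-zip case itself, plain recursion over the standard-library `zipfile` module is enough, and the result could then be wrapped in a pandas UDF. A minimal in-memory sketch (the function name `extract_nested` is illustrative):

```python
import io
import zipfile

def extract_nested(zip_bytes: bytes, prefix: str = "") -> dict:
    """Recursively extract a zip archive held in memory, descending into
    any member that is itself a zip file. Returns {path: content_bytes}."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            data = zf.read(name)
            if name.lower().endswith(".zip"):
                # Inner archive: recurse, prefixing names with the outer path.
                results.update(extract_nested(data, prefix + name + "/"))
            else:
                results[prefix + name] = data
    return results
```

Because everything stays in memory, the same function works unchanged on zip payloads read from cloud storage as byte columns.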

RonanStokes_DB
by Databricks Employee
  • 1381 Views
  • 0 replies
  • 1 kudos

Questions on Bronze / Silver / Gold data set layering

I have a DB-savvy customer who is concerned their silver/gold layer is becoming too expensive.  These layers are heavily denormalized, focused on logical business entities (customers, claims, services, etc), and maintained by MERGEs.  They cannot pre...

ruslan
by Databricks Employee
  • 1280 Views
  • 1 reply
  • 0 kudos

Does Delta Live Tables support MERGE?

Latest Reply
ruslan
Databricks Employee
  • 0 kudos

Delta Live Tables currently does not support the MERGE statement; this is work in progress. For now, you could use Structured Streaming + MERGE inside of a foreachBatch().

User16788317466
by Databricks Employee
  • 990 Views
  • 1 reply
  • 0 kudos

When can Horovod be used for an ML problem?

Latest Reply
User16788317466
Databricks Employee
  • 0 kudos

Only when you have a gradient-descent problem. PyTorch and TensorFlow are the only candidate frameworks to use here. When using Horovod, start with single-node, multi-GPU and measure training performance. If this is not sufficient, look at a multi-no...

User16789201666
by Databricks Employee
  • 1495 Views
  • 1 reply
  • 0 kudos

Resolved! With SQL ACL’s, who can DROP a table?

Can the database owner always drop a table?

Latest Reply
User16789201666
Databricks Employee
  • 0 kudos

The table owner or an administrator. Before DBR 7.x, the database owner could also drop a table; as of DBR 7.x, the database owner cannot. This will be changing soon.

Anonymous
by Not applicable
  • 1495 Views
  • 1 reply
  • 1 kudos

What's the best way to develop Apache Spark Jobs from an IDE (such as IntelliJ/Pycharm)?

A number of people like developing locally using an IDE and then deploying. What are the recommended ways to do that with Databricks jobs?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

The Databricks Runtime and Apache Spark use the same base API. One can create Spark jobs that run locally and have them run on Databricks with all available Databricks features. It is required that one uses SparkSession.builder.getOrCreate() to create...

User16783854357
by New Contributor III
  • 1212 Views
  • 1 reply
  • 1 kudos

How to run a Delta Live Table pipeline with a different runtime?

I would like to run a DLT pipeline with the 8.2 runtime.

Latest Reply
User16783854357
New Contributor III
  • 1 kudos

You can add the below JSON property to the Delta Live Tables pipeline specification at the parent level: "dbr_version": "8.2"
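In context, the pipeline settings JSON might look like the following (the pipeline name and notebook path are illustrative; only the `dbr_version` field comes from the reply above):

```json
{
  "name": "example_pipeline",
  "dbr_version": "8.2",
  "libraries": [
    { "notebook": { "path": "/Users/someone/example_notebook" } }
  ]
}
```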

User16776430979
by New Contributor III
  • 2889 Views
  • 0 replies
  • 0 kudos

How to optimize and convert a Spark DataFrame to Arrow?

Example use case: When connecting a sample Plotly Dash application to a large dataset, in order to test the performance, I need the file format to be in either hdf5 or arrow. According to this doc: Optimize conversion between PySpark and pandas DataF...

Anonymous
by Not applicable
  • 1474 Views
  • 1 reply
  • 0 kudos

Resolved! Configuring Airflow

Should we create a Databricks user for Airflow and generate a personal access token for it? We also have GSuite SSO enabled; does that mean I need to create a GSuite account for the user as well?

Latest Reply
User16783855117
Contributor II
  • 0 kudos

I would recommend having the 'user' the Databricks Jobs are triggered by as a dedicated user. This is what I would consider a 'Service Account', and I'll drop a definition for that type of user below. Seeing that you have SSO enabled, I might create th...

Anonymous
by Not applicable
  • 1176 Views
  • 1 reply
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Full support for Databricks Runtime versions lasts for six months, with the exception of Long Term Support (LTS) versions, which Databricks supports for two years. https://docs.databricks.com/release-notes/runtime/databricks-runtime-ver.html

Anonymous
by Not applicable
  • 1328 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16783855117
Contributor II
  • 0 kudos

It really depends on your business intentions! You can remove files that are no longer referenced by a Delta table and are older than the retention threshold by running the VACUUM command on the table. VACUUM is not triggered automatically. The default retent...
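The command itself is a one-liner; a minimal sketch (the table name is illustrative):

```sql
-- Remove unreferenced files older than the default retention threshold.
VACUUM my_delta_table;

-- Or state the retention window explicitly (168 hours = 7 days):
VACUUM my_delta_table RETAIN 168 HOURS;
```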

