Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

ruslan
by New Contributor II
  • 997 Views
  • 1 reply
  • 0 kudos

Does Delta Live Tables support MERGE?
Latest Reply
ruslan
New Contributor II
  • 0 kudos

Delta Live Tables does not currently support the MERGE statement; this is a work in progress. For now, you could use Structured Streaming with a MERGE inside a foreachBatch().
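The foreachBatch() workaround described above can be sketched roughly as follows. This is a minimal illustration, not the DLT feature itself: it assumes a Spark session with Delta Lake available, and the table names and join key are placeholders.

```python
# Sketch: upsert each streaming micro-batch into a Delta table with MERGE.
# Assumes a Databricks/Spark environment with Delta Lake; names are illustrative.
from delta.tables import DeltaTable

def upsert_to_delta(micro_batch_df, batch_id):
    target = DeltaTable.forName(micro_batch_df.sparkSession, "target_table")
    (target.alias("t")
        .merge(micro_batch_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream.table("source_table")
    .writeStream
    .foreachBatch(upsert_to_delta)   # MERGE runs once per micro-batch
    .start())
```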

  • 0 kudos
User16788317466
by New Contributor II
  • 657 Views
  • 1 reply
  • 0 kudos

When can Horovod be used for an ML problem?
Latest Reply
User16788317466
New Contributor II
  • 0 kudos

Only when you have a gradient-descent problem. PyTorch and TensorFlow are the only candidate frameworks to use here. When using Horovod, start with single-node, multi-GPU training and measure performance. If that is not sufficient, look at a multi-no...
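The single-node, multi-GPU starting point mentioned above looks roughly like this with Horovod and PyTorch; a minimal sketch, assuming `horovod[torch]` is installed and launched with one process per GPU (e.g. via `horovodrun`), with the model and learning rate purely illustrative:

```python
# Minimal Horovod + PyTorch skeleton (illustrative only).
import horovod.torch as hvd
import torch

hvd.init()                               # one process per GPU
torch.cuda.set_device(hvd.local_rank())  # pin each process to its local GPU

model = torch.nn.Linear(10, 1).cuda()
# Common practice: scale the learning rate by the number of workers.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
# Make every worker start from the same weights.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
# ...standard training loop; gradients are averaged across workers...
```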

  • 0 kudos
User16789201666
by Contributor II
  • 1126 Views
  • 1 reply
  • 0 kudos

Resolved! With SQL ACLs, who can DROP a table?

Can the database owner always drop a table?
Latest Reply
User16789201666
Contributor II
  • 0 kudos

The table owner or an administrator. Before DBR 7.x, the database owner could also drop a table; as of DBR 7.x, the database owner cannot. This will be changing soon.

  • 0 kudos
Anonymous
by Not applicable
  • 1206 Views
  • 1 reply
  • 1 kudo

What's the best way to develop Apache Spark Jobs from an IDE (such as IntelliJ/PyCharm)?

A number of people like developing locally using an IDE and then deploying. What are the recommended ways to do that with Databricks jobs?
Latest Reply
Anonymous
Not applicable
  • 1 kudos

The Databricks Runtime and Apache Spark use the same base API. You can create Spark jobs that run locally and then run them on Databricks with all available Databricks features. It is required that one uses SparkSession.builder.getOrCreate() to create...
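A job entry point following this pattern might look like the sketch below; it assumes `pyspark` is installed locally (e.g. `pip install pyspark`), and the app name and sample data are placeholders:

```python
# Sketch: an entry point that runs locally from an IDE and on Databricks unchanged.
from pyspark.sql import SparkSession

def main():
    # getOrCreate() attaches to the cluster's existing session on Databricks,
    # or builds a fresh local session when run from an IDE.
    spark = SparkSession.builder.appName("my-job").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    print(df.count())

if __name__ == "__main__":
    main()
```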

  • 1 kudos
User16783854357
by New Contributor III
  • 908 Views
  • 1 reply
  • 1 kudo

How to run a Delta Live Table pipeline with a different runtime?

I would like to run a DLT pipeline with the 8.2 runtime.
Latest Reply
User16783854357
New Contributor III
  • 1 kudos

You can add the following JSON property at the parent level of the Delta Live Tables pipeline specification: "dbr_version": "8.2"
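For context, a minimal pipeline settings file with this property at the parent level might look like the sketch below; the pipeline name and notebook path are placeholders, and only the `dbr_version` field comes from the answer above:

```json
{
  "name": "my_dlt_pipeline",
  "dbr_version": "8.2",
  "libraries": [
    { "notebook": { "path": "/Repos/me/pipelines/my_pipeline" } }
  ]
}
```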

  • 1 kudos
Anonymous
by Not applicable
  • 1065 Views
  • 1 reply
  • 0 kudos

Resolved! Configuring Airflow

Should we create a Databricks user for Airflow and generate a personal access token for it? We also have G Suite SSO enabled; does that mean I need to create a G Suite account for the user as well?
Latest Reply
User16783855117
Contributor II
  • 0 kudos

I would recommend having the 'user' that the Databricks Jobs are triggered by be a dedicated user. This is what I would consider a 'Service Account', and I'll drop a definition for that type of user below. Seeing that you have SSO enabled, I might create th...
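Once a dedicated user and its personal access token exist, wiring Airflow to Databricks might look roughly like this; a sketch using the Airflow Databricks provider, where the connection id, cluster spec, and notebook path are all placeholders, and the token is stored in the Airflow connection rather than in code:

```python
# Sketch: trigger a Databricks notebook run from Airflow.
# Assumes apache-airflow-providers-databricks is installed and a
# "databricks_default" connection holds the host + personal access token.
from datetime import datetime
from airflow import DAG
from airflow.providers.databricks.operators.databricks import (
    DatabricksSubmitRunOperator,
)

with DAG("databricks_job", start_date=datetime(2021, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    run_notebook = DatabricksSubmitRunOperator(
        task_id="run_notebook",
        databricks_conn_id="databricks_default",
        new_cluster={"spark_version": "8.2.x-scala2.12",
                     "node_type_id": "i3.xlarge",
                     "num_workers": 2},
        notebook_task={"notebook_path": "/Repos/etl/daily_load"},
    )
```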

  • 0 kudos
Anonymous
by Not applicable
  • 883 Views
  • 1 reply
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Full support for Databricks Runtime versions lasts for six months, with the exception of Long Term Support (LTS) versions, which Databricks supports for two years. https://docs.databricks.com/release-notes/runtime/databricks-runtime-ver.html

  • 0 kudos
Anonymous
by Not applicable
  • 1002 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16783855117
Contributor II
  • 0 kudos

It really depends on your business intentions! You can remove files that are no longer referenced by a Delta table and are older than the retention threshold by running the VACUUM command on the table. VACUUM is not triggered automatically. The default retent...
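The VACUUM command mentioned above can be run as plain SQL; a minimal sketch, where the table name is a placeholder:

```sql
-- Remove files no longer referenced by the table and older than the
-- retention threshold (the default threshold is 7 days).
VACUUM my_delta_table;

-- Or state the threshold explicitly, in hours (168 hours = 7 days).
VACUUM my_delta_table RETAIN 168 HOURS;
```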

  • 0 kudos
Anonymous
by Not applicable
  • 1081 Views
  • 2 replies
  • 0 kudos

Resolved! Best practices to query logs

We dump our logs in S3 currently. Can you give us best practices to make these logs easier to query?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

And if it is generic logs that get landed on S3, it'd be worth taking a look at Auto Loader. Here is a blog post on processing CrowdStrike logs in a similar way.
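An Auto Loader ingestion of S3 logs might look roughly like this; a sketch that assumes a Databricks environment (the `cloudFiles` source is Databricks-only), with the bucket paths, log format, and table name all placeholders:

```python
# Sketch: incrementally ingest JSON logs landing on S3 with Auto Loader.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .load("s3://my-bucket/logs/"))

# Write to a queryable table, tracking progress via a checkpoint.
(df.writeStream
   .option("checkpointLocation", "s3://my-bucket/checkpoints/logs/")
   .toTable("raw_logs"))
```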

  • 0 kudos
1 More Replies
Anonymous
by Not applicable
  • 3199 Views
  • 1 reply
  • 0 kudos

Resolved! Backfill Delta table

What is the recommended way to backfill a Delta table using a series of smaller date-partitioned jobs?
Latest Reply
User16783855117
Contributor II
  • 0 kudos

Another approach you might consider is creating a template notebook that queries a known date range via widgets: for example, two date widgets, start time and end time. Then from there you could use Databricks Jobs to update these parameters for each ru...
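The widget-driven template notebook described above can be sketched as follows; this assumes a Databricks notebook (where `dbutils` and `spark` are provided), and the widget names, table names, and date column are illustrative:

```python
# Sketch: a parameterized backfill notebook driven by date widgets.
dbutils.widgets.text("start_time", "2021-01-01")
dbutils.widgets.text("end_time", "2021-01-02")

start_time = dbutils.widgets.get("start_time")
end_time = dbutils.widgets.get("end_time")

# Backfill just the slice covered by this run's parameters.
df = (spark.table("source_table")
      .where(f"event_date >= '{start_time}' AND event_date < '{end_time}'"))
df.write.format("delta").mode("append").saveAsTable("target_table")
```

A Databricks Job can then invoke this notebook repeatedly, advancing the two parameters one date range at a time.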

  • 0 kudos
User16790091296
by Contributor II
  • 704 Views
  • 0 replies
  • 5 kudos

Some Tips & Tricks for Optimizing costs and performance (Clusters and Ganglia)

Some Tips & Tricks for Optimizing costs and performance (Clusters and Ganglia): [Note: This list is not exhaustive] Leverage the DataFrame or Spark SQL APIs first. They use the same execution process, resulting in parity in performance, but they also com...

Anonymous
by Not applicable
  • 2431 Views
  • 1 reply
  • 0 kudos

Resolved! Delta vs Parquet

When does it make sense to use Delta over Parquet? Are there any instances when you would rather use Parquet?
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Users should almost always choose Delta over Parquet. Keep in mind that Delta is a storage format that sits on top of Parquet, so write performance for both formats is similar. However, reading and transforming data with Delta is almost a...

  • 0 kudos
