Data Engineering

Forum Posts

Anonymous
by Not applicable
  • 530 Views
  • 0 replies
  • 0 kudos

Newline characters mess up the table records

When creating tables from text files that contain newline characters in the middle of lines, the table records end up with null column values, because the embedded newline characters break each line into two separate records and fill up ...
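If the embedded newlines sit inside quoted fields (an assumption; the post does not say), a quote-aware CSV parser keeps each record intact, whereas splitting on raw line breaks does not. In Spark, the CSV reader's `multiLine` option enables this quote-aware behavior, for example `spark.read.option("multiLine", "true").csv(path)`. The sketch below illustrates the difference with Python's standard `csv` module on a hypothetical two-column sample. If the fields are not quoted, a parser generally cannot tell an embedded newline from a record boundary, and the file needs fixing upstream.

```python
import csv
import io

# Hypothetical sample: the second record has a newline inside a quoted field.
RAW = 'id,comment\n1,"short note"\n2,"first line\nsecond line"\n'

def naive_split(text):
    """Split on every newline, which breaks records containing embedded newlines."""
    rows = [line.split(",") for line in text.splitlines()]
    return rows[1:]  # skip the header row

def quote_aware_parse(text):
    """Parse with csv.reader, which keeps a quoted newline inside a single field."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [row for row in reader]

print(naive_split(RAW))        # 3 rows; the last one has only 1 column
print(quote_aware_parse(RAW))  # 2 rows, each with 2 columns
```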

jose_gonzalez
by Moderator
  • 865 Views
  • 1 replies
  • 0 kudos

How often should I vacuum my Delta table?

I would like to know how often I need to vacuum my Delta table to clean up old files.

Latest Reply
RonanStokes_DB
New Contributor III
  • 0 kudos

The requirements for VACUUM depend on your application needs and the rate at which new data arrives. Vacuuming removes old versions of data. If you need to be able to query earlier versions of data many months after the original ingest time, then i...
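As a rough sketch, the retention window passed to VACUUM should be at least as long as the time-travel history you need to query. The example assumes PySpark with the `delta-spark` package. By default Delta keeps 7 days (168 hours) of removed files and rejects shorter VACUUM retention unless a safety check is disabled; the helper below mirrors that default check. The function names are hypothetical. Running `vacuum_table` from a scheduled job, for example daily or weekly, is one common option, not a rule from the post.

```python
# Delta's default retention for removed data files: 7 days = 168 hours.
DEFAULT_RETENTION_HOURS = 7 * 24

def retention_hours(days_of_history: int) -> int:
    """Convert the time-travel window you need into VACUUM retention hours."""
    hours = days_of_history * 24
    if hours < DEFAULT_RETENTION_HOURS:
        # By default Delta rejects retention shorter than 7 days; mirror that check.
        raise ValueError(f"{hours}h is below the default {DEFAULT_RETENTION_HOURS}h retention")
    return hours

def vacuum_table(spark, table_name: str, days_of_history: int = 7) -> None:
    """Delete data files no longer referenced by table versions inside the window."""
    from delta.tables import DeltaTable  # requires the delta-spark package
    DeltaTable.forName(spark, table_name).vacuum(retention_hours(days_of_history))
```

For example, keeping roughly three months of queryable history would mean `vacuum_table(spark, "my_table", days_of_history=90)`, that is, a retention of 2160 hours.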

jose_gonzalez
by Moderator
  • 1329 Views
  • 2 replies
  • 0 kudos

How should I partition my Delta table?

I would like to follow best practices to partition my Delta table. Should I partition by unique ID or date?

Latest Reply
RonanStokes_DB
New Contributor III
  • 0 kudos

Depending on the amount of data per partition, you may also want to consider partitioning by week, month, or quarter. The partitioning decision is often tied to the tiering model of data storage. For a Bronze ingest layer, the optimal partitioning is ...
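One way to act on the "amount of data per partition" advice is to estimate how much data each candidate date granularity would put in a partition, then pick the finest one that still yields reasonably large partitions. The sketch below is a hypothetical helper. The 1 GB minimum is an assumed, commonly cited rule of thumb, and the daily ingest volume is an input you would measure yourself.

```python
# Approximate number of days covered by each candidate granularity, finest first.
GRANULARITY_DAYS = [("day", 1), ("week", 7), ("month", 30), ("quarter", 91)]

def choose_partition_granularity(daily_gb: float, min_partition_gb: float = 1.0) -> str:
    """Return the finest date granularity whose partitions reach min_partition_gb.

    Returns "none" (do not partition by date) when even quarterly
    partitions would stay below the threshold.
    """
    for name, days in GRANULARITY_DAYS:
        if daily_gb * days >= min_partition_gb:
            return name
    return "none"

print(choose_partition_granularity(0.05))  # about 1.5 GB per month
```

The result could then drive a write such as `df.write.format("delta").partitionBy("event_month")`, where `event_month` is a hypothetical derived column.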

ruslan
by New Contributor II
  • 599 Views
  • 1 replies
  • 0 kudos

Does Spark Structured Streaming support `OutputMode.Update` for Delta tables?

Does Spark Structured Streaming support `OutputMode.Update` for Delta tables?

Latest Reply
ruslan
New Contributor II
  • 0 kudos

Nope, it's not supported, but you could use a MERGE statement inside a `foreachBatch` streaming sink.
Documentation on MERGE: https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-merge-into.html
Documentation for arbitrary streaming ...
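A minimal sketch of the MERGE-inside-`foreachBatch` pattern follows. It assumes PySpark with the `delta-spark` package. The target table name `events_target`, the key column `id`, and the `checkpoint_path` argument are hypothetical. Each micro-batch is upserted with `DeltaTable.merge`, which gives update-style results in the target table without relying on the Delta sink's output mode.

```python
from functools import partial

def upsert_batch(spark, target_table: str, key: str, micro_batch_df, batch_id: int) -> None:
    """MERGE one micro-batch into the target Delta table: update matches, insert the rest."""
    from delta.tables import DeltaTable  # requires the delta-spark package

    (
        DeltaTable.forName(spark, target_table)
        .alias("t")
        .merge(micro_batch_df.alias("s"), f"t.{key} = s.{key}")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

def start_upsert_stream(spark, source_df, checkpoint_path: str,
                        target_table: str = "events_target", key: str = "id"):
    """Start a streaming query whose sink calls upsert_batch for every micro-batch."""
    return (
        source_df.writeStream
        # foreachBatch calls fn(batch_df, batch_id); partial binds the other arguments.
        .foreachBatch(partial(upsert_batch, spark, target_table, key))
        # Update mode only changes behavior for aggregating queries.
        .outputMode("update")
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

Delta's MERGE fails when several source rows match the same target row, so a micro-batch that can contain duplicate keys may need to be deduplicated before the merge.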

patputnam-db
by New Contributor II
  • 891 Views
  • 1 replies
  • 0 kudos

When should Change Data Feed be used?

I have a customer (IHAC) who has change data capture (CDC) data flowing into a Delta table. They would like to propagate these changes from this table to another table downstream. Is this a good application for Change Data Feed?

Latest Reply
patputnam-db
New Contributor II
  • 0 kudos

CDF simplifies identifying the set of records that are updated, inserted, or deleted in each version of a Delta table. It avoids having to implement "custom" downstream filtering to identify these changes. This makes it an i...
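CDF tags each change row with a `_change_type` column (`insert`, `update_preimage`, `update_postimage`, or `delete`) plus `_commit_version` and `_commit_timestamp`. Propagating changes downstream usually means keeping only the latest post-change row per key and applying it. The sketch below simulates that step on plain Python dicts instead of Spark DataFrames, so it runs anywhere. The key column `id`, the sample rows, and the function names are hypothetical.

```python
# CDF metadata columns that should not be copied into the downstream table.
CDF_COLUMNS = {"_change_type", "_commit_version", "_commit_timestamp"}

def latest_changes(change_rows, key="id"):
    """Keep only the final change per key, in commit-version order.

    update_preimage rows hold the old values and are skipped; for each key the
    row with the highest _commit_version wins (later rows win ties).
    """
    latest = {}
    for row in change_rows:
        if row["_change_type"] == "update_preimage":
            continue
        k = row[key]
        if k not in latest or row["_commit_version"] >= latest[k]["_commit_version"]:
            latest[k] = row
    return latest

def apply_changes(target, change_rows, key="id"):
    """Apply the collapsed changes to a dict standing in for the downstream table."""
    for k, row in latest_changes(change_rows, key).items():
        if row["_change_type"] == "delete":
            target.pop(k, None)
        else:  # insert or update_postimage
            target[k] = {c: v for c, v in row.items() if c not in CDF_COLUMNS}
    return target

# Hypothetical change rows for versions 1-3 of an upstream table.
SAMPLE_CHANGES = [
    {"id": 1, "name": "a", "_change_type": "insert", "_commit_version": 1},
    {"id": 2, "name": "b", "_change_type": "insert", "_commit_version": 1},
    {"id": 1, "name": "a", "_change_type": "update_preimage", "_commit_version": 2},
    {"id": 1, "name": "a2", "_change_type": "update_postimage", "_commit_version": 2},
    {"id": 2, "name": "b", "_change_type": "delete", "_commit_version": 3},
]
```

In Spark, the change rows would come from reading the upstream table with the `readChangeFeed` and `startingVersion` options. The "apply" step would then typically be a MERGE into the downstream table.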

User16789201666
by Contributor II
  • 1088 Views
  • 1 replies
  • 1 kudos

How do I make recursive calls in a Python/pandas UDF? For example, unzipping arbitrarily nested zip files.

Some zip files contain many other zip files, nested many levels deep. How do you read and parse their content?

Latest Reply
User16789201666
Contributor II
  • 1 kudos

'tail-recurse' is a Python API that can help.
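As an alternative to a recursion helper, nested archives can be walked with an explicit stack, which avoids Python's recursion limit for very deep nesting. The sketch below uses only the standard-library `zipfile` and `io` modules. The function names are hypothetical, and treating any member whose name ends in `.zip` as a nested archive is an assumption. Inside a Python UDF, you could call it on each file's binary content, for example the `content` column produced by Spark's `binaryFile` data source.

```python
import io
import zipfile

def iter_nested_zip(data: bytes, prefix: str = ""):
    """Yield (path, bytes) for every non-zip file inside arbitrarily nested zips.

    Uses an explicit stack instead of recursion, so nesting depth is not
    bounded by Python's recursion limit.
    """
    stack = [(prefix, data)]
    while stack:
        path, blob = stack.pop()
        with zipfile.ZipFile(io.BytesIO(blob)) as archive:
            for name in archive.namelist():
                if name.endswith("/"):
                    continue  # directory entry, no content
                member = archive.read(name)
                full_path = f"{path}{name}"
                if name.lower().endswith(".zip"):
                    stack.append((full_path + "/", member))  # unpack later
                else:
                    yield full_path, member

def make_zip(files: dict) -> bytes:
    """Build an in-memory zip from {name: bytes}; used here to create test input."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()
```

For example, `dict(iter_nested_zip(outer))` on a three-level archive returns paths such as `middle.zip/inner.zip/deep.txt` mapped to each file's bytes.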

RonanStokes_DB
by New Contributor III
  • 854 Views
  • 0 replies
  • 1 kudos

Questions on Bronze / Silver / Gold data set layering

I have a DB-savvy customer who is concerned that their Silver/Gold layers are becoming too expensive. These layers are heavily denormalized, focused on logical business entities (customers, claims, services, etc.), and maintained by MERGEs. They cannot pre...

ruslan
by New Contributor II
  • 812 Views
  • 1 replies
  • 0 kudos

Does Delta Live Tables support MERGE?

Does Delta Live Tables support MERGE?

Latest Reply
ruslan
New Contributor II
  • 0 kudos

Delta Live Tables currently does not support the MERGE statement. This is a work in progress. For now, you could use Structured Streaming with MERGE inside `foreachBatch()`.

User16788317466
by New Contributor II
  • 434 Views
  • 1 replies
  • 0 kudos

When can Horovod be used for an ML problem?

When can Horovod be used for an ML problem?

Latest Reply
User16788317466
New Contributor II
  • 0 kudos

Only when you have a gradient-descent problem. PyTorch and TensorFlow are the main candidate frameworks here. When using Horovod, start with a single node with multiple GPUs and measure training performance. If this is not sufficient, look at a multi-no...
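Below is a skeleton of Horovod data-parallel training with PyTorch, assuming the `horovod` and `torch` packages. The model, the synthetic batches, and the hyperparameters are hypothetical placeholders. Scaling the learning rate by the number of workers follows the usual Horovod recommendation. On Databricks, a function like `train` would typically be launched through `HorovodRunner` from `sparkdl`. The heavy imports are kept inside `train` so the module loads without them.

```python
def scaled_learning_rate(base_lr: float, num_workers: int) -> float:
    """Linear LR scaling: the effective batch size grows with the number of workers."""
    return base_lr * num_workers

def train(base_lr: float = 0.01, epochs: int = 1) -> None:
    """Minimal Horovod + PyTorch training loop (placeholder model and data)."""
    import torch
    import horovod.torch as hvd

    hvd.init()  # the launcher typically starts one process per GPU
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(hvd.local_rank())  # pin this process to its own GPU
    device = torch.device("cuda" if use_cuda else "cpu")

    model = torch.nn.Linear(10, 1).to(device)  # hypothetical placeholder model
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=scaled_learning_rate(base_lr, hvd.size()))
    # Average gradients across workers with allreduce.
    optimizer = hvd.DistributedOptimizer(optimizer,
                                         named_parameters=model.named_parameters())
    # Start every worker from the same weights and optimizer state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    for _ in range(epochs):
        # Synthetic batch; real code would shard a dataset by hvd.rank().
        x = torch.randn(32, 10, device=device)
        y = torch.randn(32, 1, device=device)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
```

Measuring single-node, multi-GPU throughput with this loop before moving to several nodes matches the progression the reply recommends.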

User16789201666
by Contributor II
  • 846 Views
  • 1 replies
  • 0 kudos

Resolved! With SQL ACLs, who can DROP a table?

Can the database owner always drop a table?

Latest Reply
User16789201666
Contributor II
  • 0 kudos

The table owner or an administrator. Before DBR 7.x, the database owner could also drop a table; as of DBR 7.x, the database owner cannot. This will be changing soon.

Anonymous
by Not applicable
  • 928 Views
  • 1 replies
  • 1 kudos

What's the best way to develop Apache Spark jobs from an IDE (such as IntelliJ/PyCharm)?

A number of people like developing locally using an IDE and then deploying. What are the recommended ways to do that with Databricks jobs?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

The Databricks Runtime and Apache Spark use the same base API. You can create Spark jobs that run locally and then run them on Databricks with all available Databricks features. You must use `SparkSession.builder.getOrCreate()` to create...
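Here is a sketch of a job module laid out for IDE development, assuming only PySpark. `getOrCreate()` returns the already-active session when one exists, as on a Databricks cluster. Otherwise it builds a new one, for example a local session when run from an IDE with a pip-installed PySpark. That lets the same entry point serve both cases. The app name, columns, sample rows, and `transform` function are hypothetical. Keeping transformations in plain functions that take and return DataFrames also makes them easy to unit-test locally.

```python
def get_spark(app_name: str = "ide-developed-job"):
    """Return the active SparkSession, or create one when none exists (e.g. locally)."""
    from pyspark.sql import SparkSession  # requires pyspark
    return SparkSession.builder.appName(app_name).getOrCreate()

def transform(df):
    """Hypothetical business logic: keep only rows with a positive amount."""
    return df.filter(df.amount > 0)

def main() -> None:
    """Job entry point, identical for local runs and Databricks job runs."""
    spark = get_spark()
    df = spark.createDataFrame([(1, 10.0), (2, -5.0)], ["id", "amount"])
    transform(df).show()
```

Calling `main()` from a small launcher script, or configuring it as the job's entry point, keeps local and cluster runs on the same code path.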
