When creating tables from text files containing newline characters in the middle of the lines, the table records will have null column values because the newline characters in the middle of the lines break the lines into two different records and fill up ...
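The failure described above can be reproduced in plain Python. This is a minimal sketch that assumes the embedded newlines sit inside quoted fields; the sample data and variable names are made up for illustration. Splitting on every newline produces an extra, incomplete record, while a quote-aware parser keeps the record whole.

```python
import csv
import io

# Hypothetical sample: the second record has a newline inside a quoted field.
raw = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Naive approach: treat every newline as a record boundary.
naive_rows = [line.split(",") for line in raw.splitlines()]

# Quote-aware approach: the csv module keeps the quoted newline inside one field.
parsed_rows = list(csv.reader(io.StringIO(raw)))

print(len(naive_rows))   # header + 3 fragments: the record was split in two
print(len(parsed_rows))  # header + 2 complete records

# In Spark, the CSV reader has a comparable switch (path is a placeholder):
# spark.read.option("header", True).option("multiLine", True).csv(path)
```

If the newlines are not inside quoted fields, a quote-aware parser cannot tell them apart from record boundaries, and the file would need cleaning before load.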
When we run the SQL statements "DROP TABLE .... CREATE TABLE" for the same table in multiple places (different notebooks, jobs, ...), some notebooks may not see the most recent schema / content.
To speed up cluster provisioning when using Container Services, how do you pre-install it on a pool?
I would like to know how often I need to vacuum my Delta table to clean up old files.
The requirements for VACUUM will depend on your application needs and the rate of arrival of new data. Vacuuming removes old versions of data. If you need to be able to query earlier versions of data many months after the original ingest time, then i...
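A minimal sketch of tying the vacuum retention window to how far back you need to query, assuming the Delta Lake Python API (`delta.tables.DeltaTable`) is available on the cluster. The helper names `retention_hours` and `vacuum_table` and the table path are hypothetical. The 7-day floor mirrors Delta's default retention threshold; it is a guard chosen for this sketch, not a rule from the answer above.

```python
DEFAULT_MIN_RETENTION_DAYS = 7  # Delta's default retention threshold is 7 days


def retention_hours(days_of_history):
    """Convert the number of days of history to keep into hours for vacuum."""
    if days_of_history < DEFAULT_MIN_RETENTION_DAYS:
        # Delta rejects shorter windows unless its retention check is disabled.
        raise ValueError("retention shorter than 7 days is unsafe by default")
    return days_of_history * 24


def vacuum_table(spark, table_path, days_of_history):
    """Remove files no longer referenced and older than the retention window."""
    from delta.tables import DeltaTable  # imported lazily; needs delta-spark

    DeltaTable.forPath(spark, table_path).vacuum(retention_hours(days_of_history))


# Example (not executed here): keep 90 days of time travel on a placeholder path.
# vacuum_table(spark, "/mnt/delta/events", days_of_history=90)
```

How often to schedule this is still an application decision: a longer window keeps more history queryable at the cost of more storage.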
I would like to follow best practices to partition my Delta table. Should I partition by unique ID or date?
Depending on the amount of data per partition - you may also want to consider partitioning by week, month or quarter. The partitioning decision is often tied to the tiering model of data storage. For a Bronze ingest layer, the optimal partitioning is ...
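To make the day, week, month and quarter options concrete, here is a small sketch of how a single date maps to each partition grain. The helper `partition_value` and its label formats are hypothetical choices for illustration; in Spark, a comparable column could be derived with `pyspark.sql.functions.date_trunc` and passed to `partitionBy`.

```python
from datetime import date


def partition_value(day, grain):
    """Return a partition label for `day` at the requested grain."""
    if grain == "day":
        return day.isoformat()
    if grain == "week":
        iso_year, iso_week, _ = day.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if grain == "month":
        return f"{day.year}-{day.month:02d}"
    if grain == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    raise ValueError(f"unknown grain: {grain}")


sample = date(2021, 5, 17)
for grain in ("day", "week", "month", "quarter"):
    print(grain, partition_value(sample, grain))
```

Coarser grains mean fewer, larger partitions, which is the trade-off the answer above points at when the amount of data per day is small.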
Does Spark Structured Streaming support `OutputMode.Update` for Delta tables?
Nope, it's not supported, but you could use a MERGE statement inside a foreachBatch streaming sink. Documentation on MERGE: https://docs.databricks.com/spark/latest/spark-sql/language-manual/delta-merge-into.html Documentation for arbitrary streaming ...
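The MERGE-inside-foreachBatch pattern could look roughly like the sketch below. It assumes the Delta Lake Python API is installed and that the target table already exists at `target_path`; the helper names, paths, and key columns are hypothetical. Each micro-batch is upserted into the target with `DeltaTable.merge`.

```python
def build_merge_condition(key_columns, target_alias="t", source_alias="s"):
    """Build the join condition that matches source rows to target rows."""
    return " AND ".join(
        f"{target_alias}.{col} = {source_alias}.{col}" for col in key_columns
    )


def make_upsert_fn(spark, target_path, key_columns):
    """Return a function suitable for DataStreamWriter.foreachBatch."""

    def upsert_batch(micro_batch_df, batch_id):
        from delta.tables import DeltaTable  # imported lazily; needs delta-spark

        target = DeltaTable.forPath(spark, target_path)
        (
            target.alias("t")
            .merge(micro_batch_df.alias("s"), build_merge_condition(key_columns))
            .whenMatchedUpdateAll()      # update rows whose keys already exist
            .whenNotMatchedInsertAll()   # insert rows with new keys
            .execute()
        )

    return upsert_batch


# Example wiring (not executed here; paths and key column are placeholders):
# (stream_df.writeStream
#     .foreachBatch(make_upsert_fn(spark, "/mnt/delta/target", ["id"]))
#     .option("checkpointLocation", "/mnt/checkpoints/target")
#     .start())
```

If a micro-batch can contain several rows for the same key, it usually needs to be deduplicated before the merge, since MERGE fails when multiple source rows match one target row.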
I have a customer who has Change Data Capture data flowing into a Delta table. They would like to propagate these changes from this table into another table downstream. Is this a good application for using Change Data Feed?
CDF simplifies the process of identifying the set of records that are updated, inserted, or deleted with each version of a Delta table. It helps to avoid having to implement downstream 'custom' filtration to identify these changes. This makes it an i...
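To illustrate what propagating changes downstream involves, here is a plain-Python sketch that applies change rows to a downstream copy. The `_change_type` values (`insert`, `update_preimage`, `update_postimage`, `delete`) and the `_commit_version` column follow the Change Data Feed output; the function name `apply_changes`, the `id` key, and the dictionary-based "table" are hypothetical stand-ins for a real MERGE.

```python
def apply_changes(downstream, change_rows, key="id"):
    """Apply change-feed rows, in commit order, to a dict keyed by `key`."""
    ordered = sorted(change_rows, key=lambda row: row["_commit_version"])
    for row in ordered:
        change_type = row["_change_type"]
        if change_type == "update_preimage":
            continue  # the old value is not needed to bring the copy up to date
        if change_type == "delete":
            downstream.pop(row[key], None)
        elif change_type in ("insert", "update_postimage"):
            # Drop the metadata columns (they start with "_") before storing.
            downstream[row[key]] = {
                name: value for name, value in row.items() if not name.startswith("_")
            }
    return downstream


# Reading the feed in Spark (not executed here; the table name is a placeholder,
# and the source table needs delta.enableChangeDataFeed = true):
# spark.read.format("delta").option("readChangeFeed", "true") \
#     .option("startingVersion", 1).table("source_table")
```

The point of the sketch is that the downstream job only has to branch on `_change_type`, rather than diffing table versions itself.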
I have zip files that contain many other zip files, nested several levels deep. How do you read and parse the content?
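One possible approach, sketched with only the Python standard library: unpack each archive in memory and recurse whenever an entry is itself a zip. The function name `iter_nested_zip` and the path format are hypothetical, and the sketch assumes each archive fits in memory.

```python
import io
import zipfile


def iter_nested_zip(data, prefix=""):
    """Yield (path, bytes) for every non-zip file, descending into nested zips."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            payload = archive.read(info)
            path = f"{prefix}{info.filename}"
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # The entry is itself an archive: recurse one level deeper.
                yield from iter_nested_zip(payload, prefix=path + "/")
            else:
                yield path, payload


# Example (not executed here; the file name is a placeholder):
# with open("outer.zip", "rb") as handle:
#     for path, content in iter_nested_zip(handle.read()):
#         print(path, len(content))
```

Note that `zipfile.is_zipfile` also returns true for zip-based formats such as `.jar` or `.xlsx`, so those would be expanded too unless filtered by name. On Spark, the outer files could be loaded with the `binaryFile` data source and a function like this applied to each file's content.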
I have a DB-savvy customer who is concerned their silver/gold layer is becoming too expensive. These layers are heavily denormalized, focused on logical business entities (customers, claims, services, etc), and maintained by MERGEs. They cannot pre...
No, unfortunately Delta Live Tables only supports append. However, Merge is likely to be added in the near future.
Does Delta Live Tables support MERGE?
Delta Live Tables currently does not support the MERGE statement. This is a work in progress. For now, you could use Structured Streaming + MERGE inside a foreachBatch().
When can Horovod be used for an ML problem?
Only when you have a gradient-descent problem. PyTorch and TensorFlow are the only candidate frameworks to use here. When using Horovod, start with a single node, multi-GPU, and measure training performance. If this is not sufficient, look at a multi-no...
Can the database owner always drop a table?
Only the table owner or an administrator can. Before DBR 7.x, the database owner could drop a table; as of DBR 7.x, the database owner cannot. This will be changing soon.
A number of people like developing locally using an IDE and then deploying. What are the recommended ways to do that with Databricks jobs?
The Databricks Runtime and Apache Spark use the same base API. One can create Spark jobs that run locally and have them run on Databricks with all available Databricks features. It is required that one uses SparkSession.builder.getOrCreate() to create...