Databricks Community

User16857281974 · 07-30-2021

@Ryan Chynoweth and @Sean Owen are both right, but I have a different perspective on this. Quick side note: you can also configure your cluster to execute with only a driver, thus reducing the cost to that of the cheapest single VM available. In the cl...

User16857281974 · 07-30-2021

This article is based in part on the Databricks Academy course Optimizing Apache Spark on Databricks. These courses are 100% free, and this one goes a bit deeper into the considerations required for making this decision, including usag...

User16857281974 · 07-30-2021

Databricks' curriculum team solved this problem by creating our own Maven repo, and it's easier than it sounds. To do this, we took an S3 bucket, converted it to a public website to allow standard file downloads, and then within that bucket creat...

User16857281974 · 07-30-2021

Apache Spark does not have the features of a relational database, where you can, for example, search on a primary key. Generally speaking, it is forced to read in 100% of the data, which hurts performance at gigabyte+ scales, and test every singl...

User16857281974 · 07-30-2021

You would solve this just like we solve this problem for all loose string references: create a constant that represents the key-value you want to ensure doesn't get mistyped. Naturally, if you type it wrong the first time, it will be...
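
A minimal sketch of the constant approach from the post above, written in Python. The constant names and the `apply_settings` helper are hypothetical and not from the original post; the two key strings are standard Spark SQL settings. A plain dict stands in for a live session so the snippet runs without Spark installed; with a real session you would pass `spark.conf.set` instead.

```python
# Hypothetical constant names; the key strings are standard Spark SQL settings.
SHUFFLE_PARTITIONS = "spark.sql.shuffle.partitions"
AQE_ENABLED = "spark.sql.adaptive.enabled"


def apply_settings(conf_set, settings):
    """Apply each key/value pair through conf_set (for example, spark.conf.set)."""
    for key, value in settings.items():
        conf_set(key, value)


# Stand-in for a live session: a dict collects what would be sent to Spark.
applied = {}
apply_settings(applied.__setitem__, {SHUFFLE_PARTITIONS: "8", AQE_ENABLED: "true"})
```

With this layout, misspelling a constant name (say, `SHUFFLE_PARTITONS`) fails immediately with a `NameError`, whereas a misspelled string literal would be passed along as just another key.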

User Activity

Re: How large should a dataset be so that it’s worth using Spark?

Re: Cluster Sizing

Re: Issue loading spark Scala library

Re: what are the benefits to do use Z-Ordering

Re: Is there a way to validate the values of spark configs?