Some Tips & Tricks for Optimizing Costs and Performance (Clusters and Ganglia): [Note: This list is not exhaustive] Leverage the DataFrame or SparkSQL APIs first. They use the same execution process, resulting in parity in performance, but they also com...
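As a quick sketch of that parity (the table, column names, and values below are illustrative, not from the original post): the same filter expressed through the DataFrame API and through Spark SQL goes through the same Catalyst optimizer, so explain() prints equivalent physical plans for both.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("parity-demo").getOrCreate()
import spark.implicits._

// Illustrative data set
val sales = Seq(("US", 100), ("DE", 250), ("US", 300)).toDF("country", "amount")
sales.createOrReplaceTempView("sales")

// DataFrame API version
val viaApi = sales.filter($"amount" > 200)

// Spark SQL version of the same query
val viaSql = spark.sql("SELECT * FROM sales WHERE amount > 200")

// Both print equivalent physical plans
viaApi.explain()
viaSql.explain()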
I'm relatively new to Scala. In the past, I was able to do the following in Python:
def foo(p1, p2):
    import datetime as dt
    dt.datetime(2014, 4, 17, 12, 34)
    result = [
        (1, "1", 1.1, dt.datetime(2014, 4, 17, 1, 0)),
        (2, "2", 2...
Hello,
Just in case, here is an example of the proposed solution above:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.types._
val data = Seq(("A", Seq((3,4),(5,6),(7,10))), ("B", Seq((-1,...
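Since the snippet is cut off at the second row, here is a self-contained sketch in the same shape. The column names, the second row's values, and the explode step are assumptions about where the example was going.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("explode-demo").getOrCreate()
import spark.implicits._

// Same shape as the truncated example: a label plus a sequence of pairs
val data = Seq(
  ("A", Seq((3, 4), (5, 6), (7, 10))),
  ("B", Seq((-1, 0), (2, 8)))  // assumed values; the original row is cut off
)

val df = data.toDF("id", "pairs")

// explode produces one row per element of the array column;
// tuple elements surface as struct fields _1 and _2
val flat = df
  .select($"id", explode($"pairs").as("pair"))
  .select($"id", $"pair._1".as("start"), $"pair._2".as("end"))

flat.show()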