I have noticed some inconsistent behavior between calling the 'split' function on Databricks and on my local installation. Running it in a Databricks notebook gives: spark.sql("SELECT split('abc', ''), size(split('abc', ''))").show() So the string is split...
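As an illustration of why empty-pattern splits can differ between environments (this is a pure-Python sketch, not Spark code): regex engines disagree on whether the zero-width matches at the ends of the string produce empty elements. Python's own `re.split`, for example, keeps both:

```python
import re

# Splitting on an empty pattern matches at every position, including the
# zero-width positions before 'a' and after 'c' (Python 3.7+), so the
# result carries an empty string at each end:
parts = re.split('', 'abc')
# parts == ['', 'a', 'b', 'c', '']
```

Java's `String.split` (which Spark's SQL `split` builds on) trims some of these empty elements depending on the limit argument, which is one plausible source of a size-3 vs size-4 discrepancy between builds.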
When running some SQL queries using spark.sql(...), we sometimes get a variant of the following error: AnalysisException: Undefined function: current_timestamp. This function is neither a built-in/temporary function, nor a persistent function that is ...
Let's say I have a DataFrame with a timestamp column and an offset column in milliseconds, in timestamp and long format respectively. E.g.: from datetime import datetime
df = spark.createDataFrame(
    [
        (datetime(2021, 1, 1), 1500),
        (dat...
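Assuming the goal is to shift each timestamp by its row's millisecond offset, the underlying arithmetic can be sketched in plain Python with the stdlib (the Spark equivalent would express the same addition as a column expression):

```python
from datetime import datetime, timedelta

def add_millis(ts: datetime, offset_ms: int) -> datetime:
    """Shift a timestamp forward by an offset given in milliseconds."""
    return ts + timedelta(milliseconds=offset_ms)

# Using the sample row above: 1500 ms past midnight on 2021-01-01.
shifted = add_millis(datetime(2021, 1, 1), 1500)
# shifted == datetime(2021, 1, 1, 0, 0, 1, 500000)
```

The function name `add_millis` is just a hypothetical helper for illustration; in Spark the same per-row computation would be done on the timestamp and long columns directly.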
I have a MERGE INTO statement that I use to update existing entries or create new entries in a dimension table, based on a natural business key. When creating new entries, I would like to also create a unique UUID for that entry that I can use to crossr...
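One common approach (a hedged sketch, assuming Spark SQL's built-in uuid() function is acceptable) is to generate the surrogate key inside the WHEN NOT MATCHED THEN INSERT clause, e.g. VALUES (uuid(), ...). The per-row generation it performs is equivalent to this plain-Python illustration:

```python
import uuid

def new_dimension_row(business_key: str) -> dict:
    # Hypothetical helper: attach a random version-4 surrogate UUID to a
    # new dimension entry; in Spark SQL this role is played by uuid()
    # in the MERGE's INSERT clause.
    return {
        "business_key": business_key,
        "surrogate_key": str(uuid.uuid4()),  # 36-char canonical form
    }

row = new_dimension_row("customer-42")
```

Because the UUID is random, each unmatched row gets a distinct key, which makes it safe to use for cross-referencing from other tables.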
Is there a way to resolve this issue without using ML clusters? Due to our current setup, I'm limited in which clusters I can create manually, and a quick workaround for development purposes would be helpful here.
Thank you for the suggestion, but even with the same Spark version there seems to be a difference between what happens locally and what happens on a Databricks cluster.
Hi, my Databricks cluster runs Spark 3.3 but does give a length of 3. Is there something different about the Databricks implementation of PySpark, or should it follow the same standards?