cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

TylerTamasaucka
by New Contributor II
  • 28714 Views
  • 5 replies
  • 2 kudos

org.apache.spark.sql.AnalysisException: Undefined function: 'MAX'

I am trying to create a JAR for a Azure Databricks job but some code that works when using the notebook interface does not work when calling the library through a job. The weird part is that the job will complete the first run successfully but on an...

  • 28714 Views
  • 5 replies
  • 2 kudos
Latest Reply
skaja
New Contributor II
  • 2 kudos

I am facing similar issue when trying to use from_utc_timestamp function. I am able to call the function from databricks notebook but when I use the same function inside my java jar and running as a job in databricks, it is giving below error. Analys...

  • 2 kudos
4 More Replies
osoucy
by New Contributor II
  • 1126 Views
  • 0 replies
  • 1 kudos

Is it possible to join two aggregated streams of data?

ObjectiveWithin the context of a delta live table, I'm trying to merge two streams aggregation, but run into challenges. Is it possible to achieve such a join?ContextAssume- table trades stores a list of trades with their associated time stamps- tabl...

  • 1126 Views
  • 0 replies
  • 1 kudos
KKDataEngineer
by New Contributor III
  • 1250 Views
  • 0 replies
  • 2 kudos

Spark Structred Streaming, An Aggregation DF with Watermark in Append mode to Delta table is not writing the most recent aggregation to the Delta table even after crossing the water mark boundary. This is causing dataloss

Team,  I am struggling with a unique issue. I am not sure if my understanding is wrong or this is a bug with spark. I am reading a stream from events hub ( Extract) Pivoting and Aggregating the above dataframe ( Transformation). This is a WATERMARKED...

  • 1250 Views
  • 0 replies
  • 2 kudos
FrancisLau
by New Contributor
  • 3968 Views
  • 2 replies
  • 0 kudos

Resolved! agg function not working for multiple aggregations

Data has 2 columns: |requestDate|requestDuration| | 2015-06-17| 104| Here is the code: avgSaveTimesByDate = gridSaves.groupBy(gridSaves.requestDate).agg({"requestDuration": "min", "requestDuration": "max","requestDuration": "avg"}) avgSaveTimesBy...

  • 3968 Views
  • 2 replies
  • 0 kudos
Latest Reply
ReKa
New Contributor III
  • 0 kudos

My guess is that the reason this may not work is the fact that the dictionary input does not have unique keys. With this syntax, column-names are keys and if you have two or more aggregation for the same column, some internal loops may forget the no...

  • 0 kudos
1 More Replies
Labels