Data Engineering

Forum Posts

by TylerTamasaucka • New Contributor II

11-18-2019 12:59:11 PM

29304 Views
5 replies
2 kudos

org.apache.spark.sql.AnalysisException: Undefined function: 'MAX'

I am trying to create a JAR for an Azure Databricks job, but some code that works in the notebook interface does not work when the library is called through a job. The weird part is that the job will complete the first run successfully but on an...

Latest Reply

skaja
New Contributor II

10-12-2022 12:57:55 AM

2 kudos

I am facing a similar issue when trying to use the from_utc_timestamp function. I can call the function from a Databricks notebook, but when I use the same function inside my Java JAR and run it as a job in Databricks, it gives the error below. Analys...

by osoucy • New Contributor II

09-08-2022 10:10:56 AM

1234 Views
0 replies
1 kudos

Is it possible to join two aggregated streams of data?

Objective: Within the context of a Delta Live Table, I'm trying to merge two stream aggregations, but I run into challenges. Is it possible to achieve such a join? Context: Assume - table trades stores a list of trades with their associated timestamps - tabl...

by KKDataEngineer • New Contributor III

01-28-2022 8:21:11 AM

1372 Views
0 replies
2 kudos

Spark Structured Streaming: an aggregation DF with watermark in append mode to a Delta table is not writing the most recent aggregation to the Delta table even after crossing the watermark boundary. This is causing data loss

Team, I am struggling with a unique issue. I am not sure if my understanding is wrong or if this is a bug in Spark. I am reading a stream from Event Hubs (Extract), then pivoting and aggregating the above DataFrame (Transformation). This is a WATERMARKED...
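For readers hitting the same symptom, here is a toy model in plain Python of how append mode holds back a window until the watermark reaches the window's end. It is only a sketch under stated assumptions: it does not use Spark, it ignores late-data dropping, and the function name `simulate_append_mode`, the window size, and the delay are all made up for illustration. The only behavior it mirrors is that the watermark trails the maximum event time seen so far, so the newest window stays unwritten until later events arrive.

```python
from collections import defaultdict


def simulate_append_mode(batches, window_size, delay):
    """Toy model (not Spark): list the windows emitted after each batch.

    A tumbling window [start, start + window_size) is emitted only once
    watermark = max event time seen - delay has reached the window's end.
    """
    counts = defaultdict(int)  # window start -> number of events
    max_event_time = None
    emitted_per_batch = []
    for batch in batches:
        for event_time in batch:
            counts[event_time - event_time % window_size] += 1
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
        if max_event_time is None:
            # No events seen yet, so there is no watermark.
            emitted_per_batch.append([])
            continue
        watermark = max_event_time - delay
        ready = sorted(s for s in counts if s + window_size <= watermark)
        emitted_per_batch.append(
            [(s, s + window_size, counts.pop(s)) for s in ready]
        )
    return emitted_per_batch


# Hypothetical event times (seconds), 10-second windows, 5-second delay.
result = simulate_append_mode([[1, 3, 7], [12, 14], [27]], window_size=10, delay=5)
print(result)
```

In this run the windows [0, 10) and [10, 20) only appear after the event at time 27 arrives, and the window [20, 30) is still held at the end. If the stream goes quiet, the last window would stay unwritten in this model, which looks like data loss even though nothing has been discarded.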

by FrancisLau • New Contributor

07-30-2015 8:58:10 PM

4304 Views
2 replies
0 kudos

Resolved! agg function not working for multiple aggregations

Data has 2 columns:

|requestDate|requestDuration|
| 2015-06-17|            104|

Here is the code:

avgSaveTimesByDate = gridSaves.groupBy(gridSaves.requestDate).agg({"requestDuration": "min", "requestDuration": "max","requestDuration": "avg"})
avgSaveTimesBy...

Latest Reply

ReKa
New Contributor III

11-12-2016 12:41:47 PM

0 kudos

My guess is that this does not work because the dictionary input does not have unique keys. With this syntax, column names are keys, and if you have two or more aggregations for the same column, some internal loops may forget the no...
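The duplicate-key point can be checked in plain Python: a dict literal with a repeated key keeps only the last value, so the dictionary from the question holds a single aggregation before Spark ever sees it. The sketch below shows that, plus one possible way around it using column expressions from `pyspark.sql.functions`. The helper name `aggregate_durations` and the output aliases are made up for illustration, and the helper assumes a DataFrame with the `requestDate` and `requestDuration` columns from the question.

```python
# A dict literal with a repeated key keeps only the last value.
agg_spec = {"requestDuration": "min", "requestDuration": "max", "requestDuration": "avg"}
print(agg_spec)  # only the "avg" entry survives


def aggregate_durations(grid_saves):
    """Hypothetical helper: several aggregations over the same column.

    Assumes grid_saves is a PySpark DataFrame with requestDate and
    requestDuration columns. pyspark is imported here so that the dict
    demonstration above runs even where pyspark is not installed.
    """
    from pyspark.sql import functions as F

    return grid_saves.groupBy("requestDate").agg(
        F.min("requestDuration").alias("minDuration"),
        F.max("requestDuration").alias("maxDuration"),
        F.avg("requestDuration").alias("avgDuration"),
    )
```

Passing column expressions instead of a dict avoids the key collision, because each aggregation is a separate argument rather than a value stored under the same column name.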

Databricks Community
