Databricks Community

Alessio_F · ‎05-21-2026

Hi everyone,

I'm using Azure Databricks with a customer who has a SQL Server database federated on the Unity Catalog.

It seems that, while converting some date functions to the SQL Server dialect, Databricks uses the function "extract", which is not recognized by SQL Server.

For example, this query in a SQL cell:

WITH cte AS (

SELECT

CAST(year(`Date`) AS int) AS RefYear,

CAST(year(`Date`) AS int) AS RefMonth,

COUNT(*) AS TabACount

FROM tabA

GROUP BY 1, 2

)

SELECT

CAST(year(tabB.`Date`) AS int) AS RefYear,

CAST(month(tabB.`Date`) AS int) AS RefMonth,

MAX(cte.TabACount) AS TabACount,

COUNT(*) AS TabBCount

FROM tabB

JOIN cte

ON CAST(year(tabB.`Date`) AS int) = cte.RefYear

AND CAST(month(tabB.`Date`) AS int) = cte.RefMonth

GROUP BY 1, 2

ORDER BY 1, 2

Let's ignore the fact that this query is not well written and there are functions in joins: I've simplified the original query to avoid to publicly reveal details of the customer's database.

The point is that, using "year" and "month" functions in joins gives the following error:

com.microsoft.sqlserver.jdbc.SQLServerException: 'EXTRACT' is not a recognized built-in function name. at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:265) at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1676) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:620) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:540) at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7627) at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3916) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:268) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:242) at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:459) at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:350) at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406) at org.apache.spark.rdd.RDD.iterator(RDD.scala:373) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60) at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406) at org.apache.spark.rdd.RDD.iterator(RDD.scala:373) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60) at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:409) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406) at org.apache.spark.rdd.RDD.iterator(RDD.scala:373) at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:88) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:87) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:58) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:39) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:201) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:186) at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:151) at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:45) at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104) at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109) at scala.util.Using$.resource(Using.scala:269) at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108) at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:145) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$9(Executor.scala:961) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:107) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:964) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:856) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)

From the exception stack, it seems that somewhere, between the Spark executor and the com.microsoft.sqlserver.jdbc driver, the YEAR and MONTH functions are converted to the SQL EXECUTE function, which is not supported by SQL Server.

Is this a known bug? I'm experiencing it since at least two years.

Any workarounds?

Thanks

szymon_dybczak · ‎05-21-2026

Hi @Alessio_F ,

This happens because in Databricks SQL both year and month functions are just aliases over following patterns:

- extract (YEAR FROM expr)

- extract(MONTH FROM expr)

When Databricks pushes a predicate or expression down to the remote SQL Server via JDBC, it translates these functions back to their canonical EXTRACT(...) form - which is not a recognized in SQL Server's T-SQL dialect.

You can try to rewrite your query or you can try to use remote_query() function which lets you run SQL queries directly against external database using native SQL syntax of th remote system.

Query external databases using the remote_query function | Databricks on Google Cloud

If my answer was helpful, please conisder marking it as accepted solution.

Databricks Community

Extract SQL function in SQL Server federated database

🌟 Community Pulse: Your Weekly Roundup! June 22 – 28, 2026

Solution Accelerator Series | Product Quality Inspection

Upcoming Community BrickTalk: Bringing (Geo)Spatial Awareness to your Conversational Agents

Databricks Community Champion - June 2026 - Amira Bedhiafi

Build apps without jumping through hoops