12-05-2024 10:21 AM
Hi All,
I am wondering how you would translate either of the queries below to Spark SQL in Databricks. They are more or less equivalent statements in T-SQL.
Please note that I am attempting to pair each unique Policy (IPI_ID) record with its highest-numbered Location (IL_ID) record. There can be many Location records for each Policy record. The Location table links to the Policy table via Policy.IPI_ID = Location.IL_IPI_ID.
I have tried to use LIMIT 1 in several ways (example further below), but I either receive errors or get results that do not match.
Any help or suggestions are appreciated!
T-SQL:
select
    ipi.IPI_ID
    ,loc.IL_ID
from Policy ipi
outer apply
(
    select top 1 il.IL_ID
    from Location il
    where il.IL_IPI_ID = ipi.IPI_ID
    order by
        il.IL_ID desc
) loc
--
select
    ipi.IPI_ID
    ,il.IL_ID
from Policy ipi
left join Location il
    on il.IL_ID =
    (
        select top 1 il2.IL_ID
        from Location il2
        where il2.IL_IPI_ID = ipi.IPI_ID
        order by
            il2.IL_ID desc
    )
My attempt, which errors out in Databricks Spark SQL:
select
    ipi.IPI_ID
    ,il.IL_ID
from Policy ipi
left join Location il
    on il.IL_ID =
    (
        select il2.IL_ID
        from Location il2
        where il2.IL_IPI_ID = ipi.IPI_ID
        order by
            il2.IL_ID desc
        limit 1
    );
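For reference, here is a sketch of two ways the "highest IL_ID per Policy" pairing could be written without a correlated LIMIT 1 subquery. Spark SQL is generally restrictive about correlated scalar subqueries that are not aggregated, which may be why the attempt above errors out. The sketch assumes the table and column names from the queries above and assumes IPI_ID is unique in Policy; the aliases loc and rn are only illustrative. Like the OUTER APPLY version, both forms should return a NULL IL_ID for a Policy with no Location rows.

```sql
-- Option 1: plain aggregate.
-- Enough when only the highest IL_ID itself is needed.
-- Assumes IPI_ID is unique in Policy; otherwise the GROUP BY collapses duplicates.
select
    ipi.IPI_ID
    ,max(il.IL_ID) as IL_ID
from Policy ipi
left join Location il
    on il.IL_IPI_ID = ipi.IPI_ID
group by
    ipi.IPI_ID;

-- Option 2: window function.
-- Ranks Location rows within each policy, highest IL_ID first,
-- then joins only the top-ranked row (rn = 1).
-- Other Location columns can be added to the derived table and the select list.
select
    ipi.IPI_ID
    ,loc.IL_ID
from Policy ipi
left join
(
    select
        il.IL_IPI_ID
        ,il.IL_ID
        ,row_number() over (partition by il.IL_IPI_ID order by il.IL_ID desc) as rn
    from Location il
) loc
    on loc.IL_IPI_ID = ipi.IPI_ID
    and loc.rn = 1;
```

Option 1 is the shorter of the two when IL_ID is the only Location column required. Option 2 is closer in spirit to the TOP 1 ... ORDER BY pattern, since it keeps the entire top-ranked Location row available.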