03-26-2023 09:50 PM
I'm using spark version 3.2.1 on databricks (DBR 10.4 LTS), and I'm trying to convert sql server sql query to a new sql query that runs on a spark cluster using spark sql in sql syntax. However, spark sql does not seem to support XML PATH as a function and I wonder if there is an alternative way to convert this sql server query into a sql query that spark sql will accept. The original sql server sql query looks like this:
DROP TABLE if exists UserCountry;
CREATE TABLE if not exists UserCountry (
UserID INT,
Country VARCHAR(5000)
);
INSERT INTO UserCountry
SELECT
L.UserID AS UserID,
COALESCE(
STUFF(
(SELECT ', ' + LC.Country FROM UserStopCountry LC WHERE L.UserID = LC.UserID FOR XML PATH (''))
, 1, 2, '')
, '') AS Country
FROM
LK_ETLRunUserID L
When I run the query above in databricks spark sql, I get the following error:
ParseException:
mismatched input 'FOR' expecting {')', '.', '[', 'AND', 'BETWEEN', 'CLUSTER', 'DISTRIBUTE', 'DIV', 'EXCEPT', 'GROUP', 'HAVING', 'IN', 'INTERSECT', 'IS', 'LIKE', 'ILIKE', 'LIMIT', NOT, 'OR', 'ORDER', 'QUALIFY', RLIKE, 'MINUS', 'SORT', 'UNION', 'WINDOW', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', '&', '|', '||', '^', ':', '::'}(line 6, pos 80)
== SQL ==
INSERT INTO UserCountry
SELECT
L.UserID AS UserID,
COALESCE(
STUFF(
(SELECT ', ' + LC.Country FROM UserStopCountry LC WHERE L.UserID = LC.UserID FOR XML PATH (''))
--------------------------------------------------------------------------------^^^
, 1, 2, '')
, '') AS Country
FROM
LK_ETLRunUserID L
Given that the UserStopCountry looks like this:
I believe the output will be:
03-30-2023 05:59 AM
Posting the solution that I ended up using:
%sql
DROP TABLE if exists UserCountry;
CREATE TABLE if not exists UserCountry (
UserID INT,
Country VARCHAR(5000)
);
INSERT INTO UserCountry
SELECT
L.UserID AS UserID,
CONCAT_WS(',', collect_list(LC.Country)) AS Country
COALESCE(
STUFF(
(SELECT ', ' + LC.Country FROM UserStopCountry LC WHERE L.UserID = LC.UserID FOR XML PATH (''))
, 1, 2, '')
, '') AS Country
FROM LK_ETLRunUserID L
INNER JOIN UserStopCountry LC
ON L.UserID = LC.UserID
GROUP By L.UserID
03-29-2023 02:47 AM
03-30-2023 12:35 AM
Hi @Jay Yang
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!
03-30-2023 05:59 AM
Posting the solution that I ended up using:
%sql
DROP TABLE if exists UserCountry;
CREATE TABLE if not exists UserCountry (
UserID INT,
Country VARCHAR(5000)
);
INSERT INTO UserCountry
SELECT
L.UserID AS UserID,
CONCAT_WS(',', collect_list(LC.Country)) AS Country
COALESCE(
STUFF(
(SELECT ', ' + LC.Country FROM UserStopCountry LC WHERE L.UserID = LC.UserID FOR XML PATH (''))
, 1, 2, '')
, '') AS Country
FROM LK_ETLRunUserID L
INNER JOIN UserStopCountry LC
ON L.UserID = LC.UserID
GROUP By L.UserID
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group