I have a query in SQL Server that I want to simplify so that it is Hive compatible.
This is the query written for SQL Server:
SELECT session_id,
Substring ((SELECT ( ';' + tag_name )
FROM session_tag st2
WHERE st2.session_id = st.session_id
FOR xml path ( '' )), 2, 1000) AS tags
FROM session_tag st
GROUP BY session_id;
This is the result of this query
Once again, I don't want to pass a SELECT query inside the SUBSTRING function.
So I tried to simplify the query while keeping the same result. The first attempt below returns nothing, and the second throws an error saying that the subquery returned more than one value.
SELECT SUBSTRING(';'+tag_name,2,1000) as tag from session_tag st1
where st1.session_id = (select st2.session_id from session_tag st2 where st1.session_id = st2.session_id for xml path (''))
and
SELECT SUBSTRING(';'+tag_name,2,1000) as tag from session_tag st1
where st1.session_id = (select st2.session_id from session_tag st2 where st1.session_id = st2.session_id) for xml path('')
Finally I got the solution. The simplified version of this query is:
SELECT session_id, Substring(d.tagList,2,1000) as tag from (
select distinct session_id from session_tag) a
cross apply
(
select ';'+tag_name from session_tag as b
where a.session_id = b.session_id for xml path ('')
) d (tagList)
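Note that the CROSS APPLY version above still depends on FOR XML PATH, which is SQL Server syntax and not something Hive understands. For the Hive side, a minimal sketch could look like the following. It assumes a Hive version that provides collect_list and the same session_tag table with session_id and tag_name columns; treat it as an illustration rather than a tested port.

```sql
-- HiveQL sketch: one row per session with its tags joined by ';'.
-- collect_list gathers tag_name values per group into an array,
-- concat_ws joins the array elements with the separator.
-- Assumption: tag_name is a STRING column (concat_ws expects strings).
SELECT session_id,
       concat_ws(';', collect_list(tag_name)) AS tags
FROM session_tag
GROUP BY session_id;
```

The order of the tags inside each string is not guaranteed. If duplicate tags should be dropped, collect_set can be used in place of collect_list.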
I have a query which uses join, then group by caseId and then a concat-like function using STUFF.
SELECT distinct [CaseID], STUFF((SELECT ';' +space(1)+ A.[AssignedPathologist]+' ' FROM CTE1 A
WHERE A.[CaseID]=B.[CaseID] FOR XML PATH('')),1,1,'') As [AssignedPathologist]
From CTE1 B
Group By [CaseID]
The problem is that this query is super, super-slow and I tried to optimize it using CONCAT instead.
SELECT distinct A.[CaseID], [AssignedPathologist] = CASE A.AssignedPathologist = B.AssignedPathologist
WHEN 1 THEN A.AssignedPathologist
ELSE CONCAT(A.AssignedPathologist, ' ', B.AssignedPathologist)
END
FROM CTE1 A
INNER JOIN CTE1 B ON A.[CaseID]=B.[CaseID]
END
but it gives me a syntax error here:
[AssignedPathologist] = CASE A.AssignedPathologist = B.AssignedPathologist
which makes sense, because I used = twice here.
Is there any way to optimize my query using CONCAT or another method?
Thank you
I would try this:
SELECT [CaseID],
STUFF( (SELECT CONCAT('; ', A.[AssignedPathologist])
FROM CTE1 A
WHERE A.[CaseID] = B.[CaseID]
FOR XML PATH('')
),1, 2, '' -- strip the leading '; ' (two characters)
) As [AssignedPathologist]
FROM (SELECT DISTINCT CaseID FROM CTE1) B;
On newer versions (SQL Server 2017 and later) you can use STRING_AGG():
SELECT CASEID, STRING_AGG(AssignedPathologist, '; ') AS AssignedPathologist
FROM CTE1 C1
GROUP BY CASEID;
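One thing to watch with STRING_AGG in SQL Server: it does not accept DISTINCT inside the call, so if CTE1 can hold the same pathologist more than once per case, the name will be repeated. A sketch of one way around that, assuming SQL Server 2017 or later and the CTE1 columns from the question, is to de-duplicate in a derived table first (the alias d is just for illustration):

```sql
-- De-duplicate (CaseID, AssignedPathologist) pairs before aggregating,
-- because STRING_AGG in SQL Server has no DISTINCT option.
SELECT d.CaseID,
       STRING_AGG(d.AssignedPathologist, '; ')
           WITHIN GROUP (ORDER BY d.AssignedPathologist) AS AssignedPathologist
FROM (
    SELECT DISTINCT CaseID, AssignedPathologist
    FROM CTE1
) AS d
GROUP BY d.CaseID;
```

WITHIN GROUP is optional and only there to make the order of the names deterministic.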
I have a SQL script that I need to convert to Redshift.
Here is the part, where I have a problem:
LEFT JOIN
(
SELECT STUFF((
SELECT ','+ clo.name
FROM public.label_entities cl
JOIN public.label_history clo
ON clo.id = cl.labelid
WHERE clo.parentid = 993
AND cl.entityid = clv.contactid
FOR XML PATH('')
) ,1,1,'') AS Services
) AS labelServices
I have read that I can use LISTAGG, and I tried to use it like this:
LEFT JOIN
(
SELECT LISTAGG((
SELECT ','+ clo.name
FROM public.label_entities cl
JOIN public.label_history clo
ON clo.id = cl.labelid
WHERE clo.parentid = 993
AND cl.entityid = clv.contactid
FOR XML PATH('')
) ,1,1,'') AS Services
) AS labelServices
But it does not work.
So how can I rewrite it correctly?
You don't need all the XML stuff. In fact, it is XML that is doing the aggregation in SQL Server, not STUFF(). STUFF() is just used for beautifying the string after it is created.
So, something like this:
LEFT JOIN
(SELECT cl.entityid, LISTAGG(clo.name, ', ') WITHIN GROUP (ORDER BY clo.name) as names
FROM public.label_entities cl JOIN
public.label_history clo
ON clo.id = cl.labelid
WHERE clo.parentid = 993
GROUP BY cl.entityid
) AS labelServices
ON labelServices.entityid = clv.contactid
Use REPLACE() instead of STUFF():
select STUFF(', hai, hello, fine', 1, 1, '')
select replace(','+', hai, hello, fine', ',,', '') -- ', hai, hello, fine' would be returned by the inner select
EDIT 1
select REPLACE(','+
(
SELECT ','+ clo.name
FROM public.label_entities cl
JOIN public.label_history clo
ON clo.id = cl.labelid
WHERE clo.parentid = 993
AND cl.entityid = clv.contactid
FOR XML PATH('')
)
,',,'
,''
)
I have this query:
SELECT DISTINCT
f1.CourseEventKey,
STUFF
(
(
SELECT '; ' + Title
FROM (
SELECT DISTINCT
ces.CourseEventKey,
f.Title
FROM CourseEventSchedule ces
INNER JOIN Facility f ON f.FacilityKey = ces.FacilityKey
WHERE ces.CourseEventKey IN
(
SELECT CourseEventKey
FROM #CourseEvents
)
) f2
WHERE f2.CourseEventKey = f1.CourseEventKey
FOR XML PATH('')
), 1, 2, ''
) AS Titles
FROM (
SELECT DISTINCT
ces.CourseEventKey,
f.Title
FROM CourseEventSchedule ces
INNER JOIN Facility f ON f.FacilityKey = ces.FacilityKey
WHERE ces.CourseEventKey IN
(
SELECT CourseEventKey
FROM #CourseEvents
)
) f1
It produces this result set:
CourseEventKey Titles
-------------- ----------------------------------
29 Test Facility 1
30 Memphis Training Room
32 Drury Inn & Suites Creve Coeur
The data is accurate, but I can't have FOR XML PATH('') because it escapes certain special characters.
To be clear, I'm using FOR XML PATH('') because it is possible for records with the same CourseEventKey to have multiple Facility titles associated with them.
How can I retain the data returned by this query without using FOR XML PATH('')?
Change the "for xml path(''))" section to "for xml path(''), root('root'), type).query('root').value('.', 'varchar(max)')". This will unescape the characters correctly.
Sorry for the poor formatting, but I'm not at my computer right now. I can give a full example later if you need it.
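For reference, here is a sketch of what a fuller example might look like. It uses a slightly shorter variant of the same idea (TYPE plus .value, without ROOT and .query), and a hypothetical FacilityTitles CTE with made-up rows standing in for the joined CourseEventSchedule/Facility data from the question.

```sql
-- FacilityTitles is a hypothetical stand-in for the DISTINCT
-- CourseEventKey/Title rows produced by the question's derived table.
WITH FacilityTitles (CourseEventKey, Title) AS (
    SELECT v.CourseEventKey, v.Title
    FROM (VALUES
        (29, 'Test Facility 1'),
        (32, 'Drury Inn & Suites Creve Coeur'),
        (32, 'Memphis Training Room')
    ) AS v (CourseEventKey, Title)
)
SELECT t1.CourseEventKey,
       STUFF((
           SELECT '; ' + t2.Title
           FROM FacilityTitles AS t2
           WHERE t2.CourseEventKey = t1.CourseEventKey
           ORDER BY t2.Title
           -- TYPE returns the result as xml instead of an escaped string;
           -- .value() then reads it back as plain text, so '&' stays '&'.
           FOR XML PATH(''), TYPE
       ).value('.', 'varchar(max)'), 1, 2, '') AS Titles
FROM FacilityTitles AS t1
GROUP BY t1.CourseEventKey;
```

The STUFF call still only removes the leading '; ' separator; the unescaping comes entirely from going through the xml type.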
I have two tables, one containing a list of files and one containing a list of tags which are linked by a fileID.
Currently I select this as follows which works fine so far.
How do I have to amend this if I want to count the tags per file and show the count in addition to the selected data?
What I want to do is show how many tags are assigned to each file.
My SP:
SELECT C.fileTitle,
C.fileID,
(
SELECT T.fileTag
FROM Files_Tags T
WHERE T.fileID = C.fileID
ORDER BY T.fileTag
FOR XML PATH(''), ELEMENTS, TYPE
) AS tags
FROM Files C
ORDER BY C.fileTitle
FOR XML PATH('files'), ELEMENTS, TYPE, ROOT('root')
Many thanks for any help with this, Tim.
You can add a subquery:
SELECT C.fileTitle,
C.fileID,
(
SELECT COUNT(*)
FROM Files_Tags T
WHERE T.fileID = C.fileID
) AS NumTags,
(
SELECT T.fileTag
FROM Files_Tags T
WHERE T.fileID = C.fileID
ORDER BY T.fileTag
FOR XML PATH(''), ELEMENTS, TYPE
) AS tags
You could also put in a join and aggregation in the outer query. But, your query already has to use a nested select for the concatenation, so you might as well use the same structure for the count.
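For comparison, the join-and-aggregation alternative mentioned above could be sketched roughly like this. It assumes the Files and Files_Tags tables from the question and leaves out the nested tags column and the outer FOR XML clause to keep the counting part visible.

```sql
-- Count tags per file with a join and GROUP BY instead of a correlated subquery.
-- LEFT JOIN keeps files that have no tags; COUNT(T.fileID) yields 0 for them
-- because COUNT ignores the NULLs produced by the outer join.
SELECT C.fileTitle,
       C.fileID,
       COUNT(T.fileID) AS NumTags
FROM Files C
LEFT JOIN Files_Tags T ON T.fileID = C.fileID
GROUP BY C.fileTitle, C.fileID
ORDER BY C.fileTitle;
```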
Can't you just do this?
SELECT C.fileTitle,
C.fileID,
(
SELECT T.fileTag
FROM Files_Tags T
WHERE T.fileID = C.fileID
ORDER BY T.fileTag
FOR XML PATH(''), ELEMENTS, TYPE
) AS tags,
(
SELECT
COUNT(*)
FROM
Files_Tags T
WHERE
T.fileID = C.fileID
) AS NumberOfTags
FROM Files C
ORDER BY C.fileTitle
FOR XML PATH('files'), ELEMENTS, TYPE, ROOT('root')
That is, adding a subquery for the count.
I am trying to generate XML from the following query:
SELECT DISTINCT
P.DOMAIN_ID,
P.SOURCE_SYSTEM_ID
FROM EDW.dbo.DOMAIN_VALUE AS P
WHERE P.ID = 4
AND CURRENT_FLAG = 'Y'
EXCEPT
( SELECT F.DOMAIN_ID,
F.SOURCE_SYSTEM_ID
FROM EDW.dbo.DOMAIN AS F
WHERE F.ID = 4
AND F.CURRENT_FLAG = 'Y'
)
FOR XML PATH('DOMAIN'),
ROOT('DOMAIN_VALUE')
The output in the Results tab is the following XML:
<REFERENCE_DOMAIN_VALUE>
<REFERENCE_DOMAIN>
<REFERENCE_DOMAIN_ID>10799</REFERENCE_DOMAIN_ID>
<REFERENCE_SOURCE_SYSTEM_ID>7452-001</REFERENCE_SOURCE_SYSTEM_ID>
</REFERENCE_DOMAIN>
</REFERENCE_DOMAIN_VALUE>
Now I need to convert this XML output to varchar(max), but the result needs to stay the same.
Just subquery it into a scalar value and convert it. The trick here is that FOR XML in a subquery and EXCEPT on top don't mix, so subquery the EXCEPT part first.
SELECT CONVERT(varchar(max), (
SELECT * FROM (
SELECT DISTINCT P.DOMAIN_ID, P.SOURCE_SYSTEM_ID
FROM EDW.dbo.DOMAIN_VALUE AS P
WHERE P.ID = 4 AND CURRENT_FLAG = 'Y'
EXCEPT (
SELECT F.DOMAIN_ID, F.SOURCE_SYSTEM_ID
FROM EDW.dbo.DOMAIN AS F
WHERE F.ID = 4 AND F.CURRENT_FLAG = 'Y' )
) I
FOR XML PATH('DOMAIN'), ROOT('DOMAIN_VALUE')
))