Combining row values from multiple tables into one result cell - sql

I'm looking to create a report of sorts and am having a hard time wrapping my head around how this portion could be done with a single select in SQL (my experience is limited to a database course and some working knowledge - more of a front end dev).
I should mention that joining the question table and question tag bindings/tags table isn't an issue for me - what I can't wrap my head around is how multiple values could be added to the same result cell without multiple nasty T-SQL loops.
Any tips on how to get started would be a huge help.
Table 1: Question Table
ID Content CategoryName
---------------------------
1 ABC Q1
2 DEF Q3
3 GEH Q3
Table 2: Tag Table
Tag Id Tag Name
---------------------------------
4 Dream
5 Light
6 Recover
Table 3: Question Tag Bindings
BoundQuestion ID BoundTagId
---------------------------------
1 4
2 5
3 6
3 4
Desired Result Table (Question table with added Tags column)
ID Content CategoryName Tags
----------------------------------------
1 ABC Q1 Dream
2 DEF Q3 Light
3 GEH Q3 Recover, Dream
Thanks to anybody who looks at this, hope you're all staying safe.

You could join the three tables and aggregate to generate the tag list. I would expect a lateral join to be an efficient option here too, since it avoids the outer aggregation:
select q.*, t.*
from questions q
outer apply(
select string_agg(tag_name, ', ') tags
from questionTags qt
inner join tags t on t.TagID = qt.BoundTagID
where qt.BoundQuestionID = q.ID
) t
Note that string_agg() was added in SQL Server 2017.
In earlier versions, we can resort to the for xml path solution:
select
q.*,
stuff(
(
select ', ' + tag_name -- no column alias here, otherwise for xml path wraps each value in an element
from questionTags qt
inner join tags t on t.TagID = qt.BoundTagID
where qt.BoundQuestionID = q.ID
order by tag_name
for xml path('')
),
1, 2, ''
) tags
from questions q
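The first option mentioned above (join the three tables, then aggregate) can be sketched in a runnable form. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in database, and SQLite's group_concat() in place of SQL Server's string_agg(). The table and column names follow the answer's queries; the helper names make_demo_db and tags_per_question are made up for this sketch.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory SQLite database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE questions (ID INTEGER PRIMARY KEY, Content TEXT, CategoryName TEXT);
        CREATE TABLE tags (TagID INTEGER PRIMARY KEY, tag_name TEXT);
        CREATE TABLE questionTags (BoundQuestionID INTEGER, BoundTagID INTEGER);
        INSERT INTO questions VALUES (1, 'ABC', 'Q1'), (2, 'DEF', 'Q3'), (3, 'GEH', 'Q3');
        INSERT INTO tags VALUES (4, 'Dream'), (5, 'Light'), (6, 'Recover');
        -- question 3 is bound to tags 6 and 4
        INSERT INTO questionTags VALUES (1, 4), (2, 5), (3, 6), (3, 4);
        """
    )
    return conn


def tags_per_question(conn):
    """Return one row per question with its tag names joined into a single cell."""
    return conn.execute(
        """
        SELECT q.ID, q.Content, q.CategoryName,
               group_concat(t.tag_name, ', ') AS tags  -- SQLite stand-in for string_agg()
        FROM questions q
        JOIN questionTags qt ON qt.BoundQuestionID = q.ID
        JOIN tags t ON t.TagID = qt.BoundTagID
        GROUP BY q.ID, q.Content, q.CategoryName
        ORDER BY q.ID
        """
    ).fetchall()


if __name__ == "__main__":
    for row in tags_per_question(make_demo_db()):
        print(row)
```

Because this uses inner joins, a question without any tag would drop out of the result; the outer apply version above keeps such rows. The order of names inside each cell is not guaranteed by group_concat().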

Related

SQL Server 'AS' alias unexpected syntax

I've come across following T-SQL today:
select c from (select 1 union all select 1) as d(c)
that yields following result:
c
-----------
1
1
The part that got me confused was d(c)
While trying to understand what's going on I've modified T-SQL into:
select c, b from (select 1, 2 union all select 3, 4) m(c, b)
which yields following result:
c b
----------- -----------
1 2
3 4
It was clear that d and m are table aliases, while the letters in brackets, c and b, are column aliases.
I wasn't able to find relevant documentation on MSDN, but I'm curious:
Are you aware of such syntax?
What would be a useful use case?
select c from (select 1 union all select 1) as d(c)
is the same as
select c from (select 1 as c union all select 1) as d
In the first query you did not name the column(s) inside your subquery, but named them outside it.
In the second query you name the column(s) inside the subquery.
If you try it like this (without naming the column(s) anywhere)
select c from (select 1 union all select 1) as d
You will get following error
No column name was specified for column 1 of 'd'
This is also in the Documentation
As for the usage, some like to write it the first way, some the second; use whichever you prefer. It's all the same.
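The equivalence of naming a column outside versus inside the subquery can be checked with a small runnable sketch. It is an approximation, not the T-SQL from the question: it runs on Python's built-in sqlite3 module and, rather than assuming SQLite accepts the `as d(c)` form, it uses the closely related CTE column list `WITH d(c) AS (...)`. The function name column_naming_demo is made up for this sketch.

```python
import sqlite3


def column_naming_demo():
    """Run both naming styles and return their result sets for comparison."""
    conn = sqlite3.connect(":memory:")
    # Column named outside the query body, via a CTE column list.
    outside = conn.execute(
        "WITH d(c) AS (SELECT 1 UNION ALL SELECT 1) SELECT c FROM d"
    ).fetchall()
    # Column named inside the subquery, on the first branch of the union.
    inside = conn.execute(
        "SELECT c FROM (SELECT 1 AS c UNION ALL SELECT 1) AS d"
    ).fetchall()
    conn.close()
    return outside, inside


if __name__ == "__main__":
    outside, inside = column_naming_demo()
    print(outside == inside)
```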
An observation: the table value constructor (values) gives you no way of naming the columns inline, which makes it necessary to name the columns after the table alias:
select * from
(values
(1,2) -- can't give a column name here
,(3,4)
) as tableName(column1,column2) -- gotta do it here
You've already had comments that point you to the documentation on how derived tables work, but nothing that answers your question about useful use cases for this functionality.
Personally I find this functionality useful whenever I want to create a set of addressable values that will be used extensively in a statement, or when I want to duplicate rows for whatever reason.
An example of addressable values would be a much more complex version of the following, in which the calculated values in the v derived table can be used many times over via more sensible names, rather than repeated calculations that will be hard to follow:
select p.ProductName
,v.PackPricePlusVAT - v.PackCost as GrossRevenue
,etc
from dbo.Products as p
cross apply(values(p.UnitsPerPack * p.UnitCost
,p.UnitPrice * p.UnitsPerPack * 1.2
,etc
)
) as v(PackCost
,PackPricePlusVAT
,etc
)
An example of duplicating rows could be an exception report used to validate data, which outputs one row for every DataError condition that a dbo.Products row satisfies:
select p.ProductName
,e.DataError
from dbo.Products as p
cross apply(values('Missing Units Per Pack'
,case when p.SoldInPacks = 1 and isnull(p.UnitsPerPack,0) < 1 then 1 end
)
,('Unusual Price'
,case when p.Price > (p.UnitsPerPack * p.UnitCost) * 2 then 1 end
)
,(etc)
) as e(DataError
,ErrorFlag
)
where e.ErrorFlag = 1
If you can understand what these two scripts are doing, you should find numerous examples of where being able to generate additional values or additional rows of data would be very helpful.

Alternative for GROUP BY and STUFF in SQL

I am writing some SQL queries in AWS Athena. I have 3 tables search, retrieval and intent. In search table I have 2 columns id and term i.e.
id term
1 abc
1 bcd
2 def
1 ghd
What I want is to write a query to get:
id term
1 abc, bcd, ghd
2 def
I know this can be done using STUFF and FOR XML PATH, but Athena does not yet support all SQL features. Is there any other way to achieve this? My current query is:
select search.id , STUFF(
(select ',' + search.term
from search
FOR XML PATH('')),1,1,'')
FROM search
group by search.id
Also, I have one more question. I have retrieval table that consist of 3 columns i.e.:
id time term
1 0 abc
1 20 bcd
1 100 gfh
2 40 hfg
2 60 lkf
What I want is:
id time term
1 100 gfh
2 60 lkf
I want to write a query to get the id and term on the basis of max value of time. Here is my current query:
select retrieval.id, max(retrieval.time), retrieval.term
from search
group by retrieval.id, retrieval.term
order by max(retrieval.time)
I am getting duplicate ids along with the term. I think that is because I am grouping by both id and term, but I am not sure how to achieve this without using group by.
The XML method is a workaround for a shortcoming in SQL Server. There is no reason to attempt it in any other database.
One method uses arrays:
select s.id, array_agg(s.term)
from search s
group by s.id;
Because the database supports arrays, you should learn to use them. You can convert the array to a string:
select s.id, array_join(array_agg(s.term), ',') as terms
from search s
group by s.id;
Group by is a grouping operation: it collapses rows into groups so that you can compute min, max, count, etc. for each group.
I am answering only question 2; use it to work out the answer to question 1.
For question 2:
select r.id, r.time, r.term
from (select id, max(time) as time
from retrieval
group by id
) m, retrieval as r
where m.id = r.id
and m.time = r.time
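The idea for question 2 (compute the maximum time per id in a grouped subquery, then join back to the table to pick up the matching term) can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for Athena, loads the sample rows of the retrieval table from the question, and uses the made-up helper names make_retrieval_db and latest_term_per_id.

```python
import sqlite3


def make_retrieval_db():
    """Build an in-memory database with the retrieval sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE retrieval (id INTEGER, time INTEGER, term TEXT)")
    conn.executemany(
        "INSERT INTO retrieval VALUES (?, ?, ?)",
        [(1, 0, "abc"), (1, 20, "bcd"), (1, 100, "gfh"), (2, 40, "hfg"), (2, 60, "lkf")],
    )
    return conn


def latest_term_per_id(conn):
    """Return (id, time, term) for the row with the largest time within each id."""
    return conn.execute(
        """
        SELECT r.id, r.time, r.term
        FROM (SELECT id, MAX(time) AS time   -- one row per id: its maximum time
              FROM retrieval
              GROUP BY id) AS m
        JOIN retrieval AS r                  -- join back to fetch the term of that row
          ON r.id = m.id AND r.time = m.time
        ORDER BY r.id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in latest_term_per_id(make_retrieval_db()):
        print(row)
```

Grouping only by id in the inner query is what removes the duplicate ids; the term is recovered by the join rather than by adding it to the group by. If two rows of the same id tie on the maximum time, both are returned.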

SQL list multiple Duplicates

I'm running a SQL query in Access that gives me matches where A = record 1 and B also = record 1, C = record 2, and D, E and F also = record 2.
I want my results to display only the max value:
B = record 1
F = record 2 (this is a matching query)
Basically I want to eliminate duplicates, and SELECT DISTINCT does not seem to be working for me.
SELECT
FEED_2.ID AS FEED_2_ID,
FEED_3.field_ID,
FEED_3.ID AS FEED_3_ID
FROM FEED_2 INNER JOIN FEED_3 ON FEED_2.[field_ID] = FEED_3.[field_ID]
order by FEED_3.ID
I'm getting results where FEED_2 IDs 1, 3 and 5 all match FEED_3 ID 1.
I only want FEED_2 ID 5 = FEED_3 ID 1, with no duplicates.
Sorry, hope that helps.
It's a shot in the dark, but is something like this what you are looking for?
SELECT max(Column_With_ABCDEF), Column_With_record from TABLE_NAME GROUP BY Column_With_record;
If this is not what you are asking for, please do edit your question with your table schema and/or the query you are using so we can help.
---------------- EDIT ----------------
Ok so you can try this:
Select max(FEED_2_ID), field_ID , FEED_3_ID
from (
SELECT FEED_2.ID AS FEED_2_ID, FEED_3.field_ID As field_ID, FEED_3.ID AS FEED_3_ID
FROM FEED_2 INNER JOIN FEED_3
ON FEED_2.[field_ID] = FEED_3.[field_ID]
)
GROUP BY FEED_3_ID, field_ID
ORDER BY FEED_3_ID
The outer select groups the result of the subquery, so you should not get duplicated values.
Hope this helps.

Get every possible combination of product related tags

The database structure is like this:
The Tags table looks something like this:
ID NAME
----------------------
1 Blue
2 Green
3 Small
4 Large
5 Red
They would be related to products through the ProductTag table.
What I'm trying to return is every single combination of tags related to a product, like this:
IDs TAGS
----------------------
1 Blue
2 Green
1,3 Small, Blue
2,3 Small, Green
1,4 Large, Blue
2,4 Large, Green
3 Small
4 Large
5 Red
(Every single one of these combinations has products)
I think SQL 2005 has something called WITH CUBE to help accomplish something like this, but unfortunately this doesn't seem to work in SQL 2008. Does anyone know how to accomplish this?
Something like this should help you.
;WITH TagsCTE
AS
(
SELECT P.ProductID, T.ID,T.Name
FROM ProductTags P
JOIN Tag T
ON P.TagID = T.ID
)
SELECT
STUFF((SELECT ',' + CAST(ID as varchar) FROM TagsCTE TC WHERE TC.ProductID = TT.ProductID FOR XML PATH('')),1,1,'') IDs,
STUFF((SELECT ',' + Name FROM TagsCTE TC WHERE TC.ProductID = TT.ProductID FOR XML PATH('')),1,1,'') Tags
FROM TagsCTE TT
GROUP BY ProductID

Concatenate several fields into one with SQL

I have three tables tag, page, pagetag
With the data below
page
ID NAME
1 page 1
2 page 2
3 page 3
4 page 4
tag
ID NAME
1 tag 1
2 tag 2
3 tag 3
4 tag 4
pagetag
ID PAGEID TAGID
1 2 1
2 2 3
3 3 4
4 1 1
5 1 2
6 1 3
I would like to get a string containing the correspondent tag names for each page with SQL in a single query. This is my desired output.
ID NAME TAGS
1 page 1 tag 1, tag 2, tag 3
2 page 2 tag 1, tag 3
3 page 3 tag 4
4 page 4
Is this possible with SQL?
I am using MySQL. Nonetheless, I would like a database vendor independent solution if possible.
Sergio del Amo:
However, I am not getting the pages without tags. I guess i need to write my query with left outer joins.
SELECT pagetag.id, page.name, group_concat(tag.name)
FROM
(
page LEFT JOIN pagetag ON page.id = pagetag.pageid
)
LEFT JOIN tag ON pagetag.tagid = tag.id
GROUP BY page.id;
Not a very pretty query, but should give you what you want - pagetag.id and group_concat(tag.name) will be null for page 4 in the example you've posted above, but the page shall appear in the results.
Yep, you can do it across the 3 tables with something like the below:
SELECT page.id, page.name, group_concat(tag.name)
FROM tag, page, page_tag
WHERE page_tag.page_id = page.id AND page_tag.tag_id = tag.id
GROUP BY page.id, page.name;
Has not been tested, and could probably be written a tad more efficiently, but should get you started!
Also, MySQL is assumed, so it may not play so nice with MSSQL! And MySQL isn't wild about hyphens in field names, so I changed them to underscores in the above example.
As far as I'm aware SQL92 doesn't define how string concatenation should be done. This means that most engines have their own method.
If you want a database independent method, you'll have to do it outside of the database.
(untested in all but Oracle)
Oracle
SELECT field1 || ', ' || field2
FROM table;
MS SQL
SELECT field1 + ', ' + field2
FROM table;
MySQL
SELECT concat(field1,', ',field2)
FROM table;
PostgreSQL
SELECT field1 || ', ' || field2
FROM table;
I got a solution playing with joins. The query is:
SELECT
page.id AS id,
page.name AS name,
tagstable.tags AS tags
FROM page
LEFT OUTER JOIN
(
SELECT pagetag.pageid, GROUP_CONCAT(distinct tag.name) AS tags
FROM tag INNER JOIN pagetag ON tagid = tag.id
GROUP BY pagetag.pageid
)
AS tagstable ON tagstable.pageid = page.id
GROUP BY page.id
And this will be the output:
id name tags
---------------------------
1 page 1 tag2,tag3,tag1
2 page 2 tag1,tag3
3 page 3 tag4
4 page 4 NULL
Is it possible to boost the query speed by writing it another way?
pagetag.id and group_concat(tag.name) will be null for page 4 in the example you've posted above, but the page shall appear in the results.
You can use the COALESCE function to remove the Nulls if you need to:
select COALESCE(pagetag.id, '') AS id ...
It will return the first non-null value from its list of parameters.
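The combination discussed above (left joins so that pages without tags still appear, plus COALESCE to turn the resulting NULL into an empty string) can be sketched as follows. This is an illustration only: it runs on Python's built-in sqlite3 module rather than MySQL, relies on SQLite's group_concat() returning NULL when a page has no tags, and uses the made-up helper names make_pages_db and pages_with_tags.

```python
import sqlite3


def make_pages_db():
    """Build an in-memory database with the page, tag and pagetag rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pagetag (id INTEGER PRIMARY KEY, pageid INTEGER, tagid INTEGER);
        INSERT INTO page VALUES (1, 'page 1'), (2, 'page 2'), (3, 'page 3'), (4, 'page 4');
        INSERT INTO tag VALUES (1, 'tag 1'), (2, 'tag 2'), (3, 'tag 3'), (4, 'tag 4');
        INSERT INTO pagetag VALUES (1, 2, 1), (2, 2, 3), (3, 3, 4), (4, 1, 1), (5, 1, 2), (6, 1, 3);
        """
    )
    return conn


def pages_with_tags(conn):
    """Return every page with its tag names in one cell; untagged pages get ''."""
    return conn.execute(
        """
        SELECT page.id, page.name,
               COALESCE(group_concat(tag.name, ', '), '') AS tags  -- NULL becomes ''
        FROM page
        LEFT JOIN pagetag ON pagetag.pageid = page.id  -- keeps pages with no bindings
        LEFT JOIN tag ON tag.id = pagetag.tagid
        GROUP BY page.id, page.name
        ORDER BY page.id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in pages_with_tags(make_pages_db()):
        print(row)
```

Page 4 has no rows in pagetag, so without the left joins it would disappear, and without COALESCE its tags cell would be NULL.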
I think you may need to use multiple updates.
Something like (not tested):
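-- Tags is cast explicitly so the temp table column is a string, not an int
select ID as PageId, Name as PageName, cast(null as varchar(max)) as Tags
into #temp
from [PageTable]
declare @lastOp int
set @lastOp = 1
-- each pass appends at most one new tag per page; stop when nothing changes
while @lastOp > 0
begin
update p
set p.Tags = isnull(p.Tags + ', ', '') + cast(t.[Tagid] as varchar(20))
from #temp p
inner join [TagTable] t
on p.[PageId] = t.[PageId]
where isnull(p.Tags, '') not like '%' + cast(t.[Tagid] as varchar(20)) + '%'
set @lastOp = @@rowcount
end
select * from #temp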
Ugly though.
That example is T-SQL, but I think MySQL has equivalents to everything used.