Identify Duplicate Xml Nodes - sql

I have a set of tables (with several one-many relationships) that form a single "unit". I need to ensure that we weed out duplicates, but determining duplicates requires consideration of all the data.
To make matters worse, the DB in question is still in Sql 2000 compatibility mode, so it can't use any newer features.
Create Table UnitType
(
Id int IDENTITY Primary Key,
Action int not null,
TriggerType varchar(25) not null
)
Create Table Unit
(
Id int IDENTITY Primary Key,
TypeId int Not Null,
Message varchar(100),
Constraint FK_Unit_Type Foreign Key (TypeId) References UnitType(Id)
)
Create Table Item
(
Id int IDENTITY Primary Key,
QuestionId int not null,
Sequence int not null
)
Create Table UnitCondition
(
Id int IDENTITY Primary Key,
UnitId int not null,
Value varchar(10),
ItemId int not null,
Constraint FK_UnitCondition_Unit Foreign Key (UnitId) References Unit(Id),
Constraint FK_UnitCondition_Item Foreign Key (ItemId) References Item(Id)
)
Insert into Item (QuestionId, Sequence)
Values (1, 1),
(1, 2)
Insert into UnitType(Action, TriggerType)
Values (1, 'Changed')
Insert into Unit (TypeId, Message)
Values (1, 'Hello World'),
(1, 'Hello World')
Insert into UnitCondition(UnitId, Value, ItemId)
Values (1, 'Test', 1),
(1, 'Hello', 2),
(2, 'Test', 1),
(2, 'Hello', 2)
I've created a SqlFiddle demonstrating a simple form of this issue.
A Unit is considered a duplicate when all (non-Id) fields on the Unit, and all conditions on that Unit combined, exactly match another Unit in every detail. Thinking of it as XML: a Unit node (containing the Unit info and a Conditions sub-collection) is unique if no other Unit node exists that is an exact string copy.
Select
Action,
TriggerType,
U.TypeId,
U.Message,
(
Select C.Value, C.ItemId, I.QuestionId, I.Sequence
From UnitCondition C
Inner Join Item I on C.ItemId = I.Id
Where C.UnitId = U.Id
For XML RAW('Condition')
) as Conditions
from UnitType T
Inner Join Unit U on T.Id = U.TypeId
For XML RAW ('Unit'), ELEMENTS
But the issue I have is that I can't seem to get the XML for each Unit to appear as a new record, and I'm not sure how to compare the Unit Nodes to look for Duplicates.
How Can I run this query to determine if there are duplicate Xml Unit nodes within the collection?

If you want to determine whether a record is a duplicate or not, you don't need to combine all values into one string. You can do this with the ROW_NUMBER function like this:
SELECT
Action,
TriggerType,
U.Id,
U.TypeId,
U.Message,
C.Value,
I.QuestionId,
I.Sequence,
ROW_NUMBER () OVER (PARTITION BY <LIST OF FIELD THAT SHOULD BE UNIQUE>
ORDER BY <LIST OF FIELDS>) as DupeNumber
FROM UnitType T
Inner Join Unit U on T.Id = U.TypeId
Inner Join UnitCondition C on U.Id = C.UnitId
Inner Join Item I on C.ItemId = I.Id;
If DupeNumber is greater than 1, then the record is a duplicate.
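To make the DupeNumber idea concrete, here is a minimal sketch run against SQLite through Python's sqlite3 module rather than SQL Server. The FlatUnit table and its rows are hypothetical, and window functions need SQLite 3.25 or newer. Note that this flags duplicate rows, so on its own it only covers the case where each unit contributes a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened table: one row per unit (illustration only).
CREATE TABLE FlatUnit (UnitId INTEGER, Message TEXT, Value TEXT, ItemId INTEGER);
INSERT INTO FlatUnit VALUES
  (1, 'Hello World', 'Test', 1),
  (2, 'Hello World', 'Test', 1),
  (3, 'Goodbye',     'Test', 1);
""")

# Rows that share every partitioned column are numbered 1, 2, 3, ...
rows = conn.execute("""
SELECT UnitId,
       ROW_NUMBER() OVER (PARTITION BY Message, Value, ItemId
                          ORDER BY UnitId) AS DupeNumber
FROM FlatUnit
ORDER BY UnitId
""").fetchall()

# Anything numbered above 1 repeats an earlier row.
duplicates = [unit_id for unit_id, dupe_number in rows if dupe_number > 1]
print(duplicates)
```

Here unit 2 repeats unit 1, so it is the only id reported.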

Give this a try.
It should find the pairs that are not unique.
I'm not sure how to build that into your final answer, but it is possibly a start.
select u1.id, u2.id
from unit as u1
join unit as u2
on u1.ID < u2.id
join UnitCondition uc1
on uc1.unitID = u1.ID
full outer join UnitCondition uc2
on uc2.unitID = u2.ID
and uc2.itemID = uc1.itemID
where uc2.itemID is null or uc1.itemID is null

So, I managed to figure out what I needed to do. It's a little clunky though.
First, you need to wrap the Xml Select statement in another select against the Unit table, in order to ensure that we end up with xml representing only that unit.
Select
Id,
(
Select
Action,
TriggerType,
IU.TypeId,
IU.Message,
(
Select C.Value, I.QuestionId, I.Sequence
From UnitCondition C
Inner Join Item I on C.ItemId = I.Id
Where C.UnitId = IU.Id
Order by C.Value, I.QuestionId, I.Sequence
For XML RAW('Condition'), TYPE
) as Conditions
from UnitType T
Inner Join Unit IU on T.Id = IU.TypeId
WHERE IU.Id = U.Id
For XML RAW ('Unit')
)
From Unit U
Then, you can wrap this in another select, grouping the xml up by content.
Select content, count(*) as cnt
From
(
Select
Id,
(
Select
Action,
TriggerType,
IU.TypeId,
IU.Message,
(
Select C.Value, C.ItemId, I.QuestionId, I.Sequence
From UnitCondition C
Inner Join Item I on C.ItemId = I.Id
Where C.UnitId = IU.Id
Order by C.Value, I.QuestionId, I.Sequence
For XML RAW('Condition'), TYPE
) as Conditions
from UnitType T
Inner Join Unit IU on T.Id = IU.TypeId
WHERE IU.Id = U.Id
For XML RAW ('Unit')
) as content
From Unit U
) as data
group by content
having count(*) > 1
This will allow you to group entire units where the whole content is identical.
One thing to watch out for though, is that to test "uniqueness", you need to guarantee that the data on the inner Xml selection(s) is always the same. To that end, you should apply ordering on the relevant data (i.e. the data in the xml) to ensure consistency. What order you apply doesn't really matter, so long as two identical collections will output in the same order.
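The ordering point can be shown outside of XML as well. The following is only a sketch, using SQLite through Python's sqlite3 module as a stand-in for SQL Server: it builds one signature per unit from its ordered conditions and groups units by that signature. The unit_signatures helper and the sample rows are assumptions for illustration, not part of the original schema.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Unit (Id INTEGER PRIMARY KEY, TypeId INTEGER, Message TEXT);
CREATE TABLE UnitCondition (Id INTEGER PRIMARY KEY, UnitId INTEGER,
                            Value TEXT, ItemId INTEGER);
INSERT INTO Unit (Id, TypeId, Message) VALUES
  (1, 1, 'Hello World'), (2, 1, 'Hello World'), (3, 1, 'Hello World');
-- Units 1 and 2 hold the same conditions, inserted in a different order.
INSERT INTO UnitCondition (UnitId, Value, ItemId) VALUES
  (1, 'Test', 1), (1, 'Hello', 2),
  (2, 'Hello', 2), (2, 'Test', 1),
  (3, 'Test', 1);
""")

def unit_signatures(conn):
    """Return {unit id: (TypeId, Message, ordered tuple of conditions)}."""
    rows = conn.execute("""
        SELECT U.Id, U.TypeId, U.Message, C.Value, C.ItemId
        FROM Unit U
        LEFT JOIN UnitCondition C ON C.UnitId = U.Id
        ORDER BY U.Id, C.Value, C.ItemId  -- fixed order makes signatures comparable
    """).fetchall()
    header = {}
    conditions = defaultdict(list)
    for unit_id, type_id, message, value, item_id in rows:
        header[unit_id] = (type_id, message)
        conditions[unit_id].append((value, item_id))
    return {uid: header[uid] + (tuple(conditions[uid]),) for uid in header}

# Units whose signatures are equal are duplicates of each other.
groups = defaultdict(list)
for unit_id, signature in unit_signatures(conn).items():
    groups[signature].append(unit_id)
duplicate_groups = sorted(ids for ids in groups.values() if len(ids) > 1)
print(duplicate_groups)
```

Without the ORDER BY on the condition columns, units 1 and 2 could produce different signatures even though they hold the same data.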

How to retrieve the properties stored in SQL with multiple inheritance

I'm storing the records in SQL that represent a multiple inheritance relationship similar to the one in C++. Like that:
CREATE TABLE Classes
(
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE Inheritance
(
class_id INTEGER NOT NULL,
base_class_id INTEGER NOT NULL,
FOREIGN KEY (class_id) REFERENCES Classes(id),
FOREIGN KEY (base_class_id) REFERENCES Classes(id)
);
The classes have properties of two types. These properties are inherited by the classes, but in different ways. The first type of property, whenever defined for a class, overrides the value of the same property in any of its base classes. The other type accumulates the value: the property is actually a set of values, and each class inherits all values of its base classes, plus it may add an additional (single) value to this set:
CREATE TABLE OverridableValues
(
class_id INTEGER PRIMARY KEY,
value TEXT NOT NULL,
FOREIGN KEY (class_id) REFERENCES Classes(id)
);
CREATE TABLE AccumulableValues
(
class_id INTEGER PRIMARY KEY,
value TEXT NOT NULL,
FOREIGN KEY (class_id) REFERENCES Classes(id)
);
The caveat with OverridableValues: there are no cases when the same property is overridden on different paths of multiple inheritance.
I'm trying to design queries using common table expressions that would return the value/values for a given property and class.
The approach that I'm trying to use is to start from the root (assume for simplicity that there is a single root class), and then to build the tree of paths from the root to every other class. The problem is how to pass the information about properties from the parents to children. For example below is an incorrect attempt to do that:
WITH ParentProperty (id, value) AS
(
SELECT c.id, a.value
FROM Classes c
LEFT JOIN AccumulableValues a
ON a.class_id = c.id
WHERE c.id = 1 --This is the root
UNION ALL
SELECT i.class_id, IFNULL(a.value, ba.value)
FROM ParentProperty p
JOIN Inheritance i
ON i.base_class_id = p.id
LEFT JOIN AccumulableValues a
ON a.class_id = i.class_id
LEFT JOIN AccumulableValues ba
ON ba.class_id = i.base_class_id
)
SELECT id, value
FROM ParentProperty;
I feel like I need one more UNION ALL inside the CTE, which is not allowed. But without it I either miss proper values or inherited ones. So far I've failed to design the query for both types of properties.
I'm using SQLite as my database engine.
Finally I've found a solution. I'm describing it below, but more efficient ones are still welcomed.
Let's start with the accumulable property. My problem was that I tried to put more than one UNION ALL into a single CTE. I solved that by adding an additional CTE (see AcquiresFrom):
WITH AcquiresFrom (class_id, from_class_id, value) AS
(
SELECT a.class_id, a.class_id, a.value
FROM AccumulableValues a
UNION ALL
SELECT i.class_id, i.base_class_id, NULL
FROM Inheritance i
),
ClassProperty (class_id, value) AS
(
SELECT c.id, NULL
FROM Classes c
LEFT JOIN Inheritance i
ON i.class_id = c.id
WHERE i.base_class_id IS NULL
UNION ALL
SELECT a.class_id, IFNULL(a.value, p.value)
FROM ClassProperty p
JOIN AcquiresFrom a
ON (a.from_class_id = p.class_id AND a.from_class_id != a.class_id) OR
(a.class_id = p.class_id AND a.class_id = a.from_class_id AND p.value IS NULL)
)
SELECT DISTINCT class_id, value
FROM ClassProperty
WHERE value IS NOT NULL
ORDER BY class_id;
AcquiresFrom describes how a class acquires a value: it either introduces a new value (the first clause) or inherits it (the second clause). ClassProperty incrementally propagates the values from base classes to derived ones. The only thing left to do is to eliminate duplicates and NULL values (the final SELECT DISTINCT / WHERE value IS NOT NULL).
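For comparison, here is a different technique from the one above, offered only as a sketch: compute the ancestor closure of every class with one recursive CTE, then join it to AccumulableValues. It runs on SQLite via Python's sqlite3 module; the Closure CTE name and the diamond-shaped sample data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Classes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Inheritance (class_id INTEGER NOT NULL, base_class_id INTEGER NOT NULL);
CREATE TABLE AccumulableValues (class_id INTEGER PRIMARY KEY, value TEXT NOT NULL);
-- Diamond: D derives from B and C, which both derive from A.
INSERT INTO Classes VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO Inheritance VALUES (2, 1), (3, 1), (4, 2), (4, 3);
INSERT INTO AccumulableValues VALUES (1, 'a'), (2, 'b'), (4, 'd');
""")

ACCUMULATED = """
WITH RECURSIVE Closure (class_id, ancestor_id) AS
(
    SELECT id, id FROM Classes           -- every class reaches itself
    UNION                                -- UNION drops pairs reached via several paths
    SELECT c.class_id, i.base_class_id
    FROM Closure c
    JOIN Inheritance i ON i.class_id = c.ancestor_id
)
SELECT c.class_id, a.value
FROM Closure c
JOIN AccumulableValues a ON a.class_id = c.ancestor_id
ORDER BY c.class_id, a.value
"""

accumulated = conn.execute(ACCUMULATED).fetchall()
print(accumulated)
```

Because the closure uses UNION instead of UNION ALL, class D sees the value of A only once even though A is reachable through both B and C.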
The overridable property is more complex.
WITH Roots (id, value) AS
(
SELECT c.id, o.value
FROM Classes c
LEFT JOIN Inheritance i
ON i.class_id = c.id
LEFT JOIN OverridableValues o
ON o.class_id = c.id
WHERE i.base_class_id IS NULL
),
PossibleValues (id, acquired_from_id, value) AS
(
SELECT r.id, r.id, r.value
FROM Roots r
UNION ALL
SELECT i.class_id, CASE WHEN o.value IS NULL THEN p.acquired_from_id ELSE i.class_id END, IFNULL(o.value, p.value)
FROM PossibleValues p
JOIN Inheritance i
ON i.base_class_id = p.id
LEFT JOIN OverridableValues o
ON o.class_id = i.class_id
),
Split (class_id, base_class_id, direct) AS (
SELECT i.class_id, i.base_class_id, 1
FROM Inheritance i
UNION ALL
SELECT i.class_id, i.base_class_id, 0
FROM Inheritance i
),
Ancestors (id, ancestor_id) AS (
SELECT r.id, NULL
FROM Roots r
UNION ALL
SELECT s.class_id, CASE WHEN s.direct == 1 THEN a.id ELSE a.ancestor_id END
FROM Ancestors a
JOIN Split s
ON s.base_class_id = a.id
)
SELECT DISTINCT p.id, p.value
FROM PossibleValues p
WHERE p.acquired_from_id NOT IN
(
SELECT a.ancestor_id
FROM PossibleValues p1
JOIN PossibleValues p2
ON p2.id = p1.id
JOIN Ancestors a
ON a.id = p1.acquired_from_id AND a.ancestor_id = p2.acquired_from_id
WHERE p1.id = p.id
);
Roots is the list of classes that have no parents. The PossibleValues CTE propagates/overrides the values from the roots to the final classes and breaks multiple inheritance cycles, making the structure tree-like. All valid id/value pairs are present in the result of this query, but some invalid values are present as well. These invalid values are those that were overridden on one of the branches while this fact is not known on another branch. The acquired_from_id column lets us reconstruct which class first introduced the value (useful whenever two different classes introduce the same value).
The last thing left is to resolve the ambiguity caused by multiple inheritance. Knowing the class and two possible values we need to know whether one value overrides the other. That is resolved with the Ancestors expression.
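The same disambiguation can be sketched with an ancestor closure, which is a different formulation from the query above and is not claimed to be equivalent in every case. A value is kept for a class only if the class that defines it is not a proper ancestor of another defining class reachable from the same class. The example uses SQLite via Python's sqlite3 module; the Closure and Candidates names and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Classes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Inheritance (class_id INTEGER NOT NULL, base_class_id INTEGER NOT NULL);
CREATE TABLE OverridableValues (class_id INTEGER PRIMARY KEY, value TEXT NOT NULL);
-- Diamond: D derives from B and C; only B overrides the value set on A.
INSERT INTO Classes VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO Inheritance VALUES (2, 1), (3, 1), (4, 2), (4, 3);
INSERT INTO OverridableValues VALUES (1, 'root'), (2, 'b-override');
""")

EFFECTIVE = """
WITH RECURSIVE Closure (class_id, ancestor_id) AS
(
    SELECT id, id FROM Classes
    UNION
    SELECT c.class_id, i.base_class_id
    FROM Closure c
    JOIN Inheritance i ON i.class_id = c.ancestor_id
),
Candidates (class_id, source_id, value) AS
(
    -- Every value defined by the class itself or by any of its ancestors.
    SELECT c.class_id, o.class_id, o.value
    FROM Closure c
    JOIN OverridableValues o ON o.class_id = c.ancestor_id
)
SELECT k.class_id, k.value
FROM Candidates k
WHERE NOT EXISTS
(
    -- Discard a candidate if a more derived class also defines a value.
    SELECT 1
    FROM Candidates other
    JOIN Closure up ON up.class_id = other.source_id
                   AND up.ancestor_id = k.source_id
    WHERE other.class_id = k.class_id
      AND other.source_id <> k.source_id
)
ORDER BY k.class_id
"""

effective = conn.execute(EFFECTIVE).fetchall()
print(effective)
```

Class D reaches 'root' through C and 'b-override' through B; since A is an ancestor of B, only the override survives.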

How to insert data in multiple rows of temp tables in sql

How can I insert the data so that all columns of a row are filled together: the first row, then the second, and so on? My query inserts all the customer number rows first, then adds the status values as new rows after the last customer number, and does the same again for customer type.
CREATE TABLE #tblCustomer
(
CustomerNumber NVARCHAR(1000),
Status NVARCHAR (1000),
CustomerType NVARCHAR (1000)
)
INSERT
INTO #tblCustomer (CustomerNumber)
Select c.CustomerNumber
From Customer.Customer c
INSERT
INTO #tblCustomer (Status)
Select ses.Name
From Customer.Customer c
Left Outer Join COM.StatusEngine_EntityStatus sees
On c.Status = sees.EntityStatusId
And sees.EntityId = 'CustomerStatus'
Join COM.StatusEngine_Status ses
On sees.Status = ses.Status
INSERT
INTO #tblCustomer (CustomerType)
select t.Description
From Customer.Customer c
Join Customer.Type t
On c.TypeId = t.pkTypeId
Receiving output:
0001 null null
0002 null null
NULL active null
NULL active null
NULL null individual
NULL null individual
Expected Output:
0001 active individual
0002 active individual
Without knowing more about your tables, you can insert the first records like so...
INSERT INTO #tblCustomer (CustomerNumber)
select c.CustomerNumber from Customer.Customer c
And then update the remaining columns this way...
UPDATE temp
set temp.Status = ses.Name
from Customer.Customer c
left outer join COM.StatusEngine_EntityStatus sees
on c.Status = sees.EntityStatusId and sees.EntityId = 'CustomerStatus'
join COM.StatusEngine_Status ses
on sees.Status = ses.Status
join #tblCustomer temp
on c.CustomerNumber = temp.CustomerNumber
However, doing it like this is really inefficient; you should strive to create an insert that fills all columns in one go.
You can do it like this (I have verified the code with the Northwind sample database from Microsoft - I have chosen that one since you can use it for each SQL server version since SQL 2000):
declare @NumberOfItems int = 10;
CREATE TABLE #tblCustomer (
CustomerNumber NVARCHAR(1000)
,Name NVARCHAR (1000)
,CustomerType NVARCHAR (1000))
insert into #tblCustomer
select CustomerNumber, Name, Status from (select top(@NumberOfItems) ROW_NUMBER() OVER(ORDER BY CustomerID) as No, CustomerID as CustomerNumber from Customers) c
left join (select * from (select top(@NumberOfItems) ROW_NUMBER() OVER(ORDER BY ContactName) as No, ContactName as Name from Customers) q2) j1 on c.No=j1.No
left join (select * from (select top(@NumberOfItems) ROW_NUMBER() OVER(ORDER BY ContactTitle) as No, ContactTitle as Status from Customers) q3) j2 on c.No=j2.No
select * from #tblCustomer
drop table #tblCustomer
It will create a column with numbers from 1 to n for each element you want to import and then it joins it together.
Note: While this works, it is not the preferred way to do it, because there is no primary key - normally one would look for primary key / foreign key relationships to join the data together. The way you're intending to fill it puts data together which doesn't necessarily belong together (here each column is sorted and then put together by its row number - i.e. it picks values from rows sorted by its extract column and then putting them together again). If you have no primary key because you're importing data from other sources, you can add WHERE clauses to create a better connection between the inner and the outer select statements - you can find a nice article which might help you with such kind of subqueries here.
This is untested, however, I believe this is what you're after:
INSERT INTO #tblCustomer (CustomerNumber, [Status], CustomerType)
SELECT c.CustomerNumber, ses.[Name], t.[Description]
FROM Customer.Customer c
JOIN COM.StatusEngine_EntityStatus sees ON c.Status = sees.EntityStatusId --Changed to JOIN, as it is turned into an implicit INNER join by the next JOIN
AND sees.EntityId = 'CustomerStatus'
JOIN COM.StatusEngine_Status ses ON sees.[Status] = ses.[Status]
JOIN Customer.Type t ON c.TypeId = t.pkTypeId;
Note my comment regarding your LEFT OUTER JOIN, in that I've changed it to an INNER JOIN.
Straightforward SQL here:
CREATE TABLE #tblCustomer
(
CustomerNumber NVARCHAR(1000),
Status NVARCHAR (1000),
CustomerType NVARCHAR (1000)
)
INSERT INTO #tblCustomer (CustomerNumber, Status, CustomerType)
SELECT DISTINCT
c.CustomerNumber,
ses.Name,
t.Description
FROM Customer.Customer c
LEFT OUTER JOIN COM.StatusEngine_EntityStatus sees
On c.Status = sees.EntityStatusId
And sees.EntityId = 'CustomerStatus'
LEFT OUTER JOIN COM.StatusEngine_Status ses
On sees.Status = ses.Status
LEFT OUTER JOIN Customer.Type t
On c.TypeId = t.pkTypeId
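To see why a single INSERT ... SELECT produces the expected rows, here is a small sketch on SQLite via Python's sqlite3 module. The StatusLookup and TypeLookup tables are hypothetical, simplified stand-ins for the real status and type tables, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-ins for the source tables.
CREATE TABLE Customer (CustomerNumber TEXT, StatusId INTEGER, TypeId INTEGER);
CREATE TABLE StatusLookup (StatusId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE TypeLookup (pkTypeId INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE tblCustomer (CustomerNumber TEXT, Status TEXT, CustomerType TEXT);

INSERT INTO StatusLookup VALUES (1, 'active');
INSERT INTO TypeLookup VALUES (1, 'individual');
INSERT INTO Customer VALUES ('0001', 1, 1), ('0002', 1, 1), ('0003', NULL, 1);

-- One statement: each source row becomes exactly one complete target row.
INSERT INTO tblCustomer (CustomerNumber, Status, CustomerType)
SELECT c.CustomerNumber, s.Name, t.Description
FROM Customer c
LEFT JOIN StatusLookup s ON s.StatusId = c.StatusId
LEFT JOIN TypeLookup t ON t.pkTypeId = c.TypeId;
""")

result = conn.execute(
    "SELECT CustomerNumber, Status, CustomerType FROM tblCustomer "
    "ORDER BY CustomerNumber"
).fetchall()
print(result)
```

The LEFT JOINs keep customer 0003 even though it has no status; with inner joins that row would be dropped.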

SELECT from subquery without having to specify all columns in GROUP BY

Idea is to query an article table where an article has a given tag, and then to STRING_AGG all (even unrelated) tags that belong to that article row.
Example tables and query:
CREATE TABLE article (id SERIAL PRIMARY KEY, body TEXT);
CREATE TABLE article_tag (article INT, tag INT);
CREATE TABLE tag (id SERIAL PRIMARY KEY, title TEXT);
SELECT DISTINCT ON (id)
q.id, q.body, STRING_AGG(q.tag_title, '|') tags
FROM (
SELECT a.*, tag.title tag_title
FROM article a
LEFT JOIN article_tag x ON a.id = x.article
LEFT JOIN tag ON tag.id = x.tag
WHERE tag.title = 'someTag'
) q
GROUP BY q.id
Running the above, Postgres requires that q.body be included in GROUP BY:
ERROR: column "q.body" must appear in the GROUP BY clause or be used in an aggregate function
As I understand it, it's because subquery q doesn't include any PRIMARY key.
I naively thought that the DISTINCT ON would supplement that, but it doesn't seem so.
Is there a way to mark a column in a subquery as PRIMARY so that we don't have to list all columns in GROUP BY clause?
If we do have to list all columns in GROUP BY clause, does that incur significant perf cost?
EDIT: to elaborate, since PostgreSQL 9.1 you don't have to list the non-key (i.e. functionally dependent) columns when grouping by the primary key, e.g. the following query works fine:
SELECT a.id, a.body, STRING_AGG(tag.title, '|') tags
FROM article a
LEFT JOIN article_tag x ON a.id = x.article
LEFT JOIN tag ON tag.id = x.tag
GROUP BY a.id
I was wondering if I can leverage the same behavior, but with a subquery (by somehow indicating that q.id is a PRIMARY key).
It sadly doesn't work when you wrap your primary key in subquery and I don't know of any way to "mark it" as you suggested.
You can try this workaround using window function and distinct:
CREATE TABLE test1 (id serial primary key, name text, value text);
CREATE TABLE test2 (id serial primary key, test1_id int, value text);
INSERT INTO test1(name, value)
values('name1', 'test01'), ('name2', 'test02'), ('name3', 'test03');
INSERT INTO test2(test1_id, value)
values(1, 'test1'), (1, 'test2'), (3, 'test3');
SELECT DISTINCT ON (id) id, name, string_agg(value2, '|') over (partition by id)
FROM (SELECT test1.*, test2.value AS value2
FROM test1
LEFT JOIN test2 ON test2.test1_id = test1.id) AS sub;
id name string_agg
1 name1 test1|test2
2 name2 null
3 name3 test3
The problem is in the outer SELECT: you should either aggregate the columns or group by them. Postgres wants you to specify what to do with q.body: group by it or calculate an aggregate. It looks a little awkward but should work.
SELECT DISTINCT ON (id)
q.id, q.body, STRING_AGG(q.tag_title, '|') tags
FROM (
SELECT a.*, tag.title tag_title
FROM article a
LEFT JOIN article_tag x ON a.id = x.article
LEFT JOIN tag ON tag.id = x.tag
WHERE tag.title = 'someTag'
) q
GROUP BY q.id, q.body
-- ^^^^^^
Another way is to make a query to get id and aggregated tags then join body to it. If you wish I can make an example.
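A minimal sketch of that last idea: aggregate the tags per article id in a subquery, then join the body to it. It is written for SQLite through Python's sqlite3 module, so group_concat stands in for STRING_AGG; the sample rows are made up. The filter on the given tag is kept separate from the aggregation so that all tags of a matching article are listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE article_tag (article INTEGER, tag INTEGER);
INSERT INTO article VALUES (1, 'first'), (2, 'second');
INSERT INTO tag VALUES (1, 'someTag'), (2, 'other');
INSERT INTO article_tag VALUES (1, 1), (1, 2), (2, 2);
""")

QUERY = """
SELECT a.id, a.body, t.tags
FROM article a
JOIN (
    -- Aggregate once per article id; body is not needed here.
    SELECT x.article AS article_id, group_concat(tag.title, '|') AS tags
    FROM article_tag x
    JOIN tag ON tag.id = x.tag
    GROUP BY x.article
) t ON t.article_id = a.id
WHERE a.id IN (
    -- Keep only articles that carry the requested tag.
    SELECT x.article
    FROM article_tag x
    JOIN tag ON tag.id = x.tag
    WHERE tag.title = ?
)
ORDER BY a.id
"""

rows = conn.execute(QUERY, ("someTag",)).fetchall()
print(rows)
```

Only the subquery needs a GROUP BY, and it groups by a single column, so the outer query can select a.body freely. The order of tags inside the concatenated string is not guaranteed here.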

How can I condense strings in several string rows into a single field?

For a class project, a few others and I have decided to make a (very ugly) limited clone of StackOverflow. For this purpose, we're working on one query:
Home Page: List all the questions, their scores (calculated from votes), and the user corresponding to their first revision, and the number of answers, sorted in date-descending order according to the last action on the question (where an action is an answer, an edit of an answer, or an edit of the question).
Now, we've gotten the entire thing figured out, except for how to represent tags on questions. We're currently using a M-N mapping of tags to questions like this:
CREATE TABLE QuestionRevisions (
id INT IDENTITY NOT NULL,
question INT NOT NULL,
postDate DATETIME NOT NULL,
contents NTEXT NOT NULL,
creatingUser INT NOT NULL,
title NVARCHAR(200) NOT NULL,
PRIMARY KEY (id),
CONSTRAINT questionrev_fk_users FOREIGN KEY (creatingUser) REFERENCES
Users (id) ON DELETE CASCADE,
CONSTRAINT questionref_fk_questions FOREIGN KEY (question) REFERENCES
Questions (id) ON DELETE CASCADE
);
CREATE TABLE Tags (
id INT IDENTITY NOT NULL,
name NVARCHAR(45) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE QuestionTags (
tag INT NOT NULL,
question INT NOT NULL,
PRIMARY KEY (tag, question),
CONSTRAINT qtags_fk_tags FOREIGN KEY (tag) REFERENCES Tags(id) ON
DELETE CASCADE,
CONSTRAINT qtags_fk_q FOREIGN KEY (question) REFERENCES Questions(id) ON
DELETE CASCADE
);
Now, for this query, if we just join to QuestionTags, then we'll get the questions and titles over and over and over again. If we don't, then we have an N query scenario, which is just as bad. Ideally, we'd have something where the result row would be:
+-------------+------------------+
| Other Stuff | Tags |
+-------------+------------------+
| Blah Blah | TagA, TagB, TagC |
+-------------+------------------+
Basically -- for each row in the JOIN, do a string join on the resulting tags.
Is there a built in function or similar which can accomplish this in T-SQL?
Here's one possible solution using recursive CTE:
The methods used are explained here
TSQL to set up the test data (I'm using table variables):
DECLARE @QuestionRevisions TABLE (
id INT IDENTITY NOT NULL,
question INT NOT NULL,
postDate DATETIME NOT NULL,
contents NTEXT NOT NULL,
creatingUser INT NOT NULL,
title NVARCHAR(200) NOT NULL)
DECLARE @Tags TABLE (
id INT IDENTITY NOT NULL,
name NVARCHAR(45) NOT NULL
)
DECLARE @QuestionTags TABLE (
tag INT NOT NULL,
question INT NOT NULL
)
INSERT INTO @QuestionRevisions
(question,postDate,contents,creatingUser,title)
VALUES
(1,GETDATE(),'Contents 1',1,'TITLE 1')
INSERT INTO @QuestionRevisions
(question,postDate,contents,creatingUser,title)
VALUES
(2,GETDATE(),'Contents 2',2,'TITLE 2')
INSERT INTO @Tags (name) VALUES ('Tag 1')
INSERT INTO @Tags (name) VALUES ('Tag 2')
INSERT INTO @Tags (name) VALUES ('Tag 3')
INSERT INTO @Tags (name) VALUES ('Tag 4')
INSERT INTO @Tags (name) VALUES ('Tag 5')
INSERT INTO @Tags (name) VALUES ('Tag 6')
INSERT INTO @QuestionTags (tag,question) VALUES (1,1)
INSERT INTO @QuestionTags (tag,question) VALUES (3,1)
INSERT INTO @QuestionTags (tag,question) VALUES (5,1)
INSERT INTO @QuestionTags (tag,question) VALUES (4,2)
INSERT INTO @QuestionTags (tag,question) VALUES (2,2)
Here's the action part:
;WITH CTE ( id, taglist, tagid, [length] )
AS ( SELECT question, CAST( '' AS VARCHAR(8000) ), 0, 0
FROM @QuestionRevisions qr
GROUP BY question
UNION ALL
SELECT qr.question
, CAST(taglist + CASE WHEN [length] = 0 THEN '' ELSE ', ' END + t.name AS VARCHAR(8000) )
, t.id
, [length] + 1
FROM CTE c
INNER JOIN @QuestionRevisions qr ON c.id = qr.question
INNER JOIN @QuestionTags qt ON qr.question=qt.question
INNER JOIN @Tags t ON t.id=qt.tag
WHERE t.id > c.tagid )
SELECT id, taglist
FROM ( SELECT id, taglist, RANK() OVER ( PARTITION BY id ORDER BY length DESC )
FROM CTE ) D ( id, taglist, rank )
WHERE rank = 1;
This was the solution I ended up settling on. I checkmarked Mack's answer because it works with arbitrary numbers of tags, and because it matches what I asked for in my question. I ended up going with this, however, simply because I understand what it is doing, while I have no idea how Mack's works :)
WITH tagScans (qRevId, tagName, tagRank)
AS (
SELECT DISTINCT
QuestionTags.question AS qRevId,
Tags.name AS tagName,
ROW_NUMBER() OVER (PARTITION BY QuestionTags.question ORDER BY Tags.name) AS tagRank
FROM QuestionTags
INNER JOIN Tags ON Tags.id = QuestionTags.tag
)
SELECT
Questions.id AS id,
Questions.currentScore AS currentScore,
answerCounts.number AS answerCount,
latestRevUser.id AS latestRevUserId,
latestRevUser.caseId AS lastRevUserCaseId,
latestRevUser.currentScore AS lastRevUserScore,
CreatingUsers.userId AS creationUserId,
CreatingUsers.caseId AS creationUserCaseId,
CreatingUsers.userScore AS creationUserScore,
t1.tagName AS tagOne,
t2.tagName AS tagTwo,
t3.tagName AS tagThree,
t4.tagName AS tagFour,
t5.tagName AS tagFive
FROM Questions
INNER JOIN QuestionRevisions ON QuestionRevisions.question = Questions.id
INNER JOIN
(
SELECT
Questions.id AS questionId,
MAX(QuestionRevisions.id) AS maxRevisionId
FROM Questions
INNER JOIN QuestionRevisions ON QuestionRevisions.question = Questions.id
GROUP BY Questions.id
) AS LatestQuestionRevisions ON QuestionRevisions.id = LatestQuestionRevisions.maxRevisionId
INNER JOIN Users AS latestRevUser ON latestRevUser.id = QuestionRevisions.creatingUser
INNER JOIN
(
SELECT
QuestionRevisions.question AS questionId,
Users.id AS userId,
Users.caseId AS caseId,
Users.currentScore AS userScore
FROM Users
INNER JOIN QuestionRevisions ON QuestionRevisions.creatingUser = Users.id
INNER JOIN
(
SELECT
MIN(QuestionRevisions.id) AS minQuestionRevisionId
FROM Questions
INNER JOIN QuestionRevisions ON QuestionRevisions.question = Questions.id
GROUP BY Questions.id
) AS QuestionGroups ON QuestionGroups.minQuestionRevisionId = QuestionRevisions.id
) AS CreatingUsers ON CreatingUsers.questionId = Questions.id
INNER JOIN
(
SELECT
COUNT(*) AS number,
Questions.id AS questionId
FROM Questions
INNER JOIN Answers ON Answers.question = Questions.id
GROUP BY Questions.id
) AS answerCounts ON answerCounts.questionId = Questions.id
LEFT JOIN tagScans AS t1 ON t1.qRevId = QuestionRevisions.id AND t1.tagRank = 1
LEFT JOIN tagScans AS t2 ON t2.qRevId = QuestionRevisions.id AND t2.tagRank = 2
LEFT JOIN tagScans AS t3 ON t3.qRevId = QuestionRevisions.id AND t3.tagRank = 3
LEFT JOIN tagScans AS t4 ON t4.qRevId = QuestionRevisions.id AND t4.tagRank = 4
LEFT JOIN tagScans AS t5 ON t5.qRevId = QuestionRevisions.id AND t5.tagRank = 5
ORDER BY QuestionRevisions.postDate DESC
This is a common question that comes up quite often phrased in a number of different ways (concatenate rows as string, merge rows as string, condense rows as string, combine rows as string, etc.). There are two generally accepted ways to handle combining an arbitrary number of rows into a single string in SQL Server.
The first, and usually the easiest, is to abuse XML Path combined with the STUFF function like so:
select rsQuestions.QuestionID,
stuff((select ', '+ rsTags.TagName
from #Tags rsTags
inner join #QuestionTags rsMap on rsMap.TagID = rsTags.TagID
where rsMap.QuestionID = rsQuestions.QuestionID
for xml path(''), type).value('.', 'nvarchar(max)'), 1, 2, '')
from #QuestionRevisions rsQuestions
Here is a working example (borrowing some slightly modified setup from Mack). For your purposes you could store the results of that query in a common table expression, or in a subquery (I'll leave that as an exercise).
The second method is to use a recursive common table expression. Here is an annotated example of how that would work:
--NumberedTags establishes a ranked list of tags for each question.
--The key here is using row_number() or rank() partitioned by the particular question
;with NumberedTags (QuestionID, TagString, TagNum) as
(
select QuestionID,
cast(TagName as nvarchar(max)) as TagString,
row_number() over (partition by QuestionID order by rsTags.TagID) as TagNum
from #QuestionTags rsMap
inner join #Tags rsTags on rsTags.TagID = rsMap.TagID
),
--TagsAsString is the recursive query
TagsAsString (QuestionID, TagString, TagNum) as
(
--The first query in the common table expression establishes the anchor for the
--recursive query, in this case selecting the first tag for each question
select QuestionID,
TagString,
TagNum
from NumberedTags
where TagNum = 1
union all
--The second query in the union performs the recursion by joining the
--anchor to the next tag, and so on...
select NumberedTags.QuestionID,
TagsAsString.TagString + ', ' + NumberedTags.TagString,
NumberedTags.TagNum
from NumberedTags
inner join TagsAsString on TagsAsString.QuestionID = NumberedTags.QuestionID
and NumberedTags.TagNum = TagsAsString.TagNum + 1
)
--The result of the recursive query is a list of tag strings building up to the final
--string, of which we only want the last, so here we select the longest one which
--gives us the final result
select QuestionID, max(TagString)
from TagsAsString
group by QuestionID
And here is a working version. Again, you could use the results in a common table expression or subquery to join against your other tables to get your ultimate result. Hopefully the annotations help you understand a little more how the recursive common table expression works (though the link in Mack's answer also goes into some detail about the method).
There is, of course, another way to do it, which doesn't handle an arbitrary number of rows, which is to join against your table aliased multiple times, which is what you did in your answer.
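The recursive approach can be tried without a SQL Server instance. Below is a sketch of the same idea on SQLite through Python's sqlite3 module, so string concatenation uses || instead of +, and window functions need SQLite 3.25 or newer. The table and column names follow the TagID/TagName naming used above, and the sample rows are made up. As a small design change, the final row per question is picked by the highest TagNum rather than by max(TagString).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tags (TagID INTEGER PRIMARY KEY, TagName TEXT);
CREATE TABLE QuestionTags (QuestionID INTEGER, TagID INTEGER);
INSERT INTO Tags VALUES (1, 'Tag 1'), (2, 'Tag 2'), (3, 'Tag 3'),
                        (4, 'Tag 4'), (5, 'Tag 5');
INSERT INTO QuestionTags VALUES (1, 1), (1, 3), (1, 5), (2, 4), (2, 2);
""")

QUERY = """
WITH RECURSIVE
NumberedTags (QuestionID, TagName, TagNum) AS (
    -- Rank the tags of each question so they can be chained one by one.
    SELECT m.QuestionID, t.TagName,
           ROW_NUMBER() OVER (PARTITION BY m.QuestionID ORDER BY t.TagID)
    FROM QuestionTags m
    JOIN Tags t ON t.TagID = m.TagID
),
TagsAsString (QuestionID, TagString, TagNum) AS (
    -- Anchor: the first tag of every question.
    SELECT QuestionID, TagName, TagNum FROM NumberedTags WHERE TagNum = 1
    UNION ALL
    -- Recursive step: append the next tag to the string built so far.
    SELECT n.QuestionID, s.TagString || ', ' || n.TagName, n.TagNum
    FROM NumberedTags n
    JOIN TagsAsString s ON s.QuestionID = n.QuestionID
                       AND n.TagNum = s.TagNum + 1
)
SELECT s.QuestionID, s.TagString
FROM TagsAsString s
WHERE s.TagNum = (SELECT MAX(n.TagNum)
                  FROM NumberedTags n
                  WHERE n.QuestionID = s.QuestionID)
ORDER BY s.QuestionID
"""

tag_lists = conn.execute(QUERY).fetchall()
print(tag_lists)
```

Each question ends up with one row whose string contains all of its tags in TagID order.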

Find the top value for each parent

I'm sure this is a common request but I wouldn't know how to ask for it formally.
I encountered this a long time ago when I was in the Army. A soldier has multiple physical fitness tests but the primary test that counts is the most recent. The soldier also has multiple marksmanship qualifications but only the most recent qualification to the weapon assigned is significant.
How do you create a view that itemizes the most significant child of the parent?
Use:
SELECT p.*, x.*
FROM PARENT p
JOIN CHILD x ON x.parent_id = p.id
JOIN (SELECT c.parent_id,
MAX(c.date_column) AS max_date
FROM CHILD c
GROUP BY c.parent_id) y ON y.parent_id = x.parent_id
AND y.max_date = x.date_column
Assuming SQL Server 2005+:
WITH summary AS (
SELECT p.*,
c.*,
ROW_NUMBER() OVER (PARTITION BY p.id
ORDER BY c.date DESC) AS rank
FROM PARENT p
JOIN CHILD c ON c.parent_id = p.id)
SELECT s.*
FROM summary s
WHERE s.rank = 1
Although I'm not quite sure what you are implying by "itemizing", you can do something like so:
Select ..
From Soldier
Left Join FitnessTest
On FitnessTest.SoldierId = Soldier.Id
And FitnessTest.TestDate = (
Select Max(FT1.TestDate)
From FitnessTest As FT1
Where FT1.SoldierId = FitnessTest.SoldierId
)
Left Join MarksmanshipTest
On MarksmanshipTest.SoldierId = Soldier.Id
And MarksmanshipTest.TestDate = (
Select Max(MT1.TestDate)
From MarksmanshipTest As MT1
Where MT1.SoldierId = MarksmanshipTest.SoldierId
)
This assumes that a soldier cannot have two test datetime values for a fitness test or a marksmanship test.
No significant difference from the previous two answers, but a little more detail perhaps:
create table soldier ( soldierId int primary key,
name varchar(100) )
create table fitnessTest ( soldierId int foreign key references soldier,
occurred datetime, result int )
create table marksmanshipTest ( soldierId int foreign key references soldier,
occurred datetime, result int )
;with
mostRecentFitnessTest as
(
select
fitnessTest.soldierId,
fitnessTest.result,
row_number() over (partition by soldierId order by occurred desc) as row
from fitnessTest
),
mostRecentMarksmanshipTest as
(
select
marksmanshipTest.soldierId,
marksmanshipTest.result,
row_number() over (partition by soldierId order by occurred desc) as row
from marksmanshipTest
)
select
soldier.soldierId,
soldier.name,
mostRecentFitnessTest.result,
mostRecentMarksmanshipTest.result
from soldier
left outer join mostRecentFitnessTest on
mostRecentFitnessTest.soldierId = soldier.soldierId
and mostRecentFitnessTest.row = 1
left outer join mostRecentMarksmanshipTest on
mostRecentMarksmanshipTest.soldierId = soldier.soldierId
and mostRecentMarksmanshipTest.row = 1
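As a quick check of the row-number approach, here is a reduced sketch with only the fitness test, run on SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or newer). The names and dates are made up, and dates are stored as ISO text so they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE soldier (soldierId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fitnessTest (soldierId INTEGER, occurred TEXT, result INTEGER);
INSERT INTO soldier VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
INSERT INTO fitnessTest VALUES
  (1, '2010-01-01', 250), (1, '2010-06-01', 270),
  (2, '2010-03-01', 290), (2, '2009-03-01', 240);
""")

QUERY = """
WITH ranked AS (
    -- Number each soldier's tests separately, newest first.
    SELECT soldierId, result,
           ROW_NUMBER() OVER (PARTITION BY soldierId
                              ORDER BY occurred DESC) AS rn
    FROM fitnessTest
)
SELECT s.soldierId, s.name, r.result
FROM soldier s
LEFT JOIN ranked r ON r.soldierId = s.soldierId AND r.rn = 1
ORDER BY s.soldierId
"""

latest = conn.execute(QUERY).fetchall()
print(latest)
```

The PARTITION BY is what makes the numbering restart for every soldier; without it only the single newest test in the whole table would get row number 1. The LEFT JOIN keeps soldiers who have no test at all.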