Here is a description of my problem with dummy data:
I have a table in SQL Server as below:
Table
id parentid extid Isparent
0 a m 0
1 a m 1
2 a s 0
3 a s 0
4 b q 1
5 b z 0
For each group of records with the same parentid, there is only one record with Isparent = 1.
For each record, I want to find the extid of its parent.
So for id = 0, the parent record is id=1, and extid=m for id=1 is the value I need.
Here is the output I want.
childid parentid child_extid parent_extid
0 a m m
1 a m m
2 a s m
3 a s m
4 b q q
5 b z q
I'm doing this with a self join, but since the table is large, performance is really slow. I also need to do this multiple times for a few different tables, which makes things even worse.
SELECT
a.Id AS 'ChildId',
a.parentid As 'ParentId',
a.extid AS 'child_extid ',
b.extid AS 'parent_extid '
FROM Table a
LEFT JOIN Table b ON (a.parentid = b.parentid)
WHERE b.isparent = 1
Just wondering if there is a better way to do this.
Thanks!
This design is incredibly unorthodox. Not only is this structure incapable of representing true hierarchies, but it also appears to represent a ternary relationship rather than a binary one. If you have control over the design of this data, and this was not your intent, I can help you restructure it if desired. Until then, this is a very basic representation of what you might be looking for, but it falls short because there are unanswered questions about the intent of the data, such as what happens if you add a row with the values parentid = a, extid = y, Isparent = 1?
Having said that, here's a crack at it. Please note that this execution will be a self-semi join, and will require an index to work properly. This also excludes an order by clause because of the question above.
The code below will be in TSQL format until the DBMS is clarified.
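CREATE FUNCTION dbo.ParentExtID (@ChildID INT)
RETURNS VARCHAR(30) -- assumed type for extid; adjust to the real column type
AS
BEGIN
    RETURN (
        -- extid of the row flagged as parent within the child's group
        SELECT TOP 1 A.extid
        FROM SampleTable A
        WHERE A.IsParent = 1
          AND EXISTS (
              SELECT 1
              FROM SampleTable B
              WHERE A.ParentID = B.ParentID
                AND B.id = @ChildID
          )
    );
END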
The "faster" way is to use a relational database relationally and employ normalization in the design. What the question demonstrates is cramming as much as possible into one table without any normalization and then building complex (and inefficient) query logic on top of it.
create table #thing ( id int ); --id is PK
create table #ext ( id varchar(1) ); -- id is PK
create table #thingext ( thing int, ext varchar(1) ); -- thing,ext is PK
create table #parent ( id varchar(1), ext varchar(1)); -- id is PK, ext is FK
create table #thingparent ( thing int, parent varchar(1)); -- thing,parent is PK
insert into #thing values (0),(1),(2),(3),(4),(5);
insert into #ext values ('m'),('s'),('q'),('z');
insert into #thingext values (0,'m'),(1,'m'),(2,'s'),(3,'s'),(4,'q'),(5,'z');
insert into #parent values ('a','m'),('b','q');
insert into #thingparent values (0,'a'),(1,'a'),(2,'a'),(3,'a'),(4,'b'),(5,'b');
select t.id as childid, p.ext as extid
from #thing t
join #thingparent tp on t.id = tp.thing
join #parent p on tp.parent = p.id
What's getting jumbled up in the complexity of the original question is that extid is dependent upon the parent primary key but it is not dependent upon the child primary key. And a model has to reflect that.
Not sure if it would really speed it up, but you can run this without a WHERE clause by moving the condition on b.isparent into the JOIN.
Example using a table variable:
declare @Table table (id int identity(0,1) primary key, parentid varchar(30), extid varchar(30), isparent bit);
insert into @Table (parentid, extid, isparent) values
('a','m',0),
('a','m',1),
('a','s',0),
('a','s',0),
('b','q',1),
('b','z',0);
SELECT
a.Id AS 'ChildId',
a.parentid AS 'ParentId',
a.extid AS 'child_extid',
b.extid AS 'parent_extid'
FROM @Table a
LEFT JOIN @Table b ON (a.parentid = b.parentid and b.isparent = 1);
But the execution plan will probably be the same as for your query.
Adding a non-unique index on parentid could speed things up.
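As a rough sketch of that index suggestion, assuming the real table is called dbo.SampleTable (a hypothetical name, since the question only calls it "Table") and has the columns shown in the question. The second statement is an alternative I would try alongside it: a windowed aggregate that reads the table once instead of joining it to itself. Whether it actually beats the self join depends on the data and the plan, so compare both.

```sql
-- Hypothetical table and index names; columns are the ones from the question.
-- Keying on (parentid, isparent) and including extid lets the join be
-- answered from the index alone.
CREATE NONCLUSTERED INDEX IX_SampleTable_parentid
    ON dbo.SampleTable (parentid, isparent)
    INCLUDE (extid);

-- Single pass over the table: within each parentid group, pick the extid
-- of the one row where isparent = 1 and show it on every row of the group.
SELECT id       AS ChildId,
       parentid AS ParentId,
       extid    AS child_extid,
       MAX(CASE WHEN isparent = 1 THEN extid END)
           OVER (PARTITION BY parentid) AS parent_extid
FROM dbo.SampleTable;
```

If a group has no row with isparent = 1, parent_extid comes back as NULL for that group, which matches what the LEFT JOIN version returns.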
I currently have the following table structure.
Master table Documents:
ID Filename
1 document1.pdf
2 document2.pdf
3 document3.pdf
Detail table Keywords:
ID DocumentID Keyword
1 1 KeywordA
2 1 KeywordB
3 1 KeywordC
4 2 KeywordB
5 3 KeywordA
6 3 KeywordD
Code to create this:
CREATE TABLE Documents (
ID int IDENTITY(1,1) PRIMARY KEY,
Filename nvarchar(255) NOT NULL
);
CREATE TABLE Keywords (
ID int IDENTITY(1,1) PRIMARY KEY,
DocumentID int NOT NULL,
Keyword nvarchar(255) NOT NULL
);
INSERT INTO Documents(Filename) VALUES
('document1.pdf'), ('document2.pdf'), ('document3.pdf');
INSERT INTO Keywords(DocumentID, Keyword) VALUES
(1, 'KeywordA'),
(1, 'KeywordB'),
(1, 'KeywordC'),
(2, 'KeywordB'),
(3, 'KeywordA'),
(3, 'KeywordD');
SQL Fiddle for this.
Finding with one keyword
I'm looking for a way to get all documents matching a certain keyword.
This could be e.g. written with the following T-SQL query:
SELECT Documents.*
FROM Documents
WHERE Documents.ID IN
(
SELECT Keywords.DocumentID
FROM Keywords
WHERE Keywords.Keyword = 'KeywordA'
)
This works successfully.
Finding with multiple keywords
What I'm currently stuck on is finding all documents that match multiple keywords, combined with a logical AND.
E.g. find a document that has three detail records with keywords A, B and C.
I think the following might work, but I don't know whether it is performant or elegant at all:
SELECT Documents.*
FROM Documents
WHERE Documents.ID IN
(
SELECT Keywords.DocumentID
FROM Keywords
WHERE
Keywords.Keyword = 'KeywordA' OR
Keywords.Keyword = 'KeywordB'
GROUP BY Keywords.DocumentID HAVING COUNT(*) = 2
)
SQL Fiddle for that.
My question
How do I write a (performant) SQL query to find all documents that have multiple keywords associated?
If it is easier, a solution with a constant number of keywords (e.g. 3) would be sufficient.
I hope the following query can help you
SELECT D.ID
FROM Documents D
JOIN Keywords K ON K.DocumentID = D.ID
WHERE K.Keyword IN ('KeywordA', 'KeywordB', 'KeywordC')
GROUP BY D.ID
HAVING COUNT(DISTINCT K.Keyword) = 3
Demo
The technique you are trying to use is called Relational Division With Remainder; in other words, find all groups which contain a particular set of rows.
Your current query is one of the standard ways of doing this; there are others.
If you had the keywords in a table variable or TVP, ...
DECLARE @keywords AS TABLE (Keyword varchar(50));
INSERT @keywords VALUES
('KeywordA'), ('KeywordB'), ('KeywordC');
... you could make it much neater with the following:
SELECT d.*
FROM Documents d
WHERE d.ID IN
(
SELECT k.DocumentID
FROM Keywords k
JOIN @keywords kt ON kt.Keyword = k.Keyword
GROUP BY k.DocumentID
HAVING COUNT(*) = (SELECT COUNT(*) FROM @keywords)
);
Another option:
SELECT d.*
FROM Documents d
WHERE EXISTS (SELECT 1
FROM @keywords kt
LEFT JOIN Keywords k ON kt.Keyword = k.Keyword
AND k.DocumentID = d.ID
HAVING COUNT(*) = COUNT(k.Keyword) -- there are no missing matches
);
And another, slightly confusing one:
SELECT d.*
FROM Documents d
WHERE NOT EXISTS (SELECT 1
FROM @keywords kt
WHERE NOT EXISTS (SELECT 1
FROM Keywords k
WHERE k.Keyword = kt.Keyword
AND k.DocumentID = d.ID
)
);
-- For each document, there are no keywords for which there is no match
I asked a question earlier, but I wasn't really able to explain myself clearly.
I made a graphic to hopefully help explain what I'm trying to do.
I have two separate tables inside the same database. One table called 'Consumers' with about 200 fields including one called 'METER_NUMBERS*'. And then one other table called 'Customer_Info' with about 30 fields including one called 'Meter'. These two meter fields are what the join or whatever method would be based on. The problem is that not all the meter numbers in the two tables match and some are NULL values and some are a value of 0 in both tables.
I want to join the information for the records that have matching meter numbers between the two tables, but also keep the NULL and 0 values as their own records. There are NULL and 0 values in both tables but I don't want them to join together.
There are also a few duplicate field names, like Location shown in the graphic. If it's easier to fix these duplicate field names manually I can do that, but it'd be cool to be able to do it programmatically.
The key is that I need the result in a NEW table!
This process will be a one time thing, not something I would do often.
Hopefully, I explained this clearly and if anyone can help me out that'd be awesome!
If any more information is needed, please let me know.
Thanks.
INSERT INTO new_table
SELECT * FROM
(SELECT a.*, b.* FROM Consumers a
INNER JOIN CustomerInfo b ON a.METER_NUMBER = b.METER and a.Location = b.Location
WHERE a.METER_NUMBER IS NOT NULL AND a.METER_NUMBER <> 0
UNION ALL
SELECT a.*, NULL as Meter, NULL as CustomerInfo_Location, NULL as Field2, NULL as Field3
FROM Consumers a
WHERE a.METER_NUMBER IS NULL OR a.METER_NUMBER = 0
UNION ALL
SELECT NULL as METER_NUMBER, NULL as Location, NULL as Field4, NULL as Field5, b.*
FROM CustomerInfo b
WHERE b.METER IS NULL OR b.METER = 0) c
I know that to create a new table from another table you can use the following snippet:
CREATE TABLE New_table
AS (SELECT customers.Meter_number, customers_info.Meter_number, ...
FROM customers, customers_info
WHERE customers.Meter_number = customers_info.Meter_number
OR customers.Meter_number IS NULL OR customers_info.Meter_number = 0);
I didn't test it out, but you should be able to do something with that.
I guess full outer join is what you need.
Create table #consumers (
meter_number int,
location varchar(50),
field4 varchar(50),
field5 varchar(50)
)
Create table #Customer_info (
meter int,
location varchar(50),
field1 varchar(50),
field2 varchar(50)
)
Insert into #consumers(meter_number ,location , field4 , field5 )
values (1234,'Dallas','a','1')
,(null, 'Denver','b','2')
,(5678,'Houston','c','3')
,(null,'Omaha','d','4')
,(0,'Portland','e','5')
,(2222,'Sacramento','f','6')
Insert into #Customer_info(meter , location )
values (1234,'Dallas')
,(null, 'Kansas')
,(5678,'Houston')
,(Null,'Denver')
,(0,'Boston')
,(4444,'NY')
Select c.*
,i.*
From #consumers c
full outer join #Customer_info i on c.meter_number=i.meter
and c.location=i.location
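One thing to watch with the full outer join above: NULL never equals NULL, so the NULL meters already stay as separate rows, but a 0 on both sides with the same location would still be joined. A small variation, sketched against the same #consumers and #Customer_info temp tables, keeps the 0 rows apart as well. Treat it as a starting point rather than a tested solution for the real 200-column table.

```sql
-- Same temp tables as in the example above.
-- The extra predicate makes the join condition false for meter 0,
-- so those rows come out unmatched from both sides, just like the NULLs.
SELECT c.*,
       i.*
FROM #consumers AS c
FULL OUTER JOIN #Customer_info AS i
    ON  c.meter_number = i.meter
    AND c.location     = i.location
    AND c.meter_number <> 0;
```

To write the result into a new table with SELECT ... INTO, the two location columns would need distinct aliases first, because a table cannot contain two columns with the same name.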
select * into New_Table
From (
    select METER_NUMBER,
           Consumers.Location AS Location,
           Field4,
           Field5,
           Meter,
           Customer_Info.Location As Customer_Info_Location,
           Field2,
           Field3
    From Consumers
    full outer Join Customer_Info
        on Consumers.METER_NUMBER = Customer_Info.Meter
        And Consumers.Location = Customer_Info.Location
) AS t
This is my first table:
This is another table on which I want to perform a join operation:
I want to retrieve first_name for the "activity_cc" column.
For example, I want to show Pritam,Niket for activity_id=2.
How can I retrieve those values?
From http://mikehillyer.com/articles/an-introduction-to-database-normalization/
The first normal form (or 1NF) requires that the values in each column
of a table are atomic. By atomic we mean that there are no sets of
values within a column.
Your database design violates the first normal form. It is simply an unworkable design and it must be changed (and frankly the database designer who created this should be fired, as this is gross incompetence), or there will be severe performance problems and querying will always be difficult. There is a reason why the very first rule of database design is to never store more than one piece of information in a field.
Yes, you could use some hack methods to get the answer you want, but they will cause performance issues and they are the wrong thing to do. A one-time hack to move this data into a related table is fine; a hack to continually query your database is simply a poor choice. It will cost less time in the long run to fix this cancer at the heart of your database right now. In general, the process to fix this is to split the data out into a related table using some version of fn_split (look up the various implementations of this for a script to create the function). You can use a temp table in your query or do the right thing and fix the database.
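To make the "split it into a related table" step concrete, here is a minimal sketch. It assumes SQL Server 2016 or later (for the built-in STRING_SPLIT, used here in place of a hand-written fn_split) and table names like the tableC / tableD sample schema used elsewhere in this thread; ActivityCc is a made-up name for the new related table.

```sql
-- Hypothetical related table: one row per (activity, cc'd person).
CREATE TABLE ActivityCc (
    ACTIVITY_ID int NOT NULL,
    REG_ID      int NOT NULL,
    PRIMARY KEY (ACTIVITY_ID, REG_ID)
);

-- One-time migration: split the comma-separated list into rows.
-- TRY_CAST yields NULL for anything that is not a number, so stray
-- entries are skipped instead of failing the whole insert.
INSERT INTO ActivityCc (ACTIVITY_ID, REG_ID)
SELECT DISTINCT C.ACTIVITY_ID, TRY_CAST(s.value AS int)
FROM tableC AS C
CROSS APPLY STRING_SPLIT(C.ACTIVITY_CC, ',') AS s
WHERE TRY_CAST(s.value AS int) IS NOT NULL;

-- After that, the lookup is an ordinary join.
SELECT a.ACTIVITY_ID, D.FIRST_NAME
FROM ActivityCc AS a
INNER JOIN tableD AS D ON D.REG_ID = a.REG_ID
ORDER BY a.ACTIVITY_ID, D.FIRST_NAME;
```

Once the application writes to the related table, the ACTIVITY_CC column can be dropped.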
If you want to retrieve the result with a join, why don't you inner join the two tables on "registration_id"? Also, please clarify: you want to perform the join on activity_cc, but it is not actually present in your second table, so how would you perform the join in that case?
I completely agree with @HLGEM, but the cost of solving this particular problem will be high.
I have made an attempt at what you want to achieve here. Please modify the join if needed.
Let me know if any further help is needed.
Sample Schema
create table tableC (ACTIVITY_ID int, REG_ID int,PROJ_ID int,DOSS_ID int,ACTIVITY_TO int, ACTIVITY_CC varchar(500))
insert into tableC select 4, 1,1,1,1, '3,4';
insert into tableC select 5, 2,2,2,2, '5,6';
insert into tableC select 6, 3,3,3,3, '3,5';
create table tableD (REG_ID int , FIRST_NAME VARCHAR(100), LAST_NAME VARCHAR(100))
insert into tableD select 3, 'Pritam', 'Sharma';
insert into tableD select 4, 'Pratik', 'Gupta';
insert into tableD select 5, 'Niket', 'Vaidya';
insert into tableD select 6, 'Ajinkya', 'Satwa';
Sample Query
with names as
(
    select C.ACTIVITY_ID
        ,Names = D.FIRST_NAME
    from tableC C
    -- wrap both sides in commas so that REG_ID 3 does not match an entry such as 13
    inner join tableD D on charindex(',' + cast(D.REG_ID as varchar(10)) + ',', ',' + C.ACTIVITY_CC + ',') > 0
)
select
    C.ACTIVITY_ID,C.REG_ID,C.PROJ_ID,C.DOSS_ID,C.ACTIVITY_TO,C.ACTIVITY_CC
    ,Names = stuff
    (
        (
            -- all names belonging to the current activity, comma separated
            select ',' + n.Names
            from names n
            where n.ACTIVITY_ID = C.ACTIVITY_ID
            for xml path('')
        )
        , 1
        , 1
        , ''
    )
from tableC C
Added to SQLFiddle also
Considering Pratik's structure
CREATE TABLE tableC
(
ACTIVITY_ID int,
REG_ID int,
PROJ_ID int,
DOSS_ID int,
ACTIVITY_TO int,
ACTIVITY_CC varchar(500)
);
INSERT INTO tableC select 4, 1,1,1,1, '3,4';
INSERT INTO tableC select 5, 2,2,2,2, '5,6';
INSERT INTO tableC select 6, 3,3,3,3, '3,5';
CREATE TABLE tableD
(
REG_ID int,
FIRST_NAME VARCHAR(100),
LAST_NAME VARCHAR(100)
);
INSERT INTO tableD select 3, 'Pritam', 'Sharma';
INSERT INTO tableD select 4, 'Pratik', 'Gupta';
INSERT INTO tableD select 5, 'Niket', 'Vaidya';
INSERT INTO tableD select 6, 'Ajinkya', 'Satwa';
You can do this:
SELECT tableD.FIRST_NAME
FROM tableD
JOIN tableC ON tableC.ACTIVITY_CC LIKE CONCAT('%', tableD.REG_ID, '%')
GROUP BY tableD.FIRST_NAME;
OR
SELECT FIRST_NAME
FROM tableD, tableC
WHERE tableC.ACTIVITY_CC LIKE CONCAT('%', tableD.REG_ID, '%')
GROUP BY tableD.FIRST_NAME;
For a class project, a few others and I have decided to make a (very ugly) limited clone of StackOverflow. For this purpose, we're working on one query:
Home Page: List all the questions, their scores (calculated from votes), and the user corresponding to their first revision, and the number of answers, sorted in date-descending order according to the last action on the question (where an action is an answer, an edit of an answer, or an edit of the question).
Now, we've gotten the entire thing figured out, except for how to represent tags on questions. We're currently using an M-N mapping of tags to questions like this:
CREATE TABLE QuestionRevisions (
id INT IDENTITY NOT NULL,
question INT NOT NULL,
postDate DATETIME NOT NULL,
contents NTEXT NOT NULL,
creatingUser INT NOT NULL,
title NVARCHAR(200) NOT NULL,
PRIMARY KEY (id),
CONSTRAINT questionrev_fk_users FOREIGN KEY (creatingUser) REFERENCES
Users (id) ON DELETE CASCADE,
CONSTRAINT questionref_fk_questions FOREIGN KEY (question) REFERENCES
Questions (id) ON DELETE CASCADE
);
CREATE TABLE Tags (
id INT IDENTITY NOT NULL,
name NVARCHAR(45) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE QuestionTags (
tag INT NOT NULL,
question INT NOT NULL,
PRIMARY KEY (tag, question),
CONSTRAINT qtags_fk_tags FOREIGN KEY (tag) REFERENCES Tags(id) ON
DELETE CASCADE,
CONSTRAINT qtags_fk_q FOREIGN KEY (question) REFERENCES Questions(id) ON
DELETE CASCADE
);
Now, for this query, if we just join to QuestionTags, then we'll get the questions and titles over and over and over again. If we don't, then we have an N query scenario, which is just as bad. Ideally, we'd have something where the result row would be:
+-------------+------------------+
| Other Stuff | Tags |
+-------------+------------------+
| Blah Blah | TagA, TagB, TagC |
+-------------+------------------+
Basically -- for each row in the JOIN, do a string join on the resulting tags.
Is there a built in function or similar which can accomplish this in T-SQL?
Here's one possible solution using recursive CTE:
The methods used are explained here
TSQL to set up the test data (I'm using table variables):
DECLARE @QuestionRevisions TABLE (
    id INT IDENTITY NOT NULL,
    question INT NOT NULL,
    postDate DATETIME NOT NULL,
    contents NTEXT NOT NULL,
    creatingUser INT NOT NULL,
    title NVARCHAR(200) NOT NULL)
DECLARE @Tags TABLE (
    id INT IDENTITY NOT NULL,
    name NVARCHAR(45) NOT NULL
)
DECLARE @QuestionTags TABLE (
    tag INT NOT NULL,
    question INT NOT NULL
)
INSERT INTO @QuestionRevisions
(question,postDate,contents,creatingUser,title)
VALUES
(1,GETDATE(),'Contents 1',1,'TITLE 1')
INSERT INTO @QuestionRevisions
(question,postDate,contents,creatingUser,title)
VALUES
(2,GETDATE(),'Contents 2',2,'TITLE 2')
INSERT INTO @Tags (name) VALUES ('Tag 1')
INSERT INTO @Tags (name) VALUES ('Tag 2')
INSERT INTO @Tags (name) VALUES ('Tag 3')
INSERT INTO @Tags (name) VALUES ('Tag 4')
INSERT INTO @Tags (name) VALUES ('Tag 5')
INSERT INTO @Tags (name) VALUES ('Tag 6')
INSERT INTO @QuestionTags (tag,question) VALUES (1,1)
INSERT INTO @QuestionTags (tag,question) VALUES (3,1)
INSERT INTO @QuestionTags (tag,question) VALUES (5,1)
INSERT INTO @QuestionTags (tag,question) VALUES (4,2)
INSERT INTO @QuestionTags (tag,question) VALUES (2,2)
Here's the action part:
;WITH CTE ( id, taglist, tagid, [length] )
AS ( -- anchor: one empty tag list per question
     SELECT question, CAST( '' AS VARCHAR(8000) ), 0, 0
     FROM @QuestionRevisions qr
     GROUP BY question
     UNION ALL
     -- recursive step: append the next tag with a higher id
     SELECT qr.question -- carry the question id (not the revision id) forward
     , CAST(taglist + CASE WHEN [length] = 0 THEN '' ELSE ', ' END + t.name AS VARCHAR(8000) )
     , t.id
     , [length] + 1
     FROM CTE c
     INNER JOIN @QuestionRevisions qr ON c.id = qr.question
     INNER JOIN @QuestionTags qt ON qr.question=qt.question
     INNER JOIN @Tags t ON t.id=qt.tag
     WHERE t.id > c.tagid )
-- keep only the longest list built for each question
SELECT id, taglist
FROM ( SELECT id, taglist, RANK() OVER ( PARTITION BY id ORDER BY [length] DESC )
       FROM CTE ) D ( id, taglist, rank )
WHERE rank = 1;
This was the solution I ended up settling on. I checkmarked Mack's answer because it works with arbitrary numbers of tags and because it matches what I asked for in my question. I ended up going with this one, however, simply because I understand what it is doing, while I have no idea how Mack's works :)
WITH tagScans (qRevId, tagName, tagRank)
AS (
SELECT DISTINCT
QuestionTags.question AS qRevId,
Tags.name AS tagName,
ROW_NUMBER() OVER (PARTITION BY QuestionTags.question ORDER BY Tags.name) AS tagRank
FROM QuestionTags
INNER JOIN Tags ON Tags.id = QuestionTags.tag
)
SELECT
Questions.id AS id,
Questions.currentScore AS currentScore,
answerCounts.number AS answerCount,
latestRevUser.id AS latestRevUserId,
latestRevUser.caseId AS lastRevUserCaseId,
latestRevUser.currentScore AS lastRevUserScore,
CreatingUsers.userId AS creationUserId,
CreatingUsers.caseId AS creationUserCaseId,
CreatingUsers.userScore AS creationUserScore,
t1.tagName AS tagOne,
t2.tagName AS tagTwo,
t3.tagName AS tagThree,
t4.tagName AS tagFour,
t5.tagName AS tagFive
FROM Questions
INNER JOIN QuestionRevisions ON QuestionRevisions.question = Questions.id
INNER JOIN
(
SELECT
Questions.id AS questionId,
MAX(QuestionRevisions.id) AS maxRevisionId
FROM Questions
INNER JOIN QuestionRevisions ON QuestionRevisions.question = Questions.id
GROUP BY Questions.id
) AS LatestQuestionRevisions ON QuestionRevisions.id = LatestQuestionRevisions.maxRevisionId
INNER JOIN Users AS latestRevUser ON latestRevUser.id = QuestionRevisions.creatingUser
INNER JOIN
(
SELECT
QuestionRevisions.question AS questionId,
Users.id AS userId,
Users.caseId AS caseId,
Users.currentScore AS userScore
FROM Users
INNER JOIN QuestionRevisions ON QuestionRevisions.creatingUser = Users.id
INNER JOIN
(
SELECT
MIN(QuestionRevisions.id) AS minQuestionRevisionId
FROM Questions
INNER JOIN QuestionRevisions ON QuestionRevisions.question = Questions.id
GROUP BY Questions.id
) AS QuestionGroups ON QuestionGroups.minQuestionRevisionId = QuestionRevisions.id
) AS CreatingUsers ON CreatingUsers.questionId = Questions.id
INNER JOIN
(
SELECT
COUNT(*) AS number,
Questions.id AS questionId
FROM Questions
INNER JOIN Answers ON Answers.question = Questions.id
GROUP BY Questions.id
) AS answerCounts ON answerCounts.questionId = Questions.id
LEFT JOIN tagScans AS t1 ON t1.qRevId = QuestionRevisions.id AND t1.tagRank = 1
LEFT JOIN tagScans AS t2 ON t2.qRevId = QuestionRevisions.id AND t2.tagRank = 2
LEFT JOIN tagScans AS t3 ON t3.qRevId = QuestionRevisions.id AND t3.tagRank = 3
LEFT JOIN tagScans AS t4 ON t4.qRevId = QuestionRevisions.id AND t4.tagRank = 4
LEFT JOIN tagScans AS t5 ON t5.qRevId = QuestionRevisions.id AND t5.tagRank = 5
ORDER BY QuestionRevisions.postDate DESC
This is a common question that comes up quite often phrased in a number of different ways (concatenate rows as string, merge rows as string, condense rows as string, combine rows as string, etc.). There are two generally accepted ways to handle combining an arbitrary number of rows into a single string in SQL Server.
The first, and usually the easiest, is to abuse XML Path combined with the STUFF function like so:
select rsQuestions.QuestionID,
    -- stuff(..., 1, 2, '') removes the leading ', ' (two characters)
    stuff((select ', ' + rsTags.TagName
           from #Tags rsTags
           inner join #QuestionTags rsMap on rsMap.TagID = rsTags.TagID
           where rsMap.QuestionID = rsQuestions.QuestionID
           for xml path(''), type).value('.', 'nvarchar(max)'), 1, 2, '') as Tags
from #QuestionRevisions rsQuestions
Here is a working example (borrowing some slightly modified setup from Mack). For your purposes you could store the results of that query in a common table expression, or in a subquery (I'll leave that as an exercise).
The second method is to use a recursive common table expression. Here is an annotated example of how that would work:
--NumberedTags establishes a ranked list of tags for each question.
--The key here is using row_number() or rank() partitioned by the particular question
;with NumberedTags (QuestionID, TagString, TagNum) as
(
select QuestionID,
cast(TagName as nvarchar(max)) as TagString,
row_number() over (partition by QuestionID order by rsTags.TagID) as TagNum
from #QuestionTags rsMap
inner join #Tags rsTags on rsTags.TagID = rsMap.TagID
),
--TagsAsString is the recursive query
TagsAsString (QuestionID, TagString, TagNum) as
(
--The first query in the common table expression establishes the anchor for the
--recursive query, in this case selecting the first tag for each question
select QuestionID,
TagString,
TagNum
from NumberedTags
where TagNum = 1
union all
--The second query in the union performs the recursion by joining the
--anchor to the next tag, and so on...
select NumberedTags.QuestionID,
TagsAsString.TagString + ', ' + NumberedTags.TagString,
NumberedTags.TagNum
from NumberedTags
inner join TagsAsString on TagsAsString.QuestionID = NumberedTags.QuestionID
and NumberedTags.TagNum = TagsAsString.TagNum + 1
)
--The result of the recursive query is a list of tag strings building up to the final
--string, of which we only want the last, so here we select the longest one which
--gives us the final result
select QuestionID, max(TagString)
from TagsAsString
group by QuestionID
And here is a working version. Again, you could use the results in a common table expression or subquery to join against your other tables to get your ultimate result. Hopefully the annotations help you understand a little more about how the recursive common table expression works (though the link in Mack's answer also goes into some detail about the method).
There is, of course, another way to do it, which doesn't handle an arbitrary number of rows, which is to join against your table aliased multiple times, which is what you did in your answer.
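For readers on newer versions: SQL Server 2017 and later ship a built-in aggregate, STRING_AGG, which covers the same ground as the XML PATH and recursive CTE approaches. A minimal sketch against the Tags and QuestionTags tables from the question, assuming that version is available:

```sql
-- Requires SQL Server 2017+ (STRING_AGG).
-- One row per question with its tags joined into a single string.
SELECT qt.question,
       STRING_AGG(CAST(t.name AS nvarchar(max)), ', ')
           WITHIN GROUP (ORDER BY t.name) AS Tags
FROM QuestionTags AS qt
INNER JOIN Tags AS t ON t.id = qt.tag
GROUP BY qt.question;
```

The cast to nvarchar(max) is there so that very long tag lists do not run into the size limit of the non-max string types. The result can be used as a derived table or CTE and joined to the rest of the home page query on the question id.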
My 'people' table has one row per person, and that person has a division (not unique) and a company (not unique).
I need to join people to p_features, c_features, d_features on:
people.person=p_features.num_value
people.division=d_features.num_value
people.company=c_features.num_value
... in a way that if there is a record match in p_features/d_features/c_features only, it would be returned, but if it was in 2 or 3 of the tables, the most specific record would be returned.
From my test data below, for example, a query for person=1 would return 'FALSE', person 3 returns 'MAYBE', person 4 returns 'TRUE', and person 9 returns 'DEFAULT'.
The biggest issue is that there are 100 features and I have queries that need to return all of them in one row. My previous attempt was a function which queried on feature,num_value in each table and did a foreach, but 100 features * 4 tables meant 400 reads, and it was so slow that it brought the database to a halt when I loaded a few million rows of data.
create table p_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table c_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table d_features (
num_value int8,
feature varchar(20),
feature_value varchar(128)
);
create table default_features (
feature varchar(20),
feature_value varchar(128)
);
create table people (
person int8 not null,
division int8 not null,
company int8 not null
);
insert into people values (4,5,6);
insert into people values (3,5,6);
insert into people values (1,2,6);
insert into p_features values (4,'WEARING PANTS','TRUE');
insert into c_features values (6,'WEARING PANTS','FALSE');
insert into d_features values (5,'WEARING PANTS','MAYBE');
insert into default_features values('WEARING PANTS','DEFAULT');
You need to transpose the features into rows with a ranking. Here I used a common-table expression. If your database product does not support them, you can use temporary tables to achieve the same effect.
;With RankedFeatures As
(
Select 1 As FeatureRank, P.person, PF.feature, PF.feature_value
From people As P
Join p_features As PF
On PF.num_value = P.person
Union All
Select 2, P.person, PF.feature, PF.feature_value
From people As P
Join d_features As PF
On PF.num_value = P.division
Union All
Select 3, P.person, PF.feature, PF.feature_value
From people As P
Join c_features As PF
On PF.num_value = P.company
Union All
Select 4, P.person, DF.feature, DF.feature_value
From people As P
Cross Join default_features As DF
)
, HighestRankedFeature As
(
-- best (lowest) rank per person and feature, so that each of the
-- features is resolved independently
Select Min(FeatureRank) As FeatureRank, person, feature
From RankedFeatures
Group By person, feature
)
Select RF.person, RF.FeatureRank, RF.feature, RF.feature_value
From people As P
Join HighestRankedFeature As HRF
On HRF.person = P.person
Join RankedFeatures As RF
On RF.FeatureRank = HRF.FeatureRank
And RF.person = P.person
And RF.feature = HRF.feature
Order By P.person
I don't know if I have understood your question correctly, but to use a JOIN, your tables need to be loaded already; then you use a SELECT statement with INNER JOIN, LEFT JOIN or whatever you need.
If you post some more information, it may be easier to understand.
There are some aspects of your schema I'm not understanding, like how to relate to the default_features table if there's no match in any of the specific tables. The only possible join condition is on feature, but if there's no match in the other 3 tables, there's no value to join on. So, in my example, I've hard-coded the DEFAULT since I can't think of how else to get it.
Hopefully this can get you started and if you can clarify the model a bit more, the solution can be refined.
select p.person, coalesce(pf.feature_value, df.feature_value, cf.feature_value, 'DEFAULT')
from people p
left join p_features pf
on p.person = pf.num_value
left join d_features df
on p.division = df.num_value
left join c_features cf
on p.company = cf.num_value
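One possible way around the hard-coded 'DEFAULT', sketched under the assumption that default_features holds exactly one row for every feature: drive the query from people crossed with default_features, and match each of the specific tables on the feature name as well as on the id. The table and column names are the ones from the question; nothing here has been run against the real data.

```sql
-- Every person gets every feature from default_features; the three
-- LEFT JOINs then override the default, most specific table first.
SELECT p.person,
       f.feature,
       COALESCE(pf.feature_value,   -- person level
                df.feature_value,   -- division level
                cf.feature_value,   -- company level
                f.feature_value     -- fallback from default_features
       ) AS feature_value
FROM people AS p
CROSS JOIN default_features AS f
LEFT JOIN p_features AS pf
       ON pf.num_value = p.person   AND pf.feature = f.feature
LEFT JOIN d_features AS df
       ON df.num_value = p.division AND df.feature = f.feature
LEFT JOIN c_features AS cf
       ON cf.num_value = p.company  AND cf.feature = f.feature
ORDER BY p.person, f.feature;
```

This returns one row per person and feature in a single statement. Turning the 100 features into columns of one row would still need a pivot step on top of it.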