Two table recursive lookup without hierarchy, is this possible? - sql

This is the current design of the SQL Server database I am working with.
I have two tables:
recipes
recipe ingredients
Recipes consist of recipe ingredients but an ingredient can be another recipe. In theory there are infinite levels as each recipe can have another ingredient that is also a recipe.
In the above data example, the Fresh Salsa recipe (ID 3047) has 7 ingredients. Six are raw Materials but one is another recipe (Recipe ID 3008). This recipe ID references another recipe in the 'recipes' table.
There is no hierarchy and I don't think I can create a hierarchy.
The goal is to extract all the recipe items for a particular recipe that has 'sub' recipes, 'sub-sub' recipes, etc.
It would seem like a recursive lookup would be the answer but because there is no hierarchy, this doesn't seem to work.
Here's my attempted query (the recipeItem list variable is a list of all the recipeitems that are also recipes created in a previous query):
<cfquery name="whatever">
WITH MenuPrepOfPreps (recipe_id, depth, otherRecipe_id, recipe_name)
AS
(
SELECT r.recipe_id,
0 as depth,
ri.otherRecipe_id,
r.recipe_name
FROM menu_recipes r
JOIN menu_recipeItems ri
ON ri.otherRecipe_id = r.recipe_id
WHERE ri.otherRecipe_id in (#recipeItemList#)
UNION ALL
-- recursive members
SELECT
mop.recipe_id,
mop.depth + 1 as depth,
ri.otherRecipe_id,
r.recipe_name
FROM menu_recipes r
JOIN menu_recipeItems ri
ON ri.otherRecipe_id = r.recipe_id
INNER JOIN MenuPrepOfPreps AS MOP
ON ri.otherRecipe_id = MOP.recipe_id
)
SELECT top(6)recipe_id, recipe_name
FROM MenuPrepOfPreps
GROUP BY recipe_id, recipe_name
</cfquery>
It keeps creating an infinite loop. When I limit the results to the first few rows (top 6), it does give the desired result.
It is possible that the design of the database is not correct so this might never work.
Any help is appreciated.
[UPDATED QUERY BASED ON @NewBie20200101 PROPOSED SOLUTION WITH CHANGES TO VARIABLE/COLUMN NAMES]
<cfquery name="whatever">
WITH MenuPrepOfPreps AS
(
SELECT otherrecipe_id,
CASE
when
otherRecipe_id = 0 then null
else
otherRecipe_id
end
as sub_recipe
FROM menu_recipeItems as a -- anchor
UNION ALL
SELECT
a.otherrecipe_id,
CASE
when
b.otherRecipe_id = 0 then null
else
b.otherRecipe_id
end
as sub_recipe
FROM menu_recipeItems as b
where b.recipe_id = a.otherRecipe_id --recursion
and a.otherRecipe_id is null --stopper
), allrecipeitems as (
SELECT recipe_id, sub_recipe
FROM MenuPrepOfPreps
)
Select
c.recipe_id,
d.otherRecipe_id
From MENU_recipes c
INNER JOIN MENU_recipeItems d on c.recipe_id = d.otherRecipe_id
Where c.recipe in (#recipelist#)
</cfquery>
Does not work and gives the following error:
The multi-part identifier "a.otherRecipe_id" could not be bound.

Not sure if this is going to work:
WITH preprepprep AS (
SELECT
a.recipeID,
CASE WHEN a.otherrecipeID = 0 THEN NULL ELSE a.otherrecipeID END AS otherrecipeID -- remove 0, it might be a problem
FROM
recipeitems AS a -- anchor
LEFT OUTER JOIN recipeitems AS b ON a.otherrecipeID = b.recipeID
UNION ALL
SELECT
c.recipeID,
CASE WHEN c.otherrecipeID = 0 THEN NULL ELSE c.otherrecipeID END AS otherrecipeID -- also remove 0
FROM preprepprep AS c
LEFT OUTER JOIN recipeitems AS d
ON c.recipeID = d.otherrecipeID -- recursion
), allrecipeitems AS (
SELECT
recipeID,
otherrecipeID
FROM preprepprep
)
SELECT
c.recipeID,
d.otherrecipeID
FROM recipe c
INNER JOIN recipeitems d ON c.recipeID = d.recipeID
WHERE c.recipe IN (##)
-- extract unpacked sub recipes based on recipe
If you think there are more than 100 levels (the default recursion limit), add OPTION (MAXRECURSION 0).

I figured this out myself with help from all contributors (thank you). First off, as was mentioned by @Charlieface, I was approaching this from the wrong direction. With this solution I started from the bottom of the hierarchy and worked up until there is no more data to get in the hierarchy.
The last query has multiple filters and grouping in order to get exactly the information I need.
If anyone is interested, here's the solution:
WITH MenuPrepOfPreps (recipe_id, otherRecipe_id, depth, recipe_name)
AS (
SELECT
mria.recipe_id, mria.otherRecipe_id, 0, mr.recipe_name
FROM
menu_recipeItems mria, menu_recipes mr
where
mria.otherrecipe_id <> 0
and mria.recipe_id = mr.recipe_id
UNION ALL
SELECT
MenuPrepOfPreps.recipe_id, mri.otherrecipe_id, MenuPrepOfPreps.depth+1, mr.recipe_name
FROM
menu_recipeItems mri, MenuPrepOfPreps, menu_recipes mr
WHERE MenuPrepOfPreps.otherrecipe_id = mri.recipe_id
AND MRI.otherrecipe_id <> 0
AND mr.recipe_id = mri.recipe_id
)
SELECT
mopsA.otherrecipe_id, mopsA.recipe_id, mopsA.depth, mra.recipe_name AS thePrepRecipeName
FROM
MenuPrepOfPreps mopsA, MENU_recipes MRA
WHERE
mopsA.recipe_id in (#recipelist#)
AND mopsA.otherRecipe_id = MRA.recipe_id
AND MRA.recipeType_id = 2
GROUP BY mra.recipe_name, mopsA.otherrecipe_id, mopsA.recipe_id, mopsA.depth
I will have to see how this behaves with a larger dataset than the sample set I have now, but so far it's pretty fast.
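For anyone who wants to see the direction of the walk in isolation, here is a minimal sketch. It runs against an in-memory SQLite database from Python rather than SQL Server, so the CTE needs the RECURSIVE keyword that T-SQL omits. The sample rows are invented (only IDs 3047 and 3008 come from the question; 2001 is hypothetical), and unlike the query above it filters on the root recipe in the anchor instead of in the final SELECT.

```python
import sqlite3


def demo_recipe_db():
    """Build a tiny in-memory table; the rows are made up for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE menu_recipeItems (recipe_id INT, otherRecipe_id INT)"
    )
    conn.executemany(
        "INSERT INTO menu_recipeItems VALUES (?, ?)",
        [
            (3047, 0),      # raw material row (otherRecipe_id = 0)
            (3047, 3008),   # 3047 uses recipe 3008
            (3008, 0),
            (3008, 2001),   # 3008 uses the hypothetical recipe 2001
            (2001, 0),
        ],
    )
    return conn


def sub_recipes(conn, root_id):
    """Return (sub_recipe_id, depth) for every recipe nested under root_id."""
    sql = """
    WITH RECURSIVE prep(recipe_id, otherRecipe_id, depth) AS (
        -- anchor: sub-recipes used directly by the root recipe
        SELECT recipe_id, otherRecipe_id, 0
        FROM menu_recipeItems
        WHERE recipe_id = ? AND otherRecipe_id <> 0
        UNION ALL
        -- recursive member: sub-recipes of the sub-recipes found so far
        SELECT prep.recipe_id, ri.otherRecipe_id, prep.depth + 1
        FROM menu_recipeItems AS ri
        JOIN prep ON prep.otherRecipe_id = ri.recipe_id
        WHERE ri.otherRecipe_id <> 0
    )
    SELECT otherRecipe_id, depth FROM prep ORDER BY depth, otherRecipe_id
    """
    return conn.execute(sql, (root_id,)).fetchall()
```

The recursion stops by itself because a level with no sub-recipe rows adds nothing to the working set. It would still loop forever if a recipe contained itself somewhere down the chain, so this sketch assumes the data has no cycles.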

I would create a table-valued function that returns all leaf recipeItems for a single recipe_id (that is, it flattens the hierarchy). Instead of recursion that follows each single branch, it uses a loop that resolves one level at a time. If you have a million sub recipes down to 10 levels deep, it takes 12 iterations to get the data: 10 (the depth) + 1 (to determine that nothing has changed anymore) + 1 (to get all the leaves from the recipes).
This approach also has the advantage that no limit must be set: if there is an endless loop defined in the database, it does not bother us in the slightest.
Here is the code for the table-valued function:
CREATE FUNCTION [dbo].[GetRecipeItemsFlat](@recipe_id INT)
RETURNS @recipeItems TABLE (recipeItem_id INT NOT NULL)
AS
BEGIN
DECLARE @recipeList TABLE (recipe_id INT);
DECLARE @previousCount INT = 0;
DECLARE @currentCount INT = 1;
-- Get all recipe_ids recursively until infinity (no endless loop possible)
INSERT INTO @recipeList SELECT @recipe_id;
WHILE (@previousCount < @currentCount) BEGIN
-- Adding not yet added child recipe_ids
INSERT INTO @recipeList
SELECT DISTINCT otherRecipe_id
FROM recipeItem
WHERE recipe_id IN (SELECT recipe_id FROM @recipeList)
AND (otherRecipe_id IS NOT NULL)
AND (otherRecipe_id != 0)
AND (otherRecipe_id NOT IN (SELECT recipe_id FROM @recipeList));
SET @previousCount = @currentCount;
SET @currentCount = (SELECT COUNT(*) FROM @recipeList);
END
INSERT INTO @recipeItems
SELECT recipeItem_ID
FROM recipeItem
WHERE recipe_id IN (SELECT * FROM @recipeList)
AND (rawMaterial_id IS NOT NULL)
AND (rawMaterial_id != 0);
RETURN;
END
And then I would create a view that provides all leaf recipeItem for a recipe like this:
CREATE VIEW recipeItemFlat
AS
-- a.recipe_id is the top-level recipe; c.recipe_id would be the sub recipe that directly owns the item
SELECT a.recipe_id, c.recipeItem_id, c.ItemQuantity, c.rawMaterial_id
FROM recipe a
CROSS APPLY dbo.GetRecipeItemsFlat(a.recipe_id) b
INNER JOIN recipeItem c ON b.recipeItem_id = c.recipeItem_id;
And then the query to get all items for recipe 3047 is trivial:
SELECT * FROM recipeItemFlat WHERE recipe_id = 3047
There is one possible issue with this solution which I usually use for rights and roles resolutions and such things where it does not matter but here it could:
If there is a recipe A that has two recipes B and C and both B and C have a recipe D, then D is going to be only once in the result and adding up the quantities would give a wrong result. If that is relevant, don't use DISTINCT to determine the end of the loop in the table valued function but a maximum level like you did in your example.
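To illustrate both the loop and the caveat, here is a small Python sketch of the same level-at-a-time idea. The function names and the sample edges are made up for the example; `items` stands in for the (recipe_id, otherRecipe_id) pairs of the recipeItem table.

```python
def reachable_recipes(items, root):
    """Expand one level per pass until no new recipe ids show up.

    items: iterable of (recipe_id, other_recipe_id) pairs, where 0 or None
    in the second position means the row is a raw material.
    """
    items = list(items)
    seen = {root}
    while True:
        # children of anything already seen that we have not recorded yet
        new = {
            child
            for parent, child in items
            if parent in seen and child and child not in seen
        }
        if not new:          # nothing changed, so the closure is complete
            return seen
        seen |= new


def path_count(items, root, target):
    """Count distinct routes from root to target (assumes no cycles)."""
    if root == target:
        return 1
    return sum(
        path_count(items, child, target)
        for parent, child in items
        if parent == root and child
    )
```

With the edges A→B, A→C, B→D, C→D (say 1→2, 1→3, 2→4, 3→4), `reachable_recipes` lists recipe 4 once while `path_count` reports two routes to it, which is exactly why summing quantities over the flattened set would undercount. The set-based loop also terminates on a cycle such as 1→2→3→1, because a recipe already in `seen` is never added again.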

Related

Change SQL where clause based on a parameter?

I need to alter a query to do something like this (following is generic pseudo-code):
if (tag list contains all tags in the database) {
select every product regardless of tag, even products with null tag
}
else { //tag list is only a few tags long
select only the products that have a tag in the tag list
}
I have tried doing stuff like this, but it doesn't work:
SELECT p.Id
FROM Tags t
JOIN Products p ON p.TagId = t.Id
WHERE ((EXISTS(select Id from Tags EXCEPT select item from dbo.SplitString(@tagList,',')) AND p.TagId in (select item from dbo.SplitString(@tagList,',')))
OR (p.TagId in (select item from dbo.SplitString(@tagList,',')) or p.TagId is null))
This will take place inside of a large query with a large WHERE clause, so putting two slightly different queries in an IF ELSE statement is not ideal.
What should I do to get this working?
First things first: you should use properly normalized input parameters. Ideally this would be a table-valued parameter; however, if you cannot do that, you could insert the split values into a table variable:
DECLARE @tags TABLE (TagId int PRIMARY KEY);
INSERT @tags (TagId)
SELECT item
FROM dbo.SplitString(@tagList, ',');
Next, the easiest way is probably to just find out first whether all tags match, and store that in a variable.
DECLARE @isAllTags bit = CASE WHEN EXISTS(
SELECT t.Id
FROM Tags t
EXCEPT
SELECT tList.TagId
FROM @tags tList
) THEN 0 ELSE 1 END;
SELECT p.Id
FROM Products p
WHERE @isAllTags = 1
OR EXISTS (SELECT 1
FROM @tags tList
WHERE tList.TagId = p.TagId);
You could merge these queries, but it's unlikely to be more performant.
You could even do it in a very set-based fashion, but it's probably going to be really slow
SELECT p.Id
FROM Products p
WHERE EXISTS (SELECT 1
FROM Tags t
LEFT JOIN @tags tList ON tList.TagId = t.Id
CROSS APPLY (VALUES (CASE WHEN p.TagId = tList.TagId THEN 1 END )) v(ProductMatch)
HAVING COUNT(t.Id) = COUNT(tList.TagId) -- all exist
OR COUNT(v.ProductMatch) > 0 -- at least one match
);
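If you want to try the flag-based variant without a SQL Server instance, here is a rough sketch using Python's sqlite3 module as a stand-in. The temp table `tag_filter` plays the role of the table variable, the sample rows are invented, and SQLite syntax differs from T-SQL in places (no DECLARE, for instance), so treat it as an illustration of the logic only.

```python
import sqlite3


def demo_tags_db():
    """In-memory database with three tags and one untagged product."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Tags (Id INT PRIMARY KEY);
        CREATE TABLE Products (Id INT PRIMARY KEY, TagId INT);
        INSERT INTO Tags VALUES (1), (2), (3);
        INSERT INTO Products VALUES (10, 1), (11, 2), (12, NULL);
        """
    )
    return conn


def products_for_tags(conn, tag_ids):
    """Return product ids; all products when tag_ids covers every tag."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS tag_filter (TagId INT PRIMARY KEY)"
    )
    conn.execute("DELETE FROM tag_filter")
    conn.executemany(
        "INSERT INTO tag_filter VALUES (?)", [(t,) for t in set(tag_ids)]
    )
    # 1 when no row of Tags is missing from the filter list, else 0
    (is_all,) = conn.execute(
        "SELECT NOT EXISTS ("
        "  SELECT Id FROM Tags EXCEPT SELECT TagId FROM tag_filter)"
    ).fetchone()
    rows = conn.execute(
        """
        SELECT p.Id
        FROM Products p
        WHERE ? = 1
           OR EXISTS (SELECT 1 FROM tag_filter f WHERE f.TagId = p.TagId)
        ORDER BY p.Id
        """,
        (is_all,),
    ).fetchall()
    return [r[0] for r in rows]
```

Computing the flag once keeps the main WHERE clause to a single extra predicate, which is convenient when it has to sit inside a larger query.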
Try this, this might work.
SELECT p.Id
FROM
Products p LEFT JOIN
Tags t ON p.TagId = t.Id
WHERE
t.Id is null
OR
(t.id is not null and
t.Id in (SELECT value FROM STRING_SPLIT(@tagList, ',')))
I just tested - works

Query to fetch all referenced entities recursively

I have a data model which consists of 'Claims' which (to make things simple for stackoverflow) only has an OpenAmount field. There are two other tables, 'ClaimCoupling' and 'ClaimEntryReference'.
The ClaimCoupling table directly references back to the Claim table and the ClaimEntryReference is effectively the booking of a received amount that can be booked over multiple claims (See ClaimEntry_ID). See this diagram;
For simplicity I've removed all amounts as that's not what I am currently struggling with.
What I want is a query that will start at the Claim table and fetch all claims with an OpenAmount which is <> 0. However, I want to be able to print out an accurate report of how this OpenAmount came to be, which means I'll also need to print out any Claims coupled to this claim. To make it even more interesting, the same thing applies to the bookings: if a booking was made on claim X and claim Y and only X has an open amount, I want to fetch both X and Y so I can then show the payment which was booked as a whole.
I've attempted to do this with a recursive CTE but this (rightfully) blows up on the circular references. I figured I'd fix that with a simple WHERE clause where I would say only recursively add records which are not yet part of the CTE, but this is not allowed....
WITH coupledClaims AS (
--Get all unique combinations
SELECT cc.SubstractedFromClaim_ID AS Claim_ID,
cc.AddedToClaim_ID AS Linked_Claim_ID FROM dbo.ClaimCoupling cc
UNION
SELECT cc.AddedToClaim_ID AS Claim_ID,
cc.SubstractedFromClaim_ID AS Linked_Claim_ID FROM dbo.ClaimCoupling cc
),
MyClaims as
(
SELECT * FROM Claim WHERE OpenAmount <> 0
UNION ALL
SELECT c.* FROM coupledClaims JOIN MyClaims mc ON coupledClaims.claim_id = mc.ID JOIN claim c ON c.ID = coupledClaims.linked_Claim_ID
WHERE c.ID NOT IN (SELECT ID FROM MyClaims)
)
SELECT * FROM MyClaims
After fiddling around with that for way too long I decided I'd do it with an actual loop, using @@ROWCOUNT and simply adding the rows manually to a table variable. But as I was writing this solution (which I'm sure I can get to work) I figured I'd ask here first, because I don't like writing loops in T-SQL; I always feel it's ugly and inefficient.
See the following sql Fiddle for the data models and some test data (I commented out the recursive part as otherwise I was not allowed to create a link);
http://sqlfiddle.com/#!6/129ad5/7/0
I'm hoping someone here will have a great way of handling this problem (likely I'm doing something wrong with the recursive CTE). For completeness, this is done on MS SQL 2016.
So here is what I've learned and done so far. Thanks to the comment of habo which refers to the following question; Infinite loop in CTE when parsing self-referencing table
Firstly, I decided to at least 'solve' my problem and wrote some manual recursion. This solves my problem but is not as 'pretty' as the CTE solution, which I was hoping/thinking would be easier to read as well as outperform the manual recursion solution.
Manual Recursion
/****************************/
/* CLAIMS AND PAYMENT LOGIC */
/****************************/
DECLARE @rows as INT = 0
DECLARE @relevantClaimIds as Table(
Debtor_ID INT,
Claim_ID int
)
SET NOCOUNT ON
--Get anchor condition
INSERT INTO @relevantClaimIds (Debtor_ID, Claim_ID)
select Debtor_ID, ID
from Claim c
WHERE OpenAmount <> 0
--Do recursion
WHILE @rows <> (SELECT COUNT(*) FROM @relevantClaimIds)
BEGIN
set @rows = (SELECT COUNT(*) FROM @relevantClaimIds)
--Subtracted
INSERT @relevantClaimIds (Debtor_ID, Claim_ID)
SELECT DISTINCT c.Debtor_ID, c.id
FROM claim c
inner join claimcoupling cc on cc.SubstractedFromClaim_ID = c.ID
JOIN @relevantClaimIds rci on rci.Claim_ID = cc.AddedToClaim_ID
--might be multiple paths to this recursion so eliminate duplicates
left join @relevantClaimIds dup on dup.Claim_ID = c.id
WHERE dup.Claim_ID is null
--Added
INSERT @relevantClaimIds (Debtor_ID, Claim_ID)
SELECT DISTINCT c.Debtor_ID, c.id
FROM claim c
inner join claimcoupling cc on cc.AddedToClaim_ID = c.ID
JOIN @relevantClaimIds rci on rci.Claim_ID = cc.SubstractedFromClaim_ID
--might be multiple paths to this recursion so eliminate duplicates
left join @relevantClaimIds dup on dup.Claim_ID = c.id
WHERE dup.Claim_ID is null
--Payments
INSERT @relevantClaimIds (Debtor_ID, Claim_ID)
SELECT DISTINCT c.Debtor_ID, c.id
FROM @relevantClaimIds f
join ClaimEntryReference cer on f.Claim_ID = cer.Claim_ID
JOIN ClaimEntryReference cer_linked on cer.ClaimEntry_ID = cer_linked.ClaimEntry_ID AND cer.ID <> cer_linked.ID
JOIN Claim c on c.ID = cer_linked.Claim_ID
--might be multiple paths to this recursion so eliminate duplicates
left join @relevantClaimIds dup on dup.Claim_ID = c.id
WHERE dup.Claim_ID is null
END
Then after I received and read the comment I decided to try the CTE solution which looks like this;
CTE Recursion
with Tree as
(
select Debtor_ID, ID AS Claim_ID, CAST(ID AS VARCHAR(MAX)) AS levels
from Claim c
WHERE OpenAmount <> 0
UNION ALL
SELECT c.Debtor_ID, c.id, t.levels + ',' + CAST(c.ID AS VARCHAR(MAX)) AS levels
FROM claim c
inner join claimcoupling cc on cc.SubstractedFromClaim_ID = c.ID
JOIN Tree t on t.Claim_ID = cc.AddedToClaim_ID
WHERE (','+T.levels+',' not like '%,'+cast(c.ID as varchar(max))+',%')
UNION ALL
SELECT c.Debtor_ID, c.id, t.levels + ',' + CAST(c.ID AS VARCHAR(MAX)) AS levels
FROM claim c
inner join claimcoupling cc on cc.AddedToClaim_ID = c.ID
JOIN Tree t on t.Claim_ID = cc.SubstractedFromClaim_ID
WHERE (','+T.levels+',' not like '%,'+cast(c.ID as varchar(max))+',%')
UNION ALL
SELECT c.Debtor_ID, c.id, t.levels + ',' + CAST(c.ID AS VARCHAR(MAX)) AS levels
FROM Tree t
join ClaimEntryReference cer on t.Claim_ID = cer.Claim_ID
JOIN ClaimEntryReference cer_linked on cer.ClaimEntry_ID = cer_linked.ClaimEntry_ID AND cer.ID <> cer_linked.ID
JOIN Claim c on c.ID = cer_linked.Claim_ID
WHERE (','+T.levels+',' not like '%,'+cast(c.ID as varchar(max))+',%')
)
select DISTINCT Tree.Debtor_ID, Tree.Claim_ID
from Tree
This solution is indeed a lot 'shorter' and easier on the eyes but does it actually perform better?
Performance differences
Manual; CPU 16, Reads 1793, Duration 13
CTE; CPU 47, Reads 4001, Duration 48
Conclusion
Not sure if it's due to the varchar cast that is required in the CTE solution, or because it has to do one extra iteration before completing its recursion, but it actually requires more resources on all fronts than the manual recursion.
In the end it is possible with a CTE. However, looks aren't everything (thank god ;-)); performance-wise, sticking with the manual recursion seems like the better route.
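For readers who want to poke at the path-string cycle guard on its own, here is a stripped-down sketch. It uses SQLite through Python instead of SQL Server (so `||` replaces `+` for concatenation and the CTE needs RECURSIVE), it only follows ClaimCoupling and leaves out the payment links, and the sample couplings are invented.

```python
import sqlite3


def demo_claims_db():
    """Couplings 1-2, 2-3, 3-1 form a cycle; 4-5 is a separate pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ClaimCoupling ("
        "SubstractedFromClaim_ID INT, AddedToClaim_ID INT)"
    )
    conn.executemany(
        "INSERT INTO ClaimCoupling VALUES (?, ?)",
        [(1, 2), (2, 3), (3, 1), (4, 5)],
    )
    return conn


def connected_claims(conn, start_id):
    """Return every claim id linked to start_id, directly or indirectly."""
    sql = """
    WITH RECURSIVE links(a, b) AS (
        -- treat each coupling as an edge that can be walked both ways
        SELECT SubstractedFromClaim_ID, AddedToClaim_ID FROM ClaimCoupling
        UNION
        SELECT AddedToClaim_ID, SubstractedFromClaim_ID FROM ClaimCoupling
    ),
    tree(claim_id, levels) AS (
        SELECT ?, CAST(? AS TEXT)
        UNION ALL
        SELECT l.b, t.levels || ',' || l.b
        FROM tree t
        JOIN links l ON l.a = t.claim_id
        -- cycle guard: skip a claim that is already on this path
        WHERE (',' || t.levels || ',') NOT LIKE ('%,' || l.b || ',%')
    )
    SELECT DISTINCT claim_id FROM tree ORDER BY claim_id
    """
    return [row[0] for row in conn.execute(sql, (start_id, start_id))]
```

One thing the sketch makes visible: the guard only looks at the current path, so the CTE walks every simple path through the graph (here it reaches claim 3 via 1→3 and again via 1→2→3) and leaves deduplication to the final DISTINCT. The loop version checks against everything found so far, which may be part of why it needed fewer reads in the measurements above, though I have not verified that on the original data.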

How can I efficiently query and index a view that involves a full outer join?

We have a data processing application that has two separate paths that should eventually produce similar results. We also have a database-backed monitoring service that compares and utilizes the results of this processing. At any point in time, either of the two paths may or may not have produced results for the operation, but I want to be able to query a view that tells me about any results that have been produced.
Here's a simplified example of the schema I started with:
create table LeftResult (
DateId int not null,
EntityId int not null,
ProcessingValue int not null
primary key ( DateId, EntityId ) )
go
create table RightResult (
DateId int not null,
EntityId int not null,
ProcessingValue int not null
primary key ( DateId, EntityId ) )
go
create view CombinedResults
as
select
DateId = isnull( l.DateId, r.DateId ),
EntityId = isnull( l.EntityId, r.EntityId ),
LeftValue = l.ProcessingValue,
RightValue = r.ProcessingValue,
MaxValue = case
when isnull( l.ProcessingValue, 0 ) > isnull( r.ProcessingValue, 0 )
then isnull( l.ProcessingValue, 0 )
else isnull( r.ProcessingValue, 0 )
end
from LeftResult l
full outer join RightResult r
on l.DateId = r.DateId
and l.EntityId = r.EntityId
go
The problem with this is that SQL Server always chooses to scan the PK on LeftResult and RightResult rather than seek, even when queries to the view include DateId and EntityId as predicates. This seems to be due to the isnull() checks on the results. (I've even tried using index hints and FORCESEEK, but to no avail: the query plan still shows a scan.)
However, I can't simply replace the isnull() results, since either the left or right side could be missing from the join (because the associated process hasn't populated the table yet).
I don't particularly want to duplicate the MaxValue logic across all of the consumers of the view (in reality, it's quite a bit more complex calculation, but the same idea applies.)
Is there a good strategy I can use to structure this view, or queries against it, so that the query plan will utilize a seek rather than a scan?
Try using a left outer join for one of the tables, then union those results with the excluded rows from the other table.
like:
select (...)
from LeftResult l
left outer join RightResult r
on l.DateId = r.DateId
and l.EntityId = r.EntityId
(...)
UNION ALL
select (...)
from RightResult r
left outer join LeftResult l
on l.DateId = r.DateId
and l.EntityId = r.EntityId
WHERE
l.dateid is null
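Here is a filled-in sketch of that shape, run on SQLite through Python so it can be tried without a server. The sample rows are invented, and MaxValue is left out to keep it short. It only demonstrates that the rewrite returns the same rows a full outer join would; whether SQL Server then picks a seek still has to be checked in the actual plan.

```python
import sqlite3

# Each branch exposes real key columns instead of ISNULL(l.x, r.x),
# so a filter on DateId/EntityId can apply to the base tables directly.
COMBINED_SQL = """
SELECT l.DateId, l.EntityId,
       l.ProcessingValue AS LeftValue,
       r.ProcessingValue AS RightValue
FROM LeftResult l
LEFT JOIN RightResult r
       ON l.DateId = r.DateId AND l.EntityId = r.EntityId
UNION ALL
-- rows that exist only on the right side
SELECT r.DateId, r.EntityId, NULL, r.ProcessingValue
FROM RightResult r
LEFT JOIN LeftResult l
       ON l.DateId = r.DateId AND l.EntityId = r.EntityId
WHERE l.DateId IS NULL
"""


def demo_results_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE LeftResult (DateId INT, EntityId INT, ProcessingValue INT,
                                 PRIMARY KEY (DateId, EntityId));
        CREATE TABLE RightResult (DateId INT, EntityId INT, ProcessingValue INT,
                                  PRIMARY KEY (DateId, EntityId));
        INSERT INTO LeftResult VALUES (1, 1, 10), (1, 2, 20);
        INSERT INTO RightResult VALUES (1, 2, 25), (1, 3, 30);
        """
    )
    return conn


def combined_results(conn, date_id):
    """Rows for one DateId: left-only, matched, and right-only."""
    sql = (
        "SELECT * FROM (" + COMBINED_SQL + ") "
        "WHERE DateId = ? ORDER BY EntityId"
    )
    return conn.execute(sql, (date_id,)).fetchall()
```

UNION ALL is safe here because the second branch keeps only rows with no match on the left, so the two branches cannot overlap.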

SQL Condition based dataset

I have a SQL Server database that I did not design.
The employees have degrees, licensures and credentials stored in a few different tables.
I have written the query to join all of this information together so I can see an overall result of what the data looks like. I have been asked to create a view for this data that returns only the highest degree they have obtained and the two highest certifications.
The problem is, as it is pre existing data, there is no hierarchy built into the data. All of the degrees and certifications are simply stored as a string associated with their employee number.
The first logical step was to create an adjacency list(I believe this is the correct term).
For example 'MD' is the highest degree you can obtain in our list. So I have given that the "ranking" of 1. The next lower degree is "ranked" as 2. and so forth.
I can join on the text field that contains these and return their associated rank.
The problem I am having is returning only the two highest based on this ranking.
If the employee has multiple degrees or certifications they are listed on a second or third row. From a logical standpoint, I need to group the employee ID, first name and last name, then somehow concatenate the degrees, certifications and licensures based on the "ranking" I created for them. It is not a true hierarchy in the way that I am thinking about it because I only need to know the highest two and not necessarily the relationship between the results.
Another potential caveat is that the database must remain in SQL Server 2000 compatibility mode.
Any help that can be given would be much appreciated. Thank you.
select a.EduRank as 'Licensure Rank',
b.EduRank as 'Degree Rank',
EmpComp.EecEmpNo,
EmpPers.EepNameFirst,
EmpPers.EepNameLast,
RTRIM(EmpEduc.EfeLevel),
RTRIM(EmpLicns.ElcLicenseID),
a.EduType,
b.EduType
from empcomp
join EmpPers on empcomp.eeceeid = EmpPers.eepEEID
join EmpEduc on empcomp.Eeceeid = EmpEduc.EfeEEID
join EmpLicns on empcomp.eeceeid = EmpLicns.ElcEEID
join yvDegreeRanks a on a.EduCode = EmpLicns.ElcLicenseID
join yvDegreeRanks b on b.EduCode = EmpEduc.EfeLevel
I think I can see what your problem is, although I'm not sure. Joining the tables together has given you "double rows". The "quick-and-dirty" way to solve this would be to use subqueries rather than joins. Doing so, you can select only the TOP 1 degree and the TOP 2 certifications.
EDIT: Can you try this query?
SELECT tblLicensures.EduRank as 'Licensure Rank',
tblDegrees.EduRank as 'Degree Rank',
EmpComp.EecEmpNo,
EmpPers.EepNameFirst,
EmpPers.EepNameLast,
RTRIM(tblDegrees.EfeLevel),
RTRIM(tblLicensures.ElcLicenseID),
tblLicensures.EduType,
tblDegrees.EduType
FROM EmpComp
LEFT OUTER JOIN EmpPers ON empcomp.eeceeid = EmpPers.eepEEID
LEFT OUTER JOIN
-- Select TOP 2 Licensure Ranks
(
SELECT TOP 2 a.EduType, a.EduRank, EmpLicns.ElcEEID
FROM yvDegreeRanks a
INNER JOIN EmpLicns on a.EduCode = EmpLicns.ElcLicenseID
WHERE EmpLicns.ElcEEID = empcomp.eeceeid
ORDER BY a.EduRank ASC
) AS tblLicensures ON tblLicensures.ElcEEID = empcomp.Eeceeid
LEFT OUTER JOIN
-- SELECT TOP 1 Degree
(
SELECT TOP 1 b.EduType, b.EduRank, EmpEduc.EfeEEID, EmpEduc.EfeLevel
FROM yvDegreeRanks b
INNER JOIN EmpEduc on b.EduCode = EmpEduc.EfeLevel
WHERE EmpEduc.EfeEEID = empcomp.Eeceeid
ORDER BY b.EduRank ASC
) AS tblDegrees ON tblDegrees.EfeEEID = empcomp.Eeceeid
This is not the most elegant solution, but hopefully it will at least help you out in some way.
create table #dataset (
licensurerank [datatype],
degreerank [datatype],
employeeid [datatype],
firstname varchar,
lastname varchar,
efeLevel [datatype],
elclicenseid [datatype],
edutype1 [datatype],
edutype2 [datatype]
)
select distinct identity(int,1,1) [ID], EecEmpNo into #employeeList from EmpComp
declare
@count int,
@rows int,
@employeeNo int
select * from #employeeList
set @rows = @@rowcount
set @count = 1
while @count <= @rows
begin
select @employeeNo = EecEmpNo from #employeeList where id = @count
insert into #dataset
select top 2 a.EduRank as 'Licensure Rank',
b.EduRank as 'Degree Rank',
EmpComp.EecEmpNo,
EmpPers.EepNameFirst,
EmpPers.EepNameLast,
RTRIM(EmpEduc.EfeLevel),
RTRIM(EmpLicns.ElcLicenseID),
a.EduType,
b.EduType
from empcomp
join EmpPers on empcomp.eeceeid = EmpPers.eepEEID
join EmpEduc on empcomp.Eeceeid = EmpEduc.EfeEEID
join EmpLicns on empcomp.eeceeid = EmpLicns.ElcEEID
join yvDegreeRanks a on a.EduCode = EmpLicns.ElcLicenseID
join yvDegreeRanks b on b.EduCode = EmpEduc.EfeLevel
where EmpComp.EecEmpNo = @employeeNo
set @count = @count + 1
end
Have tables for employees, types of degrees (including a rank), types of certs (including a rank), and join tables employees_degrees and employees_certs. [It might be better to put degrees and certs in one table with a flag is_degree, if all their other fields are the same.] You can extract the existing string values and replace them with FK ids into the degree and cert tables.
The query itself is harder, because PARTITION BY is not available in SQL Server 2000 (according to Google). UW's answer has at least two problems: you need LEFT JOINs because not all employees have degrees and certs, and there is no ORDER BY to show what you want to take the best of. TOP 2 subqueries are particularly difficult to use in this context. So for that, I can't yet give an answer.
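One pattern that needs neither PARTITION BY nor TOP is a correlated COUNT: keep a row when fewer than N rows of the same employee rank better than it. Below is a sketch of that idea, run on SQLite through Python. The table `emp_certs` and its columns are hypothetical (think of it as the employee/certification rows already joined to their rank), the sample data is invented, and it assumes ranks are unique per employee, since ties would let extra rows through. I have not tried it on SQL Server 2000 itself.

```python
import sqlite3

# Keep a row when fewer than N rows for the same employee have a better
# (numerically lower) rank. Uses only a correlated subquery.
TOP_N_SQL = """
SELECT c.emp_id, c.code, c.edu_rank
FROM emp_certs c
WHERE (SELECT COUNT(*)
       FROM emp_certs better
       WHERE better.emp_id = c.emp_id
         AND better.edu_rank < c.edu_rank) < ?
ORDER BY c.emp_id, c.edu_rank
"""


def demo_certs_db():
    """Hypothetical pre-joined table of employee, cert code and rank."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp_certs (emp_id INT, code TEXT, edu_rank INT)"
    )
    conn.executemany(
        "INSERT INTO emp_certs VALUES (?, ?, ?)",
        [
            (1, "RN", 3), (1, "NP", 2), (1, "CNA", 5),  # three certs
            (2, "RN", 3),                               # only one cert
        ],
    )
    return conn


def top_n_certs(conn, n):
    """Return the n best-ranked certifications per employee."""
    return conn.execute(TOP_N_SQL, (n,)).fetchall()
```

Employees with no rows in `emp_certs` simply do not appear, so in a full query this result would be LEFT JOINed to the employee table, as noted above.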

querying 2 tables with the same spec for the differences

I recently had to solve this problem and find I've needed this info many times in the past so I thought I would post it. Assuming the following table def, how would you write a query to find all differences between the two?
table def:
CREATE TABLE feed_tbl
(
code varchar(15),
name varchar(40),
status char(1),
update char(1),
CONSTRAINT feed_tbl_PK PRIMARY KEY (code)
)
CREATE TABLE data_tbl
(
code varchar(15),
name varchar(40),
status char(1),
update char(1),
CONSTRAINT data_tbl_PK PRIMARY KEY (code)
)
Here is my solution, as a view using three queries joined by unions. The diff_type specifies how the record needs to be updated: deleted from _data (2), updated in _data (1), or added to _data (0).
CREATE VIEW delta_vw AS (
SELECT feed_tbl.code, feed_tbl.name, feed_tbl.status, feed_tbl.update, 0 as diff_type
FROM feed_tbl LEFT OUTER JOIN
data_tbl ON feed_tbl.code = data_tbl.code
WHERE (data_tbl.code IS NULL)
UNION
SELECT feed_tbl.code, feed_tbl.name, feed_tbl.status, feed_tbl.update, 1 as diff_type
FROM data_tbl RIGHT OUTER JOIN
feed_tbl ON data_tbl.code = feed_tbl.code
where (feed_tbl.name <> data_tbl.name) OR
(data_tbl.status <> feed_tbl.status) OR
(data_tbl.update <> feed_tbl.update)
UNION
SELECT data_tbl.code, data_tbl.name, data_tbl.status, data_tbl.update, 2 as diff_type
FROM data_tbl LEFT OUTER JOIN
feed_tbl ON data_tbl.code = feed_tbl.code
WHERE (feed_tbl.code IS NULL)
)
UNION will remove duplicates, so just UNION the two together, then search for anything with more than one entry. Given "code" as a primary key, you can say:
edit 0: modified to include differences in the PK field itself
edit 1: if you use this in real life, be sure to list the actual column names. Don't use dot-star, since the UNION operation requires result sets to have exactly matching columns. This example would break if you added / removed a column from one of the tables.
select dt.*
from
data_tbl dt
,(
select code
from
(
select * from feed_tbl
union
select * from data_tbl
)
group by code
having count(*) > 1
) diffs --"diffs" will return all differences *except* those in the primary key itself
where diffs.code = dt.code
union --plus the ones that are only in feed, but not in data
select * from feed_tbl ft where not exists(select code from data_tbl dt where dt.code = ft.code)
union --plus the ones that are only in data, but not in feed
select * from data_tbl dt where not exists(select code from feed_tbl ft where ft.code = dt.code)
I would use a minor variation in the second union:
where (ISNULL(feed_tbl.name, 'NONAME') <> ISNULL(data_tbl.name, 'NONAME')) OR
(ISNULL(data_tbl.status, 'NOSTATUS') <> ISNULL(feed_tbl.status, 'NOSTATUS')) OR
(ISNULL(data_tbl.update, '12/31/2039') <> ISNULL(feed_tbl.update, '12/31/2039'))
For reasons I have never understood, NULL does not equal NULL (at least in SQL Server).
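The reason is SQL's three-valued logic: a comparison with NULL evaluates to unknown rather than true or false, and WHERE only keeps rows whose condition is true. A small sketch of the effect, using SQLite through Python (the table names mirror the ones above but the rows are invented, and COALESCE stands in for ISNULL, which SQLite does not have in that form):

```python
import sqlite3


def demo_null_db():
    """Row B differs only by a NULL name; row C is NULL on both sides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE feed_tbl (code TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE data_tbl (code TEXT PRIMARY KEY, name TEXT);
        INSERT INTO feed_tbl VALUES ('A', 'x'), ('B', NULL), ('C', NULL);
        INSERT INTO data_tbl VALUES ('A', 'x'), ('B', 'y'),  ('C', NULL);
        """
    )
    return conn


def null_comparisons(conn):
    """Each comparison involving NULL comes back as NULL (None in Python)."""
    return conn.execute(
        "SELECT NULL = NULL, NULL <> NULL, NULL <> 'x'"
    ).fetchone()


def plain_diff(conn):
    """Naive <> comparison: silently misses rows where one side is NULL."""
    sql = """
    SELECT f.code FROM feed_tbl f JOIN data_tbl d ON d.code = f.code
    WHERE f.name <> d.name ORDER BY f.code
    """
    return [r[0] for r in conn.execute(sql)]


def sentinel_diff(conn):
    """Replace NULL with a sentinel first, as in the ISNULL variant."""
    sql = """
    SELECT f.code FROM feed_tbl f JOIN data_tbl d ON d.code = f.code
    WHERE COALESCE(f.name, 'NONAME') <> COALESCE(d.name, 'NONAME')
    ORDER BY f.code
    """
    return [r[0] for r in conn.execute(sql)]
```

The sentinel has to be a value that can never occur as real data; otherwise a genuine 'NONAME' would compare equal to a NULL on the other side.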
You could also use a FULL OUTER JOIN and a CASE ... END expression on the diff_type column, along with the aforementioned where clause.
That would probably achieve the same results, but in one query.