EXISTS and NOT EXISTS in a correlated subquery - sql

I've been trying to work out how to do a particular query for a day or so now and it has gotten to the point where I need some outside help. Hence my question.
Given the following data;
DECLARE @Data AS TABLE
(
    OrgId INT,
    ThingId INT
)
DECLARE @ReplacementData AS TABLE
(
    OldThingId INT,
    NewThingId INT
)
INSERT INTO @Data (OrgId, ThingId)
VALUES (1, 2), (1, 3), (1, 4),
       (2, 1), (2, 4),
       (3, 3), (3, 4)
INSERT INTO @ReplacementData (OldThingId, NewThingId)
VALUES (3, 4), (2, 5)
I want to find any organisation that has a "thing" that has been replaced, as recorded in the @ReplacementData table variable, but that does not yet have the replacement. I'd want to see the org id, the thing they have that has been replaced, and the id of the thing that should replace it. So, given the data above, I should see:
Org id, Thing Id, Replacement Thing Id org doesn't have but should have
1, 2, 5 -- As Org 1 has 2, but not 5
I've had many attempts at trying to get this working, and I just can't seem to get my head around how to go about it. The following are a couple of my attempts, but I think I am just way off;
-- Attempt using correlated subqueries and EXISTS clauses
-- Show all orgs that have the old thing, but not the new thing
-- Ideally, limit results to OrgId, OldThingId and the NewThingId that they should now have too
SELECT *
FROM @Data d
WHERE EXISTS (SELECT *
              FROM @Data oldstuff
              WHERE oldstuff.OrgId = d.OrgId
                AND oldstuff.ThingId IN
                    (SELECT OldThingID
                     FROM @ReplacementData))
  AND NOT EXISTS (SELECT *
                  FROM @Data oldstuff
                  WHERE oldstuff.OrgId = d.OrgId
                    AND oldstuff.ThingId IN
                        (SELECT NewThingID
                         FROM @ReplacementData))
-- Attempt at using a JOIN to only include those old things that the org has (via the where clause)
-- Also try exists to show missing new things.
SELECT *
FROM @Data d
LEFT JOIN @ReplacementData rd ON rd.OldThingId = d.ThingId
WHERE NOT EXISTS (
    SELECT *
    FROM @Data dta
    INNER JOIN @ReplacementData rep ON rep.NewThingId = dta.ThingId
    WHERE dta.OrgId = d.OrgId
)
AND rd.OldThingId IS NOT NULL
Any help on this is much appreciated. I may well be going about it completely wrong, so please let me know if there is a better way of tackling this type of problem.

Try this out and let me know.
DECLARE @Data AS TABLE
(
    OrgId INT,
    ThingId INT
)
DECLARE @ReplacementData AS TABLE
(
    OldThingId INT,
    NewThingId INT
)
INSERT INTO @Data (OrgId, ThingId)
VALUES (1, 2), (1, 3), (1, 4),
       (2, 1), (2, 4),
       (3, 3), (3, 4)
INSERT INTO @ReplacementData (OldThingId, NewThingId)
VALUES (3, 4), (2, 5)
SELECT D.OrgId, RD.*
FROM @Data D
JOIN @ReplacementData RD
    ON D.ThingId = RD.OldThingId
LEFT OUTER JOIN @Data EXCLUDE
    ON D.OrgId = EXCLUDE.OrgId
    AND RD.NewThingId = EXCLUDE.ThingId
WHERE EXCLUDE.OrgId IS NULL
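Since the question asked about EXISTS, the same anti-join can also be written with NOT EXISTS. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the table variables become ordinary tables named Data and ReplacementData, and the alias has_new is made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the T-SQL table variables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (OrgId INT, ThingId INT);
    CREATE TABLE ReplacementData (OldThingId INT, NewThingId INT);
    INSERT INTO Data VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 4), (3, 3), (3, 4);
    INSERT INTO ReplacementData VALUES (3, 4), (2, 5);
""")

# Join each org's things to their replacements, then keep only the rows
# where the same org does not already hold the replacement thing.
missing_replacements = conn.execute("""
    SELECT d.OrgId, rd.OldThingId, rd.NewThingId
    FROM Data d
    JOIN ReplacementData rd ON rd.OldThingId = d.ThingId
    WHERE NOT EXISTS (SELECT 1
                      FROM Data has_new
                      WHERE has_new.OrgId = d.OrgId
                        AND has_new.ThingId = rd.NewThingId)
    ORDER BY d.OrgId, rd.OldThingId
""").fetchall()

print(missing_replacements)  # [(1, 2, 5)]
```

Org 1 and org 3 both hold thing 3, but each already has its replacement (4), so only org 1's thing 2 is reported.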

Related

SQL Server Query to fetch nested data

I have a table like this -
declare @tmpData as table
(
    MainId int,
    RefId int
)
INSERT INTO @tmpData
    (MainId,
     RefId)
VALUES (1, NULL),
       (2, 1),
       (3, 2),
       (4, 3),
       (5, NULL),
       (6, 5);
So, if I pass a value, for example 1, the query should return all rows that are linked to 1 directly or indirectly.
Here 1 is the RefId of MainId 2, 2 is the RefId of MainId 3, and so on. MainId 5 and 6 are not related to 1, so they should not appear in the output.
Can anyone please provide a SQL Server query for this? Thanks.
I tried a left join of the table to itself on MainId and RefId, but did not get the desired output.
You need a recursive CTE (dbfiddle)
WITH R
     AS (SELECT t.MainId,
                t.RefId
         FROM @tmpData t
         WHERE t.MainId = 1
         UNION ALL
         SELECT t.MainId,
                t.RefId
         FROM @tmpData t
              JOIN R
                ON t.RefId = R.MainId)
SELECT *
FROM R
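To see how the recursion expands, here is a small runnable sketch of the same idea. It is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module (which spells the clause WITH RECURSIVE), uses an ordinary table called tmpData, and wraps the query in a hypothetical helper named linked_rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmpData (MainId INT, RefId INT);
    INSERT INTO tmpData VALUES (1, NULL), (2, 1), (3, 2), (4, 3), (5, NULL), (6, 5);
""")

def linked_rows(conn, root_id):
    """Return the root row plus every row that refers to it directly or indirectly."""
    return conn.execute("""
        WITH RECURSIVE R(MainId, RefId) AS (
            -- Anchor member: the row we start from.
            SELECT MainId, RefId FROM tmpData WHERE MainId = ?
            UNION ALL
            -- Recursive member: rows whose RefId points at a row already found.
            SELECT t.MainId, t.RefId
            FROM tmpData t
            JOIN R ON t.RefId = R.MainId
        )
        SELECT MainId, RefId FROM R ORDER BY MainId
    """, (root_id,)).fetchall()

print(linked_rows(conn, 1))  # [(1, None), (2, 1), (3, 2), (4, 3)]
print(linked_rows(conn, 5))  # [(5, None), (6, 5)]
```

The anchor member finds the starting row, and each pass of the recursive member adds the rows that reference something found in the previous pass, which is why 5 and 6 never show up when starting from 1.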

Find data by multiple Lookup table clauses

declare @Character table (id int, [name] varchar(12));
insert into @Character (id, [name])
values
    (1, 'tom'),
    (2, 'jerry'),
    (3, 'dog');
declare @NameToCharacter table (id int, nameId int, characterId int);
insert into @NameToCharacter (id, nameId, characterId)
values
    (1, 1, 1),
    (2, 1, 3),
    (3, 1, 2),
    (4, 2, 1);
The Name Table has more than just 1,2,3 and the list to parse on is dynamic
NameTable
id | name
----------
1 foo
2 bar
3 steak
CharacterTable
id | name
---------
1 tom
2 jerry
3 dog
NameToCharacterTable
id | nameId | characterId
1 1 1
2 1 3
3 1 2
4 2 1
I am looking for a query that returns only the characters that have every name in a given list. For example, for nameIds 1 and 2 and the data above, only "tom" should be returned.
SELECT *
FROM nameToCharacterTable
WHERE nameId in (1,2)
The IN clause returns every row that has a 1 or a 2. I want only the characters that have both a 1 and a 2.
I am stumped. I have tried everything I know and do not want to resort to dynamic SQL. Any help would be great.
The 1, 2 in this example will be a dynamic list of integers; for example, it could be 1, 3, 4, 5, ...
Count how many rows each character has in the NameToCharacter table that match the list you are providing (which I have assumed you can convert into a table variable or temp table), and keep only the characters whose count equals the number of required names, e.g.
declare @Character table (id int, [name] varchar(12));
insert into @Character (id, [name])
values
    (1, 'tom'),
    (2, 'jerry'),
    (3, 'dog');
declare @NameToCharacter table (id int, nameId int, characterId int);
insert into @NameToCharacter (id, nameId, characterId)
values
    (1, 1, 1),
    (2, 1, 3),
    (3, 1, 2),
    (4, 2, 1);
declare @RequiredNames table (nameId int);
insert into @RequiredNames (nameId)
values
    (1),
    (2);
select *
from @Character C
where (
    select count(*)
    from @NameToCharacter NC
    where NC.characterId = C.id
    and NC.nameId in (select nameId from @RequiredNames)
) = 2;
Returns:
id | name
---------
1    tom
Note: Providing DDL+DML as shown here makes it much easier for people to assist you.
This is classic Relational Division With Remainder.
There are a number of different solutions. @DaleK has given you an excellent one: inner-join everything, then check that each set has the right amount. This is normally the fastest solution.
If you want to ensure it works with a dynamic amount of rows, just change the last line to
) = (SELECT COUNT(*) FROM @RequiredNames);
Two other common solutions exist.
Left-join and check that all rows were joined
SELECT *
FROM @Character c
WHERE EXISTS (SELECT 1
    FROM @RequiredNames rn
    LEFT JOIN @NameToCharacter nc ON nc.nameId = rn.nameId AND nc.characterId = c.id
    HAVING COUNT(*) = COUNT(nc.nameId) -- all rows are joined
);
Double anti-join, in other words: there are no "required" that are "not in the set"
SELECT *
FROM @Character c
WHERE NOT EXISTS (SELECT 1
    FROM @RequiredNames rn
    WHERE NOT EXISTS (SELECT 1
        FROM @NameToCharacter nc
        WHERE nc.nameId = rn.nameId AND nc.characterId = c.id
    )
);
A variation on the one from the other answer uses a windowed aggregate instead of a subquery. I don't think this is performant, but it may have uses in certain cases.
SELECT *
FROM @Character c
WHERE EXISTS (SELECT 1
    FROM (
        SELECT *, COUNT(*) OVER () AS cnt
        FROM @RequiredNames
    ) rn
    JOIN @NameToCharacter nc ON nc.nameId = rn.nameId AND nc.characterId = c.id
    HAVING COUNT(*) = MIN(rn.cnt)
);
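As a quick cross-check that the count-based and double anti-join formulations agree, here is a sketch run on SQLite via Python's sqlite3 module rather than SQL Server. The table name Characters and the helper names are assumptions made for this example, and the count-based form additionally assumes that neither RequiredNames nor NameToCharacter contains duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Characters (id INT, name TEXT);
    CREATE TABLE NameToCharacter (id INT, nameId INT, characterId INT);
    CREATE TABLE RequiredNames (nameId INT);
    INSERT INTO Characters VALUES (1, 'tom'), (2, 'jerry'), (3, 'dog');
    INSERT INTO NameToCharacter VALUES (1, 1, 1), (2, 1, 3), (3, 1, 2), (4, 2, 1);
    INSERT INTO RequiredNames VALUES (1), (2);
""")

def count_based(conn):
    # Matching rows per character must equal the size of the required list.
    return conn.execute("""
        SELECT c.id, c.name
        FROM Characters c
        WHERE (SELECT COUNT(*)
               FROM NameToCharacter nc
               WHERE nc.characterId = c.id
                 AND nc.nameId IN (SELECT nameId FROM RequiredNames))
            = (SELECT COUNT(*) FROM RequiredNames)
        ORDER BY c.id
    """).fetchall()

def double_anti_join(conn):
    # There is no required name that the character lacks.
    return conn.execute("""
        SELECT c.id, c.name
        FROM Characters c
        WHERE NOT EXISTS (SELECT 1
                          FROM RequiredNames rn
                          WHERE NOT EXISTS (SELECT 1
                                            FROM NameToCharacter nc
                                            WHERE nc.nameId = rn.nameId
                                              AND nc.characterId = c.id))
        ORDER BY c.id
    """).fetchall()

print(count_based(conn))       # [(1, 'tom')]
print(double_anti_join(conn))  # [(1, 'tom')]

# Shrinking the required list to just nameId 1 makes every character qualify.
conn.execute("DELETE FROM RequiredNames WHERE nameId = 2")
print(double_anti_join(conn))  # [(1, 'tom'), (2, 'jerry'), (3, 'dog')]
```

Because both queries read the required list from a table, the list can grow or shrink without any dynamic SQL.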

Get the number of rows per tenant of several tables

Let's say I have 2 tables: Tenants and Wargles. Wargles has a Foreign Key towards Tenants called TenantId. If I want to get number of wargles per tenant, I can do this:
SELECT t.Id as TenantId, count(w.Id) as WargleCount
FROM Tenants t
JOIN Wargles w ON w.TenantId = t.Id
GROUP BY t.Id
Now, let's say I have another table, Fiddles, that, like Wargles, has a FK towards Tenants. How can I add another column to the query above, so I get the number of wargles and the number of fiddles for each tenant?
I tried with this:
SELECT t.Id as TenantId, count(w.Id) as WargleCount, count(f.Id) as FiddleCount
FROM Tenants t
JOIN Wargles w ON w.TenantId = t.Id
JOIN Fiddles f ON f.TenantId = t.Id
GROUP BY t.Id
But this won't work, since it would give me the same number both for WargleCount and FiddleCount, the product of the rows from both tables.
Use two subselects
SELECT t.Id as TenantId,
(SELECT Count(1) FROM Fiddles F WHERE F.TenantId = T.Id) as FiddleCount,
(SELECT Count(1) FROM Wargles W WHERE W.TenantId = T.Id) as WargleCount
FROM Tenants t
The most efficient method is probably to use correlated subqueries:
SELECT t.Id as TenantId,
       (SELECT COUNT(*)
        FROM Wargles w
        WHERE w.TenantId = t.Id
       ) as WargleCount,
       (SELECT COUNT(*)
        FROM Fiddles f
        WHERE f.TenantId = t.Id
       ) as FiddleCount
FROM Tenants t;
In particular, this can take advantage of indexes on Wargles(TenantId) and Fiddles(TenantId).
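To make the difference concrete, the sketch below runs both the double-join attempt from the question and the correlated-subquery version against a few illustrative rows. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the sample rows are assumptions chosen only to make the multiplication visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tenants (Id INT);
    CREATE TABLE Wargles (Id INT, TenantId INT);
    CREATE TABLE Fiddles (Id INT, TenantId INT);
    INSERT INTO Tenants VALUES (1), (2), (3);
    INSERT INTO Wargles VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 1), (7, 3), (8, 3);
    INSERT INTO Fiddles VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 2), (7, 3), (8, 2);
""")

# Joining both child tables pairs every wargle with every fiddle of the
# same tenant, so both counts become (wargles x fiddles).
joined = conn.execute("""
    SELECT t.Id, COUNT(w.Id), COUNT(f.Id)
    FROM Tenants t
    JOIN Wargles w ON w.TenantId = t.Id
    JOIN Fiddles f ON f.TenantId = t.Id
    GROUP BY t.Id
    ORDER BY t.Id
""").fetchall()

# Correlated subqueries count each child table independently.
correlated = conn.execute("""
    SELECT t.Id,
           (SELECT COUNT(*) FROM Wargles w WHERE w.TenantId = t.Id),
           (SELECT COUNT(*) FROM Fiddles f WHERE f.TenantId = t.Id)
    FROM Tenants t
    ORDER BY t.Id
""").fetchall()

print(joined)      # [(1, 12, 12), (2, 8, 8), (3, 2, 2)]
print(correlated)  # [(1, 4, 3), (2, 2, 4), (3, 2, 1)]
```

Tenant 1 has 4 wargles and 3 fiddles, which is why the joined query reports 12 in both columns. The subquery form also keeps tenants that have no wargles or fiddles at all, which an inner join would drop.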
In your case, as an extensible solution, I would recommend using a scalar function.
/* SAMPLE DATA ARRANGE */
CREATE TABLE Tenants (Id INT, Title NVARCHAR(5)) ; INSERT INTO Tenants VALUES (1, 'A'), (2, 'B') , (3, 'C');
CREATE TABLE Wargles (Id INT,TenantId INT);INSERT INTO Wargles VALUES (1, 1), (2, 1) , (3, 1) , (4, 2), (5, 2) , (6, 1), (7, 3) , (8, 3);
CREATE TABLE Fiddles (Id INT,TenantId INT);INSERT INTO Fiddles VALUES (1, 1), (2, 1) , (3, 1) , (4, 2), (5, 2) , (6, 2), (7, 3) , (8, 2);
The Function
/*NEEDED CODE*/
CREATE FUNCTION dbo.ufnGetTenantsNo ( @Id AS INT , @Tb AS INT)
RETURNS INT
AS
BEGIN
    DECLARE @Result INT = 0;
    IF (@Tb = 1)
        SELECT @Result = COUNT(*)
        FROM Wargles
        WHERE TenantId = @Id
    ELSE
        SELECT @Result = COUNT(*)
        FROM Fiddles
        WHERE TenantId = @Id
    RETURN @Result
END
GO
Select Statement
SELECT Id AS TenantId
,dbo.ufnGetTenantsNo(Id, 1) AS WargleCount
,dbo.ufnGetTenantsNo(Id, 2) AS FiddleCount
FROM Tenants

Sql Server While Loop with Changing Condition

I have a User Table in my database that contains two fields
user_id
manager_id
I am trying to construct a query to list all of the manager_ids that are associated with a user_id in a hierarchical structure.
So if I give a user_id, I will get that user's manager, followed by that person's manager, all the way to the very top.
So far I have tried this, but it doesn't give what I need:
WITH cte(user_id, manager_id) as (
SELECT user_id, manager_id
FROM user
WHERE manager_id=@userid
UNION ALL
SELECT u.user_id, u.manager_id,
FROM user u
INNER JOIN cte c on e.manager_id = c.employee_id
)
INSERT INTO #tbl (manager_id)
select user_id, manager_id from cte;
If anyone can point me in the right direction, that would be great.
I thought about a WHILE loop, but this may not be very efficient and I'm not too sure how to implement it.
OP asked for a while loop, and while (ha, pun) this may not be the best way... Ask and you shall receive. (:
Here is sample data I created (in the future, please provide this):
CREATE TABLE #temp (userID int, managerID int)
INSERT INTO #temp VALUES (1, 3)
INSERT INTO #temp VALUES (2, 3)
INSERT INTO #temp VALUES (3, 7)
INSERT INTO #temp VALUES (4, 6)
INSERT INTO #temp VALUES (5, 7)
INSERT INTO #temp VALUES (6, 9)
INSERT INTO #temp VALUES (7, 10)
INSERT INTO #temp VALUES (8, 10)
INSERT INTO #temp VALUES (9, 10)
INSERT INTO #temp VALUES (10, 12)
INSERT INTO #temp VALUES (11, 12)
INSERT INTO #temp VALUES (12, NULL)
While Loop:
CREATE TABLE #results (userID INT, managerID INT)
DECLARE @currentUser INT = 1 -- Would be your parameter!
DECLARE @maxUser INT
DECLARE @userManager INT
SELECT @maxUser = MAX(userID) FROM #temp
WHILE @currentUser <= @maxUser
BEGIN
    SELECT @userManager = managerID FROM #temp WHERE userID = @currentUser
    INSERT INTO #results VALUES (@currentUser, @userManager)
    SET @currentUser = @userManager
END
SELECT * FROM #results
DROP TABLE #temp
DROP TABLE #results
Get rid of this column list in your CTE declaration that has nothing to do with the columns you are actually selecting in the CTE:
WITH cte(employee_id, name, reports_to_emp_no, job_number) as (
Just make it this:
WITH cte as (
I recommend a recursive solution (user is a reserved word in SQL Server, so the table name is bracketed):
WITH Parent AS
(
    SELECT * FROM [user] WHERE user_id = @userId
    UNION ALL
    SELECT T.* FROM [user] T
    JOIN Parent P ON P.manager_id = T.user_id
)
SELECT * FROM Parent
To see a demo, run the following:
SELECT * INTO #t FROM (VALUES (1,NULL),(2,1),(3,2),(4,1)) T(user_id,manager_id);
DECLARE @userId int = 3;
WITH Parent AS
(
SELECT * FROM #t WHERE user_id=#userId
UNION ALL
SELECT T.* FROM #t T
JOIN Parent P ON P.manager_id=T.user_id
)
SELECT * FROM Parent
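For comparison with the WHILE loop answer, here is a runnable sketch of the same upward walk as a recursive CTE, using that answer's sample hierarchy. The assumptions: it runs on SQLite via Python's sqlite3 module, the table is called users to avoid the reserved word, and the depth column and the manager_chain helper are made up for this example to keep the output in chain order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INT, manager_id INT);
    INSERT INTO users VALUES
        (1, 3), (2, 3), (3, 7), (4, 6), (5, 7), (6, 9),
        (7, 10), (8, 10), (9, 10), (10, 12), (11, 12), (12, NULL);
""")

def manager_chain(conn, user_id):
    """Return the managers above user_id, nearest manager first."""
    rows = conn.execute("""
        WITH RECURSIVE chain(user_id, manager_id, depth) AS (
            -- Anchor member: the user we start from.
            SELECT user_id, manager_id, 0 FROM users WHERE user_id = ?
            UNION ALL
            -- Recursive member: step up to the row of the current manager.
            SELECT u.user_id, u.manager_id, c.depth + 1
            FROM users u
            JOIN chain c ON u.user_id = c.manager_id
        )
        SELECT manager_id FROM chain
        WHERE manager_id IS NOT NULL
        ORDER BY depth
    """, (user_id,)).fetchall()
    return [row[0] for row in rows]

print(manager_chain(conn, 1))   # [3, 7, 10, 12]
print(manager_chain(conn, 4))   # [6, 9, 10, 12]
print(manager_chain(conn, 12))  # []
```

The recursion stops on its own at the top of the hierarchy, because a NULL manager_id never matches any user_id in the join.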

SQL return only distinct IDs from LEFT JOIN

I've inherited some fun SQL and am trying to figure out how to eliminate rows with duplicate IDs. Our indexes are stored in a somewhat columnar format and then we pivot all the rows into one with the values as different columns.
The below sample returns three rows of unique data, but the IDs are duplicated. I need just two rows with unique IDs (and the other columns that go along with it). I know I'll be losing some data, but I just need one matching row per ID to the query (first, top, oldest, newest, whatever).
I've tried using DISTINCT, GROUP BY, and ROW_NUMBER, but I keep getting the syntax wrong, or using them in the wrong place.
I'm also open to rewriting the query completely in a way that is reusable as I currently have to generate this on the fly (cardtypes and cardindexes are user defined) and would love to be able to create a stored procedure. Thanks in advance!
declare @cardtypes table ([ID] int, [Name] nvarchar(50))
declare @cards table ([ID] int, [CardTypeID] int, [Name] nvarchar(50))
declare @cardindexes table ([ID] int, [CardID] int, [IndexType] int, [StringVal] nvarchar(255), [DateVal] datetime)
INSERT INTO @cardtypes VALUES (1, 'Funny Cards')
INSERT INTO @cardtypes VALUES (2, 'Sad Cards')
INSERT INTO @cards VALUES (1, 1, 'Bunnies')
INSERT INTO @cards VALUES (2, 1, 'Dogs')
INSERT INTO @cards VALUES (3, 1, 'Cat')
INSERT INTO @cards VALUES (4, 1, 'Cat2')
INSERT INTO @cardindexes VALUES (1, 1, 1, 'Bunnies', null)
INSERT INTO @cardindexes VALUES (2, 1, 1, 'playing', null)
INSERT INTO @cardindexes VALUES (3, 1, 2, null, '2014-09-21')
INSERT INTO @cardindexes VALUES (4, 2, 1, 'Dogs', null)
INSERT INTO @cardindexes VALUES (5, 2, 1, 'playing', null)
INSERT INTO @cardindexes VALUES (6, 2, 1, 'poker', null)
INSERT INTO @cardindexes VALUES (7, 2, 2, null, '2014-09-22')
SELECT TOP(100)
    [ID] = c.[ID],
    [Name] = c.[Name],
    [Keyword] = [colKeyword].[StringVal],
    [DateAdded] = [colDateAdded].[DateVal]
FROM @cards AS c
LEFT JOIN @cardindexes AS [colKeyword] ON [colKeyword].[CardID] = c.ID AND [colKeyword].[IndexType] = 1
LEFT JOIN @cardindexes AS [colDateAdded] ON [colDateAdded].[CardID] = c.ID AND [colDateAdded].[IndexType] = 2
WHERE [colKeyword].[StringVal] LIKE 'p%' AND c.[CardTypeID] = 1
ORDER BY [DateAdded]
Edit:
While both solutions are valid, I ended up using the MAX() solution from @popovitsj as it was easier to implement. The issue of data coming from multiple rows doesn't really factor in for me as all rows are essentially part of the same record. I will most likely use both solutions depending on my needs.
Here's my updated query (as it didn't quite match the answer):
SELECT TOP(100)
    [ID] = c.[ID],
    [Name] = MAX(c.[Name]),
    [Keyword] = MAX([colKeyword].[StringVal]),
    [DateAdded] = MAX([colDateAdded].[DateVal])
FROM @cards AS c
LEFT JOIN @cardindexes AS [colKeyword] ON [colKeyword].[CardID] = c.ID AND [colKeyword].[IndexType] = 1
LEFT JOIN @cardindexes AS [colDateAdded] ON [colDateAdded].[CardID] = c.ID AND [colDateAdded].[IndexType] = 2
WHERE [colKeyword].[StringVal] LIKE 'p%' AND c.[CardTypeID] = 1
GROUP BY c.ID
ORDER BY [DateAdded]
You could use MAX or MIN to 'decide' on what to display for the other columns in the rows that are duplicate.
SELECT ID, MAX(Name), MAX(Keyword), MAX(DateAdded)
(...)
GROUP BY ID;
Using the ROW_NUMBER() windowed function along with a CTE will do this pretty well. For example:
;With preResult AS (
    SELECT TOP(100)
        [ID] = c.[ID],
        [Name] = c.[Name],
        [Keyword] = [colKeyword].[StringVal],
        [DateAdded] = [colDateAdded].[DateVal],
        ROW_NUMBER() OVER (PARTITION BY c.ID ORDER BY [colDateAdded].[DateVal]) rn
    FROM @cards AS c
    LEFT JOIN @cardindexes AS [colKeyword] ON [colKeyword].[CardID] = c.ID AND [colKeyword].[IndexType] = 1
    LEFT JOIN @cardindexes AS [colDateAdded] ON [colDateAdded].[CardID] = c.ID AND [colDateAdded].[IndexType] = 2
    WHERE [colKeyword].[StringVal] LIKE 'p%' AND c.[CardTypeID] = 1
    ORDER BY [DateAdded]
)
SELECT * from preResult WHERE rn = 1
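A runnable sketch of the ROW_NUMBER approach on the question's sample data follows. It is an approximation under assumptions: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), plain tables instead of table variables, and the short aliases kw and da. It also adds the keyword as a tie-breaker in the window's ORDER BY, because every row for a card shares the same date and the winner would otherwise be arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cards (ID INT, CardTypeID INT, Name TEXT);
    CREATE TABLE cardindexes (ID INT, CardID INT, IndexType INT, StringVal TEXT, DateVal TEXT);
    INSERT INTO cards VALUES (1, 1, 'Bunnies'), (2, 1, 'Dogs'), (3, 1, 'Cat'), (4, 1, 'Cat2');
    INSERT INTO cardindexes VALUES
        (1, 1, 1, 'Bunnies', NULL), (2, 1, 1, 'playing', NULL), (3, 1, 2, NULL, '2014-09-21'),
        (4, 2, 1, 'Dogs', NULL), (5, 2, 1, 'playing', NULL), (6, 2, 1, 'poker', NULL),
        (7, 2, 2, NULL, '2014-09-22');
""")

# Number the rows within each card ID, then keep only the first one per card.
one_row_per_card = conn.execute("""
    WITH preResult AS (
        SELECT c.ID, c.Name,
               kw.StringVal AS Keyword,
               da.DateVal   AS DateAdded,
               ROW_NUMBER() OVER (PARTITION BY c.ID
                                  ORDER BY da.DateVal, kw.StringVal) AS rn
        FROM cards c
        LEFT JOIN cardindexes kw ON kw.CardID = c.ID AND kw.IndexType = 1
        LEFT JOIN cardindexes da ON da.CardID = c.ID AND da.IndexType = 2
        WHERE kw.StringVal LIKE 'p%' AND c.CardTypeID = 1
    )
    SELECT ID, Name, Keyword, DateAdded
    FROM preResult
    WHERE rn = 1
    ORDER BY DateAdded
""").fetchall()

print(one_row_per_card)
# [(1, 'Bunnies', 'playing', '2014-09-21'), (2, 'Dogs', 'playing', '2014-09-22')]
```

Unlike the MAX() approach, every column in a result row comes from the same joined row, so the choice between the two depends on whether mixing values from different rows matters.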