How to optimize m:n relation query on 3 tables - sql

This is my SQL problem. There are 3 tables:
Names          Lists                  ListHasNames
Id  Name       Id  Desc               ListsId  NamesId
---------      ------------------     ----------------
1   Paul       1   Football           1        1
2   Joe        2   Basketball         1        2
3   Jenny      3   Ping Pong          2        1
4   Tina       4   Breakfast Club     2        3
               5   Midnight Club      3        2
                                      3        3
                                      4        1
                                      4        2
                                      4        3
                                      5        1
                                      5        2
                                      5        3
                                      5        4
Which means that Paul (Id=1) and Joe (Id=2) are in the Football team (Lists.Id=1), Paul and Jenny in the Basketball team, etc...
Now I need a SQL statement which returns the Lists.Id of a specific Name combination:
In which lists are Paul, Joe and Jenny the only members? The answer is only Lists.Id=4 (Breakfast Club), not 5 (Midnight Club), because Tina is in that list, too.
I've tried it with INNER JOINS and SUB QUERIES:
SELECT Q1.Lists_id FROM
(
SELECT Lists_Id FROM
names as T1,
listhasnames as T2
WHERE
(T1.Name='Paul') and
(T1.Id=T2.Names_ID) and
( (
SELECT count(*) FROM
listhasnames as Z1
where (Z1.lists_id = T2.lists_Id)
) = 3)
) AS Q1
INNER JOIN (
SELECT Lists_Id FROM
names as T1,
listhasnames as T2
WHERE
(T1.Name='Joe') and
(T1.Id=T2.Names_ID) and
(
(SELECT count(*) FROM
listhasnames as Z1
WHERE (Z1.Lists_id = T2.lists_id)
) = 3)
) AS Q2
ON (Q1.Lists_id=Q2.Lists_id)
INNER JOIN (
SELECT Lists_Id FROM
names as T1,
listhasnames as T2
WHERE
(T1.Name='Jenny') and
(T1.Id=T2.Names_ID) and
(
(SELECT count(*) FROM
listhasnames as Z1
WHERE (Z1.Lists_id = T2.lists_id)
) = 3)
) AS Q3
ON (Q1.Lists_id=Q3.Lists_id)
Looks a little bit complicated, uh? How to optimize that?
I need only that Lists.Id in which specific names are in (and only these names and nobody else). Maybe with SELECT IN?
Regards,
Dennis

SELECT ListsId
FROM ListHasNames a
WHERE NamesId in (1, 2, 3)
AND NOT EXISTS
(SELECT * from ListHasNames b
WHERE b.ListsId = a.ListsId
AND b.NamesId not in (1, 2, 3))
GROUP BY ListsId
HAVING COUNT(*) = 3;
Edit: Corrected thanks to Chris Gow's comment; the subselect is necessary to exclude lists that have other people on them.
Edit 2: Corrected the table name thanks to Dennis' comment.
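As a rough sketch of the same "these members and nobody else" check driven by names instead of hard-coded ids, the snippet below loads the question's sample rows into an in-memory SQLite database. The helper name lists_with_exactly is made up for illustration, and it assumes each (ListsId, NamesId) pair occurs at most once. It replaces the NOT EXISTS subselect with two conditions in HAVING: the list size must equal the number of wanted names, and every member must be one of them.

```python
import sqlite3

def lists_with_exactly(conn, names):
    """Return ids of lists whose members are exactly `names` (hypothetical helper)."""
    placeholders = ",".join("?" for _ in names)
    # First HAVING condition: the list has as many members as wanted names.
    # Second HAVING condition: all of those members are wanted names.
    sql = f"""
        SELECT lhn.ListsId
        FROM ListHasNames lhn
        JOIN Names n ON n.Id = lhn.NamesId
        GROUP BY lhn.ListsId
        HAVING COUNT(*) = ?
           AND SUM(CASE WHEN n.Name IN ({placeholders}) THEN 1 ELSE 0 END) = ?
        ORDER BY lhn.ListsId
    """
    k = len(names)
    return [row[0] for row in conn.execute(sql, [k, *names, k])]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Names (Id INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE ListHasNames (ListsId INTEGER, NamesId INTEGER,
                               PRIMARY KEY (ListsId, NamesId));
    INSERT INTO Names VALUES (1,'Paul'),(2,'Joe'),(3,'Jenny'),(4,'Tina');
    INSERT INTO ListHasNames VALUES
        (1,1),(1,2),(2,1),(2,3),(3,2),(3,3),
        (4,1),(4,2),(4,3),(5,1),(5,2),(5,3),(5,4);
""")
print(lists_with_exactly(conn, ["Paul", "Joe", "Jenny"]))
```

On the sample data this should pick out the Breakfast Club and skip the Midnight Club, because Tina pushes that list's size to four.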

Using Carl Manaster's solution as a starting point I came up with:
SELECT listsid
FROM listhasnames
GROUP BY listsid HAVING COUNT(*) = 3
INTERSECT
SELECT x.listsid
FROM listhasnames x, names n
WHERE n.name IN('Paul', 'Joe', 'Jenny')
AND n.id = x.namesid
GROUP BY x.listsid HAVING COUNT(*) = 3

Updated:
select a.ListsId from
(
--lists with three names only
select lhn.ListsId, count(*) as count
from ListHasNames lhn
inner join Names n on lhn.NamesId = n.Id
group by lhn.ListsId
having count(*) = 3
) a
where a.ListsId in (select ListsId from ListHasNames lhn where NamesId = (select Id from Names where Name = 'Paul'))
and a.ListsId in (select ListsId from ListHasNames lhn where NamesId = (select Id from Names where Name = 'Joe'))
and a.ListsId in (select ListsId from ListHasNames lhn where NamesId = (select Id from Names where Name = 'Jenny'))

I was just solving a problem recently that may work well for your case as well. It may be overkill.
I took the approach of creating a list of candidate associations that may be the correct solution, and then using a cursor or queue table to go through the likely correct solutions to do full validation.
In my case this was implemented with something like:
select
parent.ParentId,
count(*) as ChildCount,
checksum_agg(checksum(child.*)) as ChildAggCrc
from parent join child on parent.parentId = child.parentId
group by parent.ParentId
Then you can compare the count and aggregate checksum against your lookup data (i.e. your 3 names to check for). If no rows match, you are guaranteed to have no matches. If any row matches you can then go through and do a join of that specific ParentId to validate if there are any discrepancies between the row sets.
Clear as mud? :)
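To make the two-step idea a bit less muddy, here is a small sketch outside the database, not the answerer's actual implementation. A (count, XOR of CRC32) pair stands in for the count and aggregate checksum, and the function names fingerprint and exact_matches are invented for the example. Groups whose fingerprint differs from the target are discarded cheaply; the survivors are then compared in full, since equal fingerprints can still be collisions.

```python
import zlib

def fingerprint(members):
    """Order-independent summary of a group: (member count, XOR of CRC32 values)."""
    crc = 0
    for member in members:
        crc ^= zlib.crc32(str(member).encode("utf-8"))
    return (len(members), crc)

def exact_matches(groups, wanted):
    """Return the keys of groups whose members are exactly `wanted`."""
    target = fingerprint(wanted)
    # Step 1: cheap filter. A different fingerprint rules the group out.
    candidates = [key for key, members in groups.items()
                  if fingerprint(members) == target]
    # Step 2: full validation of the remaining candidates.
    return [key for key in candidates if sorted(groups[key]) == sorted(wanted)]

# ListsId -> NamesId values, taken from the question's ListHasNames table.
groups = {1: [1, 2], 2: [1, 3], 3: [2, 3], 4: [1, 2, 3], 5: [1, 2, 3, 4]}
print(exact_matches(groups, [1, 2, 3]))
```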

Related

How to update table from another table, with 2 columns

I am using SQL Server database and want a way to update MachinesSummary.ShareCount.
Here are my two tables
MachinesSummary
ID Machine1 Machine2 ShareCount
-------------------------------
1 A J NULL
2 K S NULL
3 A E NULL
4 J A NULL
5 Y U NULL
6 S W NULL
7 G A NULL
8 W S NULL
The other table is MachineDetails
ProcessNo Machine
------------------
1 A
1 H
1 W
2 A
2 J
2 W
3 Y
3 K
4 J
4 A
I want to update ShareCount in the MachinesSummary table with the count of processes that both Machine1 and Machine2 share.
For record 1 in the MachinesSummary table, I want the number of processes both machines share in MachineDetails, which is 1 in this case,
while for record 4 the ShareCount is 2.
I tried this
UPDATE M
SET ShareCount = COUNT(DISTINCT X.ProcessNo)
FROM
(SELECT ProcessNo, ',' + STRING_AGG(Machine,',') + ',' Machines
FROM MachineDetails
GROUP BY ProcessNo) X
INNER JOIN MachinesSummary M ON X.Machines LIKE '%'+ M.Machine1 + '%'
AND X.Machines LIKE '%'+ M.Machine2 + '%'
But I wonder if there is an easier high performance way
The MachineDetails table has 250 million rows.
Well, I would use a self-join to get the number of combinations:
UPDATE M
SET ShareCount = num_processes
FROM MachinesSummary M JOIN
(SELECT md1.Machine as machine1, md2.Machine as machine2, COUNT(*) as num_processes
FROM MachineDetails md1 JOIN
MachineDetails md2
ON md1.processno = md2.processno
GROUP BY md1.Machine, md2.Machine
) md
ON md.Machine1 = M.machine1 AND md.Machine2 = M.machine2;
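For anyone who wants to try the self-join idea without a SQL Server instance, here is a sketch against an in-memory SQLite database loaded with the question's rows. It is an adaptation, not the query above: it uses a correlated subquery in the UPDATE, so pairs that share nothing end up as 0 rather than staying NULL, and it assumes each (ProcessNo, Machine) pair appears only once. Performance on 250 million rows is a separate matter that this does not address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MachinesSummary (ID INTEGER PRIMARY KEY, Machine1 TEXT,
                                  Machine2 TEXT, ShareCount INTEGER);
    CREATE TABLE MachineDetails (ProcessNo INTEGER, Machine TEXT,
                                 PRIMARY KEY (ProcessNo, Machine));
    INSERT INTO MachinesSummary (ID, Machine1, Machine2) VALUES
        (1,'A','J'),(2,'K','S'),(3,'A','E'),(4,'J','A'),
        (5,'Y','U'),(6,'S','W'),(7,'G','A'),(8,'W','S');
    INSERT INTO MachineDetails VALUES
        (1,'A'),(1,'H'),(1,'W'),(2,'A'),(2,'J'),(2,'W'),
        (3,'Y'),(3,'K'),(4,'J'),(4,'A');
""")

# For each summary row, count the processes in which both machines appear.
conn.execute("""
    UPDATE MachinesSummary
    SET ShareCount = (
        SELECT COUNT(*)
        FROM MachineDetails md1
        JOIN MachineDetails md2 ON md2.ProcessNo = md1.ProcessNo
        WHERE md1.Machine = MachinesSummary.Machine1
          AND md2.Machine = MachinesSummary.Machine2
    )
""")
share_counts = dict(conn.execute(
    "SELECT ID, ShareCount FROM MachinesSummary ORDER BY ID"))
print(share_counts)
```

In the rows as printed in the question, A and J appear together in processes 2 and 4, so both the (A, J) and (J, A) summary rows come out with the same count.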
I would use an updatable CTE here:
WITH cte AS (
SELECT Machine, COUNT(*) AS cnt
FROM MachineDetails
GROUP BY Machine
),
cte2 AS (
SELECT ShareCount, COALESCE(t1.cnt, 0) AS m1_cnt, COALESCE(t2.cnt, 0) AS m2_cnt
FROM MachinesSummary ms
LEFT JOIN cte t1 ON t1.Machine = ms.Machine1
LEFT JOIN cte t2 ON t2.Machine = ms.Machine2
)
UPDATE cte2
SET ShareCount = m1_cnt + m2_cnt;
The logic of the first CTE, which involves the MachineDetails table, is to get the count for every machine. The second CTE joins this count CTE to the MachinesSummary table twice, once for each of machine 1 and machine 2. Then we update this second CTE and assign the sum of the counts.

select distinct from join

My Tables look like this
Table 1: Users       Table 2: Options
id  name             id  user_id  option
--  ------           --  -------  -------
1   Donald           1   1        access1
2   John             2   1        access2
3   Bruce            3   1        access3
4   Paul             4   2        access1
5   Ronald           5   2        access3
6   Steve            6   3        access1
Now I want to join these to find the users who have only access1.
If I do something like
select t1.id,t1.name,t2.id,t2.user_id,t2.option
from table1 t1, table2 t2
where t1.id=t2.user_id
and option='access1';
This does not give me unique results; in the example I need only user_id=3. My real data has hundreds of these.
I also tried something like
select user_id from table2 where option='access1'
and user_id not in (select user_id from table2 where option<>'access1')
There have been other unsuccessful attempts too, but I am stuck here.
You can do this using an EXISTS subquery (technically, a left semijoin):
SELECT id, name
FROM table1
WHERE EXISTS(
SELECT * FROM table2
WHERE table1.id = table2.user_id
AND table2.option = 'access1'
)
If you want only users that have access1 and not any other access, add NOT EXISTS (a left anti-semi-join; there's a term to impress your colleagues!):
AND NOT EXISTS (
SELECT * FROM table2
WHERE table1.id = table2.user_id
AND table2.option <> 'access1'
)
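A quick way to check the combined EXISTS / NOT EXISTS filter is to run it against the question's rows in an in-memory SQLite database, as sketched below. The table names table1 and table2 follow the question; the column name option is double-quoted here only as a precaution, since it is a reserved word in some databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, user_id INTEGER, "option" TEXT);
    INSERT INTO table1 VALUES (1,'Donald'),(2,'John'),(3,'Bruce'),
                              (4,'Paul'),(5,'Ronald'),(6,'Steve');
    INSERT INTO table2 VALUES (1,1,'access1'),(2,1,'access2'),(3,1,'access3'),
                              (4,2,'access1'),(5,2,'access3'),(6,3,'access1');
""")

# Semijoin: the user has access1. Anti-semijoin: the user has nothing else.
only_access1 = conn.execute("""
    SELECT id, name
    FROM table1
    WHERE EXISTS (SELECT 1 FROM table2
                  WHERE table2.user_id = table1.id
                    AND table2."option" = 'access1')
      AND NOT EXISTS (SELECT 1 FROM table2
                      WHERE table2.user_id = table1.id
                        AND table2."option" <> 'access1')
""").fetchall()
print(only_access1)
```

Users with no rows in table2 at all (Paul, Ronald, Steve) are filtered out by the EXISTS half, which is why both conditions are needed.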
bool_and makes it simple
with users (id,name) as ( values
(1,'donald'),
(2,'john'),
(3,'bruce'),
(4,'paul'),
(5,'ronald'),
(6,'steve')
), options (id,user_id,option) as ( values
(1,1,'access1'),
(2,1,'access2'),
(3,1,'access3'),
(4,2,'access1'),
(5,2,'access3'),
(6,3,'access1')
)
select u.id, u.name
from
users u
inner join
options o on o.user_id = u.id
group by 1, 2
having bool_and(o.option = 'access1')
;
id | name
----+-------
3 | bruce
If you want the user that has only access1, I would use aggregation:
select user_id
from table2
group by user_id
having min(option) = max(option) and min(option) = 'access1';
WITH users(id,name) AS ( VALUES
(1,'Donald'),
(2,'John'),
(3,'Bruce'),
(4,'Paul'),
(5,'Ronald'),
(6,'Steve')
), options(id,user_id,option) AS ( VALUES
(1,1,'access1'),
(2,1,'access2'),
(3,1,'access3'),
(4,2,'access1'),
(5,2,'access3'),
(6,3,'access1')
), user_access_count AS (
SELECT op.user_id,count(op.option) AS access_count
FROM options op
WHERE EXISTS(
SELECT 1 FROM options o2
WHERE o2.user_id = op.user_id
AND o2.option = 'access1'
)
GROUP BY op.user_id
)
SELECT u.id,u.name
FROM users u
INNER JOIN user_access_count uac ON uac.user_id = u.id
WHERE uac.access_count = 1;

How to exclude records with certain values in sql select

How do I only select the stores that don't have client 5?
StoreId ClientId
------- ---------
1 4
1 5
2 5
2 6
2 7
3 8
I'm trying something like this:
SELECT SC.StoreId FROM StoreClients
INNER JOIN StoreClients SC
ON StoreClients.StoreId = SC.StoreId
WHERE SC.ClientId = 5
GROUP BY StoreClients.StoreId
That seems to get me all the stores that have that client, but I can't do the opposite, because if I use <> 5 I'll still get stores 1 and 2, which I don't want.
I'm basically trying to use this result in another query's EXISTS IN clause
One way:
SELECT DISTINCT sc.StoreId
FROM StoreClients sc
WHERE NOT EXISTS(
SELECT * FROM StoreClients sc2
WHERE sc2.StoreId = sc.StoreId AND sc2.ClientId = 5)
SELECT SC.StoreId
FROM StoreClients SC
WHERE SC.StoreId NOT IN (SELECT StoreId FROM StoreClients WHERE ClientId = 5)
In this way neither JOIN nor GROUP BY is necessary.
SELECT DISTINCT a.StoreID
FROM tableName a
LEFT JOIN tableName b
ON a.StoreID = b.StoreID AND b.ClientID = 5
WHERE b.StoreID IS NULL
OUTPUT
╔═════════╗
║ STOREID ║
╠═════════╣
║ 3 ║
╚═════════╝
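The NOT EXISTS, NOT IN and LEFT JOIN ... IS NULL variants above can be compared side by side with a small sketch like this one, which loads the question's rows into an in-memory SQLite database. DISTINCT is added to the NOT IN form so that all three return one row per store. One caveat worth knowing: NOT IN returns no rows at all if its subquery yields a NULL, which cannot happen with this sample data but can with nullable columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StoreClients (StoreId INTEGER, ClientId INTEGER);
    INSERT INTO StoreClients VALUES (1,4),(1,5),(2,5),(2,6),(2,7),(3,8);
""")

# Three ways to ask for the stores that do not have client 5.
variants = {
    "not_exists": """
        SELECT DISTINCT sc.StoreId FROM StoreClients sc
        WHERE NOT EXISTS (SELECT 1 FROM StoreClients sc2
                          WHERE sc2.StoreId = sc.StoreId AND sc2.ClientId = 5)""",
    "not_in": """
        SELECT DISTINCT StoreId FROM StoreClients
        WHERE StoreId NOT IN (SELECT StoreId FROM StoreClients
                              WHERE ClientId = 5)""",
    "left_join": """
        SELECT DISTINCT a.StoreId FROM StoreClients a
        LEFT JOIN StoreClients b ON a.StoreId = b.StoreId AND b.ClientId = 5
        WHERE b.StoreId IS NULL""",
}
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in variants.items()}
print(results)
```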
SELECT StoreId
FROM StoreClients
WHERE StoreId NOT IN (
SELECT StoreId
FROM StoreClients
Where ClientId=5
)
You can use EXCEPT syntax, for example:
SELECT var FROM table1
EXCEPT
SELECT var FROM table2
<> will surely give you all values not equal to 5.
If you have more than one record in the table, it will give you all of them except the ones with 5.
If, on the other hand, you have only one, you will get just that one.
Please post the table schema so that people can help you properly.

A Simple but complex SQL query

I have a very simple MS SQL table with the following data(with column name and datatype):
TableId PersonName Attribute AttributeValue
(int) (nvarchar 50) (nvarchar 50) (bit)
----------- ----------------------- ------------------- --------------
1 A IsHuman 1
2 A CanSpeak 1
3 A CanSee 1
4 A CanWalk 0
5 B IsHuman 1
6 B CanSpeak 1
7 B CanSee 0
8 B CanWalk 0
9 C IsHuman 0
10 C CanSpeak 1
11 C CanSee 1
12 C CanWalk 0
Now, what I need as a result is the unique PersonName values that have both the IsHuman and CanSpeak attributes with AttributeValue = 1.
The expected result should be (it must not include C, as that one has IsHuman = 0):
PersonName
------------
A
B
Please can any expert help me in writing a SQL query for this?
SELECT PersonName
FROM MyTable
WHERE Attribute = 'IsHuman'
AND AttributeValue = 1
INTERSECT
SELECT PersonName
FROM MyTable
WHERE Attribute = 'CanSpeak'
AND AttributeValue = 1;
Obviously this approach doesn't 'scale' if the criteria can vary. It could be that the relational operator you require is division, popularly known as "the supplier who supplies all parts", specifically division with remainder.
SELECT PersonName
FROM MyTable
WHERE (Attribute = 'IsHuman' AND AttributeValue = 1) OR
(Attribute = 'CanSpeak' AND AttributeValue = 1)
GROUP BY PersonName
HAVING COUNT(*) > 1
or
SELECT PersonName
FROM MyTable
WHERE AttributeValue = 1 AND Attribute IN ('IsHuman', 'CanSpeak')
GROUP BY PersonName
HAVING COUNT(*) > 1
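Since the set of required attributes may vary, the GROUP BY / HAVING form generalizes fairly naturally. The sketch below is one possible way to do it, using an in-memory SQLite database with the question's rows; the helper name people_with_all is hypothetical. It counts distinct matching attributes per person and keeps those who have every attribute asked for.

```python
import sqlite3

def people_with_all(conn, attributes):
    """Return people for whom every listed attribute has AttributeValue = 1."""
    wanted = list(dict.fromkeys(attributes))  # drop duplicates, keep order
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT PersonName
        FROM MyTable
        WHERE AttributeValue = 1 AND Attribute IN ({placeholders})
        GROUP BY PersonName
        HAVING COUNT(DISTINCT Attribute) = ?
        ORDER BY PersonName
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (TableId INTEGER PRIMARY KEY, PersonName TEXT,
                          Attribute TEXT, AttributeValue INTEGER);
    INSERT INTO MyTable VALUES
        (1,'A','IsHuman',1),(2,'A','CanSpeak',1),(3,'A','CanSee',1),(4,'A','CanWalk',0),
        (5,'B','IsHuman',1),(6,'B','CanSpeak',1),(7,'B','CanSee',0),(8,'B','CanWalk',0),
        (9,'C','IsHuman',0),(10,'C','CanSpeak',1),(11,'C','CanSee',1),(12,'C','CanWalk',0);
""")
print(people_with_all(conn, ["IsHuman", "CanSpeak"]))
```

Adding a third attribute to the list needs no change to the query text beyond the generated placeholders, which is the main appeal over the INTERSECT or multi-join forms.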
SELECT PersonName FROM MyTable
WHERE PersonName IN
(SELECT T1.PersonName FROM MyTable T1 WHERE T1.Attribute = 'IsHuman' and T1.AttributeValue='1')
AND (Attribute = 'CanSpeak' AND AttributeValue='1')
I think two inner joins may give you alright performance depending on indexing and table sizes.
SELECT t.PersonName FROM MyTable t
INNER JOIN MyTable t2 ON t.PersonName=t2.PersonName AND t2.Attribute = 'IsHuman' AND t2.AttributeValue = 1
INNER JOIN MyTable t3 ON t2.PersonName=t3.PersonName AND t3.Attribute = 'CanSpeak' AND t3.AttributeValue = 1
or
SELECT t.PersonName FROM MyTable t
INNER JOIN MyTable t2 ON t.PersonName=t2.PersonName
INNER JOIN MyTable t3 ON t2.PersonName=t3.PersonName
WHERE t2.Attribute = 'IsHuman' AND t2.AttributeValue = 1 AND t3.Attribute = 'CanSpeak' AND t3.AttributeValue = 1
This solution could be simplified significantly, however, if the properties IsHuman and CanSpeak were in separate tables with a linking ID table between them. It sounds like this table could benefit from some normalization.
If you can't pursue that, a view may help with performance. I am at home without SQL installed, so I can't verify any performance aspects.
select personname from yourtablename where personname in ('a','b') group by personname
I actually use this as a screening question for interviews. None of you people would get the job.
OK, maybe you would, but while the strategies you use might or might not work, they aren't generalizable and they miss a basic notion of relational algebra, to wit, aliasing.
The right answer (in the sense that it would make me more likely to employ you, as well as the less important senses that the RDBMS's optimizer understands it and it can be extended to other, arbitrarily complex, cases) is
SELECT t1.PersonName
FROM MyTable t1, MyTable t2
WHERE t2.Attribute = 'CanSpeak'
AND t2.AttributeValue = 1
AND t1.Attribute = 'IsHuman'
AND t1.AttributeValue = 1
AND t1.PersonName = t2.PersonName;

Using (IN operator) OR condition in Where clause as AND condition

Please look at the following image; I have explained my requirements in it.
Image: http://img30.imageshack.us/img30/5668/shippment.png
I can't use here WHERE UsageTypeid IN(1,2,3,4) because this will behave as an OR condition and fetch all records.
I just want those records of the first table that are attached to all 4 ShipmentToIDs.
All others, which are attached to 3 or fewer ShipmentToIDs, are not needed in the result set.
Thanks.
if (EntityId, UsageTypeId) is unique:
select s.PrimaryKeyField, s.ShipmentId from shipment s, item a
where s.PrimaryKeyField = a.EntityId and a.UsageTypeId in (1,2,3,4)
group by s.PrimaryKeyField, s.ShipmentId having count(*) = 4
otherwise, 4-way join for the 4 fields,
select distinct s.* from shipment s, item a, item b, item c, item d where
s.PrimaryKeyField = a.EntityId and s.PrimaryKeyField = b.EntityId and
s.PrimaryKeyField = c.EntityId and s.PrimaryKeyField = d.EntityId and
a.UsageTypeId = 1 and b.UsageTypeId = 2 and c.UsageTypeId = 3 and
d.UsageTypeId = 4
you'll want appropriate index on (EntityId, UsageTypeId) so it doesn't hang...
If there will never be duplicates of the UsageTypeId-EntityId combo in the 2nd table, i.e. you'll never see:
EntityUsageTypeId | EntityId | UsageTypeId
22685 | 4477 | 1
22687 | 4477 | 1
then you can count matching EntityIds in that table:
WHERE (SELECT COUNT(*) FROM <tablename> WHERE EntityId = 4477) = 4
DECLARE @numShippingMethods int;
SELECT @numShippingMethods = COUNT(*)
FROM shippedToTable;
SELECT tbl1.shipmentID, COUNT(UsageTypeId) as Usages
FROM tbl2 JOIN tbl1 ON tbl2.EntityId = tbl1.EntityId
GROUP BY tbl1.EntityID, tbl1.shipmentID
HAVING COUNT(UsageTypeId) = @numShippingMethods
This way is preferred to the multiple join against same table method, as you can simply modify the IN clause and the COUNT without needing to add or subtract more tables to the query when your list of IDs changes:
select EntityId, ShipmentId
from (
select EntityId
from (
select EntityId
from EntityUsage eu
where UsageTypeId in (1,2,3,4)
group by EntityId, UsageTypeId
) b
group by EntityId
having count(*) = 4
) a
inner join Shipment s on a.EntityId = s.EntityId
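The inner GROUP BY EntityId, UsageTypeId is there to collapse duplicate (EntityId, UsageTypeId) rows before counting. A possible shorthand for the same effect is COUNT(DISTINCT UsageTypeId), sketched below on an in-memory SQLite database. The rows are hypothetical, made up only to show the difference: entity 4477 has all four usage types with type 1 stored twice, and entity 4478 has four rows but only three distinct types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EntityUsage (EntityUsageTypeId INTEGER PRIMARY KEY,
                              EntityId INTEGER, UsageTypeId INTEGER);
    -- Hypothetical sample rows, not taken from the question's image.
    INSERT INTO EntityUsage VALUES
        (22685,4477,1),(22687,4477,1),(22688,4477,2),(22689,4477,3),(22690,4477,4),
        (22691,4478,1),(22692,4478,2),(22693,4478,2),(22694,4478,3);
""")

query = """
    SELECT EntityId
    FROM EntityUsage
    WHERE UsageTypeId IN (1, 2, 3, 4)
    GROUP BY EntityId
    HAVING {count_expr} = 4
    ORDER BY EntityId
"""
# Counting distinct usage types is safe when duplicates exist.
complete = [r[0] for r in conn.execute(
    query.format(count_expr="COUNT(DISTINCT UsageTypeId)"))]
# Counting raw rows is only correct if (EntityId, UsageTypeId) is unique.
naive = [r[0] for r in conn.execute(query.format(count_expr="COUNT(*)"))]
print(complete, naive)
```

With these rows the distinct count keeps only the entity that really has all four types, while the raw row count picks the wrong one, which is the reason for the uniqueness caveat in the earlier answers.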