how to calculate and store "groupings" in sql - sql

This is a SQL conceptual question.
Start with "Table 1" with a large number of records and a primary key.
Add a cross reference table called "Table 2" which holds key pairs from Table 1. Each key pair means that two records should be in the same group.
How do you quickly calculate those groups assuming a large number of records?
Example:
Table1
ID other data
-- ----------
A ...
B ...
C ...
D ...
E ...
F ...
Table 2
ID1 ID2
A B aka: A is equivalent to B. not a parent/child relationship
B C
D E
Final Result
ID Group
-- -----
A 1 A, B, & C are in a group
B 1
C 1
D 2 D & E are in a group
E 2
F 3 F is in a group by itself
Keep in mind that there are a large number of records. Fast processing is desirable. I'm not looking for someone to create something from scratch, but tell me if there is already a established technique for doing this sort of thing. I've already written something myself, but it seems overly complex.
Note: edited for clarification with regard to answer by Paul. Table 2 is not a parent / child relationship. Its a relationship of equivalence.

If the data in table 2 can be considered as 'parent-child' relationships (with id1 as the 'parent' and id2 as the 'child') then the result that you want can be achieved using the T-SQL below. The key assumption being made here is that ids that don't appear in the 'child' column (id2) in table2 can be used as the root elements in a group.
with groups(parent, child) as
(
select t1.id as parent, t1.id as child
from dbo.Table1 as t1
where not exists
(
select 1
from dbo.Table2 as t2
where t2.id2 = t1.id
)
union all
select g.parent, t2.id2
from dbo.table2 as t2
inner join groups as g
on g.child = t2.id1
)
select g.child as id, DENSE_RANK() over (order by g.parent) as grp
from groups as g
order by g.child

This may or may not be of any use to you, but you seem to have modeled yourself into a tricky situation. Which means, of course, that you can model your way out of it. Here are a couple of suggestions.
Since if A=B and B=C then A=C, you could enter the data in Table2 as follows. This has the advantage of leaving the structure of Table2 as it is, but it still leads to a moderately tricky query. And it severely complicates some actions such as moving A to a different group.
ID1 ID2
A A
A B
A C
D D
D E
F F
Or, if you don't mind making a slight change to Table2, the data could look like this.
GROUP ID
1 A
1 B
1 C
2 D
2 E
3 F
The advantage of this design, in case you haven't noticed already, is that it is amazingly similar to the output you wanted in the first place -- making for a correspondingly simple query. But if you work it out, you will see that the code to maintain this table will also be simple. You can easily insert a new ID as either a member of group 3 or as member of the group containing F. You can move any ID from one group to another, merge groups, split groups or even enter an ID into multiple groups (if that is allowed).
Good modeling can eliminate a lot of atrocious code.

Related

Select rows from table where a certain value in a joined table does not exist

I have two tables, playgrounds and maintenance, which are linked with a foreign key. Whenever there is a maintenance on a playground, it will be saved in the table and connected to the respective playground.
Table A (playgrounds):
playground_number
Table B (maintenance):
playground_number (foreign key),
maintenance_type (3 different types),
date
What I now want is to retrieve all the playgrounds on which a certain type of maintenance has NOT been performed yet IN a certain year. For instance all playgrounds that do not have a maintenance_type = 1 in the year 2022 connected yet, although there could be multiple other maintenance_types because they are more frequent.
This is what I have tried (pseudo):
SELECT DISTINCT A.playground_number
FROM table A
JOIN table B ON A.playground_number = B.playground_number (FK)
WHERE NOT EXISTS (SELECT B.maintenance_type FROM table B
WHERE B.maintenance_type = 1 AND year(B.date) = 2022
However this will return nothing as soon as there is only one entry with maintenance_type 1 within the table.
I am struggling with this query for a while, so would appreciate some thoughts :) Many thanks.
You need to correlate the exists subquery to the outer B table. Also, you don't even need the join.
SELECT DISTINCT a.playground_number
FROM table_a a
WHERE NOT EXISTS (
SELECT 1
FROM table_b b
WHERE b.playground_number = a.playground_number AND
b.maintenance_type = 1 AND
YEAR(b.date) = 2022
);
Please consider this. I don't think you need JOIN.
SELECT DISTINCT A.playground_number
FROM table A
WHERE A.playground_number NOT IN (SELECT B.playground_number FROM table B
WHERE B.maintenance_type = 1 AND year(B.date) = 2022)
Please let me know if I understand it incorrectly.

Get the list for Super and sub types

Am Having the Tables in SQL server as Super and Sub types like below. Now if i have to get list of Furnitures then how can i get the list?
Furniture table:
Id FurnituretypeId NoofLegs
-------------------------------
1 1 4
2 2 4
FurnitureType table:
Id Name
-----------------
1 chair
2 cot
3 table
Chair Table:
Id Name CansSwing CanDetachable FurnitureId
------------------------------------------------------------
1 Chair1 Y Y 1
Cot Table:
Id Name CotType Storage StorageType FurnitureId
-------------------------------------------------------------------
1 Cot1 Auto Y Drawer 2
How can i get the entire furniture list as some of them are chair and some of them are cot. How can i join the these tables with furniture table and get the list?
Hmmm . . . union all and join?
select cc.*, f.*
from ((select Id, Name, CansSwing, CanDetachable,
NULL as CotType, NULL as Storage, NULL as StorageType, FurnitureId
from chairs
) union all
(select Id, Name, NULL as CansSwing, NULL as CanDetachable,
CotType, Storage, StorageType, FurnitureId
from cots
)
) cc join
furniture f
on cc.furnitureid = f.id;
This is a classical learning problem, that's why I'm not giving you the code to solve this but all the insights you need to do so.
You have multiple approaches possible, but I'm describing two simple ones:
1) Use the UNION statement to join two separate queries one for Chair and the other for Cot, bare in mind that both SELECT have to return the same structure.
SELECT
a1,
a2,
etc..
FROM table1 a1
JOIN table2 a2 ON a1.some = a2.some
UNION
SELECT
a1,
a3,
etc..
FROM table1 a1
JOIN table3 a3 ON a1.some = a3.some
2) You can do it all in one SELECT statement using a LEFT JOIN to both tables and and in the select using COALESCE or ISNULL to get the values for one table or the other. In the WHERE condition you have to force one or the other join to be not null.
SELECT
a1,
COALESCE(a2,a3) as col2
FROM table1
LEFT JOIN table2 a2 ON a1.some = a2.some
LEFT JOIN table3 a3 ON a1.some = a3.some
WHERE
a2.some IS NOT NULL
OR a3.some IS NOT NULL
Mapping objects into relational models takes a degree of understanding of what is possible vs. what is wise in an RDBMS. Object oriented database systems tried to go after problems like this (generally without much success) precisely because the problem statement is arguably not the right one.
Please consider just putting all of these in one table. Then use null for the fields that don't really matter for each sub-type. You will likely end up being a lot happier in the end since you can spend less time at runtime doing joins and instead just query the information you need and use indexing on the same table to find the fasted path for each query you want to run.
SELECT * FROM CombinedTable;

compare user supplied list against table, with nulls for non-matching rows

Given a list of user-supplied numbers, which I'll refer to as myList, I want to find out which ones have a match against table MasterList, and which ones are null (no match)
so, given db contents
MasterList
----------
ID Number
1 3333333
2 4444444
3 5555555
If myList is ['1111111','2222222','3333333','4444444']
I want the following output:
1111111, null
2222222, null
3333333, 1
4444444, 2
Ideas I've tried:
This, of course, yields only the ones that match.
select Number, ID
from MasterList
where Number in('1111111','2222222','3333333','4444444')
My next idea is no more helpful:
select temp.Number, master.Number
from MasterList master
left join MasterList temp
on master.id=temp.id
and temp.Number in('1111111','2222222','3333333','4444444')
If the list were itself a table temp, it would be trivial to get the desired output:
select temp.number, master.id
from temp -- (ie, the list, where temp.number is the given list)
left join master on master.number=temp.number
-- optional where
where temp.number in('1111111','2222222','3333333','4444444')
This idea never materialized:
select temp.number, master.id
from (select '1111111','2222222','3333333','4444444') temp
left join master on master.number on....
Can this be done without a temporary table?
If not, how do you make a temporary table in DB2? (IBM documentation is helpful if you already know how to do it...)
You want an outer join, here a left outer join (we want all the rows from the left and any rows on the right that match the join condition), as you rightly say in the question. Here I'm using a Common Table Expression (CTE) to basically create a temp table on the fly.
WITH inlist (num) as ( VALUES
(111111),
(222222),
(333333),
(444444) )
SELECT num, id
FROM inlist LEFT OUTER JOIN masterlist
ON masterlist.number = inlist.num
ORDER BY num;
That yields:
NUM ID
----------- -----------
111111 -
222222 -
333333 1
444444 2
4 record(s) selected.
I'm not super-familiar with DB2 (haven't written SQL for that in at least 15 years, and not much back then), so I don't know how much you'll need to edit this to make it work, but I think this will do what you want (Edited SQL to use VALUES clause):
SELECT
my.Number1,
CASE WHEN ml.Number1 IS NULL
THEN NULL
ELSE ROW_NUMBER() OVER (ORDER BY my.Number1) END AS Indicator
FROM
(VALUES ('1111111'),('2222222'),('3333333'),('4444444'))
AS my(Number1)
LEFT JOIN
MasterList AS ml
ON
my.Number1 = ml.Number1;

SQL Query - Ensure a row exists for each value in ()

Currently struggling with finding a way to validate 2 tables (efficiently lots of rows for Table A)
I have two tables
Table A
ID
A
B
C
Table matched
ID Number
A 1
A 2
A 9
B 1
B 9
C 2
I am trying to write a SQL Server query that basically checks to make sure for every value in Table A there exists a row for a variable set of values ( 1, 2,9)
The example above is incorrect because t should have for every record in A a corresponding record in Table matched for each value (1,2,9). The end goal is:
Table matched
ID Number
A 1
A 2
A 9
B 1
B 2
B 9
C 1
C 2
C 9
I know its confusing, but in general for every X in ( some set ) there should be a corresponding record in Table matched. I have obviously simplified things.
Please let me know if you all need clarification.
Use:
SELECT a.id
FROM TABLE_A a
JOIN TABLE_B b ON b.id = a.id
WHERE b.number IN (1, 2, 9)
GROUP BY a.id
HAVING COUNT(DISTINCT b.number) = 3
The DISTINCT in the COUNT ensures that duplicates (IE: A having two records in TABLE_B with the value "2") from being falsely considered a correct record. It can be omitted if the number column either has a unique or primary key constraint on it.
The HAVING COUNT(...) must equal the number of values provided in the IN clause.
Create a temp table of values you want. You can do this dynamically if the values 1, 2 and 9 are in some table you can query from.
Then, SELECT FROM tempTable WHERE NOT IN (SELECT * FROM TableMatched)
I had this situation one time. My solution was as follows.
In addition to TableA and TableMatched, there was a table that defined the rows that should exist in TableMatched for each row in TableA. Let’s call it TableMatchedDomain.
The application then accessed TableMatched through a view that controlled the returned rows, like this:
create view TableMatchedView
select a.ID,
d.Number,
m.OtherValues
from TableA a
join TableMatchedDomain d
left join TableMatched m on m.ID = a.ID and m.Number = d.Number
This way, the rows returned were always correct. If there were missing rows from TableMatched, then the Numbers were still returned but with OtherValues as null. If there were extra values in TableMatched, then they were not returned at all, as though they didn't exist. By changing the rows in TableMatchedDomain, this behavior could be controlled very easily. If a value were removed TableMatchedDomain, then it would disappear from the view. If it were added back again in the future, then the corresponding OtherValues would appear again as they were before.
The reason I designed it this way was that I felt that establishing an invarient on the row configuration in TableMatched was too brittle and, even worse, introduced redundancy. So I removed the restriction from groups of rows (in TableMatched) and instead made the entire contents of another table (TableMatchedDomain) define the correct form of the data.

how to query with child relations to same table and order this correctly

Take this table:
id name sub_id
---------------------------
1 A (null)
2 B (null)
3 A2 1
4 A3 1
The sub_id column is a relation to his own table, to column ID.
subid --- 0:1 --- id
Now I have the problem to make a correctly SELECT query to show that the child rows (which sub_id is not null) directly selected under his parent row. So this must be a correctly order:
1 A (null)
3 A2 1
4 A3 1
2 B (null)
A normal SELECT order the id. But how or which keyword help me to order this correctly?
JOIN isn't possible I think because I want to get all the rows separated. Because the rows will be displayed on a Gridview (ASP.Net) with EntityDataSource but the child rows must be displayed directly under his parent.
Thank you.
Look at Managing Hierarchical Data in MySQL.
Since recursion is an expensive operation because basicly you're firing multiple queries to your database you could consider using the Nested Set Model. In short you're assigning numbers to ranges in your table. It's a long article but it worth reading it. I've used it during my internship as a solution not to have 1000+ queries, But bring it down to 1 query.
Your handling 'overhead' now lies at the point of updating the table by adding, updating or deleting records. Since you then have to update all the records with a bigger 'right-value'. But when you're retrieving the data, it all goes with 1 query :)
select * from table1 order by name, sub_id will in this case return your desired result but only because the parents names and the child name are similar. If you're using SQL 2005 a recursive CTE will work:
WITH recurse (id, Name, childID, Depth)
AS
(
SELECT id, Name, ISNULL(childID, id) as id, 0 AS Depth
FROM table1 where childid is null
UNION ALL
SELECT table1.id, table1.Name, table1.childID, recurse.Depth + 1 AS Depth FROM table1
JOIN recurse ON table1.childid = recurse.id
)
SELECT * FROM recurse order by childid, depth
SELECT
*
FROM
table
ORDER BY
COALESCE(id,sub_id), id
btw, this will work only for one level.. any thing more than that requires recursive/cte function