How do I do a WHERE NOT IN for Hierarchical data? - sql

I have a table that is a list of paths between points. I want to create a query to return a list with pointID and range(number of point) from a given point. But have spent a day trying to figure this out and haven't go any where, does any one know how this should be done? ( I am writing this for MS-SQL 2005)
example
-- fromPointID | toPointID |
---------------|-----------|
-- 1 | 2 |
-- 2 | 1 |
-- 1 | 3 |
-- 3 | 1 |
-- 2 | 3 |
-- 3 | 2 |
-- 4 | 2 |
-- 2 | 4 |
with PointRanges ([fromPointID], [toPointID], [Range])
AS
(
-- anchor member
SELECT [fromPointID],
[toPointID],
0 AS [Range]
FROM dbo.[Paths]
WHERE [toPointID] = 1
UNION ALL
-- recursive members
SELECT P.[fromPointID],
P.[toPointID],
[Range] + 1 AS [Range]
FROM dbo.[Paths] AS P
INNER JOIN PointRanges AS PR ON PR.[toPointID] = P.[fromPointID]
WHERE [Range] < 5 -- This part is just added to limit the size of the table being returned
--WHERE P.[fromPointID] NOT IN (SELECT [toPointID] FROM PointRanges)
--Cant do the where statment I want to because it wont allow recurssion in the sub query
)
SELECT * FROM PointRanges
--Want this returned
-- PointID | Range |
-----------|-------|
-- 1 | 0 |
-- 2 | 1 |
-- 3 | 1 |
-- 4 | 2 |

Markus Jarderot's link gives a good answer for this. I end tried using his answer and also tried rewriting my problem in C# and linq but it ended up being more of a mathematical problem than a coding problem because I had a table with several thousands of point that interlinked. This is still something I am interested in and trying to get a better understanding of by reading books on mathematics and graph theory but if anyone else runs into this problem I think Markus Jarderot's link is the best answer you will find.

Related

How do I Transform / Pivot in Access SQL but without aggregating?

Firstly, thank you to anyone that can help, I hope this is a simple question for those in the know.
I have Data which is of the form:
LeaseID | ChargeID
1 | 1
1 | 2
2 | 3
3 | 4
3 | 5
3 | 6
i.e. LeaseID 1 has 2 ChargeIDs
How can I query this in Access SQL so that the data will be reflected as
LeaseID | ChargeID | ChargeID | ChargeID
1 | 1 | 2
2 | 3
3 | 4 | 5 | 6
I know I am limited to 255 columns but this is not a problem as there will never be 255 but the number of columns should increase with the maximum number of ChargeIDs on a given lease.
I believe it is something to do with Transform / Pivot but have been unable to get it working. I keep getting the "too many crosstabs error"
Thanks,
Consider a two-step process involving a staging table:
Make-Table Query (using correlated subquery with slow performance on very large tables)
SELECT t.LeaseID, t.ChargeID, 'ChargeID' & (SELECT count(*) FROM LeaseCharge sub
WHERE sub.LeaseID = t.LeaseID
AND sub.ChargeID <= t.ChargeID) As Rank
INTO myStagingTable
FROM myTable t;
Cross-Tab Query
TRANSFORM MAX(s.ChargeID) As MaxChargeID
SELECT s.LeaseID
FROM myStagingTable s
GROUP BY s.LeaseID
PIVOT s.[Rank]
-- LeaseID ChargeID1 ChargeID2 ChargeID3
-- 1 1 2
-- 2 3
-- 3 4 5 6

Deleting rows from a table when i pass a string

Hello everyone i want to create a procedure that receives a int and a string with the ID's when they are like this:
Int CompeteID = 1
String TeamIDs = "1,8,9"
Meaning there are 3 TeamIDs, TeamID = 1, TeamID = 8 and TeamID = 9.
Here is my DBModel: https://i.gyazo.com/7920cca8000436cfe207353aaa7d172f.png
So what i want to do is to Insert on TeamCompete, the SportID and CompeteID when there are no equal SportID and CompeteID.
Like this:
TeamCompeteID TeamID CompeteID
1 1 1
4 8 1
5 9 1
6 8 1 <---- Can't do this
But i also want to delete from TeamCompete the TeamIDs i dont pass onto the procedure for example:
TeamCompeteID TeamID CompeteID
1 1 1
2 3 1 <---- Delete this
3 4 1 <---- Delete this
But I don't want to delete the TeamCompete's that are on the Event table...
Example:
EventID TeamCompeteID
5 3 <---- Can't delete TeamCompeteID 3
-- even though i didn't past it on TeamIDs
I hope you understanded my explanation.
First and foremost, passing data in this manner is very much a bad idea. You should really try to normalise your data and have one value in each field.
If however, you cannot avoid this, I would recommend you use a string splitting function such as Jeff Moden's if you are on a SQL Server version prior to 2016 (the string_split is built into SQL Server 2016) which returns a table you can join to from a given string and delimiter.
Using Jeff's function above:
select *
from dbo.DelimitedSplit8K('1,8,9',',')
Results in:
+------------+------+
| ItemNumber | Item |
+------------+------+
| 1 | 1 |
| 2 | 8 |
| 3 | 9 |
+------------+------+
So to use with your table:
declare #t table(CompeteID int,TeamID nvarchar(10));
insert into #t values(1,'1,8,9');
select t.CompeteID
,s.Item
from #t t
cross apply dbo.DelimitedSplit8K(t.TeamID,',') s
Results in:
+-----------+------+
| CompeteID | Item |
+-----------+------+
| 1 | 1 |
| 1 | 8 |
| 1 | 9 |
+-----------+------+
You can then use this table in the rest of your update/insert/delete logic.

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself blocked with last query. The database models a reservation system for an opera house. The query is about associating the a spectator the other spectators that assist to the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez#gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille#gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show he's attending to, the query looks like this.
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand better the problem and hint me towards finding a solution. Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the 'stuff 'function. The only drawback here is that, since your ids are ints, placing a comma between values will involve a work around (would need to be a string). Below is the method I can think of using a work around.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (d_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
#> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table hast to be processes, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
#> is the "contains2 operator for arrays - so we get all spectators that have at least seen the same shows.
Faster than 1. because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Row that don't qualify are excluded early. the two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

SQL Server: Select hierarchically related items from one table

Say, I have an organizational structure that is 5 levels deep:
CEO -> DeptHead -> Supervisor -> Foreman -> Worker
The hierarchy is stored in a table Position like this:
PositionId | PositionCode | ManagerId
1 | CEO | NULL
2 | DEPT01 | 1
3 | DEPT02 | 1
4 | SPRV01 | 2
5 | SPRV02 | 2
6 | SPRV03 | 3
7 | SPRV04 | 3
... | ... | ...
PositionId is uniqueidentifier. ManagerId is the ID of employee's manager, referring PositionId from the same table.
I need a SQL query to get the hierarchy tree going down from a position, provided as parameter, including the position itself. I managed to develop this:
-- Select the original position itself
SELECT
'Rank' = 0,
Position.PositionCode
FROM Position
WHERE Position.PositionCode = 'CEO' -- Parameter
-- Select the subordinates
UNION
SELECT DISTINCT
'Rank' =
CASE WHEN Pos2.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos3.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos4.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos5.PositionCode IS NULL THEN 0 ELSE 1
END
END
END
END,
'PositionCode' = RTRIM(ISNULL(Pos5.PositionCode, ISNULL(Pos4.PositionCode, ISNULL(Pos3.PositionCode, Pos2.PositionCode)))),
FROM Position Pos1
LEFT JOIN Position Pos2
ON Pos1.PositionId = Pos2.ManagerId
LEFT JOIN Position Pos3
ON Pos2.PositionId = Pos3.ManagerId
LEFT JOIN Position Pos4
ON Pos3.PositionId = Pos4.ManagerId
LEFT JOIN Position Pos5
ON Pos4.PositionId = Pos5.ManagerId
WHERE Pos1.PositionCode = 'CEO' -- Parameter
ORDER BY Rank ASC
It works not only for 'CEO' but for any position, displaying its subordinates. Which gives me the following output:
Rank | PositionCode
0 | CEO
... | ...
2 | SPRV55
2 | SPRV68
... | ...
3 | FRMN10
3 | FRMN12
... | ...
4 | WRKR01
4 | WRKR02
4 | WRKR03
4 | WRKR04
My problems are:
The output does not include intermediate nodes - it will only output end nodes, i.e. workers and intermediate managers which have no subordinates. I need all intermediate managers as well.
I have to manually UNION the row with original position on top of the output. I there any more elegant way to do this?
I want the output to be sorted in hieararchical tree order. Not all DeptHeads, then all Supervisors, then all Foremen then all workers, but like this:
Rank | PositionCode
0 | CEO
1 | DEPT01
2 | SPRV01
3 | FRMN01
4 | WRKR01
4 | WRKR02
... | ...
3 | FRMN02
4 | WRKR03
4 | WRKR04
... | ...
Any help would be greatly appreciated.
Try a recursive CTE, the example on TechNet is almost identical to your problem I believe:
http://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
Thx, everyone suggesting CTE. I got the following code and it's working okay:
WITH HierarchyTree (PositionId, PositionCode, Rank)
AS
(
-- Anchor member definition
SELECT PositionId, PositionCode,
0 AS Rank
FROM Position AS e
WHERE PositionCode = 'CEO'
UNION ALL
-- Recursive member definition
SELECT e.PositionId, e.PositionCode,
Rank + 1
FROM Position AS e
INNER JOIN HierarchyTree AS d
ON e.ManagerId = d.PositionId
)
SELECT Rank, PositionCode
FROM HierarchyTree
GO
I had a similar problem to yours on a recent project but with a variable recursion length - typically between 1 and 10 levels.
I wanted to simplify the SQL side of things so I put some extra work into the logic of storing the recursive elements by storing a "hierarchical path" in addition to the direct manager Id.
So a very contrived example:
Employee
Id | JobDescription | Hierarchy | ManagerId
1 | DIRECTOR | 1\ | NULL
2 | MANAGER 1 | 1\2\ | 1
3 | MANAGER 2 | 1\3\ | 1
4 | SUPERVISOR 1 | 1\2\4 | 2
5 | SUPERVISOR 2 | 1\3\5 | 3
6 | EMPLOYEE 1 | 1\2\4\6 | 4
7 | EMPLOYEE 2 | 1\3\5\7 | 5
This means you have the power to very quickly query any level of the tree and get all descendants by using a LIKE query on the Hierarchy column
For example
SELECT * FROM dbo.Employee WHERE Hierarchy LIKE '\1\2\%'
would return
MANAGER 1
SUPERVISOR 1
EMPLOYEE 1
Additionally you can also easily get one level of the tree by using the ManagerId column.
The downside to this approach is you have to construct the hierarchy when inserting or updating records but believe me when I say this storage structure saved me a lot of pain later on without the need for unnecessary query complexity.
One thing to note is that my approach gives you the raw data - I then parse the result set into a recursive strongly typed structure in my services layer. As a rule I don't tend to format output in SQL.

Finding the difference between two sets of data from the same table

My data looks like:
run | line | checksum | group
-----------------------------
1 | 3 | 123 | 1
1 | 7 | 123 | 1
1 | 4 | 123 | 2
1 | 5 | 124 | 2
2 | 3 | 123 | 1
2 | 7 | 123 | 1
2 | 4 | 124 | 2
2 | 4 | 124 | 2
and I need a query that returns me the new entries in run 2
run | line | checksum | group
-----------------------------
2 | 4 | 124 | 2
2 | 4 | 124 | 2
I tried several things, but I never got to a satisfying answer.
In this case I'm using H2, but of course I'm interested in a general explanation that would help me to wrap my head around the concept.
EDIT:
OK, it's my first post here so please forgive if I didn't state the question precisely enough.
Basically given two run values (r1, r2, with r2 > r1) I want to determine which rows having row = r2 have a different line, checksum or group from any row where row = r1.
select * from yourtable
where run = 2 and checksum = (select max(checksum)
from yourtable)
Assuming your last run will have the higher run value than others, below SQL will help
select * from table1 t1
where t1.run in
(select max(t2.run) table1 t2)
Update:
Above SQL may not give you the right rows because your requirement is not so clear. But the overall idea is to fetch the rows based on the latest run parameters.
SELECT line, checksum, group
FROM TableX
WHERE run = 2
EXCEPT
SELECT line, checksum, group
FROM TableX
WHERE run = 1
or (with slightly different result):
SELECT *
FROM TableX x
WHERE run = 2
AND NOT EXISTS
( SELECT *
FROM TableX x2
WHERE run = 1
AND x2.line = x.line
AND x2.checksum = x.checksum
AND x2.group = x.group
)
A slightly different approach:
select min(run) run, line, checksum, group
from mytable
where run in (1,2)
group by line, checksum, group
having count(*)=1 and min(run)=2
Incidentally, I assume that the "group" column in your table isn't actually called group - this is a reserved word in SQL and would need to be enclosed in double quotes (or backticks or square brackets, depending on which RDBMS you are using).