Need to automate sorting while accounting for overlapping values in SQL - sql

I'm facing a challenging request that's had me beating my head against the keyboard. I need to implement a script which will sort and summarize a dataset while accounting for overlapping values which are associated with different identifiers. The table from which I am selecting contains the following columns:
BoxNumber (Need to group by this value, which serves as the identifier)
ProdBeg (Contains the first 'page number' for the document/record)
ProdEnd (Contains the last 'page number' for the document/record)
DateProduced (Date the document was produced)
ArtifactID (Unique identifier for each document)
NumPages (Contains the number of pages associated with each document)
Selecting a sample of the data with no conditions resembles the following (sorry for lousy formatting):
BoxNumber | ProdBeg | ProdEnd | DateProduced | ArtifactID | NumPages
1200 | ABC01 | ABC10 | 12/4/2013 | 1564589 | 10
1201 | ABC11 | ABC20 | 12/4/2013 | 1498658 | 10
1200 | ABC21 | ABC30 | 12/4/2013 | 1648596 | 10
1200 | ABC31 | ABC40 | 12/4/2013 | 1489535 | 10
Using something like the following effectively groups and sorts the data by box number while accounting for different DateProduced dates, but does not account for overlapping ProdBeg/ProdEnd values between different BoxNumbers:
SELECT BoxNumber, MIN(ProdBeg) AS 'ProdBeg', MAX(ProdEnd) AS 'ProdEnd', DateProduced, COUNT(ArtifactID) AS 'Documents', SUM(NumPages) AS 'Pages'
FROM MyTable
GROUP BY BoxNumber, DateProduced
ORDER BY ProdBeg, ProdEnd
This yields:
BoxNumber | ProdBeg | ProdEnd| DateProduced | Documents| Pages
1200 | ABC01 | ABC40 | 12/4/2013 | 3 | 30
1201 | ABC11 | ABC20 | 12/4/2013 | 1 | 10
Here, it becomes apparent that the ProdBeg/ProdEnd values for box 1201 overlap those for box 1200. No variation on the script above will work, as it will inherently ignore any overlaps and only select the min/max. We need something which will produce the following result:
BoxNumber | ProdBeg | ProdEnd | DateProduced | Documents| Pages
1200 | ABC01 | ABC10 | 12/4/2013 | 1 | 10
1201 | ABC11 | ABC20 | 12/4/2013 | 1 | 10
1200 | ABC21 | ABC40 | 12/4/2013 | 2 | 20
I'm just not sure how we can group by box number without showing only distinct values (which can result in overlaps for ProdBeg/ProdEnd). Any suggestions would be greatly appreciated! The environment version is SQL 2008 R2 (SP1).

Yuch. This would be helped at least if you had lead()/lag() as in SQL Server 2012. But it is doable.
The idea is the following:
Add a variable that is the number part of the code (the last two digits).
Calculate the next number in the sequence.
Calculate a flag if there is a gap to the next number. This is the start of a "group".
Calculate the cumulative sum of the "start of a group" flag. This is a group id.
Do the aggregation.
The following query follows this logic. I didn't include the date produced. This seems redundant with the number, unless a box can appear on multiple days. (Adding the date produced is just a matter of adding the condition to the where clauses.) The resulting query is:
with bp as (
select t.*,
cast(right(prodbeg, 2) as int) as pbeg,
cast(right(prodend, 2) as int) as pend
from mytable t
),
bp1 as (
select bp.*,
(select top 1 pbeg
from bp bp2
where bp2.pbeg < bp.pbeg and pb2.BoxNumber = pb.BoxNumber
order by bp2.pbeg desc
) as prevpend
from bp
),
bp2 as (
select bp1.*,
(select sum(case when prevpend = pbeg - 1 then 0 else 1 end)
from bp1 bp1a
where bp1a.pbeg < bp1.pbeg and pb1a.BoxNumber = pb1.BoxNumber
) as groupid
from bp1
)
select BoxNumber, MIN(ProdBeg) AS ProdBeg, MAX(ProdEnd) AS ProdEnd, DateProduced,
COUNT(ArtifactID) AS 'Documents', SUM(NumPages) AS 'Pages'
FROM bp2
GROUP BY BoxNumber, groupid
ORDER BY ProdBeg, ProdEnd;

Related

Oracle SQL Join Data Sequentially

I am trying to track the usage of material with my SQL. There is no way in our database to link when a part is used to the order it originally came from. A part simply ends up in a bin after an order arrives, and then usage of parts basically just creates a record for the number of parts used at a time of transaction. I am attempting to, as best I can, link usage to an order number by summing over the data and sequentially assigning it to order numbers.
My sub queries have gotten me this far. Each order number is received on a date. I then join the usage table records based on the USEDATE needing to be equal to or greater than the RECEIVEDATE of the order. The data produced by this is as such:
| ORDERNUM | PARTNUM | RECEIVEDATE | ORDERQTY | USEQTY | USEDATE |
|----------|----------|-------------------------|-----------|---------|------------------------|
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 11/18/2016 1:40:55 PM |
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 3 | 12/26/2016 2:19:32 PM |
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 1 | 11/18/2016 1:40:55 PM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 3 | 12/26/2016 2:19:32 PM |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 3 | 12/26/2016 2:19:32 PM |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 1 | 1/3/2017 8:31:21 AM |
| 7812 | E1125 | 12/27/2016 10:56:01 AM | 1 | 1 | 1/3/2017 8:31:21 AM |
| 1191 | E1125 | 1/5/2017 1:12:01 PM | 2 | 0 | null |
The query for the above section looks as such:
SELECT
B.*,
NVL(B2.QTY, ‘0’) USEQTY
B2.USEDATE USEDATE
FROM <<Sub Query B>>
LEFT JOIN USETABLE B2 ON B.PARTNUM = B2.PARTNUM AND B2.USEDATE >= B.RECEIVEDATE
My ultimate goal here is to join USEQTY records sequentially until they have filled enough ORDERQTY’s. I also need to add an ORDERUSE column that represents what QTY from the USEQTY column was actually applied to that record. Not really sure how to word this any better so here is example of what I need to happen based on the table above:
| ORDERNUM | PARTNUM | RECEIVEDATE | ORDERQTY | USEQTY | USEDATE | ORDERUSE |
|----------|----------|-------------------------|-----------|---------|------------------------|-----------|
| 4412 | E1125 | 10/26/2016 1:32:25 PM | 1 | 1 | 11/18/2016 1:40:55 PM | 1 |
| 4111 | E1125 | 10/28/2016 2:54:13 PM | 1 | 3 | 12/26/2016 2:19:32 PM | 1 |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 2 | 12/26/2016 2:19:32 PM | 2 |
| 0393 | E1125 | 12/22/2016 11:52:04 AM | 3 | 1 | 1/3/2017 8:31:21 AM | 1 |
| 7812 | E1125 | 12/27/2016 10:56:01 AM | 1 | 0 | null | 0 |
| 1191 | E1125 | 1/5/2017 1:12:01 PM | 2 | 0 | null | 0 |
If I can get the query to pull the information like above, I will then be able to group the records together and sum the ORDERUSE column which would get me the information I need to know what orders have been used and which have not been fully used. So in the example above, if I were to sum the ORDERUSE column for each of the ORDERNUMs, orders 4412, 4111, 0393 would all show full usage. Orders 7812, 1191 would show not being fully used.
If i am reading this correctly you want to determine how many parts have been used. In your example it looks like you have 5 usages and with 5 orders coming to a total of 8 parts with the following orders having been used.
4412 - one part - one used
4111 - one part - one used
7812 - one part - one used
0393 - three
parts - two used
After a bit of hacking away I came up with the following SQL. Not sure if this works outside of your sample data since thats the only thing I used to test and I am no expert.
WITH data
AS (SELECT *
FROM (SELECT *
FROM sub_b1
join (SELECT ROWNUM rn
FROM dual
CONNECT BY LEVEL < 15) a
ON a.rn <= sub_b1.orderqty
ORDER BY receivedate)
WHERE ROWNUM <= (SELECT SUM(useqty)
FROM sub_b2))
SELECT sub_b1.ordernum,
partnum,
receivedate,
orderqty,
usage
FROM sub_b1
join (SELECT ordernum,
Max(rn) AS usage
FROM data
GROUP BY ordernum) b
ON sub_b1.ordernum = b.ordernum
You are looking for "FIFO" inventory accounting.
The proper data model should have two tables, one for "received" parts and the other for "delivered" or "used". Each table should show an order number, a part number and quantity (received or used) for that order, and a timestamp or date-time. I model both in CTE's in my query below, but in your business they should be two separate table. Also, a trigger or similar should enforce the constraint that a part cannot be used until it is available in stock (that is: for each part id, the total quantity used since inception, at any point in time, should not exceed the total quantity received since inception, also at the same point in time). I assume that the two input tables do, in fact, satisfy this condition, and I don't check it in the solution.
The output shows a timeline of quantity used, by timestamp, matching "received" and "delivered" (used) quantities for each part_id. In the sample data I illustrate a single part_id, but the query will work with multiple part_id's, and orders (both for received and for delivered or used) that include multiple parts (part id's) with different quantities.
with
received ( order_id, part_id, ts, qty ) as (
select '0030', '11A4', timestamp '2015-03-18 15:00:33', 20 from dual union all
select '0032', '11A4', timestamp '2015-03-22 15:00:33', 13 from dual union all
select '0034', '11A4', timestamp '2015-03-24 10:00:33', 18 from dual union all
select '0036', '11A4', timestamp '2015-04-01 15:00:33', 25 from dual
),
delivered ( order_id, part_id, ts, qty ) as (
select '1200', '11A4', timestamp '2015-03-18 16:30:00', 14 from dual union all
select '1210', '11A4', timestamp '2015-03-23 10:30:00', 8 from dual union all
select '1220', '11A4', timestamp '2015-03-23 11:30:00', 7 from dual union all
select '1230', '11A4', timestamp '2015-03-23 11:30:00', 4 from dual union all
select '1240', '11A4', timestamp '2015-03-26 15:00:33', 1 from dual union all
select '1250', '11A4', timestamp '2015-03-26 16:45:11', 3 from dual union all
select '1260', '11A4', timestamp '2015-03-27 10:00:33', 2 from dual union all
select '1270', '11A4', timestamp '2015-04-03 15:00:33', 16 from dual
),
(end of test data; the SQL query begins below - just add the word WITH at the top)
-- with
combined ( part_id, rec_ord, rec_ts, rec_sum, del_ord, del_ts, del_sum) as (
select part_id, order_id, ts,
sum(qty) over (partition by part_id order by ts, order_id),
null, cast(null as date), cast(null as number)
from received
union all
select part_id, null, cast(null as date), cast(null as number),
order_id, ts,
sum(qty) over (partition by part_id order by ts, order_id)
from delivered
),
prep ( part_id, rec_ord, del_ord, del_ts, qty_sum ) as (
select part_id, rec_ord, del_ord, del_ts, coalesce(rec_sum, del_sum)
from combined
)
select part_id,
last_value(rec_ord ignore nulls) over (partition by part_id
order by qty_sum desc) as rec_ord,
last_value(del_ord ignore nulls) over (partition by part_id
order by qty_sum desc) as del_ord,
last_value(del_ts ignore nulls) over (partition by part_id
order by qty_sum desc) as used_date,
qty_sum - lag(qty_sum, 1, 0) over (partition by part_id
order by qty_sum, del_ts) as used_qty
from prep
order by qty_sum
;
Output:
PART_ID REC_ORD DEL_ORD USED_DATE USED_QTY
------- ------- ------- ----------------------------------- ----------
11A4 0030 1200 18-MAR-15 04.30.00.000000000 PM 14
11A4 0030 1210 23-MAR-15 10.30.00.000000000 AM 6
11A4 0032 1210 23-MAR-15 10.30.00.000000000 AM 2
11A4 0032 1220 23-MAR-15 11.30.00.000000000 AM 7
11A4 0032 1230 23-MAR-15 11.30.00.000000000 AM 4
11A4 0032 1230 23-MAR-15 11.30.00.000000000 AM 0
11A4 0034 1240 26-MAR-15 03.00.33.000000000 PM 1
11A4 0034 1250 26-MAR-15 04.45.11.000000000 PM 3
11A4 0034 1260 27-MAR-15 10.00.33.000000000 AM 2
11A4 0034 1270 03-APR-15 03.00.33.000000000 PM 12
11A4 0036 1270 03-APR-15 03.00.33.000000000 PM 4
11A4 0036 21
12 rows selected.
Notes: (1) One needs to be careful if at one moment the cumulative used quantity exactly matches cumulative received quantity. All rows must be include in all the intermediate results, otherwise there will be bad data in the output; but this may result (as you can see in the output above) in a few rows with a "used quantity" of 0. Depending on how this output is consumed (for further processing, for reporting, etc.) these rows may be left as they are, or they may be discarded in a further outer-query with the condition where used_qty > 0.
(2) The last row shows a quantity of 21 with no used_date and no del_ord. This is, in fact, the "current" quantity in stock for that part_id as of the last date in both tables - available for future use. Again, if this is not needed, it can be removed in an outer query. There may be one or more rows like this at the end of the table.

SQL GROUP BY and differences on same field (for MS Access)

Hi I have the following style of table under MS Access: (I didn't make the table and cant change it)
Date_r | Id_Person |Points |Position
25/05/2015 | 120 | 2000 | 1
25/05/2015 | 230 | 1500 | 2
25/05/2015 | 100 | 500 | 3
21/12/2015 | 120 | 2200 | 1
21/12/2015 | 230 | 2000 | 4
21/12/2015 | 100 | 200 | 20
what I am trying to do is to get a list of players (identified by Id_Person) ordered by the points difference between 2 dates.
So for example if I pick date1=25/05/2015 and date2=21/12/2015 I would get:
Id_Person |Points_Diff
230 | 500
120 | 200
100 |-300
I think I need to make something like
SELECT Id_Person , MAX(Points)-MIN(Points)
FROM Table
WHERE date_r = #25/05/2015# or date_r = #21/12/2015#
GROUP BY Id_Person
ORDER BY MAX(Points)-MIN(Points) DESC
But my problem is that i don't really want to order by (MAX(Points)-MIN(Points)) but rather by (points at date2 - points at date1) which can be different because points can decrease with the time.
One method is to use first and last However, this can sometimes produce strange results, so I think that conditional aggregation is best:
SELECT Id_Person,
(MAX(IIF(date_r = #25/05/2015#, Points, 0)) -
MIN(IIF(date_r = #21/05/2015#, Points, 0))
) as PointsDiff
FROM Table
WHERE date_r IN (#25/05/2015#, #21/12/2015#)
GROUP BY Id_Person
ORDER BY (MAX(IIF(date_r = #25/05/2015#, Points, 0)) -
MIN(IIF(date_r = #21/05/2015#, Points, 0))
) DESC;
Because you have two dates, this is more easily written as:
SELECT Id_Person,
SUM(IIF(date_r = #25/05/2015#, Points, -Points)) as PointsDiff
FROM Table
WHERE date_r IN (#25/05/2015#, #21/12/2015#)
GROUP BY Id_Person
ORDER BY SUM(IIF(date_r = #25/05/2015#, Points, -Points)) DESC;

Select the difference of two consecutive columns

I have a table car that looks like this:
| mileage | carid |
------------------
| 30 | 1 |
| 50 | 1 |
| 100 | 1 |
| 0 | 2 |
| 70 | 2 |
I would like to get the average difference for each car. So for example for car 1 I would like to get ((50-30)+(100-50))/2 = 35. So I created the following query
SELECT AVG(diff),carid FROM (
SELECT (mileage-
(SELECT Max(mileage) FROM car Where mileage<mileage AND carid=carid GROUP BY carid))
AS diff,carid
FROM car GROUP BY carid)
But this doesn't work as I'm not able to use current row for the other column. And I'm quite clueless on how to actually solve this in a different way.
So how would I be able to obtain the value of the next row somehow?
The average difference is the maximum minus he minimum divided by one less than the count (you can do the arithmetic to convince yourself this is true).
Hence:
select carid,
( (max(mileage) - min(mileage)) / nullif(count(*) - 1, 0)) as avg_diff
from cars
group by carid;

how to get daily profit from sql table

I'm stucking for a solution at the problem of finding daily profits from db (ms access) table. The difference wrt other tips I found online is that I don't have in the table a field "Price" and one "Cost", but a field "Type" which distinguish if it is a revenue "S" or a cost "C"
this is the table "Record"
| Date | Price | Quantity | Type |
-----------------------------------
|01/02 | 20 | 2 | C |
|01/02 | 10 | 1 | S |
|01/02 | 3 | 10 | S |
|01/02 | 5 | 2 | C |
|03/04 | 12 | 3 | C |
|03/03 | 200 | 1 | S |
|03/03 | 120 | 2 | C |
So far I tried different solutions like:
SELECT
(SELECT SUM (RS.Price* RS.Quantity)
FROM Record RS WHERE RS.Type='S' GROUP BY RS.Data
) as totalSales,
(SELECT SUM (RC.Price*RC.Quantity)
FROM Record RC WHERE RC.Type='C' GROUP BY RC.Date
) as totalLosses,
ROUND(totalSales-totaleLosses,2) as NetTotal,
R.Date
FROM RECORD R";
in my mind it could work but obviously it doesn't
and
SELECT RC.Data, ROUND(SUM (RC.Price*RC.QuantitY),2) as DailyLoss
INTO #DailyLosses
FROM Record RC
WHERE RC.Type='C' GROUP BY RC.Date
SELECT RS.Date, ROUND(SUM (RS.Price*RS.Quantity),2) as DailyRevenue
INTO #DailyRevenues
FROM Record RS
WHERE RS.Type='S'GROUP BY RS.Date
SELECT Date, DailyRevenue - DailyLoss as DailyProfit
FROM #DailyLosses dlos, #DailyRevenues drev
WHERE dlos.Date = drev.Date";
My problem beyond the correct syntax is the approach to this kind of problem
You can use grouping and conditional summing. Try this:
SELECT data.Date, data.Income - data.Cost as Profit
FROM (
SELECT Record.Date as Date,
SUM(IIF(Record.Type = 'S', Record.Price * Record.Quantity, 0)) as Income,
SUM(IIF(Record.Type = 'C', Record.Price * Record.Quantity, 0)) as Cost,
FROM Record
GROUP BY Record.Date
) data
In this case you first create a sub-query to get separate fields for Income and Cost, and then your outer query uses subtraction to get actual profit.

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself blocked with last query. The database models a reservation system for an opera house. The query is about associating the a spectator the other spectators that assist to the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez#gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille#gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show he's attending to, the query looks like this.
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand better the problem and hint me towards finding a solution. Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the 'stuff 'function. The only drawback here is that, since your ids are ints, placing a comma between values will involve a work around (would need to be a string). Below is the method I can think of using a work around.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (d_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
#> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table hast to be processes, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
#> is the "contains2 operator for arrays - so we get all spectators that have at least seen the same shows.
Faster than 1. because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Row that don't qualify are excluded early. the two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!