SQL moving aggregate SUM without partial results - sql

Assume I have this schema (tested on postgresql) where the 'Scorelines' relation contains results of sport matches. (kickoff is a TIMESTAMP but replaced by INT for readability)
SQLFiddle here: http://sqlfiddle.com/#!12/52475/3
CREATE TABLE Scorelines (
team TEXT,
kickoff INT,
scored INT,
conceded INT
);
Now I want to produce another column 'three_matches_scored' that contains the sum of the points scored
over the 3 preceding game (determined by kickoff) of the same team. I have this:
SELECT team, kickoff, scored, conceded, SUM(scored) OVER three_matches AS three_matches_scored
FROM Scorelines
WINDOW three_matches AS
(PARTITION BY team ORDER BY kickoff
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY kickoff;
This works beautifully so far, except that I get values starting from the second game. Example:
| TEAM | KICKOFF | SCORED | CONCEDED | THREE_MATCHES_SCORED |
|------|---------|--------|----------|----------------------|
| A | 1 | 1 | 0 | (null) |
| B | 2 | 1 | 1 | (null) |
| A | 3 | 1 | 1 | 1 |
| A | 4 | 3 | 0 | 2 |
| B | 4 | 1 | 4 | 1 |
| A | 6 | 0 | 2 | 5 |
| B | 6 | 4 | 2 | 2 |
| B | 8 | 1 | 2 | 6 |
| B | 10 | 1 | 1 | 6 |
| A | 11 | 2 | 1 | 4 |
I want the column 'three_matches_scored' to be (null) for the first 3 games because there are no 3 results to sum up. How can I achieve this?
I'd prefer simple understandable solutions, performance is not critical for this particular case.
My only idea right now, is to define a stored function SUM3, that results in (null) with less than 3 values to add up. But I never defined a function in SQL and can't seem to figure it out.

You can use a case statement to null the rows where there are less than 3 games:
SELECT team, kickoff, scored, conceded,
CASE WHEN COUNT(scored) OVER three_matches = 3
THEN SUM(scored) OVER three_matches
ELSE NULL
END AS three_matches_scored
FROM Scorelines
WINDOW three_matches AS
(PARTITION BY team ORDER BY kickoff
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY kickoff;
Output:
team | kickoff | scored | conceded | three_matches_scored
------+---------+--------+----------+----------------------
A | 1 | 1 | 0 |
B | 2 | 1 | 1 |
A | 3 | 1 | 1 |
A | 4 | 3 | 0 |
B | 4 | 1 | 4 |
A | 6 | 0 | 2 | 5
B | 6 | 4 | 2 |
B | 8 | 1 | 2 | 6
B | 10 | 1 | 1 | 6
A | 11 | 2 | 1 | 4
(10 rows)

See harmics answer above.
(my first solution, just for reference)
Solution with user defined aggregate:
CREATE TYPE intermediate_sum AS (
sum INT,
count INT
);
CREATE FUNCTION sum_sfunc(intermediate_sum, INTEGER) RETURNS intermediate_sum AS
$$ SELECT $2 + $1.sum AS sum, $1.count - 1 AS count $$ LANGUAGE SQL;
CREATE FUNCTION sum_ffunc(intermediate_sum) RETURNS INTEGER AS
$$ SELECT (CASE WHEN $1.count > 1 THEN null
WHEN $1.count = 0 THEN $1.sum
END)
$$ LANGUAGE SQL;
CREATE AGGREGATE sum3(INTEGER) (
sfunc = sum_sfunc,
finalfunc = sum_ffunc,
stype = intermediate_sum,
initcond = '(0,3)'
);
The aggregate SUM3 wants at least 3 values, otherwise it returns (null). One can define other aggreates like SUM4 by changing the initcond, for example to '(0,4)'.

Related

SQL retrieve columns based on two rows values conditions

I have a table called products
id | product_id | detail | value
----+----------------+--------------+--------
1 | 1 | size | medium
2 | 1 | load | yes
5 | 1 | color | green
3 | 1 | availability | 10
4 | 1 | deliverable | yes
5 | 2 | size | small
6 | 2 | load | no
7 | 2 | color | bleu
8 | 2 | availability | 5
9 | 2 | deliverable | yes
I need to develop a query where I can extract rows based on a condition.
I want to select id, product_id, detail,value
if detail="load" and value="yes" then select * execept detail="delivrable" and I do not care its value
if detail="load" and value="no" then select * execept detail="load" and I do not care its value
=> I need only 1 detail between load and deliverable based on the values detail="load" and value="yes" as shown in the wished outcomes table below
id | product_id | detail | value
----+----------------+--------------+-------
1 | 1 | size | medium
2 | 1 | load | yes
5 | 1 | color | green
3 | 1 | availability | 10
5 | 2 | size | small
7 | 2 | color | bleu
8 | 2 | availability | 5
9 | 2 | deliverable | yes
Thank you in advance for your help
Consider below
select * except(a,b,c,d) from (
select *,
countif(detail="load" and value="yes") over win1 > 0 as a,
countif(detail="deliverable") over win2 = 0 as b,
countif(detail="load" and value="no") over win1 > 0 as c,
countif(detail="load") over win2 = 0 as d
from your_table
window win1 as (partition by product_id),
win2 as (win1 order by id rows between current row and current row)
)
where (a and b) or (c and d)
if applied to sample data in your question - output is

Grouping the rows on the basis of specific condition in SQL Server

I want to group the rows on the basis of a specific condition.
The table structure is something like this
EmpID | EmpName | TaskId | A_Shift_Status | B_Shift_Status | C_Shift_Status | D_Shift_Status
1 | John | 1 | 1 | null | 2 | 1
1 | John | 2 | 1 | null | 1 | 1
2 | Mike | 3 | 1 | 1 | 2 | 1
2 | Mike | 4 | null | 1 | null | 1
3 | Steve | 5 | null | 1 | 2 | 1
3 | Steve | 6 | 1 | null | 2 | 1
The criteria will be
Done 1
Pending 2
NA 3
The expected output is to group the employees by task and the status will be on the following condition
if ALL tasks are done by any employee then the status will be done
(i.e. 1)
if ANY of the tasks is incomplete then the status will be
incomplete/pending (i.e. 2)
So the desired output will be
EmpID | EmpName | A_Shift_Status | B_Shift_Status | C_Shift_Status | D_Shift_Status
1 | John | 1 | null | 2 | 1
2 | Mike | 1 | 1 | 2 | 1
3 | Steve | 1 | 1 | 2 | 1
So in other terms summary/grouping should only show complete/done (i.e. 1) when all the rows of a particular shift column of an employee have status as complete/done (i.e. 1)
Based on your data (where the criteria are 1, 2 and NULL for n/a), a simple 'group by' the employee, and MAX of the columns, should work e.g.,
SELECT
yt.EmpID,
yt.EmpName,
MAX(yt.A_Shift_Status) AS A_Shift_Status,
MAX(yt.B_Shift_Status) AS B_Shift_Status,
MAX(yt.C_Shift_Status) AS C_Shift_Status,
MAX(yt.D_Shift_Status) AS D_Shift_Status
FROM
yourtable yt
GROUP BY
yt.EmpID,
yt.EmpName;
For the shift statuses
If any of them are 2, it returns 2
otherwise if any of them are 1, it returns 1
otherwise it returns NULL
Notes re 1/2/3 (which was specified as criteria) vs 1/2/NULL (which is in the data)
It gets a little tricker if the inputs are supposed to use 1/2/3 instead of 1/2/NULL. Let us know if you are changing the inputs to reflect that.
If the input is fine as NULLs, but you need the output to have '3' for n/a (nulls), you can put an ISNULL or COALESCE around the MAX statements e.g., ISNULL(MAX(yt.A_Shift_Status), 3) AS A_Shift_Status

Hierarchy table, update column for a particular level

So I have a hierarchy table in the following format:
Instance | Parent | Serial | Hierarchy Level
1 | 0 | x0 | 1
2 | 0 | x1 | 1
3 | 1 | xy0 | 2
4 | 1 | xy1 | 2
5 | 2 | - | 2
6 | 2 | - | 2
7 | 2 | - | 2
And what I would like to get is:
Instance | Parent | Serial | Hierarchy Level
1 | 0 | x0 | 1
2 | 0 | x1 | 1
3 | 1 | xy0 | 2
4 | 1 | xy1 | 2
5 | 1 | x0 | 2
6 | 2 | x1 | 2
7 | 2 | x1 | 2
All level 1's have a serial number. My goal is to update all level 2's where Serial is null, and to do this, I would have to select the parent's serial number. Could someone give me an idea of how to go about this?
This is what I've tried so far (as well as a couple of other things, to no avail):
UPDATE "USER"."TABLE" AS A1
SET A1.SERIAL =
(
SELECT "SERIAL"
FROM "USER"."TABLE" AS A2
WHERE A2."PARENT" = A1."INSTANCE"
)
WHERE "SERIAL" IS NULL
AND "HIERARCHYLEVEL" = 2
Native to Java, I feel like this should be a lot easier, but I am having a lot of difficulties with this. Any help would be much appreciated.
Sorry for the stupid question. The simple fix was as #rd_nielsen explained in the OP responses:
UPDATE "user"."table" AS A1
SET A1.serial =
(
SELECT "serial"
FROM "user"."table" AS A2
WHERE A1."PARENT" = A2."INSTANCE"
)
WHERE "serial" IS NULL
AND "HIERARCHYLEVEL" = 2

Counting on multiple columns

I have a table like this:
+------------+---------------+-------------+
|store_number|entrance_number|camera_number|
+------------+---------------+-------------+
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 4 | 1 | 2 |
| 4 | 2 | 1 |
| 4 | 3 | 1 |
+------------+---------------+-------------+
In summary the stores are numbered 1 and up, the entrances are numbered 1 and up for each store, and the cameras are numbered 1 and up for each entrance.
What I want to do is count how many how many entrances in total, and how many cameras in total for each store. Producing this result from the above table:
+------------+---------------+-------------+
|store_number|entrances |cameras |
+------------+---------------+-------------+
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 1 | 1 |
| 4 | 3 | 4 |
+------------+---------------+-------------+
How can I count on multiple columns to produce this result?
You can do this with a GROUP BY and a COUNT() of each item:
Select Store_Number,
Count(Distinct Entrance_Number) as Entrances,
Count(Camera_Number) As Cameras
From YourTable
Group By Store_Number
From what I can tell from your expected output, you're looking for the number of cameras that appear, whilst also looking for the DISTINCT number of entrances.
This will work as well,
DECLARE #store TABLE
( store_number INT,entrance_number INT,camera_number INT)
INSERT INTO #store VALUES(1,1,1),(1,1,2),(2,1,1),(2,2,1),
(2,2,2),(3,1,1),(4,1,1),(4,1,2),(4,2,1),(4,3,1)
SELECT AA.s store_number, BB.e entrances,AA.c cameras FROM (
SELECT s,COUNT(DISTINCT c) c FROM ( SELECT store_number s,
CONVERT(VARCHAR,store_number) + CONVERT(VARCHAR,entrance_number) +
CONVERT(VARCHAR,camera_number) c FROM #store ) A GROUP BY s ) AA
LEFT JOIN
( SELECT s,COUNT(DISTINCT e) e FROM ( SELECT store_number s,
CONVERT(VARCHAR,store_number) + CONVERT(VARCHAR,entrance_number) e
FROM #store ) B GROUP BY s ) BB ON AA.s = BB.s
Hope it helped. :)

Select dynamic couples of lines in SQL (PostgreSQL)

My objective is to make dynamic group of lines (of product by TYPE & COLOR in fact)
I don't know if it's possible just with one select query.
But : I want to create group of lines (A PRODUCT is a TYPE and a COLOR) as per the number_per_group column and I want to do this grouping depending on the date order (Order By DATE)
A single product with a NB_PER_GROUP number 2 is exclude from the final result.
Table :
-----------------------------------------------
NUM | TYPE | COLOR | NB_PER_GROUP | DATE
-----------------------------------------------
0 | 1 | 1 | 2 | ...
1 | 1 | 1 | 2 |
2 | 1 | 2 | 2 |
3 | 1 | 2 | 2 |
4 | 1 | 1 | 2 |
5 | 1 | 1 | 2 |
6 | 4 | 1 | 3 |
7 | 1 | 1 | 2 |
8 | 4 | 1 | 3 |
9 | 4 | 1 | 3 |
10 | 5 | 1 | 2 |
Results :
------------------------
GROUP_NUMBER | NUM |
------------------------
0 | 0 |
0 | 1 |
~~~~~~~~~~~~~~~~~~~~~~~~
1 | 2 |
1 | 3 |
~~~~~~~~~~~~~~~~~~~~~~~~
2 | 4 |
2 | 5 |
~~~~~~~~~~~~~~~~~~~~~~~~
3 | 6 |
3 | 8 |
3 | 9 |
If you have another way to solve this problem, I will accept it.
What about something like this?
select max(gn.group_number) group_number, ip.num
from products ip
join (
select date, type, color, row_number() over (order by date) - 1 group_number
from (
select op.num, op.type, op.color, op.nb_per_group, op.date, (row_number() over (partition by op.type, op.color order by op.date) - 1) % nb_per_group group_order
from products op
) sq
where sq.group_order = 0
) gn
on ip.type = gn.type
and ip.color = gn.color
and ip.date >= gn.date
group by ip.num
order by group_number, ip.num
This may only work if your nb_per_group values are the same for each combination of type and color. It may also require unique dates, but that could probably be worked around if required.
The innermost subquery partitions the rows by type and color, orders them by date, then calculates the row numbers modulo nb_per_group; this forms a 0-based count for the group that resets to 0 each time nb_per_group is exceeded.
The next-level subquery finds all of the 0 values we mapped in the lower subquery and assigns group numbers to them.
Finally, the outermost query ties each row in the products table to a group number, calculated as the highest group number that split off before this product's date.