How to split 2 numbers into equal ranges in PostgreSQL?

In a PostgreSQL database I have a table called layers. It looks like this:
| ID | TOTAL_SUBSCRIBERS | DENSITY |
|----|-------------------|---------|
| 1 | 34440 | |
| 2 | 41994 | |
| 3 | 102824 | |
| 4 | 19608 | |
| 5 | 1287 | |
| 6 | 4944 | |
I found max and min values of the TOTAL_SUBSCRIBERS column.
select
MIN(total_subscribers),
MAX(total_subscribers)
from
layers;
Now I need to split the interval between the min and max into 6 ranges and check which range each TOTAL_SUBSCRIBERS value falls into. The number of that range should then be written to the DENSITY column.
For example in this table max value is 102824, min value is 1287.
RANGES:
102824 - 1287 = 101537
101537 / 6 = 16922.8333 ~ 16923
1 range: [1287-18210]
2 range: [18211-35133]
3 range: [35134-52056]
4 range: [52057-68979]
5 range: [68980-85902]
6 range: [85903-102825]
FINAL RESULT:
| ID | TOTAL_SUBSCRIBERS | DENSITY |
|----|-------------------|---------|
| 1 | 34440 | 3 | < 34440 in 3 range
| 2 | 41994 | 3 | < 41994 in 3 range
| 3 | 102824 | 6 | < 102824 in 6 range
| 4 | 19608 | 2 | < 19608 in 2 range
| 5 | 1287 | 1 | < 1287 in 1 range
| 6 | 4944 | 1 | < 4944 in 1 range

In a CTE calculate the min and max of TOTAL_SUBSCRIBERS and also the length of each interval and then cross join to the table to make the calculation:
with cte as (
select
min(TOTAL_SUBSCRIBERS) minsub,
((max(TOTAL_SUBSCRIBERS) - min(TOTAL_SUBSCRIBERS)) + 1) / 6 dist
from layers
)
select l.*,
(l.TOTAL_SUBSCRIBERS - c.minsub) / c.dist + 1 DENSITY
from layers l cross join (select * from cte) c
See the demo.
Results:
| id | total_subscribers | density |
| --- | ----------------- | ------- |
| 1 | 34440 | 2 |
| 2 | 41994 | 3 |
| 3 | 102824 | 6 |
| 4 | 19608 | 2 |
| 5 | 1287 | 1 |
| 6 | 4944 | 1 |
In your expected results the row with id = 1 should have DENSITY = 2, right?
Also your ranges should be:
1 range: [1287-18209]
2 range: [18210-35132]
3 range: [35133-52055]
4 range: [52056-68978]
5 range: [68979-85901]
6 range: [85902-102824]
so they are equally distributed.
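To make the arithmetic behind the CTE easier to follow, here is a small Python sketch (not SQL) that repeats the same integer calculation on the sample data. The names subscribers, BUCKETS, width, density and ranges are illustrative and do not come from the original answer.

```python
# Equal-width bucketing with integer arithmetic, mirroring the CTE above.
subscribers = {1: 34440, 2: 41994, 3: 102824, 4: 19608, 5: 1287, 6: 4944}
BUCKETS = 6  # number of ranges requested in the question

lo, hi = min(subscribers.values()), max(subscribers.values())
width = (hi - lo + 1) // BUCKETS  # plays the role of "dist" in the CTE

# Bucket number for every row: (value - min) / width + 1, using integer division.
density = {row_id: (total - lo) // width + 1 for row_id, total in subscribers.items()}

# Inclusive bounds of each bucket.
ranges = [(lo + k * width, lo + (k + 1) * width - 1) for k in range(BUCKETS)]

print(density)  # {1: 2, 2: 3, 3: 6, 4: 2, 5: 1, 6: 1}
print(ranges[0], ranges[-1])  # (1287, 18209) (85902, 102824)
```

This works out neatly here because max - min + 1 = 101538 is divisible by 6. When it is not, the maximum value would land in a 7th bucket with this formula, so capping the result at 6 may be needed.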

I believe a combination of generate_series and an int4range containment operation might be what you're looking for. The following code was tested on PostgreSQL 11 (see db fiddle), but should also work with 9.4+.
Sample data
CREATE TEMPORARY TABLE layers (id SERIAL, total_subscribers INT);
INSERT INTO layers (total_subscribers)
VALUES (34440),(41994),(102824),(19608),(1287),(4944);
Query
SELECT id, total_subscribers,range_id AS density,range_min,range_max
FROM layers,(
WITH j AS (
SELECT min(total_subscribers) AS min_value,
max(total_subscribers) AS max_value,
(max(total_subscribers)-min(total_subscribers))/count(*) AS var
FROM layers)
SELECT
generate_series(1,(SELECT count(*) FROM layers)) AS range_id,
generate_series(j.min_value, j.max_value-var, var)::INT AS range_min,
generate_series(j.min_value+var, j.max_value+var, var+min_value)::INT AS range_max
FROM j) j
WHERE layers.total_subscribers <@ int4range(j.range_min, range_max)
ORDER BY id;
id | total_subscribers | density | range_min | range_max
----+-------------------+---------+-----------+-----------
1 | 34440 | 2 | 18209 | 36418
2 | 41994 | 3 | 35131 | 54627
3 | 102824 | 6 | 85897 | 109254
4 | 19608 | 2 | 18209 | 36418
5 | 1287 | 1 | 1287 | 18209
6 | 4944 | 1 | 1287 | 18209
(6 rows)
Further reading: Common Table Expressions (CTE)

First select the minimum value and the range width, i.e. (max - min) divided by six, from layers as table b:
select min(TOTAL_SUBSCRIBERS) as M,
(max(TOTAL_SUBSCRIBERS)-min(TOTAL_SUBSCRIBERS))/6 as R from layers b
Then select all rows from layers, subtract the minimum from TOTAL_SUBSCRIBERS, divide by the range width and add 1. The result tells you which range each TOTAL_SUBSCRIBERS value falls into:
select a.*,((a.TOTAL_SUBSCRIBERS-b.M)/b.R)+1 as DENSITY from(
select layers.ID ,layers.TOTAL_SUBSCRIBERS from layers )a,
(select min(TOTAL_SUBSCRIBERS) as M,
(max(TOTAL_SUBSCRIBERS)-min(TOTAL_SUBSCRIBERS))/6 as R from layers) b
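One thing worth checking with this variant: assuming the columns are integers, so that the division truncates, the row holding the maximum can end up in bucket 7 rather than 6. The Python sketch below (not SQL) reproduces the arithmetic on the sample data and shows one possible fix, capping the result at the number of buckets. The names totals, BUCKETS, uncapped and capped are illustrative only.

```python
totals = [34440, 41994, 102824, 19608, 1287, 4944]
BUCKETS = 6

m = min(totals)                    # M in the query
r = (max(totals) - m) // BUCKETS   # R in the query, integer division

# Formula exactly as in the query: (value - M) / R + 1
uncapped = [(t - m) // r + 1 for t in totals]

# Same formula, but never larger than the number of buckets.
capped = [min((t - m) // r + 1, BUCKETS) for t in totals]

print(uncapped)  # [2, 3, 7, 2, 1, 1]
print(capped)    # [2, 3, 6, 2, 1, 1]
```

In PostgreSQL the same cap could be expressed with LEAST(..., 6) around the DENSITY expression.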

Related

SQL to group time series meeting a criteria, according to start and end time

I am analyzing power systems time series data, and I am trying to find the contiguous data points that meet a certain boolean flag.
I would like to query this table by returning the start and end time corresponding to the inflection points wherein the value changed from 1 to 0, and 0 to 1.
How should I go about implementing the pseudo-SQL code below?
SELECT Time
FROM InputTable
WHERE InputTable.Value = 1
INTO OutputTable??, TimeStart??, TimeEnd??;
Input:
+-------+---------+------+
| Index | Time | Value|
+-------+---------+------+
| 0 | 00:00:01| 1 |
| 1 | 00:00:02| 1 |
| 2 | 00:00:03| 1 |
| 3 | 00:00:04| 0 |
| 4 | 00:00:05| 1 |
| 5 | 00:00:06| 1 |
| 6 | 00:00:07| 0 |
| 7 | 00:00:08| 1 |
+-------+---------+------+
Output:
+-------+-----------+----------+
| Index | TimeStart | TimeEnd |
+-------+-----------+----------+
| 0 | 00:00:01 | 00:00:03 |
| 1 | 00:00:05 | 00:00:06 |
| 2 | 00:00:08 | 00:00:08 |
+-------+-----------+----------+
You need to group the values based on adjacent "1"s. This is tricky in MS Access. One method that can be used in Access is to count the number of "0"s (or non-"1" values) before each row.
select ind, min(time), max(time)
from (select t.*,
(select 1 + count(*)
from inputtable as t2
where t2.value = 0 and t2.time < t.time
) as ind
from inputtable as t
) as t
where value = 1
group by ind
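The idea behind the query can be sketched in Python (not SQL) on the sample input: every row with value 1 gets a group number equal to one plus the number of 0 rows before it, and each group then yields its earliest and latest time. The names rows, groups and intervals are illustrative.

```python
from collections import defaultdict

rows = [
    ("00:00:01", 1), ("00:00:02", 1), ("00:00:03", 1), ("00:00:04", 0),
    ("00:00:05", 1), ("00:00:06", 1), ("00:00:07", 0), ("00:00:08", 1),
]

groups = defaultdict(list)
for time, value in rows:
    if value == 1:
        # Same as the correlated subquery: 1 + number of earlier rows with value 0.
        ind = 1 + sum(1 for t2, v2 in rows if v2 == 0 and t2 < time)
        groups[ind].append(time)

# One (ind, start, end) tuple per run of adjacent 1s.
intervals = [(ind, min(times), max(times)) for ind, times in sorted(groups.items())]
print(intervals)
# [(1, '00:00:01', '00:00:03'), (2, '00:00:05', '00:00:06'), (3, '00:00:08', '00:00:08')]
```

Note that the group number starts at 1 here, as in the query, while the desired output in the question starts its index at 0.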

Multiply with Previous Value in Oracle SQL

It's easy to multiply (or sum, divide, etc.) with the previous row in an Excel spreadsheet; however, I have not managed to do it in Oracle SQL so far.
A B C
199901 3.81 51905
199902 -6.09 48743.9855
199903 4.75 51059.32481
199904 6.39 54322.01567
199905 -2.35 53045.4483
199906 2.65 54451.15268
199907 1.1 55050.11536
199908 -1.45 54251.88869
199909 0 54251.88869
199910 4.37 56622.69622
Above, column B is static and column C has the formula as:
((B2/100)+1)*C1
((B3/100)+1)*C2
((B4/100)+1)*C3
Example: 51905 from row 1 multiplied with -6.09 from row 2:
((-6.09/100)+1)*51905
I have been trying analytic and window functions, but have not succeeded yet. The LAG function can give the previous row's value in the current row, but it cannot give the previously calculated value.
This can be done with the help of the MODEL clause:
select *
FROM (
SELECT t.*,
row_number() over (order by a) as rn
from table1 t
)
MODEL
DIMENSION BY (rn)
MEASURES ( A, B, 0 c )
RULES (
c[rn=1] = 51905, -- value in a first row
c[rn>1] = round( c[cv()-1] * (b[cv()]/100 +1), 6 )
)
;
Demo: http://sqlfiddle.com/#!4/9756ed/11
| RN | A | B | C |
|----|--------|-------|--------------|
| 1 | 199901 | 3.81 | 51905 |
| 2 | 199902 | -6.09 | 48743.9855 |
| 3 | 199903 | 4.75 | 51059.324811 |
| 4 | 199904 | 6.39 | 54322.015666 |
| 5 | 199905 | -2.35 | 53045.448298 |
| 6 | 199906 | 2.65 | 54451.152678 |
| 7 | 199907 | 1.1 | 55050.115357 |
| 8 | 199908 | -1.45 | 54251.888684 |
| 9 | 199909 | 0 | 54251.888684 |
| 10 | 199910 | 4.37 | 56622.696219 |
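The rule in the MODEL clause is a running product in which every row depends on the previously calculated value. As a rough cross-check, the same recurrence can be written in Python (not Oracle SQL); this is only a sketch using floating-point numbers, so the last decimals may differ slightly from Oracle's NUMBER arithmetic. The names b, SEED and c are illustrative, and itertools.accumulate with initial requires Python 3.8 or later.

```python
from itertools import accumulate

b = [3.81, -6.09, 4.75, 6.39, -2.35, 2.65, 1.1, -1.45, 0, 4.37]
SEED = 51905  # value of C in the first row, as in the MODEL rule

# c[n] = round(c[n-1] * (b[n] / 100 + 1), 6) for every row after the first.
c = list(accumulate(
    b[1:],
    lambda prev, pct: round(prev * (pct / 100 + 1), 6),
    initial=SEED,
))

for value in c:
    print(value)
```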

Simple algebra with recursive SQL

The following schema is used to create simple algebraic formulas. variables is used to create formulas such as x=3+4y. variables_has_sub_variables is used to combine the previous mentioned formulas and uses the sign column (will be +1 or -1 only) to determine whether the formula should be added or subtracted to the combination.
For instance, variables table might have the following data where the Implied Formulas column is not really in the table but just for illustrative purposes only.
variables table
+-----------+-----------+-------+------------------+
| variables | intercept | slope | Implied Formula |
+-----------+-----------+-------+------------------+
| 1 | 2.86 | -0.82 | Y1=+2.86-0.82*X1 |
| 2 | 2.96 | -3.49 | Y2=+2.96-3.49*X2 |
| 3 | 2.56 | 2.81 | Y3=+2.56+2.81*X3 |
| 4 | 3.04 | -3.43 | Y4=+3.04-3.43*X4 |
| 5 | -1.94 | 4.11 | Y5=-1.94+4.11*X5 |
| 6 | -1.21 | -0.62 | Y6=-1.21-0.62*X6 |
| 7 | 0.88 | -0.61 | Y7=+0.88-0.61*X7 |
| 8 | -2.77 | -0.34 | Y8=-2.77-0.34*X8 |
| 9 | 1.81 | 1.65 | Y9=+1.81+1.65*X9 |
+-----------+-----------+-------+------------------+
Then, given the below variables_has_sub_variables data, the variables are combined, resulting in X7=+Y1-Y2+Y3, X8=+Y4+Y5-Y7, and X9=+Y6-Y7+Y8. Next, Y7, Y8, and Y9 can be derived using the variables table, resulting in Y7=+0.88-0.61*X7, etc. Note that the application will prevent an endless loop such as inserting a record where variables equals 7 and sub_variables equals 9, as variable 9 is based on variable 7.
variables_has_sub_variables table
+-----------+---------------+------+
| variables | sub_variables | sign |
+-----------+---------------+------+
| 7 | 1 | 1 |
| 7 | 2 | -1 |
| 7 | 3 | 1 |
| 8 | 4 | 1 |
| 8 | 5 | 1 |
| 8 | 7 | -1 |
| 9 | 6 | 1 |
| 9 | 7 | -1 |
| 9 | 8 | 1 |
+-----------+---------------+------+
My objective is, given any variable (i.e. 1 to 9), to determine the constants and root variables, where a root variable is defined as not being in variables_has_sub_variables.variables (I can also easily add a root column to variables if needed). Using my example data above, the root variables are 1 through 6.
Doing so for a root variable is easier as there are no sub_variables and is simply Y1=+2.86-0.82*X1.
Doing so for variable 7 is a little trickier:
Y7=+0.88-0.61*X7
=+0.88-0.61*(+Y1-Y2+Y3)
=+0.88-0.61*(+(+2.86-0.82*X1)-(+2.96-3.49*X2)+( +2.56+2.81*X3))
= -0.62 + 0.50*X1 - 2.13*X2 - 1.71*X3
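The substitution above can be sketched outside SQL as a small recursive function. The Python below (not part of the original schema; the names variables, subs and expand are illustrative) expands any variable down to its root variables using the sample data and floating-point arithmetic.

```python
# id -> (intercept, slope), taken from the variables table
variables = {
    1: (2.86, -0.82), 2: (2.96, -3.49), 3: (2.56, 2.81),
    4: (3.04, -3.43), 5: (-1.94, 4.11), 6: (-1.21, -0.62),
    7: (0.88, -0.61), 8: (-2.77, -0.34), 9: (1.81, 1.65),
}
# id -> [(sub_variable, sign)], taken from variables_has_sub_variables
subs = {
    7: [(1, 1), (2, -1), (3, 1)],
    8: [(4, 1), (5, 1), (7, -1)],
    9: [(6, 1), (7, -1), (8, 1)],
}

def expand(var):
    """Return (constant, {root_variable: coefficient}) for Y<var>."""
    intercept, slope = variables[var]
    if var not in subs:  # root variable: Y = intercept + slope * X
        return intercept, {var: slope}
    const, coeffs = 0.0, {}
    for sub, sign in subs[var]:  # X<var> = sum of sign * Y<sub>
        sub_const, sub_coeffs = expand(sub)
        const += sign * sub_const
        for root, coeff in sub_coeffs.items():
            coeffs[root] = coeffs.get(root, 0.0) + sign * coeff
    # Y<var> = intercept + slope * X<var>
    return intercept + slope * const, {root: slope * c for root, c in coeffs.items()}

const7, coeffs7 = expand(7)
print(round(const7, 4), {k: round(v, 2) for k, v in coeffs7.items()})
# -0.6206 {1: 0.5, 2: -2.13, 3: -1.71}
```

Only root variables ever appear in the returned dictionary, because intermediate variables such as 7 are replaced by their own expansion before being combined.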
Now the SQL. Below is how I created the tables:
CREATE DATABASE algebra;
USE algebra;
CREATE TABLE `variables` (
`variables` INT NOT NULL,
`slope` DECIMAL(6,2) NOT NULL DEFAULT 1,
`intercept` DECIMAL(6,2) NOT NULL DEFAULT 0,
PRIMARY KEY (`variables`))
ENGINE = InnoDB;
CREATE TABLE `variables_has_sub_variables` (
`variables` INT NOT NULL,
`sub_variables` INT NOT NULL,
`sign` TINYINT NOT NULL,
PRIMARY KEY (`variables`, `sub_variables`),
INDEX `fk_variables_has_variables_variables1_idx` (`sub_variables` ASC),
INDEX `fk_variables_has_variables_variables_idx` (`variables` ASC),
CONSTRAINT `fk_variables_has_variables_variables`
FOREIGN KEY (`variables`)
REFERENCES `variables` (`variables`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_variables_has_variables_variables1`
FOREIGN KEY (`sub_variables`)
REFERENCES `variables` (`variables`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
INSERT INTO variables(variables,intercept,slope) VALUES (1,2.86,-0.82),(2,2.96,-3.49),(3,2.56,2.81),(4,3.04,-3.43),(5,-1.94,4.11),(6,-1.21,-0.62),(7,0.88,-0.61),(8,-2.77,-0.34),(9,1.81,1.65);
INSERT INTO variables_has_sub_variables(variables,sub_variables,sign) VALUES (7,1,1),(7,2,-1),(7,3,1),(8,4,1),(8,5,1),(8,7,-1),(9,6,1),(9,7,-1),(9,8,1);
And now the queries. XXXX is 7, 8, and 9 for the following results. After the query, I show the desired and actual results for each.
WITH RECURSIVE t AS (
SELECT v.variables, v.slope, v.intercept
FROM variables v
WHERE v.variables=XXXX
UNION ALL
SELECT v.variables, vhsv.sign*t.slope*v.slope slope, vhsv.sign*t.slope*v.intercept intercept
FROM t
INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables=t.variables
INNER JOIN variables v ON v.variables=vhsv.sub_variables
)
SELECT variables, SUM(slope) constant FROM t GROUP BY variables
UNION SELECT 'intercept' variables, SUM(intercept) intercept FROM t;
Variable 7 Desired
+-----------+----------+
| variables | constant |
+-----------+----------+
| 1 | 0.50 |
| 2 | -2.13 |
| 3 | -1.71 |
| intercept | -0.6206 |
+-----------+----------+
Variable 7 Actual
+-----------+----------+
| variables | constant |
+-----------+----------+
| 1 | 0.50 |
| 2 | -2.13 |
| 3 | -1.71 |
| 7 | -0.61 |
| intercept | -0.61 |
+-----------+----------+
5 rows in set (0.00 sec)
Variable 8 Desired
+-----------+-----------+
| variables | constant |
+-----------+-----------+
| 1 | 0.17 |
| 2 | -0.72 |
| 3 | -0.58 |
| 4 | 1.17 |
| 5 | -1.40 |
| intercept | -3.355004 |
+-----------+-----------+
Variable 8 Actual
+-----------+----------+
| variables | constant |
+-----------+----------+
| 1 | 0.17 |
| 2 | -0.73 |
| 3 | -0.59 |
| 4 | 1.17 |
| 5 | -1.40 |
| 7 | -0.21 |
| 8 | -0.34 |
| intercept | -3.36 |
+-----------+----------+
8 rows in set (0.00 sec)
Variable 9 Desired
+-----------+------------+
| variables | constant |
+-----------+------------+
| 1 | -0.54 |
| 2 | 2.32 |
| 3 | 1.87 |
| 4 | 1.92 |
| 5 | -2.31 |
| 6 | -1.02 |
| intercept | -4.6982666 |
+-----------+------------+
Variable 9 Actual
+-----------+----------+
| variables | constant |
+-----------+----------+
| 1 | -0.55 |
| 2 | 2.33 |
| 3 | 1.88 |
| 4 | 1.92 |
| 5 | -2.30 |
| 6 | -1.02 |
| 7 | 0.67 |
| 8 | -0.56 |
| 9 | 1.65 |
| intercept | -4.67 |
+-----------+----------+
10 rows in set (0.00 sec)
All I need to do is detect which variables are not the root variables and filter them out. How should this be accomplished?
In response to JNevill's answer:
For v.variables of 9
+-----------+-------+-------+----------+
| variables | depth | path | constant |
+-----------+-------+-------+----------+
| 1 | 3 | 9>7>1 | -0.55 |
| 2 | 3 | 9>7>2 | 2.33 |
| 3 | 3 | 9>7>3 | 1.88 |
| 4 | 3 | 9>8>4 | 1.92 |
| 5 | 3 | 9>8>5 | -2.30 |
| 6 | 2 | 9>6 | -1.02 |
| 7 | 2 | 9>7 | 0.67 |
| 8 | 2 | 9>8 | -0.56 |
| 9 | 1 | 9 | 1.65 |
| intercept | 1 | 9 | -4.67 |
+-----------+-------+-------+----------+
10 rows in set (0.00 sec)
I'm not going to attempt to fully wrap my head around what you are doing, and I would agree with @RickJames up in the comments that this feels like maybe not the best use-case for a database. I too am a little obsessive though. I get it.
There are a couple of things that I almost always track in a recursive CTE:
1. The "Path". If I'm going to let a query head down a rabbit hole, I want to know how it got to the end point. So I track a path so I know which primary key was selected through each iteration. In the recursive seed (top portion) I use something like SELECT CAST(id as varchar(500)) as path... and in the recursive member (bottom portion) I do something like recursiveCTE.path + '>' + id as path...
2. The "Depth". I want to know how deep the iterations went to get to the resulting record. This is tracked by adding SELECT 1 as depth to the recursive seed and recursiveCTE.depth + 1 as depth to the recursive member. Now I know how deep each record is.
I believe number 2 will solve your issue:
WITH RECURSIVE t
AS (
SELECT v.variables,
v.slope,
v.intercept,
1 as depth
FROM variables v
WHERE v.variables = XXXX
UNION ALL
SELECT v.variables,
vhsv.sign * t.slope * v.slope slope,
vhsv.sign * t.slope * v.intercept intercept,
t.depth + 1
FROM t
INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables = t.variables
INNER JOIN variables v ON v.variables = vhsv.sub_variables
)
SELECT variables,
SUM(slope) constant
FROM t
WHERE depth > 1
GROUP BY variables
UNION
SELECT 'intercept' variables,
SUM(intercept) intercept
FROM t;
The WHERE clause here excludes records in your recursive result set that have a depth of 1, meaning they were brought in by the recursive seed portion of the recursive CTE (that is, they are the root of the recursion).
It wasn't clear whether you also need the root removed from the second SELECT of the UNION (the intercept row). If so, the same logic applies; just add the same WHERE clause there to exclude records with a depth of 1.
While it may not be helpful here, an example of your recursive cte with PATH would be:
WITH RECURSIVE t
AS (
SELECT v.variables,
v.slope,
v.intercept,
1 as depth,
CAST(v.variables as CHAR(30)) as path
FROM variables v
WHERE v.variables = XXXX
UNION ALL
SELECT v.variables,
vhsv.sign * t.slope * v.slope slope,
vhsv.sign * t.slope * v.intercept intercept,
t.depth + 1,
CONCAT(t.path,'>', v.variables)
FROM t
INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables = t.variables
INNER JOIN variables v ON v.variables = vhsv.sub_variables
)
SELECT variables,
SUM(slope) constant
FROM t
WHERE depth > 1
GROUP BY variables
UNION
SELECT 'intercept' variables,
SUM(intercept) intercept
FROM t;

SQL: Complex query with subtraction from different cells

I have two tables and I want to combine their data.
The first table
+------------+-----+------+-------+
| BusinessID | Lat | Long | Stars |
+------------+-----+------+-------+
| abc123 | 32 | 74 | 4.5 |
| abd123 | 32 | 75 | 4 |
| abe123 | 33 | 76 | 3 |
+------------+-----+------+-------+
The second table is:
+------------+-----+------+-------+
| BusinessID | day | time | count |
+------------+-----+------+-------+
| abc123 | 1 | 14 | 5 |
| abc123 | 1 | 15 | 6 |
| abc123 | 2 | 13 | 1 |
| abd123 | 4 | 12 | 4 |
| abd123 | 4 | 13 | 8 |
| abd123 | 5 | 11 | 2 |
+------------+-----+------+-------+
So what I want to do is find all the businesses that are within a specific radius and have more check-ins in the next hour than in the current one.
So the results are
+------------+
| BusinessID |
+------------+
| abd123 |
| abc123 |
+------------+
Because they have more check-ins in the next hour than in the previous one (6 > 5, 8 > 4).
It would also be helpful if the results were ordered by the difference in the number of check-ins, e.g. (8 - 4 > 6 - 5).
SELECT *
FROM table2 t2
WHERE t2.BusinessID IN (
SELECT t1.BusinessID
FROM table1 t1
WHERE earth_box(ll_to_earth(32, 74), 4000/1.609) @> ll_to_earth(Lat, Long)
ORDER by earth_distance(ll_to_earth(32, 74), ll_to_earth(Lat, Long)), stars DESC
) AND checkin_day = 1 AND checkin_time = 14;
With the above query I can find the businesses within a radius and then their check-ins at the specified time, e.g. 14. What I need to do now is find the number of check-ins in hour 15 (for the same businesses) and check whether it is greater than the number in the previous hour.
I think you want something like this:
SELECT
t1.BusinessID
FROM
table1 t1
JOIN
(SELECT
*,
"count" - LAG("count") OVER (PARTITION BY BusinessID, "day" ORDER BY "time") "grow"
FROM
table2
WHERE
/* Some condition on table2 */) t2
ON t1.BusinessID = t2.BusinessID AND t2.grow > 0
WHERE
/* Some condition on table1 */
ORDER BY
t2.grow DESC;
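To illustrate what the LAG difference computes, here is a Python sketch (not SQL) on the second table from the question. It ignores the radius filter on the first table, and the names checkins, growth and result are illustrative.

```python
from itertools import groupby

# (BusinessID, day, time, count) rows from the second table
checkins = [
    ("abc123", 1, 14, 5), ("abc123", 1, 15, 6), ("abc123", 2, 13, 1),
    ("abd123", 4, 12, 4), ("abd123", 4, 13, 8), ("abd123", 5, 11, 2),
]

growth = []
# PARTITION BY BusinessID, day ORDER BY time
for (business, day), grp in groupby(sorted(checkins), key=lambda r: (r[0], r[1])):
    grp = list(grp)
    for prev, cur in zip(grp, grp[1:]):
        grow = cur[3] - prev[3]  # count - LAG(count)
        if grow > 0:
            growth.append((business, grow))

# ORDER BY grow DESC
growth.sort(key=lambda g: g[1], reverse=True)
result = [business for business, _ in growth]
print(result)  # ['abd123', 'abc123']
```

Like LAG in the query, this compares each row with the previous row in the same business and day. It does not verify that the two rows are exactly one hour apart, so an extra condition on the time difference may be needed if hours can be missing.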

SQL moving aggregate SUM without partial results

Assume I have this schema (tested on postgresql) where the 'Scorelines' relation contains results of sport matches. (kickoff is a TIMESTAMP but replaced by INT for readability)
SQLFiddle here: http://sqlfiddle.com/#!12/52475/3
CREATE TABLE Scorelines (
team TEXT,
kickoff INT,
scored INT,
conceded INT
);
Now I want to produce another column 'three_matches_scored' that contains the sum of the points scored over the 3 preceding games (determined by kickoff) of the same team. I have this:
SELECT team, kickoff, scored, conceded, SUM(scored) OVER three_matches AS three_matches_scored
FROM Scorelines
WINDOW three_matches AS
(PARTITION BY team ORDER BY kickoff
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY kickoff;
This works beautifully so far, except that I get values starting from the second game. Example:
| TEAM | KICKOFF | SCORED | CONCEDED | THREE_MATCHES_SCORED |
|------|---------|--------|----------|----------------------|
| A | 1 | 1 | 0 | (null) |
| B | 2 | 1 | 1 | (null) |
| A | 3 | 1 | 1 | 1 |
| A | 4 | 3 | 0 | 2 |
| B | 4 | 1 | 4 | 1 |
| A | 6 | 0 | 2 | 5 |
| B | 6 | 4 | 2 | 2 |
| B | 8 | 1 | 2 | 6 |
| B | 10 | 1 | 1 | 6 |
| A | 11 | 2 | 1 | 4 |
I want the column 'three_matches_scored' to be (null) for the first 3 games because there are no 3 results to sum up. How can I achieve this?
I'd prefer simple understandable solutions, performance is not critical for this particular case.
My only idea right now, is to define a stored function SUM3, that results in (null) with less than 3 values to add up. But I never defined a function in SQL and can't seem to figure it out.
You can use a CASE expression to null out the rows where there are fewer than 3 preceding games:
SELECT team, kickoff, scored, conceded,
CASE WHEN COUNT(scored) OVER three_matches = 3
THEN SUM(scored) OVER three_matches
ELSE NULL
END AS three_matches_scored
FROM Scorelines
WINDOW three_matches AS
(PARTITION BY team ORDER BY kickoff
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY kickoff;
Output:
team | kickoff | scored | conceded | three_matches_scored
------+---------+--------+----------+----------------------
A | 1 | 1 | 0 |
B | 2 | 1 | 1 |
A | 3 | 1 | 1 |
A | 4 | 3 | 0 |
B | 4 | 1 | 4 |
A | 6 | 0 | 2 | 5
B | 6 | 4 | 2 |
B | 8 | 1 | 2 | 6
B | 10 | 1 | 1 | 6
A | 11 | 2 | 1 | 4
(10 rows)
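The same rule, sum the previous three games per team and return nothing when fewer than three exist, can be sketched in Python (not SQL) to cross-check the output above. The names scorelines, previous and three_matches_scored are illustrative.

```python
from collections import defaultdict, deque

# (team, kickoff, scored, conceded)
scorelines = [
    ("A", 1, 1, 0), ("B", 2, 1, 1), ("A", 3, 1, 1), ("A", 4, 3, 0), ("B", 4, 1, 4),
    ("A", 6, 0, 2), ("B", 6, 4, 2), ("B", 8, 1, 2), ("B", 10, 1, 1), ("A", 11, 2, 1),
]

# Per team, remember the goals scored in at most the 3 most recent earlier games.
previous = defaultdict(lambda: deque(maxlen=3))
three_matches_scored = {}

for team, kickoff, scored, conceded in sorted(scorelines, key=lambda r: r[1]):
    window = previous[team]
    # Equivalent of CASE WHEN COUNT(...) = 3 THEN SUM(...) ELSE NULL END
    three_matches_scored[(team, kickoff)] = sum(window) if len(window) == 3 else None
    window.append(scored)  # the current game only counts for later rows

print(three_matches_scored[("A", 6)], three_matches_scored[("A", 4)])  # 5 None
```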
See harmic's answer above.
(my first solution, just for reference)
Solution with user defined aggregate:
CREATE TYPE intermediate_sum AS (
sum INT,
count INT
);
CREATE FUNCTION sum_sfunc(intermediate_sum, INTEGER) RETURNS intermediate_sum AS
$$ SELECT $2 + $1.sum AS sum, $1.count - 1 AS count $$ LANGUAGE SQL;
CREATE FUNCTION sum_ffunc(intermediate_sum) RETURNS INTEGER AS
$$ SELECT (CASE WHEN $1.count > 1 THEN null
WHEN $1.count = 0 THEN $1.sum
END)
$$ LANGUAGE SQL;
CREATE AGGREGATE sum3(INTEGER) (
sfunc = sum_sfunc,
finalfunc = sum_ffunc,
stype = intermediate_sum,
initcond = '(0,3)'
);
The aggregate SUM3 requires exactly 3 values (which the three-row window frame supplies once enough games exist); otherwise it returns (null). One can define other aggregates like SUM4 by changing the initcond, for example to '(0,4)'.