Fill zeros for missing values in range - sql

I'd like to list a set of numbers within a range with their respective counts,
but include a zero where the value does not appear in a table row.
For example,
create table score (
n int
);
insert into score values (3);
insert into score values (1);
insert into score values (1);
insert into score values (5);
insert into score values (5);
insert into score values (5);
I can do
select n, count(n) from score
group by n order by n;
to give
n | count
---+-------
1 | 2
3 | 1
5 | 3
but instead I would like
n | count
---+-------
0 | 0
1 | 2
2 | 0
3 | 1
4 | 0
5 | 3
I'm aware of generate_series(0, max(n)), but I'm not sure how to progress from here.
I could do this programmatically at the application level, but for my own education
I'd like to learn how I can do this directly using a postgres query.

You could use a query like this that uses a LEFT JOIN:
SELECT
series, COUNT(score.n)
FROM
generate_series(0, (SELECT max(n) FROM score)) series
LEFT JOIN score
ON series=score.n
GROUP BY
series
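A minimal sketch of the same LEFT JOIN idea, run from Python against an in-memory SQLite database so that it is self-contained. SQLite is an assumption made for portability, not the asker's PostgreSQL setup; since generate_series is not built into it, the series (named series here) is produced with a recursive CTE instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (n INTEGER)")
conn.executemany(
    "INSERT INTO score VALUES (?)",
    [(3,), (1,), (1,), (5,), (5,), (5,)],
)

query = """
WITH RECURSIVE series(i) AS (
    -- stand-in for generate_series(0, max(n))
    SELECT 0
    UNION ALL
    SELECT i + 1 FROM series WHERE i < (SELECT max(n) FROM score)
)
SELECT series.i, COUNT(score.n)   -- COUNT ignores the NULLs from unmatched rows
FROM series
LEFT JOIN score ON score.n = series.i
GROUP BY series.i
ORDER BY series.i
"""
rows = conn.execute(query).fetchall()
for n, count in rows:
    print(n, count)
```

Counting score.n rather than * is what turns the unmatched series values into zeros, because the LEFT JOIN fills them with NULL.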

get the nth-lowest value in a `group by` clause

Here's a tough one: I have data coming back in a temporary table foo in this form:
id n v
-- - -
1 3 1
1 3 10
1 3 100
1 3 201
1 3 300
2 1 13
2 1 21
2 1 300
4 2 1
4 2 7
4 2 19
4 2 21
4 2 300
8 1 11
Grouping by id, I need to get the row with the nth-lowest value for v based on the value in n. For example, for the group with an ID of 1, I need to get the row which has v equal to 100, since 100 is the third-lowest value for v.
Here's what the final results need to look like:
id n v
-- - -
1 3 100
2 1 13
4 2 7
8 1 11
Some notes about the data:
the number of rows for each ID may vary
n will always be the same for every row with a given ID
n for a given ID will never be greater than the number of rows with that ID
the data will already be sorted by id, then v
Bonus points if you can do it in generic SQL instead of oracle-specific stuff, but that's not a requirement (I suspect that rownum may factor prominently in any solutions). It has in my attempts, but I wind up confusing myself before I get a working solution.
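I would use the row_number function in a CTE to number the rows within each group and keep only the rows whose row number is at most n. A second CTE then numbers those rows by v descending.
The row with rn = 1 is the largest value among the first n rows of the group, which is the nth-lowest value.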
CREATE TABLE foo(
id int,
n int,
v int
);
insert into foo values (1,3,1);
insert into foo values (1,3,10);
insert into foo values (1,3,100);
insert into foo values (1,3,201);
insert into foo values (1,3,300);
insert into foo values (2,1,13);
insert into foo values (2,1,21);
insert into foo values (2,1,300);
insert into foo values (4,2,1);
insert into foo values (4,2,7);
insert into foo values (4,2,19);
insert into foo values (4,2,21);
insert into foo values (4,2,300);
insert into foo values (8,1,11);
Query 1:
with cte as(
select id,n,v
from
(
select t.*, row_number() over(partition by id ,n order by v) as rn
from foo t
) t1
where rn <= n
), maxcte as (
select id,n,v, row_number() over(partition by id ,n order by v desc) rn
from cte
)
select id,n,v
from maxcte
where rn = 1
Results:
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |
Use a window function:
select * from
(
select t.*, row_number() over(partition by id ,n order by v) as rn
from foo t
) t1
where t1.rn=t1.n
The OP's sample only needed the 3rd-lowest value, so I had put the condition t1.rn=3, though according to the description it should be t1.rn=t1.n.
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=65abf8d4101d2d1802c1a05ed82c9064
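As a rough illustration of what row_number() over (partition by id order by v) followed by rn = n computes, here is a plain-Python sketch over the sample rows. The function name nth_lowest is hypothetical, and this only mirrors the logic; it is not a replacement for the SQL:

```python
from itertools import groupby

# (id, n, v) rows from the question
foo = [
    (1, 3, 1), (1, 3, 10), (1, 3, 100), (1, 3, 201), (1, 3, 300),
    (2, 1, 13), (2, 1, 21), (2, 1, 300),
    (4, 2, 1), (4, 2, 7), (4, 2, 19), (4, 2, 21), (4, 2, 300),
    (8, 1, 11),
]

def nth_lowest(rows):
    """Keep, for each id, the row whose rank by v equals its n."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # partition by id, order by v
    picked = []
    for _id, group in groupby(ordered, key=lambda r: r[0]):
        for rn, row in enumerate(group, start=1):       # rn plays the role of row_number()
            if rn == row[1]:                            # where rn = n
                picked.append(row)
    return picked

result = nth_lowest(foo)
print(result)
```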
If your database is version 12.1 or higher then there is a much simpler solution:
SELECT DISTINCT ID, n, NTH_VALUE(v,n) OVER (PARTITION BY ID) AS v
FROM foo
ORDER BY ID;
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |
Depending on your real data you may have to add an ORDER BY v clause and/or a windowing_clause such as RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING; see the NTH_VALUE documentation.

Juggling the values of a column in oracle

In a table tab I have a column with the name of col1 and it has 5 rows with values 1 to 5.
col1
1
2
3
4
5
Now I want to write a select query that juggles the values in col1, redistributes them, and puts them in a new column.
Below output will help you understand my requirement.
col1 New_col
1 3
2 5
3 4
4 1
5 2
Note: if 1 is changed to 3, then no other value in col1 should map to 3 after juggling. I have to do this for 500 rows; the small example is just for clarity.
Please let me know if you require further clarification.
This is a step by step approach:
Oracle 11g R2 Schema Setup:
create table t ( i int );
insert into t values (1);
insert into t values (2);
insert into t values (3);
insert into t values (4);
insert into t values (5);
Step by step query:
with
/*add a random column to shuffle*/
a as
( select i, dbms_random.value as o
from t),
/*get last element to pair it with the first*/
b as
( select i,
o,
last_Value(i) over (ORDER BY o asc
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS i2
from a)
/*pair each element with the next one, take the last one as default*/
select i, LAG(i, 1, i2 ) OVER (ORDER BY o ) AS i3
from b
Results:
| I | I3 |
|---|----|
| 2 | 5 |
| 1 | 2 |
| 3 | 1 |
| 4 | 3 |
| 5 | 4 |
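The pairing logic of the step-by-step query can be sketched in Python as follows. The function name cyclic_shuffle is hypothetical, and random.shuffle stands in for ordering by dbms_random.value:

```python
import random

def cyclic_shuffle(values, rng=random):
    """Map each value to its predecessor in a random order.

    The first value in the random order wraps around to the last one,
    which is what LAG(i, 1, i2) does with the last_value default.
    """
    order = list(values)
    rng.shuffle(order)  # plays the role of the random sort column o
    # order[idx - 1] is the previous element; for idx == 0 it is order[-1]
    return {current: order[idx - 1] for idx, current in enumerate(order)}

mapping = cyclic_shuffle(range(1, 6), random.Random(42))
for old, new in sorted(mapping.items()):
    print(old, new)
```

Because every value is sent to its neighbour in one random cycle, each new value is used exactly once and, given at least two distinct values, nothing maps to itself.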
What about this?
SELECT row_number() over (order by 1) col, col1 new_col
FROM tab
ORDER BY DBMS_RANDOM.VALUE

Is it possible that LEFT JOIN fails while subquery with NOT IN clause succeeds?

A while ago I posted an answer to the question "PostgreSQL multiple criteria statement".
Task was quite simple - select values from one table if there is no corresponding value in another table. Assuming we have tables like below:
CREATE TABLE first (foo numeric);
CREATE TABLE second (foo numeric);
we would like to get all the values from first.foo that don't occur in second.foo. I've proposed two solutions:
using LEFT JOIN
SELECT first.foo
FROM first
LEFT JOIN second
ON first.foo = second.foo
WHERE second.foo IS NULL;
combining subquery and IN operator:
SELECT first.foo
FROM first
WHERE first.foo NOT IN (
SELECT second.foo FROM second
);
For some reason the first wouldn't work (returned 0 rows) in the context of the OP and it has been bugging me since then. I've tried to reproduce that issue using different versions of PostgreSQL but no luck so far.
Is there any particular reason why the first solution would fail and the second worked as expected? Am I missing something obvious?
Here is sqlfiddle but it seems to work on any available platform.
Edit
As @bma and @MostyMostacho pointed out in the comments, it should rather be the second one that returned no results (sqlfiddle).
As per your sql fiddle, your NOT IN query fails to return results because of the NULL in the second table.
The problem is that NULL means "UNKNOWN" and therefore we cannot say that the following expression is true: 10 not in (5, null).
The reason is what happens when 10 = NULL is evaluated: we get NULL back, not true. So a single NULL in the NOT IN list means that no row will ever pass.
To get the second one to perform the way you expect, you need a relatively convoluted query:
SELECT first.foo
FROM first
WHERE (first.foo IN (
SELECT second.foo FROM second
) IS NOT TRUE);
This will properly handle the NULL comparisons, but the join syntax is probably cleaner.
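A small sketch that reproduces the effect, using Python's sqlite3 module with an in-memory database. SQLite is an assumption chosen so the example is self-contained; it applies the same three-valued logic to NOT IN. The tables are named first_t and second_t here only to avoid any keyword clash, and the contents are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE first_t (foo NUMERIC);
CREATE TABLE second_t (foo NUMERIC);
INSERT INTO first_t VALUES (1), (2), (3), (4), (5);
INSERT INTO second_t VALUES (1), (3), (NULL);
""")

def column(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# 10 NOT IN (5, NULL) is neither true nor false: it is NULL (None in Python)
unknown = column("SELECT 10 NOT IN (5, NULL)")[0]

not_in = column("""
    SELECT foo FROM first_t
    WHERE foo NOT IN (SELECT foo FROM second_t)
    ORDER BY foo
""")

left_join = column("""
    SELECT f.foo FROM first_t f
    LEFT JOIN second_t s ON f.foo = s.foo
    WHERE s.foo IS NULL
    ORDER BY f.foo
""")

print(unknown, not_in, left_join)
```

The NULL in second_t makes the NOT IN predicate unknown for every non-matching row, so that query returns nothing, while the LEFT JOIN form still returns the unmatched values.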
"select values from one table if there is no corresponding value in another table." You just answered your own question:
SELECT o.value
FROM table_one o
WHERE NOT EXISTS (
SELECT *
FROM table_two t
WHERE t.value = o.value
);
A short demonstration:
CREATE TABLE first (foo numeric);
CREATE TABLE second (foo numeric);
INSERT INTO first VALUES (1);
INSERT INTO first VALUES (2);
INSERT INTO first VALUES (3);
INSERT INTO first VALUES (4);
INSERT INTO first VALUES (5);
INSERT INTO first VALUES (NULL); -- added this for completeness
INSERT INTO second VALUES (1);
INSERT INTO second VALUES (3);
INSERT INTO second VALUES (NULL);
SELECT f.foo AS ffoo, s.foo AS sfoo
-- these expressions all yield boolean values
, (f.foo = s.foo) AS is_equal
, (f.foo IN (SELECT foo FROM second)) AS is_in
, (f.foo NOT IN (SELECT foo FROM second)) AS is_not_in
, (EXISTS (SELECT * FROM second x WHERE x.foo = f.foo)) AS does_exist
, (NOT EXISTS (SELECT * FROM second x WHERE x.foo = f.foo)) AS does_not_exist
, (EXISTS (SELECT * FROM first x LEFT JOIN second y ON x.foo = y.foo
WHERE x.foo = f.foo AND y.foo IS NULL))
AS left_join_is_null
FROM first f
FULL JOIN second s ON (f.foo = s.foo AND (f.foo IS NOT NULL OR s.foo IS NOT NULL) )
;
Result:
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
ffoo | sfoo | is_equal | is_in | is_not_in | does_exist | does_not_exist | left_join_is_null
------+------+----------+-------+-----------+------------+----------------+-------------------
1 | 1 | t | t | f | t | f | f
2 | | | | | f | t | t
3 | 3 | t | t | f | t | f | f
4 | | | | | f | t | t
5 | | | | | f | t | t
| | | | | f | t | f
| | | | | f | t | f
(7 rows)
As you can see, the boolean can be NULL for the IN() and equals cases.
It cannot be NULL for the EXISTS() case. To be or not to be.
The LEFT JOIN ... WHERE s.foo IS NULL is (almost) equivalent to the NOT EXISTS case, except that it actually includes second.* into the query results (which is not needed, in most cases)

Recursive SQL statement (Postgresql) - simplified version

This is a simplified version of a more complicated question posted here:
Recursive SQL statement (PostgreSQL 9.1.4)
Simplified question
Given an upper triangular matrix stored in 3 columns (RowIndex, ColumnIndex, MatrixValue):
          ColumnIndex
RowIndex  1  2  3  4  5
1         2  2  3  3  4
2         4  4  5  6  X
3         3  2  2  X  X
4         2  1  X  X  X
5         1  X  X  X  X
X values are to be calculated using the following algorithm:
M[i,j] = (M[i-1,j]+M[i,j-1])/2
(i= rows, j = columns, M=matrix)
Example:
M[3,4] = (M[2,4]+M[3,3])/2
M[3,5] = (M[2,5]+M[3,4])/2
The full required result is:
          ColumnIndex
RowIndex  1  2  3     4     5
1         2  2  3     3     4
2         4  4  5     6     5
3         3  2  2     4     4.5
4         2  1  1.5   2.75  3.625
5         1  1  1.25  2.00  2.8125
Sample data:
create table matrix_data (
RowIndex integer,
ColumnIndex integer,
MatrixValue numeric);
insert into matrix_data values (1,1,2);
insert into matrix_data values (1,2,2);
insert into matrix_data values (1,3,3);
insert into matrix_data values (1,4,3);
insert into matrix_data values (1,5,4);
insert into matrix_data values (2,1,4);
insert into matrix_data values (2,2,4);
insert into matrix_data values (2,3,5);
insert into matrix_data values (2,4,6);
insert into matrix_data values (3,1,3);
insert into matrix_data values (3,2,2);
insert into matrix_data values (3,3,2);
insert into matrix_data values (4,1,2);
insert into matrix_data values (4,2,1);
insert into matrix_data values (5,1,1);
Can this be done?
Test setup:
CREATE TEMP TABLE matrix (
rowindex integer,
columnindex integer,
matrixvalue numeric);
INSERT INTO matrix VALUES
(1,1,2),(1,2,2),(1,3,3),(1,4,3),(1,5,4)
,(2,1,4),(2,2,4),(2,3,5),(2,4,6)
,(3,1,3),(3,2,2),(3,3,2)
,(4,1,2),(4,2,1)
,(5,1,1);
Run INSERTs in a LOOP with DO:
DO $$
BEGIN
FOR i IN 2 .. 5 LOOP
FOR j IN 7-i .. 5 LOOP
INSERT INTO matrix
VALUES (i,j, (
SELECT sum(matrixvalue)/2
FROM matrix
WHERE (rowindex, columnindex) IN ((i-1, j),(i, j-1))
));
END LOOP;
END LOOP;
END;
$$
See result:
SELECT * FROM matrix order BY 1,2;
This can be done in a single SQL select statement, but only because recursion is not necessary. I'll outline the solution. If you actually want the SQL code, let me know.
First, notice that the only items that contribute to the sums are along the diagonal. Now, if we follow the contribution of the value "4" in (1, 5), it contributes 4/2 to (2,5) and 4/4 to (3,5) and 4/8 to (4,5). Each time, the contribution is cut in half, because (a+b)/2 is (a/2 + b/2).
When we extend this, we start to see a pattern similar to Pascal's triangle. In fact, for any given point in the lower triangular matrix (below where you have values), you can find the diagonal elements that contribute to the value. Extend a vertical line up to hit the diagonal and a horizontal line to hit the diagonal. Those are the contributors from the diagonal row.
How much do they contribute? Well, for that we can go to Pascal's triangle. For the first diagonal below where we have values, the contributions are (1,1)/2. For the second diagonal, (1,2,1)/4. For the third, (1,3,3,1)/8 . . . and so on.
Fortunately, we can calculate the contributions for each value using a formula (the "choose" function from combinatorics). The power of 2 is easy. And, determining how far a given cell is from the diagonal is not too hard.
All of this can be combined into a single Postgres SQL statement. However, @Erwin's solution also works. I only want to put the effort into debugging the statement if his solution doesn't meet your needs.
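To make the Pascal's-triangle argument concrete, here is a hedged Python sketch (not the SQL statement itself) that fills the matrix once with the recurrence and once with the closed form. The helper names by_recurrence and by_binomial are hypothetical. For a cell k steps below the anti-diagonal, the closed form is

M[i,j] = (1/2^k) * sum over m = 0..k of C(k,m) * M[i-m, j-k+m],  where k = i + j - 6

```python
from fractions import Fraction
from math import comb

SIZE = 5
# Known values from the question, keyed by (RowIndex, ColumnIndex)
known = {
    (1, 1): 2, (1, 2): 2, (1, 3): 3, (1, 4): 3, (1, 5): 4,
    (2, 1): 4, (2, 2): 4, (2, 3): 5, (2, 4): 6,
    (3, 1): 3, (3, 2): 2, (3, 3): 2,
    (4, 1): 2, (4, 2): 1,
    (5, 1): 1,
}

def by_recurrence():
    """Fill the missing cells row by row with M[i,j] = (M[i-1,j] + M[i,j-1]) / 2."""
    m = {cell: Fraction(value) for cell, value in known.items()}
    for i in range(2, SIZE + 1):
        for j in range(SIZE + 2 - i, SIZE + 1):
            m[i, j] = (m[i - 1, j] + m[i, j - 1]) / 2
    return m

def by_binomial(i, j):
    """Closed form: binomial-weighted average of anti-diagonal values."""
    k = i + j - (SIZE + 1)  # distance below the anti-diagonal, k >= 1
    total = sum(comb(k, s) * known[i - s, j - k + s] for s in range(k + 1))
    return Fraction(total, 2 ** k)

filled = by_recurrence()
missing = [cell for cell in filled if cell not in known]
print(float(filled[5, 5]), float(by_binomial(5, 5)))
```

Fractions are used so the two methods can be compared exactly rather than up to floating-point error.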
... and here comes the recursive CTE with multiple embedded CTE's (tm):
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE matrix_data (
yyy integer,
xxx integer,
val numeric);
insert into matrix_data (yyy,xxx,val) values
(1,1,2) , (1,2,2) , (1,3,3) , (1,4,3) , (1,5,4)
, (2,1,4) , (2,2,4) , (2,3,5) , (2,4,6)
, (3,1,3) , (3,2,2) , (3,3,2)
, (4,1,2) , (4,2,1)
, (5,1,1)
;
WITH RECURSIVE rr AS (
WITH xx AS (
SELECT MIN(xxx) AS x0
, MAX(xxx) AS x1
FROM matrix_data
)
, mimax AS (
SELECT generate_series(xx.x0,xx.x1) AS xxx
FROM xx
)
, yy AS (
SELECT MIN(yyy) AS y0
, MAX(yyy) AS y1
FROM matrix_data
)
, mimay AS (
SELECT generate_series(yy.y0,yy.y1) AS yyy
FROM yy
)
, cart AS (
SELECT * FROM mimax mm
JOIN mimay my ON (1=1)
)
, empty AS (
SELECT * FROM cart ca
WHERE NOT EXISTS (
SELECT *
FROM matrix_data nx
WHERE nx.xxx = ca.xxx
AND nx.yyy = ca.yyy
)
)
, hot AS (
SELECT * FROM empty emp
WHERE EXISTS (
SELECT *
FROM matrix_data ex
WHERE ex.xxx = emp.xxx -1
AND ex.yyy = emp.yyy
)
AND EXISTS (
SELECT *
FROM matrix_data ex
WHERE ex.xxx = emp.xxx
AND ex.yyy = emp.yyy -1
)
)
-- UPDATE from here:
SELECT h.xxx,h.yyy, md.val / 2 AS val
FROM hot h
JOIN matrix_data md ON
(md.yyy = h.yyy AND md.xxx = h.xxx-1)
OR (md.yyy = h.yyy-1 AND md.xxx = h.xxx)
UNION ALL
SELECT e.xxx,e.yyy, r.val / 2 AS val
FROM empty e
JOIN rr r ON ( e.xxx = r.xxx+1 AND e.yyy = r.yyy)
OR ( e.xxx = r.xxx AND e.yyy = r.yyy+1 )
)
INSERT INTO matrix_data(yyy,xxx,val)
SELECT DISTINCT yyy,xxx
,SUM(val)
FROM rr
GROUP BY yyy,xxx
;
SELECT * FROM matrix_data
;
New result:
NOTICE: drop cascades to table tmp.matrix_data
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 15
INSERT 0 10
yyy | xxx | val
-----+-----+------------------------
1 | 1 | 2
1 | 2 | 2
1 | 3 | 3
1 | 4 | 3
1 | 5 | 4
2 | 1 | 4
2 | 2 | 4
2 | 3 | 5
2 | 4 | 6
3 | 1 | 3
3 | 2 | 2
3 | 3 | 2
4 | 1 | 2
4 | 2 | 1
5 | 1 | 1
2 | 5 | 5.0000000000000000
5 | 5 | 2.81250000000000000000
4 | 3 | 1.50000000000000000000
3 | 5 | 4.50000000000000000000
5 | 2 | 1.00000000000000000000
3 | 4 | 4.00000000000000000000
5 | 3 | 1.25000000000000000000
4 | 5 | 3.62500000000000000000
4 | 4 | 2.75000000000000000000
5 | 4 | 2.00000000000000000000
(25 rows)
while (select max(ColumnIndex+RowIndex) from matrix_data)<10
begin
insert matrix_data
select c1.RowIndex, c1.ColumnIndex+1, (c1.MatrixValue+c2.MatrixValue)/2
from matrix_data c1
inner join
matrix_data c2
on c1.ColumnIndex+1=c2.ColumnIndex and c1.RowIndex-1 = c2.RowIndex
where c1.RowIndex+c1.ColumnIndex=(select max(RowIndex+ColumnIndex) from matrix_data)
and c1.ColumnIndex<5
end

Postgresql: Insert the cartesian product of two or more sets

By definition, the cartesian product of two sets is the set of all possible pairs of elements from these sets, so {A,B} x {a,b} = {(A,a),(A,b),(B,a),(B,b)}.
Now I want to insert such a cartesian product into a database table (each pair as a row). The intent is to fill the table with default values for each pair, so the data, i.e. the two sets, are not present in the database at this point.
Any idea how to achieve this with postgresql?
EDIT :
With the help of Grzegorz Szpetkowski's answer I was able to produce a query that does what I want to achieve, but it really isn't the prettiest one. Suppose I want to insert the cartesian product of the sets {1,2,3} and {'A','B','C'}.
INSERT INTO "Test"
SELECT * FROM
(SELECT 1 UNION SELECT 2 UNION SELECT 3) P
CROSS JOIN
(SELECT 'A' UNION SELECT 'B' UNION SELECT 'C') Q
Is there any better way to do this?
EDIT2 :
The accepted answer is fine, but I found another version which might be appropriate if things get more complex:
CREATE TEMP TABLE "Numbers" (ID integer) ON COMMIT DROP;
CREATE TEMP TABLE "Chars" (Char character varying) ON COMMIT DROP;
INSERT INTO "Numbers" (ID) VALUES (1),(2),(3);
INSERT INTO "Chars" (Char) VALUES ('A'),('B'),('C');
INSERT INTO "Test"
SELECT * FROM
"Numbers"
CROSS JOIN
"Chars";
I am not sure if this really answers your question, but in PostgreSQL there is CROSS JOIN defined as:
For every possible combination of rows from T1 and T2 (i.e., a
Cartesian product), the joined table will contain a row consisting of
all columns in T1 followed by all columns in T2. If the tables have N
and M rows respectively, the joined table will have N * M rows.
FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2. It is also
equivalent to FROM T1 INNER JOIN T2 ON TRUE (see below).
EDIT:
One way is to use VALUES Lists (note that in fact you have no order, use ORDER BY clause to get some ordering):
SELECT N AS number, L AS letter FROM
(VALUES (1), (2), (3)) a(N)
CROSS JOIN
(VALUES ('A'), ('B'), ('C')) b(L);
Result:
number | letter
--------+--------
1 | A
1 | B
1 | C
2 | A
2 | B
2 | C
3 | A
3 | B
3 | C
(9 rows)
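Since the question asks about inserting the pairs, here is a minimal sketch of feeding the cross join into an INSERT, run through Python's sqlite3 module so it is self-contained. SQLite and the table name test with columns number and letter are assumptions for illustration; the two sets are written as CTEs because that form is accepted there:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (number INTEGER, letter TEXT)")

conn.execute("""
WITH a(n) AS (VALUES (1), (2), (3)),
     b(l) AS (VALUES ('A'), ('B'), ('C'))
INSERT INTO test (number, letter)
SELECT n, l
FROM a CROSS JOIN b
""")

pairs = conn.execute(
    "SELECT number, letter FROM test ORDER BY number, letter"
).fetchall()
print(pairs)
```

The ORDER BY is only on the final SELECT; as noted above, the cross join itself promises no particular order.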
BTW:
For more numbers I believe it's handy to use the generate_series function, e.g.:
SELECT n AS number, chr(ascii('A') + L - 1) AS letter
FROM
generate_series(1, 5) N
CROSS JOIN
generate_series(1, 5) L
ORDER BY N, L;
Result:
number | letter
--------+--------
1 | A
1 | B
1 | C
1 | D
1 | E
2 | A
2 | B
2 | C
2 | D
2 | E
3 | A
3 | B
3 | C
3 | D
3 | E
4 | A
4 | B
4 | C
4 | D
4 | E
5 | A
5 | B
5 | C
5 | D
5 | E
(25 rows)