Which window function is faster? - sql

count(*) OVER (PARTITION BY a, b ORDER BY a, b, c) * 10
This produces the same result as:
dense_rank() OVER (PARTITION BY a, b ORDER BY a, b, c) * 10
Used in a query like this:
SELECT
dense_rank() OVER (ORDER BY a, b) ,
a || b,
count(*) OVER (
PARTITION BY a, b
ORDER BY a, b, c
) * 10 ,
a2,
b1,
c1,
cc1,
c2,
FROM
join ....
ORDER BY 1, 6;
I'm happy with my query result.
But should I appreciate one approach over the other and why?

After PARTITION BY a, b there is no point in adding aor b to ORDER BY, like David commented.
So we simplify to:
count(*) OVER (PARTITION BY a, b ORDER BY c) * 10
dense_rank() OVER (PARTITION BY a, b ORDER BY c) * 10
These two only happen to be equivalent while c is UNIQUE. Else they are not.
You'd need to define exactly what the number is supposed to signify, and show your table definition, and the exact query because joins can introduce duplicates and NULL values.
row_numer() or rank() are similar window functions ...
Performance is practically the same for all of them.

Related

Select MAX(DateTime) returning multiple lines

I'm trying to select the last MAX(DateTime) status from the table "Zee" but if the DateTime is the same it returns two lines, and I would like to get only the last one (maybe last inserted?).
here is the query:
SELECT Z."ID" AS ID,Z."A" AS A,Z."B" AS B,Z."C" AS C,Z."D" AS D
FROM ZEE Z
INNER JOIN
(SELECT ID, A, B, MAX(C) AS C
FROM ZEE
GROUP BY A, B) groupedtt
ON Z.A = groupedtt.A
AND Z.B = groupedtt.B
AND Z.C = groupedtt.C
WHERE (
Z.B = 103
OR Z.B = 104
);
and the result:
Thanks,
Regards.
I usually use rank() for such things:
select Z."ID" AS ID,Z."A" AS A,Z."B" AS B,Z."C" AS C,Z."D" AS D
from (select Z.*, rank()over(partition by A,B order by C desc, rownum) r from ZEE Z
)Z where Z.r=1
Use the ROW_NUMBER() analytic function (you will also eliminate the self-join):
SELECT ID, A, B, C, D
FROM (
SELECT ID,
A,
B,
C,
D,
ROW_NUMBER() OVER ( PARTITION BY A, B ORDER BY C DESC ) As rn
FROM ZEE
)
WHERE rn = 1;

Cumulative sum over a table

What is the best way to perform a cumulative sum over a table in Postgres, in a way that can bring the best performance and flexibility in case more fields / columns are added to the table.
Table
a b d
1 59 15 181
2 16 268
3 219
4 102
Cumulative
a b d
1 59 15 181
2 31 449
3 668
4 770
You can use window functions, but you need additional logic to avoid values where there are NULLs:
SELECT id,
(case when a is not null then sum(a) OVER (ORDER BY id) end) as a,
(case when b is not null then sum(b) OVER (ORDER BY id) end) as b,
(case when d is not null then sum(d) OVER (ORDER BY id) end) as d
FROM table;
This assumes that the first column that specifies the ordering is called id.
Window functions for running sum.
SELECT sum(a) OVER (ORDER BY d) as "a",
sum(b) OVER (ORDER BY d) as "b",
sum(d) OVER (ORDER BY d) as "d"
FROM table;
If you have more than one running sum, make sure the orders are the same.
http://www.postgresql.org/docs/8.4/static/tutorial-window.html
http://www.postgresonline.com/journal/archives/119-Running-totals-and-sums-using-PostgreSQL-8.4-Windowing-functions.html
It's important to note that if you want your columns to appear as the aggregate table in your question (each field uniquely ordered), it'd be a little more involved.
Update: I've modified the query to do the required sorting, without a given common field.
SQL Fiddle: (1) Only Aggregates, or (2) Source Data Beside Running Sum
WITH
rcd AS (
select row_number() OVER() as num,a,b,d
from tbl
),
sorted_a AS (
select row_number() OVER(w1) as num, sum(a) over(w2) a
from tbl
window w1 as (order by a nulls last),
w2 as (order by a nulls first)
),
sorted_b AS (
select row_number() OVER(w1) as num, sum(b) over(w2) b
from tbl
window w1 as (order by b nulls last),
w2 as (order by b nulls first)
),
sorted_d AS (
select row_number() OVER(w1) as num, sum(d) over(w2) d
from tbl
window w1 as (order by d nulls last),
w2 as (order by d nulls first)
)
SELECT sorted_a.a, sorted_b.b, sorted_d.d
FROM rcd
JOIN sorted_a USING(num)
JOIN sorted_b USING(num)
JOIN sorted_d USING(num)
ORDER BY num;
I think what you are really looking for is this:
SELECT id
, sum(a) OVER (PARTITION BY a_grp ORDER BY id) as a
, sum(b) OVER (PARTITION BY b_grp ORDER BY id) as b
, sum(d) OVER (PARTITION BY d_grp ORDER BY id) as d
FROM (
SELECT *
, count(a IS NULL OR NULL) OVER (ORDER BY id) as a_grp
, count(b IS NULL OR NULL) OVER (ORDER BY id) as b_grp
, count(d IS NULL OR NULL) OVER (ORDER BY id) as d_grp
FROM tbl
) sub
ORDER BY id;
The expression count(col IS NULL OR NULL) OVER (ORDER BY id) forms groups of consecutive non-null rows for a, b and d in the subquery sub.
In the outer query we run cumulative sums per group. NULL values form their own group and stay NULL automatically. No additional CASE statement necessary.
SQL Fiddle (with some added values for column a to demonstrate the effect).

Select distinct rows from two fields

I have a table that has millions of records.
So I might have these columns
a, b, c, d
I need to select all the distinct records based on columns a and b.
But I need to select columns a, b, c and d not just a and b.
Can I do this?
edit
Data might be
1,1,frog,green
1,1,frog,brown
2,1,cat,black
2,4,dog,white
so i need;
1,1,frog,green
2,1,cat,black
2,4,dog,white
SQL Server supports Common Table Expression and Window Function. The query below uses ROW_NUMBER() which ranks the record according to group. It sorts by c ASC, d ASC (just play with it).
WITH records
AS
(
SELECT a, b, c, d,
ROW_NUMBER() OVER(PARTITION BY a, b ORDER BY c, d) rn
FROM TableName
)
SELECT a, b, c, d
FROM records
WHERE rn = 1
SQLFiddle Demo
TSQL Ranking Functions
partition by is your man
SELECT a, b, c, d FROM (
SELECT a, b, c, d, ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY a, b) rn
FROM table
) sq
where rn = 1
Please try:
select *
From(
select
row_number() over (partition by a, b order by a, b) RNum,
*
from
YourTable
)x
where RNum=1
Sample
select * From(
select row_number() over (partition by a, b order by a, b) RNum, *
from(
select 1 a, 1 b, 'frog' c, 'green' d union all
select 1 a, 1 b, 'frog' c, 'brown' d union all
select 2 a, 1 b, 'cat' c, 'black' d union all
select 2 a, 4 b, 'dog' c, 'white' d)x
)y
where RNum=1

Order by newly selected column

I have a query like:
SELECT
R.*
FROM
(SELECT A, B,
(SELECT smth from another table) as C,
ROW_NUMBER() OVER (ORDER BY C DESC) AS RowNumber
FROM SomeTable) R
WHERE
RowNumber BETWEEN 10 AND 20
This gives me an error on ORDER BY C DESC.
I understand why this error is caused, so I've thought of adding another SELECT with ORDER BY and only than selecting rows from 10 to 20. But I don't think it's good to have 3 nested SELECT commands.
How else is it possible to select these rows?
A column cannot refer to an alias on same level, you have to table-derive it first, or use CTE.
SELECT
R.* , ROW_NUMBER() OVER (ORDER BY C DESC) AS RowNumber
FROM
(SELECT A, B, (SELECT smth from another table) as C
FROM SomeTable) R
-- WHERE
-- but you still cannot do this
-- RowNumber BETWEEN 10 AND 20
Need to do this:
select S.*
from
(
SELECT
R.* , ROW_NUMBER() OVER (ORDER BY C DESC) AS RowNumber
FROM
(SELECT A, B,
(SELECT smth from another table) as C
FROM SomeTable) R
) as s
where s.RowNumber between 10 and 20
To avoid deep nesting and to make it at least look pleasant, use CTE:
with R as
(
SELECT A, B, (SELECT smth from another table) as C
FROM SomeTable
)
,S AS
(
SELECT R.*, ROW_NUMBER() OVER (ORDER BY C DESC) AS RowNumber
FROM R
)
SELECT S.*
FROM S
WHERE S.RowNumber BETWEEN 1 AND 20
You cannot use aliased columns in the same SELECT, but you can wrap it into another select to make it work:
SELECT R.*
FROM (SELECT ABC.A, ABC.B, ABC.C, ROW_NUMBER() OVER (ORDER BY C DESC) AS RowNumber
FROM (SELECT A, B, (SELECT smth from another table) as C FROM SomeTable) ABC
) R
WHERE R.RowNumber BETWEEN 10 AND 20

SELECT SUM as field

Suppose i have this table
table (a,b,c,d). Datatypes are not important.
I want to do this
select a as a1,b as b1,c as c1,
(select sum(d) from table where a=a1 and b=b1) as total
from table
group by a,b,c
...but I can't find a way (sqldeveloper keeps complaining with "from clause not found".)
Is there a way? Is it possible?
SELECT a as a1,b as b1,c as c1,
(
SELECT SUM(d)
FROM mytable mi
WHERE mi.a = mo.a
AND mi.b= mo.b
) as total
FROM mytable mo
GROUP BY
a, b, c
It's much more simple and efficient to rewrite it as this:
SELECT a AS a1, B AS b1, c AS c1, SUM(SUM(d)) OVER (PARTITION BY a, b) AS total
FROM mytable
GROUP BY
a, b, c
Note the SUM(SUM(d)) here.
The innermost SUM is the aggregate function. It calculates the SUM(d) a-b-c-wise.
The outermost SUM is the analytic function. It sums the precalculated SUM(d)'s a-b-wise, and returns the value along with each row.
Du you mean something like this?
select a as a1,
b as b1,
c as c1,
sum(sum(d)) OVER (PARTITION BY a, b) AS total
from table
group by a,b,c
you can do it with aliases:
SELECT a AS a1, b AS b1, c AS c1,
(SELECT SUM(d)
FROM test_t t_in
WHERE t_in.a = t.a
AND t_in.b = t.b) AS total
FROM test_t t
GROUP BY a, b, c