Using CTE to calculate cumulative sum


I want to write a non-recursive common table expression (CTE) in Postgres to calculate a cumulative sum. Here is an example input table:
----------------------
1 | A | 0 | -1
1 | B | 3 | 1
2 | A | 1 | 0
2 | B | 3 | 2
The output should look like this:
----------------------
1 | A | 0 | -1
1 | B | 3 | 1
2 | A | 1 | -1
2 | B | 6 | 3
As you can see, columns 3 and 4 hold cumulative sums, accumulated per value of column 2 in order of column 1. This is easy to do using a recursive CTE, but how is it done with a non-recursive one?

Use window functions. Assuming that your table has columns col1, col2, col3 and col4, that would be:
select
t.col1,
t.col2,
sum(t.col3) over(partition by t.col2 order by t.col1) as col3,
sum(t.col4) over(partition by t.col2 order by t.col1) as col4
from mytable t

You would use a window function for a cumulative sum. I don't see what the sum is in your example, but the syntax is something like:
select t.*, sum(x) over (order by y) as cumulative_sum
from t;
For your example, this would seem to be:
select t.*,
sum(col3) over (partition by col2 order by col1) as new_col3,
sum(col4) over (partition by col2 order by col1) as new_col4
from t;
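To tie this back to the CTE in the question, here is a minimal runnable sketch. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for Postgres; the table name mytable and the column names col1 to col4 follow the answers above, and the CTE name running is made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (col1 INTEGER, col2 TEXT, col3 INTEGER, col4 INTEGER)"
)
# Sample rows copied from the question's input table.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, "A", 0, -1), (1, "B", 3, 1), (2, "A", 1, 0), (2, "B", 3, 2)],
)

# The running totals are computed inside a non-recursive CTE.
query = """
WITH running AS (
    SELECT col1,
           col2,
           SUM(col3) OVER (PARTITION BY col2 ORDER BY col1) AS new_col3,
           SUM(col4) OVER (PARTITION BY col2 ORDER BY col1) AS new_col4
    FROM mytable
)
SELECT col1, col2, new_col3, new_col4
FROM running
ORDER BY col1, col2
"""
cumulative_rows = conn.execute(query).fetchall()
for row in cumulative_rows:
    print(row)
```

The printed rows match the expected output in the question: the sums restart for each col2 value and accumulate in col1 order.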

Related

Get rows with maximum count per one column - while grouping by two columns

I'm trying to get the maximum count of a field.
This is what I have and what I have tried so far.
| col1 | col2 |
| A | B |
| A | B |
| A | D |
| A | D |
| A | D |
| C | F |
| C | G |
| C | F |
I'm trying to get the max count occurrences of col2, grouped by col1.
With this query I get the occurrences grouped by col1 and col2.
SELECT col1, col2, count(*) as conta
FROM tab
WHERE ...
GROUP by col1, col2
ORDER BY col1, col2
And I get:
| col1 | col2 | conta |
| A | B | 2 |
| A | D | 3 |
| C | F | 2 |
| C | G | 1 |
Then I used this query to get max of count:
SELECT max(conta) as conta2, col1
FROM (
SELECT col1, col2, count(*) as conta
FROM tab
WHERE ...
GROUP BY col1, col2
ORDER BY col1, col2
) AS derivedTable
GROUP BY col1
And I get:
| col1 | conta |
| A | 3 |
| C | 2 |
What I'm missing is the value of col2. I would like something like this:
| col1 | col2 | conta |
| A | D | 3 |
| C | F | 2 |
The problem is that if I try to select the col2 field, I get an error message saying that the field must appear in the GROUP BY clause or be used in an aggregate function. Adding it to the GROUP BY is not the right approach, though.

Simpler & faster (and correct):
SELECT DISTINCT ON (col1)
col1, col2, count(*) AS conta
FROM tab
GROUP BY col1, col2
ORDER BY col1, conta DESC;
DISTINCT ON is applied after aggregation, so we don't need a subquery or CTE. For the sequence of events in a SELECT query, see these related questions:
Best way to get result count before LIMIT was applied
Select first row in each GROUP BY group?

You can combine GROUP BY with a window function - which gets evaluated after the group by:
with cte as (
SELECT col1, col2,
count(*) as conta,
dense_rank() over (partition by col1 order by count(*) desc) as rnk
FROM tab
WHERE ...
GROUP by col1, col2
)
select col1, col2, conta
from cte
where rnk = 1
order by col1, col2;
If two or more col2 values tie for the highest count within the same col1, this returns all of the tied rows. If you want only one row per col1, use row_number() instead of dense_rank().
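The difference between the two ranking functions only shows up when counts tie, which the question's data never does. Here is a small sketch of it, assuming SQLite 3.25+ in memory as a stand-in for Postgres. The group E rows are hypothetical data added to force a tie, the helper top_per_group is made up for illustration, and the aggregation is split into its own CTE to keep the SQL simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (col1 TEXT, col2 TEXT)")
# Rows from the question, plus a hypothetical group E whose two col2 values tie.
sample = (
    [("A", "B")] * 2
    + [("A", "D")] * 3
    + [("C", "F"), ("C", "G"), ("C", "F")]
    + [("E", "H"), ("E", "I")]
)
conn.executemany("INSERT INTO tab VALUES (?, ?)", sample)


def top_per_group(rank_expr):
    """Return the rows ranked first per col1 for the given ranking expression."""
    query = f"""
    WITH counts AS (
        SELECT col1, col2, COUNT(*) AS conta
        FROM tab
        GROUP BY col1, col2
    ),
    ranked AS (
        SELECT col1, col2, conta, {rank_expr} AS rnk
        FROM counts
    )
    SELECT col1, col2, conta
    FROM ranked
    WHERE rnk = 1
    ORDER BY col1, col2
    """
    return conn.execute(query).fetchall()


# dense_rank() gives tied rows the same rank, so both E rows survive.
with_ties = top_per_group(
    "DENSE_RANK() OVER (PARTITION BY col1 ORDER BY conta DESC)"
)
# row_number() numbers every row; col2 is added as a deterministic tie-breaker.
one_per_group = top_per_group(
    "ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY conta DESC, col2)"
)
print(with_ties)
print(one_per_group)
```

With dense_rank() group E contributes two rows; with row_number() it contributes one, chosen here by the extra col2 sort key.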

Possibly not the most elegant solution, but using a common table expression may help.
with cte as (
select col1, col2, count(*) as total
from dtable
group by col1, col2
)
select col1, col2, total
from cte c
where total = (select max(total)
from cte cc
where cc.col1 = c.col1)
order by col1 asc
Returns
col1|col2|total|
----+----+-----+
A | D | 3|
C | F | 2|

I misunderstood the question. Here is your solution:
with tablex as
(select col1, col2, count(col2) as cnt from Your_Table group by col1, col2),
ranked as
(select row_number() over (partition by col1 order by cnt desc) as rn, * from tablex)
select * from ranked where rn = 1

Using a window function:
select distinct on (col1) col1, col2, cnt
from
(
select col1, col2, count(*) over (partition by col1, col2) cnt
from the_table
) t
order by col1, cnt desc;
| col1 | col2 | cnt |
| A | D | 3 |
| C | F | 2 |
This solution does not handle ties: when two col2 values share the highest count, it returns an arbitrary one of them.

SQL group by without repeating fields same as excel pivot

What I need is the following.
Currently, a regular GROUP BY with SUM repeats the values in the grouping columns:
| column 1 | column2 | column3 | sum |
|-------------|-------------|----------|-----|
|main product |sub product1 |subsub 1 | 500|
|main product |sub product1 |subsub 2 | 300|
|main product |sub product2 |subsub 1 | 300|
I want to suppress the repeated values the way an Excel pivot table does, so I need the result below:
| column 1 | column2 | column3 | sum |
|-------------|-------------|----------|-----|
|main product |sub product1 |subsub 1 | 500|
|main product | |subsub 2 | 300|
|main product |sub product2 |subsub 1 | 300|
Can someone help me with this?

We can approximate this behavior with the help of ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) rn
FROM yourTable
)
SELECT col1, CASE WHEN rn = 1 THEN col2 ELSE '' END AS col2, col3, sum
FROM cte t
ORDER BY col1, t.col2, rn;
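Here is a runnable sketch of the ROW_NUMBER approach on the three rows from the question, assuming SQLite 3.25+ in memory. The value column is called total here (an assumed name, to avoid confusion with the SUM function) and the blanked column is aliased col2_display so that the ORDER BY can still sort on the real col2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "total" is an assumed name for the question's "sum" column.
conn.execute(
    "CREATE TABLE yourTable (col1 TEXT, col2 TEXT, col3 TEXT, total INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        ("main product", "sub product1", "subsub 1", 500),
        ("main product", "sub product1", "subsub 2", 300),
        ("main product", "sub product2", "subsub 1", 300),
    ],
)

query = """
WITH cte AS (
    SELECT col1, col2, col3, total,
           ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS rn
    FROM yourTable
)
SELECT col1,
       CASE WHEN rn = 1 THEN col2 ELSE '' END AS col2_display,
       col3,
       total
FROM cte
ORDER BY col1, col2, rn  -- sort on the real col2 so each group stays together
"""
pivot_rows = conn.execute(query).fetchall()
for row in pivot_rows:
    print(row)
```

Only the first row of each (col1, col2) group keeps its col2 value; the final sort includes rn so that this first row really is displayed first.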

You can use row_number():
select col1,
case when row_number() over(partition by col1, col2 order by col3) = 1 then col2 else '' end as col2,
col3, sum
from yourTable t
order by col1, t.col2, col3;

Partition date on a new timestamp to get previous timestamp

I have a table which looks like:
col1 | col2 | col3 | timestamp
----------------------------------------
1 | 2 | 3 | 2020-1-16 16:11:10
----------------------------------------
1 | 2 | 3 | 2020-1-16 16:13:20
----------------------------------------
1 | 2 | 3 | 2020-1-24 09:29:24
I want to add a column that gives the previous timestamp within the same day. If a row has no previous timestamp on that day, the column should return the row's own timestamp. It should look like this:
col1 | col2 | col3 | timestamp | prev_timestamp
------------------------------------------------------------
1 | 2 | 3 | 2020-1-16 16:11:10 | 2020-1-16 16:11:10
------------------------------------------------------------
1 | 2 | 3 | 2020-1-16 16:13:20 | 2020-1-16 16:11:10
-------------------------------------------------------------
1 | 2 | 3 | 2020-1-24 9:29:24 | 2020-1-24 09:29:24
I know I can use lag() with partition by, but then for the timestamp 2020-1-24 9:29:24 it gives me a previous timestamp of 2020-1-16 16:13:20, which I do not want.

You can do what you want just with lag():
select t.*,
lag(timestamp, 1, timestamp) over (partition by col1, col2, col3, date(timestamp)
order by timestamp
) as prev_timestamp
from t;
No conditional logic or subquery is necessary. You simply want to get the previous timestamp on the same day and lag() does that.
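Here is a small sketch of the three-argument lag() on the question's rows, assuming SQLite 3.25+ in memory as a stand-in for Postgres. The timestamp column is named ts here (an assumed name) and the values are zero-padded so that SQLite's DATE() function can parse them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ts" is an assumed name for the question's timestamp column.
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (1, 2, 3, ?)",
    [
        ("2020-01-16 16:11:10",),
        ("2020-01-16 16:13:20",),
        ("2020-01-24 09:29:24",),
    ],
)

# Partitioning by the calendar day keeps lag() from reaching into an earlier day;
# the third argument supplies the row's own timestamp when no earlier row exists.
query = """
SELECT ts,
       LAG(ts, 1, ts) OVER (
           PARTITION BY col1, col2, col3, DATE(ts)
           ORDER BY ts
       ) AS prev_ts
FROM t
ORDER BY ts
"""
lag_rows = conn.execute(query).fetchall()
for row in lag_rows:
    print(row)
```

The row on 2020-01-24 is alone in its partition, so it falls back to its own timestamp instead of picking up a value from 2020-01-16.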

You can use lag() and conditional logic:
select
col1,
col2,
col3,
timestamp,
case when date_trunc('day', prev_timestamp) = date_trunc('day', timestamp)
then prev_timestamp
else timestamp
end prev_timestamp
from (
select
t.*,
lag(timestamp) over(partition by col1, col2, col3 order by timestamp) prev_timestamp
from mytable t
) t
You can remove the nested query by repeating the lag() expression like so:
select
t.*,
case when date_trunc('day', lag(timestamp) over(partition by col1, col2, col3 order by timestamp)) = date_trunc('day', timestamp)
then lag(timestamp) over(partition by col1, col2, col3 order by timestamp)
else timestamp
end prev_timestamp
from mytable t

In SQL is there a way to partition by a value if it's not continuous

I would like to rank values over a partition defined by two columns. col1 is the key and col2 is a value that is also used in the ORDER BY. I would like to start a new partition only when the sequence of col2 values is interrupted. For example, I would like to produce the following:
+------+------+------+
| col1 | col2 | rank |
+------+------+------+
| a | 1 | 1 |
| a | 2 | 2 |
| a | 3 | 3 |
| a | 9 | 1 |
| a | 10 | 2 |
| b | 1 | 1 |
| b | 2 | 2 |
| b | 8 | 1 |
+------+------+------+
I'm thinking of something along the lines of:
SELECT col1, RANK() OVER (PARTITION BY col1, SOMETHING HERE??? ORDER BY col2 DESC)
Does anyone have any ideas?

If I understand correctly, you want to enumerate by "islands" of adjoining sequential values. You can do so with a simple observation: subtracting a sequence from col2 will be constant for each group. So, let's use this observation:
select t.*,
row_number() over (partition by col1, grp order by col2) as rnk
from (select t.*,
(col2 - row_number() over (partition by col1 order by col2)) as grp
from t
) t
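To see why the subtraction works, here is a runnable sketch on the question's data, assuming SQLite 3.25+ in memory. The subquery alias islands is made up for illustration. Within each col1, consecutive col2 values grow at the same pace as row_number(), so their difference grp stays constant until a gap appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("a", 9), ("a", 10),
     ("b", 1), ("b", 2), ("b", 8)],
)

query = """
SELECT col1, col2,
       ROW_NUMBER() OVER (PARTITION BY col1, grp ORDER BY col2) AS rnk
FROM (
    -- grp is constant within each run of consecutive col2 values
    SELECT col1, col2,
           col2 - ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS grp
    FROM t
) AS islands
ORDER BY col1, col2
"""
island_rows = conn.execute(query).fetchall()
for row in island_rows:
    print(row)
```

For col1 = 'a' the grp values are 0, 0, 0, 5, 5, so the rank restarts at 9. Note that this relies on col2 having no duplicates within a col1.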

Trying to write a query that will display duplicates results as null

I have a table that looks like the first example.
I'm trying to write an MSSQL 2012 statement that will display results like the second example.
Basically, I want null values instead of duplicate values in columns 1 and 2. This is for readability purposes during reporting.
This seems like it should be possible, but I'm drawing a blank. No combination of joins or unions I've written has produced the results I need.
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 1 | 2 | 4 |
| 1 | 2 | 5 |
| 1 | 3 | 6 |
| 1 | 3 | 7 |
+------+------+------+

+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 1 | 2 | 4 |
| null | null | 5 |
| null | 3 | 6 |
| null | null | 7 |
+------+------+------+

I would do this with no subqueries at all:
select (case when row_number() over (partition by col1 order by col2, col3) = 1
then col1
end) as col1,
(case when row_number() over (partition by col1, col2 order by col3) = 1
then col2
end) as col2,
col3
from t
order by t.col1, t.col2, t.col3;
Note that the order by at the end of the query is very important. The result set that you want depends critically on the ordering of the rows. Without the order by, the result set could be in any order. So, the query might look like it works, and then suddenly fail one day or on a slightly different set of data.
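Here is a runnable sketch of the no-subquery approach on the question's rows, assuming SQLite 3.25+ in memory as a stand-in for SQL Server. The display columns get their own aliases (col1_display and col2_display, names made up for illustration) so that the final ORDER BY sorts on the original values, and col2 is partitioned by both col1 and col2 so that a col2 value repeated under a different col1 is not blanked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
# Sample rows copied from the question's first table.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 2, 4), (1, 2, 5), (1, 3, 6), (1, 3, 7)],
)

query = """
SELECT CASE WHEN ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2, col3) = 1
            THEN col1 END AS col1_display,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) = 1
            THEN col2 END AS col2_display,
       col3
FROM t
ORDER BY col1, col2, col3  -- the display order must match the window ordering
"""
display_rows = conn.execute(query).fetchall()
for row in display_rows:
    print(row)
```

A CASE without an ELSE yields NULL, which Python shows as None. The result lines up with the second table in the question.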

Using a common table expression with row_number():
;with cte as (
select *
, rn_1 = row_number() over (partition by col1 order by col2, col3)
, rn_2 = row_number() over (partition by col1, col2 order by col3)
from t
)
select
col1 = case when rn_1 > 1 then null else col1 end
, col2 = case when rn_2 > 1 then null else col2 end
, col3
from cte
order by cte.col1, cte.col2, cte.col3
Without the CTE, using a derived table:
select
col1 = case when rn_1 > 1 then null else col1 end
, col2 = case when rn_2 > 1 then null else col2 end
, col3
from (
select *
, rn_1 = row_number() over (partition by col1 order by col2, col3)
, rn_2 = row_number() over (partition by col1, col2 order by col3)
from t
) sub
order by sub.col1, sub.col2, sub.col3
rextester demo: http://rextester.com/UYA17142
returns:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 2 | 4 |
| NULL | NULL | 5 |
| NULL | 3 | 6 |
| NULL | NULL | 7 |
+------+------+------+
