Sql assign unique key to groups having particular pattern - sql

Hi I was trying to group data based on a particular pattern.
I have a table with two column as below,
Name rollingsum
A 5
A 10
A 0
A 5
A 0
B 6
B 0
I need to generate a key column that increment only after rollingsum equals 0 is encountered.As given below
Name rollingsum key
A 5 1
A 10 1
A 0 1
A 5 2
A 0 2
B 6 3
B 0 3
I am using postgres, I tried to increment variable in case statement as below
Declare a int;
a:=1;
........etc
Case when rolling sum =0 then a:=a+1 else a end as key
But I am getting an error near :
Thanks in advance for all help

You need an ordering columns because the results depend on the ordering of the rows -- and SQL tables represent unordered sets.
Then do a cumulative sum of the 0 counts from the end of the data. That is in reverse order, so subtract that from the total:
select t.*,
(1 + sum( (rolling_sum = 0)::int ) over () -
sum( (rolling_sum = 0)::int ) over (order by ordercol desc)
) as key
from t;

Assuming that you have a column called id to order the rows, here is one option using a cumulative count and a window frame:
select name, rollingsum,
1 + count(*) filter(where rollingsum = 0) over(
order by id
rows between unbounded preceding and 1 preceding
) as key
from mytable
Demo on DB Fiddle:
name | rollingsum | key
:--- | ---------: | --:
A | 5 | 1
A | 10 | 1
A | 0 | 1
A | 5 | 2
A | 0 | 2
B | 6 | 3
B | 0 | 3

Related

Complex SQL query by pattern with timestamps

I have the following info in my SQLite database:
ID | timestamp | val
1 | 1577644027 | 0
2 | 1577644028 | 0
3 | 1577644029 | 1
4 | 1577644030 | 1
5 | 1577644031 | 2
6 | 1577644032 | 2
7 | 1577644033 | 3
8 | 1577644034 | 2
9 | 1577644035 | 1
10 | 1577644036 | 0
11 | 1577644037 | 1
12 | 1577644038 | 1
13 | 1577644039 | 1
14 | 1577644040 | 0
I want to perform a query that returns the elements that compose an episode. An episode is a set of ordered registers that comply the following requirements:
The first element is greater than zero.
The previous element of the first one is zero.
The last element is greater than zero.
The next element of the last one is zero.
The expected result of the query on this example would be something like this:
[
[{"id":3, tmstamp:1577644029, value:1}
{"id":4, tmstamp:1577644030, value:1}
{"id":5, tmstamp:1577644031, value:2}
{"id":6, tmstamp:1577644032, value:2}
{"id":7, tmstamp:1577644033, value:3}
{"id":8, tmstamp:1577644034, value:2}
{"id":9, tmstamp:1577644035, value:1}],
[{"id":11, tmstamp:1577644037, value:1}
{"id":12, tmstamp:1577644038, value:1}
{"id":13, tmstamp:1577644039, value:1}]
]
Currently, I am avoiding this query and I am using an auxiliary table to store the initial and end timestamp of episodes, but this is only because I do not know how to perform this query.
Threfore, my question is quite straightforward: does anyone know how can I perform this query in order to obtain something similar to the stated ouput?
This answer assumes that the "before" and "after" conditions are not really important. That is, an episode can be the first row in the table.
You can identify the episodes by counting the number of 0s before each row. Then filter out the 0 values:
select t.*,
dense_rank() over (order by grp) as episode
from (select t.*,
sum(case when val = 0 then 1 else 0 end) over (order by timestamp) as grp
from t
) t
where val <> 0;
If this is not the case, then lag() and lead() and a cumulative sum can handle the previous value being 0:
select t.*,
sum(case when prev_val = 0 and val > 0 then 1 else 0 end) over (order by timestamp) as episode
from (select t.*,
lag(val) over (order by timestamp) as prev_val,
lead(val) over (order by timestamp) as next_val
from t
) t
where val <> 0;
If you want the result as JSON objects then you must use the JSON1 Extension functions of SQLite:
with cte as (
select *, sum(val = 0) over (order by timestamp) grp
from tablename
)
select
json_group_array(
json_object('id', id, 'timestamp', timestamp, 'val', val)
) result
from cte
where val > 0
group by grp
See the demo.
Results:
| result |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [{"id":3,"timestamp":1577644029,"val":1},{"id":4,"timestamp":1577644030,"val":1},{"id":5,"timestamp":1577644031,"val":2},{"id":6,"timestamp":1577644032,"val":2},{"id":7,"timestamp":1577644033,"val":3},{"id":8,"timestamp":1577644034,"val":2},{"id":9,"timestamp":1577644035,"val":1}] |
| [{"id":11,"timestamp":1577644037,"val":1},{"id":12,"timestamp":1577644038,"val":1},{"id":13,"timestamp":1577644039,"val":1}] |

Unable to use lag function correctly in sql

I have created a table from multiple tables like this:
Week | Cid | CustId | L1
10 | 1 | 1 | 2
10 | 2 | 1 | 2
10 | 5 | 1 | 2
10 | 4 | 1 | 1
10 | 3 | 2 | 1
4 | 6 | 1 | 2
4 | 7 | 1 | 2
I want the output as:
Repeat
0
1
1
0
0
0
1
So, basically what I want is for each week, if a person (custid) comes in again with the same L1, then the value in the column Repeat should become 1, otherwise 0 ( so like, here, in row 2 & 3, custid 1, came with L1=2 again, so it will get 1 in column "Repeat", however in row 4, custid 1 came with L1=1, so it will get value as ).
By the way, the table isn't ordered (as I've shown).
I'm trying to do it as follows:
select t.*,
lag(0, 1, 0) over (partition by week, custid, L1 order by cid) as repeat
from
table;
But this is not giving the output and is giving empty result.
I think you need a case, but I would use row_number() for this:
select t.*,
(case when row_number() over (partition by week, custid, l1 order by cid) = 1
then 0 else 1
end) as repeat
from table;
This can also be computed without Window functions but by a self-join in the following way:
SELECT a.week, a.cid, a.custid, a.l1,
CASE WHEN b IS NULL THEN 1 ELSE 0 END AS repeat
FROM mytable a NATURAL LEFT JOIN
(SELECT week, min(cid) AS cid, custid, l1 FROM mytable
GROUP BY week,custid,l1) b
ORDER BY week DESC, custid, l1 DESC, cid;
It can be done simply by using an count(*) as analytic function. No case expression or self join needed. The query is even portable across databases that support analytic functions:
SELECT cust.*, least(count(*)
OVER (PARTITION BY Week, CustId, L1 ORDER BY Cid
ROWS UNBOUNDED PRECEDING) - 1, 1) repeat
FROM cust ORDER BY Week DESC, custId, L1 DESC;
Executing the query on your data results in the following output (last row is the repeat row):
Week | Cid | CustId | L1 | repeat
10 1 1 2 0
10 2 1 2 1
10 5 1 2 1
10 4 1 1 0
10 3 2 1 0
4 6 1 2 0
4 7 1 2 1
Tested on Oracle 11g and PostgreSQL 9.4. Note that the second ORDER BY is optional. See Oracle Language Reference, Analytic Functions for more details.

SQL advanced query counting the max value of a group

I want to create a query that will count the number of times the following condition is met.
I have a table that consists of multiple records with a matching foreign key. I want to check only for each of the foreign key groups if the highest value of another column of that key occurs more than once. If it does that will up the count.
Data
--------------------------
ID | Foreign Key | Value
--------------------------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 2
4 | 2 | 0
5 | 2 | 2
6 | 2 | 1
7 | 3 | 0
8 | 3 | 1
9 | 3 | 1
The query I want should return the number 2. This is because the maximum value in group 1(Foreign Key) occurs twice, the value is 2. In group 2 the maximum value is 2 but only occurs once so this will not up the count. Then in group 3 the maximum value is 1 which occurs twice which will up the count. The count therefore ends up as two.
All credit goes to the comment from #Bob, but here is the sql that solved this problem.
SELECT Count(1)
FROM (SELECT DISTINCT foreign_key
FROM (SELECT foreign_key,
Count(1)
FROM data
WHERE ( foreign_key, value ) IN (SELECT foreign_key,
Max(value)
FROM data
GROUP BY foreign_key)
GROUP BY foreign_key
HAVING Count(1) > 1) AS data) AS data;
This is one approach:
select max(num_at_max)
from (select t.*, count(val) over(partition by fk) as num_at_max
from tbl t
join (select max(max_val_by_grp) as max_val_all_grps
from (select fk, max(val) as max_val_by_grp
from tbl
group by fk) x) x
on t.val = x.max_val_all_grps) x

How to calculate the value of a previous row from the count of another column

I want to create an additional column which calculates the value of a row from count column with its predecessor row from the sum column. Below is the query. I tried using ROLLUP but it does not serve the purpose.
select to_char(register_date,'YYYY-MM') as "registered_in_month"
,count(*) as Total_count
from CMSS.USERS_PROFILE a
where a.pcms_db != '*'
group by (to_char(register_date,'YYYY-MM'))
order by to_char(register_date,'YYYY-MM')
This is what i get
registered_in_month TOTAL_COUNT
-------------------------------------
2005-01 1
2005-02 3
2005-04 8
2005-06 4
But what I would like to display is below, including the months which have count as 0
registered_in_month TOTAL_COUNT SUM
------------------------------------------
2005-01 1 1
2005-02 3 4
2005-03 0 4
2005-04 8 12
2005-05 0 12
2005-06 4 16
To include missing months in your result, first you need to have complete list of months. To do that you should find the earliest and latest month and then use heirarchial
query to generate the complete list.
SQL Fiddle
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
)
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date;
Once you have all the months, you can outer join it to your table. To get the cumulative sum, simply add up the count using SUM as analytical function.
with x(min_date, max_date) as (
select min(trunc(register_date,'month')),
max(trunc(register_date,'month'))
from users_profile
),
y(all_months) as (
select add_months(min_date,level-1)
from x
connect by add_months(min_date,level-1) <= max_date
)
select to_char(a.all_months,'yyyy-mm') registered_in_month,
count(b.register_date) total_count,
sum(count(b.register_date)) over (order by a.all_months) "sum"
from y a left outer join users_profile b
on a.all_months = trunc(b.register_date,'month')
group by a.all_months
order by a.all_months;
Output:
| REGISTERED_IN_MONTH | TOTAL_COUNT | SUM |
|---------------------|-------------|-----|
| 2005-01 | 1 | 1 |
| 2005-02 | 3 | 4 |
| 2005-03 | 0 | 4 |
| 2005-04 | 8 | 12 |
| 2005-05 | 0 | 12 |
| 2005-06 | 4 | 16 |

Count rows in each 'partition' of a table

Disclaimer: I don't mean partition in the window function sense, nor table partitioning; I mean it in the more general sense, i.e. to divide up.
Here's a table:
id | y
----+------------
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | null
7 | 2
8 | 2
9 | null
10 | null
I'd like to partition by checking equality on y, such that I end up with counts of the number of times each value of y appears contiguously, when sorted on id (i.e. in the order shown).
Here's the output I'm looking for:
y | count
-----+----------
1 | 3
2 | 2
null | 1
2 | 2
null | 2
So reading down the rows in that output we have:
The first partition of three 1's
The first partition of two 2's
The first partition of a null
The second partition of two 2's
The second partition of two nulls
Try:
SELECT y, count(*)
FROM (
SELECT y,
sum( xyz ) OVER (
ORDER BY id
rows between unbounded preceding
and current row
) qwe
FROM (
SELECT *,
case
when y is null and
lag(y) OVER ( ORDER BY id ) is null
then 0
when y = lag(y) OVER ( ORDER BY id )
then 0
else 1 end xyz
FROM table1
) alias
) alias
GROUP BY qwe, y
ORDER BY qwe;
demo: http://sqlfiddle.com/#!15/b1794/12