SQL - Identify consecutive numbers in a table

Is there a way to flag consecutive numbers in an SQL table?
Based on the values in the 'value_group_4' column, is it possible to tag continuous values? This needs to be done within groups of each 'date_group_1'.
I tried using row_number, rank, and dense_rank, but I was unable to come up with a foolproof way.

This has nothing to do with consecutiveness. You simply want to mark all rows where the combination of date_group_1 and value_group_4 is not unique.
One way:
select
  mytable.*,
  case when exists
  (
    select null
    from mytable agg
    where agg.date_group_1 = mytable.date_group_1
      and agg.value_group_4 = mytable.value_group_4
    group by agg.date_group_1, agg.value_group_4
    having count(*) > 1
  ) then 1 else 0 end as flag
from mytable
order by date_group_1, value_group_4;
In a later version of SQL Server you'd use COUNT(*) OVER instead.
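As a rough illustration of that COUNT(*) OVER variant: the sketch below uses SQLite standing in for SQL Server and a few invented sample rows (the real table and data are not shown in the question), so treat it as a demonstration of the technique rather than a drop-in answer.

```python
import sqlite3

# Sketch only: invented sample rows standing in for `mytable`;
# SQLite (3.25+) is used here in place of SQL Server.
con = sqlite3.connect(":memory:")
con.execute("create table mytable (date_group_1 text, value_group_4 real)")
con.executemany("insert into mytable values (?, ?)",
                [("2018-01-11", 15.3), ("2018-01-11", 17.3),
                 ("2018-01-11", 17.3), ("2018-01-11", 21.0)])

# count(*) over a (date_group_1, value_group_4) partition flags every
# row whose combination occurs more than once within its date group.
rows = con.execute("""
    select date_group_1, value_group_4,
           case when count(*) over (partition by date_group_1, value_group_4) > 1
                then 1 else 0 end as flag
    from mytable
    order by date_group_1, value_group_4
""").fetchall()
print(rows)  # the duplicated 17.3 rows get flag = 1
```

The window form avoids the correlated EXISTS subquery entirely: one pass over the data computes the per-combination count.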

SQL tables represent unordered sets. There is no such thing as consecutive values, unless a column specifies the ordering. Your data does not have such an obvious column, but I'll assume one exists and just call it id for convenience.
With such a column, lag()/lead() does what you want:
select t.*,
       (case when lag(value_group_4) over (partition by date_group_1 order by id) = value_group_4
             then 1
             when lead(value_group_4) over (partition by date_group_1 order by id) = value_group_4
             then 1
             else 0
        end) as flag
from t;
On closer inspection, value_group_3 may do what you want, so you can use that as the id.
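A minimal runnable sketch of that lag()/lead() flag, using SQLite and invented data (the assumed ordering column is called id here, as in the answer):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (id integer, date_group_1 text, value_group_4 real)")
con.executemany("insert into t values (?, ?, ?)",
                [(1, "2018-01-11", 15.3), (2, "2018-01-11", 17.3),
                 (3, "2018-01-11", 17.3), (4, "2018-01-11", 21.0)])

# A row is flagged when its neighbour (previous or next, by id, within
# the same date group) carries the same value_group_4.
rows = con.execute("""
    select id,
           case when lag(value_group_4)  over w = value_group_4 then 1
                when lead(value_group_4) over w = value_group_4 then 1
                else 0 end as flag
    from t
    window w as (partition by date_group_1 order by id)
    order by id
""").fetchall()
print(rows)  # only the adjacent equal values (ids 2 and 3) are flagged
```

Note the difference from the duplicate-count approach: here two equal values are only flagged when they are adjacent in the id ordering.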

If your version of SQL Server doesn't have a full suite of windowing functions, it should still be possible. This looks like a last-non-NULL problem, of which Itzik Ben-Gan has a good example here: http://www.itprotoday.com/software-development/last-non-null-puzzle
Also, look at Mikael Eriksson's answer here, which uses no windowing functions.

If the order of your data is determined by the date_group_1 and value_group_3 column values, then why not make it as simple as the following query:
select
  *,
  rank() over(partition by date_group_1 order by value_group_3) - 1 as value_rank,
  case
    when count(*) over(partition by date_group_1, value_group_3) > 1 then 1
    else 0
  end as expected_result
from data;
Output:
| date_group_1 | category_group_2 | value_group_3 | value_rank | expected_result |
+--------------+------------------+---------------+---------------+-----------------+
| 2018-01-11 | A | 15.3 | 0 | 0 |
| 2018-01-11 | B | 17.3 | 1 | 1 |
| 2018-01-11 | A | 17.3 | 1 | 1 |
| 2018-01-11 | B | 21 | 3 | 0 |
| 2018-01-22 | A | 15.3 | 0 | 0 |
| 2018-01-22 | B | 17.3 | 1 | 0 |
| 2018-01-22 | A | 21 | 2 | 0 |
| 2018-01-22 | B | 23 | 3 | 0 |
| 2018-03-13 | A | 15.3 | 0 | 0 |
| 2018-03-13 | B | 17.3 | 1 | 1 |
| 2018-03-13 | A | 17.3 | 1 | 1 |
| 2018-03-13 | B | 23 | 3 | 0 |
| 2018-05-15 | A | 6 | 0 | 0 |
| 2018-05-15 | B | 6.3 | 1 | 0 |
| 2018-05-15 | A | 15 | 2 | 0 |
| 2018-05-15 | B | 16.3 | 3 | 1 |
| 2018-05-15 | A | 16.3 | 3 | 1 |
| 2018-05-15 | B | 22 | 5 | 0 |
| 2019-05-04 | A | 0 | 0 | 0 |
| 2019-05-04 | B | 7 | 1 | 0 |
| 2019-05-04 | A | 15.3 | 2 | 0 |
| 2019-05-04 | B | 17.3 | 3 | 0 |
Test it online with SQL Fiddle.

Related

How do I conditionally increase the value of the succeeding row number by 1

I need to increase the value of the succeeding row number by 1, and when a row meets another condition I need to reset the counter. This is probably easiest explained with an example:
+---------+------------+------------+-----------+----------------+
| Acct_ID | Ins_Date | Acct_RowID | indicator | Desired_Output |
+---------+------------+------------+-----------+----------------+
| 5841 | 07/11/2019 | 1 | 1 | 1 |
| 5841 | 08/11/2019 | 2 | 0 | 2 |
| 5841 | 09/11/2019 | 3 | 0 | 3 |
| 5841 | 10/11/2019 | 4 | 0 | 4 |
| 5841 | 11/11/2019 | 5 | 1 | 1 |
| 5841 | 12/11/2019 | 6 | 0 | 2 |
| 5841 | 13/11/2019 | 7 | 1 | 1 |
| 5841 | 14/11/2019 | 8 | 0 | 2 |
| 5841 | 15/11/2019 | 9 | 0 | 3 |
| 5841 | 16/11/2019 | 10 | 0 | 4 |
| 5841 | 17/11/2019 | 11 | 0 | 5 |
| 5841 | 18/11/2019 | 12 | 0 | 6 |
| 5132 | 11/03/2019 | 1 | 1 | 1 |
| 5132 | 12/03/2019 | 2 | 0 | 2 |
| 5132 | 13/03/2019 | 3 | 0 | 3 |
| 5132 | 14/03/2019 | 4 | 1 | 1 |
| 5132 | 15/03/2019 | 5 | 0 | 2 |
| 5132 | 16/03/2019 | 6 | 0 | 3 |
| 5132 | 17/03/2019 | 7 | 0 | 4 |
| 5132 | 18/03/2019 | 8 | 0 | 5 |
| 5132 | 19/03/2019 | 9 | 1 | 1 |
| 5132 | 20/03/2019 | 10 | 0 | 2 |
+---------+------------+------------+-----------+----------------+
The column I want to create is 'Desired_Output'. As can be seen from the table, I need to use the 'indicator' column: each following row should be n + 1, and the counter needs to reset whenever the value 1 is encountered again.
I have tried to use a loop method of some sort but this did not produce the desired results.
Is this possible in some way?
The trick is to identify each group of consecutive rows that starts at an indicator = 1 row and runs up to the next one. This is achieved by using cross apply to find the latest Acct_RowID with indicator = 1 and using that as a Grp_RowID to partition by in the row_number() window function:
select *,
Desired_Output = row_number() over (partition by t.Acct_ID, Grp_RowID
order by Acct_RowID)
from your_table t
cross apply
(
select Grp_RowID = max(Acct_RowID)
from your_table x
where x.Acct_ID = t.Acct_ID
and x.Acct_RowID <= t.Acct_RowID
and x.indicator = 1
) g
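CROSS APPLY is SQL Server-specific, but the same "latest indicator = 1 row at or before me" value can be computed with a running windowed MAX. The sketch below (SQLite, first six rows of the question's data) is an equivalent formulation of the technique, not the answer's exact query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (Acct_ID int, Acct_RowID int, indicator int)")
con.executemany("insert into t values (?, ?, ?)",
                [(5841, 1, 1), (5841, 2, 0), (5841, 3, 0),
                 (5841, 4, 0), (5841, 5, 1), (5841, 6, 0)])

# Grp_RowID = Acct_RowID of the latest indicator = 1 row at or before
# the current row -- the same value the CROSS APPLY computes.
rows = con.execute("""
    select Acct_RowID,
           row_number() over (partition by Acct_ID, Grp_RowID
                              order by Acct_RowID) as Desired_Output
    from (select *,
                 max(case when indicator = 1 then Acct_RowID end)
                   over (partition by Acct_ID order by Acct_RowID) as Grp_RowID
          from t)
    order by Acct_RowID
""").fetchall()
print(rows)  # the counter restarts at rows 1 and 5
```

Because MAX ignores the NULLs produced by the CASE, the running maximum only advances on indicator = 1 rows, which is exactly the group boundary.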

How to calculate date difference for all IDs in Microsoft SQL Server

How can I check whether the difference between date 1 and date 2 for every ID is more than 6 months? Let me illustrate with an example.
So I have a table like this one:
+----+---------+
| ID | Y-M |
+----+---------+
| 1 | 2017-01 |
| 1 | 2017-02 |
| 1 | 2017-10 |
| 2 | 2017-04 |
| 2 | 2017-06 |
| 3 | 2017-06 |
| 4 | 2017-07 |
+----+---------+
And I want to add a third column that says Yes if the difference between one date and the next is more than 6 months; the Yes should go on the first of the two. If there is no later date to compare with, it should also be a Yes. Anyway, this would be the final result:
+----+---------+------------+
| ID | Y-M | Difference |
+----+---------+------------+
| 1 | 2017-01 | No |
| 1 | 2017-02 | Yes |
| 1 | 2017-10 | Yes |
| 2 | 2017-04 | Yes |
| 2 | 2017-11 | No |
| 2 | 2017-12 | Yes |
| 3 | 2017-06 | Yes |
| 4 | 2017-07 | Yes |
+----+---------+------------+
Thank you!
You can use lead() and some date arithmetic:
select t.*,
(case when lead(ym) over (partition by id order by ym) < dateadd(month, 6, ym)
then 'No' else 'Yes'
end) as difference
from t;
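The same lead()-plus-date-arithmetic logic can be sketched outside SQL Server; here SQLite's date modifiers stand in for dateadd(), using the question's first table (note the year-month strings need a day appended before date arithmetic works):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (id int, ym text)")
con.executemany("insert into t values (?, ?)",
                [(1, "2017-01"), (1, "2017-02"), (1, "2017-10"),
                 (2, "2017-04"), (2, "2017-06"),
                 (3, "2017-06"), (4, "2017-07")])

# '+6 months' plays the role of dateadd(month, 6, ym); a NULL lead()
# (no later date for the id) falls through to the 'Yes' branch.
rows = con.execute("""
    select id, ym,
           case when date(lead(ym) over w || '-01')
                     < date(ym || '-01', '+6 months')
                then 'No' else 'Yes' end as difference
    from t
    window w as (partition by id order by ym)
    order by id, ym
""").fetchall()
print(rows)
```

Run against the question's input table this yields No for (1, 2017-01) and (2, 2017-04) and Yes everywhere else; the question's expected-output table was built from slightly different data, so it doesn't match row for row.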

SQL window excluding current group?

I'm trying to provide rolled-up summaries of the following data, both including only the group in question and excluding it. I think this can be done with a window function, but I'm having trouble getting the syntax down (in my case, Hive SQL).
I want the following data to be aggregated
+------------+---------+--------+
| date | product | rating |
+------------+---------+--------+
| 2018-01-01 | A | 1 |
| 2018-01-02 | A | 3 |
| 2018-01-20 | A | 4 |
| 2018-01-27 | A | 5 |
| 2018-01-29 | A | 4 |
| 2018-02-01 | A | 5 |
| 2017-01-09 | B | NULL |
| 2017-01-12 | B | 3 |
| 2017-01-15 | B | 4 |
| 2017-01-28 | B | 4 |
| 2017-07-21 | B | 2 |
| 2017-09-21 | B | 5 |
| 2017-09-13 | C | 3 |
| 2017-09-14 | C | 4 |
| 2017-09-15 | C | 5 |
| 2017-09-16 | C | 5 |
| 2018-04-01 | C | 2 |
| 2018-01-13 | D | 1 |
| 2018-01-14 | D | 2 |
| 2018-01-24 | D | 3 |
| 2018-01-31 | D | 4 |
+------------+---------+--------+
Aggregated results:
+------+-------+---------+----+------------+------------------+----------+
| year | month | product | ct | avg_rating | avg_rating_other | other_ct |
+------+-------+---------+----+------------+------------------+----------+
| 2018 | 1 | A | 5 | 3.4 | 2.5 | 4 |
| 2018 | 2 | A | 1 | 5 | NULL | 0 |
| 2017 | 1 | B | 4 | 3.6666667 | NULL | 0 |
| 2017 | 7 | B | 1 | 2 | NULL | 0 |
| 2017 | 9 | B | 1 | 5 | 4.25 | 4 |
| 2017 | 9 | C | 4 | 4.25 | 5 | 1 |
| 2018 | 4 | C | 1 | 2 | NULL | 0 |
| 2018 | 1 | D | 4 | 2.5 | 3.4 | 5 |
+------+-------+---------+----+------------+------------------+----------+
I've also considered producing two aggregates, one with the product in question and one without, but having trouble with creating the appropriate joining key.
You can do:
select year(date), month(date), product,
count(*) as ct, avg(rating) as avg_rating,
sum(count(*)) over (partition by year(date), month(date)) - count(*) as ct_other,
((sum(sum(rating)) over (partition by year(date), month(date)) - sum(rating)) /
(sum(count(*)) over (partition by year(date), month(date)) - count(*))
) as avg_other
from t
group by year(date), month(date), product;
The rating for the "other" is a bit tricky. You need to add everything up and subtract out the current group, then calculate the average as the sum divided by the count.
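The add-everything-then-subtract trick can be sketched as follows; this is SQLite rather than Hive (strftime stands in for Hive's year()/month()), with a trimmed sample of the question's January rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (date text, product text, rating int)")
con.executemany("insert into t values (?, ?, ?)",
                [("2018-01-01", "A", 1), ("2018-01-02", "A", 3),
                 ("2018-01-13", "D", 1), ("2018-01-14", "D", 2)])

# Per (month, product) group: totals over the whole month minus the
# current group's own count/sum give the "other products" aggregates.
rows = con.execute("""
    select product,
           count(*) as ct,
           avg(rating) as avg_rating,
           sum(count(*)) over w - count(*) as ct_other,
           (sum(sum(rating)) over w - sum(rating)) * 1.0
             / (sum(count(*)) over w - count(*)) as avg_other
    from t
    group by strftime('%Y-%m', date), product
    window w as (partition by strftime('%Y-%m', date))
    order by product
""").fetchall()
print(rows)
```

sum(count(*)) OVER works because window functions are evaluated after GROUP BY: the inner count(*) is the per-group count, and the outer sum() windows over all groups in the month.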

Making a partition query, reporting the first NOT NULL occurrence within partition before current row (if any)

I have a logins table which looks like this:
person_id | login_at | points_won
-----------+----------------+----------------------
1 | 2017-02-02 |
1 | 2017-02-01 |
2 | 2017-02-01 | 2
1 | 2017-01-29 | 2
2 | 2017-01-28 |
2 | 2017-01-25 | 1
3 | 2017-01-22 |
3 | 2017-01-21 |
1 | 2017-01-10 | 3
1 | 2017-01-01 | 1
I want to generate a result set containing a last_points_won column, which should work something like this: for each row, partition by person_id, order the partition by login_at desc, and report the first non-NULL points_won among the ordered rows in the partition (if any).
It should result in something like this:
person_id | login_at | points_won | last_points_won
-----------+----------------+----------------------+----------------------
1 | 2017-02-02 | | 2
1 | 2017-02-01 | | 2
2 | 2017-02-01 | 2 | 2
1 | 2017-01-29 | 2 | 2
2 | 2017-01-28 | | 1
2 | 2017-01-25 | 1 | 1
3 | 2017-01-22 | |
3 | 2017-01-21 | |
1 | 2017-01-10 | 3 | 3
1 | 2017-01-01 | 1 | 1
Or in plain words:
for each row, give me either the points won during this login OR, if none, give
me the points won at the person's latest previous login where he actually made some
points.
This could be achieved within a single window function too, with the IGNORE NULLS option of last_value(). But that's not supported in PostgreSQL yet. One alternative is the FILTER (WHERE ...) clause, but that only works when the window function is an aggregate function in the first place (which last_value() is not, though something similar could be created easily with CREATE AGGREGATE). To solve this with only built-in aggregates, you can use array_agg():
SELECT (tbl).*,
all_points_won[array_upper(all_points_won, 1)] last_points_won
FROM (SELECT tbl,
array_agg(points_won)
FILTER (WHERE points_won IS NOT NULL)
OVER (PARTITION BY person_id ORDER BY login_at) all_points_won
FROM tbl) s
Note: the sub-query is not needed, if you create a dedicated last_agg() aggregate, like:
CREATE FUNCTION last_val(anyelement, anyelement)
RETURNS anyelement
LANGUAGE SQL
IMMUTABLE
CALLED ON NULL INPUT
AS 'SELECT $2';
CREATE AGGREGATE last_agg(anyelement) (
SFUNC = last_val,
STYPE = anyelement
);
SELECT tbl.*,
last_agg(points_won)
FILTER (WHERE points_won IS NOT NULL)
OVER (PARTITION BY person_id ORDER BY login_at) last_points_won
FROM tbl;
Rextester sample
Edit: once the IGNORE NULLS option is supported in PostgreSQL, you can use the following query (which should work in Amazon Redshift too):
SELECT tbl.*,
       last_value(points_won IGNORE NULLS)
         OVER (PARTITION BY person_id ORDER BY login_at
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) last_points_won
FROM tbl;
select *
,min(points_won) over
(
partition by person_id,group_id
) as last_points_won
from (select *
,count(points_won) over
(
partition by person_id
order by login_at
) as group_id
from mytable
) t
+-----------+------------+------------+----------+-----------------+
| person_id | login_at   | points_won | group_id | last_points_won |
+-----------+------------+------------+----------+-----------------+
| 1         | 2017-01-01 | 1          | 1        | 1               |
| 1         | 2017-01-10 | 3          | 2        | 3               |
| 1         | 2017-01-29 | 2          | 3        | 2               |
| 1         | 2017-02-01 | (null)     | 3        | 2               |
| 1         | 2017-02-02 | (null)     | 3        | 2               |
| 2         | 2017-01-25 | 1          | 1        | 1               |
| 2         | 2017-01-28 | (null)     | 1        | 1               |
| 2         | 2017-02-01 | 2          | 2        | 2               |
| 3         | 2017-01-21 | (null)     | 0        | (null)          |
| 3         | 2017-01-22 | (null)     | 0        | (null)          |
+-----------+------------+------------+----------+-----------------+
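This count()-based grouping needs nothing beyond basic window functions, so it runs unchanged on most engines. A sketch in SQLite, using person 1's rows from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table mytable (person_id int, login_at text, points_won int)")
con.executemany("insert into mytable values (?, ?, ?)",
                [(1, "2017-01-01", 1), (1, "2017-01-10", 3),
                 (1, "2017-01-29", 2), (1, "2017-02-01", None),
                 (1, "2017-02-02", None)])

# count(points_won) ignores NULLs, so the running count only ticks up
# on rows that actually won points -- each tick starts a new group
# holding exactly one non-NULL value, which min() then recovers.
rows = con.execute("""
    select login_at, points_won,
           min(points_won) over (partition by person_id, group_id) as last_points_won
    from (select *,
                 count(points_won) over (partition by person_id
                                         order by login_at) as group_id
          from mytable)
    order by login_at
""").fetchall()
print(rows)
```

Any aggregate (min, max, sum) would do in the outer query, since each group contains at most one non-NULL points_won.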

Alternation of positive and negative values

Thank you for your attention.
I have a table called "PROD_COST" with 5 fields
(ID, Duration, Cost, COST_NEXT, COST_CHANGE).
I need an extra field called "groups" for aggregation.
Duration = number of days the price is valid (1 day = 1 row).
Cost = product price on that day.
Cost_next = lead(cost, 1, 0).
Cost_change = Cost_next - Cost.
Now I need to group by Cost_change. It can take
positive, negative or 0 values.
+----+---+------+------+------+
| 1 | 1 | 10 | 8,5 | -1,5 |
| 2 | 1 | 8,5 | 12,2 | 3,7 |
| 3 | 1 | 12,2 | 5,3 | -6,9 |
| 4 | 1 | 5,3 | 4,2 | 1,2 |
| 5 | 1 | 4,2 | 6,2 | 2 |
| 6 | 1 | 6,2 | 9,2 | 3 |
| 7 | 1 | 9,2 | 7,5 | -2,7 |
| 8 | 1 | 7,5 | 6,2 | -1,3 |
| 9 | 1 | 6,2 | 6,3 | 0,1 |
| 10 | 1 | 6,3 | 7,2 | 0,9 |
| 11 | 1 | 7,2 | 7,5 | 0,3 |
| 12 | 1 | 7,5 | 0 | 7,5 |
+----+---+------+------+------+
I need to make a query which will group it by runs of alternating positive and negative values (+ - + - + -). The last field is what I want.
Sorry for my English.
+----+---+------+------+------+---+
| 1 | 1 | 10 | 8,5 | -1,5 | 1 |
| 2 | 1 | 8,5 | 12,2 | 3,7 | 2 |
| 3 | 1 | 12,2 | 5,3 | -6,9 | 3 |
| 4 | 1 | 5,3 | 4,2 | 1,2 | 4 |
| 5 | 1 | 4,2 | 6,2 | 2 | 4 |
| 6 | 1 | 6,2 | 9,2 | 3 | 4 |
| 7 | 1 | 9,2 | 7,5 | -2,7 | 5 |
| 8 | 1 | 7,5 | 6,2 | -1,3 | 5 |
| 9 | 1 | 6,2 | 6,3 | 0,1 | 6 |
| 10 | 1 | 6,3 | 7,2 | 0,9 | 6 |
| 11 | 1 | 7,2 | 7,5 | 0,3 | 6 |
| 12 | 1 | 7,5 | 0 | 7,5 | 6 |
+----+---+------+------+------+---+
If you're on SQL Server 2012 or later, you can use the window functions to do this:
select
  id, COST_CHANGE, sum(GRP) over (order by id asc) + 1
from
(
  select
    *,
    case when sign(COST_CHANGE) != sign(isnull(lag(COST_CHANGE)
              over (order by id asc), COST_CHANGE)) then 1 else 0 end as GRP
  from
    PROD_COST
) X
lag() gets the value from the previous row; the case checks its sign and compares it to the current row's. If the signs don't match, the case returns 1. The outer select does a running total of these numbers, and every time there is a 1, the total increases.
It is possible to use the same logic with older versions too; you'll just have to fetch the previous row from the table using the id, and compute the running total by re-aggregating all rows before the current one.
Example in SQL Fiddle
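The same sign-change running total can be sketched in SQLite (sign() only exists in newer SQLite builds, so a tiny Python UDF stands in for it here, and T-SQL's isnull becomes ifnull), over the first seven rows of the question's data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.create_function("sign", 1, lambda v: (v > 0) - (v < 0))  # stand-in for T-SQL sign()

con.execute("create table PROD_COST (id int, cost_change real)")
con.executemany("insert into PROD_COST values (?, ?)",
                [(1, -1.5), (2, 3.7), (3, -6.9), (4, 1.2),
                 (5, 2.0), (6, 3.0), (7, -2.7)])

# GRP is 1 exactly where the sign flips versus the previous row;
# the running sum of GRP (+1) numbers the alternating groups.
rows = con.execute("""
    select id, sum(GRP) over (order by id) + 1 as grp
    from (select id,
                 case when sign(cost_change) !=
                           sign(ifnull(lag(cost_change) over (order by id),
                                       cost_change))
                      then 1 else 0 end as GRP
          from PROD_COST)
    order by id
""").fetchall()
print(rows)  # groups: 1, 2, 3, 4, 4, 4, 5
```

The ifnull() seeds the first row with its own value, so the first row never counts as a sign change.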
James's answer is close, but it doesn't handle zero values correctly. This is a pretty easy modification. One slightly tricky approach uses the difference between the signs of consecutive changes:
select id, COST_CHANGE, sum(IsNewGroup) over (order by id asc) +1
from (select pc.*,
(case when sign(cost_change) - sign(lag(cost_change) over (order by id)) between -1 and 1
then 0
else 1 -- `NULL` intentionally goes here
end) IsNewGroup
from Prod_Cost
) pc
For clarity, here is a SQL Fiddle with zero values. From my understanding of the question, the OP only wants an actual sign change.
This may still not be correct. The OP simply is not clear about what to do about 0 values.