SQL create a conditional sequence

I want to create a conditional int field called Sequence for each group of IDs.
Sequence is 1 for the first occurrence of a Condition value within an ID; every later occurrence of the same value increments that value's last count by 1. The Condition field takes a finite list of values, as illustrated below.
For a new ID group, Sequence should reset and start counting from 1 again.
ID | Date | Condition | Seq
-: | :------ | :-------- | --:
01 | 01Jun14 | AAAAAAAAA | 1
01 | 02Jun14 | AAAAAAAAA | 2
01 | 03Jun14 | BBBBBBBBB | 1
01 | 04Jun14 | BBBBBBBBB | 2
01 | 05Jun14 | AAAAAAAAA | 3
01 | 06Jun14 | BBBBBBBBB | 3
01 | 07Jun14 | EEEEEEEEE | 1
02 | 01Jun14 | AAAAAAAAA | 1
02 | 02Jun14 | CCCCCCCCC | 1
02 | 03Jun14 | CCCCCCCCC | 2
02 | 04Jun14 | BBBBBBBBB | 1
02 | 05Jun14 | AAAAAAAAA | 2
02 | 06Jun14 | BBBBBBBBB | 2
03 | 01Jun14 | FFFFFFFFF | 1
03 | 02Jun14 | AAAAAAAAA | 1
03 | 03Jun14 | AAAAAAAAA | 2
03 | 04Jun14 | CCCCCCCCC | 1

As Squirrel pointed out in the comments, this is a job for ROW_NUMBER().
Basically:
Seq = ROW_NUMBER() OVER (PARTITION BY ID, CONDITION ORDER BY DATE ASC)
If you need to assign the value at insert time, you'd do something like this (@ID, @date and @condition hold the values of the new row):
DECLARE @Seq INT
-- Next value = number of rows already stored for this ID and Condition, plus one
SELECT @Seq = COUNT(*) + 1 FROM <yourtable> WHERE ID = @ID AND CONDITION = @condition
INSERT INTO <yourtable> VALUES
(@ID, @date, @condition, @Seq)
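The rule behind that expression is a running count per (ID, Condition) pair, taken in date order. As a minimal sketch outside the database, assuming the rows are already sorted by date within each ID (the function name `assign_seq` and the tuple layout are made up for illustration):

```python
from collections import defaultdict

def assign_seq(rows):
    """Append a running count per (id, condition) to each row.

    rows: iterable of (id, date, condition) tuples, assumed to be sorted
    by date within each id (the order ROW_NUMBER's ORDER BY would impose).
    """
    counters = defaultdict(int)  # (id, condition) -> rows seen so far
    out = []
    for id_, date, condition in rows:
        counters[(id_, condition)] += 1
        out.append((id_, date, condition, counters[(id_, condition)]))
    return out

# The ID 01 rows from the question
sample = [
    ("01", "2014-06-01", "AAAAAAAAA"),
    ("01", "2014-06-02", "AAAAAAAAA"),
    ("01", "2014-06-03", "BBBBBBBBB"),
    ("01", "2014-06-04", "BBBBBBBBB"),
    ("01", "2014-06-05", "AAAAAAAAA"),
    ("01", "2014-06-06", "BBBBBBBBB"),
    ("01", "2014-06-07", "EEEEEEEEE"),
]
seqs = [row[3] for row in assign_seq(sample)]
print(seqs)
```

Because the counter is keyed on both ID and Condition, a new ID starts from 1 again without any explicit reset.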

You need to partition by condition and id in your row_number so that the numbering starts at 1 for each combination of condition and id.
I have created a table with your required numbering so as to check that the calculated values are what you need.
select
ID,
d "Date" ,
Condition,
Seq seq_required,
row_number() over (partition by condition, id order by d) seq_calculated
from t
order by id, d
GO
ID | Date | Condition | seq_required | seq_calculated
-: | :--------- | :-------- | -----------: | -------------:
1 | 2014-06-01 | AAAAAAAAA | 1 | 1
1 | 2014-06-02 | AAAAAAAAA | 2 | 2
1 | 2014-06-03 | BBBBBBBBB | 1 | 1
1 | 2014-06-04 | BBBBBBBBB | 2 | 2
1 | 2014-06-05 | AAAAAAAAA | 3 | 3
1 | 2014-06-06 | BBBBBBBBB | 3 | 3
1 | 2014-06-07 | EEEEEEEEE | 1 | 1
2 | 2014-06-01 | AAAAAAAAA | 1 | 1
2 | 2014-06-02 | CCCCCCCCC | 1 | 1
2 | 2014-06-03 | CCCCCCCCC | 2 | 2
2 | 2014-06-04 | BBBBBBBBB | 1 | 1
2 | 2014-06-05 | AAAAAAAAA | 2 | 2
2 | 2014-06-06 | BBBBBBBBB | 2 | 2
3 | 2014-06-01 | FFFFFFFFF | 1 | 1
3 | 2014-06-02 | AAAAAAAAA | 1 | 1
3 | 2014-06-03 | AAAAAAAAA | 2 | 2
3 | 2014-06-04 | CCCCCCCCC | 1 | 1

I think dense_rank() should do the trick, as long as it is partitioned by both ID and Condition.
select
ID,
[Date],
Condition,
dense_RANK() OVER(PARTITION by ID, Condition ORDER BY [Date]) as Seq
from Yourtable

SELECT Condition, COUNT(*) as seq FROM "your table name" GROUP BY Condition
SELECT ID, Condition, COUNT(*) as seq FROM "your table name" where ID=1 GROUP BY ID, Condition
desc:
1- counts the rows for each Condition across the whole table
2- counts the rows for each Condition within one group of IDs (here ID = 1)

Related

SQL Server Get all Birthday Years

I have a table in SQL Server that is composed of
ID, B_Day
1, 1977-02-20
2, 2001-03-10
...
I want to add a row to this table for each year of a birthday, up to the current birthday year.
i.e.:
ID, B_Day
1,1977-02-20
1,1978-02-20
1,1979-02-20
...
1,2020-02-20
2, 2001-03-10
2, 2002-03-10
...
2, 2019-03-10
I'm struggling to determine what the best strategy for accomplishing this is. I thought about recursively self-joining, but that creates far too many layers. Any suggestions?
The following should work
with row_gen
as (select top 200 row_number() over(order by name)-1 as rnk
from master..spt_values
)
select a.id,a.b_day,dateadd(year,rnk,b_day) incr_b_day
from dbo.t a
join row_gen b
on dateadd(year,b.rnk,a.b_day)<=getdate()
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=0d06c95e1914ca45ca192d0d192bd2e0
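The query above joins each birthday to a list of year offsets 0..199 and keeps the offsets that do not land in the future. A rough Python equivalent may make that join easier to follow; the names `add_years` and `expand_birthdays` are hypothetical, and `today` is passed in explicitly (instead of calling the clock) so the result is reproducible:

```python
from datetime import date

def add_years(d, years):
    """Shift d by whole years. Feb 29 falls back to Feb 28 in non-leap
    years, which is also how SQL Server's DATEADD(year, ...) behaves."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def expand_birthdays(rows, today, max_offsets=200):
    """rows: iterable of (id, b_day). Mirrors the join condition
    dateadd(year, rnk, b_day) <= getdate() for rnk in 0..max_offsets-1."""
    out = []
    for id_, b_day in rows:
        for offset in range(max_offsets):
            shifted = add_years(b_day, offset)
            if shifted <= today:
                out.append((id_, shifted))
    return out

rows = [(1, date(1977, 2, 20)), (2, date(2001, 3, 10))]
expanded = expand_birthdays(rows, today=date(2020, 6, 1))
print(len(expanded), expanded[0], expanded[-1])
```

The 200-offset cap plays the same role as `top 200` in the row generator: it only needs to exceed the oldest age in the table.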
You can use a recursive approach:
with cte as (
select t.id, t.b_day, convert(date, getdate()) as mx_dt
from table t
union all
select c.id, dateadd(year, 1, c.b_day), c.mx_dt
from cte c
where dateadd(year, 1, c.b_day) < c.mx_dt
)
select c.id, c.b_day
from cte c
order by c.id, c.b_day;
The default recursion limit is 100 levels; to go deeper, add the query hint option (maxrecursion 0), which removes the limit.
If your dataset is not too big, one option is to use a recursive query:
with cte as (
select id, b_day bday0, b_day, 1 lvl from mytable
union all
select
id,
bday0,
dateadd(year, lvl, bday0), lvl + 1
from cte
where dateadd(year, lvl, bday0) <= getdate()
)
select id, b_day from cte order by id, b_day
Demo on DB Fiddle:
id | b_day
-: | :---------
1 | 1977-02-20
1 | 1978-02-20
1 | 1979-02-20
1 | 1980-02-20
1 | 1981-02-20
1 | 1982-02-20
1 | 1983-02-20
1 | 1984-02-20
1 | 1985-02-20
1 | 1986-02-20
1 | 1987-02-20
1 | 1988-02-20
1 | 1989-02-20
1 | 1990-02-20
1 | 1991-02-20
1 | 1992-02-20
1 | 1993-02-20
1 | 1994-02-20
1 | 1995-02-20
1 | 1996-02-20
1 | 1997-02-20
1 | 1998-02-20
1 | 1999-02-20
1 | 2000-02-20
1 | 2001-02-20
1 | 2002-02-20
1 | 2003-02-20
1 | 2004-02-20
1 | 2005-02-20
1 | 2006-02-20
1 | 2007-02-20
1 | 2008-02-20
1 | 2009-02-20
1 | 2010-02-20
1 | 2011-02-20
1 | 2012-02-20
1 | 2013-02-20
1 | 2014-02-20
1 | 2015-02-20
1 | 2016-02-20
1 | 2017-02-20
1 | 2018-02-20
1 | 2019-02-20
1 | 2020-02-20
2 | 2001-03-01
2 | 2002-03-01
2 | 2003-03-01
2 | 2004-03-01
2 | 2005-03-01
2 | 2006-03-01
2 | 2007-03-01
2 | 2008-03-01
2 | 2009-03-01
2 | 2010-03-01
2 | 2011-03-01
2 | 2012-03-01
2 | 2013-03-01
2 | 2014-03-01
2 | 2015-03-01
2 | 2016-03-01
2 | 2017-03-01
2 | 2018-03-01
2 | 2019-03-01
2 | 2020-03-01

bigquery get highest possible steps group by col

I have a question about counting row numbers based on a column iteration.
My table looks like this:
time | steps | name
13:02 | 0 | a
13:03 | 0 | a
13:04 | 1 | a
13:05 | 0 | a
13:07 | 1 | a
13:10 | 1 | a
13:12 | 2 | a
13:04 | 0 | b
13:06 | 0 | b
13:12 | 1 | b
13:14 | 2 | b
13:19 | 3 | b
13:14 | 0 | b
13:19 | 3 | b
From the table above I want to get the longest possible run of steps made by each name, subject to these conditions:
The steps made by a name must be sequential (e.g. 0,1,2,3 returns 0,1,2,3; 0,1,2,4 returns 0,1,2).
Each step must also be sequential according to time.
If more than one record is possible for a step, select any of them (e.g. 0,1,1,2 returns 0,ANY(1,1),2).
The table I am looking for is:
time | steps | name
13:05 | 0 | a
13:07 | 1 | a
13:12 | 2 | a
13:06 | 0 | b
13:12 | 1 | b
13:14 | 2 | b
13:19 | 3 | b
Is there any way I can do this in BigQuery?
First remove duplicates. Then identify the rows where the "next" step (by time) is what you expect.
The following almost works:
select t.*
from (select min(time) as time, steps, name,
lead(steps) over (partition by name order by min(time)) as next_step
from yourtable t
group by steps, name
) t
where next_step = steps + 1;
However, you want the minimum set. For that, you also need the row number to match. It turns out that this condition alone is sufficient:
select t.*
from (select min(time) as time, steps, name,
row_number() over (partition by name order by min(time)) as seqnum
from yourtable t
group by steps, name
) t
where steps = seqnum - 1;
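To see why matching the row number is enough, here is a small Python sketch of the same two stages: keep the earliest time per (name, steps), number those rows by time within each name, and keep the rows where steps equals the row number minus one. The function name `longest_step_chain` is made up, and times are assumed to be zero-padded "HH:MM" strings so that string order equals time order. Note that it keeps the earliest time for each step, which the question allows ("select any value").

```python
def longest_step_chain(rows):
    """rows: iterable of (time, steps, name) tuples."""
    # Stage 1: GROUP BY steps, name with MIN(time)
    earliest = {}
    for time, steps, name in rows:
        key = (name, steps)
        if key not in earliest or time < earliest[key]:
            earliest[key] = time

    by_name = {}
    for (name, steps), time in earliest.items():
        by_name.setdefault(name, []).append((time, steps))

    # Stage 2: ROW_NUMBER() per name ordered by time, keep steps = seqnum - 1
    result = []
    for name in sorted(by_name):
        for seqnum, (time, steps) in enumerate(sorted(by_name[name]), start=1):
            if steps == seqnum - 1:
                result.append((time, steps, name))
    return result

data = [
    ("13:02", 0, "a"), ("13:03", 0, "a"), ("13:04", 1, "a"), ("13:05", 0, "a"),
    ("13:07", 1, "a"), ("13:10", 1, "a"), ("13:12", 2, "a"),
    ("13:04", 0, "b"), ("13:06", 0, "b"), ("13:12", 1, "b"), ("13:14", 2, "b"),
    ("13:19", 3, "b"), ("13:14", 0, "b"), ("13:19", 3, "b"),
]
chain = longest_step_chain(data)
for row in chain:
    print(row)
```

A gap such as 0,1,2,4 drops out naturally: step 4 gets row number 4, and 4 is not equal to 4 - 1.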

SQL Create Identity Column Based On Other Table Values

Sql Server 2017
Is it possible to construct a column in SQL that acts as an Identity column for a clustered Primary Key (Value1, Value2, Value3)?
|--------+--------+--------+---------------|
| Value1 | Value2 | Value3 | Identity-ish? |
|--------+--------+--------+---------------|
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
|--------+--------+--------+---------------|
| 1 | 2 | 1 | 1 |
| 1 | 2 | 1 | 2 |
| 1 | 2 | 1 | 3 |
|--------+--------+--------+---------------|
| 2 | 5 | 3 | 1 |
| 2 | 5 | 3 | 2 |
|--------+--------+--------+---------------|
| 82 | 21 | 13 | 1 |
| 82 | 21 | 13 | 2 |
|--------+--------+--------+---------------|
Currently the way I am tackling this issue is by querying the table for the max(Identity-ish?) on a given PK, and incrementing it when inserting a new record.
However, the scale of the project has reached such a size where this method is getting called too frequently, and sometimes it gets called twice at the same time (causing two identical rows (Value1, Value2, Value3, Identity-ish?))
Ideally I would like to be able to declare Identity-ish? as an Identity column that gets its values assigned automatically in the way I've shown above.
I think what you want is:
select * from
(
select Value1,Value2,Value3,
row_number ()over (partition by Value1,Value2,Value3 order by
Value1,Value2,Value3) as [Identity-ish?]
from tablename
)rnk
Use the ROW_NUMBER() function, partitioned by the key columns:
select t.*,
ROW_NUMBER()
OVER(partition by value1,value2,value3 ORDER BY value1,value2,value3 ) as identity_is
from t
value1 value2 value3 identity_is
1 1 1 1
1 1 1 2
1 2 1 1
1 2 1 2
1 2 1 3
Here is an example:
http://www.sqlfiddle.com/#!18/c4b62/4
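One caveat with both queries: ordering by the same columns used for the partition leaves the order inside each partition arbitrary, so the numbers may not be stable between runs. A hedged sketch of a deterministic variant follows, run here against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The `row_id` surrogate key is an assumption added for illustration; it is not part of the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t ("
    "row_id INTEGER PRIMARY KEY, "  # hypothetical insertion-order key
    "value1 INT, value2 INT, value3 INT)"
)
conn.executemany(
    "INSERT INTO t (value1, value2, value3) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 2, 1), (1, 2, 1), (1, 2, 1), (2, 5, 3), (2, 5, 3)],
)

# Number the rows inside each (value1, value2, value3) group in insertion order
numbered = conn.execute(
    """
    SELECT value1, value2, value3,
           ROW_NUMBER() OVER (
               PARTITION BY value1, value2, value3
               ORDER BY row_id
           ) AS identity_ish
    FROM t
    ORDER BY value1, value2, value3, identity_ish
    """
).fetchall()

for row in numbered:
    print(row)
```

Computing the number at read time this way sidesteps the duplicate-value race described in the question, at the cost of the value not being stored in the row.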

In Redshift, how do I run the opposite of a SUM function

Assuming I have a data table
date | user_id | user_last_name | order_id | is_new_session
------------+------------+----------------+-----------+---------------
2014-09-01 | A | B | 1 | t
2014-09-01 | A | B | 5 | f
2014-09-02 | A | B | 8 | t
2014-09-01 | B | B | 2 | t
2014-09-02 | B | test | 3 | t
2014-09-03 | B | test | 4 | t
2014-09-04 | B | test | 6 | t
2014-09-04 | B | test | 7 | f
2014-09-05 | B | test | 9 | t
2014-09-05 | B | test | 10 | f
I want to get another column in Redshift which basically assigns session numbers to each user's sessions. It starts at 1 for the first record of each user and, as you move further down, it increments whenever it encounters a true in the "is_new_session" column and stays the same when it encounters a false. If it hits a new user, the value resets to 1. The ideal output for this table would be:
1
1
2
1
2
3
4
4
5
5
In my mind it's kind of the opposite of a SUM(1) over (Partition BY user_id, is_new_session ORDER BY user_id, date ASC)
Any ideas?
Thanks!
I think you want an incremental sum:
select t.*,
sum(case when is_new_session then 1 else 0 end) over (partition by user_id order by date) as session_number
from t;
In Redshift, you might need the windowing clause:
select t.*,
sum(case when is_new_session then 1 else 0 end) over
(partition by user_id
order by date
rows between unbounded preceding and current row
) as session_number
from t;
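The running sum can be checked against the sample data with a small script. This is only a sketch: it runs on SQLite through Python's sqlite3 module (SQLite 3.25 or later) rather than on Redshift, the table name `orders` is made up, the `date` column is renamed `order_date`, booleans are stored as 0/1, and `order_id` is added to the ORDER BY as a tiebreaker because two rows can share a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_date TEXT, user_id TEXT, order_id INT, is_new_session INT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("2014-09-01", "A", 1, 1), ("2014-09-01", "A", 5, 0),
        ("2014-09-02", "A", 8, 1), ("2014-09-01", "B", 2, 1),
        ("2014-09-02", "B", 3, 1), ("2014-09-03", "B", 4, 1),
        ("2014-09-04", "B", 6, 1), ("2014-09-04", "B", 7, 0),
        ("2014-09-05", "B", 9, 1), ("2014-09-05", "B", 10, 0),
    ],
)

# Cumulative count of session starts per user, in date/order_id order
session_numbers = [
    row[0]
    for row in conn.execute(
        """
        SELECT SUM(CASE WHEN is_new_session = 1 THEN 1 ELSE 0 END) OVER (
                   PARTITION BY user_id
                   ORDER BY order_date, order_id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS session_number
        FROM orders
        ORDER BY user_id, order_date, order_id
        """
    )
]
print(session_numbers)
```

Without the tiebreaker, rows that share a date could be summed in either order, so the session number for such rows would not be guaranteed.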

How do I do multiple selection based on a flowchart of criteria?

Table name: Copies
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May16       |
| 900      | 2     | null     | Y       | 3            | Oct 16      |
| 900      | 3     | null     | N       | 9            | Oct 16      |
| 901      | 4     | 378      | Y       | 3            | Oct 16      |
| 901      | 5     | null     | N       | 2            | Oct 16      |
| 902      | 6     | null     | N       | 5            | May16       |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
| 903      | 9     | null     | Y       | 3            | May16       |
| 904      | 10    | null     | N       | 0            | May 16      |
| 904      | 11    | null     | N       | 0            | May16       |
+----------+-------+----------+---------+--------------+-------------+
Output table
+----------+-------+----------+---------+--------------+-------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------+-------+----------+---------+--------------+-------------+
| 900      | 1     | null     | Y       | 7            | May16       |
| 902      | 7     | null     | N       | 9            | Oct 16      |
| 903      | 8     | null     | Y       | 3            | Oct 16      |
+----------+-------+----------+---------+--------------+-------------+
Hi all, I need help with a query that returns one record within a group based on the importance of the field. The importance is ranked as follows:
previous- If any record within the group_id has a non-null value, then no record from that group_id is returned (because according to our rules, all records within a group should have the same previous value)
in_this- If one record is Y, and the other is N within a group_id, then we keep the Y; If all records are Y or all are N, then we move to the next attribute
Higher_value- If all records in the ‘in_this’ field are equal, then we need to select the record with the greater value from this field. If both records have an equal value, we move to the next attribute
Most_recent- If all records were of equal value in the ‘higher_value’ field, then we consider the newest record. If these are equal, then nothing is returned.
This is a simplified version of the table I am looking at, but I just would like to get the gist of how something like this would work. Basically, my table has multiple copies of records that have been grouped through some algorithm. I have been tasked with selecting which of these records within a group is the ‘good’ one, and we are basing this on these fields.
I’d like the output to actually show all fields, because I will likely attempt to refine the query to include other fields (there are over 40 to consider), but the most important is the group_id and my_id fields. It would be neat if we could also somehow flag why each record got picked, but that isn’t necessary.
It seems like something like this should be easy, but I have a hard time wrapping my head around how to pick from within a group_id. Thanks for your help.
You can use analytic functions for this. The trick is establishing the right variables for each condition:
select t.*
from (select t.*,
max(in_this) over (partition by group_id) as max_in_this,
min(higher_value) over (partition by group_id) as min_higher_value,
max(higher_value) over (partition by group_id) as max_higher_value,
row_number() over (partition by group_id, higher_value order by my_id) as seqnum_ghv,
min(most_recent) over (partition by group_id) as min_most_recent,
max(most_recent) over (partition by group_id) as max_most_recent,
row_number() over (partition by group_id order by most_recent) as seqnum_mr
from t
) t
where max_in_this is not null and
( (min_higher_value <> max_higher_value and seqnum_ghv = 1) or
(min_higher_value = max_higher_value and min_most_recent <> max_most_recent and seqnum_mr = 1
)
);
The third condition as stated makes no sense, but you should get the idea for how to implement this.
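For comparison, the rule cascade from the question can be written out procedurally, one rule at a time. The sketch below is one reading of those rules, not a translation of the query above. Assumptions: the function name `pick_record` and the dict layout are made up, `most_recent` is encoded as a (year, month) tuple with "May16" read as May 2016, and when a group has more than two records the Y records carry forward to the next rule together. It also returns the name of the rule that decided the pick, which the question mentions as a nice-to-have.

```python
def pick_record(group):
    """group: list of record dicts for one group_id.
    Returns (record, deciding_rule) or None if nothing qualifies."""
    # Rule 1: any non-null `previous` disqualifies the whole group
    if any(r["previous"] is not None for r in group):
        return None
    candidates = group
    # Rule 2: if in_this is mixed, only the Y records stay in play
    if len({r["in_this"] for r in candidates}) > 1:
        candidates = [r for r in candidates if r["in_this"] == "Y"]
        if len(candidates) == 1:
            return candidates[0], "in_this"
    # Rules 3 and 4: highest higher_value, then newest most_recent
    for field in ("higher_value", "most_recent"):
        best = max(r[field] for r in candidates)
        candidates = [r for r in candidates if r[field] == best]
        if len(candidates) == 1:
            return candidates[0], field
    return None  # still tied after every rule

def rec(group_id, my_id, previous, in_this, higher_value, most_recent):
    return {"group_id": group_id, "my_id": my_id, "previous": previous,
            "in_this": in_this, "higher_value": higher_value,
            "most_recent": most_recent}

MAY, OCT = (2016, 5), (2016, 10)  # assumed meaning of "May16" / "Oct 16"
copies = [
    rec(900, 1, None, "Y", 7, MAY), rec(900, 2, None, "Y", 3, OCT),
    rec(900, 3, None, "N", 9, OCT), rec(901, 4, 378, "Y", 3, OCT),
    rec(901, 5, None, "N", 2, OCT), rec(902, 6, None, "N", 5, MAY),
    rec(902, 7, None, "N", 9, OCT), rec(903, 8, None, "Y", 3, OCT),
    rec(903, 9, None, "Y", 3, MAY), rec(904, 10, None, "N", 0, MAY),
    rec(904, 11, None, "N", 0, MAY),
]
groups = {}
for r in copies:
    groups.setdefault(r["group_id"], []).append(r)
picks = {gid: pick_record(rows) for gid, rows in groups.items()}
for gid, pick in picks.items():
    print(gid, pick and (pick[0]["my_id"], pick[1]))
```

On the sample table this selects my_id 1, 7 and 8 for groups 900, 902 and 903 and nothing for 901 and 904, which matches the output table in the question.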