ROW_NUMBER() with DISTINCT - sql

I've got a table of ticket assignments showing the different groups a ticket is transferred to before it's resolved. Here is a simplified version of the table:
asgn_grp | date | ticket_id
---------|--------|----------
A | 1-1-15 | 1
A | 1-2-15 | 1
B | 1-3-15 | 1
A | 1-1-15 | 2
C | 1-2-15 | 2
B | 1-3-15 | 2
C | 1-1-15 | 3
B | 1-2-15 | 3
I need a count of the second distinct group each ticket was assigned to. In other words, I want to know where a ticket goes once it is transferred out of the group it starts in; internal transfers within the same group don't count. So the second distinct group for ticket 1 is B, for ticket 2 it is C, and for ticket 3 it is B. Counting these, the end result I need is:
asgn_grp | count
---------|-------
B | 2
C | 1
I've tried
SELECT distinct top 2 asgn_grp, ROW_NUMBER() OVER (ORDER BY date)
as my sub-query and pulling the second row out of that, but when I add the ROW_NUMBER() it messes up my DISTINCT. If I pull the ROW_NUMBER() out of the sub-query, I have no way to order my values to ensure I get the second one after I DISTINCT the list.
Also, let me know if I was unclear about anything.

Instead of using distinct, try using group by twice.
select asgn_grp, count(*) from (
select * , row_number() over (partition by ticket_id order by min_date) rn
from (
select asgn_grp, ticket_id, min(date) min_date
from Table1 group by asgn_grp, ticket_id
) t1
) t2 where rn = 2
group by asgn_grp;
http://sqlfiddle.com/#!3/a0d1e
The derived table t1 contains every unique asgn_grp for each ticket_id along with the minimum date of each asgn_grp. For the sample data t1 has the following rows:
ASGN_GRP TICKET_ID MIN_DATE
A 1 January, 01 2015 00:00:00+0000
B 1 January, 03 2015 00:00:00+0000
A 2 January, 01 2015 00:00:00+0000
B 2 January, 03 2015 00:00:00+0000
C 2 January, 02 2015 00:00:00+0000
B 3 January, 02 2015 00:00:00+0000
C 3 January, 01 2015 00:00:00+0000
The outer query then uses row_number() to number each asgn_grp within a ticket_id by its min_date and generates the following for t2
ASGN_GRP TICKET_ID MIN_DATE RN
A 1 January, 01 2015 00:00:00+0000 1
B 1 January, 03 2015 00:00:00+0000 2
A 2 January, 01 2015 00:00:00+0000 1
C 2 January, 02 2015 00:00:00+0000 2
B 2 January, 03 2015 00:00:00+0000 3
C 3 January, 01 2015 00:00:00+0000 1
B 3 January, 02 2015 00:00:00+0000 2
This table is filtered for RN = 2 and is grouped by asgn_grp to get the count for each asgn_grp.
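As a quick check of the steps above, here is a minimal sketch that runs the same query against the sample data. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module rather than SQL Server, and it rewrites the dates as ISO strings so they sort correctly as text; both choices are only for illustration.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO strings (assumption)
rows = [
    ("A", "2015-01-01", 1), ("A", "2015-01-02", 1), ("B", "2015-01-03", 1),
    ("A", "2015-01-01", 2), ("C", "2015-01-02", 2), ("B", "2015-01-03", 2),
    ("C", "2015-01-01", 3), ("B", "2015-01-02", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (asgn_grp TEXT, date TEXT, ticket_id INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)

query = """
SELECT asgn_grp, COUNT(*) AS cnt
FROM (
    -- number each distinct group within a ticket by its first date
    SELECT t1.*,
           ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY min_date) AS rn
    FROM (
        -- one row per (group, ticket) with the earliest date
        SELECT asgn_grp, ticket_id, MIN(date) AS min_date
        FROM Table1
        GROUP BY asgn_grp, ticket_id
    ) AS t1
) AS t2
WHERE rn = 2
GROUP BY asgn_grp
ORDER BY asgn_grp
"""
second_group_counts = conn.execute(query).fetchall()
print(second_group_counts)  # [('B', 2), ('C', 1)]
```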

First, you need to identify groups of constant values of asgn_grp for each ticket. You can do that with a difference of row numbers.
Then, you need the ordering for each group. For that, use the minimum date in the group. Finally, you can rank these groups to get the second one, using dense_rank() on the date.
select asgn_grp, count(distinct ticket_id)
from (select ticket_id, asgn_grp,
dense_rank() over (partition by ticket_id order by grpdate) as seqnum
from (select s.*, min(date) over (partition by ticket_id, asgn_grp, grp) as grpdate
from (select s.*,
(row_number() over (partition by ticket_id order by date) -
row_number() over (partition by ticket_id, asgn_grp order by date)
) as grp
from simplified s
) s
) s
) s
where seqnum = 2
group by asgn_grp;
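To see what the difference of row numbers produces, here is a small sketch of just the innermost step. It assumes SQLite (3.25 or later) through Python's sqlite3 module, ISO date strings, and a hypothetical ticket that returns to group A after visiting B; none of that data is from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE simplified (asgn_grp TEXT, date TEXT, ticket_id INTEGER)")
# Hypothetical ticket: A, A, B, then back to A
conn.executemany(
    "INSERT INTO simplified VALUES (?, ?, ?)",
    [("A", "2015-01-01", 1), ("A", "2015-01-02", 1),
     ("B", "2015-01-03", 1), ("A", "2015-01-04", 1)],
)

query = """
SELECT date, asgn_grp,
       ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY date)
     - ROW_NUMBER() OVER (PARTITION BY ticket_id, asgn_grp ORDER BY date) AS grp
FROM simplified
ORDER BY date
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
# ('2015-01-01', 'A', 0)
# ('2015-01-02', 'A', 0)
# ('2015-01-03', 'B', 2)
# ('2015-01-04', 'A', 1)
```

The pair (asgn_grp, grp) is constant within each unbroken run of the same group, so the two visits to A get different labels (0 and 1) and can be ordered separately by their minimum date.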

If you also need assignment groups with a count of zero (tickets that never changed group), use outer joins instead of inner joins.
WITH TBL AS
(
SELECT A.*, ROW_NUMBER() OVER(PARTITION BY ticket_id ORDER BY asgn_grp) AS RN
FROM Table1 AS A
)
SELECT A.asgn_grp, COUNT(*) AS CNT
FROM TBL AS A
INNER JOIN TBL AS B
ON B.ticket_id = A.ticket_id
AND A.RN = B.RN + 1
GROUP BY A.asgn_grp

As for why using DISTINCT with ROW_NUMBER() changes your results:
See this question about the difference between DISTINCT and GROUP BY.
From that answer:
The GROUP BY query aggregates before it computes. The DISTINCT query computes before the aggregate.
So ROW_NUMBER(), which returns a scalar value per row, is computed first. Every row then carries a unique number, and when DISTINCT is applied afterwards it finds no duplicate rows to remove.
To get your results, you can use this query:
SELECT ticket_id, asgn_grp,
(SELECT COUNT([date]) FROM yourTable t WHERE t.asgn_grp = r.asgn_grp And t.ticket_id = r.ticket_id)
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY [date]) As ra
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ticket_id, asgn_grp ORDER BY [date] Desc) As rn
FROM yourTable) findingOldDates
WHERE rn = 1) r
WHERE ra = 2
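The evaluation-order point above can be checked with a small sketch. It assumes SQLite (3.25 or later) through Python's sqlite3 module and uses only the three rows of ticket 1, with ISO date strings; the table name follows the placeholder used in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (asgn_grp TEXT, date TEXT, ticket_id INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [("A", "2015-01-01", 1), ("A", "2015-01-02", 1), ("B", "2015-01-03", 1)],
)

# DISTINCT alone collapses the two A rows
plain = conn.execute("SELECT DISTINCT asgn_grp FROM yourTable").fetchall()

# With ROW_NUMBER() in the select list every row is already unique,
# so DISTINCT has nothing left to remove
numbered = conn.execute(
    "SELECT DISTINCT asgn_grp, ROW_NUMBER() OVER (ORDER BY date) AS rn "
    "FROM yourTable"
).fetchall()

print(len(plain), len(numbered))  # 2 3
```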

Related

Alphanumeric sequence in SQL Server

For a single ID, we have sequence numbers 01, 02, 03 up to 99 that repeat two or three times.
Example:
ID SEQ_NO
----------
2 01
2 02
2 03
.
.
.
2 99
2 01
2 02
2 99
We have a requirement to add an AA prefix the second time a seq_no appears for an ID, and a BB prefix the third time.
Can anyone explain how to do this?
Try the following using the ROW_NUMBER function.
If you only want to select the prefixed SEQ_NO as a new column:
WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID, SEQ_NO ORDER BY SEQ_NO) rn
FROM table_name
)
SELECT ID, SEQ_NO,
CASE
WHEN rn>1 THEN
CONCAT(CHAR(rn+63), CHAR(rn+63), SEQ_NO)
ELSE SEQ_NO
END AS new_seq
FROM CTE
WHERE rn <= 27
ORDER BY ID, new_seq
If you want to update the SEQ_NO column:
WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID, SEQ_NO ORDER BY SEQ_NO) rn
FROM table_name
)
UPDATE CTE SET SEQ_NO = CONCAT(CHAR(rn+63), CHAR(rn+63), SEQ_NO)
WHERE rn > 1 AND rn <= 27
See a demo with a set of data where seq (01 - 10) is repeated three times.
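The prefix arithmetic can be tried on a tiny data set. This is a sketch under assumptions: it uses SQLite (3.25 or later) through Python's sqlite3 module, where the string functions are char() and the || operator instead of T-SQL's CHAR() and CONCAT(), and the data is a made-up ID 2 with sequence 01 and 02 repeated three times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (ID INTEGER, SEQ_NO TEXT)")
# Hypothetical data: sequence 01..02 repeated three times for ID 2
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(2, seq) for _ in range(3) for seq in ("01", "02")],
)

query = """
WITH CTE AS (
    SELECT ID, SEQ_NO,
           ROW_NUMBER() OVER (PARTITION BY ID, SEQ_NO ORDER BY SEQ_NO) AS rn
    FROM table_name
)
SELECT CASE WHEN rn > 1
            -- rn = 2 gives code 65 ('A'), rn = 3 gives 66 ('B'), and so on
            THEN char(rn + 63) || char(rn + 63) || SEQ_NO
            ELSE SEQ_NO
       END AS new_seq
FROM CTE
ORDER BY new_seq
"""
new_seqs = [row[0] for row in conn.execute(query)]
print(new_seqs)  # ['01', '02', 'AA01', 'AA02', 'BB01', 'BB02']
```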

create table with 2 column with different conditions SQL

I have a table with this format:
Id_command, date_creat
01 01-01-2020
02 01-01-2021
03 01-11-2020
..
I would like to extract a new table from it, where the first column contains every id_command with date_creat > 01-01-2020 and the second column contains every id_command with date_creat > 01-01-2021.
The expected result :
Id_command (date_creat > 01-01-2020) , id command(date_creat < 31-12-2020)
01 02
03
I had the idea of creating two different tables and then outer joining them, but I am not sure whether this can be done in a simpler way.
Thanks
First, select the relevant rows from the table and add a row number:
select Id_command,
row_number() over (order by Id_command) as rn
from tab
where date_creat > DATE'2020-01-01'
ID_COMMAND RN
---------- ----------
2 1
3 2
Do the same for the second condition.
Finally, full outer join the two subqueries on the row number:
with a as(
select Id_command,
row_number() over (order by Id_command) as rn
from tab
where date_creat > DATE'2020-01-01'
), b as (
select Id_command,
row_number() over (order by Id_command) as rn
from tab
where date_creat <= DATE'2020-01-01')
select a.Id_command, b.Id_command
from a
full outer join b
on a.rn = b.rn
order by 1,2
ID_COMMAND ID_COMMAND
---------- ----------
2 1
3

Conditional filter with row numbers

I have sample code below that returns an ID, a date, a value, and a row number that is partitioned by the ID and ordered by meeting date:
SELECT
c.ID
,m.CONTACT_DATE
,d.TEST
,row_number() over(partition by c.ID
order by m.CONTACT_DATE desc
) [rn]
FROM COMMUNITY C
INNER JOIN MEETING m
ON c.ID = m.CONTACT_ID
LEFT JOIN DISCUSSION d
ON m.DISCUSSION_TEST = d.TEST
Running such a query returns results like:
ID CONTACT_DATE TEST rn
01 2017-05-01 NULL 1
01 2017-04-01 1 2
01 2017-03-01 NULL 3
02 2017-08-01 NULL 1
02 2017-09-01 NULL 2
02 2017-10-01 1 3
03 2017-02-01 NULL 1
03 2017-01-01 NULL 2
What I'd like to do is group by ID to get the most recent CONTACT_DATE for each one (i.e. place this in subquery T, then WHERE T.rn = 1 GROUP BY T.ID).
However, if there's a value under TEST, then instead I want to see the most recent CONTACT_DATE that has a value, like below:
ID CONTACT_DATE TEST rn
01 2017-04-01 1 2
02 2017-10-01 1 3
03 2017-02-01 NULL 1
What can I do to filter for the most recent CONTACT_DATE that has a value under TEST, while still getting the most recent CONTACT_DATE if all values for that ID are NULL?
You can change your row_number ordering:
row_number() over(partition by c.ID
order by CASE WHEN d.TEST IS NOT NULL THEN 1 ELSE 2 END
, m.CONTACT_DATE desc
)
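A sketch of this ordering applied to the sample rows from the question. It assumes SQLite (3.25 or later) through Python's sqlite3 module and a single flat table, here called meetings (a hypothetical name), in place of the three joined tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table standing in for the COMMUNITY/MEETING/DISCUSSION join
conn.execute("CREATE TABLE meetings (ID TEXT, CONTACT_DATE TEXT, TEST INTEGER)")
conn.executemany(
    "INSERT INTO meetings VALUES (?, ?, ?)",
    [("01", "2017-05-01", None), ("01", "2017-04-01", 1), ("01", "2017-03-01", None),
     ("02", "2017-08-01", None), ("02", "2017-09-01", None), ("02", "2017-10-01", 1),
     ("03", "2017-02-01", None), ("03", "2017-01-01", None)],
)

query = """
SELECT ID, CONTACT_DATE, TEST
FROM (
    SELECT ID, CONTACT_DATE, TEST,
           ROW_NUMBER() OVER (
               PARTITION BY ID
               -- rows with a TEST value sort first, then newest date first
               ORDER BY CASE WHEN TEST IS NOT NULL THEN 1 ELSE 2 END,
                        CONTACT_DATE DESC
           ) AS rn
    FROM meetings
) AS T
WHERE rn = 1
ORDER BY ID
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
# ('01', '2017-04-01', 1)
# ('02', '2017-10-01', 1)
# ('03', '2017-02-01', None)
```

Because the CASE expression is the first sort key, a row with a TEST value always gets rn = 1 when one exists; IDs with only NULLs fall back to the most recent date.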

Sql query to Count Total Consecutive Years from latest year

I have a table Temp:
CREATE TABLE Temp
(
[ID] [int],
[Year] [INT],
)
ID Year
1 2016
1 2016
1 2015
1 2012
1 2011
1 2010
2 2016
2 2015
2 2014
2 2012
2 2011
2 2010
2 2009
3 2016
3 2015
3 2004
3 1999
4 2016
4 2015
4 2014
4 2010
5 2016
5 2014
5 2013
I want to calculate the total consecutive years starting from the most recent Year.
Result should look like this:
ID Total Consecutive Yrs
1 2
2 3
3 2
4 3
5 1
select ID,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year +1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
e.g. for ID=1:
1 2016 1 1
1 2015 2 2
1 2012 5 3
1 2011 6 4
1 2010 7 5
As long as there's no gap, both sequences increase the same.
Now check for equal sequences and count the rows:
with cte as
(
select ID,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year + 1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
)
select ID, count(*)
from cte
where x = rn -- no gap
group by ID
Edit:
Based on your year zero comment:
with cte as
(
select ID, year,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year + 1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
)
select ID,
-- remove the year zero from counting
sum(case when year <> 0 then 1 else 0 end)
from cte
where x = rn
group by ID
You can use LEAD to get these counts, as below:
Select top (1) with ties Id, RowN as [Total Consecutive Years] from (
Select *, Num = case when ([year]- lead(year) over(partition by Id order by [Year] desc) > 1) then 0 else 1 end
, RowN = Row_Number() over (partition by Id order by [Year] desc)
from temp
) a
where a.Num = 0
order by row_number() over(partition by Id order by RowN)
Output as below:
+----+-------------------------+
| Id | Total Consecutive Years |
+----+-------------------------+
| 1 | 2 |
| 2 | 3 |
| 3 | 2 |
| 4 | 3 |
| 5 | 1 |
+----+-------------------------+
You can do this using window functions:
select id, count(distinct year)
from (select t.*,
dense_rank() over (partition by id order by year + seqnum desc) as grp
from (select t.*,
dense_rank() over (partition by id order by year desc) as seqnum
from temp t
) t
) t
where grp = 1
group by id;
This assumes that "most recent year" is per id.
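Here is a sketch of the dense_rank() approach run against the sample data, including the duplicated 2016 row for ID 1. It assumes SQLite (3.25 or later) through Python's sqlite3 module; the table name years_by_id and the subquery aliases are hypothetical.

```python
import sqlite3

# Sample data from the question (ID 1 has 2016 twice)
data = {
    1: [2016, 2016, 2015, 2012, 2011, 2010],
    2: [2016, 2015, 2014, 2012, 2011, 2010, 2009],
    3: [2016, 2015, 2004, 1999],
    4: [2016, 2015, 2014, 2010],
    5: [2016, 2014, 2013],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE years_by_id (id INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO years_by_id VALUES (?, ?)",
    [(i, y) for i, years in data.items() for y in years],
)

query = """
SELECT id, COUNT(DISTINCT year) AS consecutive_years
FROM (
    SELECT id, year,
           -- year + seqnum stays constant across a run of consecutive years
           DENSE_RANK() OVER (PARTITION BY id ORDER BY year + seqnum DESC) AS grp
    FROM (
        SELECT id, year,
               DENSE_RANK() OVER (PARTITION BY id ORDER BY year DESC) AS seqnum
        FROM years_by_id
    ) AS ranked
) AS grouped
WHERE grp = 1
GROUP BY id
ORDER BY id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 2), (2, 3), (3, 2), (4, 3), (5, 1)]
```

Using dense_rank() together with count(distinct year) means the repeated 2016 row for ID 1 does not break the run or inflate the count.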
Gordon Linoff,
Your code is awesome!
Your code pulls consecutive years from the most recent year.
I modified it to pull overall max consecutive years.
Posted here in case anyone else needs it:
--overall max consecutive years
select id,max(yr_cnt) max_consecutive_years
from (
select id, grp,count(seqnum) yr_cnt
from (select t.*,
dense_rank() over (partition by id order by year + seqnum desc) as grp
from (select t.*,
dense_rank() over (partition by id order by year desc) as seqnum
from temp t
) t
) t
group by id,grp) t2
group by id;

SQL count changes in column

I need to count the changes in assigned group on a ticket. The problem is that my log also records changes of assignee within the same group.
Here is some sample data
ticket_id | assigned_group | assignee | date
----------------------------------------------------
1001 | group A | john | 1-1-15
1001 | group A | michael | 1-2-15
1001 | group A | jacob | 1-3-15
1001 | group B | eddie | 1-4-15
1002 | group A | john | 1-1-15
1002 | group B | eddie | 1-2-15
1002 | group A | john | 1-3-15
1002 | group B | eddie | 1-4-15
1002 | group A | john | 1-5-15
I need this to return
ticket_id | count
--------------------
1001 | 2
1002 | 4
My query is like this
select ticket_id, assigned_group, count(*) from mytable group by ticket_id, assigned_group
But that gives me
ticket_id | count
--------------------
1001 | 4
1002 | 5
edit:
Also if I use
select ticket_id, count(Distinct assigned_group) as [Count] from mytable group by ticket_id
I only get
ticket_id | count
--------------------
1001 | 2
1002 | 2
Any advice?
Use Distinct Count to get the result
select ticket_id, count(Distinct assigned_group) as [Count]
from mytable
group by ticket_id
Try this:
with temp as
(
select ticket_id, assigned_group, count(*) as count,date from mytable group by ticket_id, assigned_group,date
)
select ticket_id, count from temp
You can use the ROW_NUMBER() function to compare each record with the previous one.
with tbl as (select *, row_number() over (partition by ticket_id order by [date]) as rn from mytable)
select a.ticket_id,
-- count only the rows whose group differs from the previous row's group
sum(case when a.assigned_group <> b.assigned_group then 1 else 0 end) as No_of_change
from tbl as a
left join tbl as b
on a.ticket_id = b.ticket_id
and a.rn = b.rn + 1
group by a.ticket_id
If you are using SQL Server 2012 or later, you can use the LAG function to determine the previous assigned group easily. Then, if the previous assigned group is different from the current assigned group, you can increment the count, as below:
WITH previous_groups AS
(
SELECT
ticket_id,
assign_date,
assigned_group,
LAG(assigned_group, 1, NULL) OVER (PARTITION BY ticket_id ORDER BY assign_date) AS prev_assign_group
FROM mytable
)
SELECT
ticket_id,
SUM(CASE
WHEN assigned_group <> prev_assign_group THEN 1
ELSE 0
END) AS count
FROM previous_groups
WHERE prev_assign_group IS NOT NULL
GROUP BY ticket_id
ORDER BY ticket_id;
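A sketch of the LAG approach run against the sample data from the question. It assumes SQLite (3.25 or later) through Python's sqlite3 module, ISO date strings, and the column name assign_date used in the query above; the result alias changes is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(ticket_id INTEGER, assigned_group TEXT, assignee TEXT, assign_date TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1001, "group A", "john", "2015-01-01"),
     (1001, "group A", "michael", "2015-01-02"),
     (1001, "group A", "jacob", "2015-01-03"),
     (1001, "group B", "eddie", "2015-01-04"),
     (1002, "group A", "john", "2015-01-01"),
     (1002, "group B", "eddie", "2015-01-02"),
     (1002, "group A", "john", "2015-01-03"),
     (1002, "group B", "eddie", "2015-01-04"),
     (1002, "group A", "john", "2015-01-05")],
)

query = """
WITH previous_groups AS (
    SELECT ticket_id, assigned_group,
           LAG(assigned_group, 1, NULL)
               OVER (PARTITION BY ticket_id ORDER BY assign_date) AS prev_assign_group
    FROM mytable
)
SELECT ticket_id,
       SUM(CASE WHEN assigned_group <> prev_assign_group THEN 1 ELSE 0 END) AS changes
FROM previous_groups
WHERE prev_assign_group IS NOT NULL  -- skip the first row of each ticket
GROUP BY ticket_id
ORDER BY ticket_id
"""
change_counts = conn.execute(query).fetchall()
print(change_counts)  # [(1001, 1), (1002, 4)]
```

This counts transitions between groups: ticket 1001 moves from group A to group B once, and ticket 1002 switches group four times. Reassignments within the same group are ignored.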
If you are using SQL Server 2008 or earlier versions, then you need an extra step to determine the previous assigned group, as below:
WITH previous_assign_dates AS
(
SELECT
mt1.ticket_id,
mt1.assign_date,
MAX(mt2.assign_date) AS prev_assign_date
FROM mytable mt1
LEFT JOIN mytable mt2
ON mt1.ticket_id = mt2.ticket_id
AND mt2.assign_date < mt1.assign_date
GROUP BY
mt1.ticket_id,
mt1.assign_date
),
previous_groups AS
(
SELECT
mt1.*,
mt2.assigned_group AS prev_assign_group
FROM mytable mt1
INNER JOIN previous_assign_dates pad
ON mt1.ticket_id = pad.ticket_id
AND mt1.assign_date = pad.assign_date
LEFT JOIN mytable mt2
ON pad.ticket_id = mt2.ticket_id
AND pad.prev_assign_date = mt2.assign_date
)
SELECT
ticket_id,
SUM(CASE
WHEN assigned_group <> prev_assign_group THEN 1
ELSE 0
END) AS count
FROM previous_groups
WHERE prev_assign_group IS NOT NULL
GROUP BY ticket_id
ORDER BY ticket_id;
SQL Fiddle demo
References:
The LAG function on MSDN
Adding an ordinal number within the ticket, then a self join where the group is different and consecutive ordinals, should work:
SELECT t1.ticket_id, COUNT(*) FROM
(SELECT *, ROW_NUMBER() OVER(PARTITION BY ticket_id ORDER BY date) ordinal
FROM mytable) t1
JOIN
(SELECT *, ROW_NUMBER() OVER(PARTITION BY ticket_id ORDER BY date) ordinal FROM mytable) t2
ON t1.ticket_id=t2.ticket_id AND t1.assigned_group<>t2.assigned_group AND t1.ordinal+1=t2.ordinal
GROUP BY t1.ticket_id