Is there a way to partition by incremental series in PostgreSQL? - sql

In PostgreSQL, is there a way to obtain the result below by using partition by or any other approach?
last_name year increment partition
Doe 2000 1 1
Doe 2001 2 1
Doe 2002 3 1
Doe 2003 -1 2
Doe 2004 1 3
Doe 2005 2 3
Doe 2006 3 3
Doe 2007 -1 4
Doe 2008 -2 4

SELECT last_name,
year,
increment,
SUM(CASE WHEN increment < 0 THEN 1 ELSE 0 END) OVER (PARTITION BY last_name ORDER BY year) AS partition
FROM your_table
ORDER BY last_name, year;

It seems that you want to group consecutive positive/negative values together. One option is to use the difference between two row_number functions; this creates the partitions, but with unordered group numbers.
select *,
row_number() over (partition by last_name order by year) -
row_number() over (partition by last_name,
case when increment>=0 then 1 else 2 end order by year) as prt
from tbl
order by last_name, year
If you want the partitions in order (1, 2, 3...), you could try another approach using lag and a running sum, as follows:
select last_name, year, increment,
1 + sum(case when sign(increment) <> sign(pre_inc) then 1 else 0 end) over
(partition by last_name order by year) as prt
from
(
select *,
lag(increment, 1 , increment) over
(partition by last_name order by year) pre_inc
from tbl
) t
order by last_name, year
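The lag and running sum approach above can be checked against the sample data from the question. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (assuming a SQLite build with window function support, 3.25 or later), the table name tbl is taken from the answer, and sign() is replaced by a comparison of (increment < 0) so the query does not depend on SQLite's math functions.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (last_name TEXT, year INTEGER, increment INTEGER)")

# Sample rows from the question: years 2000..2008 for "Doe".
increments = [1, 2, 3, -1, 1, 2, 3, -1, -2]
rows = [("Doe", 2000 + i, inc) for i, inc in enumerate(increments)]
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)

query = """
SELECT last_name, year, increment,
       -- start a new group whenever the sign differs from the previous row
       1 + SUM(CASE WHEN (increment < 0) <> (pre_inc < 0) THEN 1 ELSE 0 END)
           OVER (PARTITION BY last_name ORDER BY year) AS prt
FROM (
    SELECT *,
           -- the default of LAG is the row's own value, so the first row never counts as a change
           LAG(increment, 1, increment)
               OVER (PARTITION BY last_name ORDER BY year) AS pre_inc
    FROM tbl
) AS t
ORDER BY last_name, year
"""
partitions = [row[3] for row in conn.execute(query)]
print(partitions)
```

With the sample rows, the prt column should match the partition column requested in the question.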

Each row is flagged 1 if its increment is greater than or equal to the previous year's increment, and 0 otherwise. A second "LAG" then compares each flag with the previous one, and a running sum of the changes numbers the groups of successive rows, regardless of whether the increment is positive or negative.
with cte as (
select * ,
row_number() over (partition by last_name order by year) as row_num,
case when increment >= LAG(increment,1,0) over (partition by last_name order by year)
then 1 else 0 end rank_num
from mytable
),
cte2 as (
select *, LAG(rank_num,1,1) over (partition by last_name order by year) as pre
from cte
order by year
)
select last_name, year, increment, 1+sum(case when pre <> rank_num then 1 else 0 end) over
(partition by last_name order by year) as partition
from cte2;

Related

Selecting rows that have row_number more than 1

I have the following table (using BigQuery):
id year month sales row_number
111 2020 11 1000 1
111 2020 12 2000 2
112 2020 11 3000 1
113 2020 11 1000 1
Is there a way in which I can select rows that have row numbers more than one?
For example, my desired output is:
id year month sales row_number
111 2020 11 1000 1
111 2020 12 2000 2
I don't want to just exclusively select rows with row_number = 2 but also row_number = 1 as well.
The original code block I used for the first table result is:
SELECT
id,
year,
month,
SUM(sales) AS sales,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id ASC) AS row_number
FROM
table
GROUP BY
id, year, month
You can use window functions:
select t.* except (cnt)
from (select t.*,
count(*) over (partition by id) as cnt
from t
) t
where cnt > 1;
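To see the count(*) over (partition by id) filter at work on the question's sample rows, here is a small sketch. It runs on Python's built-in sqlite3 module rather than BigQuery (an assumption made so the example is runnable anywhere), so the BigQuery-only "except (cnt)" is replaced by an explicit column list; the table name t follows the answer.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, year INTEGER, month INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (111, 2020, 11, 1000),
        (111, 2020, 12, 2000),
        (112, 2020, 11, 3000),
        (113, 2020, 11, 1000),
    ],
)

query = """
SELECT id, year, month, sales
FROM (
    -- cnt holds the number of rows sharing the same id
    SELECT t.*, COUNT(*) OVER (PARTITION BY id) AS cnt
    FROM t
) AS counted
WHERE cnt > 1
ORDER BY id, year, month
"""
kept = conn.execute(query).fetchall()
print(kept)
```

Only the two rows for id 111 survive the filter, because 112 and 113 each appear once.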
As applied to your aggregation query:
SELECT iym.* EXCEPT (cnt)
FROM (SELECT id, year, month,
SUM(sales) as sales,
ROW_NUMBER() OVER (Partition by id ORDER BY id ASC) AS row_number,
COUNT(*) OVER(Partition by id ORDER BY id ASC) AS cnt
FROM table
GROUP BY id, year, month
) iym
WHERE cnt > 1;
You can wrap your query as in below example
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (YOUR_ORIGINAL_QUERY)
)
where flag
so it can look as
select * except(flag) from (
select *, countif(row_number > 1) over(partition by id) > 0 flag
from (
SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month
)
)
where flag
so when applied to sample data in your question - it will produce below output
Try this:
with tmp as (SELECT id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(Partition by id ORDER BY id ASC) AS row_number
FROM table
GROUP BY id, year, month)
select * from tmp a where exists ( select 1 from tmp b where a.id = b.id and b.row_number =2)
This is a straightforward use of an EXISTS subquery.
This is what I use. It's similar to @ElapsedSoul's answer, but from my understanding "IN" is better than "EXISTS" for a static list. I'm not sure whether the performance difference, if any, is significant:
Difference between EXISTS and IN in SQL?
WITH T1 AS
(
SELECT
id,
year,
month,
SUM(sales) as sales,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY id ASC) AS ROW_NUM
FROM table
GROUP BY id, year, month
)
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T1 WHERE ROW_NUM > 1);

Convert number sequence format so that it is hyphenated

I have a sequence of numbers that need to be rendered with a hyphen but not sure how best to do this from the SQL database selection.
The expected result:
Peter: 1,3-7,10,11,13
Andrew: 1-3
Paul: 1-3
An example of the data from the table (small selection):
NAME #
Peter 1
Andrew 1
Paul 1
Andrew 2
Paul 2
Peter 3
Andrew 3
Paul 3
Peter 4
Peter 5
Peter 6
Peter 7
This is part gaps-and-islands and part string aggregation. This identifies the groupings:
select name,
(case when min(number) = max(number)
then convert(varchar(max), min(number))
else concat(min(number), '-', max(number))
end) as range
from (select name, number,
row_number() over (partition by name order by number) as seqnum
from t
) t
group by name, (number - seqnum);
With this you can add an additional level of aggregation to get the final result:
select name,
string_agg(range, ',') within group (order by min_number) as col
from (select name, min(number) as min_number,
(case when min(number) = max(number)
then convert(varchar(max), min(number))
else concat(min(number), '-', max(number))
end) as range
from (select name, number,
row_number() over (partition by name order by number) as seqnum
from t
) t
group by name, (number - seqnum)
) n
group by name;
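The two steps above (find the islands, then join them into one string per name) can be sketched as follows. This is an illustration under assumptions, not the SQL Server query itself: it uses Python's built-in sqlite3 module for the island detection and does the final string aggregation in Python, since ordered string aggregation syntax differs between databases. Only the rows listed in the question's small sample are loaded.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, number INTEGER)")
data = {"Peter": [1, 3, 4, 5, 6, 7], "Andrew": [1, 2, 3], "Paul": [1, 2, 3]}
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(name, n) for name, numbers in data.items() for n in numbers],
)

# number - seqnum is constant within a run of consecutive numbers,
# so grouping by it yields one row per island.
islands_query = """
SELECT name, MIN(number) AS lo, MAX(number) AS hi
FROM (
    SELECT name, number,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY number) AS seqnum
    FROM t
) AS numbered
GROUP BY name, number - seqnum
ORDER BY name, lo
"""
ranges = defaultdict(list)
for name, lo, hi in conn.execute(islands_query):
    # a single-number island is printed alone, otherwise as "lo-hi"
    ranges[name].append(str(lo) if lo == hi else f"{lo}-{hi}")

result = {name: ",".join(parts) for name, parts in ranges.items()}
print(result)
```

For the sample rows this gives "1,3-7" for Peter and "1-3" for both Andrew and Paul.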

How to calculate unique rank in SQL Server (without any duplication)?

I want to calculate unique rankings but I get duplicate rankings
Here's my attempt:
SELECT
TG.EMPCODE,
DENSE_RANK() OVER (ORDER BY TS.COUNT_DEL DESC, TG.COUNT_TG DESC) AS YOUR_RANK
FROM
(SELECT
EmpCode,
SUM(CASE WHEN Tgenerate = 1 THEN 1 ELSE 0 END) AS COUNT_TG
FROM
TBLTGENERATE1
GROUP BY
EMPCODE) TG
INNER JOIN
(SELECT
EMP_CODE,
SUM(CASE WHEN STATUS = 'DELIVERED' THEN 1 ELSE 0 END) AS COUNT_DEL
FROM
TBLSTAT
GROUP BY
EMP_CODE) TS ON TG.EMPCODE = TS.EMP_CODE;
The output I get is like this:
EID Rank
---------
102 1
105 2
101 2
103 3
106 4
The same rank is given to both 105 and 101.
How do I calculate unique ranking?
Use ROW_NUMBER() instead of DENSE_RANK():
SELECT TG.EMPCODE,
ROW_NUMBER() OVER (ORDER BY TS.COUNT_DEL DESC, TG.COUNT_TG DESC) AS YOUR_RANK
Ties will then be given sequential rankings.
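The difference between the two functions on tied rows can be seen in a small sketch. The scores table and its values are hypothetical, chosen so that employees 105 and 101 tie as in the question, and Python's built-in sqlite3 module stands in for SQL Server. The extra empcode sort key is an addition of this sketch to make the order of tied rows deterministic; without it, the database may number tied rows in either order.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical pre-aggregated counts; 105 and 101 tie on both columns.
conn.execute("CREATE TABLE scores (empcode INTEGER, count_del INTEGER, count_tg INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(102, 9, 4), (105, 7, 3), (101, 7, 3), (103, 5, 2), (106, 1, 1)],
)

query = """
SELECT empcode,
       DENSE_RANK() OVER (ORDER BY count_del DESC, count_tg DESC) AS dense_rnk,
       -- empcode is an extra tie-breaker added for a repeatable order
       ROW_NUMBER() OVER (ORDER BY count_del DESC, count_tg DESC, empcode) AS row_num
FROM scores
ORDER BY row_num
"""
rows = conn.execute(query).fetchall()
dense_ranks = [r[1] for r in rows]
row_numbers = [r[2] for r in rows]
print(rows)
```

DENSE_RANK repeats the value 2 for the tied employees, while ROW_NUMBER assigns every row a distinct number.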

SQL query to count total consecutive years from the latest year

I have a table Temp:
CREATE TABLE Temp
(
[ID] [int],
[Year] [INT],
)
ID Year
1 2016
1 2016
1 2015
1 2012
1 2011
1 2010
2 2016
2 2015
2 2014
2 2012
2 2011
2 2010
2 2009
3 2016
3 2015
3 2004
3 1999
4 2016
4 2015
4 2014
4 2010
5 2016
5 2014
5 2013
I want to calculate the total consecutive years starting from the most recent Year.
Result should look like this:
ID Total Consecutive Yrs
1 2
2 3
3 2
4 3
5 1
select ID, year,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year +1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
e.g. for ID=1:
1 2016 1 1
1 2015 2 2
1 2012 5 3
1 2011 6 4
1 2010 7 5
As long as there's no gap, both sequences increase the same.
Now check for equal sequences and count the rows:
with cte as
(
select ID,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year + 1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
)
select ID, count(*)
from cte
where x = rn -- no gap
group by ID
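Here is a runnable sketch of the same idea against the question's data, using Python's built-in sqlite3 module in place of SQL Server (an assumption). One addition that is not in the answer: the sample contains 2016 twice for ID 1, and a duplicate year makes row_number() run ahead of the year-based sequence, so this sketch removes duplicates with SELECT DISTINCT first.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Temp (ID INTEGER, Year INTEGER)")
years_by_id = {
    1: [2016, 2016, 2015, 2012, 2011, 2010],
    2: [2016, 2015, 2014, 2012, 2011, 2010, 2009],
    3: [2016, 2015, 2004, 1999],
    4: [2016, 2015, 2014, 2010],
    5: [2016, 2014, 2013],
}
conn.executemany(
    "INSERT INTO Temp VALUES (?, ?)",
    [(i, y) for i, years in years_by_id.items() for y in years],
)

query = """
WITH distinct_years AS (
    -- added in this sketch: drop duplicate (ID, Year) pairs
    SELECT DISTINCT ID, Year FROM Temp
),
cte AS (
    SELECT ID,
           -- distance from the latest year, plus one
           FIRST_VALUE(Year) OVER (PARTITION BY ID ORDER BY Year DESC) - Year + 1 AS x,
           -- position of the row when counting down from the latest year
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Year DESC) AS rn
    FROM distinct_years
)
SELECT ID, COUNT(*) FROM cte WHERE x = rn GROUP BY ID ORDER BY ID
"""
totals = dict(conn.execute(query).fetchall())
print(totals)
```

The counts agree with the result table requested in the question.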
Edit:
Based on your year zero comment:
with cte as
(
select ID, year,
-- returns a sequence without gaps for consecutive years
first_value(year) over (partition by ID order by year desc) - year + 1 as x,
-- returns a sequence without gaps
row_number() over (partition by ID order by year desc) as rn
from Temp
)
select ID,
-- remove the year zero from counting
sum(case when year <> 0 then 1 else 0 end)
from cte
where x = rn
group by ID
You can use lead to get these counts, as below:
Select top (1) with ties Id, RowN as [Total Consecutive Years] from (
Select *, Num = case when ([year]- lead(year) over(partition by Id order by [Year] desc) > 1) then 0 else 1 end
, RowN = Row_Number() over (partition by Id order by [Year] desc)
from temp
) a
where a.Num = 0
order by row_number() over(partition by Id order by RowN)
Output as below:
+----+-------------------------+
| Id | Total Consecutive Years |
+----+-------------------------+
| 1 | 2 |
| 2 | 3 |
| 3 | 2 |
| 4 | 3 |
| 5 | 1 |
+----+-------------------------+
You can do this using window functions:
select id, count(distinct year)
from (select t.*,
dense_rank() over (partition by id order by year + seqnum desc) as grp
from (select t.*,
dense_rank() over (partition by id order by year desc) as seqnum
from temp t
) t
) t
where grp = 1
group by id;
This assumes that "most recent year" is per id.
Gordon Linoff,
Your code is awesome!
Your code pulls consecutive years from the most recent year.
I modified it to pull overall max consecutive years.
Posted here in case anyone else needs it:
--overall max consecutive years
select id,max(yr_cnt) max_consecutive_years
from (
select id, grp,count(seqnum) yr_cnt
from (select t.*,
dense_rank() over (partition by id order by year + seqnum desc) as grp
from (select t.*,
dense_rank() over (partition by id order by year desc) as seqnum
from temp t
) t
) t
group by id,grp) t2
group by id;

Find date sequence in SQL Server

I'm trying to find the maximum sequence of days by customer in my data.
I want to find the longest run of consecutive days for a specific customer. If someone used my app on 25/08/16, 26/08/16, 27/08/16, 01/09/16 and 02/09/16, the max sequence will be 3 days (25, 26, 27).
In the end (The output) I want to get two fields: custid | MaxDaySequence
I have the following fields in my data table:
custid | orderdate (timestamp)
For example:
custid orderdate
1 25/08/2007
1 03/10/2007
1 13/10/2007
1 15/01/2008
1 16/03/2008
1 09/04/2008
2 18/09/2006
2 08/08/2007
2 28/11/2007
2 04/03/2008
3 27/11/2006
3 15/04/2007
3 13/05/2007
3 19/06/2007
3 22/09/2007
3 25/09/2007
3 28/01/2008
I'm using SQL Server 2014.
Thanks
There is a trick: if you have an incrementing number ordered by your date, then subtracting that number of days from each date gives the same value for all dates in a consecutive run. So like this:
SELECT custid,
min(orderdate) as start_of_group,
max(orderdate) as end_of_group,
count(*) as num_days
FROM (
SELECT custid, orderdate,
ROW_NUMBER() OVER (PARTITION BY custid ORDER BY orderdate) as rn
FROM your_table -- placeholder name: the FROM clause was missing in the original
) x
GROUP BY custid, dateadd(day, - rn, orderdate);
You could take the result of this and pull out the max number of days to solve your problem:
SELECT custid, max(num_days) as longest
FROM (
SELECT custid,
count(*) as num_days
FROM (
SELECT custid, orderdate,
ROW_NUMBER() OVER (PARTITION BY custid ORDER BY orderdate) as rn
FROM your_table -- placeholder name: the FROM clause was missing in the original
) x
GROUP BY custid, dateadd(day, - rn, orderdate)
) y
GROUP BY custid
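The date-minus-row-number trick can be tried on the five dates from the question's example. This sketch makes several assumptions: it runs on Python's built-in sqlite3 module instead of SQL Server, so dateadd is replaced by SQLite's date() function with a day modifier; the table name orders is hypothetical; dates are stored as ISO text; and each customer has at most one row per day.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
# "orders" is a hypothetical table name; dates are ISO-formatted text.
conn.execute("CREATE TABLE orders (custid INTEGER, orderdate TEXT)")
dates = ["2016-08-25", "2016-08-26", "2016-08-27", "2016-09-01", "2016-09-02"]
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, d) for d in dates])

query = """
SELECT custid, MAX(num_days) AS longest
FROM (
    SELECT custid, COUNT(*) AS num_days
    FROM (
        SELECT custid, orderdate,
               ROW_NUMBER() OVER (PARTITION BY custid ORDER BY orderdate) AS rn
        FROM orders
    ) AS x
    -- orderdate minus rn days is identical for all dates in one consecutive run
    GROUP BY custid, DATE(orderdate, '-' || rn || ' days')
) AS y
GROUP BY custid
"""
longest = dict(conn.execute(query).fetchall())
print(longest)
```

Customer 1 has runs of 3 days and 2 days, so the longest sequence is 3, as the question expects.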
If you want to solve it with MySQL:
select user_id,max(num_days) as longest
from(
select user_id, count(*) as num_days
from
(
SELECT (CASE a1.user_id
WHEN @curType
THEN @curRow := @curRow + 1
ELSE @curRow := 1 AND @curType := a1.user_id END
) AS rank,
a1.user_id,
a1.last_update as dat
FROM (select a2.user_id,left(FROM_UNIXTIME(a2.last_update),10) as 'last_update'
from visits as a2 group by 1,2) as a1 ,
(SELECT @curRow := 0, @curType := '') r
ORDER BY a1.user_id DESC, dat) x
group by user_id, DATE_ADD(dat,INTERVAL -rank day)
) y
group by 1
order by longest desc