Find Customers With 4 Consecutive Years of Giving (Including Gaps) - sql

I have a table similar to below:
+------------+-----------+
| CustomerID | OrderYear |
+------------+-----------+
| 1 | 2012 |
| 1 | 2013 |
| 1 | 2014 |
| 1 | 2017 |
| 1 | 2018 |
| 2 | 2012 |
| 2 | 2013 |
| 2 | 2014 |
| 2 | 2015 |
| 2 | 2017 |
+------------+-----------+
How would I identify which CustomerIDs have 4 consecutive years of giving? (In the above, only customer 2.) As you can see, some records will have gaps in order years.
I started down the row of trying to utilize some combination of ROW_NUMBER/LAG/LEAD with no luck to this point.
Very paired down/modified attempt...
WITH CTE
AS
(
SELECT T.ConstituentLookupID,
T.FISCALYEAR,
COUNT(T.FISCALYEAR) OVER (PARTITION BY T.ConstituentLookupID) AS
YearCount,
FIRST_VALUE(T.FISCALYEAR) OVER(PARTITION BY T.ConstituentLookupID ORDER
BY T.FISCALYEAR DESC) - T.FISCALYEAR + 1 as X,
ROW_NUMBER() OVER(PARTITION BY T.ConstituentLookupID ORDER BY
T.FISCALYEAR DESC) AS RN
FROM #Temp AS T)
SELECT CTE.ConstituentLookupID,
CTE.FISCALYEAR,
CTE.YearCount,
CTE.X,
CTE.RN,
FROM CTE
WHERE CTE.YearCount >= 4 --Have at least 4 years of giving
AND CTE.X - CTE.RN = 1 --Some kind of way to calculate consecutive years. Doesnt account current year and gaps...;

Assuming no duplicates, you can use lag():
select distinct customerid
from (
select t.*,
lag(orderyear, 3) over(partition by customerid order by orderyear) oderyear3
from mytable t
) t
where orderyear = orderyear3 + 3
A more conventional approach is to use some gaps-and-islands technique. This is convenient if you want the start and end of each series. Here, an island is a series of rows with "adjacent" order years, and you want islands that are at least 4 years long. We can identify the islands by comparing the order year against an incrementing sequence, then use aggregation:
select customerid, min(orderyear) firstorderyear, max(orderyear) lastorderyear
from (
select t.*,
row_number() over(partition by customerid order by orderyear) rn
from mytable t
) t
group by customerid, orderyear - rn
having count(*) >= 4

Assuming you have no more than one row per customer and year, the simplest method is lag():
select customerid, year
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = year - 3;
The idea is to look 3 years back. If that year is year - 3, then there are four years in a row. If your data can have duplicates, there are tweaks to the logic (they make the query more slightly more complicated).
This could return duplicates, so you might just want:
select distinct customerid
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = year - 3;

I have a simple solution using row number and group by
SELECT Max(z.customerid),
Count(z.grp)
FROM (SELECT customerid,
orderyear,
orderyear - Row_number()
OVER (
ORDER BY customerid) AS Grp
FROM mytable)z
GROUP BY z.grp
HAVING Count(z.grp) = 4

Related

How to SELECT in SQL based on a value from the same table column?

I have the following table
| id | date | team |
|----|------------|------|
| 1 | 2019-01-05 | A |
| 2 | 2019-01-05 | A |
| 3 | 2019-01-01 | A |
| 4 | 2019-01-04 | B |
| 5 | 2019-01-01 | B |
How can I query the table to receive the most recent values for the teams?
For example, the result for the above table would be ids 1,2,4.
In this case, you can use window functions:
select t.*
from (select t.*, rank() over (partition by team order by date desc) as seqnum
from t
) t
where seqnum = 1;
In some databases a correlated subquery is faster with the right indexes (I haven't tested this with Postgres):
select t.*
from t
where t.date = (select max(t2.date) from t t2 where t2.team = t.team);
And if you wanted only one row per team, then the canonical answer is:
select distinct on (t.team) t.*
from t
order by t.team, t.date desc;
However, that doesn't work in this case because you want all rows from the most recent date.
If your dataset is large, consider the max analytic function in a subquery:
with cte as (
select
id, date, team,
max (date) over (partition by team) as max_date
from t
)
select id
from cte
where date = max_date
Notionally, max is O(n), so it should be pretty efficient. I don't pretend to know the actual implementation on PostgreSQL, but my guess is it's O(n).
One more possibility, generic:
select * from t join (select max(date) date,team from t
group by team) tt
using(date,team)
Window function is the best solution for you.
select id
from (
select team, id, rank() over (partition by team order by date desc) as row_num
from table
) t
where row_num = 1
That query will return this table:
| id |
|----|
| 1 |
| 2 |
| 4 |
If you to get it one row per team, you need to use array_agg function.
select team, array_agg(id) ids
from (
select team, id, rank() over (partition by team order by date desc) as row_num
from table
) t
where row_num = 1
group by team
That query will return this table:
| team | ids |
|------|--------|
| A | [1, 2] |
| B | [4] |

get the id based on condition in group by

I'm trying to create a sql query to merge rows where there are equal dates. the idea is to do this based on the highest amount of hours, so that i in the end gets the corresponding id for each date with the highest amount of hours. i've been trying to do with a simple group by, but does not seem to work, since i CANT just put a aggregate function on id column, since it should be based the hours condition
+------+-------+--------------------------------------+
| id | date | hours |
+------+-------+--------------------------------------+
| 1 | 2012-01-01 | 37 |
| 2 | 2012-01-01 | 10 |
| 3 | 2012-01-01 | 5 |
| 4 | 2012-01-02 | 37 |
+------+-------+--------------------------------------+
desired result
+------+-------+--------------------------------------+
| id | date | hours |
+------+-------+--------------------------------------+
| 1 | 2012-01-01 | 37 |
| 4 | 2012-01-02 | 37 |
+------+-------+--------------------------------------+
If you want exactly one row -- even if there are ties -- then use row_number():
select t.*
from (select t.*, row_number() over (partition by date order by hours desc) as seqnum
from t
) t
where seqnum = 1;
Ironically, both Postgres and Oracle (the original tags) have what I would consider to be better ways of doing this, but they are quite different.
Postgres:
select distinct on (date) t.*
from t
order by date, hours desc;
Oracle:
select date, max(hours) as hours,
max(id) keep (dense_rank first over order by hours desc) as id
from t
group by date;
Here's one approach using row_number:
select id, dt, hours
from (
select id, dt, hours, row_number() over (partition by dt order by hours desc) rn
from yourtable
) t
where rn = 1
You can use subquery with correlation approach :
select t.*
from table t
where id = (select t1.id
from table t1
where t1.date = t.date
order by t1.hours desc
limit 1);
In Oracle you can use fetch first 1 row only in subquery instead of LIMIT clause.

Retrieve rows in SQL based on Maximum value in multiple columns

I have a SQL table with the following fields:
Company ID
Company Name
Fiscal Year
Fiscal Quarter
There are multiple records for various fiscal years and fiscal quarters for each company. I want to retrieve the rows for each company based on Maximum Fiscal Year and Maximum Fiscal Quarter. For example, if the table has the following:
Company ID | Company Name | Fiscal Year | Fiscal Quarter
1 | Test1 | 2017 | 1
1 | Test1 | 2017 | 2
1 | Test1 | 2018 | 1
1 | Test1 | 2018 | 2
2 | Test2 | 2018 | 3
2 | Test2 | 2018 | 4
The query should return the following (Only the record with the maximum fiscal year and maximum fiscal quarter for that year):
Company ID | Company Name | Fiscal Year | Fiscal Quarter
1 | Test1 | 2018 | 2
2 | Test2 | 2018 | 4
I am able to use the below query to get the records with the maximum fiscal year but not sure how to further select the maximum quarter within the year:
SELECT fp.companyId, fp.companyname, fp.fiscalyear,fp.fiscalquarter
FROM dbo.ciqFinPeriod fp
LEFT OUTER JOIN dbo.ciqFinPeriod fp2
ON (fp.companyId = fp2.companyId AND fp.fiscalyear < fp2.fiscalyear)
WHERE fp2.companyId IS NULL
Thank you so much for any assistance!
If you have a list of companies, I would simply do:
select fp.*
from Companies c outer apply
(select top (1) fp.*
from dbo.ciqFinPeriod fp
where fp.companyId = c.companyId
order by fp.fiscalyear desc, fp.fiscalquarter desc
) fp;
If not, then row_number() is probably the simplest method:
select fp.*
from (select fp.*,
row_number() over (partition by fp.companyId order by order by fp.fiscalyear desc, fp.fiscalquarter desc) as seqnum
from dbo.ciqFinPeriod fp
) fp
where seqnum = 1;
Or the somewhat more abstruse (clever ?):
select top (1) with ties fp.*
from dbo.ciqFinPeriod fp
order by row_number() over (partition by fp.companyId order by order by fp.fiscalyear desc, fp.fiscalquarter desc)
I've had some success with the following, same output as you.
create table #table
(
CompanyID int,
CompanyName varchar(200),
Year int,
Quater int
)
insert into #table (CompanyID,CompanyName,Year,Quater)
VALUES
('1','Test1','2017','1'),
('1','Test1','2017','2'),
('1','Test1','2018','1'),
('1','Test1','2018','2'),
('2','Test2','2018','3'),
('2','Test2','2018','4')
SELECT CompanyID,CompanyName,Year,Quater
FROM
(
Select CompanyID,CompanyName,Year,Quater
, ROW_NUMBER() OVER(PARTITION BY CompanyID ORDER BY Year desc,Quater DESC)
as RowNum
from #table
) X WHERE RowNum = 1
drop table #table
Select Company I'd, company name,Max(year),Max(quarter) group by 1,2

Need to sum all Most Recent Rows from each Store that have ItemID

I have a table with, among other things, these columns: DateTransferred, ComputedQuantity, StoreID, ItemID
I have two goals. My simpler goal is to write a query where I feel in the ItemID and it sums up the ComputedQuantity where it matches that ItemID, only using the most recent DateTransferred for each StoreID. So with the following example data:
DateTransferred | StoreID | ItemID | ComputedQuantity
11/10/17 | 1 | 1 | 3 <
10/10/17 | 1 | 1 | 4
09/10/17 | 2 | 1 | 9 <
08/10/17 | 3 | 1 | 1 <
07/10/17 | 3 | 1 | 10
I would want it to pull every row with < next to it, as that's the most recent Date for that StoreID, and sum up to 13
My more complicated goal is that I would like to include the above-calculated value into a 'join' where I'm dealing with the Item table, so that I can pull all the items and join them with a new column which has the summed up ComputedQuantity
This is on SQL Server 10 on Windows Server 2008, if that matters
One simple method uses a correlated subquery:
select t.*
from t
where t. DateTransferred = (select max(t2.DateTransferred)
from t t2
where t2.storeid = t.storeid
);
Another even simpler method uses window functions:
select t.*
from (select t.*,
row_number() over (partition by storeid order by DateTransferred desc) as seqnum
from t
) t
where seqnum = 1;
In either case, you can add a where clause to the subquery if you want the most recent date on or before some given date (say a year ago).
Also, these both assume that your data has no future dates. If so, then add where DateTransferred < getdate().
The final statement which sums the ComputedQuantities:
select ItemID, SUM(ComputedQuantity) Quantity
from (select t.*,
row_number() over (partition by StoreID, ItemID order by DateTransferred DESC) as seqnum
from [db].[dbo].[InventoryTransferLog] t
) t
where seqnum = 1 and ComputedQuantity > 0
GROUP BY ItemID
ORDER BY ItemID
I decided not to sum values < 0

Rank function for date in Oracle SQL

I have the following code for example:
SELECT id, order_day, purchase_id FROM d
customer_id and purchase_id are unique. Each customer_id could have multiple purchase_id. Assume every one has made at least 5 orders.
Now, I just want to pull the first 5 purchase IDs of each customers ID (this depends on the earliest dates of purchases). I want the result to look like this:
id | purchase_id | rank
-------------------------
A | WERFEW43 | 1
A | ERTGDSFV | 3
A | FDGRT45 | 2
A | BRTE4TEW | 4
A | DFGDV | 5
B | DSFSF | 1
B | CF345 | 2
B | SDFSDFSDFS | 4
I thought of Ranking order_day, but my knowledge is not good enough to pull this off.
select id,purchase_id, rank() over (order by order_day)
from d
you also can try dense_rank() over (order by order_day) and row_number() over (order by order_day) and choose which one will be more suitable for you
select *
from
( SELECT
id
,order_day
,purchase_id
,row_number() -- ranking
over (partition by id -- each customer
order by order_day) as rn -- based on oldest dates
FROM d
) as dt
where rn <= 5