How to create GROUP BY on min and max date - sql

I have a database table like this
emp_id start_date end_date title location
111 1-JAN-2000 31-DEC-2003 MANAGER NYO
111 1-JAN-2003 31-DEC-2005 MANAGER BOM
111 1-JAN-2006 31-DEC-2007 CFO NYO
111 1-JAN-2008 31-DEC-2015 MANAGER NYO
I have already written a SQL query with GROUP BY and the MIN and MAX functions:
select emp_id,min(start_date),max(end_date),title
from table1
group by emp_id,title
What I expect is this:
111 1-JAN-2000 31-DEC-2005 MANAGER
111 1-JAN-2006 31-DEC-2007 CFO
111 1-JAN-2008 31-DEC-2015 MANAGER
What I am getting is:
111 1-JAN-2000 31-DEC-2015 MANAGER
111 1-JAN-2006 31-DEC-2007 CFO

This is a type of gaps-and-islands problem with date chains. I would suggest using a left join to find where the islands start, then a cumulative sum and aggregation:
select emp_id, title, min(start_date), max(end_date)
from (select t.*,
             -- a row with no predecessor ending the day before starts a new island
             sum(case when tprev.emp_id is null then 1 else 0 end) over
                 (partition by t.emp_id, t.title order by t.start_date) as grp
      from table1 t left join
           table1 tprev
           on t.emp_id = tprev.emp_id and
              t.title = tprev.title and
              t.start_date = tprev.end_date + 1
     ) t
group by grp, emp_id, title;
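To make the left-join idea concrete, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. SQLite is only an assumption for the demo (the question does not name an engine), so `tprev.end_date + 1` becomes `date(tprev.end_date, '+1 day')` and the dates are stored as ISO strings. The first row's end date is also assumed to be 2002-12-31 so that each row starts exactly one day after its predecessor ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (emp_id INTEGER, start_date TEXT, end_date TEXT, "
    "title TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        # Assumption: the first row ends on 2002-12-31 so the chain has no overlap.
        (111, "2000-01-01", "2002-12-31", "MANAGER", "NYO"),
        (111, "2003-01-01", "2005-12-31", "MANAGER", "BOM"),
        (111, "2006-01-01", "2007-12-31", "CFO", "NYO"),
        (111, "2008-01-01", "2015-12-31", "MANAGER", "NYO"),
    ],
)

islands = conn.execute(
    """
    SELECT emp_id, title, MIN(start_date) AS first_start, MAX(end_date) AS last_end
    FROM (
        SELECT t.*,
               -- 1 when no row with the same emp_id/title ends the day before
               SUM(CASE WHEN tprev.emp_id IS NULL THEN 1 ELSE 0 END)
                   OVER (PARTITION BY t.emp_id, t.title
                         ORDER BY t.start_date) AS grp
        FROM table1 t
        LEFT JOIN table1 tprev
               ON t.emp_id = tprev.emp_id
              AND t.title = tprev.title
              AND t.start_date = date(tprev.end_date, '+1 day')
    ) AS s
    GROUP BY emp_id, title, grp
    ORDER BY first_start
    """
).fetchall()

for row in islands:
    print(row)
```

The running sum only increases on rows that start an island, so every row of one island shares the same grp value within its emp_id and title.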

Try the following: use window functions to find where the sequence of titles breaks, and make that the group.
with cte1 as
(
select a.*,
       row_number() over(partition by emp_id, title order by start_date) rn,
       row_number() over(partition by emp_id order by start_date) rn1
from table_name a
)
select emp_id,
       min(start_date),
       max(end_date),
       title
from cte1
group by emp_id, title, rn1 - rn
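One thing to watch with the row-number difference: on its own it can collide between different titles, so title probably belongs in the GROUP BY. The sketch below, again using SQLite via Python's sqlite3 as an assumed stand-in engine and three hypothetical rows (MANAGER, then CFO, then MANAGER), compares grouping with and without title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (emp_id INTEGER, start_date TEXT, "
    "end_date TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows chosen so that two different titles share rn1 - rn.
        (111, "2000-01-01", "2002-12-31", "MANAGER"),
        (111, "2003-01-01", "2005-12-31", "CFO"),
        (111, "2006-01-01", "2008-12-31", "MANAGER"),
    ],
)

BASE = """
WITH cte1 AS (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY emp_id, title ORDER BY start_date) AS rn,
           ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY start_date) AS rn1
    FROM table_name a
)
SELECT emp_id, MIN(start_date), MAX(end_date), MAX(title)
FROM cte1
GROUP BY {keys}
ORDER BY MIN(start_date)
"""

# Grouping on the difference alone merges the CFO row with the second MANAGER row.
without_title = conn.execute(BASE.format(keys="emp_id, rn1 - rn")).fetchall()
# Adding title keeps the three stints apart.
with_title = conn.execute(BASE.format(keys="emp_id, title, rn1 - rn")).fetchall()

print(len(without_title), len(with_title))
```

Note that this technique looks only at the order of rows, not at the dates themselves, so two stints with the same title separated by a calendar gap but no other row in between would still be merged.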

Related

SQL: An alternative to Group By approach using Partition By

I have a table in a data warehouse system (say, Snowflake on AWS):
UPC_CODE A_PRICE A_QTY DATE COMPANY_CODE A_CAT
1001 100.25 2 2021-05-06 1 PB
1001 2122.75 10 2021-05-01 1 PB
1002 212.75 5 2021-05-07 2 PT
1002 3100.75 10 2021-05-01 2 PB
What I am looking for: for each UPC_CODE and COMPANY_CODE, the latest row should be picked up.
So the resulting table should look like this:
UPC_CODE A_PRICE A_QTY DATE COMPANY_CODE A_CAT
1001 100.25 2 2021-05-06 1 PB
1002 212.75 5 2021-05-07 2 PT
My current approach is the SQL below:
SELECT UPC_CODE,A_PRICE,A_QTY,MAX(DATE) AS F_DATE,COMPANY_CODE,A_CAT
FROM <table_name>
GROUP BY 1,2,3,5,6
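Can I have an alternative approach using PARTITION BY?
Your current GROUP BY query doesn't really do what you have in mind: because it also groups by A_PRICE, A_QTY and A_CAT, every distinct combination of those values keeps its own row instead of one row per UPC_CODE and COMPANY_CODE. One canonical approach here uses ROW_NUMBER: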
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY UPC_CODE, COMPANY_CODE ORDER BY DATE DESC) rn
FROM yourTable
)
SELECT UPC_CODE, A_PRICE, A_QTY, DATE, COMPANY_CODE, A_CAT
FROM cte
WHERE rn = 1;
If you did want to use a GROUP BY approach, here is one way to do that:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT UPC_CODE, COMPANY_CODE, MAX(DATE) AS MAX_DATE
FROM yourTable
GROUP BY UPC_CODE, COMPANY_CODE
) t2
ON t2.UPC_CODE = t1.UPC_CODE AND
t2.COMPANY_CODE = t1.COMPANY_CODE AND
t2.MAX_DATE = t1.DATE;
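As a quick check of the two approaches, the sketch below loads the question's four sample rows into SQLite through Python's sqlite3 module and runs the original GROUP BY, the ROW_NUMBER query and the join query. SQLite and the table name `prices` are assumptions for the demo, and the DATE column is renamed to the hypothetical `SALE_DATE` to avoid clashing with a type or function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (UPC_CODE INTEGER, A_PRICE REAL, A_QTY INTEGER, "
    "SALE_DATE TEXT, COMPANY_CODE INTEGER, A_CAT TEXT)"
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1001, 100.25, 2, "2021-05-06", 1, "PB"),
        (1001, 2122.75, 10, "2021-05-01", 1, "PB"),
        (1002, 212.75, 5, "2021-05-07", 2, "PT"),
        (1002, 3100.75, 10, "2021-05-01", 2, "PB"),
    ],
)

# The question's query: every row is its own group, so nothing is filtered out.
original = conn.execute(
    "SELECT UPC_CODE, A_PRICE, A_QTY, MAX(SALE_DATE), COMPANY_CODE, A_CAT "
    "FROM prices GROUP BY 1, 2, 3, 5, 6"
).fetchall()

# Number the rows of each (UPC_CODE, COMPANY_CODE) pair, newest first.
via_row_number = conn.execute(
    """
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY UPC_CODE, COMPANY_CODE
                                     ORDER BY SALE_DATE DESC) AS rn
        FROM prices
    )
    SELECT UPC_CODE, A_PRICE, A_QTY, SALE_DATE, COMPANY_CODE, A_CAT
    FROM cte WHERE rn = 1 ORDER BY UPC_CODE
    """
).fetchall()

# Join each row to the maximum date of its (UPC_CODE, COMPANY_CODE) pair.
via_join = conn.execute(
    """
    SELECT t1.*
    FROM prices t1
    INNER JOIN (
        SELECT UPC_CODE, COMPANY_CODE, MAX(SALE_DATE) AS MAX_DATE
        FROM prices GROUP BY UPC_CODE, COMPANY_CODE
    ) t2
      ON t2.UPC_CODE = t1.UPC_CODE
     AND t2.COMPANY_CODE = t1.COMPANY_CODE
     AND t2.MAX_DATE = t1.SALE_DATE
    ORDER BY t1.UPC_CODE
    """
).fetchall()

print(len(original), via_row_number, via_join)
```

The two approaches differ when several rows share the latest date: ROW_NUMBER keeps exactly one of them, while the join keeps all of them.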
In Snowflake (which your first line suggests), you would use QUALIFY:
SELECT UPC_CODE, A_PRICE, A_QTY, DATE AS F_DATE, COMPANY_CODE, A_CAT
FROM <table_name>
QUALIFY ROW_NUMBER() OVER (PARTITION BY UPC_CODE, COMPANY_CODE
ORDER BY DATE DESC
) = 1;

First value in DATE minus 30 days SQL

I have a bunch of data from which I show the ID, the max date and its corresponding values (user id, type, ...). Then I need to take the MAX date for each ID, subtract 30 days, and show the first date and its corresponding values within this date period.
Example:
ID Date Name
1 01.05.2018 AAA
1 21.04.2018 CCC
1 05.04.2018 BBB
1 28.03.2018 AAA
expected:
ID max_date max_name previous_date previous_name
1 01.05.2018 AAA 05.04.2018 BBB
I have a working solution using subselects, but as I have quite a large WHERE clause, the refresh takes ages.
The subselect looks like this:
(SELECT MIN(N.name)
FROM t1 N
WHERE N.ID = T.ID
AND (N.date < MAX(T.date) AND N.date >= (MAX(T.date)-30))
AND (...)) AS PreviousName
How would you write the select?
I'm using T-SQL.
Thanks
I can do this with 2 CTEs to build up the dates and names.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE t1 (ID int, theDate date, theName varchar(10)) ;
INSERT INTO t1 (ID, theDate, theName)
VALUES
( 1,'2018-05-01','AAA' )
, ( 1,'2018-04-21','CCC' )
, ( 1,'2018-04-05','BBB' )
, ( 1,'2018-03-27','AAA' )
, ( 2,'2018-05-02','AAA' )
, ( 2,'2018-05-21','CCC' )
, ( 2,'2018-03-03','BBB' )
, ( 2,'2018-01-20','AAA' )
;
Main Query:
;WITH cte1 AS (
SELECT t1.ID, t1.theDate, t1.theName
, DATEADD(day,-30,t1.theDate) AS dMinus30
, ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.theDate DESC) AS rn
FROM t1
)
, cte2 AS (
SELECT c2.ID, c2.theDate, c2.theName
, ROW_NUMBER() OVER (PARTITION BY c2.ID ORDER BY c2.theDate) AS rn
, COUNT(*) OVER (PARTITION BY c2.ID) AS theCount
FROM cte1
INNER JOIN cte1 c2 ON cte1.ID = c2.ID
AND c2.theDate >= cte1.dMinus30
WHERE cte1.rn = 1
GROUP BY c2.ID, c2.theDate, c2.theName
)
SELECT cte1.ID, cte1.theDate AS max_date, cte1.theName AS max_name
, cte2.theDate AS previous_date, cte2.theName AS previous_name
, cte2.theCount
FROM cte1
INNER JOIN cte2 ON cte1.ID = cte2.ID
AND cte2.rn=1
WHERE cte1.rn = 1
Results:
| ID | max_date | max_name | previous_date | previous_name |
|----|------------|----------|---------------|---------------|
| 1 | 2018-05-01 | AAA | 2018-04-05 | BBB |
| 2 | 2018-05-21 | CCC | 2018-05-02 | AAA |
cte1 computes, for every row, the date 30 days earlier (dMinus30) and uses a ROW_NUMBER() window function to number the rows of each ID from the most recent date down, so rn = 1 is the max_date row. cte2 joins back to that row to get all dates within the 30 days up to cte1's max date, and numbers them in ascending order so that rn = 1 is the earliest date in that window. The outer query then joins the two results and keeps only the rn = 1 row from each, that is, the most recent row and the earliest row in the window.
I'm not sure how well it will scale with your data, but the CTEs should optimize reasonably well.
EDIT: For the additional requirement, I just added in another COUNT() window function to cte2.
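For readers who want to try the idea without SQL Server, here is a minimal sketch of the same two-step logic in SQLite through Python's sqlite3 module. SQLite is an assumption for the demo, so `DATEADD(day, -30, ...)` becomes `date(..., '-30 days')`. The CTE names `latest` and `in_window` are hypothetical, and unlike cte2 above this version excludes the max-date row itself from the window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (ID INTEGER, theDate TEXT, theName TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, "2018-05-01", "AAA"), (1, "2018-04-21", "CCC"),
        (1, "2018-04-05", "BBB"), (1, "2018-03-27", "AAA"),
        (2, "2018-05-02", "AAA"), (2, "2018-05-21", "CCC"),
        (2, "2018-03-03", "BBB"), (2, "2018-01-20", "AAA"),
    ],
)

rows = conn.execute(
    """
    WITH latest AS (
        -- rn = 1 marks the most recent row of each ID
        SELECT ID, theDate, theName,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY theDate DESC) AS rn
        FROM t1
    ),
    in_window AS (
        -- rows in the 30 days before the most recent date, oldest first
        SELECT p.ID, p.theDate, p.theName,
               ROW_NUMBER() OVER (PARTITION BY p.ID ORDER BY p.theDate) AS rn
        FROM t1 p
        JOIN latest l ON l.ID = p.ID AND l.rn = 1
        WHERE p.theDate >= date(l.theDate, '-30 days')
          AND p.theDate < l.theDate
    )
    SELECT l.ID, l.theDate, l.theName, w.theDate, w.theName
    FROM latest l
    LEFT JOIN in_window w ON w.ID = l.ID AND w.rn = 1
    WHERE l.rn = 1
    ORDER BY l.ID
    """
).fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN is a design choice: an ID with no earlier row inside the window still appears, with NULL for the previous date and name.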
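I would do the following, which returns the two most recent rows per id (note that the second most recent row is not necessarily the earliest row inside the 30-day window):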
select id,
max(case when seqnum = 1 then date end) as max_date,
max(case when seqnum = 1 then name end) as max_name,
max(case when seqnum = 2 then date end) as prev_date,
max(case when seqnum = 2 then name end) as prev_name
from (select e.*, row_number() over (partition by id order by date desc) as seqnum
from example e
) e
group by id;

Display result as group by count with max date?

I have the sample data below and need to display a count per group, counting each request only under the group with its max date.
REQUEST_NUMBER ASSIGNED_GROUP LAST_MODIFIED_DATE
001 GROUP A
001 GROUP B 2/2/2018
002 GROUP A
002 GROUP B 2/2/2018
002 GROUP C 2/3/2018
003 GROUP B
My expected result should show, for each group, the count of requests whose max last_modified_date belongs to that group, like:
ASSIGNED_GROUP TOTAL_COUNT
GROUP B 2
GROUP C 1
In the example above, 001 was last assigned to GROUP B, 002 was last assigned to GROUP C, and 003 has only one record with a NULL last_modified_date, so it remains with GROUP B.
So far I'm trying with just one request, but I'm not getting proper results:
SELECT request_number, ASSIGNED_GROUP_NAME
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY request_number ORDER BY request_number) RNUM,
request_number, ASSIGNED_GROUP_NAME
FROM WORK_DETAIL
WHERE request_number = '3458112'
)
WHERE MAX(last_modified_date)
ORDER BY ASSIGNED_GROUP_NAME
Something like this could work
SELECT ASSIGNED_GROUP, COUNT(ASSIGNED_GROUP), MAX(LAST_MODIFIED_DATE) FROM YourTable
GROUP BY ASSIGNED_GROUP
You could use group by:
select t.assigned_group, t.last_modified_date, count(*)
from your_table t inner join
(
select assigned_group, max(last_modified_date) as maxDate from your_table
where last_modified_date is not null
group by assigned_group
) t2
ON t.last_modified_date = t2.maxDate and t.assigned_group = t2.assigned_group
group by t.assigned_group, t.last_modified_date
You could use a join with a subquery that gets the max date grouped by assigned_group:
select a.ASSIGNED_GROUP, count(*)
from my_table a
inner join(
select ASSIGNED_GROUP, max(LAST_MODIFIED_DATE) as max_date
from my_table
where LAST_MODIFIED_DATE is not null
group by ASSIGNED_GROUP
) t on t.max_date = a.LAST_MODIFIED_DATE and t.ASSIGNED_GROUP = a.ASSIGNED_GROUP
group by a.assigned_group
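Another way to read the requirement is "pick each request's latest assignment first, then count by group". The sketch below is not taken from any of the answers; it uses SQLite via Python's sqlite3 as an assumed engine, ISO date strings, and treats a NULL last_modified_date as older than any real date, which is what the expected output implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_detail (request_number TEXT, assigned_group TEXT, "
    "last_modified_date TEXT)"
)
conn.executemany(
    "INSERT INTO work_detail VALUES (?, ?, ?)",
    [
        ("001", "GROUP A", None),
        ("001", "GROUP B", "2018-02-02"),
        ("002", "GROUP A", None),
        ("002", "GROUP B", "2018-02-02"),
        ("002", "GROUP C", "2018-02-03"),
        ("003", "GROUP B", None),
    ],
)

counts = conn.execute(
    """
    SELECT assigned_group, COUNT(*) AS total_count
    FROM (
        SELECT request_number, assigned_group,
               -- non-NULL dates first, newest first; NULL dates sort last
               ROW_NUMBER() OVER (
                   PARTITION BY request_number
                   ORDER BY last_modified_date IS NULL, last_modified_date DESC
               ) AS rn
        FROM work_detail
    ) AS latest
    WHERE rn = 1
    GROUP BY assigned_group
    ORDER BY assigned_group
    """
).fetchall()

print(counts)
```

In engines that support it, `ORDER BY last_modified_date DESC NULLS LAST` would express the same ordering more directly.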

how to use same column twice with different criteria with one common column in sql

I have a table
ID P_ID Cost
1 101 1000
2 101 1050
3 101 1100
4 102 5000
5 102 2000
6 102 6000
7 103 3000
8 103 5000
9 103 4000
I want to use the 'Cost' column twice to fetch the first and last inserted Cost value for each P_ID.
I want output as:
P_ID First_Cost Last_Cost
101 1000 1100
102 5000 6000
103 3000 4000
;WITH t AS
(
SELECT P_ID, Cost,
f = ROW_NUMBER() OVER (PARTITION BY P_ID ORDER BY ID),
l = ROW_NUMBER() OVER (PARTITION BY P_ID ORDER BY ID DESC)
FROM dbo.tablename
)
SELECT t.P_ID, t.Cost, t2.Cost
FROM t INNER JOIN t AS t2
ON t.P_ID = t2.P_ID
WHERE t.f = 1 AND t2.l = 1;
In SQL Server 2012 and later you can use FIRST_VALUE():
SELECT DISTINCT
P_ID,
FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID),
FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID DESC)
FROM dbo.tablename;
You get a slightly more favorable plan if you remove the DISTINCT and instead use ROW_NUMBER() with the same partitioning to eliminate multiple rows with the same P_ID:
;WITH t AS
(
SELECT
P_ID,
f = FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID),
l = FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID DESC),
r = ROW_NUMBER() OVER (PARTITION BY P_ID ORDER BY ID)
FROM dbo.tablename
)
SELECT P_ID, f, l FROM t WHERE r = 1;
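Why not LAST_VALUE(), you ask? Well, it doesn't work like you might expect: with an ORDER BY and no explicit frame, the default window frame ends at the current row (and its peers), so LAST_VALUE() returns the current row's value unless you specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. For more details, see the comments under the documentation.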
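The LAST_VALUE() behaviour is easy to see side by side. This sketch uses SQLite through Python's sqlite3 module as an assumed stand-in for SQL Server (the default frame behaves the same way in both) with the question's nine rows in a table named `tablename`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID INTEGER, P_ID INTEGER, Cost INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        (1, 101, 1000), (2, 101, 1050), (3, 101, 1100),
        (4, 102, 5000), (5, 102, 2000), (6, 102, 6000),
        (7, 103, 3000), (8, 103, 5000), (9, 103, 4000),
    ],
)

rows = conn.execute(
    """
    SELECT ID, P_ID,
           FIRST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID) AS first_cost,
           -- default frame stops at the current row
           LAST_VALUE(Cost) OVER (PARTITION BY P_ID ORDER BY ID) AS last_default,
           -- explicit frame covers the whole partition
           LAST_VALUE(Cost) OVER (
               PARTITION BY P_ID ORDER BY ID
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_full
    FROM tablename
    ORDER BY ID
    """
).fetchall()

for row in rows:
    print(row)
```

With the default frame, last_default simply repeats each row's own Cost; only last_full gives the last inserted value per P_ID.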
SELECT t.P_ID,
SUM(CASE WHEN ID = t.minID THEN Cost ELSE 0 END) as FirstCost,
SUM(CASE WHEN ID = t.maxID THEN Cost ELSE 0 END) as LastCost
FROM myTable
JOIN (
SELECT P_ID, MIN(ID) as minID, MAX(ID) as maxID
FROM myTable
GROUP BY P_ID) t ON myTable.ID IN (t.minID, t.maxID)
GROUP BY t.P_ID
Admittedly, @AaronBertrand's approach is cleaner here. However, this solution will work on older versions of SQL Server (that don't support CTEs or window functions), or on pretty much any other DBMS.
Do you want first and last in terms of Min and Max, or do you want which one was entered first and which one was entered last? If you want Min and max you can group by.
SELECT P_ID, MIN(Cost), MAX(Cost) FROM table_name GROUP BY P_ID
I believe this also works, without self joins or subqueries, as long as "first" and "last" mean the minimum and maximum cost rather than insertion order:
SELECT DISTINCT
P_ID
,MIN(Cost) OVER (PARTITION BY P_ID) as FirstCost
,MAX(Cost) OVER (PARTITION BY P_ID) as LastCost
FROM table_name
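The difference between "first and last inserted" and "minimum and maximum" shows up in the sample data for P_ID 102 and 103. A small sketch, assuming SQLite via Python's sqlite3 and a hypothetical table name `costs`, runs the MIN(ID)/MAX(ID) join approach next to a plain MIN/MAX on Cost:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE costs (ID INTEGER, P_ID INTEGER, Cost INTEGER)")
conn.executemany(
    "INSERT INTO costs VALUES (?, ?, ?)",
    [
        (1, 101, 1000), (2, 101, 1050), (3, 101, 1100),
        (4, 102, 5000), (5, 102, 2000), (6, 102, 6000),
        (7, 103, 3000), (8, 103, 5000), (9, 103, 4000),
    ],
)

# First and last by insertion order (lowest and highest ID per P_ID).
first_last = conn.execute(
    """
    SELECT t.P_ID,
           SUM(CASE WHEN m.ID = t.minID THEN m.Cost ELSE 0 END) AS first_cost,
           SUM(CASE WHEN m.ID = t.maxID THEN m.Cost ELSE 0 END) AS last_cost
    FROM costs m
    JOIN (
        SELECT P_ID, MIN(ID) AS minID, MAX(ID) AS maxID
        FROM costs GROUP BY P_ID
    ) t ON m.ID IN (t.minID, t.maxID)
    GROUP BY t.P_ID
    ORDER BY t.P_ID
    """
).fetchall()

# Smallest and largest Cost per P_ID, regardless of insertion order.
min_max = conn.execute(
    "SELECT P_ID, MIN(Cost), MAX(Cost) FROM costs GROUP BY P_ID ORDER BY P_ID"
).fetchall()

print(first_last)
print(min_max)
```

Only the first query reproduces the output requested in the question; the two agree for P_ID 101 because its costs happen to increase with ID.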

Taking the Largest SUM from a table

I'm trying to get the Employee with the highest sales
Employee DeptNo Date Sales
Chris 2 2012/1/1 1000
Joe 1 2012/1/1 900
Arthur 3 2012/1/1 1100
Chris 2 2012/3/1 1200
Joe 1 2012/2/1 1500
Arthur 3 2010/2/1 1200
Joe 1 2010/3/1 900
Arthur 3 2010/3/1 1100
Arthur 3 2010/4/1 1200
Joe 1 2012/4/1 1500
Chris 2 2010/4/1 1800
I've tried using two subqueries and then comparing them to find the higher value:
SELECT c1.Employee,
c1.TOTAL_SALES
FROM (SELECT Employee,
Sum(sales) AS TOTAL_SALES
FROM EmployeeSales
GROUP BY Employee) c1,
(SELECT Employee,
Sum(sales) AS TOTAL_SALES
FROM EmployeeSales
GROUP BY Employee) c2
WHERE ( c1.TOTAL_SALES > c2.TOTAL_SALES
AND c1.Employee > c2.Employee )
But the resulting query gives me two rows of
Employee TOTAL_SALES
joe 4800
joe 4800
What am I doing wrong?
I would use a CTE.
;With [CTE] as (
Select
[Employee]
,sum([Sales]) as [Total_Sales]
,Row_Number()
Over(order by sum([sales]) Desc) as [RN]
From [EmployeeSales]
Group by [Employee]
)
Select
[Employee]
,[Total_Sales]
From [CTE]
Where [RN] = 1
Example of working code SQL Fiddle:
http://sqlfiddle.com/#!3/bd772/2
To return all employees with the highest total sales, you can use SQL Server's proprietary TOP WITH TIES:
SELECT TOP (1) WITH TIES name, SUM(sales) as total_sales
FROM employees
GROUP BY name
ORDER BY SUM(sales) DESC
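In databases that support LIMIT (for example MySQL, PostgreSQL or SQLite), you can sort and keep the first row:
SELECT name, SUM(sales) as total_sales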
FROM employees
GROUP BY name
ORDER by total_sales DESC
LIMIT 1;
A better solution is to group by an employee ID so we can be sure the rows belong to the same person, since there can be two employees named Chris.
I would use a window function:
select * from
(
select
employee
, sum(sales) as sales
, row_number() over
(
order by sum(sales) desc
) as rank
from EmployeeSales
group by employee
) tmp
where tmp.rank = 1
And I agree with what someone said (Shawn) about having an employeeID and group by that for this, rather than the name.
(I removed the partition from the row_number() call as it is not needed for this)
You can use a CTE for that:
WITH CTE
AS ( select employee , sum(sales) as sales,
ROW_NUMBER() OVER (ORDER BY sum(sales) desc) RN
FROM EmployeeSales
GROUP BY employee)
SELECT employee ,
sales
FROM CTE
WHERE RN = 1
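To tie the thread together, the sketch below loads the question's rows into SQLite through Python's sqlite3 module (an assumed engine; the column name `SaleDate` is hypothetical). It first re-runs the question's self-join on the per-employee totals, then picks the top total with RANK(), which keeps ties in the same way TOP (1) WITH TIES does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeSales (Employee TEXT, DeptNo INTEGER, "
    "SaleDate TEXT, Sales INTEGER)"
)
conn.executemany(
    "INSERT INTO EmployeeSales VALUES (?, ?, ?, ?)",
    [
        ("Chris", 2, "2012-01-01", 1000), ("Joe", 1, "2012-01-01", 900),
        ("Arthur", 3, "2012-01-01", 1100), ("Chris", 2, "2012-03-01", 1200),
        ("Joe", 1, "2012-02-01", 1500), ("Arthur", 3, "2010-02-01", 1200),
        ("Joe", 1, "2010-03-01", 900), ("Arthur", 3, "2010-03-01", 1100),
        ("Arthur", 3, "2010-04-01", 1200), ("Joe", 1, "2012-04-01", 1500),
        ("Chris", 2, "2010-04-01", 1800),
    ],
)

TOTALS = """
WITH totals AS (
    SELECT Employee, SUM(Sales) AS total_sales
    FROM EmployeeSales
    GROUP BY Employee
)
"""

# The question's self-join: one output row per employee that Joe beats
# on both total and name, hence the duplicated row.
self_join = conn.execute(
    TOTALS
    + """
    SELECT c1.Employee, c1.total_sales
    FROM totals c1, totals c2
    WHERE c1.total_sales > c2.total_sales AND c1.Employee > c2.Employee
    """
).fetchall()

# Rank the totals once and keep rank 1; ties would all be returned.
top = conn.execute(
    TOTALS
    + """
    , ranked AS (
        SELECT Employee, total_sales,
               RANK() OVER (ORDER BY total_sales DESC) AS rnk
        FROM totals
    )
    SELECT Employee, total_sales FROM ranked WHERE rnk = 1
    """
).fetchall()

print(self_join)
print(top)
```

Swapping RANK() for ROW_NUMBER() would return exactly one row even when two employees share the highest total.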