Top 3 employees when an employee changes store - SQL

Table 1 = emp - (emp_id, store_id, start_dt, end_dt, amount)
Table 2 = Sales - (emp_id, product_id, sales_dateline, qty, amount)
Table 1 has data like this:
emp_id store_id amount start_dt end_dt
1 1 200 2/21/2019 10/21/2019
1 2 400 10/22/2019 12/31/2019
How can we find the top 3 employees working in each store during Q4 of 2019 (based on the sales_dateline column)?
Note: we need to include the amount from the employee's previous store as well. store_id should be displayed in the result set. Please help.

This query gets the top 3 employees who worked in multiple stores during Q4 of 2019, ranked by the summed amount column of Table 2 for sales whose sales_dateline falls in that quarter. The total sales per employee include sales from each store_id the employee worked in.
Step 1: (cte named 'q4_sales_per_employee_per_store'): Summarize the sales (sum_amount) per employee per store in Q4 of 2019.
Step 2: (cte named 'q4_sales_per_multi_store_employee'): Summarize the aggregate sales (sum_amount) per employee in Q4 of 2019, keeping only employees with sales in more than one store.
Step 3: (cte named 'q4_multi_store_employee_rankings'): Rank the aggregate sales (sum_amount) per employee in Q4 of 2019, by both row number and dense rank (ties are given equal rank).
Step 4: Select the top 3 multi-store employees with the highest sales across all stores (based on dense rank, so ties can yield more than 3 employees), JOINed with the per-store sales.
Something like this
with
q4_sales_per_employee_per_store(emp_id, store_id, sum_qty, sum_amount) as (
select t1.emp_id, t1.store_id, sum(t2.qty), sum(t2.amount)
from Table1 t1
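join Table2 t2 on t1.emp_id=t2.emp_id
-- attribute each sale to the store the employee was assigned to on that date
and cast(t2.sales_dateline as date) between t1.start_dt and t1.end_dt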
where t2.sales_dateline>=cast('20191001' as date)
and t2.sales_dateline<cast('20200101' as date)
group by t1.emp_id, t1.store_id),
q4_sales_per_multi_store_employee(emp_id, sum_qty, sum_amount) as (
select qs.emp_id, sum(sum_qty), sum(sum_amount)
from q4_sales_per_employee_per_store qs
group by qs.emp_id
having count(*)>1),
q4_multi_store_employee_rankings(emp_id, sum_qty, sum_amount, sales_row, sales_rank) as (
select *,
row_number() over (order by sum_amount desc) sales_row,
dense_rank() over (order by sum_amount desc) sales_rank
from q4_sales_per_multi_store_employee)
select mser.sales_rank, mser.sales_row, mser.emp_id,
spe.store_id, spe.sum_qty, spe.sum_amount
from q4_multi_store_employee_rankings mser
join q4_sales_per_employee_per_store spe on mser.emp_id=spe.emp_id
where mser.sales_rank<=3
order by mser.sales_rank, mser.sales_row;

I think your answer involves several logical steps. I'm calling them 'logical steps' because they will be combined into one statement at the end (so the solution only has one query).
I'll start with adding variables for start and end dates rather than hardcoding them.
-- Period variables to define start and end of quarter
DECLARE @PeriodStart date = '20191001';
DECLARE @PeriodEnd date = '20200101';
Note that when used, we filter with >= @PeriodStart but < @PeriodEnd, so the end date is midnight at the start of the first day of the next quarter.
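To see why the half-open range (>= start, < end) matters, here is a minimal sketch. It uses Python's sqlite3 module rather than SQL Server, stores dates as ISO text, and the sample rows are made up for illustration; only the column names come from the question.

```python
import sqlite3

# Hypothetical sample rows; column names follow the question's Sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (emp_id INTEGER, sales_dateline TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2019-09-30 23:59:59", 10.0),  # just before the quarter
        (1, "2019-10-01 00:00:00", 20.0),  # first instant of the quarter
        (1, "2019-12-31 23:59:59", 30.0),  # last second of the quarter
        (1, "2020-01-01 00:00:00", 40.0),  # first instant of the next quarter
    ],
)

period_start = "2019-10-01"
period_end = "2020-01-01"

# Half-open range: includes everything on Dec 31, excludes Jan 1.
q4_total = conn.execute(
    "SELECT SUM(amount) FROM sales "
    "WHERE sales_dateline >= ? AND sales_dateline < ?",
    (period_start, period_end),
).fetchone()[0]

# Closed range ending on the last day: silently drops sales later on Dec 31.
between_total = conn.execute(
    "SELECT SUM(amount) FROM sales "
    "WHERE sales_dateline BETWEEN ? AND ?",
    (period_start, "2019-12-31"),
).fetchone()[0]

print(q4_total, between_total)
```

The BETWEEN version misses the sale made late on the last day of the quarter, which is the case the half-open range is meant to avoid.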
Then you need to work out the sales for each employee, regardless of store. I think you can do this simply with something like
SELECT emp_id, SUM(amount) AS TotalSales
FROM [sales] S
WHERE sales_dateline >= @PeriodStart AND sales_dateline < @PeriodEnd
GROUP BY emp_id;
Notes
I don't know how you want to do date filtering for sales: the field name sales_dateline implies it's not simply a datetime, but possibly a text field or a reference to another table. While I've assumed a datetime in the above, I will leave any modifications up to you.
I assume 'amount' in the Sales table is the total for the sale of that product, not the unit price. If it's the unit price, you need to multiply it by the qty field, e.g., SUM(amount * qty).
Then, you need to determine which store each person was last working in during Q4 2019 (Oct 1 - Dec 31). This means that if an employee started in Store 1 during the quarter, then moved to Store 2, their record will be for Store 2 (but will take into account sales for Store 1).
The below gives an approach for this
It first finds every store assignment that overlaps the relevant period (regardless of how long the employee worked there, or over how many different sessions).
It then keeps, for each employee, the assignment with the latest start date within that period; that is the store the employee is assigned to for reporting.
SELECT store_id, emp_id
FROM
(SELECT store_id, emp_id,
ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY start_dt DESC) AS rn -- rn = 1 is the latest store per employee
FROM [emp] E
WHERE (start_dt >= @PeriodStart AND start_dt < @PeriodEnd) -- started in period
OR (end_dt >= @PeriodStart AND end_dt < @PeriodEnd) -- ended in period
OR (start_dt < @PeriodStart AND end_dt >= @PeriodEnd) -- worked through period
) AS AllEmpStores
WHERE AllEmpStores.rn = 1;
Note that in the previous version of my answer, I just reported all stores each employee worked at within that period. Therefore, if a great salesperson worked at 10 stores within the period, they could feasibly show up in the top 3 list for every store. The code for that was
SELECT DISTINCT E.Store_ID, E.Emp_ID
FROM [Emp] E
WHERE (E.start_dt >= @PeriodStart AND E.start_dt < @PeriodEnd) -- started in period
OR (E.end_dt >= @PeriodStart AND E.end_dt < @PeriodEnd) -- ended in period
OR (E.start_dt < @PeriodStart AND E.end_dt >= @PeriodEnd); -- worked through period
Now you have the two datasets (sales per employee, and employees by store), you can then join the first to the second, so you have Store_ID, Emp_ID, and Total_Sales (across all stores).
At that point you just need the top 3 per store, which you can do with a windowed function.
DECLARE @PeriodStart date = '20191001';
DECLARE @PeriodEnd date = '20200101';
WITH S_CTE AS
(SELECT emp_id, SUM(amount) AS TotalSales
FROM [sales] S
WHERE sales_dateline >= @PeriodStart AND sales_dateline < @PeriodEnd
GROUP BY emp_id
),
E_CTE AS
(SELECT store_id, emp_id
FROM
(SELECT store_id, emp_id,
ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY start_dt DESC) AS rn -- rn = 1 is the latest store per employee
FROM [emp] E
WHERE (start_dt >= @PeriodStart AND start_dt < @PeriodEnd)
OR (end_dt >= @PeriodStart AND end_dt < @PeriodEnd)
OR (start_dt < @PeriodStart AND end_dt >= @PeriodEnd)
) AS AllEmpStores
WHERE AllEmpStores.rn = 1
)
SELECT store_ID, emp_ID, TotalSales
FROM
(SELECT E_CTE.store_ID, E_CTE.emp_ID, S_CTE.TotalSales,
DENSE_RANK() OVER (PARTITION BY E_CTE.store_id ORDER BY S_CTE.TotalSales DESC) AS SalesRank
FROM E_CTE
INNER JOIN S_CTE ON E_CTE.emp_id = S_CTE.emp_id
) AS A
WHERE A.SalesRank <= 3
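As a sanity check, the same approach can be run end to end on a small data set. This is only a sketch: it uses Python's sqlite3 (which needs SQLite 3.25+ for window functions) instead of SQL Server, the rows are invented, and the three OR conditions are collapsed into the equivalent overlap test start_dt < period end AND end_dt >= period start. It assumes each employee is reported under the store with the latest start_dt in the period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (emp_id INTEGER, store_id INTEGER, start_dt TEXT, end_dt TEXT);
CREATE TABLE sales (emp_id INTEGER, sales_dateline TEXT, amount REAL);
-- Hypothetical data: employee 1 moves from store 1 to store 2 during Q4.
INSERT INTO emp VALUES
  (1, 1, '2019-02-21', '2019-10-21'),
  (1, 2, '2019-10-22', '2019-12-31'),
  (2, 2, '2019-01-01', '2019-12-31'),
  (3, 2, '2019-01-01', '2019-12-31'),
  (4, 2, '2019-01-01', '2019-12-31'),
  (5, 2, '2019-01-01', '2019-12-31');
INSERT INTO sales VALUES
  (1, '2019-10-10', 200.0),
  (1, '2019-11-15', 400.0),
  (2, '2019-12-01', 600.0),
  (3, '2019-10-05', 500.0),
  (4, '2019-11-20', 300.0),
  (5, '2019-12-24', 100.0),
  (5, '2019-09-30', 900.0);  -- outside Q4, must be ignored
""")

query = """
WITH s_cte AS (
    SELECT emp_id, SUM(amount) AS total_sales
    FROM sales
    WHERE sales_dateline >= :ps AND sales_dateline < :pe
    GROUP BY emp_id
),
e_cte AS (
    SELECT store_id, emp_id
    FROM (
        SELECT store_id, emp_id,
               ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY start_dt DESC) AS rn
        FROM emp
        WHERE start_dt < :pe AND end_dt >= :ps   -- assignment overlaps the period
    ) AS all_emp_stores
    WHERE rn = 1                                  -- latest store per employee
)
SELECT store_id, emp_id, total_sales
FROM (
    SELECT e.store_id, e.emp_id, s.total_sales,
           DENSE_RANK() OVER (PARTITION BY e.store_id
                              ORDER BY s.total_sales DESC) AS sales_rank
    FROM e_cte e
    JOIN s_cte s ON e.emp_id = s.emp_id
) AS ranked
WHERE sales_rank <= 3
ORDER BY store_id, total_sales DESC, emp_id
"""
rows = conn.execute(query, {"ps": "2019-10-01", "pe": "2020-01-01"}).fetchall()
for row in rows:
    print(row)
```

With this data, employee 1 is listed under store 2 with the combined 600 from both stores, and four employees come back because employees 1 and 2 tie for first place under DENSE_RANK.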
Edits/updates:
Now finds 'latest' store per employee only (rather than all stores) within period
Clarified filtering on sales
Added DENSE_RANK based on question comments
... And I just fixed a typo where E_CTE and S_CTE were the wrong way around.

Related

How to join partitioned table with another one

Sorry for the newbie question, but I'm really having trouble with the following issue:
Say, I have this code in place:
WITH active_pass AS (SELECT DATE_TRUNC(fr.day, MONTH) AS month, id,
CASE
WHEN SUM(fr.imps) > 100 THEN 1
WHEN SUM(fr.imps) < 100 THEN 0
END AS active_or_passive
FROM table1 AS fr
WHERE day between (CURRENT_DATE() - 730) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id
ORDER BY month desc),
# summing the score for each customer (sum for the whole year)
active_pass_assigned AS (SELECT id, month,
SUM(SUM(active_or_passive)) OVER (PARTITION BY id ORDER BY month rows BETWEEN 3 PRECEDING AND 1 PRECEDING) AS trailing_act
FROM active_pass AS a
GROUP BY month, id
ORDER BY MONTH desc)
What it does is create a trailing total over the previous 3 months to see in how many of those months the customer was active. However, I have no idea how to join with the next table to get a sum of revenue that said client generated. What I tried is this:
SELECT c.id, DATE_TRUNC(day, MONTH) AS month, SUM(revenue) AS Rev, name
FROM table2 AS c
JOIN active_pass_assigned AS a
ON c.id = a.id
WHERE day between (CURRENT_DATE() - 365) AND (CURRENT_DATE() - EXTRACT(DAY FROM CURRENT_DATE()))
GROUP BY month, id, name
ORDER BY month DESC
However, it returns waaay higher values for Revenue than the actual ones and I have no idea why. Furthermore, could you please tell me how to join those two tables together so that I only get the customer's revenue on the months his activity was equal to 3?
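One likely cause of the inflated revenue: active_pass_assigned has one row per id per month (that is what its GROUP BY produces), so joining on id alone matches every revenue row once for each month of that id. A minimal sketch of the effect and of a join on both id and month follows. It uses Python's sqlite3 rather than BigQuery, and the table names activity and revenue and all rows are hypothetical stand-ins for active_pass_assigned and table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for active_pass_assigned: one row per id per month.
CREATE TABLE activity (id INTEGER, month TEXT, trailing_act INTEGER);
-- Stand-in for table2: one row per id per day.
CREATE TABLE revenue (id INTEGER, day TEXT, revenue REAL);
INSERT INTO activity VALUES (1, '2021-01', 3), (1, '2021-02', 3), (1, '2021-03', 2);
INSERT INTO revenue VALUES
  (1, '2021-01-15', 100.0), (1, '2021-02-10', 50.0), (1, '2021-03-05', 25.0);
""")

# Join on id only: each revenue row is repeated once per activity month.
inflated = conn.execute("""
    SELECT substr(r.day, 1, 7) AS rev_month, SUM(r.revenue)
    FROM revenue r
    JOIN activity a ON r.id = a.id
    GROUP BY rev_month
    ORDER BY rev_month
""").fetchall()

# Join on id AND month, keeping only months where the trailing activity is 3.
fixed = conn.execute("""
    SELECT a.month, SUM(r.revenue)
    FROM revenue r
    JOIN activity a
      ON r.id = a.id
     AND substr(r.day, 1, 7) = a.month
    WHERE a.trailing_act = 3
    GROUP BY a.month
    ORDER BY a.month
""").fetchall()

print(inflated)
print(fixed)
```

In BigQuery the month match would presumably be written as DATE_TRUNC(c.day, MONTH) = a.month, with a.trailing_act = 3 as the extra filter.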

How to count only the working days between two dates?

I have the following table called vacations, where the employee number is displayed along with the start and end date of their vacations:
id_employee start end
1001 2020-12-24 2021-01-04
What I am looking for is to visualize the amount of vacation days that each employee had, but separating them by employee number, month, year and number of days; without taking into account non-business days (Saturdays, Sundays and holidays).
I have the following query, which manages to omit Saturday and Sunday from the posting:
SELECT id_employee,
EXTRACT(YEAR FROM t.Date) AS year,
EXTRACT(MONTH FROM t.Date) AS month,
SUM(WEEKDAY(`Date`) < 5) AS days
FROM (SELECT v.id_employee,
DATE_ADD(v.start, interval s.seq - 1 DAY) AS Date
FROM vacations v CROSS JOIN seq_1_to_100 s
WHERE DATE_ADD(v.start, interval s.seq - 1 DAY) <= v.end
ORDER BY v.id_employee, v.start, s.seq ) t
GROUP BY id_employee, EXTRACT(YEAR_MONTH FROM t.Date);
My question is, how could I in addition to skipping the weekends, also skip the holidays? I suppose that I should establish another table where the dates of those holidays are stored, but how could my * query * be adapted to perform the comparison?
If we consider that the employee 1001 took his vacations from 2020-12-24 to 2021-01-04 and we take Christmas and New Years as holidays, we should get the following result:
id_employee month year days
1001 12 2020 5
1001 1 2021 1
After you have created a table that stores the holiday dates, then you probably can do something like this:
SELECT id_employee,
EXTRACT(YEAR FROM t.Date) AS year,
EXTRACT(MONTH FROM t.Date) AS month,
SUM(CASE WHEN h.holiday_date IS NULL THEN WEEKDAY(`Date`) < 5 END) AS days
FROM (SELECT v.id_employee,
DATE_ADD(v.start, interval s.seq - 1 DAY) AS Date
FROM vacations v CROSS JOIN seq_1_to_100 s
WHERE DATE_ADD(v.start, interval s.seq - 1 DAY) <= v.end
ORDER BY v.id_employee, v.start, s.seq ) t
LEFT JOIN holidays h ON t.date=h.holiday_date
GROUP BY id_employee, EXTRACT(YEAR_MONTH FROM t.Date);
Assuming that the holidays table structure would be something like this:
CREATE TABLE holidays (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
holiday_date DATE,
holiday_description VARCHAR(255));
Then LEFT JOIN it to your current query and change the SUM() slightly by adding a CASE expression. If ON t.date=h.holiday_date in the left join matches, h.holiday_date will hold a value; otherwise it will be NULL, so only the rows where CASE WHEN h.holiday_date IS NULL holds are counted.
Here is another solution, compatible with MariaDB and MySQL versions that support recursive common table expressions:
WITH RECURSIVE cte AS
(SELECT id_employee, start, start lvdt, end FROM vacations
UNION ALL
SELECT id_employee, start, lvdt+INTERVAL 1 DAY, end FROM cte
WHERE lvdt+INTERVAL 1 DAY <=end)
SELECT id_employee,
YEAR(v.lvdt) AS year,
MONTH(v.lvdt) AS month,
SUM(CASE WHEN h.holiday_date IS NULL THEN WEEKDAY(v.lvdt) < 5 END) AS days
FROM cte v
LEFT JOIN holidays h
ON v.lvdt=h.holiday_date
GROUP BY id_employee,
YEAR(v.lvdt),
MONTH(v.lvdt);
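The holiday-aware count can be checked against the expected output from the question. The sketch below is an assumption-laden port to SQLite via Python's sqlite3, not the MariaDB/MySQL query itself: the columns are renamed start_date and end_date to avoid keyword clashes, and strftime('%w') (0 = Sunday, 6 = Saturday) replaces WEEKDAY().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vacations (id_employee INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE holidays (holiday_date TEXT, holiday_description TEXT);
INSERT INTO vacations VALUES (1001, '2020-12-24', '2021-01-04');
INSERT INTO holidays VALUES ('2020-12-25', 'Christmas'), ('2021-01-01', 'New Year');
""")

query = """
WITH RECURSIVE cal(id_employee, d, end_date) AS (
    -- one row per vacation day, starting at the first day
    SELECT id_employee, start_date, end_date FROM vacations
    UNION ALL
    SELECT id_employee, date(d, '+1 day'), end_date
    FROM cal
    WHERE date(d, '+1 day') <= end_date
)
SELECT cal.id_employee,
       CAST(strftime('%Y', cal.d) AS INTEGER) AS yr,
       CAST(strftime('%m', cal.d) AS INTEGER) AS mon,
       SUM(CASE WHEN h.holiday_date IS NULL                      -- not a holiday
                 AND strftime('%w', cal.d) NOT IN ('0', '6')     -- not Sun/Sat
                THEN 1 ELSE 0 END) AS working_days
FROM cal
LEFT JOIN holidays h ON h.holiday_date = cal.d
GROUP BY cal.id_employee, yr, mon
ORDER BY yr, mon
"""
result = conn.execute(query).fetchall()
print(result)
```

For employee 1001 this yields 5 working days in December 2020 and 1 in January 2021, matching the table in the question.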

Days Since Last Help Ticket was Filed

I am trying to create a report to show me the last date a customer filed a ticket.
Customers can file dozens of tickets. I want to know when the last ticket was filed and show how many days it's been since they have done so.
The fields I have are:
Customer,
Ticket_id,
Date_Closed
All from the Same table "Tickets"
I'm thinking I want to do a ranking of tickets by min date? I tried this query to grab something but it's giving me all the tickets from the customer. (I'm using SQL in a product called Domo)
select * from (select *, rank() over (partition by "Ticket_id"
order by "Date_Closed" desc) as date_order
from tickets ) zd
where date_order = 1
This should be simple enough,
SELECT customer,
MAX (date_closed) last_date,
ROUND((SYSDATE - MAX (date_closed)),0) days_since_last_ticket_logged
FROM tickets
GROUP BY customer
select zd.Customer, datediff(day, zd."Date_Closed", current_date) as days_since_last_tkt
from
(select *, rank() over (partition by Customer order by "Date_Closed" desc) as date_order
from tickets) zd
where zd.date_order = 1
Or you can simply do
select customer, datediff(day, max(Date_closed), current_date) as days_since_last_tkt
from tickets
group by customer
To select other fields
select t.*
from tickets t
join (select customer, max(Date_closed) as mxdate,
datediff(day, max(Date_closed), current_date) as days_since_last_tkt
from tickets
group by customer) tt
on t.customer = tt.customer and tt.mxdate = t.date_closed
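The MAX-per-customer approach is easy to verify on a few rows. This sketch runs on SQLite through Python's sqlite3, so it uses julianday() in place of datediff(); the sample tickets and the fixed "today" value are made up so that the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (customer TEXT, ticket_id INTEGER, date_closed TEXT)"
)
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("acme", 1, "2024-01-05"),
        ("acme", 2, "2024-02-20"),    # latest ticket for acme
        ("globex", 3, "2024-03-01"),
    ],
)

today = "2024-03-11"  # fixed reference date instead of the current date

rows = conn.execute(
    """
    SELECT customer,
           MAX(date_closed) AS last_date,
           CAST(julianday(?) - julianday(MAX(date_closed)) AS INTEGER)
               AS days_since_last_tkt
    FROM tickets
    GROUP BY customer
    ORDER BY customer
    """,
    (today,),
).fetchall()
print(rows)
```

Grouping by customer (not by ticket) is what collapses the dozens of tickets into a single latest date per customer.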
I would do this with a simple sub-query to select the last closed date for the customer. Then compare this to today with datediff() to get the number of days since last closed.
Select
LastTicket.Customer,
LastTicket.LastClosedDate,
DateDiff(day,LastTicket.LastClosedDate,getdate()) as DaysSinceLastClosed
From
(select
tickets.customer,
max(tickets.Date_Closed) as LastClosedDate
from tickets
Group By tickets.Customer) as LastTicket
Based on the responses this is what I did:
select "Customer",
Max("date_closed") "last_date",
round(datediff(DAY, CURRENT_DATE, max("date_closed")), 0) as "Closed_date"
from tickets
group by "Customer"
ORDER BY "Customer"

Query to apply rate from the interval of dates

Let's say I have two tables in my Oracle database
Table A : stDate, endDate, salary
For example:
03/02/2010 28/02/2010 2000
05/03/2012 29/03/2012 2500
Table B : DateOfActivation, rate
For example:
01/01/2010 1.023
01/11/2011 1.063
01/01/2012 1.075
I would like to have a SQL query displaying the sum of salary of table A with each salary multiplied by the rate of table B depending on the activation date.
Here, for the first salary the good rate is the first one (1.023) because the second rate has a date of activation that is later than stDate and endDate interval.
For the second salary, the third rate is applied because activation date of the rate was before the interval of dates of the second salary.
so the sum is this one : 2000 * 1.023 + 2500 * 1.075 = 4733.5
I hope I am clear
Thanks
Assuming the rate must be active before the beginning of the interval (i.e. DateOfActivation < stDate), you could do something like this (see fiddle):
SELECT SUM(salary*
(SELECT rate from TableB WHERE DateOfActivation=
(SELECT MAX(DateOfActivation) FROM TableB WHERE DateOfActivation < stDate)
)) FROM TableA;
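To check the "latest rate activated before stDate" lookup against the expected 4733.5, here is a small sketch on SQLite via Python's sqlite3. The snake_case table and column names are my own, dates are stored as ISO text so that string comparison orders them correctly, and ORDER BY ... LIMIT 1 stands in for the nested MAX subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (st_date TEXT, end_date TEXT, salary REAL);
CREATE TABLE table_b (date_of_activation TEXT, rate REAL);
INSERT INTO table_a VALUES
  ('2010-02-03', '2010-02-28', 2000),
  ('2012-03-05', '2012-03-29', 2500);
INSERT INTO table_b VALUES
  ('2010-01-01', 1.023),
  ('2011-11-01', 1.063),
  ('2012-01-01', 1.075);
""")

total = conn.execute("""
    SELECT SUM(a.salary * (
        -- most recent rate activated strictly before the salary period starts
        SELECT b.rate
        FROM table_b b
        WHERE b.date_of_activation < a.st_date
        ORDER BY b.date_of_activation DESC
        LIMIT 1
    ))
    FROM table_a a
""").fetchone()[0]
print(total)
```

The first salary picks up 1.023 and the second 1.075, so the sum should come out at about 4733.5 (up to floating-point rounding).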
This problem becomes much easier if the rate table is a true effective-dated table with rate_start_date and rate_end_date columns, such that no new row can be created whose start or end date lies within an existing rate_start_date to rate_end_date pair. The currently active row would typically have a NULL rate_end_date. In addition, you would likely want an EMP_ID on the salary table to be able to sum the rows and finish the calculation. One also needs to consider the following cases:
Start Date is between rate_start and rate_end
End Date is between rate_start and rate_end
Rate_start and Rate_end are between start_date and end_date (sandwiched)
If you run the following snippet you will see we can artificially create our rate_end_dates as follows:
SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE
ORDER BY D.ACTIVEDATE
Proposed code is as follows:
SELECT DISTINCT * FROM
(SELECT S.*, T.RATE, S.SALARY*T.RATE
FROM XX_SAL_HIST S,
(SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE) T -- creating synthetic rate_end_date
WHERE S.STDATE BETWEEN T.ACTIVEDATE AND T.ENDDATE)
UNION
(SELECT S.*, T.RATE, S.SALARY*T.RATE
FROM XX_SAL_HIST S,
(SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE) T -- creating synthetic rate_end_date
WHERE S.ENDDATE BETWEEN T.ACTIVEDATE AND T.ENDDATE)
UNION
(SELECT S.*, T.RATE, S.SALARY*T.RATE
FROM XX_SAL_HIST S,
(SELECT D.ACTIVEDATE, D.RATE, NVL(MIN(E.ACTIVEDATE)-1,SYSDATE) ENDDATE
FROM XX_DATEOFACTIVATION D, XX_DATEOFACTIVATION E
WHERE D.ACTIVEDATE<E.ACTIVEDATE(+)
GROUP BY D.ACTIVEDATE, D.RATE) T -- creating synthetic rate_end_date
WHERE T.ACTIVEDATE BETWEEN S.STDATE AND S.ENDDATE)
The first thing to do is to transform Table B (Table2 in the query) to have, for each row, the start and end date
Select DateOfActivation AS startDate
, rate
, NVL(LEAD(DateOfActivation, 1) OVER (ORDER BY DateOfActivation)
, TO_DATE('9999/12/31', 'yyyy/mm/dd')) AS endDate
From Table2
Now we can join this table with Table A (Table1 in the query)
WITH Rates AS (
Select DateOfActivation AS startDate
, rate
, NVL(LEAD(DateOfActivation, 1) OVER (ORDER BY DateOfActivation)
, TO_DATE('9999/12/31', 'yyyy/mm/dd')) AS endDate
From Table2)
Select SUM(s.salary * r.rate)
From Rates r
INNER JOIN Table1 s ON s.stDate < r.endDate AND s.endDate > r.startDate
The JOIN condition gets every row in Table A whose interval at least partially overlaps the activation period of the rate. If you need the salary interval to lie entirely within the rate's period, you can alter it as in the following query:
WITH Rates AS (
Select DateOfActivation AS startDate
, rate
, NVL(LEAD(DateOfActivation, 1) OVER (ORDER BY DateOfActivation)
, TO_DATE('9999/12/31', 'yyyy/mm/dd')) AS endDate
From Table2)
Select SUM(s.salary * r.rate)
From Rates r
INNER JOIN Table1 s ON s.stDate >= r.startDate AND s.endDate <= r.endDate
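The interval construction with LEAD can be tried out on the sample data from the question. This is a sketch on SQLite (3.25+ for window functions) through Python's sqlite3, with COALESCE in place of NVL, ISO text dates, and snake_case names that are my own choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (st_date TEXT, end_date TEXT, salary REAL);
CREATE TABLE table_b (date_of_activation TEXT, rate REAL);
INSERT INTO table_a VALUES
  ('2010-02-03', '2010-02-28', 2000),
  ('2012-03-05', '2012-03-29', 2500);
INSERT INTO table_b VALUES
  ('2010-01-01', 1.023),
  ('2011-11-01', 1.063),
  ('2012-01-01', 1.075);
""")

# Each rate is valid from its activation date until the next activation date;
# the last rate gets a far-future end date.
rates_sql = """
    SELECT date_of_activation AS start_date,
           rate,
           COALESCE(LEAD(date_of_activation) OVER (ORDER BY date_of_activation),
                    '9999-12-31') AS end_date
    FROM table_b
"""

intervals = conn.execute(rates_sql + " ORDER BY start_date").fetchall()

# Overlap join: the salary interval intersects the rate's validity period.
total = conn.execute(
    "WITH rates AS (" + rates_sql + ") "
    "SELECT SUM(s.salary * r.rate) "
    "FROM rates r "
    "JOIN table_a s ON s.st_date < r.end_date AND s.end_date > r.start_date"
).fetchone()[0]

print(intervals)
print(total)
```

Note that with the overlap condition, a salary interval that straddles two rate periods would be counted once per period, which is worth keeping in mind when choosing between the two JOIN conditions.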

Query that returns data for every Friday for 1 year

I have a table that contains the number of orders a company makes per day from a given vendor, along with other information. I want to write a query that returns the number of orders from a vendor, along with other info, for every Friday for 1 year. I have figured this out as follows:
with dt as
( select next_day(trunc(add_months(sysdate,-12)) + 7*(level-1),'FRI') d
from dual connect by level <= 53 )
select *
from dt, vendor where vendor.dt = dt.d
But suppose there are 10 vendors in total, and on a given Friday orders were placed with only 6 of them. Then, for the remaining 4 vendors, I want to re-run the query for Thursday, and so on. Any help is appreciated.
I don't know your tables' structures, so I'll show you an example how I would do it.
Say there is a table orders:
CREATE TABLE orders(
vendor varchar2(100), order_date date
);
then the query for this table could be:
SELECT to_char( order_date, 'IW' ) as Week_nbr,
vendor,
count(*) As number_of_orders
FROM (
SELECT t.*,
max( order_date ) OVER (Partition by vendor, to_char( order_date, 'IW' ))
As lastest_date_within_a_week
FROM orders t
) xx
WHERE order_date = lastest_date_within_a_week
GROUP BY to_char( order_date, 'IW' ),
vendor
ORDER BY 1,2
This line:
max( order_date ) OVER (Partition by vendor, to_char( order_date, 'IW' ))
As lastest_date_within_a_week
is looking for the latest (most recent) order date within each week for each vendor. If a vendor has only 2 orders in a given week, one on Monday and the second on Thursday, then this function returns the date of that Thursday.
And here:
WHERE order_date = lastest_date_within_a_week
we are taking vendor's orders only from last day in a week.
Then the query is doing a simple GROUP BY week_nr, vendor and a COUNT(*)
A working demo: http://sqlfiddle.com/#!4/96df3/1
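For readers without an Oracle instance, the "last order day per vendor per week" idea can be sketched on SQLite through Python's sqlite3. The assumptions here are mine: strftime('%Y-%W') (Monday-based week number plus the year) replaces to_char(order_date, 'IW'), the window function needs SQLite 3.25+, and the orders are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (vendor TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("A", "2024-03-04"),  # Monday
        ("A", "2024-03-07"),  # Thursday, vendor A's last order day that week
        ("A", "2024-03-07"),
        ("B", "2024-03-08"),  # Friday
        ("A", "2024-03-15"),  # Friday of the following week
    ],
)

rows = conn.execute("""
    SELECT week, vendor, COUNT(*) AS number_of_orders
    FROM (
        SELECT vendor,
               order_date,
               strftime('%Y-%W', order_date) AS week,
               MAX(order_date) OVER (
                   PARTITION BY vendor, strftime('%Y-%W', order_date)
               ) AS latest_date_within_week
        FROM orders
    ) AS x
    WHERE order_date = latest_date_within_week   -- keep only the last order day
    GROUP BY week, vendor
    ORDER BY week, vendor
""").fetchall()
print(rows)
```

Vendor A has no Friday order in the first week, so its two Thursday orders are counted instead, while vendor B's Friday order is used as is.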
You can sum the data for every single date, showing the day of the week with to_char(sysdate,'d'). Then you wrap the query and sum for the max day of the week, grouping by vendor, like this: sum(imp) keep (dense_rank last order by day_of_week).
Watch out for NLS settings: the day of the week returned by the 'd' format depends on them.