Aggregate before and after a date column - sql

I have two tables: db.transactions and db.salesman, which I would like to combine in order to create an output that has aggregated sales before each salesman's hire date and after each salesman's hire date.
select * from db.transactions
index sales_rep sales trx_date
1 Tom 200 9/18/2020
2 Jerry 435 6/21/2020
3 Patrick 1400 4/30/2020
4 Tom 560 5/24/2020
5 Francis 240 1/2/2021
select * from db.salesman
index sales_rep hire_date
1 Tom 8/19/2020
2 Jerry 1/28/2020
3 Patrick 4/6/2020
4 Francis 9/4/2020
I would like to aggregate sales from db.transactions before and after each sales rep's hire date.
Expected output:
index sales_rep hire_date agg_sales_before_hire_date agg_sales_after_hire_date
1 Tom 8/19/2020 1200 5000
2 Jerry 1/28/2020 500 900
3 Patrick 4/6/2020 5000 300
4 Francis 9/4/2020 2900 1500
For a single sales rep, to calculate the agg_sales_before_hire_date is likely:
select tx.sales_rep, sum(tx.sales)
from db.transactions tx
inner join db.salesman sm on sm.sales_rep = tx.sales_rep
where tx.trx_date < '8/19/2020' and tx.sales_rep = 'Tom'
group by tx.sales_rep
I am using PostgreSQL, but I am also open to doing it in Tableau or Python.

Using CROSS JOIN LATERAL
select
sa.sales_rep, sa.hire_date,
l.agg_sales_before_hire_date,
l.agg_sales_after_hire_date
from salesman sa
cross join lateral
(
select
sum(tx.sales) filter (where tx.trx_date < sa.hire_date) agg_sales_before_hire_date,
sum(tx.sales) filter (where tx.trx_date >= sa.hire_date) agg_sales_after_hire_date
from transactions tx
where tx.sales_rep = sa.sales_rep
) l;

Use conditional aggregation:
select tx.sales_rep,
sum(case when tx.trx_date < sm.hire_date then sales else 0 end) as before_sales,
sum(case when tx.trx_date >= sm.hire_date then sales else 0 end) as after_sales
from db.transactions tx inner join
db.salesman sm
on sm.sales_rep = tx.sales_rep
group by tx.sales_rep;
EDIT:
In Postgres, you would use filter for the logic:
select tx.sales_rep,
sum(sales) filter (where tx.trx_date < sm.hire_date) as before_sales,
sum(sales) filter (where tx.trx_date >= sm.hire_date) as after_sales
from db.transactions tx inner join db.salesman sm on sm.sales_rep = tx.sales_rep
group by tx.sales_rep;
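Since the question is also open to Python, here is a minimal sketch of the conditional aggregation run from Python. It assumes an in-memory SQLite database standing in for PostgreSQL, tables without the db. schema prefix, and dates rewritten as ISO strings so that string comparison orders them correctly. The left join is my own choice so that a rep with no transactions would still appear.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; dates are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table transactions (sales_rep text, sales integer, trx_date text);
    create table salesman (sales_rep text, hire_date text);
    insert into transactions values
        ('Tom', 200, '2020-09-18'), ('Jerry', 435, '2020-06-21'),
        ('Patrick', 1400, '2020-04-30'), ('Tom', 560, '2020-05-24'),
        ('Francis', 240, '2021-01-02');
    insert into salesman values
        ('Tom', '2020-08-19'), ('Jerry', '2020-01-28'),
        ('Patrick', '2020-04-06'), ('Francis', '2020-09-04');
""")

# One pass over the joined rows; each SUM only counts the rows on its side
# of the rep's hire date.
rows = conn.execute("""
    select sm.sales_rep, sm.hire_date,
           sum(case when tx.trx_date <  sm.hire_date then tx.sales else 0 end) as before_sales,
           sum(case when tx.trx_date >= sm.hire_date then tx.sales else 0 end) as after_sales
    from salesman sm
    left join transactions tx on tx.sales_rep = sm.sales_rep
    group by sm.sales_rep, sm.hire_date
    order by sm.sales_rep
""").fetchall()

for row in rows:
    print(row)
```

With the sample rows from the question, only Tom has a sale dated before his hire date.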

Related

How to assign filters to row number () function in sql

I am trying to extract, for each case, only the first row after the row with Name = System where the town is not Austin.
In case 1001 there are 8 rows and row #4 is System, so the output should be only the row with Name = Terry and Date Moved = 7/4/2019 (the next entry whose town is not Austin).
Case Name Town Date Moved Row #(Not in table)
1001 Ted Madisson 9/7/2018 1
1001 Joyal Boston 10/4/2018 2
1001 Beatrice Chicago 1/1/2019 3
1001 System Chicago 1/5/2019 4
1001 John Austin 4/11/2019 5
1001 Simon Austin 6/11/2019 6
1001 Terry Cleveland 7/4/2019 7
1001 Hawkins Newyork 8/4/2019 8
1002 Devon Boston 12/4/2018 1
1002 Joy Austin 12/7/2018 2
1002 Rachael Newyork 12/19/2018 3
1002 Bill Chicago 1/4/2019 4
1002 System Dallas 2/12/2019 5
1002 Phil Austin 3/16/2019 6
1002 Dan Seattle 5/18/2019 7
1002 Claire Birmingham 7/7/2019 8
I tried a subquery with the ROW_NUMBER function and a NOT IN ('Austin') filter:
ROW_NUMBER() OVER(PARTITION BY Case ORDER BY Moved_date ASC) AS ROWNUM
Please note there are > 10k cases.
You can try the script below:
WITH CTE AS
(
SELECT [Case],[Name],Town,[Date Moved],
ROW_NUMBER() OVER (PARTITION BY [Case] ORDER BY [Date Moved]) [Row #]
FROM your_table
)
SELECT A.*
FROM CTE A
INNER JOIN
(
SELECT C.[Case],C.Town,MAX(C.[Row #]) MRN
FROM CTE C
INNER JOIN
(
SELECT *
FROM CTE A
WHERE A.Name = 'System'
)D ON C.[Case] = D.[Case] AND C.[Row #] > D.[Row #]
AND C.Town = 'Austin'
GROUP BY C.[Case],C.Town
)B ON A.[Case] = B.[Case] AND A.[Row #] = B.MRN+1
Output:
Case Name Town Date Moved Row #
1001 Terry Cleveland 7/4/2019 7
1002 Dan Seattle 5/18/2019 7
Here are three possibilities. I'm still concerned about ties, though: the first one will return multiple rows if dates tie, while the others return only one row per case:
with matches as (
select t1."case", min(t2."Date Moved") as "Date Moved"
from Movements t1 inner join Movements t2 on t1."case" = t2."case"
where t1.name = 'System' and t2.Town <> 'Austin'
and t2."Date Moved" > t1."Date Moved"
group by t1."case"
)
select t.*
from Movements t inner join matches m
on m."case" = t."case" and m."Date Moved" = t."Date Moved";
select nxt.*
from Movements m1 cross apply (
select top 1 m2.* from Movements m2
where m2."case" = m1."case" and m2.Town <> 'Austin' and m2."Date Moved" > m1."Date Moved"
order by m2."Date Moved"
) as nxt
where m1.name = 'System';
with m1 as (
select *,
count(case when name = 'System' then 1 end) over (partition by "case" order by "Date Moved") as flag
from Movements
), m2 as (
select *,
row_number() over (partition by "case" order by "Date Moved") as rn
from m1
where flag = 1 and name <> 'System' and Town <> 'Austin'
)
select * from m2 where rn = 1;
I'm basically assuming this is SQL Server. You might need a few minor tweaks if not.
It also does not require a town named Austin to fall between the "System" row and the desired row as I do not believe that was a stated requirement.
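To check the third approach (running count as a flag, then ROW_NUMBER) against the sample data, here is a rough sketch. It assumes SQLite 3.25 or later via Python's sqlite3 module rather than SQL Server, and the table and column names (movements, case_id, date_moved) are my own stand-ins with dates stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table movements (case_id integer, name text, town text, date_moved text)"
)
conn.executemany("insert into movements values (?, ?, ?, ?)", [
    (1001, "Ted", "Madisson", "2018-09-07"), (1001, "Joyal", "Boston", "2018-10-04"),
    (1001, "Beatrice", "Chicago", "2019-01-01"), (1001, "System", "Chicago", "2019-01-05"),
    (1001, "John", "Austin", "2019-04-11"), (1001, "Simon", "Austin", "2019-06-11"),
    (1001, "Terry", "Cleveland", "2019-07-04"), (1001, "Hawkins", "Newyork", "2019-08-04"),
    (1002, "Devon", "Boston", "2018-12-04"), (1002, "Joy", "Austin", "2018-12-07"),
    (1002, "Rachael", "Newyork", "2018-12-19"), (1002, "Bill", "Chicago", "2019-01-04"),
    (1002, "System", "Dallas", "2019-02-12"), (1002, "Phil", "Austin", "2019-03-16"),
    (1002, "Dan", "Seattle", "2019-05-18"), (1002, "Claire", "Birmingham", "2019-07-07"),
])

query = """
with flagged as (
    -- flag counts the System rows seen so far, so it is 1 from the System row onwards
    select *,
           count(case when name = 'System' then 1 end)
               over (partition by case_id order by date_moved) as flag
    from movements
), ranked as (
    -- number only the candidate rows: after System, not System itself, not Austin
    select *,
           row_number() over (partition by case_id order by date_moved) as rn
    from flagged
    where flag = 1 and name <> 'System' and town <> 'Austin'
)
select case_id, name, town, date_moved
from ranked
where rn = 1
order by case_id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On this data the query returns Terry for case 1001 and Dan for case 1002, matching the expected output in the question.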

Find out the top 3 customers by sum of sales from different groups for the last 30 days - Amazon interview

This was my Amazon SQL interview question which I bombed miserably.
We have 3 tables:
customers orders catalog
cust_id order_date catalog_id
cust_name order_id catalog_name
unit_price cust_id
quantity
catalog_id
The output expected was to find top 3 customers from the 3 catalog / business units for the last 30 days. I tried partitioning over total sales but the last 30 day sales and multiple joins threw me off. Following were the columns requested:
cust_id cust_name catalog_name total_sales(unit_price*quantity)
1 David Books 1400
2 John Books 1200
3 Lisa Books 1000
4 Paul DVDs 500
2 John DVDs 313.5
5 James DVDs 220
6 Alice TV 110
1 David TV 87.5
7 Jerry TV 56
I understand basic 'partitioning over order by' however I have not used it over multiple tables with a datestamp. Kindly help me in understanding this concept. Thank you all in advance!
The query below should give you an idea.
select *
from (select c.cust_id,c.cust_name,ct.catalog_name,sum(o.unit_price * o.quantity) as total_sales
,dense_rank() over(partition by ct.catalog_name order by sum(o.unit_price * o.quantity) desc) as rnk
from customers c
join orders o on o.cust_id = c.cust_id
join catalog ct on ct.catalog_id = o.catalog_id
--last 30 days filter
where o.order_date >= dateadd(day,-30,cast(getdate() as date)) and o.order_date < cast(getdate() as date)
group by c.cust_id,c.cust_name,ct.catalog_name
) t
where rnk <= 3
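To see the pieces work together, here is a small sketch of the same idea: filter to the last 30 days, aggregate per customer and catalog, then rank within each catalog. It assumes SQLite 3.25 or later through Python's sqlite3 module; the sample rows and the fixed AS_OF date are invented for illustration, and the aggregation is split into a CTE purely for readability.

```python
import sqlite3

AS_OF = "2024-03-31"  # hypothetical "today", fixed so the result is reproducible

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (cust_id integer, cust_name text);
    create table catalog (catalog_id integer, catalog_name text);
    create table orders (order_id integer, order_date text, cust_id integer,
                         catalog_id integer, unit_price real, quantity integer);
    insert into customers values (1, 'David'), (2, 'John'), (3, 'Lisa'), (4, 'Paul');
    insert into catalog values (10, 'Books'), (20, 'DVDs');
    insert into orders values
        (1, '2024-03-20', 1, 10, 100.0, 5),
        (2, '2024-03-21', 2, 10, 50.0, 4),
        (3, '2024-03-22', 3, 10, 30.0, 3),
        (4, '2024-03-23', 4, 10, 10.0, 2),
        (5, '2024-01-05', 4, 10, 999.0, 9),
        (6, '2024-03-25', 2, 20, 20.0, 2),
        (7, '2024-03-26', 2, 20, 10.0, 1);
""")

query = """
with totals as (
    -- step 1: restrict to the 30 days before AS_OF, then sum per customer and catalog
    select c.cust_id, c.cust_name, ct.catalog_name,
           sum(o.unit_price * o.quantity) as total_sales
    from customers c
    join orders o on o.cust_id = c.cust_id
    join catalog ct on ct.catalog_id = o.catalog_id
    where o.order_date >= date(?, '-30 days') and o.order_date < ?
    group by c.cust_id, c.cust_name, ct.catalog_name
), ranked as (
    -- step 2: rank the totals inside each catalog
    select *, dense_rank() over (partition by catalog_name
                                 order by total_sales desc) as rnk
    from totals
)
select cust_id, cust_name, catalog_name, total_sales
from ranked
where rnk <= 3
order by catalog_name, total_sales desc
"""

top3 = conn.execute(query, (AS_OF, AS_OF)).fetchall()
for row in top3:
    print(row)
```

Paul's large January order falls outside the window, so he ranks fourth in Books and is dropped. Note that dense_rank keeps all tied customers, so a catalog can return more than three rows when totals tie.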

pgsql -Showing top 10 products's sales and other products as 'others' and its sum of sales

I have a table called "products" with 100 records of sales details. My requirement is simple, but I have not been able to implement it.
I need to show the top 10 product names with their sales, and group all other products under the name "Others" with the sum of their sales, so my output will have 11 rows in total. The 11th row should be "Others" with the sum of sales of all remaining products. Can anyone give me the logic?
O/p should be like this,
Name sales
------ -----
1 colgate 9000
2 pepsodent 8000
3 closeup 7000
4 brittal 6000
5 ariies 5000
6 babool 4000
7 imami 3000
8 nepolop 2500
9 lactoteeth 2000
10 menwhite 1500
11 Others 6000 (sum of sales of remaining 90 products)
here is my sql query,
select case when rank<11 then prod_cat else 'Others' END as prod_cat,
total_sales,ID,rank from (select ROW_NUMBER() over (order by (sum(i.grandtotal)) desc) as rank,pc.name as prod_cat,sum(i.grandtotal) as total_sales, pc.m_product_category_id as ID
from adempiere.c_invoice i join adempiere.c_invoiceline il on il.c_invoice_id=i.c_invoice_id join adempiere.m_product p on p.m_product_id=il.m_product_id join adempiere.m_product_category pc on pc.m_product_category_id=p.m_product_category_id
where extract(year from i.dateacct)=extract(year from now())
group by pc.m_product_category_id) innersql
order by total_sales desc
o/p what i got is,
prod_cat total_sales id rank
-------- ----------- --- ----
BSHIRT 4511697.63 460000015 1
BT-SHIRT 2725167.03 460000016 2
SHIRT 2630471.56 1000003 3
BJEAN 1793514.07 460000005 4
JEAN 1115402.90 1000004 5
GT-SHIRT 1079596.33 460000062 6
T SHIRT 446238.60 1000006 7
PANT 405189.00 1000005 8
GDRESS 396789.02 460000059 9
BTROUSER 393739.48 460000017 10
Others 164849.41 1000009 11
Others 156677.00 1000008 12
Others 146678.00 1000007 13
As @e4c5 suggests, use UNION:
with
totals as (
select --pc.m_product_category_id as id,
pc.name as prod_cat,
sum(i.grandtotal) as total_sales,
ROW_NUMBER() over (order by sum(i.grandtotal) desc) as rank
from adempiere.c_invoice i
join adempiere.c_invoiceline il on (il.c_invoice_id=i.c_invoice_id)
join adempiere.m_product p on (p.m_product_id=il.m_product_id)
join adempiere.m_product_category pc on (pc.m_product_category_id=p.m_product_category_id)
where i.dateacct >= date_trunc('year', now()) and i.dateacct < date_trunc('year', now()) + interval '1' year
group by pc.m_product_category_id, pc.name
),
ranked_others as (
select prod_cat, total_sales, rank
from totals where rank <= 10
union
select 'Others', sum(total_sales), 11
from totals where rank > 10
)
select prod_cat, total_sales
from ranked_others
order by rank
Also, I recommend using sargable conditions like the one above, which is slightly more complicated than the one you implemented, but generally worth the extra effort.
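The rank-then-union pattern can be tried on a toy table. This is only a sketch: it assumes SQLite 3.25 or later via Python's sqlite3 module instead of PostgreSQL, a hypothetical products(name, sales) table, and a top 3 instead of a top 10 to keep the sample short.

```python
import sqlite3

TOP_N = 3  # the question uses 10; 3 keeps the sample data small

conn = sqlite3.connect(":memory:")
conn.execute("create table products (name text, sales integer)")
conn.executemany("insert into products values (?, ?)", [
    ("colgate", 9000), ("pepsodent", 8000), ("closeup", 7000),
    ("brittal", 6000), ("ariies", 5000), ("babool", 4000),
])

query = """
with ranked as (
    select name, sales, row_number() over (order by sales desc) as rnk
    from products
),
ranked_others as (
    -- the top N rows are kept as they are
    select name, sales, rnk from ranked where rnk <= :n
    union all
    -- everything else collapses into a single 'Others' row placed last
    select 'Others', sum(sales), :n + 1 from ranked where rnk > :n
)
select name, sales
from ranked_others
order by rnk
"""

rows = conn.execute(query, {"n": TOP_N}).fetchall()
for row in rows:
    print(row)
```

If there are no products beyond the top N, the second branch still emits an "Others" row with a NULL total, so you may want to filter that out.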

How to Sum the 1st record of one column with the 2nd record of another column?

I am trying to sum the 2nd record of one column with the 1st record of another column and store the result in a new column.
Here is the example SQL Server table
Emp_Code Emp_Name Month Opening_Balance
G101 Sam 1 1000
G102 James 2 -2500
G103 David 3 3000
G104 Paul 4 1800
G105 Tom 5 -1500
I am trying to get the output as below on the new Reserve column
Emp_Code Emp_Name Month Opening_Balance Reserve
G101 Sam 1 1000 1000
G102 James 2 -2500 -1500
G103 David 3 3000 1500
G104 Paul 4 1800 3300
G105 Tom 5 -1500 1800
The rule for calculating the Reserve column is:
For Month 1, it is the same as the Opening Balance.
For the remaining months, each Reserve is the previous month's Reserve plus the current month's Opening Balance (for example, Reserve for Month 2 = Reserve for Month 1 + Opening Balance for Month 2).
You seem to want a cumulative sum. In SQL Server 2012+, you would do:
select t.*,
sum(opening_balance) over (order by [Month]) as Reserve
from t;
In earlier versions, you would do this with a correlated subquery or apply:
select t.*,
(select sum(t2.opening_balance) from t t2 where t2.[Month] <= t.[Month]) as reserve
from t;
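You can do a self join, summing the opening balances of every row up to and including the current month.
SELECT t.Emp_Code, t.Emp_Name, t.[Month], t.Opening_Balance, SUM(n.Opening_Balance) AS Reserve
FROM Table1 t
JOIN Table1 n
ON n.[Month] <= t.[Month]
GROUP BY t.Emp_Code, t.Emp_Name, t.[Month], t.Opening_Balance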
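As a quick check that a cumulative sum reproduces the Reserve column from the question, here is a minimal sketch. It assumes SQLite 3.25 or later through Python's sqlite3 module rather than SQL Server, and a hypothetical table name emp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table emp (emp_code text, emp_name text, month integer, opening_balance integer)"
)
conn.executemany("insert into emp values (?, ?, ?, ?)", [
    ("G101", "Sam", 1, 1000), ("G102", "James", 2, -2500),
    ("G103", "David", 3, 3000), ("G104", "Paul", 4, 1800),
    ("G105", "Tom", 5, -1500),
])

# With ORDER BY in the window, SUM runs from the first row to the current row.
rows = conn.execute("""
    select emp_code, emp_name, month, opening_balance,
           sum(opening_balance) over (order by month) as reserve
    from emp
    order by month
""").fetchall()

reserves = [row[4] for row in rows]
print(reserves)
```

The resulting list is 1000, -1500, 1500, 3300, 1800, the same values as the Reserve column requested above.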

How to select items according to their sums in SQL?

I've got the following table:
ID Name Sales
1 Kalle 1
2 Kalle -1
3 Simon 10
4 Simon 20
5 Anna 11
6 Anna 0
7 Tina 0
I want to write a SQL query that only returns the rows that
represents a salesperson with sum of sales > 0.
ID Name Sales
3 Simon 10
4 Simon 20
5 Anna 11
6 Anna 0
Is this possible?
You can easily get the names of the people whose sum of sales is greater than 0 by using a HAVING clause:
select name
from yourtable
group by name
having sum(sales) > 0;
This query will return both Simon and Anna, then if you want to return all of the details for each of these names you can use the above in a WHERE clause to get the final result:
select id, name, sales
from yourtable
where name in (select name
from yourtable
group by name
having sum(sales) > 0);
See SQL Fiddle with Demo.
You can do it like this. I think the join will be more efficient than the where name in () clause.
SELECT Sales.ID, Sales.name, Sales.sales
FROM Sales
JOIN (SELECT name FROM Sales GROUP BY Sales.name HAVING SUM(sales) > 0) AS Sales2 ON Sales2.name = Sales.name
This will work on databases that support window functions, such as Oracle, SQL Server, and DB2:
SELECT ID, Name, Sales
FROM
(
SELECT ID, Name, Sales, sum(sales) over (partition by name) sum1
FROM <table>
) a
WHERE sum1 > 0
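The HAVING-based filter can be verified against the sample table with a short sketch. It assumes an in-memory SQLite database via Python's sqlite3 module and a table named sales, which is my own placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (id integer, name text, sales integer)")
conn.executemany("insert into sales values (?, ?, ?)", [
    (1, "Kalle", 1), (2, "Kalle", -1), (3, "Simon", 10), (4, "Simon", 20),
    (5, "Anna", 11), (6, "Anna", 0), (7, "Tina", 0),
])

# The inner query picks names whose total is positive;
# the outer query returns every detail row for those names.
rows = conn.execute("""
    select id, name, sales
    from sales
    where name in (select name
                   from sales
                   group by name
                   having sum(sales) > 0)
    order by id
""").fetchall()

for row in rows:
    print(row)
```

Kalle (total 0) and Tina (total 0) are excluded, while Anna's zero-sales row is kept because her total is positive.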