I have two tables: db.transactions and db.salesman, which I would like to combine in order to create an output that has aggregated sales before each salesman's hire date and after each salesman's hire date.
select * from db.transactions
index sales_rep sales trx_date
1 Tom 200 9/18/2020
2 Jerry 435 6/21/2020
3 Patrick 1400 4/30/2020
4 Tom 560 5/24/2020
5 Francis 240 1/2/2021
select * from db.salesman
index sales_rep hire_date
1 Tom 8/19/2020
2 Jerry 1/28/2020
3 Patrick 4/6/2020
4 Francis 9/4/2020
I would like to aggregate sales from db.transactions before and after each sales rep's hire date.
Expected output:
index sales_rep hire_date agg_sales_before_hire_date agg_sales_after_hire_date
1 Tom 8/19/2020 1200 5000
2 Jerry 1/28/2020 500 900
3 Patrick 4/6/2020 5000 300
4 Francis 9/4/2020 2900 1500
For a single sales rep, to calculate the agg_sales_before_hire_date is likely:
select tx.sales_rep, tx.sum(sales)
from db.transactions tx
inner join db.salesman sm on sm.sales_rep = tx.sales_rep
where hire_date < '8/19/2020' and sales_rep = 'Tom'
group by tx.sales_rep
PostGRESQL. I am also open to the idea of doing it into Tableau or Python.
Using CROSS JOIN LATERAL
select
sa.sales_rep, sa.hire_date,
l.agg_sales_before_hire_date,
l.agg_sales_after_hire_date
from salesman sa
cross join lateral
(
select
sum(tx.sales) filter (where tx.trx_date < sa.hire_date) agg_sales_before_hire_date,
sum(tx.sales) filter (where tx.trx_date >= sa.hire_date) agg_sales_after_hire_date
from transactions tx
where tx.sales_rep = sa.sales_rep
) l;
Use conditional aggregation:
select tx.sales_rep,
sum(case when tx.txn_date < sm.hire_date then sales else 0 end) as before_sales,
sum(case when tx.txn_date >= sm.hire_date then sales else 0 end) as after_sales
from db.transactions tx inner join
db.salesman sm
on sm.sales_rep = tx.sales_rep
group by tx.sales_rep;
EDIT:
In Postgres, you would use filter for the logic:
select tx.sales_rep,
sum(sales) filter (where tx.txn_date < sm.hire_date) as before_sales,
sum(sales) filter (where tx.txn_date >= sm.hire_date then sales) as after_sales
Related
I am trying to extract only single row after name = system in each case where the town is not Austin.
In case 1001 there are 8 rows, row # 4 is system, output should be only the row with Name=Terry and Date Moved=7/4/2019 (Next entry with town /= Austin)
Case Name Town Date Moved Row #(Not in table)
1001 Ted Madisson 9/7/2018 1
1001 Joyal Boston 10/4/2018 2
1001 Beatrice Chicago 1/1/2019 3
1001 System Chicago 1/5/2019 4
1001 John Austin 4/11/2019 5
1001 Simon Austin 6/11/2019 6
1001 Terry Cleveland 7/4/2019 7
1001 Hawkins Newyork 8/4/2019 8
1002 Devon Boston 12/4/2018 1
1002 Joy Austin 12/7/2018 2
1002 Rachael Newyork 12/19/2018 3
1002 Bill Chicago 1/4/2019 4
1002 System Dallas 2/12/2019 5
1002 Phil Austin 3/16/2019 6
1002 Dan Seattle 5/18/2019 7
1002 Claire Birmingham 7/7/2019 8
Tried sub query with row number function and not in ('Austin') filter
ROW_NUMBER() OVER(PARTITION BY Case ORDER BY Moved_date ASC) AS ROWNUM
Please note there are > 10k cases.
You can try this below script-
WITH CTE AS
(
SELECT [Case],[Name],Town,[Date Moved],
ROW_NUMBER() OVER (PARTITION BY [Case] ORDER BY [Date Moved]) [Row #]
FROM your_table
)
SELECT A.*
FROM CTE A
INNER JOIN
(
SELECT C.[Case],C.Town,MAX(C.[Row #]) MRN
FROM CTE C
INNER JOIN
(
SELECT *
FROM CTE A
WHERE A.Name = 'System'
)D ON C.[Case] = D.[Case] AND C.[Row #] > D.[Row #]
AND C.Town = 'Austin'
GROUP BY C.[Case],C.Town
)B ON A.[Case] = B.[Case] AND A.[Row #] = B.MRN+1
Output is -
Case Name Town Date Moved Row #
1001 Terry Cleveland 7/4/2019 6
1002 Dan Seattle 5/18/2019 7
Here are three possibilities. I'm still concerned about ties though. The first one will return multiple rows while the others only one per case:
with matches as (
select t1."case", min(t2."Date Moved") as "Date Moved"
from Movements r1 inner join Movements t2 on t1."case" = t2."case"
where t1.name = 'System' and t2.Town <> 'Austin'
and t2."Date Moved" > t1."Date Moved"
group by t1."case"
)
select t.*
from Movements t inner join matches m
on m."case" = t."case" and m."Date Moved" = t."Date Moved";
select m2.*
from Movements m1 cross apply (
select top 1 * from Movements m2
where m2.Town <> 'Austin' and m2."Date Moved" > m1."Date Moved"
order by m2."Date Moved"
) as match
where m1.name = 'System';
with m1 as (
select *,
count(case when name = 'System') over (partition by "case" order by "Date Moved") as flag
from Movements
), m2 as (
select *,
row_number() over (partition by "case" order by "Date Moved") as rn
from m1
where flag = 1 and name <> 'System' and Town <> 'Austin'
)
select * from m2 where rn = 1;
I'm basically assuming this is SQL Server. You might need a few minor tweaks if not.
It also does not require a town named Austin to fall between the "System" row and the desired row as I do not believe that was a stated requirement.
This was my Amazon SQL interview question which I bombed miserably.
We have 3 tables:
customers orders catalog
cust_id order_date catalog_id
cust_name order_id catalog_name
unit_price cust_id
quantity
catalog_id
The output expected was to find top 3 customers from the 3 catalog / business units for the last 30 days. I tried partitioning over total sales but the last 30 day sales and multiple joins threw me off. Following were the columns requested:
cust_id cust_name catalog_name total_sales(unit_price*quantity)
1 David Books 1400
2 John Books 1200
3 Lisa Books 1000
4 Paul DVDs 500
2 John DVDs 313.5
5 James DVDs 220
6 Alice TV 110
1 David TV 87.5
7 Jerry TV 56
I understand basic 'partitioning over order by' however I have not used it over multiple tables with a datestamp. Kindly help me in understanding this concept. Thank you all in advance!
The query below should give you an idea.
select *
from (select c.cust_id,c.cust_name,ct.catalog_name,sum(o.unit_price * o.quantity) as total_sales,
,dense_rank() over(partition by ct.catalog_name order by sum(o.unit_price * o.quantity) desc) as rnk
from customers c
join orders o on o.cust_id = c.cust_id
join catalog ct on ct.catalog_id = o.catalog_id
--last 30 days filter
where o.order_date >= date_add(day,-30,cast(getdate() as date)) and o.order_date < cast(getdate() as date)
group by c.cust_id,c.cust_name,ct.catalog_name
) t
where rnk <= 3
I have a table called "products" where it has 100 records with sales details. My requirement is so simple that I was not able to do it.
I need to show the top 10 product names with sales and other product names as "others" and its sales. so totally my o/p will be 11 rows. 11-th row should be others and sum of sales of all remaining products. Can anyone give me the logic?
O/p should be like this,
Name sales
------ -----
1 colgate 9000
2 pepsodent 8000
3 closeup 7000
4 brittal 6000
5 ariies 5000
6 babool 4000
7 imami 3000
8 nepolop 2500
9 lactoteeth 2000
10 menwhite 1500
11 Others 6000 (sum of sales of remaining 90 products)
here is my sql query,
select case when rank<11 then prod_cat else 'Others' END as prod_cat,
total_sales,ID,rank from (select ROW_NUMBER() over (order by (sum(i.grandtotal)) desc) as rank,pc.name as prod_cat,sum(i.grandtotal) as total_sales, pc.m_product_category_id as ID`enter code here`
from adempiere.c_invoice i join adempiere.c_invoiceline il on il.c_invoice_id=i.c_invoice_id join adempiere.m_product p on p.m_product_id=il.m_product_id join adempiere.m_product_category pc on pc.m_product_category_id=p.m_product_category_id
where extract(year from i.dateacct)=extract(year from now())
group by pc.m_product_category_id) innersql
order by total_sales desc
o/p what i got is,
prod_cat total_sales id rank
-------- ----------- --- ----
BSHIRT 4511697.63 460000015 1
BT-SHIRT 2725167.03 460000016 2
SHIRT 2630471.56 1000003 3
BJEAN 1793514.07 460000005 4
JEAN 1115402.90 1000004 5
GT-SHIRT 1079596.33 460000062 6
T SHIRT 446238.60 1000006 7
PANT 405189.00 1000005 8
GDRESS 396789.02 460000059 9
BTROUSER 393739.48 460000017 10
Others 164849.41 1000009 11
Others 156677.00 1000008 12
Others 146678.00 1000007 13
As #e4c5 suggests, use UNION:
select id, prod_cat, sum(total_sales) as total_sales
with
totals as (
select --pc.m_product_category_id as id,
pc.name as prod_cat,
sum(i.grandtotal) as total_sales,
ROW_NUMBER() over (order by sum(i.grandtotal) desc) as rank
from adempiere.c_invoice i
join adempiere.c_invoiceline il on (il.c_invoice_id=i.c_invoice_id)
join adempiere.m_product p on (p.m_product_id=il.m_product_id)
join adempiere.m_product_category pc on (pc.m_product_category_id=p.m_product_category_id)
where i.dateacct >= date_trunc('year', now()) and i.dateacct < date_trunc('year', now()) + interval '1' year
group by pc.m_product_category_id, pc.name
),
rankedothers as (
select prod_cat, total_sales, rank
from totals where rank <= 10
union
select 'Others', sum(total_sales), 11
from totals where rank > 10
)
select prod_cat, total_sales
from ranked_others
order by rank
Also, I recommend using sargable conditions like the one above, which is slightly more complicated than the one you implemented, but generally worth the extra effort.
I am trying the Sum the 2nd record of one column with the 1st record of another column and store the result in a new column
Here is the example SQL Server table
Emp_Code Emp_Name Month Opening_Balance
G101 Sam 1 1000
G102 James 2 -2500
G103 David 3 3000
G104 Paul 4 1800
G105 Tom 5 -1500
I am trying to get the output as below on the new Reserve column
Emp_Code Emp_Name Month Opening_Balance Reserve
G101 Sam 1 1000 1000
G102 James 2 -2500 -1500
G103 David 3 3000 1500
G104 Paul 4 1800 3300
G105 Tom 5 -1500 1800
Actually the rule for calculating the Reserve column is that
For Month-1 it's the same as Opening Balance
For rest of the months its Reserve for Month-2 = Reserve for Month-1 + Opening Balance for Month-2
You seem to want a cumulative sum. In SQL Server 2012+, you would do:
select t.*,
sum(opening_balance) over (order by [Month]) as Reserve
from t;
In earlier versions, you would do this with a correlated subquery or apply:
select t.*,
(select sum(t2.opening_balance) from t t2 where t2.[Month] <= t.[Month]) as reserve
from t;
You can do a self join.
SELECT t.Emp_Code, t.Emp_Name, t.Month, t.Opening_Balance, t.Opening_Balance + n.Reserve
FROM Table1 t
JOIN Table2 n
ON t.Month = n.Month - 1
I've got the following table:
ID Name Sales
1 Kalle 1
2 Kalle -1
3 Simon 10
4 Simon 20
5 Anna 11
6 Anna 0
7 Tina 0
I want to write a SQL query that only returns the rows that
represents a salesperson with sum of sales > 0.
ID Name Sales
3 Simon 10
4 Simon 20
5 Anna 11
6 Anna 0
Is this possible?
You can easily get names of the people with the sum of sales that are greater than 0 by using the a HAVING clause:
select name
from yourtable
group by name
having sum(sales) > 0;
This query will return both Simon and Anna, then if you want to return all of the details for each of these names you can use the above in a WHERE clause to get the final result:
select id, name, sales
from yourtable
where name in (select name
from yourtable
group by name
having sum(sales) > 0);
See SQL Fiddle with Demo.
You can make it like this, I think the join will be more effective than the where name in() clause.
SELECT Sales.name, Sales.sales
FROM Sales
JOIN (SELECT name FROM Sales GROUP BY Sales.name HAVING SUM(sales) > 0) AS Sales2 ON Sales2.name = Sales.name
This will work on some databases, like oracle, mssql, db2
SELECT ID, Name, Sales
FROM
(
SELECT ID, Name, Sales, sum(sales) over (partition by name) sum1
FROM <table>
) a
WHERE sum1 > 0