How to aggregate using distinct values across two columns? - sql

I have the following data in an orders table:
revenue  expenses  location_1   location_2
3        6         London       New York
6        11        Paris        Toronto
1        8         Houston      Sydney
1        4         Chicago      Los Angeles
2        5         New York     London
7        11        New York     Boston
4        6         Toronto      Paris
5        11        Toronto      New York
1        2         Los Angeles  London
0        0         Mexico City  London
I would like to create a result set that has 3 columns:
a list of the 10 DISTINCT city names
the sum of revenue for each city
the sum of expenses for each city
The desired result is:
location     revenue  expenses
London       6        13
New York     17       33
Paris        10       17
Toronto      15       28
Houston      1        8
Sydney       1        8
Chicago      1        4
Los Angeles  2        6
Boston       7        11
Mexico City  0        0
Is it possible to aggregate on distinct values across two columns? If yes, how would I do it?
Here is a fiddle:
http://sqlfiddle.com/#!9/0b1105/1

Shorter (and often faster):
SELECT location, sum(revenue) AS rev, sum(expenses) AS exp
FROM (
SELECT location_1 AS location, revenue, expenses FROM orders
UNION ALL
SELECT location_2 , revenue, expenses FROM orders
) sub
GROUP BY 1;
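A minimal sketch to check the UNION ALL query against the expected output: it loads the sample rows into an in-memory SQLite database through Python's standard sqlite3 module and runs essentially the same query. SQLite stands in for Postgres here (an assumption made only to keep the example self-contained), and GROUP BY location is spelled out instead of GROUP BY 1.

```python
import sqlite3

# In-memory database holding the sample orders from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (revenue INT, expenses INT, location_1 TEXT, location_2 TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (3, 6, "London", "New York"),
        (6, 11, "Paris", "Toronto"),
        (1, 8, "Houston", "Sydney"),
        (1, 4, "Chicago", "Los Angeles"),
        (2, 5, "New York", "London"),
        (7, 11, "New York", "Boston"),
        (4, 6, "Toronto", "Paris"),
        (5, 11, "Toronto", "New York"),
        (1, 2, "Los Angeles", "London"),
        (0, 0, "Mexico City", "London"),
    ],
)

# Unpivot the two location columns with UNION ALL, then aggregate once.
# UNION ALL (not UNION) matters: UNION would collapse identical
# (location, revenue, expenses) rows and undercount the sums.
query = """
SELECT location, SUM(revenue) AS rev, SUM(expenses) AS exp
FROM (
    SELECT location_1 AS location, revenue, expenses FROM orders
    UNION ALL
    SELECT location_2, revenue, expenses FROM orders
) AS sub
GROUP BY location
ORDER BY location
"""
totals = {loc: (rev, exp) for loc, rev, exp in conn.execute(query)}
print(totals)
```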
May be faster:
WITH cte AS (
SELECT location_1, location_2, revenue AS rev, expenses AS exp
FROM orders
)
SELECT location, sum(rev) AS rev, sum(exp) AS exp
FROM (
SELECT location_1 AS location, rev, exp FROM cte
UNION ALL
SELECT location_2 , rev, exp FROM cte
) sub
GROUP BY 1;
The (materialized!) CTE adds overhead, which may outweigh the benefit. It depends on many factors, such as total table size, available indexes, possible bloat, available RAM, storage speed and the Postgres version.

You could UNION ALL two already-aggregated queries and then aggregate again over the result:
select location, sum(rev) as rev, sum(exp) as exp
from (
select location_1 as location, sum(revenue) as rev, sum(expenses) as exp
from orders
group by location_1
union all
select location_2 as location, sum(revenue) as rev, sum(expenses) as exp
from orders
group by location_2
)z
group by location
order by 1

Related

Select unique countries with more than one customer

I need to show the countries that have more than one individual.
Customers
customer_id first_name last_name age country
1 John Doe 31 USA
2 Robert Luna 22 USA
3 David Robinson 22 UK
4 John Reinhardt 25 UK
5 Betty Doe 28 UAE
So the query should return
customer_id first_name last_name age country
1 John Doe 31 USA
2 Robert Luna 22 USA
3 David Robinson 22 UK
4 John Reinhardt 25 UK
I tried this query, but it didn't work:
SELECT last_name, Country
FROM Customers
GROUP BY Country
HAVING COUNT(Customer_id) > 1;
The actual table can be found here
Try the following query:
SELECT * FROM CUSTOMERS C
WHERE C.COUNTRY IN (SELECT COUNTRY FROM CUSTOMERS GROUP BY COUNTRY HAVING COUNT(*)>1)
You could use a windowed count as a filter:
with c as (
select *, Count(*) over(partition by country) cnt
from Customers
)
select *
from c
where cnt > 1;
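Both answers can be compared on the sample data with a small sketch, again assuming an in-memory SQLite database (window functions need SQLite 3.25 or later) as a stand-in for the real one. The windowed version filters in an outer query because a window function's result cannot be referenced in the WHERE clause of the same SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INT, first_name TEXT, "
    "last_name TEXT, age INT, country TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Doe", 31, "USA"),
        (2, "Robert", "Luna", 22, "USA"),
        (3, "David", "Robinson", 22, "UK"),
        (4, "John", "Reinhardt", 25, "UK"),
        (5, "Betty", "Doe", 28, "UAE"),
    ],
)

# Approach 1: keep rows whose country appears in a grouped subquery.
in_subquery = """
SELECT * FROM customers
WHERE country IN (SELECT country FROM customers GROUP BY country HAVING COUNT(*) > 1)
ORDER BY customer_id
"""

# Approach 2: attach a per-country count with a window function,
# then filter on it in the outer query.
windowed = """
WITH c AS (
    SELECT *, COUNT(*) OVER (PARTITION BY country) AS cnt FROM customers
)
SELECT customer_id, first_name, last_name, age, country
FROM c
WHERE cnt > 1
ORDER BY customer_id
"""

rows_in = conn.execute(in_subquery).fetchall()
rows_win = conn.execute(windowed).fetchall()
print(rows_in)
```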

Aggregate before and after a date column

I have two tables: db.transactions and db.salesman, which I would like to combine in order to create an output that has aggregated sales before each salesman's hire date and after each salesman's hire date.
select * from db.transactions
index sales_rep sales trx_date
1 Tom 200 9/18/2020
2 Jerry 435 6/21/2020
3 Patrick 1400 4/30/2020
4 Tom 560 5/24/2020
5 Francis 240 1/2/2021
select * from db.salesman
index sales_rep hire_date
1 Tom 8/19/2020
2 Jerry 1/28/2020
3 Patrick 4/6/2020
4 Francis 9/4/2020
I would like to aggregate sales from db.transactions before and after each sales rep's hire date.
Expected output:
index sales_rep hire_date agg_sales_before_hire_date agg_sales_after_hire_date
1 Tom 8/19/2020 1200 5000
2 Jerry 1/28/2020 500 900
3 Patrick 4/6/2020 5000 300
4 Francis 9/4/2020 2900 1500
For a single sales rep, I think agg_sales_before_hire_date could be calculated like this:
select tx.sales_rep, sum(tx.sales)
from db.transactions tx
inner join db.salesman sm on sm.sales_rep = tx.sales_rep
where tx.trx_date < sm.hire_date and tx.sales_rep = 'Tom'
group by tx.sales_rep
I am using PostgreSQL. I am also open to doing this in Tableau or Python.
Using CROSS JOIN LATERAL
select
sa.sales_rep, sa.hire_date,
l.agg_sales_before_hire_date,
l.agg_sales_after_hire_date
from salesman sa
cross join lateral
(
select
sum(tx.sales) filter (where tx.trx_date < sa.hire_date) agg_sales_before_hire_date,
sum(tx.sales) filter (where tx.trx_date >= sa.hire_date) agg_sales_after_hire_date
from transactions tx
where tx.sales_rep = sa.sales_rep
) l;
Use conditional aggregation:
select tx.sales_rep,
sum(case when tx.trx_date < sm.hire_date then tx.sales else 0 end) as before_sales,
sum(case when tx.trx_date >= sm.hire_date then tx.sales else 0 end) as after_sales
from db.transactions tx inner join
db.salesman sm
on sm.sales_rep = tx.sales_rep
group by tx.sales_rep;
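A sketch of the conditional-aggregation answer run against the sample rows, assuming an in-memory SQLite database via Python's sqlite3 module. Two assumptions worth stating: the dates are stored as ISO 'YYYY-MM-DD' strings, because M/D/YYYY text does not compare in date order (in Postgres, a real date column avoids this), and CASE is used rather than FILTER so the sketch does not depend on the SQLite version. The totals are computed from the five sample rows only, so they differ from the expected-output figures in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (sales_rep TEXT, sales INT, trx_date TEXT)")
conn.execute("CREATE TABLE salesman (sales_rep TEXT, hire_date TEXT)")
# Dates converted to ISO 'YYYY-MM-DD' strings so text comparison matches date order.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("Tom", 200, "2020-09-18"),
        ("Jerry", 435, "2020-06-21"),
        ("Patrick", 1400, "2020-04-30"),
        ("Tom", 560, "2020-05-24"),
        ("Francis", 240, "2021-01-02"),
    ],
)
conn.executemany(
    "INSERT INTO salesman VALUES (?, ?)",
    [
        ("Tom", "2020-08-19"),
        ("Jerry", "2020-01-28"),
        ("Patrick", "2020-04-06"),
        ("Francis", "2020-09-04"),
    ],
)

# Conditional aggregation: one pass over the joined rows; each CASE routes a
# sale into the "before" or "after" bucket relative to that rep's hire date.
query = """
SELECT sm.sales_rep,
       sm.hire_date,
       SUM(CASE WHEN tx.trx_date <  sm.hire_date THEN tx.sales ELSE 0 END) AS before_sales,
       SUM(CASE WHEN tx.trx_date >= sm.hire_date THEN tx.sales ELSE 0 END) AS after_sales
FROM transactions tx
JOIN salesman sm ON sm.sales_rep = tx.sales_rep
GROUP BY sm.sales_rep, sm.hire_date
"""
result = {rep: (before, after) for rep, _hire, before, after in conn.execute(query)}
print(result)
```

Because of the inner join, a rep with no transactions at all would not appear; a LEFT JOIN from salesman would keep such reps.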
EDIT:
In Postgres, you would use filter for the logic:
select tx.sales_rep,
sum(tx.sales) filter (where tx.trx_date < sm.hire_date) as before_sales,
sum(tx.sales) filter (where tx.trx_date >= sm.hire_date) as after_sales
from db.transactions tx inner join db.salesman sm on sm.sales_rep = tx.sales_rep
group by tx.sales_rep;

Count occurrences with exclude criteria

I have a table:
City ID
Austin 123
Austin 123
Austin 123
Austin 145
Austin 145
Chicago 12
Chicago 12
Houston 24
Houston 45
Houston 45
Now I want to count the occurrences of each city/ID pair, but only for cities that have more than one distinct ID. Chicago has only one ID (12), so I am not interested in Chicago and it should not appear in the result set, which should look like this:
city Id Occurrences
Austin 123 3
Austin 145 2
Houston 24 1
Houston 45 2
I am able to get myself an overview with
select city, Id from Table
group by city, Id
But I am not sure how to select only the cities that have different ids, and how to count them.
Could anyone help me out here?
You can use window functions and aggregation:
select city, id, occurrences
from (
select city, id, count(*) occurrences, count(*) over(partition by city) cnt_city
from mytable
group by city, id
) t
where cnt_city > 1
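A sketch that runs the window-function answer on the sample data in an in-memory SQLite database (SQLite 3.25+ assumed for window functions). The key point is evaluation order: the window COUNT(*) is computed after GROUP BY, so it counts grouped (city, id) rows, which is the number of distinct ids per city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (city TEXT, id INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("Austin", 123), ("Austin", 123), ("Austin", 123),
        ("Austin", 145), ("Austin", 145),
        ("Chicago", 12), ("Chicago", 12),
        ("Houston", 24), ("Houston", 45), ("Houston", 45),
    ],
)

# GROUP BY city, id gives one row per (city, id) with its occurrence count.
# The window COUNT(*) then runs over those grouped rows, so cnt_city is the
# number of distinct ids per city; cities with a single id are filtered out.
query = """
SELECT city, id, occurrences
FROM (
    SELECT city, id,
           COUNT(*) AS occurrences,
           COUNT(*) OVER (PARTITION BY city) AS cnt_city
    FROM mytable
    GROUP BY city, id
) AS t
WHERE cnt_city > 1
ORDER BY city, id
"""
rows = conn.execute(query).fetchall()
print(rows)
```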

SQL ordering cities ascending and persons descending

I have been stuck on a complicated problem. I do not know which SQL version this is (it is a school edition), but that is not relevant here anyway.
I want to order cities ascending and numbers descending. By descending numbers I mean that when the same city appears more than once, the row with the biggest number comes first.
I also need row numbers. I have tried SELECT ROW_NUMBER() OVER(ORDER BY COUNT(FIRST_NAME)) row with no success.
I have two tables called CUSTOMERS and EMPLOYEES. Both of them having FIRST_NAME, LAST_NAME, CITY.
Now I have this kind of code:
SELECT
CITY, COUNT(FIRST_NAME),
CASE WHEN COUNT(FIRST_NAME) >= 0 THEN 'CUSTOMERS'
END
FROM CUSTOMERS
GROUP BY CITY
UNION
SELECT
CITY, COUNT(FIRST_NAME),
CASE WHEN COUNT(FIRST_NAME) >= 0 THEN 'EMPLOYEES'
END
FROM EMPLOYEES
GROUP BY CITY
This SQL code gives me a list like this:
CITY
NEW YORK 2 CUSTOMERS
MIAMI 1 CUSTOMERS
MIAMI 4 EMPLOYEES
LOS ANGELES 1 CUSTOMERS
CHIGACO 1 CUSTOMERS
HOUSTON 1 CUSTOMERS
DALLAS 2 CUSTOMERS
SAN JOSE 2 CUSTOMERS
SEATTLE 2 CUSTOMERS
SEATTLE 5 EMPLOYEES
BOSTON 1 CUSTOMERS
BOSTON 3 EMPLOYEES
I want it to look like this:
ROW CITY
1 NEW YORK 2 CUSTOMERS
2 MIAMI 4 EMPLOYEES
3 MIAMI 1 CUSTOMERS
4 LOS ANGELES 1 CUSTOMERS
5 CHIGACO 1 CUSTOMERS
6 HOUSTON 1 CUSTOMERS
7 DALLAS 2 CUSTOMERS
8 SAN JOSE 2 CUSTOMERS
9 SEATTLE 5 EMPLOYEES
10 SEATTLE 2 CUSTOMERS
11 BOSTON 3 EMPLOYEES
12 BOSTON 1 CUSTOMERS
You can use window functions in the ORDER BY:
SELECT c.*
FROM ((SELECT CITY, COUNT(*) as cnt, 'CUSTOMERS' as WHICH
FROM CUSTOMERS
GROUP BY CITY
) UNION ALL
(SELECT CITY, COUNT(*), 'EMPLOYEES'
FROM EMPLOYEES
GROUP BY CITY
)
) c
ORDER BY MAX(cnt) OVER (PARTITION BY city) DESC,
city,
cnt DESC;
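The query above orders the rows but does not number them. A minimal sketch of adding ROW_NUMBER(), assuming an in-memory SQLite database (3.25+) and a few hypothetical rows rather than the questioner's data. It follows the ordering stated in the question (cities ascending, larger count first within a city); swap in the answer's ORDER BY if cities should instead be ranked by their largest count. The numbering follows the window's own ORDER BY, independent of the query's final ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT, city TEXT)")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, city TEXT)")
# Hypothetical sample rows, not the questioner's data.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ann", "A", "MIAMI"), ("Bob", "B", "SEATTLE"), ("Cid", "C", "SEATTLE"), ("Dee", "D", "BOSTON")],
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Eve", "E", "MIAMI"), ("Fay", "F", "MIAMI"), ("Gus", "G", "BOSTON"),
     ("Hal", "H", "BOSTON"), ("Ivy", "I", "BOSTON")],
)

# Count per city in each table, stack the results with UNION ALL, then number
# the rows: city ascending, and within a city the larger count first.
query = """
SELECT ROW_NUMBER() OVER (ORDER BY city, cnt DESC) AS row_num, city, cnt, which
FROM (
    SELECT city, COUNT(*) AS cnt, 'CUSTOMERS' AS which FROM customers GROUP BY city
    UNION ALL
    SELECT city, COUNT(*), 'EMPLOYEES' FROM employees GROUP BY city
) AS c
ORDER BY row_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```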

SQL Query to get the top [n] records, with a twist

I have an inventory table with hundreds of thousands of records. Let's say these are the first few records:
Item_Cat   Item    City      Qty
---------------------------------
Furniture  Table   Boston    150
Furniture  Table   Phoenix   175
Furniture  Table   Tampa     300
Furniture  Chair   Dallas    150
Furniture  Chair   Boston    150
Furniture  Chair   LA        220
Furniture  Chair   Boise     50
Furniture  Sofa    Chicago   110
Hardware   Hammer  New York  750
Hardware   Hammer  LA        50
How can I get the results like this:
Item_Cat   Item    Max_City1  Max_Qty1  Max_City2  Max_Qty2
----------------------------------------------------------
Furniture  Table   Tampa      300       Phoenix    175
Furniture  Chair   LA         220       Boston     150
Furniture  Sofa    Chicago    110       NULL       NULL
Hardware   Hammer  New York   750       LA         50
Can this be done with the PIVOT function? Or maybe with other SQL functions (MAX, TOP n, etc. maybe?)
One way is to use the window function row_number(), partitioned by item_cat and item, and then use conditional aggregation.
Something like this should work:
WITH cte AS (
SELECT
Item_Cat, Item, City, Qty,
rn = ROW_NUMBER() OVER (PARTITION BY Item_Cat, Item ORDER BY Qty DESC)
FROM t -- your table
)
SELECT
Item_Cat
, Item
, Max_City1 = MAX(CASE WHEN rn = 1 THEN City END)
, Max_Qty1 = MAX(CASE WHEN rn = 1 THEN Qty END)
, Max_City2 = MAX(CASE WHEN rn = 2 THEN City END)
, Max_Qty2 = MAX(CASE WHEN rn = 2 THEN Qty END)
FROM cte
GROUP BY Item_Cat, Item
ORDER BY Item_Cat, Max_qty1 DESC
This should work in all versions of SQL Server from 2005 (if memory serves me right).
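The same ROW_NUMBER() plus conditional aggregation pattern can be sketched in SQLite (3.25+ assumed) through Python's sqlite3 module, using the sample rows from the question. SQLite has no alias = expression syntax, so the aliases use AS. One assumption added here: City as a tie-breaker in the window ORDER BY, since Dallas and Boston both have Qty 150 for Chair and ROW_NUMBER() would otherwise pick between them arbitrarily; with City ascending, Boston comes first, matching the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Item_Cat TEXT, Item TEXT, City TEXT, Qty INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("Furniture", "Table", "Boston", 150),
        ("Furniture", "Table", "Phoenix", 175),
        ("Furniture", "Table", "Tampa", 300),
        ("Furniture", "Chair", "Dallas", 150),
        ("Furniture", "Chair", "Boston", 150),
        ("Furniture", "Chair", "LA", 220),
        ("Furniture", "Chair", "Boise", 50),
        ("Furniture", "Sofa", "Chicago", 110),
        ("Hardware", "Hammer", "New York", 750),
        ("Hardware", "Hammer", "LA", 50),
    ],
)

# Rank cities within each (Item_Cat, Item) by Qty, then pivot ranks 1 and 2
# into columns with conditional aggregation. City is the assumed tie-breaker.
query = """
WITH cte AS (
    SELECT Item_Cat, Item, City, Qty,
           ROW_NUMBER() OVER (PARTITION BY Item_Cat, Item ORDER BY Qty DESC, City) AS rn
    FROM t
)
SELECT Item_Cat, Item,
       MAX(CASE WHEN rn = 1 THEN City END) AS Max_City1,
       MAX(CASE WHEN rn = 1 THEN Qty END)  AS Max_Qty1,
       MAX(CASE WHEN rn = 2 THEN City END) AS Max_City2,
       MAX(CASE WHEN rn = 2 THEN Qty END)  AS Max_Qty2
FROM cte
GROUP BY Item_Cat, Item
ORDER BY Item_Cat, Max_Qty1 DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Items with only one city (Sofa here) get NULL, shown as None in Python, in the second pair of columns.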