Avoid Unions to get TOP count - sql

Here are two tables:
LocationId Address City State Zip
1 2100, 1st St Austin TX 76819
2 2200, 2nd St Austin TX 76829
3 2300, 3rd St Austin TX 76839
4 2400, 4th St Austin TX 76849
5 2500, 5th St Austin TX 76859
6 2600, 6th St Austin TX 76869
TripId PassengerId FromLocationId ToLocationId
1 746896 1 2
2 746896 2 1
3 234456 1 3
4 234456 3 1
5 234456 1 4
6 234456 4 1
7 234456 1 6
8 234456 6 1
9 746896 1 2
10 746896 2 1
11 746896 1 2
12 746896 2 1
I want TOP 5 locations which each passenger has traveled to (does not matter if its from or to location). I can get it using a UNION, but was wondering if there was a better way to do this.
My Solution:
select top 5 *
from
(select count(l.LocationId) as cnt, l.LocationId, l.Address1, l.Address2, l.City, St.State , l.Zip
from
Trip t
join LOCATION l on t.FromLocationId = l.LocationId
where t.PassengerId = 746896
group by count(l.LocationId) as cnt, l.LocationId, l.Address1, l.Address2, l.City, St.State , l.Zip
UNION
select count(l.LocationId) as cnt, l.LocationId, l.Address1, l.Address2, l.City, St.State , l.Zip
from
Trip t
join LOCATION l on t.ToLocationId = l.LocationId
where t.PassengerId = 746896
group by count(l.LocationId) as cnt, l.LocationId, l.Address1, l.Address2, l.City, St.State , l.Zip
) as tbl
order by cnt desc

This will give you top 5 location.
SELECT TOP 5 tmp.fromlocationid AS locationid,
Count(tmp.fromlocationid) AS Times
FROM (SELECT fromlocationid
FROM trip
UNION ALL
SELECT tolocationid
FROM trip) tmp
GROUP BY tmp.fromlocationid
Method 1: This will give you top 5 location of each passenger.
WITH cte AS
( SELECT passengerid,
locationid,
Count(locationid) AS Times,
Row_number() OVER(partition BY passengerid ORDER BY passengerid ASC) AS RowNum
FROM (SELECT tripid, passengerid, fromlocationid AS locationid
FROM trip
UNION ALL
SELECT tripid, passengerid, tolocationid AS locationid
FROM trip) tmp
GROUP BY passengerid, locationid )
SELECT *
FROM cte
WHERE rownum <= 5
ORDER BY passengerid, Times DESC
Method 2: Same result without Union Operator (Top 5 location of each passenger)
WITH cte AS
( SELECT passengerid,
locationid,
Count(locationid) AS Times,
Row_number() OVER(partition BY passengerid ORDER BY passengerid ASC) AS RowNum
FROM trip
UNPIVOT ( locationid
FOR subject IN (fromlocationid, tolocationid) ) u
GROUP BY passengerid, locationid )
SELECT *
FROM cte
WHERE rownum <= 5
ORDER BY passengerid, times DESC
If you also want to get the location details, you can simply join the location table.
SELECT cte.* , location.*
FROM cte
INNER JOIN location ON location.locationid = cte.locationid
WHERE rownum <= 5
ORDER BY passengerid, times DESC
Reference
- https://stackoverflow.com/a/19056083/6327676

YOou'll need to replace the SELECT *'s with the columns you need, however, something like this should work:
WITH Visits AS (
SELECT *,
COUNT(*) OVER (PARTITION BY t.PassengerID, L.LocationID) AS Visits
FROM Trip T
JOIN [Location] L ON T.FromLocationId = L.LocationId),
Rankings AS (
SELECT *,
DENSE_RANK() OVER (PARTITION BY V.PassengerID ORDER BY Visits DESC) AS Ranking
FROM Visits V)
SELECT *
FROM Rankings
WHERE Ranking <= 5;

Further simplified solution
select top 3 * from
(
Select distinct count(locationId) as cnt, locationId from trip
unpivot
(
locationId
for direction in (fromLocationId, toLocationId)
)u
where passengerId IN (746896, 234456)
group by direction, locationId
)as tbl2
order by cnt desc;
Solution combining columns
The main issue for me is avoiding union to combine the two columns.
The UNPIVOT command can do this.
select top 3 * from (
select count(locationId) cnt, locationId
from
(
Select valu as locationId, passengerId from trip
unpivot
(
valu
for loc in (fromLocationId, toLocationId)
)u
)united
where passengerId IN (746896, 234456)
group by locationId
) as tbl
order by cnt desc;
http://sqlfiddle.com/#!18/cec8b/136
If you want to get the counts by direction:
select top 3 * from (
select count(locationId) cnt, locationId, direction
from
(
Select valu as locationId, direction, passengerId from trip
unpivot
(
valu
for direction in (fromLocationId, toLocationId)
)u
)united
where passengerId IN (746896, 234456)
group by locationId, direction
) as tbl
order by cnt desc;
http://sqlfiddle.com/#!18/cec8b/139
Same Results as you ( minus some minor descriptions )
select top 3 * from
(
select distinct * from (
select count(locationId) cnt, locationId
from
(
Select valu as locationId, direction, passengerId from trip
unpivot
(
valu
for direction in (fromLocationId, toLocationId)
)u
)united
where passengerId IN (746896, 234456)
group by locationId, direction
) as tbl
)as tbl2
order by cnt desc;

You can do this without union all:
select top (5) t.passengerid, v.locationid, count(*)
from trip t cross apply
(values (fromlocationid), (tolocationid)) v(locationid) join
location l
on v.locationid = l.locationid
where t.PassengerId = 746896
group by t.passengerid, v.locationid
order by count(*) desc;
If you want an answer for all passengers, it would be a similar idea, using row_number(), but your query suggests you want the answer only for one customer at a time.
You can include additional fields from location as well.
Here is a SQL Fiddle.

Related

How to select varying count of items per colum value?

I work with Postgresql.
I have a sql code
SELECT lp."RegionId", COUNT(w."Id") FROM public.workplace w
GROUP BY lp."RegionId"
that returns to me
RegionId | Count
1 | 3
2 | 12
3 | 5
I have table 'person'. Each person have RegionId.
So i for region 1 i want to select first 3 persons, for region 2 select first 12 persons, for region 3 select first 5 persons.
So how can i use it as subquery to table 'person'?
WITH (SELECT lp."RegionId", COUNT(w."Id") FROM public.workplace w
GROUP BY lp."RegionId") AS pc
SELECT * FROM public.person p
???????
limit pc."Count"
???
Something like:
SELECT p.*
FROM (SELECT *, row_number() OVER (PARTITION BY RegionId ORDER BY PersonId) AS rn
FROM person) AS p
JOIN (SELECT RegionId, count(*) AS cnt
FROM workplace
GROUP BY RegionId) AS r ON p.RegionId = r.RegionId
WHERE p.rn <= r.cnt
ORDER BY p.RegionId, p.PersonId;

Doing a cross pivot in Google BigQuery

I have asked a previous question about doing a multi-level aggregation query on the X-axis here: Get the top patent countries, codes in a BQ public dataset.
Here is how the query (copied from the accepted answer works) to get:
Top 2 Countries by Count, and within those countries, top 2 Codes by Count
WITH A AS (
SELECT country_code
FROM `patents-public-data.patents.publications`
GROUP BY country_code
ORDER BY COUNT(1) DESC
LIMIT 2
), B AS (
SELECT
country_code,
application_kind,
COUNT(1) application_kind_count
FROM `patents-public-data.patents.publications`
WHERE country_code IN (SELECT country_code FROM A)
GROUP BY country_code, application_kind
), C AS (
SELECT
country_code,
application_kind,
application_kind_count,
DENSE_RANK() OVER(PARTITION BY country_code ORDER BY application_kind_count DESC) AS application_kind_rank
FROM B
)
SELECT
country_code,
application_kind,
application_kind_count
FROM C
WHERE application_kind_rank <= 2
And I get something like:
country_code application_kind count
JP A 125
JP U 124
CN A 118
CN U 101
Now I would like to add the following pivot on the y-axis: to get the following:
X: Top 2 Countries by Count, and within those countries, top 2 Codes by Count
Y: Top 2 family_id by Count, Top 2 priority_date by Count
The final results would then look like:
I am able to build the Y-query in a second query --
WITH A AS (
SELECT family_id
FROM `patents-public-data.patents.publications`
GROUP BY family_id
ORDER BY COUNT(1) DESC
LIMIT 2
), B AS (
SELECT
family_id,
priority_date,
COUNT(1) priority_date_count
FROM `patents-public-data.patents.publications`
WHERE family_id IN (SELECT family_id FROM A)
GROUP BY family_id, priority_date
), C AS (
SELECT
family_id,
priority_date,
priority_date_count,
DENSE_RANK() OVER(PARTITION BY family_id ORDER BY priority_date_count DESC) AS priority_date_rank
FROM B
)
SELECT
family_id,
priority_date,
priority_date_count
FROM C
WHERE priority_date_rank <= 2
However, I am not quite sure how to merge them together, in a single query or in two.
Below is for BigQuery Standard SQL and is just demo of the approach and not pretending to be 100% representing requested logic
WITH A_X AS (
SELECT country_code FROM `patents-public-data.patents.publications`
GROUP BY country_code ORDER BY COUNT(1) DESC LIMIT 2
), B_X AS (
SELECT country_code, application_kind, COUNT(1) application_kind_count
FROM `patents-public-data.patents.publications` WHERE country_code IN (SELECT country_code FROM A_X)
GROUP BY country_code, application_kind
), C_X AS (
SELECT country_code, application_kind, application_kind_count,
DENSE_RANK() OVER(PARTITION BY country_code ORDER BY application_kind_count DESC) AS application_kind_rank
FROM B_X
), X AS (
SELECT country_code, application_kind, application_kind_count
FROM C_X WHERE application_kind_rank <= 2
), A_Y AS (
SELECT family_id FROM `patents-public-data.patents.publications`
JOIN X USING(country_code, application_kind)
GROUP BY family_id
ORDER BY COUNT(1) DESC LIMIT 2
), B_Y AS (
SELECT family_id, priority_date, COUNT(1) priority_date_count
FROM `patents-public-data.patents.publications` WHERE family_id IN (SELECT family_id FROM A_Y)
GROUP BY family_id, priority_date
), C_Y AS (
SELECT family_id, priority_date, priority_date_count,
DENSE_RANK() OVER(PARTITION BY family_id ORDER BY priority_date_count DESC) AS pos_date
FROM B_Y
), Y AS (
SELECT family_id, priority_date, pos_date, DENSE_RANK() OVER(ORDER BY family_id) pos_family
FROM C_Y WHERE pos_date <= 2
)
SELECT country_code, application_kind,
COUNTIF(pos_family = 1 AND pos_date = 1) `family1_date1`,
COUNTIF(pos_family = 1 AND pos_date = 2) `family1_date2`,
COUNTIF(pos_family = 2 AND pos_date = 1) `family2_date1`,
COUNTIF(pos_family = 2 AND pos_date = 2) `family2_date2`
FROM `patents-public-data.patents.publications`
JOIN Y USING(family_id, priority_date)
WHERE country_code IN (SELECT country_code FROM X)
AND application_kind IN (SELECT application_kind FROM x)
GROUP BY country_code, application_kind
the result is
Obviously, there are number of zeroes above because of intersection logic

Find top N most frequent categories with top N most frequent sub-categories for each category

I'm trying to make a single query that will retrieve:
The top e.g. 3 most popular brands from a list of cars. For each of the top 3 brands I want to retrieve the top 5 most popular models.
I tried with both a ranking/partitioning strategy and a distinct ON strategy but I cannot seem to figure out how I can get the limits to works within two queries.
Here is some sample data: http://sqlfiddle.com/#!15/1e81d5/1
From the ranking query I would expect an output like this, given the sample data (order not important):
brand car_mode count
'Audi' 'A4' 3
'Audi' 'A1' 3
'Audi' 'Q7' 2
'Audi' 'Q5' 2
'Audi' 'A3' 2
'VW' 'Passat' 3
'VW' 'Beetle' 3
'VW' 'Caravelle' 2
'VW' 'Golf' 2
'VW' 'Fox' 2
'Volvo' 'V70' 3
'Volvo' 'V40' 3
'Volvo' 'S60' 2
'Volvo' 'XC70' 2
'Volvo' 'V50' 2
Turns out I could use LATERAL join as suggested in comments. Thanks.
SELECT brand, car_model, the_count
FROM
(
SELECT brand FROM cars GROUP BY brand ORDER BY COUNT(*) DESC LIMIT 3
) o1
INNER JOIN LATERAL
(
SELECT car_model, count(*) as the_count
FROM cars
WHERE brand = o1.brand
GROUP BY brand, car_model
ORDER BY count(*) DESC LIMIT 5
) o2 ON true;
http://sqlfiddle.com/#!15/1e81d5/9
you can try by using cte and window function row_number()
with cte as
(
select brand,car_model,count(*) as cnt from cars group by brand,car_model
) , cte2 as
(
select * ,row_number() over(partition by brand order by cnt desc) rn from cte
)
select brand,car_model,cnt from cte2 where rn<=5
demo link
You can use window functions for this:
select brand, car_model, cnt_car
from (select c.*, dense_rank() over (order by cnt_brand, brand) as seqnum_b
from (select brand, car_model, count(*) as cnt_car,
row_number() over (partition by brand order by count(*) desc) as seqnum_bc,
sum(count(*)) over (partition by brand) as cnt_brand
from cars c
group by brand, car_model
) c
) c
where seqnum_bc <= 5 and seqnum_b <= 3
order by cnt_brand desc, brand, cnt desc;
If you know that each brand (or at least each top brand) has at least five cars, then you can simplify the query to:
select brand, car_model, cnt_car
from (select brand, car_model, count(*) as cnt_car,
row_number() over (partition by brand order by count(*) desc) as seqnum_bc,
sum(count(*)) over (partition by brand) as cnt_brand
from cars c
group by brand, car_model
) c
where seqnum_bc <= 5
order by cnt_brand desc, brand, cnt desc
limit 15

SQL - Finding Customer's largest Location by Order $

I have a table with customer IDs, location IDs, and their order values. I need to select the location ID for each customer with the largest spend
Customer | Location | Order $
1 | 1A | 100
1 | 1A | 20
1 | 1B | 100
2 | 2A | 50
2 | 2B | 20
2 | 2B | 50
So I would get
Customer | Location | Order $
1 | 1A | 120
2 | 2B | 70
I tried something like this:
SELECT
a.CUST
,a.LOC
,c.BOOKINGS
FROM (SELECT DISTINCT TOP 1 b.CUST, b.LOC, sum(b.ORDER_VAL) as BOOKINGS
FROM ORDER_TABLE b
GROUP BY b.CUST, b.LOC
ORDER BY BOOKINGS DESC) as c
INNER JOIN ORDER_TABLE a
ON a.CUST = c.CUST
But that just returns the top order.
Just use variables to emulate ROW_NUM()
DEMO
SELECT *
FROM ( SELECT `Customer`, `Location`, SUM(`Order`) as `Order`,
#rn := IF(#customer = `Customer`,
#rn + 1,
IF(#customer := `Customer`, 1, 1)
) as rn
FROM Table1
CROSS JOIN (SELECT #rn := 0, #customer := '') as par
GROUP BY `Customer`, `Location`
ORDER BY `Customer`, SUM(`Order`) DESC
) t
WHERE t.rn = 1
Firs you have to sum the values for each location:
select Customer, Location, Sum(Order) as tot_order
from order_table
group by Customer, Location
then you can get the maximum order with MAX, and the top location with a combination of group_concat that will return all locations, ordered by total desc, and substring_index in order to get only the top one:
select
Customer,
substring_index(
group_concat(Location order by tot_order desc),
',', 1
) as location,
Max(tot_order) as max_order
from (
select Customer, Location, Sum(Order) as tot_order
from order_table
group by Customer, Location
) s
group by Customer
(if there's a tie, two locations with the same top order, this query will return just one)
This seems like an order by using aggregate function problem. Here is my stab at it;
SELECT
c.customer,
c.location,
SUM(`order`) as `order_total`,
(
SELECT
SUM(`order`) as `order_total`
FROM customer cm
WHERE cm.customer = c.customer
GROUP BY location
ORDER BY `order_total` DESC LIMIT 1
) as max_order_amount
FROM customer c
GROUP BY location
HAVING max_order_amount = order_total
Here is the SQL fiddle. http://sqlfiddle.com/#!9/2ac0d1/1
This is how I'd handle it (maybe not the best method?) - I wrote it using a CTE first, only to see that MySQL doesn't support CTEs, then switched to writing the same subquery twice:
SELECT B.Customer, C.Location, B.MaxOrderTotal
FROM
(
SELECT A.Customer, MAX(A.OrderTotal) AS MaxOrderTotal
FROM
(
SELECT Customer, Location, SUM(`Order`) AS OrderTotal
FROM Table1
GROUP BY Customer, Location
) AS A
GROUP BY A.Customer
) AS B INNER JOIN
(
SELECT Customer, Location, SUM(`Order`) AS OrderTotal
FROM Table1
GROUP BY Customer, Location
) AS C ON B.Customer = C.Customer AND B.MaxOrderTotal = C.OrderTotal;
Edit: used the table structure provided
This solution will provide multiple rows in the event of a tie.
SQL fiddle for this solution
How about:
select a.*
from (
select customer, location, SUM(val) as s
from orders
group by customer, location
) as a
left join
(
select customer, MAX(b.tot) as t
from (
select customer, location, SUM(val) as tot
from orders
group by customer, location
) as b
group by customer
) as c
on a.customer = c.customer where a.s = c.t;
with
Q_1 as
(
select customer,location, sum(order_$) as order_sum
from cust_order
group by customer,location
order by customer, order_sum desc
),
Q_2 as
(
select customer,max(order_sum) as order_max
from Q_1
group by customer
),
Q_3 as
(
select Q_1.customer,Q_1.location,Q_1.order_sum
from Q_1 inner join Q_2 on Q_1.customer = Q_2.customer and Q_1.order_sum = Q_2.order_max
)
select * from Q_3
Q_1 - selects normal aggregate, Q_2 - selects max(aggregate) out of Q_1 and Q_3 selects customer,location, sum(order) from Q_1 which matches with Q_2

Select Max two rows of each account SQL Server

I have this table
ID AGE ACCNUM NAME
--------------------------------
1 10 55409 Intro
2 6 55409 Chapter1
3 4 55409 Chapter2
4 3 69591 Intro
5 6 69591 Outro
6 0 40322 Intro
And I need a query that returns the two max age from each ACCNUM
in this case, records:
1, 2, 4, 5, 6
I have tried too many queries but nothing works for me.
I tried this query
Select
T1.accnum, T1.age
from
table1 as T1
inner join
(select
accnum, max(age) as max
from table1
group by accnum) as T2 on T1.accnum = T2.accnum
and (T1.age = T2.max or T1.age = T2.max -1)
TSQL Ranking Functions: Row_Number() https://msdn.microsoft.com/en-us/library/ms186734.aspx
select id, age, accnum, name
from
(
select id, age, accnum, name, ROW_NUMBER() Over (Partition By accnum order by age desc) as rn
from yourtable
) a
where a.rn <= 2
You can use row_number():
select accnum
, age
from ( select accnum
, age
, row_number() over(partition by accnum order by age desc) as r
from table1 as T1) t where r < 3
CODE:
WITH CTE AS (SELECT ID, AGE, ACCNUM, NAME,
ROW_NUMBER() OVER(PARTITION BY ACCNUM ORDER BY AGE DESC) AS ROW_NUM
FROM T1)
SELECT ID, AGE, ACCNUM, NAME
FROM CTE
WHERE ROW_NUM <= 2
Uses a common table expression to achieve the desired result.
SQL Fiddle