Find the most popular combinations SQL - sql

I have 2 tables I want to join to explore the most popular combinations of location, by distinct id, ordered by count. I get location from l, date from d. The results from this join would be:
id loc_id location date
1 111 NYC 20200101
1 222 LA 20200102
2 111 NYC 20200103
2 333 LON 20200103
3 444 NYC 20200105
4 444 LA 20200106
4 555 PAR 20200107
5 111 NYC 20200110
5 222 LA 20200111
I would like to use STRING_AGG if possible, but I get an error at the WITHIN clause:
expecting ')' but got WITHIN
(I'm on BigQuery for this.) Here is what I've attempted so far.
SELECT t.combination, count(*) count
FROM (
SELECT
STRING_AGG(location, ',') WITHIN GROUP (ORDER BY d.date) combination
FROM location as l
JOIN date d
USING (loc_id)
GROUP BY id
) t
WHERE date BETWEEN 20190101 AND 20200228 GROUP BY t.combination
ORDER BY count DESC;
I want to end up with something like:
combination count
NYC, LA 2
NYC, LON 1
LA, PAR 1
NYC 1
If there's another method I'd be happy to change from string_agg.

The correct BQ syntax would be:
SELECT t.combination, count(*) count
FROM (SELECT STRING_AGG(location, ',' ORDER BY d.date) as combination
FROM location l JOIN
date d
USING (loc_id)
GROUP BY id
) t
WHERE date BETWEEN 20190101 AND 20200228
GROUP BY t.combination
ORDER BY count DESC;
Note that your JOIN condition still looks wrong.
If you are storing dates, I would expect DATE constants rather than integers such as 20190101.
The date filter also won't work in the outer query, because the inner query does not select the date column. You probably want the filter in the inner query.
The query above does not address these issues; it only fixes the STRING_AGG() syntax.
BigQuery has quite good documentation. STRING_AGG() takes its ORDER BY inside the parentheses; it has no WITHIN GROUP clause.
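To check the logic against the sample rows from the question, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module. It is not BigQuery syntax: SQLite stands in for it, and a windowed GROUP_CONCAT replaces STRING_AGG(location, ',' ORDER BY date). The table name visits, the function name combination_counts, and the loc_id tie-break for rows sharing a date are assumptions made for the illustration. The date filter sits in the inner query, as suggested above.

```python
import sqlite3

# Sample rows from the question: (id, loc_id, location, date)
SAMPLE_ROWS = [
    (1, 111, "NYC", 20200101), (1, 222, "LA", 20200102),
    (2, 111, "NYC", 20200103), (2, 333, "LON", 20200103),
    (3, 444, "NYC", 20200105),
    (4, 444, "LA", 20200106), (4, 555, "PAR", 20200107),
    (5, 111, "NYC", 20200110), (5, 222, "LA", 20200111),
]

def combination_counts(rows, start, end):
    """Count how often each date-ordered location combination occurs per id."""
    con = sqlite3.connect(":memory:")
    # Hypothetical single table holding the already-joined rows.
    con.execute(
        "CREATE TABLE visits (id INTEGER, loc_id INTEGER, location TEXT, date INTEGER)"
    )
    con.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT combination, COUNT(*) AS cnt
        FROM (
            -- One row per id; the window frame spans the whole partition,
            -- so every row of an id carries the full ordered list.
            SELECT DISTINCT id,
                   GROUP_CONCAT(location, ',') OVER (
                       PARTITION BY id
                       ORDER BY date, loc_id  -- loc_id breaks date ties (assumption)
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                   ) AS combination
            FROM visits
            WHERE date BETWEEN ? AND ?  -- filter before aggregating
        ) t
        GROUP BY combination
        ORDER BY cnt DESC, combination
    """
    result = con.execute(sql, (start, end)).fetchall()
    con.close()
    return result

print(combination_counts(SAMPLE_ROWS, 20190101, 20200228))
```

With the nine sample rows, ids 1 and 5 both produce NYC,LA, so that combination is counted twice and every other combination once.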

Related

Retrieving most frequent value for each group in SQL Server

This is what I have:
AirlineName Departure_City No_of_DepartureCity Arrival_City No_of_ArrivalCity
---------------------------------------------------------------------------------------------------- -------------- ------------------- ------------ -----------------
Air Asia MY 2 JPN 2
Emirates Airlines MY 2 JPN 2
Malaysia Airlines MY 2 GER 2
Malaysia Airlines MY 1 JPN 1
Air Asia MY 1 KOR 1
This is what I want:
AirlineName Departure_City No_of_DepartureCity Arrival_City No_of_ArrivalCity
---------------------------------------------------------------------------------------------------- -------------- ------------------- ------------ -----------------
Air Asia MY 2 JPN 2
Emirates Airlines MY 2 JPN 2
Malaysia Airlines MY 2 GER 2
I have already written a query to retrieve the most frequent data for Departure_City and Arrival_City, but I can't make it grouped together and only show the most frequent data for each AirlineName.
This is my query so far:
SELECT Airline.AirlineName, Flight_Schedule.Departure_City, COUNT(Flight_Schedule.Departure_City) AS No_of_DepartureCity, Flight_Schedule.Arrival_City, COUNT(Flight_Schedule.Arrival_City) AS No_of_ArrivalCity
FROM Airline
LEFT JOIN Aircraft ON Airline.AirlineID = Aircraft.AirlineID
LEFT JOIN Flight_Schedule ON Aircraft.AircraftID = Flight_Schedule.AircraftID
GROUP BY Airline.AirlineName, Flight_Schedule.Departure_City, Flight_Schedule.Arrival_City
ORDER BY COUNT(Flight_Schedule.Departure_City)DESC , COUNT(Flight_Schedule.Arrival_City) DESC
You can use the RANK() or DENSE_RANK() window function. Unlike ROW_NUMBER(), either one gives rank 1 to every row that ties for the top count within an airline, so all tied rows are returned:
with CTE1 AS(
SELECT A.*,
RANK() OVER(PARTITION BY AirlineName ORDER BY No_of_ArrivalCity desc) as rn
FROM TABLE1 A)
SELECT * FROM CTE1 where rn = 1;
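As a rough check of the RANK() approach, the sketch below loads the five rows from the question into an in-memory SQLite database (via Python's sqlite3) and keeps the top-ranked row or rows per airline. SQLite stands in for SQL Server here, and the table name route_counts, its column names, and the function name are assumptions for the illustration; the table plays the role of TABLE1 above, with a single flights column replacing the two identical count columns.

```python
import sqlite3

# Rows from the question: (airline, departure, arrival, number of flights)
ROUTE_ROWS = [
    ("Air Asia", "MY", "JPN", 2),
    ("Emirates Airlines", "MY", "JPN", 2),
    ("Malaysia Airlines", "MY", "GER", 2),
    ("Malaysia Airlines", "MY", "JPN", 1),
    ("Air Asia", "MY", "KOR", 1),
]

def top_routes_per_airline(rows):
    """Return the most frequent route(s) for each airline, ties included."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE route_counts "
        "(airline TEXT, departure TEXT, arrival TEXT, flights INTEGER)"
    )
    con.executemany("INSERT INTO route_counts VALUES (?, ?, ?, ?)", rows)
    sql = """
        WITH ranked AS (
            SELECT airline, departure, arrival, flights,
                   -- Rank restarts at 1 for every airline; ties share a rank.
                   RANK() OVER (PARTITION BY airline ORDER BY flights DESC) AS rn
            FROM route_counts
        )
        SELECT airline, departure, arrival, flights
        FROM ranked
        WHERE rn = 1
        ORDER BY airline
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

for row in top_routes_per_airline(ROUTE_ROWS):
    print(row)
```

On the sample data this keeps one row per airline, which matches the desired output in the question.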
Because you group by several columns rather than just AirlineName, you get one row for each distinct combination of values across those columns.
To return each AirlineName and how often it occurs, try this:
SELECT Airline.AirlineName, COUNT(*) AS [COUNT]
FROM Airline
GROUP BY Airline.AirlineName
ORDER BY COUNT(*) DESC
If you need the additional columns, then your query is already correct: the grouping produces one row for each distinct combination of values in those columns.

Reconciliation Automation Query

I have one database, and from time to time I change part of a query as requirements change.
I want to keep the results of both the before and after versions of these queries in one table, and show only the queries whose results differ.
For Example,
Consider following table
emp_id country salary
---------------------
1 usa 1000
2 uk 2500
3 uk 1200
4 usa 3500
5 usa 4000
6 uk 1100
Now, my before query is :
Before Query:
select count(emp_id) as count,country from table where salary>2000 group by country;
Before Result:
count country
2 usa
1 uk
After Query:
select count(emp_id) as count,country from table where salary<2000 group by country;
After Query Result:
count country
2 uk
1 usa
My Final Result or Table I want is:
column 1 | column 2 | column 3 | column 4 |
2 usa 2 uk
1 uk 1 usa
...... but if the query results are the same, then they shouldn't show in this table.
Thanks in advance.
I believe that you can use the same approach as here.
select t1.*, t2.* -- if you need specific columns without rn, then you have to list them here
from
(
select t.*, row_number() over (order by count) rn
from
(
-- query #1
select count(emp_id) as count,country from table where salary>2000 group by country
) t
) t1
full join
(
select t.*, row_number() over (order by count) rn
from
(
-- query #2
select count(emp_id) as count,country from table where salary<2000 group by country
) t
) t2 on t1.rn = t2.rn

Counting distinct values from multiple columns

Given the following schema:
id departure arrival
0 BOS LAX
1 SFO SEA
2 MIA LAX
3 RDU BOS
4 JFK DEN
5 LAX SEA
I need to count the total occurrences of each airport. For example, BOS should be 2 (one departure and one arrival).
I'm able to do this with two separate queries:
SELECT departure, COUNT(*) FROM legs
GROUP BY departure ORDER BY COUNT(departure) DESC
and
SELECT arrival, COUNT(*) FROM legs
GROUP BY arrival ORDER BY COUNT(arrival) DESC
but I haven't been able to figure out or find a way to do it in one query. I'd like to have something like the following:
airport count
BOS 2
LAX 3
SEA 2
JFK 1
Do it with UNION ALL:
select airport, count(*) as count
from (select departure as airport from legs
union all
select arrival from legs) t
group by airport
order by count desc
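To try the UNION ALL approach on the six sample legs, here is a small sketch using Python's sqlite3 module with an in-memory database. The function name airport_counts and the secondary sort on the airport code are assumptions added to make the output deterministic.

```python
import sqlite3

# Sample legs from the question: (id, departure, arrival)
LEGS = [
    (0, "BOS", "LAX"), (1, "SFO", "SEA"), (2, "MIA", "LAX"),
    (3, "RDU", "BOS"), (4, "JFK", "DEN"), (5, "LAX", "SEA"),
]

def airport_counts(legs):
    """Count every appearance of an airport, as departure or as arrival."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE legs (id INTEGER, departure TEXT, arrival TEXT)")
    con.executemany("INSERT INTO legs VALUES (?, ?, ?)", legs)
    sql = """
        SELECT airport, COUNT(*) AS cnt
        FROM (
            -- UNION ALL keeps duplicates, which is what makes the count work;
            -- plain UNION would collapse them.
            SELECT departure AS airport FROM legs
            UNION ALL
            SELECT arrival FROM legs
        ) t
        GROUP BY airport
        ORDER BY cnt DESC, airport
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

for airport, cnt in airport_counts(LEGS):
    print(airport, cnt)
```

On the sample data LAX appears three times (one departure, two arrivals), while BOS and SEA appear twice each.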
Use a FULL [OUTER] JOIN on two separate aggregates:
SELECT airport, COALESCE(d.ct, 0) + COALESCE(a.ct, 0) AS "count"
FROM (
SELECT departure AS airport, count(*) AS ct
FROM legs
GROUP BY 1
) d
FULL JOIN (
SELECT arrival AS airport, count(*) AS ct
FROM legs
GROUP BY 1
) a USING (airport)
ORDER BY "count" DESC, airport;
This way you can easily return additional columns for arrival and departure, and you can use indexes on the base table if you should want to select certain airports.
Recent related answer:
Add up conditional counts on multiple columns of the same table

Specific Ordering in SQL

I have a SQL Server 2008 database. In this database, I have a result set that looks like the following:
ID Name Department LastOrderDate
-- ---- ---------- -------------
1 Golf Balls Sports 01/01/2015
2 Compact Disc Electronics 02/01/2015
3 Tires Automotive 01/15/2015
4 T-Shirt Clothing 01/10/2015
5 DVD Electronics 01/07/2015
6 Tennis Balls Sports 01/09/2015
7 Sweatshirt Clothing 01/04/2015
...
For some reason, my users want to get the results ordered by department, then last order date. However, not by department name. Instead, the departments will be in a specific order. For example, they want to see the results ordered by Electronics, Automotive, Sports, then Clothing. To throw another kink in works, I cannot update the table schema.
Is there a way to do this with a SQL Query? If so, how? Currently, I'm stuck at
SELECT *
FROM
vOrders o
ORDER BY
o.LastOrderDate
Thank you!
You can use a CASE expression:
order by case when department = 'Electronics' then 1
when department = 'Automotive' then 2
when department = 'Sports' then 3
when department = 'Clothing' then 4
else 5 end
Create a table for the departments that holds the name (or better, the id) of each department and its display order. Then join to that table and order by the display-order column.
Alternatively, you can order by a CASE expression:
ORDER BY CASE WHEN Department = 'Electronics' THEN 1
WHEN Department = 'Automotive' THEN 2
...
END
(That is not recommended for larger tables.)
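The ORDER BY CASE idea can be tried on the sample rows with the sketch below, which uses Python's sqlite3 module in place of SQL Server 2008. The table name orders stands in for the vOrders view, and the dates are stored as ISO text (YYYY-MM-DD) so that they sort chronologically; both are assumptions for the illustration. LastOrderDate is added as the second sort key, since the question asks for department first and then last order date.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO text
ORDERS = [
    (1, "Golf Balls", "Sports", "2015-01-01"),
    (2, "Compact Disc", "Electronics", "2015-02-01"),
    (3, "Tires", "Automotive", "2015-01-15"),
    (4, "T-Shirt", "Clothing", "2015-01-10"),
    (5, "DVD", "Electronics", "2015-01-07"),
    (6, "Tennis Balls", "Sports", "2015-01-09"),
    (7, "Sweatshirt", "Clothing", "2015-01-04"),
]

def ordered_ids(rows):
    """Return ids sorted by the custom department order, then by date."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders "
        "(ID INTEGER, Name TEXT, Department TEXT, LastOrderDate TEXT)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT ID
        FROM orders
        ORDER BY CASE Department
                     WHEN 'Electronics' THEN 1
                     WHEN 'Automotive'  THEN 2
                     WHEN 'Sports'      THEN 3
                     WHEN 'Clothing'    THEN 4
                     ELSE 5              -- unknown departments go last
                 END,
                 LastOrderDate
    """
    result = [row[0] for row in con.execute(sql)]
    con.close()
    return result

print(ordered_ids(ORDERS))
```

For the seven sample rows this yields the ids in the order 5, 2, 3, 1, 6, 7, 4.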
Here is a solution with a CTE:
with c (iOrder, dept)
as (
Select 1, 'Electronics'
Union all
Select 2, 'Automotive'
Union all
Select 3, 'Sports'
Union all
Select 4, 'Clothing'
)
SELECT o.*
FROM
vOrders o join c
on c.dept = o.Department
ORDER BY
c.iOrder,
o.LastOrderDate

SQL Query - SELECT distinct IDs with 2 extra column

I'm working on a SQL query whose result looks like this (sorted by station visits):
TRAIN_ID TYPE STATION
111 'KC' New York
111 'KC' Washington
111 'KC' Boston
111 'KC' Denver
222 'FC' London
222 'FC' Paris
I'd like to SELECT the distinct trains, and each row must include the first and the last station, like:
TRAIN_ID TYPE FIRSTSTATION LASTSTATION
111 'KC' New York Denver
222 'FC' London Paris
Can anyone give me a hand? Thank you in advance!
Assuming you find something to define an order on the stations so that you can identify the "last" and "first" one, the following should work:
WITH numbered_stations AS (
SELECT train_id,
type,
station,
row_number() over (partition by train_id order by some_order_column) as rn,
count(*) over (partition by train_id) as total_stations
FROM the_unknown_table
)
SELECT f.train_id,
f.type,
f.station as first_station,
l.station as last_station
FROM (SELECT train_id,
type,
station
FROM numbered_stations
WHERE rn = 1
) f
JOIN (SELECT train_id,
type,
station
FROM numbered_stations
WHERE rn = total_stations) l
ON f.train_id = l.train_id
ORDER BY f.train_id
This assumes that some_order_column can be used to identify the last and first station.
It also assumes that the type is always the same for all combinations of train_id and station.
The shown syntax is standard ANSI SQL and should work on most modern DBMS.
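For a quick test on the sample rows, here is a sketch run through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions). It is a variant of the query above rather than the same query: it numbers the stations the same way, but picks the first and last station with conditional aggregation instead of a self-join. The table name stops and the ordering column stop_no are assumptions, since the question does not say what defines the visit order.

```python
import sqlite3

# Sample rows from the question plus a hypothetical stop_no ordering column:
# (train_id, type, station, stop_no)
STOPS = [
    (111, "KC", "New York", 1), (111, "KC", "Washington", 2),
    (111, "KC", "Boston", 3), (111, "KC", "Denver", 4),
    (222, "FC", "London", 1), (222, "FC", "Paris", 2),
]

def first_and_last_station(rows):
    """Return (train_id, type, first_station, last_station) per train."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE stops "
        "(train_id INTEGER, type TEXT, station TEXT, stop_no INTEGER)"
    )
    con.executemany("INSERT INTO stops VALUES (?, ?, ?, ?)", rows)
    sql = """
        WITH numbered_stations AS (
            SELECT train_id, type, station,
                   ROW_NUMBER() OVER (PARTITION BY train_id ORDER BY stop_no) AS rn,
                   COUNT(*) OVER (PARTITION BY train_id) AS total_stations
            FROM stops
        )
        SELECT train_id,
               type,
               -- Each CASE is non-NULL for exactly one row per train.
               MAX(CASE WHEN rn = 1 THEN station END) AS first_station,
               MAX(CASE WHEN rn = total_stations THEN station END) AS last_station
        FROM numbered_stations
        GROUP BY train_id, type
        ORDER BY train_id
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

for row in first_and_last_station(STOPS):
    print(row)
```

On the sample data, train 111 runs from New York to Denver and train 222 from London to Paris.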