Counting distinct values from multiple columns - sql

Given the following schema:
id departure arrival
0 BOS LAX
1 SFO SEA
2 MIA LAX
3 RDU BOS
4 JFK DEN
5 LAX SEA
I need to count the total occurrences of each airport. For example, BOS should be 2 (one departure and one arrival).
I'm able to do this with two separate queries:
SELECT departure, COUNT(*) FROM legs
GROUP BY departure ORDER BY COUNT(departure) DESC
and
SELECT arrival, COUNT(*) FROM legs
GROUP BY arrival ORDER BY COUNT(arrival) DESC
but I haven't been able to figure out or find a way to do it in one query. I'd like to have something like the following:
airport count
BOS 2
LAX 2
SEA 2
JFK 1

Do it with union:
select departure as airport, count(*) as count
from (select departure from legs
union all
select arrival from legs)t
group by departure

Use a FULL [OUTER] JOIN on two separate aggregates:
SELECT airport, COALESCE(d.ct, 0) + COALESCE(a.ct, 0) AS "count"
FROM (
SELECT departure AS airport, count(*) AS ct
FROM legs
GROUP BY 1
) d
FULL JOIN (
SELECT arrival AS airport, count(*) AS ct
FROM legs
GROUP BY 1
) a USING (airport)
ORDER BY "count" DESC, airport;
This way you can easily return additional columns for arrival and departure, and you can use indexes on the base table if you should want to select certain airports.
Recent related answer:
Add up conditional counts on multiple columns of the same table

Related

How to check how many times some values are duplicated?

I have table like below:
city | segment
------------------
London | A
London | B
New York | A
Berlin | B
Barcelona | C
Barcelona | H
Barcelona | E
Each city should have only one segment, but as you can see there are two cities (London and Barcelona) that have more than one segment.
It is essential that in result table I need only these cities which have > 1 segmnet
As a result I need somethig like below:
city - city based on table above
no_segments - number of segments which have defined city based on table above
segments - segments of defined city based on table above
city
no_segments
segments
London
2
A
B
Barcelona
3
C
H
E
How can I do that in Oracle?
You can use COUNT(*) OVER ()(in order to get number of segments) and ROW_NUMBER()(in order to prepare the results those will be conditionally displayed) analytic functions such as
WITH t1 AS
(
SELECT city,
segment,
COUNT(*) OVER (PARTITION BY city) AS no_segments,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY segment) rn
FROM t
)
SELECT DECODE(rn,1,city) AS city,
DECODE(rn,1,no_segments) AS no_segments,
segment
FROM t1
WHERE no_segments > 1
ORDER BY t1.city, segment
Demo
Another way to do this is:
SELECT NULLIF(CITY, PREV_CITY) AS CITY,
SEGMENT
FROM (SELECT CITY,
LAG(CITY) OVER (ORDER BY CITY DESC) AS PREV_CITY,
SEGMENT,
COUNT(SEGMENT) OVER (PARTITION BY CITY) AS CITY_SEGMENT_COUNT
FROM CITY_SEGMENTS)
WHERE CITY_SEGMENT_COUNT > 1
Using LAG() to determine the "previous" CITY allows us to directly compare the CITY values, which in my mind is clearer that using ROW_NUMBER = 1.
db<>fiddle here
;with cte as (
Select city, count(seg) as cntseg
From table1
Group by city having count(seg) > 1
)
Select a.city, b.cntseg, a.seg
From table1 as a join cte as b
On a.city = b.city

Find the most popular combinations SQL

I have 2 tables I want to join to explore the most popular combinations of location, by distinct id, ordered by count. I get location from l, date from d. The results from this join would be:
id loc_id location date
1 111 NYC 20200101
1 222 LA 20200102
2 111 NYC 20200103
2 333 LON 20200103
3 444 NYC 20200105
4 444 LA 20200106
4 555 PAR 20200107
5 111 NYC 20200110
5 222 LA 20200111
I would like to use STRING_AGG if possible, but get an error with the WITHIN statement -
'expecting ')' but got WITHIN
..( I'm on BigQuery for this). Here is what I've attempted so far.
SELECT t.combination, count(*) count
FROM (
SELECT
STRING_AGG(location, ',') WITHIN GROUP (ORDER BY d.date) combination
FROM location as l
JOIN date d
USING (loc_id)
GROUP BY id
) t
WHERE date BETWEEN 20190101 AND 20200228 GROUP BY t.combination
ORDER BY count DESC;
I want to end up with something like:
combination count
NYC, LA 3
NYC, LON 1
LA, PAR 1
NYC 1
If there's another method I'd be happy to change from string_agg.
The correct BQ syntax would be:
SELECT t.combination, count(*) count
FROM (SELECT STRING_AGG(location, ',' ORDER BY d.date) as combination
FROM location l JOIN
date d
USING (loc_id)
GROUP BY id
) t
WHERE date BETWEEN 20190101 AND 20200228
GROUP BY t.combination
ORDER BY count DESC;
Note that your JOIN condition still looks wrong.
And if you are using dates, then I would expect DATE constants.
And your date filtering code won't work in the outer query, because you haven't selected the dates in the inner query. You probably want the filtering in the inner query.
This answer does not address these issues.
BigQuery has quite good documentation. There is no WITHIN GROUP for STRING_AGG().

Retrieving most frequent value for each group in SQL Server

This is what I have:
AirlineName Departure_City No_of_DepartureCity Arrival_City No_of_ArrivalCity
---------------------------------------------------------------------------------------------------- -------------- ------------------- ------------ -----------------
Air Asia MY 2 JPN 2
Emirates Airlines MY 2 JPN 2
Malaysia Airlines MY 2 GER 2
Malaysia Airlines MY 1 JPN 1
Air Asia MY 1 KOR 1
This is what I want:
AirlineName Departure_City No_of_DepartureCity Arrival_City No_of_ArrivalCity
---------------------------------------------------------------------------------------------------- -------------- ------------------- ------------ -----------------
Air Asia MY 2 JPN 2
Emirates Airlines MY 2 JPN 2
Malaysia Airlines MY 2 GER 2
I have already written a query to retrieve the most frequent data for Departure_City and Arrival_City, but I can't make it grouped together and only show the most frequent data for each AirlineName.
This is my query so far:
SELECT Airline.AirlineName, Flight_Schedule.Departure_City, COUNT(Flight_Schedule.Departure_City) AS No_of_DepartureCity, Flight_Schedule.Arrival_City, COUNT(Flight_Schedule.Arrival_City) AS No_of_ArrivalCity
FROM Airline
LEFT JOIN Aircraft ON Airline.AirlineID = Aircraft.AirlineID
LEFT JOIN Flight_Schedule ON Aircraft.AircraftID = Flight_Schedule.AircraftID
GROUP BY Airline.AirlineName, Flight_Schedule.Departure_City, Flight_Schedule.Arrival_City
ORDER BY COUNT(Flight_Schedule.Departure_City)DESC , COUNT(Flight_Schedule.Arrival_City) DESC
You can make use of Rank or Dense_rank (If you want to select more than two rows having same number of cities) function
Demo
with CTE1 AS(
SELECT A.*,
RANK() OVER(PARTITION BY AirlineName ORDER BY No_of_ArrivalCity desc) as rn
FROM TABLE1 A)
SELECT * FROM CTE1 where rn = 1;
As you're grouping by lots of columns, instead of just 'AirlineName' it's grouping by all of the different values across those number of columns.
To return the number of AirlineName's and their frequency try this:
SELECT Airline.AirlineName, COUNT(*) AS [COUNT]
FROM Airline
GROUP BY Airline.AirlineName
ORDER BY COUNT(*) DESC
If you need the additional columns then your code is already correct, because of how you are grouping it and the individual values contained within the columns.

SQLite percentages with small values

So I have this table of subscribers of users and the country they are in.
UserID | Name | Country
-------+-------------------+------------
1 | Zaphod Beeblebrox | UK
2 | Arthur Dent | UK
3 | Gene Kelly | USA
4 | Nat King Cole | USA
I need to produce a list of all the users by percentage from each of the countries. I also need all the smaller member countries (under 1%) to be collapsed into an "OTHERS" category.
I can accomplish a simple "top x" of members trivially with a
SELECT COUNTRY, COUNT(*) AS POPULATION FROM SUBSCRIBERS GROUP BY COUNTRY ORDER BY POPULATION DESC LIMIT 10
and can generate the percentages by PHP server side code, but I don't quite know how to:
Do all of it in SQL including percentage calculations directly in the result
Club all under 1% members into a single OTHERS category.
So I need something like this:
Country | Population
--------+-----------
USA | 25.4%
Brazil | 12%
UK | 5%
OTHERS | 65%
Appreciate the help!
Here is query for this, I used a subquery to count the total number of rows and then used that to get the percentage value for each. The 'Others' category was generated in a separate query. Rows are sorted by descending population with the Others row last.
SELECT * FROM
(SELECT country , ROUND((100.0*COUNT(*)/count_all),1) ||'%' AS population
FROM (SELECT count(*) count_all FROM subscribers) AS sq,
subscribers s
WHERE (SELECT 100*count(*)/count_all
FROM subscribers s2
WHERE s2.country = s.country) > 1
GROUP BY country
ORDER BY population DESC)
UNION ALL
SELECT 'OTHERS', IFNULL(ROUND(100.0*COUNT(*)/count_all,1),0.0) ||'%' AS population
FROM (SELECT count(*) count_all FROM subscribers) AS sq,
subscribers s
WHERE (SELECT 100*count(*)/count_all
FROM subscribers s2
WHERE s2.country = s.country) <= 1
Ok I think I might have found a way to do this that's a hell of a lot quicker on execution speed:
SELECT territory,
Round(Sum(percentage), 3) AS Population
FROM (SELECT
Round((Count(*)*100.0)/(SELECT Count(*) FROM subscribers),3) AS Percentage,
CASE
WHEN ((Count(*)*100.0)/(SELECT Count(*) FROM subscribers)) > 2 THEN
country
ELSE 'Other'
END AS Territory
FROM subscribers
GROUP BY country
ORDER BY percentage DESC)
GROUP BY territory
ORDER BY population DESC;

Counting occurrences in several columns

I'm making an app that shows people movies to rate, a-la hot-or-not. I'd like to write a query that gets me the number of times a movie has been rated. The table for ratings looks like this:
| id | winner | loser |
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 1 | 3 |
I can get the number of times a movie has "won" by running a query like this:
SELECT winner, count(winner) AS number_of_wins
FROM movie_results
GROUP BY winner
ORDER BY number_of_wins DESC;
But I'd like to get another query that shows the total number of times a movie was pitched against other movies, i.e. the number of times a movie has appeared to be rated, whether it was rated above or below the other movie. What is the easiest way to achieve this, using only SQL queries?
Here is one method, using union all:
select movie, count(*) as nummatches, sum(win) as numwins
from ((select winner as movie, 1 as win from match_results) union all
(select loser, 0 from match_results)
) wl
group by movie;
You can do a full join between two derived tables where each table contains the number of losses and wins for each player.
select
coalesce(winner,loser) player,
coalesce(number_of_wins,0) number_of_wins,
coalesce(number_of_losses,0) number_of_losses,
coalesce(number_of_wins,0) + coalesce(number_of_losses,0) number_of_matches
from (
select winner, count(*) number_of_wins
from movie_results
group by winner
) winners full join (
select loser, count(*) number_of_losses
from movie_results
group by loser
) losers on losers.loser = winners.winner
http://sqlfiddle.com/#!15/980d6/3