Is there a way to count group and subgroup using OVER? - sql

I have a very large table with some information about countries including:
City Name | Province | Country | ...
Honolulu HI US
Hilo HI US
Kihei HI US
Annapolis MD US
Laurel MD US
Sidney LD AU
Camberra PP AU
Darwin PP AU
...
And I want my query result to look like this (preferably using the OVER clause, to avoid a performance hit):
Country | Count_C | Province | Count_P
US 5 MD 2
US 5 HI 3
AU 3 LD 1
AU 3 PP 2
...
I've already managed to do this, but only with subqueries that hurt performance (the query took a very long time to run on the large table).
Bad Code:
SELECT country_name AS Country
,Count(*) OVER (PARTITION BY country_name) AS Count_C
,province AS Province
,Count(*) OVER (PARTITION BY province) AS Count_P
FROM country_list
GROUP BY country_name
,province
ORDER BY 1 DESC
,4 DESC

I think you just want aggregation with one window function:
SELECT country_name as Country,
SUM(COUNT(*)) OVER (PARTITION BY country_name) as country_cnt,
province,
Count(*) as province_count
FROM country_list
GROUP BY country_name, province
ORDER BY Country DESC, Province DESC;
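To see the nested SUM(COUNT(*)) OVER (...) pattern in action, here is a minimal sketch that runs the same query against the question's sample rows. It assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (the first release with window functions); the helper name count_groups and the column name city_name are illustrative, not from the original post.

```python
import sqlite3

# Sample rows copied from the question: (city_name, province, country_name).
SAMPLE_CITIES = [
    ("Honolulu", "HI", "US"), ("Hilo", "HI", "US"), ("Kihei", "HI", "US"),
    ("Annapolis", "MD", "US"), ("Laurel", "MD", "US"),
    ("Sidney", "LD", "AU"), ("Camberra", "PP", "AU"), ("Darwin", "PP", "AU"),
]

# GROUP BY yields one row per (country, province); the window SUM then adds
# those per-province counts back up within each country.
GROUP_COUNT_SQL = """
SELECT country_name,
       SUM(COUNT(*)) OVER (PARTITION BY country_name) AS count_c,
       province,
       COUNT(*) AS count_p
FROM country_list
GROUP BY country_name, province
ORDER BY country_name DESC, province DESC
"""

def count_groups(rows):
    """Load rows into an in-memory table and return the grouped counts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE country_list "
            "(city_name TEXT, province TEXT, country_name TEXT)"
        )
        conn.executemany("INSERT INTO country_list VALUES (?, ?, ?)", rows)
        return conn.execute(GROUP_COUNT_SQL).fetchall()
    finally:
        conn.close()

group_counts = count_groups(SAMPLE_CITIES)
for row in group_counts:
    print(row)
```

Because the table is scanned and grouped only once, this avoids the per-row subqueries that made the original attempt slow.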

Related

How to check how many times some values are duplicated?

I have table like below:
city | segment
------------------
London | A
London | B
New York | A
Berlin | B
Barcelona | C
Barcelona | H
Barcelona | E
Each city should have only one segment, but as you can see there are two cities (London and Barcelona) that have more than one segment.
It is essential that the result table contains only the cities that have more than one segment.
As a result I need something like below:
city - city based on table above
no_segments - number of segments which have defined city based on table above
segments - segments of defined city based on table above
city      | no_segments | segments
----------------------------------
London    | 2           | A
          |             | B
Barcelona | 3           | C
          |             | H
          |             | E
How can I do that in Oracle?
You can use the analytic functions COUNT(*) OVER (...) (to get the number of segments per city) and ROW_NUMBER() (to mark the first row of each city, so that the city and its count are displayed only once), such as
WITH t1 AS
(
SELECT city,
segment,
COUNT(*) OVER (PARTITION BY city) AS no_segments,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY segment) rn
FROM t
)
SELECT DECODE(rn,1,city) AS city,
DECODE(rn,1,no_segments) AS no_segments,
segment
FROM t1
WHERE no_segments > 1
ORDER BY t1.city, segment
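The COUNT(*) OVER / ROW_NUMBER() approach can be tried outside Oracle as well. The sketch below is an adaptation, not the answer's exact query: it assumes SQLite 3.25+ via Python's sqlite3 module, swaps Oracle's DECODE for a standard CASE expression, and uses the hypothetical output names display_city and display_count so they cannot be confused with the source columns in ORDER BY.

```python
import sqlite3

# Rows from the question's sample table: (city, segment).
CITY_SEGMENTS = [
    ("London", "A"), ("London", "B"), ("New York", "A"), ("Berlin", "B"),
    ("Barcelona", "C"), ("Barcelona", "H"), ("Barcelona", "E"),
]

DUPLICATES_SQL = """
WITH t1 AS (
    SELECT city,
           segment,
           COUNT(*) OVER (PARTITION BY city) AS no_segments,
           ROW_NUMBER() OVER (PARTITION BY city ORDER BY segment) AS rn
    FROM t
)
SELECT CASE WHEN rn = 1 THEN city END AS display_city,
       CASE WHEN rn = 1 THEN no_segments END AS display_count,
       segment
FROM t1
WHERE no_segments > 1          -- keep only cities with several segments
ORDER BY city, segment
"""

def cities_with_multiple_segments(rows):
    """Return one row per segment, labelling only the first row of each city."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (city TEXT, segment TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(DUPLICATES_SQL).fetchall()
    finally:
        conn.close()

duplicates = cities_with_multiple_segments(CITY_SEGMENTS)
for row in duplicates:
    print(row)
```

The window functions are evaluated inside the CTE before the WHERE filter runs, which is why the count can be used to filter in the outer query.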
Another way to do this is:
SELECT NULLIF(CITY, PREV_CITY) AS CITY,
SEGMENT
FROM (SELECT CITY,
LAG(CITY) OVER (ORDER BY CITY DESC) AS PREV_CITY,
SEGMENT,
COUNT(SEGMENT) OVER (PARTITION BY CITY) AS CITY_SEGMENT_COUNT
FROM CITY_SEGMENTS)
WHERE CITY_SEGMENT_COUNT > 1
Using LAG() to determine the "previous" CITY allows us to directly compare the CITY values, which in my mind is clearer than using ROW_NUMBER = 1.
;with cte as (
Select city, count(seg) as cntseg
From table1
Group by city having count(seg) > 1
)
Select a.city, b.cntseg, a.seg
From table1 as a join cte as b
On a.city = b.city

Retrieving most frequent value for each group in SQL Server

This is what I have:
AirlineName Departure_City No_of_DepartureCity Arrival_City No_of_ArrivalCity
---------------------------------------------------------------------------------------------------- -------------- ------------------- ------------ -----------------
Air Asia MY 2 JPN 2
Emirates Airlines MY 2 JPN 2
Malaysia Airlines MY 2 GER 2
Malaysia Airlines MY 1 JPN 1
Air Asia MY 1 KOR 1
This is what I want:
AirlineName Departure_City No_of_DepartureCity Arrival_City No_of_ArrivalCity
---------------------------------------------------------------------------------------------------- -------------- ------------------- ------------ -----------------
Air Asia MY 2 JPN 2
Emirates Airlines MY 2 JPN 2
Malaysia Airlines MY 2 GER 2
I have already written a query to retrieve the most frequent data for Departure_City and Arrival_City, but I can't make it grouped together and only show the most frequent data for each AirlineName.
This is my query so far:
SELECT Airline.AirlineName,
       Flight_Schedule.Departure_City,
       COUNT(Flight_Schedule.Departure_City) AS No_of_DepartureCity,
       Flight_Schedule.Arrival_City,
       COUNT(Flight_Schedule.Arrival_City) AS No_of_ArrivalCity
FROM Airline
LEFT JOIN Aircraft ON Airline.AirlineID = Aircraft.AirlineID
LEFT JOIN Flight_Schedule ON Aircraft.AircraftID = Flight_Schedule.AircraftID
GROUP BY Airline.AirlineName, Flight_Schedule.Departure_City, Flight_Schedule.Arrival_City
ORDER BY COUNT(Flight_Schedule.Departure_City) DESC, COUNT(Flight_Schedule.Arrival_City) DESC
You can use the RANK or DENSE_RANK function. Both keep every row tied for the highest count within an airline; use ROW_NUMBER instead if you want exactly one row per airline.
with CTE1 AS(
SELECT A.*,
RANK() OVER(PARTITION BY AirlineName ORDER BY No_of_ArrivalCity desc) as rn
FROM TABLE1 A)
SELECT * FROM CTE1 where rn = 1;
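As a quick check of the RANK() filter, the sketch below loads the five rows shown in the question into a table and keeps only the rank-1 row per airline. It assumes SQLite 3.25+ through Python's sqlite3 module; the table name table1 follows the answer, while the helper name top_route_per_airline is made up for illustration.

```python
import sqlite3

# The five aggregated rows shown in the question.
FLIGHT_COUNTS = [
    ("Air Asia", "MY", 2, "JPN", 2),
    ("Emirates Airlines", "MY", 2, "JPN", 2),
    ("Malaysia Airlines", "MY", 2, "GER", 2),
    ("Malaysia Airlines", "MY", 1, "JPN", 1),
    ("Air Asia", "MY", 1, "KOR", 1),
]

TOP_ROUTE_SQL = """
WITH ranked AS (
    SELECT a.*,
           RANK() OVER (PARTITION BY AirlineName
                        ORDER BY No_of_ArrivalCity DESC) AS rn
    FROM table1 a
)
SELECT AirlineName, Departure_City, No_of_DepartureCity,
       Arrival_City, No_of_ArrivalCity
FROM ranked
WHERE rn = 1                   -- ties for first place would all be kept
ORDER BY AirlineName
"""

def top_route_per_airline(rows):
    """Return the most frequent route(s) for each airline."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE table1 (AirlineName TEXT, Departure_City TEXT, "
            "No_of_DepartureCity INTEGER, Arrival_City TEXT, "
            "No_of_ArrivalCity INTEGER)"
        )
        conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(TOP_ROUTE_SQL).fetchall()
    finally:
        conn.close()

top_routes = top_route_per_airline(FLIGHT_COUNTS)
for row in top_routes:
    print(row)
```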
Because you group by several columns rather than just AirlineName, you get one row for every distinct combination of values across those columns.
To return each AirlineName and its frequency, try this:
SELECT Airline.AirlineName, COUNT(*) AS [COUNT]
FROM Airline
GROUP BY Airline.AirlineName
ORDER BY COUNT(*) DESC
If you need the additional columns then your code is already correct, because of how you are grouping it and the individual values contained within the columns.

How do I keep only 1 result according the alphabetical order in a tie with SQL queries?

The task is to find, for each country, the city with the highest total number of goods bought by customers. Basically, there are cities that have the same number of goods, but we only keep the first one in alphabetical order. The result should contain only the country name, the city with the highest number of goods, and that city's total goods.
Table Schema:
Country table:
country_name
city_name
Goods table:
city_name
user_id
number_of_goods
My query's result:
France Paris 85
Germany Berlin 100
Germany Frankfurt 100
Germany Luxembourg 100
Netherlands Amsterdam 75
Spain Barcelona 93
The right result should be:
France Paris 85
Germany Berlin 100
Netherlands Amsterdam 75
Spain Barcelona 93
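You can use row_number(), ordering by the goods count first and the city name second so that ties are broken alphabetically:
select t.*
from (select t.*,
-- highest count first; among equal counts, the alphabetically first city gets seq = 1
row_number() over (partition by country order by no_goods desc, city) as seq
from table t -- "table" stands in for your actual table name
) t
where seq = 1;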
Use the aggregate functions min() for city and max() for no_of_goods.
select t1.country, t1.no_of_goods, min(t2.city) as city
from
(select country, max(no_of_goods) as no_of_goods from tableA
group by country) t1
left join tableA t2 on t2.no_of_goods = t1.no_of_goods and t1.country = t2.country
group by t1.country, t1.no_of_goods
Basically, there are cities that have the same number of goods, but we only keep the first one in alphabetical order.
Based on your sample data, all cities in a country seem to have the same number_of_goods. If so, you can just use aggregation:
select c.country_name, min(c.city_name), max(number_of_goods)
from countries c join
goods g
on c.city_name = g.city_name
group by c.country_name;
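Putting the pieces together for the schema in the question: sum the goods per city first, then number the cities within each country by total (descending) and name (ascending), and keep row 1. The following is a sketch under assumptions: it runs on SQLite 3.25+ via Python's sqlite3, the per-user rows are invented so that the city totals match the question's output, and the helper name best_city_per_country is hypothetical.

```python
import sqlite3

# (country_name, city_name) pairs, as in the question's Country table.
COUNTRY_ROWS = [
    ("France", "Paris"), ("Germany", "Berlin"), ("Germany", "Frankfurt"),
    ("Germany", "Hamburg"), ("Netherlands", "Amsterdam"),
    ("Spain", "Barcelona"), ("Spain", "Madrid"),
]
# (city_name, user_id, number_of_goods): invented rows whose city totals
# reproduce the question's figures (Paris 85, Berlin 100, Frankfurt 100, ...).
GOODS_ROWS = [
    ("Paris", 1, 50), ("Paris", 2, 35), ("Berlin", 3, 100),
    ("Frankfurt", 4, 60), ("Frankfurt", 5, 40), ("Hamburg", 6, 40),
    ("Amsterdam", 7, 75), ("Barcelona", 8, 93), ("Madrid", 9, 10),
]

BEST_CITY_SQL = """
WITH city_totals AS (
    SELECT c.country_name, c.city_name,
           SUM(g.number_of_goods) AS total_goods
    FROM country c
    JOIN goods g ON g.city_name = c.city_name
    GROUP BY c.country_name, c.city_name
),
ranked AS (
    SELECT city_totals.*,
           ROW_NUMBER() OVER (PARTITION BY country_name
                              ORDER BY total_goods DESC, city_name) AS seq
    FROM city_totals
)
SELECT country_name, city_name, total_goods
FROM ranked
WHERE seq = 1
ORDER BY country_name
"""

def best_city_per_country(country_rows, goods_rows):
    """Return the top city per country, ties broken alphabetically."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE country (country_name TEXT, city_name TEXT)")
        conn.execute(
            "CREATE TABLE goods "
            "(city_name TEXT, user_id INTEGER, number_of_goods INTEGER)"
        )
        conn.executemany("INSERT INTO country VALUES (?, ?)", country_rows)
        conn.executemany("INSERT INTO goods VALUES (?, ?, ?)", goods_rows)
        return conn.execute(BEST_CITY_SQL).fetchall()
    finally:
        conn.close()

best_cities = best_city_per_country(COUNTRY_ROWS, GOODS_ROWS)
for row in best_cities:
    print(row)
```

Berlin and Frankfurt tie at 100 here, and the secondary sort on city_name is what makes Berlin the single row kept for Germany.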

Top 10 of total amount paid aggregated by provider, partitioned by state - PostgreSQL

I have a database of medicare data with three tables: provider metadata (doctor's unique number, name, city, state, credentials, etc); hcpcs metadata (code, description, if it's for drugs or not); provider_services (doctor's unique number, hcpcs code, number of services completed by that doctor, average cost)
I'm trying to get the top 10 payments by state, aggregated by provider. However I'm running into an issue where 1) I can't figure out how to rank by the total payment and 2) I can't figure out how to aggregate the providers. Here's the best query I've gotten so far:
SELECT *
FROM (
SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_city AS city,
p.nppes_provider_state AS state,
(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state ORDER BY ps.average_medicare_payment_amt desc) AS rank
FROM provider_services ps
JOIN provider p ON ps.npi = p.npi
) t
WHERE rank <= 10
GROUP BY t.last_name, t.npi, t.first_name, t.city, t.state, t.total_amount, t.rank
ORDER BY state ASC;
This results in something like:
| LAST  | FIRST | STATE | TOTAL   | RANK |
|-------|-------|-------|---------|------|
| DOE   | JANE  | AK    | 3000.41 | 10   |
| SMITH | JOHN  | AK    | 6000.98 | 7    |
| COLE  | ANN   | AK    | 1000    | 4    |
| SMITH | JOHN  | AK    | 1560.32 | 1    |
So my issues are 1. the providers aren't aggregating (John Smith with the same unique number showing up multiple times) and 2. I can only get it to compile with that average_payment_amt and not total_amt so the rankings are really screwed up.
Consider following adjustments:
Avoid using SELECT * in aggregate queries with GROUP BY. PostgreSQL accepts this query only because every column of the subquery t also appears in the GROUP BY list, so the grouping merely removes exact duplicate rows and aggregates nothing.
Use calculated expression for total_amount in the window function's ORDER BY clause.
Apply an aggregation function like SUM on your total_amount and do not include it as grouping column. In fact, you do not mention how you want to aggregate by provider.
Rank based on state throws off aggregation based on different column: provider. Right now it appears you want to use rank only for filtering records and not display.
Below achieves the following:
Sums total payment amounts by provider for the top 10 payment amounts per state.
SELECT t.npi, t.last_name, t.first_name, t.city, t.state,
SUM(t.total_amount) AS total_amount
FROM (
SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_city AS city,
p.nppes_provider_state AS state,
(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state
ORDER BY ps.average_medicare_payment_amt * ps.line_srvc_cnt DESC) AS rank
FROM provider_services ps
JOIN provider p ON ps.npi = p.npi
) t
WHERE rank <= 10
GROUP BY t.npi, t.last_name, t.first_name, t.city, t.state
ORDER BY t.state ASC;
Now, below achieves the following if this is your intention:
Displays records of top 10 payments per state in state and rank order (where providers can repeat if they ranked multiple times within or between states).
SELECT t.*
FROM (
SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_city AS city,
p.nppes_provider_state AS state,
(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state
ORDER BY ps.average_medicare_payment_amt * ps.line_srvc_cnt DESC) AS rank
FROM provider_services ps
JOIN provider p ON ps.npi = p.npi
) t
WHERE rank <= 10
ORDER BY t.state, t.rank;
I am guessing that you actually want to aggregate in the subquery and rank by the total amount:
SELECT t.*
FROM (SELECT p.npi,
p.nppes_provider_last_org_name AS last_name,
p.nppes_provider_first_name AS first_name,
p.nppes_provider_state AS state,
SUM(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount,
RANK() OVER (PARTITION BY p.nppes_provider_state ORDER BY SUM(ps.average_medicare_payment_amt * ps.line_srvc_cnt) DESC) as rnk
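FROM provider_services ps JOIN
provider p
ON ps.npi = p.npi
-- the GROUP BY is required: SUM() is mixed with non-aggregated provider columns
GROUP BY p.npi, p.nppes_provider_last_org_name, p.nppes_provider_first_name, p.nppes_provider_state
) t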
WHERE rnk <= 10
ORDER BY state ASC, total_amount DESC;
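The aggregate-then-rank idea can be sketched in two explicit steps, which some readers find easier to follow than ranking by SUM(...) in the same SELECT. The example below is an illustration under assumptions: it uses SQLite 3.25+ through Python's sqlite3 rather than PostgreSQL, shortened column names (last_name, state), invented sample rows, and a hypothetical helper top_providers_per_state that takes the cut-off as a parameter.

```python
import sqlite3

# Invented sample data: (npi, last_name, state).
PROVIDERS = [(1, "SMITH", "AK"), (2, "DOE", "AK"), (3, "COLE", "AK"), (4, "LEE", "AL")]
# (npi, hcpcs_code, average_medicare_payment_amt, line_srvc_cnt).
SERVICES = [
    (1, "A", 100.0, 10), (1, "B", 50.0, 4),
    (2, "A", 300.0, 3),
    (3, "A", 10.0, 5),
    (4, "A", 20.0, 2),
]

TOP_N_SQL = """
WITH provider_totals AS (          -- step 1: one row per provider
    SELECT p.npi, p.last_name, p.state,
           SUM(ps.average_medicare_payment_amt * ps.line_srvc_cnt) AS total_amount
    FROM provider_services ps
    JOIN provider p ON ps.npi = p.npi
    GROUP BY p.npi, p.last_name, p.state
),
ranked AS (                        -- step 2: rank providers within each state
    SELECT provider_totals.*,
           RANK() OVER (PARTITION BY state ORDER BY total_amount DESC) AS rnk
    FROM provider_totals
)
SELECT state, npi, last_name, total_amount, rnk
FROM ranked
WHERE rnk <= ?
ORDER BY state, rnk
"""

def top_providers_per_state(providers, services, top_n):
    """Return the top_n providers per state by total payment amount."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE provider (npi INTEGER, last_name TEXT, state TEXT)")
        conn.execute(
            "CREATE TABLE provider_services (npi INTEGER, hcpcs_code TEXT, "
            "average_medicare_payment_amt REAL, line_srvc_cnt INTEGER)"
        )
        conn.executemany("INSERT INTO provider VALUES (?, ?, ?)", providers)
        conn.executemany("INSERT INTO provider_services VALUES (?, ?, ?, ?)", services)
        return conn.execute(TOP_N_SQL, (top_n,)).fetchall()
    finally:
        conn.close()

top_two = top_providers_per_state(PROVIDERS, SERVICES, 2)
for row in top_two:
    print(row)
```

Each provider now appears once per state, because the ranking runs over provider totals rather than over individual service rows.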

SQLite percentages with small values

So I have this table of subscribers and the country each one is in.
UserID | Name | Country
-------+-------------------+------------
1 | Zaphod Beeblebrox | UK
2 | Arthur Dent | UK
3 | Gene Kelly | USA
4 | Nat King Cole | USA
I need to produce a list of all the users by percentage from each of the countries. I also need all the smaller member countries (under 1%) to be collapsed into an "OTHERS" category.
I can accomplish a simple "top x" of members trivially with a
SELECT COUNTRY, COUNT(*) AS POPULATION FROM SUBSCRIBERS GROUP BY COUNTRY ORDER BY POPULATION DESC LIMIT 10
and can generate the percentages by PHP server side code, but I don't quite know how to:
Do all of it in SQL including percentage calculations directly in the result
Club all under 1% members into a single OTHERS category.
So I need something like this:
Country | Population
--------+-----------
USA | 25.4%
Brazil | 12%
UK | 5%
OTHERS | 65%
Appreciate the help!
Here is a query for this. I used a subquery to count the total number of rows and then used that to get the percentage value for each country. The 'OTHERS' category is generated by a separate query. Rows are sorted by descending population, with the OTHERS row last.
SELECT * FROM
(SELECT country, ROUND(100.0*COUNT(*)/count_all, 1) || '%' AS population
FROM (SELECT count(*) count_all FROM subscribers) AS sq,
subscribers s
-- 100.0 forces floating-point division; 100*count(*)/count_all would truncate to an integer
WHERE (SELECT 100.0*count(*)/count_all
FROM subscribers s2
WHERE s2.country = s.country) > 1
GROUP BY country
-- sort on the numeric count; the 'nn.n%' strings would otherwise be compared as text
ORDER BY COUNT(*) DESC)
UNION ALL
SELECT 'OTHERS', IFNULL(ROUND(100.0*COUNT(*)/count_all,1),0.0) ||'%' AS population
FROM (SELECT count(*) count_all FROM subscribers) AS sq,
subscribers s
WHERE (SELECT 100.0*count(*)/count_all
FROM subscribers s2
WHERE s2.country = s.country) <= 1
Ok I think I might have found a way to do this that's a hell of a lot quicker on execution speed:
SELECT territory,
Round(Sum(percentage), 3) AS Population
FROM (SELECT
Round((Count(*)*100.0)/(SELECT Count(*) FROM subscribers),3) AS Percentage,
CASE
WHEN ((Count(*)*100.0)/(SELECT Count(*) FROM subscribers)) > 2 THEN
country
ELSE 'Other'
END AS Territory
FROM subscribers
GROUP BY country
ORDER BY percentage DESC)
GROUP BY territory
ORDER BY population DESC;
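For anyone who wants to try the CASE-based bucketing, here is a small runnable sketch using Python's sqlite3 module. It is a variation on the query above rather than a copy: the threshold is passed in as a named parameter (min_pct, a made-up name), the catch-all label is 'OTHERS' as in the question, that row is forced to sort last, and the subscriber counts are invented.

```python
import sqlite3

# Invented distribution: 200 subscribers in total.
COUNTRY_COUNTS = {"USA": 100, "Brazil": 60, "UK": 37, "France": 1, "Peru": 1, "Chile": 1}

PERCENT_SQL = """
SELECT territory, ROUND(SUM(pct), 1) AS population
FROM (
    SELECT CASE
               WHEN COUNT(*) * 100.0 / (SELECT COUNT(*) FROM subscribers) >= :min_pct
               THEN country
               ELSE 'OTHERS'
           END AS territory,
           COUNT(*) * 100.0 / (SELECT COUNT(*) FROM subscribers) AS pct
    FROM subscribers
    GROUP BY country
)
GROUP BY territory
ORDER BY territory = 'OTHERS', population DESC   -- OTHERS row goes last
"""

def country_percentages(counts, min_pct):
    """Return (territory, percent) rows, folding small countries into OTHERS."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE subscribers "
            "(user_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
        )
        for country, n in counts.items():
            conn.executemany(
                "INSERT INTO subscribers (country) VALUES (?)", [(country,)] * n
            )
        return conn.execute(PERCENT_SQL, {"min_pct": min_pct}).fetchall()
    finally:
        conn.close()

percentages = country_percentages(COUNTRY_COUNTS, 1.0)
for territory, pct in percentages:
    print(f"{territory} | {pct:.1f}%")
```

Keeping the percentage numeric inside SQL and adding the "%" sign only when printing avoids sorting the values as text.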