SQL to calculate the overall total and sub total number but unique - sql

I have example data as shown below.
I want to calculate user count per city and user count per country.
Here's what I want:
How can I implement this in BigQuery as simply as possible?
Thanks a million!

You can use analytic functions as follows:
select distinct country, city,
count(distinct username) over (partition by country, city) as distinct_users_per_city,
count(distinct username) over (partition by country) as distinct_users_per_country
from your_Table t

I want to calculate user count per city and user count per country
I feel like Jessie from Tokyo and Jessie from Okinawa are two different users and need to be counted as such for the country count! The same goes for Jack from Chicago and Jack from New York!
The code below does this:
select distinct country, city,
count(distinct username) over (partition by country, city) as user_count_per_city,
count(distinct username || '|' || city) over(partition by country) as user_count_per_country
from `project.dataset.table`
If applied to the sample data in your question, the output is
which is different from yours (expected/presented in the question) for the reason described above.
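To make the difference between the two counting rules concrete, here is a minimal sketch that runs both against a small made-up dataset with Python's built-in sqlite3 module. The table name `users` and all rows are assumptions, since the question's sample data is not reproduced here. SQLite does not accept DISTINCT inside window functions, so the sketch swaps the BigQuery window syntax for GROUP BY subqueries joined on country.

```python
import sqlite3

# Hypothetical sample data; the question's real table is not shown here.
sample = [
    ("Japan", "Tokyo", "Jessie"),
    ("Japan", "Tokyo", "Ken"),
    ("Japan", "Okinawa", "Jessie"),
    ("USA", "Chicago", "Jack"),
    ("USA", "New York", "Jack"),
    ("USA", "New York", "Jack"),  # duplicate row for the same user
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (country TEXT, city TEXT, username TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", sample)

query = """
SELECT c.country, c.city, c.users_per_city,
       k.by_name, k.by_name_and_city
FROM (
    -- distinct users inside each city
    SELECT country, city, COUNT(DISTINCT username) AS users_per_city
    FROM users
    GROUP BY country, city
) AS c
JOIN (
    -- two ways to count users per country
    SELECT country,
           COUNT(DISTINCT username) AS by_name,
           COUNT(DISTINCT username || '|' || city) AS by_name_and_city
    FROM users
    GROUP BY country
) AS k ON k.country = c.country
ORDER BY c.country, c.city
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With this data, `by_name` treats Jessie as one user for Japan, while `by_name_and_city` counts her once per city, which is the distinction the second answer argues for.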

Related

Oracle SQL distinct Count(*) with 3 columns

I'm trying to display 3 columns in a table.
Something like:
ZIPCODE  SUBSCRIBERS  MEMBERS
12345    5            10
12346    3            8
Each row is a distinct zipcode with the number of "subscribers" in it. A subscriber is the original employee, defined as DEPNO=0 (they are the original employee and not a dependent). The members are everyone in the zipcode. I can get each count separately with statements like the SQL below. I am pulling from a table called EMPDEP.
SELECT DISTINCT ZIPCODE, COUNT(*) OVER (PARTITION BY ZIPCODE) as Subscribers FROM EMPDEP where depno=0
This statement will get me a Subscriber count but I want the total member count in there as well which would just be
SELECT DISTINCT ZIPCODE, COUNT(*) OVER (PARTITION BY ZIPCODE) as Members FROM EMPDEP
but getting all 3 of these in 1 query is killing me as I can't get the nesting down correctly, at least I'm assuming I will need that?
Any tips on how to do this?
Huh? Why are you using window functions? Just use aggregation:
SELECT ZIPCODE, COUNT(*) as Members,
SUM(CASE WHEN depno = 0 THEN 1 ELSE 0 END) as Subscribers
FROM EMPDEP
GROUP BY ZIPCODE;
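As a quick check of the conditional-aggregation approach, here is a small sketch using Python's built-in sqlite3 module. The rows are made up so that the counts match the example table in the question; only the ZIPCODE and DEPNO columns are modeled, which is an assumption about the real EMPDEP table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPDEP (ZIPCODE TEXT, DEPNO INTEGER)")

# Made-up rows chosen to reproduce the counts in the question's example:
# 12345 -> 5 subscribers out of 10 members, 12346 -> 3 out of 8.
data = (
    [("12345", 0)] * 5 + [("12345", 1)] * 5
    + [("12346", 0)] * 3 + [("12346", 1)] * 5
)
conn.executemany("INSERT INTO EMPDEP VALUES (?, ?)", data)

totals = conn.execute("""
    SELECT ZIPCODE,
           SUM(CASE WHEN DEPNO = 0 THEN 1 ELSE 0 END) AS Subscribers,
           COUNT(*) AS Members
    FROM EMPDEP
    GROUP BY ZIPCODE
    ORDER BY ZIPCODE
""").fetchall()
print(totals)
```

One GROUP BY pass produces both columns, so no nesting or window function is needed.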

Column must appear in group by or aggregate function in nested query

I have the following table.
Fights (fight_year, fight_round, winner, fid, city, league)
I am trying to query the following:
For each year that appears in the Fights table, find the city that held the most fights. For example, if in year 1992, Jersey held more fights than any other city did, you should print out (1992, Jersey)
Here's what I have so far but I keep getting the following error. I am not sure how I should construct my group by functions.
ERROR: column, 'ans.fight_round' must appear in the GROUP BY clause or be used in an aggregate function. Line 3 from (select *
select fight_year, city, max(*)
from (select *
from (select *
from fights as ans
group by (fight_year)) as l2
group by (ans.city)) as l1;
In Postgres, I would recommend aggregation and distinct on:
select distinct on (fight_year) fight_year, city, count(*) cnt
from fights
group by fight_year, city
order by fight_year, count(*) desc
This counts how many fights each city held each year and keeps the city with the most fights per year.
If you want to allow ties, then use window functions:
select fight_year, city, cnt
from (
    select fight_year, city, count(*) cnt,
        rank() over(partition by fight_year order by count(*) desc) rn
    from fights
    group by fight_year, city
) f
where rn = 1
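The tie-preserving variant can be tried outside Postgres too. The sketch below uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) with the `fights` table and `fight_year` column from the question. The rows are hypothetical, and the aggregation is split into a separate CTE before ranking, which is a restructuring for portability rather than the exact query shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fights (fight_year INTEGER, city TEXT)")
# Hypothetical rows: 1992 has a clear winner, 1993 is a two-way tie.
conn.executemany(
    "INSERT INTO fights VALUES (?, ?)",
    [
        (1992, "Jersey"), (1992, "Jersey"), (1992, "Boston"),
        (1993, "Boston"), (1993, "Reno"),
    ],
)

top_cities = conn.execute("""
    WITH per_city AS (
        -- fights per city per year
        SELECT fight_year, city, COUNT(*) AS cnt
        FROM fights
        GROUP BY fight_year, city
    ),
    ranked AS (
        -- rank 1 goes to every city sharing the highest count in a year
        SELECT fight_year, city, cnt,
               RANK() OVER (PARTITION BY fight_year ORDER BY cnt DESC) AS rn
        FROM per_city
    )
    SELECT fight_year, city, cnt
    FROM ranked
    WHERE rn = 1
    ORDER BY fight_year, city
""").fetchall()
print(top_cities)
```

Because RANK assigns the same rank to equal counts, both tied cities are returned for 1993.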
Although a window function is the easiest way, as done by @GMB, you can try this alternative as well:
select city, fight_year
from fights
group by city, fight_year
-- keeps only (city, fight_year) groups in which every row has a non-null fid
having count(*) = sum(case when fid is not null then 1 end)

how to find difference between no_of_value and no_of_distinct columns values?

Let N be the number of CITY entries in STATION, and let N' be the number of distinct CITY names in STATION; query the value of N - N' from STATION. In other words, find the difference between the total number of CITY entries in the table and the number of distinct CITY entries in the table.
Input Format
The STATION table is described as follows:
[Image: STATION table description]
where LAT_N is the northern latitude and LONG_W is the western longitude.
Use distinct in count function.
select count(city) - count(distinct city)
from station
SELECT count(city) - count(DISTINCT city) FROM station;
Do not forget to add a semicolon ';' after the query.
You could use having to filter the result on an aggregate function:
select city, count(*), count(distinct city)
from station
group by city
having count(*) <> count(distinct city)
If I understand correctly:
select count(city) - count(distinct city)
from station;
You would do this to get the number of duplicated values in the table. I might be more interested in the list of cities and the number of duplicates:
select city, count(*) - 1 as numdups
from station
group by city
having count(*) > 1;
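The two queries are related: the single difference equals the sum of the per-city duplicate counts. Here is a minimal sketch with Python's built-in sqlite3 module; the `station` rows are made up, and only the `id` and `city` columns are modeled as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station (id INTEGER, city TEXT)")
# Hypothetical rows: 'Acme' appears three times, 'Reno' twice, 'Lee' once.
conn.executemany(
    "INSERT INTO station VALUES (?, ?)",
    [(1, "Acme"), (2, "Acme"), (3, "Acme"), (4, "Reno"), (5, "Reno"), (6, "Lee")],
)

# Total entries minus distinct entries.
difference = conn.execute(
    "SELECT COUNT(city) - COUNT(DISTINCT city) FROM station"
).fetchone()[0]

# Per-city breakdown of the extra rows.
dups = conn.execute("""
    SELECT city, COUNT(*) - 1 AS numdups
    FROM station
    GROUP BY city
    HAVING COUNT(*) > 1
    ORDER BY city
""").fetchall()

print(difference, dups)
```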

SQL distinct records but filter nulls for one field

I am new to SQL and looking for some help here. Please see the first screenshot. The first two records have exactly the same values except for the country field. The second screenshot is what I want to retrieve. I want to get the first one (with a non-null country). I also want to keep the third record since it's the only record for id 345, although it has a null country. I tried the following query, but it didn't give me the results I want.
SELECT DISTINCT id, first, last, age, gender, city, state, zip
FROM Person
WHERE country IS NOT NULL
Something like
select id, first, last, age, gender, city, state, zip, min(country)
from Person
group by id, first, last, age, gender, city, state, zip
should work, since min() skips nulls: it returns the non-null country when one exists and null only when every row in the group has a null country.
You can use the row_number analytic function to do this.
By partitioning by id and then filtering by rn = 1, we ensure that we get no more than one row per distinct id value.
The order by clause in the row_number function is what determines which row gets returned. In this case, rows with a non-null country value are prioritized. If you have more complex rules to determine which row you want to return per id, just adjust the order by clause accordingly.
select id, first, last, age, gender, city, state, zip, country
from (select id, first, last, age, gender, city, state, zip, country,
row_number() over(
partition by id
order by case when country is not null then 1 else 2 end) as rn
from tbl)
where rn = 1
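A small sketch of the row_number approach, run through Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions). The column set is cut down to `id`, `first_name`, and `country`, and the rows are invented to mirror the situation described in the question, so treat both as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER, first_name TEXT, country TEXT)")
# Hypothetical rows: id 123 is duplicated with and without a country,
# id 345 exists only with a NULL country.
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [(123, "Ann", None), (123, "Ann", "US"), (345, "Bob", None)],
)

people = conn.execute("""
    SELECT id, first_name, country
    FROM (
        SELECT id, first_name, country,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   -- rows with a country sort first within each id
                   ORDER BY CASE WHEN country IS NOT NULL THEN 1 ELSE 2 END
               ) AS rn
        FROM Person
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()
print(people)
```

The id 345 row survives because it is the only row in its partition, so it gets rn = 1 despite the null country.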

Select count / duplicates

I have a table with all U.S. zip codes. Each row contains the city and state name for the zip code. I'm trying to get a list of cities that show up in multiple states. This wouldn't be a problem if there weren't X amount of zip codes in the same city...
So basically, I just want the city in a state to count as 1 instead of counting the city/state 7 times because there are 2+ zip codes in that city/state...
I'm not really sure how to do this. I know I need to use count, but how do I tell MySQL to count a given city/state combo only once?
SELECT City, Count(City) As theCount
FROM (Select City, State From tblCityStateZips Group By City, State) As C
GROUP By City
HAVING Count(City) > 1
This would return all cities, with count, that were contained in more than one state.
Greenville 39
Greenwood 2
GreenBriar 3
etc.
First group on state and city, then group the result on city:
select City
from (
select State, City
from ZipCode
group by State, City
) x
group by City
having count(*) > 1
Will this do the trick?
Select CityName, Count(Distinct State) as StateCount
From CityStateTable
Group by CityName
HAVING Count(Distinct State) > 1
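To illustrate the COUNT(DISTINCT state) idea, here is a minimal sketch using Python's built-in sqlite3 module. The table name `zips`, its columns, and the rows are all made up for the example and are not taken from the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zips (zip TEXT, city TEXT, state TEXT)")
# Made-up rows: Greenville spans two states; Austin has two zips in one state.
conn.executemany(
    "INSERT INTO zips VALUES (?, ?, ?)",
    [
        ("29601", "Greenville", "SC"),
        ("29605", "Greenville", "SC"),
        ("27834", "Greenville", "NC"),
        ("73301", "Austin", "TX"),
        ("78701", "Austin", "TX"),
    ],
)

multi_state = conn.execute("""
    SELECT city, COUNT(DISTINCT state) AS state_count
    FROM zips
    GROUP BY city
    HAVING COUNT(DISTINCT state) > 1
""").fetchall()
print(multi_state)
```

Multiple zip codes in the same city/state pair collapse to one state inside COUNT(DISTINCT ...), so Austin is filtered out and Greenville is counted as two states, not three rows.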
Try using a select distinct to collapse the zip code rows into one row per city/state pair:
SELECT DISTINCT city, state FROM your_table
You probably should have created a separate table for zip codes then to avoid the duplication.
You want to look into the GROUP BY Aggregate.