SQL (COUNT(*) / locations.area) - sql

We are learning SQL at school, and my professor has this sql code in his documents.
SELECT wp.city, (COUNT(*) / locations.area) AS population_density
FROM world_poulation AS wp
INNER JOIN location
ON wp.city = locations.city
WHERE locations.state = “Hessen”
GROUP BY wp.city, locations.area
Everything is almost clear for me, just the aggregate function with /locations.area doesn't make any sense to me. Can anybody help?
Thank you in advance!

Look at what the query is grouped on, that tells you what each group consists of. In this case, each group is a city, and contains all the rows that have the same value for wp.city (and as the location table is joined on that value too, the locations.area is only included in the grouping so that it can be used in the result).
So each group has a number of rows, and the COUNT(*) aggregate will contain the number of rows for each group. The value of (COUNT(*) / locations.area) will be the number of rows in the group divided by the value of locations.area for that group.
If you would have data like this:
world_population
name city
--------- ---------
John London
Peter London
Sarah London
Malcolm London
Ian Cardiff
Johanna Stockholm
Sven Stockholm
Egil Stockholm
locations
city state area
----------- -------------- ---------
London Hessen 2
Cardiff Somehere else 14
Stockholm Hessen 1
Then you would get a result with two groups (as Cardiff is not in the state Hessen). One group has four people from London which has the area 2, so the population density would be 2. The other group has three people from Stockholm which has the area 1, so the population density would be 3.
Side note: There is a typo in the query, as it joins in the table location but refers to it as locations everywhere else.

Try writing it like:
SELECT wp.city,
locations.area,
COUNT(*) AS population,
(COUNT(*) / locations.area) AS population_density
FROM world_poulation AS wp
INNER JOIN location
ON wp.city = locations.city
WHERE locations.state = “Hessen”
GROUP BY wp.city, locations.area
The key is the GROUP BY statement. You are showing pairs of cities and areas. The COUNT(*) is the number of times a given pair shows up in the table you created by joining world population and location. The area is just a number, so you can divide the area by the COUNT.

Related

How to grouby data in one column and distribute it in another column in HiveSQL?

I have the following data:
CompanyID
Department
No of People
Country
45390
HR
100
UK
45390
Service
250
UK
98712
Service
300
US
39284
Admin
142
Norway
85932
Admin
260
Germany
I wish to know how many people belong to the same department from different countries?
Required Output
Department
No of People
Country
HR
100
UK
Service
250
UK
300
US
Admin
142
Norway
260
Germany
I was able to get the data but the Department was repeated by this query.
""" select Department, Country,count(Department) from dataset
group by Country,Department
order by Department """
How can I get the desired output?
The result set that you are producing is not really a relational result set. Why? Because rows depend on what is in the "previous" row. And in a relational database, there is no such thing as a "previous" row. This type of processing is often handled in the application layer.
Of course, SQL can do what you want. You just need to be careful:
select (case when 1 = row_number() over (partition by Department order by Country)
then Department
end) as Department,
Country, count(*) as num_people,
from dataset
group by Country,Department
order by Department, Country;
Note that the order by needs to match the window function clause to be sure that what row_number() considered to be the first row is really the first row in the result set.

How to design table relationship where the foreign key can mean "all rows", "some rows" or "one row"?

I hope you can help me with this. I've used pseudocode to keep everything simple.
I have a table which describes locations.
location_table
location = charfield(200) # New York, London, Tokyo
A product manager now wants locations to be as follows:
Global = select every location
Asia = select every location in Asia
US = select every location in US
Current system = London (etc.)
This is my proposed redesign.
location_table
location = charfield(200) # New York, London, Tokyo
continent = foreign key to continent_table
continent_table
continent = charfield(50) # "None", "Global", Asia, Europe
But this seems horrible. It means in my code I'll always need to check if the customer is using "global" or "none", and then select the corresponding location records. For example, there will be code like this scattered everywhere:
get continent
if continent is global, select everything from location_table
else if continent is none, select location from location_table
else select location from location_table where foreign key is continent
My feeling is this is a known problem, and there is a known solution for it. Any ideas?
Thank you.
What you seem to have here is a set of locations, and then a set of location groups. Those groups might be all of the locations (global), or a subset of them.
You can build this with an intermediate table between the locations and a new location sets table which associates locations and location sets.
You might build the location set table and the join table so that the individual locations are also location sets, but ones which join only to one location. That way all location selections come from one table -- the location sets.
So you end up with three different types of location set:
Ones which map 1:1 with a location
One which maps 1:all ("global")
Ones which map 1:many (continents and other areas)
It's conceivable that this could be created as a hierarchy, but those queries can be inefficient because the join cardinalities tend to be obscured from the optimiser.
You could do this using a hierarchy, and a self referencing foreign key, e.g.
LocationID Name ParentLocationID LocationType
------------------------------------------------------------------
1 Planet Earth NULL Planet
2 Africa 1 Continent
3 Antartica 1 Continent
4 Asia 1 Continent
5 Australasia 1 Continent
6 Europe 1 Continent
7 North America 1 Continent
8 South America 1 Continent
9 United States 7 Country
10 Canada 7 Country
11 Mexico 7 Country
12 California 9 State
13 San Diego 12 City
14 England 6 Country
15 Cornwall 14 County
16 Truro 15 City
Hierarchical data usually requires either recursion, or multiple joins to get all levels, this answer contains links to articles comparing performance on the major DBMS.
Many DBMS now support recursive Common table expressions, and since no DBMS is specified I will use SQL Server syntax because it is what I am most comfortable with, a quick example would be.
DECLARE #LocationID INT = 7; -- NORTH AMERICA
WITH LocationCTE AS
( SELECT l.LocationID, l.Name, l.ParentLocationID, l.LocationType
FROM dbo.Location AS l
WHERE LocationID = #LocationID
UNION ALL
SELECT l.LocationID, l.Name, l.ParentLocationID, l.LocationType
FROM dbo.Location AS l
INNER JOIN LocationCTE AS c
ON c.LocationID = l.ParentLocationID
)
SELECT *
FROM LocationCTE;
Output based on above sample data
LocationID Name ParentLocationID LocationType
-----------------------------------------------------------------
7 North America 1 Continent
9 United States 7 Country
10 Canada 7 Country
11 Mexico 7 Country
12 California 9 State
13 San Diego 12 City
Online Demo
Supplying a value of 1 (Planet Earth) for the location ID will return the full table, or supplying a locationID of 11 (Mexico) would only return this one row, because there is nothing smaller than this in the sample data.
I'll go with your answer and say that I don't find it quite horrible to look everytime a customer to check if he searches by city or location, or nothing. That would be the role of the backend code and would always lead to different queries depending on what option he chooses.
But I would remove "None", "Global" from the continent table, and just use other queries when these option are not chosen. You would end up with the 3 possibles SQL queries you have, and I don't find it to be bad design per se. Maybe other solution are more performant, but this one seems to be more readable and logical. It's just optional querying with join tables.
Other answer will trade performance/duplication for readability (which isn't a bad thing, depending on how many time you will be relying on this condition in your application, in how many queries you'll be using it, and how many cities you have).
For readability and non-repetition, the best thing would be to concentrate these condition in one SQL function wich take a string parameter and return all location depending on the input (but at the cost of preformance).
Use levels:
0 -> None
00 -> Global
001 -> Europe
002 -> Asia
003 -> Africa
select location from location_table where continent like '[value]%'
Using a fixed length code, you can prefix regions, and then add one more digit for a region inside a region, and so on.
Ok, let me try to improve it.
Consider the world, it has the minimum level (or maximum depending on how you see it)
World ID = '0' (1 digit)
Now, select how you want to divide the world: (Continents, Half-Continents, ...) and assign the next level.
Europe ID = '01' (First digit World + Second digit Europe)
Asia ID = '02'
America ID = '03'
...
Next Level: Countries. (At least 2 digits)
England ID = '0101' (World + Continent + Country)
Deutchland ID = '0102'
....
Texas ID = '0301'
....
Next Level: Regions (2 digits)
Yorkshire ID = '010101' (World + Continent + Country + Region)
....
Next Level: Cities (2 or 3 digits)
London ID = '01010101' (World + Continent + Country + Region + City)
And so on.
Now, the same SELECT some_aggregate, statistics, ... FROM ... can be used for no matter what region, simply change:
WHERE Region like '0%' --> The whole world
WHERE Region like '02%' --> Asia
WHERE Region like '01010101%' --> London
WHERE Region like '02%' AND Region like '01%' --> Asia & Europe

Select distinct values with count in PostgreSQL

This is a heavily simplified version of an SQL problem I'm dealing with. Let's say I've got a table of all the cities in the world, like this:
country city
------------
Canada Montreal
Cuba Havanna
China Beijing
Canada Victoria
China Macau
I want to count how many cities each country has, so that I would end up with a table as such:
country city_count
------------------
Canada 50
Cuba 10
China 200
I know that I can get the distinct country values with SELECT distinct country FROM T1 and I suspect I need to construct a subquery for the city_count column. But my non-SQL brain is just telling me I need to loop through the results...
Thanks!
Assuming the only reason for a new row is a unique city
select country, count(country) AS City_Count
from table
group by country

oracle - sql query select max from each base

I'm trying to solve this query where i need to find the the top balance at each base. Balance is in one table and bases are in another table.
This is the existing query i have that returns all the results but i need to find a way to limit it to 1 top result per baseID.
SELECT o.names.name t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY o.names.name, t.accounts.bidd.baseID;
accounts is a nested table.
this is the output
Name accounts.BIDD.baseID MAX(T.accounts.BALANCE)
--------------- ------------------------- ---------------------------
Jerard 010 1251.21
john 012 3122.2
susan 012 3022.2
fin 012 3022.2
dan 010 1751.21
What i want the result to display is calculate the highest balance for each baseID and only display one record for that baseID.
So the output would look only display john for baseID 012 because he has the highest.
Any pointers in the right direction would be fantastic.
I think the problem is cause of the "Name" column. since you have three names mapped to one base id(12), it is considering all three records as unique ones and grouping them individually and not together.
Try to ignore the "Name" column in select query and in the "Group-by" clause.
SELECT t.accounts.bidd.baseID, MAX(t.accounts.balance)
FROM order o, table(c.accounts) t
WHERE t.accounts.acctype = 'verified'
GROUP BY t.accounts.bidd.baseID;

statistic syntax in access

I want to do some statistic for the Point in my appliation,this is the columns for Point table:
id type city
1 food NewYork
2 food Washington
3 sport NewYork
4 food .....
Each point belongs to a certain type and located at the certain city.
Now I want to caculate the numbers of points in different city for each type.
For example, there are two types here :food and sport.
Then I want to know:
how many points of `food` and `sport` at NewYork
how many points of `food` and `sport` at Washington
how many points of `food` and `sport` at Chicago
......
I have tried this:
select type,count(*) as num from point group by type ;
But I can not group the by the city.
How to make it?
Update
id type city
1 food NewYork
2 sport NewYork
3 food Chicago
4 food San
And I want to get something like this:
NewYork Chicago San
food 2 1 1
sport 1 0 0
I will use the html table and chart to display these datas.
So I need to do the counting, I can use something like this:
select count(*) from point where type='food' and city ='San'
select count(*) from point where type='food' and city ='NewYork'
....
However I think this is a bad idea,so I wonder if I can use the sql to do the counting.
BTW,for these table data,how do people organization their structure using json?
this's what you want:
SELECT city,
COUNT(CASE WHEN [type] = 'food' THEN 1 END) AS FoodCount,
COUNT(CASE WHEN [type] = 'sport' THEN 1 END) AS SportCount
FROM point
GROUP BY city
UPDATE:
To get the results in an aggregated row/column format you need to use a pivot table. In Access it's called a Crosstab query. You can use the Crosstab query wizard to generate the query via a nice UI or cut straight to the SQL:
TRANSFORM COUNT(id) AS CountOfId
SELECT type
FROM point
GROUP BY type
PIVOT city
The grouping is used to count the number of Id's for each type. The additional PIVOT clause groups the data by city and displays each grouping in a separate column. The end result looks something like this:
NewYork Chicago San
food 2 1 1
sport 1 0 0