Subquery yields different results when used alone - sql

I have to write a query across two tables, country and city. The goal is to get every district and that district's population for every country. Since the district is just an attribute of each city, I have to sum up the populations of all cities belonging to a district.
My query so far looks like this:
SELECT country.name, country.population, array_agg(
(SELECT (c.district, sum(city.population))
FROM city GROUP BY c.district))
AS districts
FROM country
FULL OUTER JOIN city c ON country.code = c.countrycode
GROUP BY country.name, country.population;
The result:
name | population | districts
---------------------------------------------+------------+------------------------------------------------------------------------------------------------------------------
Afghanistan | 22720000 | {"(Balkh,1429559884)","(Qandahar,1429559884)","(Herat,1429559884)","(Kabol,1429559884)"}
Albania | 3401200 | {"(Tirana,1429559884)"}
Algeria | 31471000 | {"(Blida,1429559884)","(Béjaïa,1429559884)","(Annaba,1429559884)","(Batna,1429559884)","(Mostaganem,1429559884)"
American Samoa | 68000 | {"(Tutuila,1429559884)","(Tutuila,1429559884)"}
So apparently it sums all the city-populations of the world. I need to limit that somehow to each district alone.
But if I run the Subquery alone as
SELECT (city.district, sum(city.population)) FROM city GROUP BY city.district;
it gives me the districts with their population:
row
----------------------------------
(Bali,435000)
(,4207443)
(Dnjestria,194300)
(Mérida,224887)
(Kochi,324710)
(Qazvin,291117)
(Izmir,2130359)
(Meta,273140)
(Saint-Denis,131480)
(Manitoba,618477)
(Changhwa,354117)
I realized it has something to do with the alias that I use when joining. I used it for convenience, but it seems to have real consequences: if I don't use it, I get the error
more than one row returned by a subquery used as an expression
Also, if I use
sum(c.population)
in the subquery it won't execute because
aggregate function calls cannot be nested
So the alias used in the join apparently changes a lot.
I hope someone can shed some light on that.

Solved it myself.
Window functions are the most convenient method for this kind of task:
SELECT DISTINCT
    country.name
  , country.population
  , city.district
    -- partition by country as well, so equally named districts in different countries are not merged
  , sum(city.population) OVER (PARTITION BY country.code, city.district)
      AS district_population
  , sum(city.population) OVER (PARTITION BY country.code, city.district) / CAST(country.population AS float)
      AS district_share
FROM
    country JOIN city ON country.code = city.countrycode
;
But it also works with subselects:
SELECT DISTINCT
    country.name
  , country.population
  , city.district
  ,(
    SELECT
        sum(ci.population)
    FROM
        city ci
    WHERE ci.district = city.district
      -- restrict to the same country, so equally named districts elsewhere are not counted
      AND ci.countrycode = city.countrycode
   ) AS district_population
  ,(
    SELECT
        sum(ci2.population) / CAST(country.population AS float)
    FROM
        city ci2
    WHERE ci2.district = city.district
      AND ci2.countrycode = city.countrycode
   ) AS district_share
FROM
    country JOIN city ON country.code = city.countrycode
ORDER BY
    country.name
  , country.population
;
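As a cross-check, the same per-district totals can also be obtained with a plain GROUP BY. The following is only a minimal sketch, run against an in-memory SQLite database through Python's sqlite3 module; the two countries and four cities are made-up sample rows, not data from the world database, and the column names follow the question.

```python
import sqlite3

# Hypothetical sample data; only the table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT, population INTEGER);
CREATE TABLE city (name TEXT, countrycode TEXT, district TEXT, population INTEGER);
INSERT INTO country VALUES ('XXA', 'Xland', 1000), ('XXB', 'Yland', 2000);
INSERT INTO city VALUES
  ('x1', 'XXA', 'North', 100),
  ('x2', 'XXA', 'North', 200),
  ('x3', 'XXA', 'South', 50),
  ('y1', 'XXB', 'North', 400);
""")

# One row per (country, district); the district name 'North' exists in both
# countries but is summed separately because country.code is part of the grouping.
district_rows = conn.execute("""
SELECT country.name,
       city.district,
       SUM(city.population) AS district_population,
       SUM(city.population) * 1.0 / country.population AS district_share
FROM country
JOIN city ON country.code = city.countrycode
GROUP BY country.code, country.name, country.population, city.district
ORDER BY country.name, city.district
""").fetchall()

for row in district_rows:
    print(row)
```

Grouping returns one row per district directly, so DISTINCT is not needed in this variant.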

Related

finding value in a list created via subquery

Thank you Stack-Community,
This is probably obvious to most of you, but I just don't understand why it doesn't work.
I am using the Northwind database and, let's say, I am trying to find the countries that do not occur exactly twice, i.e. that are listed either more than twice or less often.
I have already figured out other ways of doing it with a HAVING clause, so I am not looking for alternatives; I am trying to understand why my initial attempt is not working.
I keep looking at it and it makes perfect sense to me. Can someone explain what the problem is?
SELECT country, count(country)
FROM Customers
WHERE 2 not in (SELECT count(country) FROM Customers GROUP BY country)
GROUP BY country
;
You need a correlated subquery:
SELECT country, count(country)
FROM Customers c
WHERE 2 not in (SELECT count(country) FROM Customers c2
WHERE c2.country = c.country )
GROUP BY country;
Otherwise you get something like:
SELECT country, count(country)
FROM Customers c
WHERE 2 not in (1,2,3) -- false in every case and empty resultset
GROUP BY country;
Imagine that you have:
1, 'UK' -- 1
2, 'DE' -- 2
3, 'DE'
4, 'RU' -- 1
Now you will get the equivalent of
SELECT country, count(country)
FROM Customers c
WHERE 2 not in (1,2,1) -- false in every case and empty resultset
GROUP BY country;
-- 0 rows selected
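The difference between the two forms can be reproduced with the four sample rows above. This is a small sketch using Python's sqlite3 module with an in-memory database; the Customers table here is a two-column stand-in, not the real Northwind table.

```python
import sqlite3

# Stand-in for the Customers table, holding only the four example rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (id INTEGER PRIMARY KEY, country TEXT);
INSERT INTO Customers VALUES (1, 'UK'), (2, 'DE'), (3, 'DE'), (4, 'RU');
""")

# Uncorrelated: the subquery yields the counts of ALL countries (1, 2, 1),
# so "2 NOT IN (...)" is false for every row and nothing is returned.
uncorrelated = conn.execute("""
SELECT country, COUNT(country)
FROM Customers
WHERE 2 NOT IN (SELECT COUNT(country) FROM Customers GROUP BY country)
GROUP BY country
""").fetchall()

# Correlated: the subquery yields only the count for the current row's country.
correlated = conn.execute("""
SELECT country, COUNT(country)
FROM Customers c
WHERE 2 NOT IN (SELECT COUNT(country) FROM Customers c2
                WHERE c2.country = c.country)
GROUP BY country
ORDER BY country
""").fetchall()

print(uncorrelated)  # []
print(correlated)    # [('RU', 1), ('UK', 1)]
```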

Recursion in PostgreSQL

Let's assume we have a table borders(country1,country2) that contains two countries that border each other, eg. (Sweden, Norway), etc. I would like to find all the countries that can be reached from a given country, say Sweden, by using border crossing only.
Here's the first part of my solution:
WITH RECURSIVE border(countryin) AS (
select distinct country
from (select country2::character varying(4) as country
from borders where country1 = 'S'
union
select country1::character varying(4) as country
from borders where country2 = 'S' ) a
UNION
select distinct sp.country::varchar(4)
from (select country1::varchar(4) as country, country2 as n
from borders) sp
join (select country2::varchar(4) as country, country1 as n, countryin as temp
from borders, border) st
on sp.country = st.n
and sp.country in st.temp
where true
)
SELECT distinct countryin, name
FROM border, country
where countryin = code ;
The only thing I cannot get to work is a constraint that checks whether a specific country already exists in the result table border. I tried using "and sp.country in st.temp" and several other ways, but I cannot get it to work.
Could someone give me a hint on how this can be solved?
Current Results:
Right now, I get the following error:
ERROR: syntax error at or near "st"
LINE 4: ...s, border) st on sp.country = st.n and sp.country in st.temp
Desired Results
List all countries that can be reached recursively using borders, starting from 'S'. So, if we have (S,N), (N,R), (R,C), (D,A), we would get: (N,R,C)
I believe there is room for improvement, but this seems to do the job.
Base case: you get the "other" country from every row where 'S' appears on either side.
Recursive case: you get each new country that borders any country already in the travel path, but skip the rows containing 'S' so the path doesn't return to the origin. It also includes a counter that tracks the recursion depth so it doesn't keep looping forever (I don't remember how many countries there are now).
After it finishes, I add a DISTINCT filter to remove duplicates.
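Maybe I could instead include a filter in the recursive case to avoid travelling back to the same countries, something like the condition below. I'm not sure which one is more efficient, and note that PostgreSQL does not allow the recursive CTE to be referenced inside a subquery, so this filter would have to be expressed differently there.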
AND ( b.country1 NOT IN (SELECT country FROM Travel)
AND b.country2 NOT IN (SELECT country FROM Travel)
)
SQL Fiddle DEMO
WITH RECURSIVE travel(r_level, country) AS (
select distinct 1 as r_level,
CASE WHEN country1 = 'S' THEN country2
ELSE country1
END as country
from borders
where country1 = 'S'
or country2 = 'S'
UNION
select distinct t.r_level + 1 as r_level,
CASE WHEN b.country1 = t.country THEN b.country2
ELSE b.country1
END as country
from borders b
join travel t
ON (b.country1 = t.country OR b.country2 = t.country)
AND (b.country1 <> 'S' AND b.country2 <> 'S')
WHERE t.r_level < 300
)
SELECT DISTINCT country
FROM travel
OUTPUT
| country |
|---------|
| N |
| R |
| C |
Please feel free to provide a more complete sqlFiddle with more countries to improve the testing.
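For comparison, here is a minimal sketch of a slightly different formulation: it seeds the recursion with the start country itself and relies on UNION (not UNION ALL) to discard rows that were already produced, so no depth counter is needed. It is run on SQLite through Python's sqlite3 module with the four example borders from the question; the CTE name reach is made up for this sketch, and the behaviour on PostgreSQL is assumed to be the same but has not been verified here.

```python
import sqlite3

# The four example borders from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE borders (country1 TEXT, country2 TEXT);
INSERT INTO borders VALUES ('S', 'N'), ('N', 'R'), ('R', 'C'), ('D', 'A');
""")

# Seed with 'S'; each step adds the country on the other side of any border
# touching an already reached country. UNION drops duplicates, so the
# recursion stops once no new country appears.
reachable = [row[0] for row in conn.execute("""
WITH RECURSIVE reach(country) AS (
    SELECT 'S'
    UNION
    SELECT CASE WHEN b.country1 = r.country THEN b.country2
                ELSE b.country1
           END
    FROM borders b
    JOIN reach r ON r.country IN (b.country1, b.country2)
)
SELECT country FROM reach
WHERE country <> 'S'
ORDER BY country
""")]

print(reachable)  # ['C', 'N', 'R']
```

D and A never show up because no chain of borders connects them to S.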

Divide two queries in SQL then group by

I am looking for the rate change between new accounts and all accounts. I have both queries listed below. I need to divide NewAccounts by AllAccounts, take that percentage, and group by town in the same query. Thanks
SELECT DISTINCT Count(NewAccounts), Town
FROM (SELECT Stuff)
WHERE (Newaccounts)
Group By Town
;
SELECT DISTINCT Count(AllAccounts), Town
FROM (SELECT DifferentSTUFF)
WHERE (AllAccounts)
Group By Town
You need to rewrite your queries as subqueries and join them together:
SELECT CAST(na.NewAccounts AS FLOAT) / aa.AllAccounts
FROM ( SELECT Count(NewAccounts) AS NewAccounts, Town
FROM (SELECT Stuff)
WHERE (Newaccounts)
GROUP BY Town
) na
INNER JOIN
( SELECT Count(AllAccounts) AS AllAccounts, Town
FROM (SELECT DifferentSTUFF)
WHERE (AllAccounts)
GROUP BY Town
) aa
ON aa.Town = na.Town;
N.B. I have removed DISTINCT from both queries as it is redundant. The cast to float on NewAccounts avoids integer division, which would otherwise truncate the result to an integer.
You may need to alter this slightly depending on the availability of data in each of the queries, i.e. if you won't always have a result in the new accounts for a town it would be better written as:
SELECT CAST(COALESCE(na.NewAccounts, 0) AS FLOAT) / aa.AllAccounts
FROM
( SELECT Count(AllAccounts) AS AllAccounts, Town
FROM (SELECT DifferentSTUFF)
WHERE (AllAccounts)
GROUP BY Town
) aa
LEFT JOIN
( SELECT Count(NewAccounts) AS NewAccounts, Town
FROM (SELECT Stuff)
WHERE (Newaccounts)
GROUP BY Town
) na
ON aa.Town = na.Town
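Since the original queries are only placeholders, the following sketch shows the LEFT JOIN variant on a concrete but entirely hypothetical schema: a single accounts table with a town column and an is_new flag. It uses Python's sqlite3 module with an in-memory database, and multiplies by 1.0 instead of casting to float.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, town TEXT, is_new INTEGER);
INSERT INTO accounts (town, is_new) VALUES
  ('Town A', 1), ('Town A', 0), ('Town A', 0), ('Town A', 0),
  ('Town B', 0), ('Town B', 0);
""")

# aa: all accounts per town; na: new accounts per town.
# The LEFT JOIN keeps towns without new accounts, and COALESCE turns their
# missing count into 0 instead of NULL.
rates = conn.execute("""
SELECT aa.town,
       COALESCE(na.new_accounts, 0) * 1.0 / aa.all_accounts AS new_rate
FROM (SELECT town, COUNT(*) AS all_accounts
      FROM accounts
      GROUP BY town) aa
LEFT JOIN
     (SELECT town, COUNT(*) AS new_accounts
      FROM accounts
      WHERE is_new = 1
      GROUP BY town) na
  ON na.town = aa.town
ORDER BY aa.town
""").fetchall()

print(rates)  # [('Town A', 0.25), ('Town B', 0.0)]
```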

Speeding up a slow SQL query

I am using the MySQL world.sql database. Exactly what is in it doesn't matter, but the schema that matters to us looks like:
CREATE TABLE city (
name char(35),
country_code char(3),
population int(11)
);
CREATE TABLE country (
code char(3),
name char(52),
population int(11)
);
The query in question is, in English: "for each country, give me its name and population, along with the name and population of the city that has the highest ratio of its population to the country's population".
Currently I have the following SQL:
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
WHERE t.population / c.population = (
SELECT MAX(tt.population / c.population)
FROM city tt
WHERE t.country_code = tt.country_code
)
Currently the query takes about 10 minutes to run on my SQLite database. The world.sql database isn't large (4000-5000 rows?) so I'm guessing I'm doing something wrong here.
I currently don't have any sort of indexes or anything: the database is an empty database with this dataset (https://dl.dropboxusercontent.com/u/7997532/world.sql) entered into it. Could anyone give me any pointers as to what I need to fix to make it run in a reasonable amount of time?
EDIT: well here's another twist to the question:
This runs in <2 seconds
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
WHERE t.population * 1.0 / c.population = (
SELECT MAX(tt.population * 1.0 / c.population)
FROM city tt
WHERE tt.country_code = t.country_code
)
While this takes 10 minutes to run
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
AND t.population * 1.0 / c.population = (
SELECT MAX(tt.population * 1.0 / c.population)
FROM city tt
WHERE tt.country_code = t.country_code
)
Is the solution then simply to stuff as much as possible into the ON clause when I'm doing JOINs? It seems that in this case I can get away without an index if I do that...
For each country, the city that has the highest ratio of population to its country's population is the city with the highest population, so try this:
SELECT t.name, t.population, c.name, c.population
FROM country c
JOIN city t
ON t.country_code = c.code
AND t.population =
(SELECT MAX(population) FROM city
WHERE country_code = c.code)
But this may still not improve performance much if you have no indexes. You need to put an index on country.code and on city.country_code.
Ideally, I would start with indexes and then consider adding a computed field that pre-calculates t.population / c.population into a link table.
Then, for each country and city, you can look up its population ratio without computing it row by agonizing row (RBAR).
I suggest adding numeric primary keys to both tables and a foreign key on country_code in your city table. One of the benefits will be better performance because primary keys are indexed.
Edit starts here
Since the question doesn't ask you to provide the actual ratio, don't worry about trying to calculate it. The city with the highest population in the country will have the highest proportion of the country's population.
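Putting the index advice and the simplified query together, a minimal sketch could look like this. It uses Python's sqlite3 module with an in-memory database and a handful of made-up rows rather than the real world.sql data; the index names are arbitrary, and the composite index on (country_code, population) is one reasonable assumption about what helps the MAX lookup, not a measured result.

```python
import sqlite3

# Made-up rows; the table and column names follow the schema in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT, name TEXT, population INTEGER);
CREATE TABLE city (name TEXT, country_code TEXT, population INTEGER);

-- Indexes on the join and lookup columns (names are arbitrary).
CREATE INDEX idx_country_code ON country (code);
CREATE INDEX idx_city_country_pop ON city (country_code, population);

INSERT INTO country VALUES ('XXA', 'Xland', 1000), ('XXB', 'Yland', 5000);
INSERT INTO city VALUES
  ('x1', 'XXA', 300), ('x2', 'XXA', 100),
  ('y1', 'XXB', 50),  ('y2', 'XXB', 700);
""")

# Within one country the divisor is constant, so the city with the largest
# population is also the city with the largest population ratio.
largest = conn.execute("""
SELECT c.name, c.population, t.name, t.population
FROM country c
JOIN city t ON t.country_code = c.code
WHERE t.population = (SELECT MAX(tt.population)
                      FROM city tt
                      WHERE tt.country_code = c.code)
ORDER BY c.name
""").fetchall()

print(largest)  # [('Xland', 1000, 'x1', 300), ('Yland', 5000, 'y2', 700)]
```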

Query for Counting number of orders by UK postcode

I have a table of orders placed by customers. What I want is to check which part of the country orders have historically come from. I can only check this by postcode: for instance, an order with a postcode starting with SK comes from Stockport, and a postcode starting with M means the order is from Manchester. Is it possible to write a query that counts the orders by postcode?
Some of the fields of the Order table:
OrderNumber OGUID custID firstname last name address postcode email authorisation date etc...
Any suggestion or assistance will be appreciated.
Thanks
Here is a way that works, but it can get too long for a huge list. I will try to find a way around that problem.
SELECT
CASE
WHEN postcode LIKE 'SK%' THEN 'SK'
WHEN postcode LIKE 'M%' THEN 'M'
END AS group_by_value
, COUNT(*) AS group_by_count
FROM [Table] a
GROUP BY
CASE
WHEN postcode LIKE 'SK%' THEN 'SK'
WHEN postcode LIKE 'M%' THEN 'M'
END
If you have a table that contains the city code and city name, then you might be able to use something like the following which joins your orders table to the codes using a LIKE:
select o.postcode,
c.city,
count(c.code) over(partition by c.code) Total
from orders o
inner join codes c
on o.postcode like '%'+c.code+'%'
You can use GROUP BY to get the total number of orders in each postcode:
select postcode, count(postcode) TotalOrdersByPostCode
from orders
group by postcode
If you want the City included, then you can also GROUP BY city:
select city, postcode, count(postcode) TotalOrdersByPostCode
from orders
group by city, postcode
select count(1) over(partition by postcode) as countByPostcode, othercolumnhere
from [Order]
Have you tried something like this? The town part of the postcode will be the first one or two letters, followed by a digit, I think. So this will give you those leading letters.
select substring(postcode,1, patindex('%[0-9]%',postcode)-1), count(*)
from [Order]
group by substring(postcode,1, patindex('%[0-9]%',postcode)-1)
Then you'll have to decode M into Manchester, W into West London, GU into Guildford etc...
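The same idea, taking the leading letters before the first digit and then counting, can be sketched outside the database as well. The snippet below is plain Python; the helper names postcode_area and count_by_area and the sample postcodes are made up for illustration, and it assumes the area is always the first one or two letters, as suggested above.

```python
import re
from collections import Counter

def postcode_area(postcode):
    """Return the leading one or two letters of a postcode, upper-cased."""
    match = re.match(r"[A-Za-z]{1,2}", postcode.strip())
    return match.group(0).upper() if match else None

def count_by_area(postcodes):
    """Count postcodes per area; entries without leading letters count under None."""
    return Counter(postcode_area(p) for p in postcodes)

# Illustrative postcodes only.
sample = ["SK1 3AA", "sk4 2BB", "M1 1AE", "M20 2XX", "ME1 1AA"]
area_counts = count_by_area(sample)
print(area_counts["SK"], area_counts["M"], area_counts["ME"])  # 2 2 1
```

Unlike a LIKE 'M%' test, this keeps M and ME apart, because the two-letter prefix is matched as a whole.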