SQL query does not return expected result

I am running the examples to be found here: https://sqlzoo.net/wiki/SELECT_within_SELECT_Tutorial
I am given a table named world:
I would expect this:
Why do I get this result, and how can I correct my SQL query?

Use
SELECT distinct continent,
( SELECT name
FROM world b
WHERE a.continent = b.continent
ORDER BY name
LIMIT 1)
FROM world a
or
SELECT continent,
( SELECT name
FROM world b
WHERE a.continent = b.continent
ORDER BY name
LIMIT 1)
FROM world a
GROUP BY continent
You are getting a row for every country in the world, and there are many countries per continent, so you need to limit the result set to one row per continent.
This technique is called a correlated sub-query, if you were wondering.
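As a rough illustration of the correlated sub-query above, here is a minimal sketch that runs the DISTINCT variant against an in-memory SQLite database from Python. The five sample rows are made up, the first_name alias and the outer ORDER BY are additions to make the output deterministic, and SQLite is only a stand-in for whatever engine the tutorial site uses.

```python
import sqlite3

# Hypothetical sample data; the real world table has many more rows and columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?)",
    [
        ("France", "Europe"),
        ("Albania", "Europe"),
        ("Japan", "Asia"),
        ("China", "Asia"),
        ("Egypt", "Africa"),
    ],
)

# The inner query is re-evaluated for each outer row (it refers to a.continent),
# and DISTINCT collapses the repeated (continent, name) pairs.
query = """
SELECT DISTINCT continent,
       (SELECT name
        FROM world b
        WHERE a.continent = b.continent
        ORDER BY name
        LIMIT 1) AS first_name
FROM world a
ORDER BY continent
"""
rows = conn.execute(query).fetchall()
print(rows)

# Without DISTINCT there is one result row per country, not per continent.
no_distinct = conn.execute(query.replace("SELECT DISTINCT", "SELECT", 1)).fetchall()
print(len(no_distinct))
```

With this sample data, the DISTINCT version yields three rows (one per continent), while dropping DISTINCT yields five (one per country).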

Related

Group By Vs Distinct in SQL

SELECT continent, COUNT(name)
FROM world
WHERE population>200000000
GROUP BY continent
When I execute the query above, it runs fine. It shows the number of countries in each continent that have a population larger than 200000000.
However, when I modify my query to the one below:
SELECT DISTINCT(continent), COUNT(name)
FROM world
WHERE population>200000000
This does not work, and I am wondering why. In this case I am saying: for each distinct continent, count the countries with a population larger than 200000000.
I just want to understand the reasoning so I can become better at writing queries.
Why does this not work?
SELECT DISTINCT(continent), COUNT(name)
FROM world
WHERE population > 200000000;
That is simple. You have an aggregation query, because you have COUNT() in the SELECT. You have no GROUP BY, so any other column referenced in the SELECT must be an argument of an aggregation function. So, continent generates an error.
You seem to also be under the impression that the parentheses around continent have some significance. They do not. Not at all. SQL has a construct, SELECT DISTINCT, which removes duplicate rows from the result; it applies to the whole row, not to a single column.
Also note that SELECT DISTINCT is almost never used together with aggregation functions.
I think you want:
SELECT continent
, COUNT(DISTINCT name) AS DistinctCountries
FROM world
WHERE population>200000000
GROUP BY continent
If you want each row to represent a continent, you need to group by continent. Then count the distinct countries in each continent where your condition is met.
The first query and its order of evaluation:
1. FROM world: Get rows from the world table.
2. WHERE population>200000000: Only accept rows (countries?) with a population greater than 200000000.
3. GROUP BY continent: Aggregate the rows so as to get one result row per continent.
4. SELECT COUNT(name): For each continent, show the count of its rows found in step 3 where name is not null.
5. SELECT continent: Show the continent.
The second query and its order of evaluation:
1. FROM world: Get rows from the world table.
2. WHERE population>200000000: Only accept rows (countries?) with a population greater than 200000000.
3. GROUP BY: There is none in this query, so the rows from step 2 are not divided into groups.
4. SELECT COUNT(name): As there is no GROUP BY clause, this says you want one result row only, with the count of all rows from step 2 where name is not null.
5. SELECT (continent): The parentheses are superfluous. You are saying you want to show the continent. However, as COUNT(name) already said you want one result row only, which continent are you talking about? It makes no sense to the DBMS and is invalid SQL. (There is one DBMS making an exception here, though: MySQL would just pick a continent arbitrarily rather than raising an error, provided a certain setting is in effect.)
6. SELECT DISTINCT: Of all result rows, you want duplicates removed, i.e. all rows showing the same continent and count.
Your error, as you can see, is in steps 4 and 5, where SELECT COUNT(name) without GROUP BY and SELECT (continent) don't match semantically.
GROUP BY and DISTINCT are quite separate things.
GROUP BY is used to form groups and perform aggregation per group, while DISTINCT is only used to remove duplicate records, nothing else.
SELECT continent, COUNT(name)
FROM world
WHERE population>200000000
GROUP BY continent
The first query has a GROUP BY on continent: after filtering via WHERE, it puts all rows that have the same continent into a separate group.
This query gives you one count per continent.
SELECT DISTINCT continent,
COUNT(name)
FROM world
WHERE population>200000000
The second query applies DISTINCT and COUNT to the whole table after filtering on population, not to groups. DISTINCT only removes duplicate result rows, while the count is independent of any grouping and covers the whole filtered table.
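The difference between the two queries can be sketched with an in-memory SQLite database driven from Python. The sample rows and population figures are made up for illustration. Note one assumption that matters here: SQLite, like MySQL in its permissive mode, happens to accept the SELECT DISTINCT query instead of raising an error, so the sketch can show what "one row for the whole table" looks like; stricter engines reject it.

```python
import sqlite3

# Hypothetical sample data with rough population figures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("China", "Asia", 1400000000),
        ("India", "Asia", 1380000000),
        ("Indonesia", "Asia", 270000000),
        ("USA", "North America", 330000000),
        ("Brazil", "South America", 210000000),
        ("France", "Europe", 67000000),
    ],
)

# With GROUP BY: one result row, and one count, per continent.
grouped = conn.execute(
    """
    SELECT continent, COUNT(name)
    FROM world
    WHERE population > 200000000
    GROUP BY continent
    ORDER BY continent
    """
).fetchall()
print(grouped)

# Without GROUP BY: COUNT covers all filtered rows, so there is a single row.
# SQLite fills in the continent column from an arbitrary row; DISTINCT then
# has nothing left to deduplicate.
ungrouped = conn.execute(
    """
    SELECT DISTINCT continent, COUNT(name)
    FROM world
    WHERE population > 200000000
    """
).fetchall()
print(ungrouped)
```

The grouped query returns three rows (Asia with 3, North America with 1, South America with 1), whereas the ungrouped query returns a single row whose count is 5.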

When is aliasing required when using SQL set theory clauses?

I just started learning SQL and am trying to learn from my mistakes. In one of my practice exercises, I had to find city names from the cities table that are not listed as capital cities in the countries table. Initially I tried the code below, but it yielded an error.
SELECT name
FROM cities
EXCEPT
SELECT capital
FROM countries
ORDER BY capital ASC;
The correct code is:
SELECT city.name
FROM cities AS city
EXCEPT
SELECT country.capital
FROM countries AS country
ORDER BY name;
Can someone explain to me why aliasing made all the difference here?
An ORDER BY for a UNION, EXCEPT or INTERSECT sorts the complete result. The column names of the overall query are defined by the first query. So this query:
SELECT name
FROM cities
EXCEPT
SELECT capital
FROM countries
returns a result with a single column named name.
Adding an order by is conceptually the same as:
select *
from (
SELECT name
FROM cities
EXCEPT
SELECT capital
FROM countries
) x
order by ....;
As the inner query only returns a single column, name, that's the only column you can use in the order by.
The aliases that you used in your second query don't change the column names of the overall result, and those are what determine which columns are available to the order by clause.
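To make the point about result column names concrete, here is a small sketch using an in-memory SQLite database from Python. The table contents are invented, and SQLite is only a stand-in: engines differ in how strictly they validate ORDER BY on a set operation, so the sketch only shows the part that is portable, namely that the combined result has one column named after the first query's column and that sorting by that name works.

```python
import sqlite3

# Hypothetical sample data for the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cities (name TEXT);
    CREATE TABLE countries (capital TEXT);
    INSERT INTO cities VALUES ('Paris'), ('Lyon'), ('Tokyo'), ('Osaka');
    INSERT INTO countries VALUES ('Paris'), ('Tokyo');
    """
)

cur = conn.execute(
    """
    SELECT name FROM cities
    EXCEPT
    SELECT capital FROM countries
    ORDER BY name
    """
)

# The column name of the overall result comes from the first SELECT.
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(column_names)
print(rows)
```

Here the only result column is called name, and the rows are the non-capital cities sorted by that column.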

How can I query to find max population of country from countries table?

I have a table "countries" with the columns name, continent, area, population.
Let's say I want to find the name and population of the country with the highest population.
SELECT MAX(population) FROM countries;
The example above returns the maximum population.
I also want to see the name of the country with that population.
SELECT name,MAX(population) FROM countries;
I get the error below.
ERROR: column "countries.name" must appear in the GROUP BY clause or be used in an aggregate function
I can't think of another way to do it.
Here is an example of my query.
SELECT name,population
FROM countries
WHERE population >= (
SELECT MAX(population)
FROM countries)
;
This query works, but I am curious why I am getting the error, and whether there is a better way to accomplish this.
SELECT name, population
FROM countries
ORDER BY population DESC
LIMIT 1
MAX selects the maximum element from a list of values. In your first query,
SELECT MAX(population) FROM countries;
the list is formed by extracting the population field from all rows in countries, and then the maximum is selected. This collapses the list of rows down to a single row containing just the maximum.
In your second query,
SELECT name,MAX(population) FROM countries;
you (conceptually) get a list of all name fields from countries, but there's only one MAX(population). The database system doesn't know what to do with this: SELECT name FROM countries would return as many rows as there are in countries, but SELECT MAX(population) FROM countries would only return one row. This doesn't match up; it's unclear how many rows you want returned from this. This is why you get an error.
The error message says you need to either
use name in an aggregate function, which would collapse the list of rows down to a single value that could be returned alongside the single MAX value, or
use a GROUP BY name clause, which would group the list of countries into entries with equal names first, then compute MAX(population) separately for each group. This makes no sense if all your countries have different names.
As far as I know there's no SQL syntax for "select the maximum population and then get the name field from the same row" (it's not quite clear what this would do anyway because there can be more than one country with a population equal to the maximum).
What you can do instead is sort the whole table, then select only fields from the first row:
SELECT name, population
FROM countries
ORDER BY population DESC
LIMIT 1
(I'm pretty sure Postgres optimizes this so there's no actual sort involved.)
Now if there is more than one country with the maximum population, you'll get an arbitrary one of them (we haven't told the database how to sort rows with equal population).
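The tie behaviour described above can be sketched with an in-memory SQLite database from Python. The country names and populations are invented so that two countries share the maximum. The sub-query form returns every country tied for the maximum, while adding a second sort key (name, chosen here only as an example tie-breaker) makes the LIMIT 1 form deterministic.

```python
import sqlite3

# Hypothetical sample data: two countries share the maximum population.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [("Aland", 100), ("Borduria", 500), ("Syldavia", 500)],
)

# Sub-query form: every row whose population equals the maximum.
all_max = conn.execute(
    """
    SELECT name, population
    FROM countries
    WHERE population = (SELECT MAX(population) FROM countries)
    ORDER BY name
    """
).fetchall()
print(all_max)

# Sort-and-limit form with an explicit tie-breaker on name.
one_max = conn.execute(
    """
    SELECT name, population
    FROM countries
    ORDER BY population DESC, name ASC
    LIMIT 1
    """
).fetchall()
print(one_max)
```

Which form is preferable depends on whether ties should all be reported or reduced to a single row.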
In SQL Server, you can use the TOP keyword to select only a single record from the countries table.
SELECT Top 1 name,population
FROM countries
order by population DESC

SQL ORACLE - Query that show a list how many people is over 50

I'm very new to the Oracle SQL world.
I have a table named ARTISTS with the columns id_artist, artist_name, born_date, death_date, and a table called COUNTRY with the columns id_country, country_name.
I have to write a query that shows the list of countries and how many artists over 50 each one has, something like this:
COUNTRY | ARTIST OVER 50
JAPAN................35
EEUU.................47
FRANCE............85
I can compute the age, but I don't know what to do for the rest of the query. I tried to do something with COUNT, GROUP BY and HAVING, but I only get error messages like 'missing expression' or problems with GROUP BY.
This is what I have
SELECT COUNTRY_NAME, COUNT (round(MONTHS_BETWEEN(DEATH_DATE,BORN_DATE)/12)),
FROM ARTISTS A JOIN COUNTRY C ON A.COUNTRY=C.ID_COUNTRY
HAVING COUNT (round(MONTHS_BETWEEN(DEATH_DATE,BORN_DATE)/12))>50
GROUP BY COUNTRY_NAME
I feel very lost trying to write this query. As I said before, the only thing I have managed is the expression for the age of each artist:
round(MONTHS_BETWEEN(DEATH_DATE,BORN_DATE)/12)
I hope some of you can help me.
Thanks
SELECT COUNTRY_NAME, COUNT (round(MONTHS_BETWEEN(DEATH_DATE,BORN_DATE)/12))
FROM ARTISTS A JOIN COUNTRY C ON A.COUNTRY=C.ID_COUNTRY
GROUP BY COUNTRY_NAME
HAVING COUNT (round(MONTHS_BETWEEN(DEATH_DATE,BORN_DATE)/12))>50
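Always put GROUP BY before HAVING, because HAVING filters the records after they have been grouped: logically, the query first builds the groups with GROUP BY and only then applies the HAVING clause to filter those results. Note also that the trailing comma after the COUNT(...) expression in the original SELECT list has been removed; a comma directly before FROM leaves an expression missing.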
Use GROUP BY before the HAVING clause.
Try this one:
SELECT COUNTRY_NAME, COUNT (round(MONTHS_BETWEEN(DEATH_DATE,BORN_DATE)/12))
FROM ARTISTS A JOIN COUNTRY C ON A.COUNTRY=C.ID_COUNTRY
GROUP BY COUNTRY_NAME
HAVING COUNT (round(MONTHS_BETWEEN(DEATH_DATE,BORN_DATE)/12))>50
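One thing worth checking in the queries above: COUNT(...) > 50 in HAVING keeps countries that have more than 50 artists, which is not the same as counting artists who are older than 50. If the goal is the latter, one possible reading is to filter on age in WHERE and then count per country. The sketch below tries that against an in-memory SQLite database from Python. Several things in it are assumptions: the country column on artists (the question joins on it but does not list it), the sample rows, the artists_over_50 alias, and the age formula, which uses julianday differences divided by 365.25 as an approximate stand-in because SQLite has no MONTHS_BETWEEN.

```python
import sqlite3

# Hypothetical schema and data; dates are stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (id_country INTEGER, country_name TEXT);
    CREATE TABLE artists (
        id_artist INTEGER, artist_name TEXT,
        born_date TEXT, death_date TEXT,
        country INTEGER  -- assumed foreign key to country.id_country
    );
    INSERT INTO country VALUES (1, 'FRANCE'), (2, 'JAPAN');
    INSERT INTO artists VALUES
        (1, 'A', '1900-01-01', '1960-01-01', 1),  -- about 60 years
        (2, 'B', '1900-01-01', '1930-01-01', 1),  -- about 30 years
        (3, 'C', '1880-06-15', '1950-06-15', 1),  -- about 70 years
        (4, 'D', '1920-03-01', '1990-03-01', 2),  -- about 70 years
        (5, 'E', '1950-01-01', '1980-01-01', 2);  -- about 30 years
    """
)

# Filter individual artists by age in WHERE, then count them per country.
over_50 = conn.execute(
    """
    SELECT c.country_name, COUNT(*) AS artists_over_50
    FROM artists a
    JOIN country c ON a.country = c.id_country
    WHERE (julianday(a.death_date) - julianday(a.born_date)) / 365.25 > 50
    GROUP BY c.country_name
    ORDER BY c.country_name
    """
).fetchall()
print(over_50)
```

In Oracle the WHERE condition would presumably use the MONTHS_BETWEEN expression from the question instead of the julianday approximation.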

Regarding use of a query

I am working through a few SQL exercises myself.
One question says:
Find the largest country (by area) in each continent, show the continent, the name and the area:
SELECT continent, name, area
FROM world x
WHERE area >= ALL
(SELECT area FROM world y
WHERE y.continent=x.continent
AND area>0)
I don't understand what is meant by world x and world y. Could anyone please explain that?
x and y are aliases. They allow you to identify each instance of the table in "WHERE y.continent=x.continent".
x and y are used as aliases (a short alternative name for reference purposes) of the table. This allows the use of the world table in two different scopes.
x and y are just aliases that are used to qualify the columns: if you had no aliases and used the same table twice, it would not be clear which table instance a column belongs to.
In your case, you are matching two instances of the same table on a column, continent, and the aliases are used to make it clear to the SQL engine what is going on.
That is aliasing the table name, commonly written as:
FROM `table` AS `t`
x and y are table aliases. You use them to make the query more concise or readable, or to write a query that selects from the same table multiple times, as here.
In SQL-Server 2005 and later you can use this query to get the desired result:
WITH CTE AS
(
SELECT continent, name, area,
rank = DENSE_RANK() OVER (PARTITION BY continent ORDER BY area DESC)
From world
)
SELECT continent, name, area FROM CTE WHERE rank = 1
DENSE_RANK might return multiple countries per continent if they have the same largest area. If you just want one, replace DENSE_RANK with ROW_NUMBER.
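The role of the x and y aliases can be sketched with an in-memory SQLite database from Python. The sample rows and areas are invented. One substitution to be aware of: SQLite does not support the ALL quantifier used in the question, so this sketch compares against a correlated MAX(area) instead, which expresses the same "largest in its continent" condition for non-null areas. The aliases do the same job either way: x names the outer instance of world and y the inner one.

```python
import sqlite3

# Hypothetical sample data with rough areas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, area INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("France", "Europe", 640000),
        ("Spain", "Europe", 500000),
        ("China", "Asia", 9600000),
        ("Japan", "Asia", 378000),
        ("Egypt", "Africa", 1000000),
    ],
)

# For each outer row x, the inner query scans a second instance y of the same
# table, restricted to x's continent. Without the aliases, y.continent and
# x.continent could not be told apart.
largest = conn.execute(
    """
    SELECT x.continent, x.name, x.area
    FROM world x
    WHERE x.area = (SELECT MAX(y.area)
                    FROM world y
                    WHERE y.continent = x.continent)
    ORDER BY x.continent
    """
).fetchall()
print(largest)
```

As with DENSE_RANK, this form returns more than one country for a continent if several share the largest area.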