Regarding use of a query - sql

I am working through a few SQL queries on my own.
One question says:
Find the largest country (by area) in each continent, show the continent, the name and the area:
SELECT continent, name, area
FROM world x
WHERE area >= ALL
(SELECT area FROM world y
WHERE y.continent=x.continent
AND area>0)
I don't understand what world x and world y mean. Could anyone please explain?

x and y are aliases. They let you identify which instance of the table you mean in "WHERE y.continent=x.continent".

x and y are used as aliases (a short alternative name for reference purposes) of the table. This allows the use of the world table in two different scopes.

x and y are just aliases that are used to qualify the columns: if you had no aliases and used the same table twice, it would not be clear which table instance a column belongs to.
In your case, you are matching two instances of the same table on a column - continent - and the aliases are used to make it clear to the sql engine what is going on.
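To see the two instances at work, here is a minimal sketch using Python's built-in sqlite3 module. The country names and areas are made up for illustration. SQLite does not support the ALL operator, so this sketch swaps in an equivalent MAX subquery; the role of the aliases x and y is the same as in the question.

```python
import sqlite3

# In-memory database with invented sample rows (not real country data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, area INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Aland", "Europe", 300),
        ("Bland", "Europe", 100),
        ("Cland", "Asia", 500),
        ("Dland", "Asia", 700),
    ],
)

# x names the outer scan of world, y the inner scan. Without the aliases,
# "continent = continent" could not say which scan each side refers to.
largest = conn.execute(
    """
    SELECT x.continent, x.name, x.area
    FROM world x
    WHERE x.area = (SELECT MAX(y.area)
                    FROM world y
                    WHERE y.continent = x.continent)
    ORDER BY x.continent
    """
).fetchall()

print(largest)
```

For each row of x, the inner query looks only at the rows of y on the same continent, which is why the comparison has to name both instances.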

That is aliasing the table name, commonly written as:
FROM `table` AS `t`

x and y are table aliases. You use them to make the query more concise or readable, or to write a query that selects from the same table more than once, as here.
In SQL Server 2005 and later you can use this query to get the desired result:
WITH CTE AS
(
SELECT continent, name, area,
rank=dense_rank() over(Partition By continent Order By area Desc)
From world
)
SELECT continent, name, area FROM CTE WHERE rank = 1
DENSE_RANK might return multiple countries per continent if they share the same largest area. If you want just one, replace DENSE_RANK with ROW_NUMBER.
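The difference on ties can be checked with a small sketch in Python's sqlite3 module (window functions need SQLite 3.25 or later). The rows are invented, with two European countries deliberately sharing the largest area; the helper name top_per_continent and the column alias rnk are my own, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, area INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Aland", "Europe", 300),
        ("Bland", "Europe", 300),  # ties with Aland for largest in Europe
        ("Cland", "Asia", 700),
        ("Dland", "Asia", 500),
    ],
)


def top_per_continent(ranking_function):
    """Return rows ranked first per continent by the given window function.

    The function name is spliced into the SQL text, so only pass trusted
    names such as "DENSE_RANK" or "ROW_NUMBER".
    """
    sql = f"""
        WITH cte AS (
            SELECT continent, name, area,
                   {ranking_function}() OVER (
                       PARTITION BY continent ORDER BY area DESC
                   ) AS rnk
            FROM world
        )
        SELECT continent, name FROM cte WHERE rnk = 1
        ORDER BY continent, name
    """
    return conn.execute(sql).fetchall()


with_ties = top_per_continent("DENSE_RANK")  # both tied European rows
one_each = top_per_continent("ROW_NUMBER")   # exactly one row per continent

print(with_ties)
print(one_each)
```

With ROW_NUMBER, which of the tied rows wins is not defined unless you add a tie-breaker to the ORDER BY.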

Related

SQL query does not return expected result

I am running the examples to be found here: https://sqlzoo.net/wiki/SELECT_within_SELECT_Tutorial
I am given a table named world:
I would expect this:
Why do I get this result, and how can I correct my SQL query?
Use
SELECT distinct continent,
( SELECT name
FROM world b
WHERE a.continent = b.continent
ORDER BY name
LIMIT 1)
FROM world a
or
SELECT continent,
( SELECT name
FROM world b
WHERE a.continent = b.continent
ORDER BY name
LIMIT 1)
FROM world a
GROUP BY continent
You are getting a row for every country in the world, and there are many per continent. You need to limit the result set to one row per continent.
This technique is called a correlated sub-query, if you were wondering.
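As a rough check of the effect of DISTINCT here, this sketch runs the correlated subquery with and without it in Python's sqlite3 module. The table contents are invented, and the column alias first_name is my own addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?)",
    [
        ("Bland", "Europe"),
        ("Aland", "Europe"),
        ("Dland", "Asia"),
        ("Cland", "Asia"),
    ],
)

# The subquery is evaluated once per outer row a, picking the first
# name alphabetically on that row's continent.
query = """
    SELECT {distinct} continent,
           (SELECT name
            FROM world b
            WHERE a.continent = b.continent
            ORDER BY name
            LIMIT 1) AS first_name
    FROM world a
    ORDER BY continent
"""

per_country = conn.execute(query.format(distinct="")).fetchall()
per_continent = conn.execute(query.format(distinct="DISTINCT")).fetchall()

print(len(per_country))  # one row per country, with repeated values
print(per_continent)     # duplicates collapsed to one row per continent
```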

Group By Vs Distinct in SQL

SELECT continent, COUNT(name)
FROM world
WHERE population>200000000
GROUP BY continent
When I execute the query above, it runs fine. It shows, for each continent, the number of countries that have a population larger than 200000000.
However, when I modify my query to the one below:
SELECT DISTINCT(continent), COUNT(name)
FROM world
WHERE population>200000000
This does not work, and I am wondering what the reason is. My intention is: for each distinct continent, count the countries with a population larger than 200000000.
I just want to understand the reasoning so I can become better at writing queries.
Why does this not work?
SELECT DISTINCT(continent), COUNT(name)
FROM world
WHERE population > 200000000;
That is simple. You have an aggregation query, because you have COUNT() in the SELECT. You have no GROUP BY, so any other column references in the SELECT must be arguments of aggregation functions. So, continent generates an error.
You seem to also be under the impression that the parentheses around continent have some significance. They do not. Not at all. SQL has a construct, SELECT DISTINCT, which returns distinct rows.
Also note that SELECT DISTINCT is almost never used in aggregation queries.
I think you want:
SELECT continent
, COUNT(DISTINCT name) AS DistinctCountries
FROM world
WHERE population>200000000
GROUP BY continent
If you want each row to represent a continent, you need to group by continent. Then count the distinct countries in the continent where your condition is met.
The first query and its order of evaluation:
1. FROM world: Get rows from the world table.
2. WHERE population>200000000: Only accept rows (countries?) with a population greater than 200000000.
3. GROUP BY continent: Aggregate the rows so as to get one result row per continent.
4. SELECT COUNT(name): For the continent show the count of its rows found in 3 where name is not null.
5. SELECT continent: show the continent.
The second query and its order of evaluation:
1. FROM world: Get rows from the world table.
2. WHERE population>200000000: Only accept rows (countries?) with a population greater than 200000000.
3. No GROUP BY clause: the remaining rows are not split into groups.
4. SELECT COUNT(name): As there is no GROUP BY clause, this is saying you want one result row only, with the count of all rows found in 2 where name is not null.
5. SELECT (continent): The parentheses are superfluous. You are saying you want to show the continent. However, as you said with COUNT(name) that you wanted one result row only, which continent are you talking about? It makes no sense to the DBMS and is invalid SQL. (There is one DBMS making an exception here, though: MySQL would just pick a continent arbitrarily rather than raising an error, provided a certain setting is in effect.)
6. SELECT DISTINCT: Of all result rows, you want duplicates removed, i.e. all rows showing the same continent and count.
Your error, as you can see, is in steps 4 and 5, where SELECT COUNT(name) without GROUP BY and SELECT (continent) don't match semantically.
GROUP BY and DISTINCT serve quite different purposes.
GROUP BY is used specifically to form groups and compute aggregates per group, while DISTINCT only removes duplicate rows from the result, nothing else.
SELECT continent, COUNT(name)
FROM world
WHERE population>200000000
GROUP BY continent
The first query has a GROUP BY on continent: after filtering with WHERE, it puts all rows that have the same continent into one group.
This query gives you one row per continent with its count.
SELECT DISTINCT continent,
COUNT(name)
FROM world
WHERE population>200000000
The second query applies DISTINCT and COUNT to the whole filtered table, not to groups. Where the DBMS accepts it at all, it returns a single continent, and the count is independent of any grouping: it covers every row that passed the population filter.
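The two behaviours can be compared side by side in SQLite, which happens to accept the second query much as MySQL does in its lenient mode; most other engines reject it outright. This is only a sketch with invented rows, run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Aland", "Asia", 1400000000),
        ("Bland", "Asia", 1300000000),
        ("Cland", "Europe", 250000000),
        ("Dland", "Europe", 5000000),  # filtered out by the WHERE clause
    ],
)

# With GROUP BY: one row per continent, each with its own count.
grouped = conn.execute(
    """
    SELECT continent, COUNT(name)
    FROM world
    WHERE population > 200000000
    GROUP BY continent
    ORDER BY continent
    """
).fetchall()

# Without GROUP BY: SQLite returns a single row. The count covers all
# filtered rows, and the continent is taken from an arbitrary row.
ungrouped = conn.execute(
    """
    SELECT DISTINCT continent, COUNT(name)
    FROM world
    WHERE population > 200000000
    """
).fetchall()

print(grouped)
print(ungrouped)
```

The single row from the second query carries a count of 3, which matches neither continent's own count.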

select rows where value is smaller than the value of a specific row

Hi I am new to SQL and have a question.
I want to select all cities from a table whose population is smaller than that of NYC (NYC is also in the table).
I can code like this
SELECT city FROM table WHERE population < (SELECT population FROM table WHERE name = 'NYC')
My question is whether I can write it more concisely like
where city.population < NYC.population
No, you cannot write it more concisely that way, because NYC is not defined.
You can, however, also write this as a join:
SELECT t.city
FROM table t JOIN
table NYC
ON t.population < NYC.population AND
NYC.name = 'NYC';
It is not really more concise. And the difference between the two is really a matter of preference for this particular problem.
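A quick way to convince yourself that the two forms agree is to run both on the same data. The sketch below uses Python's sqlite3 module with a hypothetical table called cities (table itself is a reserved word) and made-up population figures; both queries use a strict < comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "cities" and the figures below are placeholders for illustration only.
conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [
        ("NYC", 8000000),
        ("Smallville", 50000),
        ("Midtown", 900000),
        ("Megacity", 20000000),
    ],
)

# Scalar subquery: look up NYC's population once, then compare.
subquery_rows = conn.execute(
    """
    SELECT name
    FROM cities
    WHERE population < (SELECT population FROM cities WHERE name = 'NYC')
    ORDER BY name
    """
).fetchall()

# Self-join: the alias nyc is restricted to the single NYC row.
join_rows = conn.execute(
    """
    SELECT t.name
    FROM cities t
    JOIN cities nyc
      ON t.population < nyc.population
     AND nyc.name = 'NYC'
    ORDER BY t.name
    """
).fetchall()

print(subquery_rows)
print(join_rows)
```

One difference to keep in mind: if more than one row were named 'NYC', the join would return duplicate cities, while the scalar subquery would raise an error in many engines.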

Comparing Geographic datatypes in SQL Server

I am currently working on generating demographics for a database, and we have added a column of the geography data type to one of the tables. For the demographics I have to produce the max, min and avg of columns, among other things.
Using
select MIN(Location) FROM SpatialTable
didn't work, because the geography data type is not comparable.
So I used following query :
SELECT Location
FROM SpatialTable
WHERE Location.Lat IN (SELECT MIN(Location.Lat)
FROM SpatialTable
WHERE Location.Long IN (SELECT MIN(Location.Long)
FROM SpatialTable))
which selects the records with the minimum longitude and then, among those records, the one with the minimum latitude. But this can also be done the other way round, selecting the minimum latitude first and then the minimum longitude among those records, like this:
SELECT
Location
FROM
SpatialTable
WHERE
Location.Long IN
(SELECT MIN(Location.Long)
FROM SpatialTable
WHERE Location.Lat IN (SELECT MIN(Location.Lat) FROM SpatialTable))
which may produce a different result.
Is there a precise way to compare geographic data? I am using SQL Server 2008 R2, and my table has one Location column of geography type and an identity column.
To determine the minimum of a geography type, first you have to define what you mean by minimum. What is the minimum of a geography? It's like asking
what is the minimum of a dog?
How can one geography be less or more than another? Is London less than Paris*? Answer this, and you'll have your answer. At a guess, I'd say your answer may be the STDistance function.
*No, it's greater. Any fule knows that
Independent of the geography type, you can use the ROW_NUMBER() function to get the row with the minimum (or maximum) value according to custom criteria.
SELECT x.* FROM
(
SELECT *
, ROW_NUMBER() OVER (ORDER BY Location.Lat, Location.Long) RN
FROM SpatialTable
) x
WHERE x.RN = 1
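How much the chosen ordering matters can be seen in a small sketch. SQLite has no geography type, so this assumes the coordinates are stored in two plain columns, lat and lon, in a hypothetical table named spatial; the values are invented. It needs SQLite 3.25 or later for window functions and is run here through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain numeric columns stand in for Location.Lat and Location.Long.
conn.execute("CREATE TABLE spatial (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO spatial VALUES (?, ?, ?)",
    [
        (1, 51.5, -0.1),
        (2, 48.9, 2.4),
        (3, 48.9, 2.2),
    ],
)


def first_row(order_by):
    """Return the row ranked first under the given ORDER BY column list.

    order_by is spliced into the SQL text, so only pass trusted strings.
    """
    sql = f"""
        SELECT x.id, x.lat, x.lon
        FROM (
            SELECT *, ROW_NUMBER() OVER (ORDER BY {order_by}) AS rn
            FROM spatial
        ) x
        WHERE x.rn = 1
    """
    return conn.execute(sql).fetchone()


by_lat_then_lon = first_row("lat, lon")  # lowest latitude, ties by longitude
by_lon_then_lat = first_row("lon, lat")  # lowest longitude, ties by latitude

print(by_lat_then_lon)
print(by_lon_then_lat)
```

The two orderings pick different rows, which is the point made above: "minimum" only has a meaning once you have decided what to order by.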

Beginning SQL How do I sum and pick columns in the same line?

I just started learning SQL on w3schools.com today, and want to make sure I'm on the right track.
I'm trying to solve this problem:
Write a SQL statement finding the combined population of the U.S. and Mexico (in this database).
I can't post the table here because of lack of reputation, but it is very simple. You are given that Mexico's country ID is 2 and U.S. id is 5.
The table has 4 columns: CITY_ID, NAME, COUNTRY_ID, and POPULATION. You do not know how many rows there are. So basically I need to add up the POPULATION values of the rows that have a COUNTRY_ID of 2 or 5.
This is what I have so far:
-- this statement gives a result set with all the cities in Mexico and the U.S.
SELECT * FROM City
WHERE country_id='5'
OR country_id='2'
-- here, I don't know how to reference the result set
SELECT SUM(population) FROM result-set
The question also says to do it in one statement, is there a simpler way to do this? Thanks.
You put the SUM(population) expression into the query itself:
SELECT SUM(population) AS TotalPopulation
FROM City
WHERE country_id='5'
OR country_id='2'
Note that you can also write the x=a or x=b as x in (a,b), i.e.
WHERE country_id in (5,2)
(You can also drop the quotes if country_id is an integer, although it won't fail with them.)
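To check that the OR form and the IN form give the same total, here is a minimal sketch in Python's sqlite3 module. The column layout follows the question, but the rows are invented and country_id is assumed to be an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE City ("
    "city_id INTEGER, name TEXT, country_id INTEGER, population INTEGER)"
)
conn.executemany(
    "INSERT INTO City VALUES (?, ?, ?, ?)",
    [
        (1, "A", 2, 100),
        (2, "B", 5, 200),
        (3, "C", 5, 300),
        (4, "D", 7, 1000),  # other country, must not be counted
    ],
)

# Filtering and summing happen in one statement; no intermediate
# result set has to be named.
total_or = conn.execute(
    "SELECT SUM(population) AS TotalPopulation "
    "FROM City WHERE country_id = 5 OR country_id = 2"
).fetchone()[0]

total_in = conn.execute(
    "SELECT SUM(population) AS TotalPopulation "
    "FROM City WHERE country_id IN (5, 2)"
).fetchone()[0]

print(total_or, total_in)
```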
SELECT SUM(population)
FROM City
WHERE country_id='5'
OR country_id='2'
You're nearly there.
SELECT Sum(population) FROM City
WHERE country_id=2
OR country_id=5
I assume the country id is an integer, so quotation marks are not needed.