I don't understand the need for self-joins. Can someone please explain them to me?

SELECT region, name, population
FROM bbc x
WHERE population <= ALL (SELECT population FROM bbc y WHERE y.region=x.region AND population>0)
I don't understand the logic of using x and y for the same table.

x and y are treated as two different instances of the table bbc. To list a table twice in the same query, you must provide a table alias for at least one instance of the table name. The alias tells the query processor which instance of the table each column reference belongs to.

This query returns the country with the smallest population in each region. To do this without a self-join, you would need to run two queries for each region:
1.
set #min = select min(population) from bbc where population > 0 and region = #region
2.
select region, name, population from bbc where population = #min and region = #region
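To see the two aliases at work, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and because SQLite does not support the ALL quantifier, the sketch uses an equivalent MIN subquery; for this sample data it returns the same rows as the <= ALL version.

```python
import sqlite3

# Hypothetical miniature of the bbc table; names and numbers are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?)",
    [
        ("A", "North", 100),
        ("B", "North", 40),
        ("C", "South", 70),
        ("E", "South", 90),
    ],
)

# x is the outer copy of bbc and y is the inner copy. The subquery is
# evaluated per row of x and only looks at rows of y in the same region.
smallest = conn.execute(
    """
    SELECT region, name, population
    FROM bbc x
    WHERE population = (SELECT MIN(population)
                        FROM bbc y
                        WHERE y.region = x.region AND y.population > 0)
    ORDER BY region
    """
).fetchall()
print(smallest)
```

Without the aliases, y.region = x.region could not be written at all, since both sides would just say region.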

Related

Why does this code work with a less than sign when the question requests for a greater than sign (nested selects from SQL zoo)?

(updated) I'm trying to learn SQL from the following website:
https://sqlzoo.net/wiki/Nested_SELECT_Quiz
In this quiz, the second question asks to find countries belonging to regions with all populations over 50000 -
It says that this is the correct answer:
SELECT name,region,population FROM bbc x WHERE 50000 < ALL (SELECT population FROM bbc y WHERE x.region=y.region AND y.population>0)
That's the answer it gives me. Can anyone explain in plain English why this works? If we're looking for population over 50,000 why is the code using a less than sign? And how do nested selects work in general then?
Order matters. 50000 < population is the same as population > 50000.
Why write it this funny way? Because you have to.
Specifically, all is a quantified comparison predicate, and it must be of the form <value> <operator> all(<subquery>), hence 50000 < all(subquery). I can't say why it cannot be reversed; possibly it makes parsing this special case easier.
And how do nested selects work in general then?
all is true if every row of the subquery meets the condition. 50000 < all(subquery) means that 50,000 is less than every row in the subquery (or every row in the subquery is over 50000).
SELECT name,region,population
FROM bbc x
WHERE 50000 < ALL (
SELECT population
FROM bbc y
WHERE x.region=y.region AND y.population>0
)
The subquery runs once for each row in bbc. x is the bbc table in the original query and y is the bbc table in the subquery. where x.region=y.region filters the subquery results to only rows in the same region as the original row.
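As a rough illustration, "50000 < ALL (subquery)" is true exactly when the subquery contains no row at or below 50000, so it can be rewritten with NOT EXISTS. The sketch below uses Python's sqlite3 module with invented rows; SQLite does not support ALL, which is why the NOT EXISTS form is used here.

```python
import sqlite3

# Invented sample rows; only the shape of the bbc table follows the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?)",
    [
        ("P", "R1", 60000),
        ("Q", "R1", 80000),
        ("R", "R2", 70000),
        ("S", "R2", 30000),  # pulls all of R2 out of the result
    ],
)

# Keep a row of x only if its region has no (positive) population <= 50000,
# i.e. 50000 is less than every population in that region.
big_regions = conn.execute(
    """
    SELECT name, region, population
    FROM bbc x
    WHERE NOT EXISTS (SELECT 1
                      FROM bbc y
                      WHERE y.region = x.region
                        AND y.population > 0
                        AND y.population <= 50000)
    ORDER BY name
    """
).fetchall()
print(big_regions)
```

Note that R is dropped even though its own population is above 50000, because the condition is about every country in its region.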

HANA concat rows

I use an SAP HANA database. I have a simple four-column table whose columns are number, name, noodles, fish. The rows are these:
number name noodles fish
1 tom x
1 tom x
1 jack
2 jack x
I would like to group the rows by the id, and concatenate the names into a field, and thus obtain this:
number name noodles fish
1 tom x x
2 jack x
Can you please tell me how we can perform this operation in sap-hana? Thanks in advance.
Well, you did not really concatenate the names, but instead kept the same ones (if you had concatenated the names as well, you would get something like jackjack in your result). I guess your x's indicate some sort of ABAP-style flags.
In any case, you would do this with grouping. This is a completely non-HANA thing (you can use the same basic SQL for any DB). You can group against several columns. All other columns that you want to select must be used in an aggregated expression (e.g. a SUM, MAX, COUNT, etc.).
To get the output from your question, I wrote the following code:
SELECT "ID", "NAME", MAX("FISH"), MAX("NOODLES")
FROM #TEST GROUP BY "ID", "NAME";
And got the same output as you. I used the MAX function based on the following assumption: you would want to get X if there is any X in the "concatenated" (aggregated) rows in that column. You get nothing / space if all the "concatenated" rows have space in them.
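The MAX trick can be tried out on any SQL engine. Here is a small sketch using Python's sqlite3 module rather than HANA; the table name test, the column name id, and the use of empty strings for unset flags are assumptions made for the example.

```python
import sqlite3

# Assumed layout: one row per flag, with '' where the flag is not set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, name TEXT, noodles TEXT, fish TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?, ?)",
    [
        (1, "tom", "x", ""),
        (1, "tom", "", "x"),
        (2, "jack", "x", ""),
    ],
)

# 'x' sorts after '', so MAX returns 'x' if any row in the group has it.
merged = conn.execute(
    """
    SELECT id, name, MAX(noodles) AS noodles, MAX(fish) AS fish
    FROM test
    GROUP BY id, name
    ORDER BY id
    """
).fetchall()
print(merged)
```

If the unset flags were stored as NULL instead of an empty string, MAX would still work, since aggregate functions skip NULLs.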

SQL Query to Add Values from Column X for Every Entry That Has Y

I need to write a query that is going to calculate the sum of one column depending on the values of another. Basically I need to get the sum of a certain drug administered for each patient in one of my DB's tables. My table has an account number column (x), drug ID column (y) and an amount administered column (z). The thing is there can be multiple rows for each account number, so what I need to do is pull the total amount of that drug administered for each patient account number. So in essence I need a query that will return the sum of z for every x, with a where clause at the end using column y. I hope I am explaining this clearly because thinking about it confuses me! Any help would be appreciated. Thanks guys!
This is a simple GROUP BY query, I'm not sure what's confusing you.
SELECT x, SUM(z) total_z
FROM table
WHERE y = 123
GROUP BY x
Use GROUP BY:
SELECT x, y, sum(z)
FROM t
GROUP by x, y
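Both variants can be compared side by side with a quick sketch in Python's sqlite3 module. The table name t matches the second answer; the account numbers, drug IDs and amounts are invented.

```python
import sqlite3

# x = account number, y = drug ID, z = amount administered (sample values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER, z REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1001, 123, 2.0),
        (1001, 123, 3.0),
        (1001, 456, 1.0),
        (1002, 123, 4.5),
    ],
)

# Total of one drug per account: filter first, then group.
one_drug = conn.execute(
    "SELECT x, SUM(z) AS total_z FROM t WHERE y = 123 GROUP BY x ORDER BY x"
).fetchall()

# Total of every drug per account: group by both columns.
all_drugs = conn.execute(
    "SELECT x, y, SUM(z) FROM t GROUP BY x, y ORDER BY x, y"
).fetchall()

print(one_drug)
print(all_drugs)
```

The first form answers the question as asked; the second is handy when totals for all drugs are needed in one pass.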

SQL: Redundant WHERE clause specifying column is > 0?

Help me understand this: In the sqlzoo tutorial for question 3a ("Find the largest country in each region"), why does attaching 'AND population > 0' to the nested SELECT statement make this correct?
The reason is because the:
AND population > 0
...is filtering out the null row for the region "Europe", name "Vatican", which complicates the:
WHERE population >= ALL (SELECT population
FROM ...)
...because NULL isn't a value, so Russia won't be ranked properly. The ALL operator requires the value you are comparing to be greater than or equal to ALL the values returned from the subquery, which can never be true when there's a NULL in there: any comparison with NULL evaluates to unknown rather than true.
My query would've been either:
SELECT region, name, population
FROM bbc x
WHERE population = (SELECT MAX(population)
FROM bbc y
WHERE y.region = x.region)
...or, using a JOIN:
SELECT x.region, x.name, x.population
FROM bbc x
JOIN (SELECT y.region,
MAX(y.population) AS max_pop
FROM bbc y
GROUP BY y.region) z ON z.region = x.region
AND z.max_pop = x.population
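The NULL behaviour is easy to check. The following sketch uses Python's sqlite3 module with made-up population figures (only the Vatican row's NULL mirrors the tutorial data) and shows that a comparison with NULL yields NULL, while MAX simply skips it.

```python
import sqlite3

# Invented figures; the NULL for Vatican is the case discussed above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bbc (name TEXT, region TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO bbc VALUES (?, ?, ?)",
    [
        ("Russia", "Europe", 146),
        ("France", "Europe", 60),
        ("Vatican", "Europe", None),
        ("China", "Asia", 1300),
        ("India", "Asia", 1000),
    ],
)

# Comparing anything with NULL gives NULL (None in Python), not true.
null_compare = conn.execute("SELECT 5 >= NULL").fetchone()[0]

# MAX ignores NULLs, so no extra "population > 0" filter is needed here.
largest = conn.execute(
    """
    SELECT region, name, population
    FROM bbc x
    WHERE population = (SELECT MAX(population)
                        FROM bbc y
                        WHERE y.region = x.region)
    ORDER BY region
    """
).fetchall()
print(null_compare, largest)
```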
No it doesn't. Largest country has a priori nonzero population.
It's like checking if a largest book has any pages in it.

Retrieve names by ratio of their occurrence

I'm somewhat new to SQL queries, and I'm struggling with this particular problem.
Let's say I have query that returns the following 3 records (kept to one column for simplicity):
Tom
Jack
Tom
And I want to have those results grouped by the name and also include the fraction (ratio) of the occurrence of that name out of the total records returned.
So, the desired result would be (as two columns):
Tom | 2/3
Jack | 1/3
How would I go about it? Determining the numerator is pretty easy (I can just use COUNT() and GROUP BY name), but I'm having trouble translating that into a ratio out of the total rows returned.
SELECT name, COUNT(name) * 1.0 / (SELECT COUNT(*) FROM names) AS ratio FROM names GROUP BY name;
Since the denominator is fixed, the "ratio" is directly proportional to the numerator. Unless you really need to show the denominator, it'll be a lot easier to just use something like:
select name, count(*) from your_table_name
group by name
order by count(*) desc
and you'll get the right data in the right order, but the number that's shown will be the count instead of the ratio.
If you really want that denominator, you'd do a count(*) on a non-grouped version of the same select -- but depending on how long the select takes, that could be pretty slow.
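For reference, here is a small sketch of the ratio query using Python's sqlite3 module, with the three sample names from the question in an assumed table called names. Multiplying by 1.0 matters on engines such as SQLite, where dividing two integers truncates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany("INSERT INTO names VALUES (?)", [("Tom",), ("Jack",), ("Tom",)])

# Integer division truncates in SQLite: 2 / 3 is 0, not 0.666...
truncated = conn.execute("SELECT 2 / 3").fetchone()[0]

# The scalar subquery supplies the total row count as the denominator.
ratios = conn.execute(
    """
    SELECT name, COUNT(*) * 1.0 / (SELECT COUNT(*) FROM names) AS ratio
    FROM names
    GROUP BY name
    ORDER BY ratio DESC
    """
).fetchall()
print(truncated, ratios)
```

The result is a decimal fraction per name; formatting it as text like 2/3 would have to be done separately.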