Confused syntax in Where clause - sql

what does the line (rowid,0) mean in the following query
select * from emp
WHERE (ROWID,0) in (
select rowid, mod(rownum,2) from emp
);
i dont get the line WHERE (ROWID,0).
what is it?
thanx in advance

IN clause in Oracle SQL can support column groups. You can do things like this:
select ...
from tab1
where (tab1.col1, tab1.col2) in (
select tab2.refcol1, tab2.refcol2
from tab2
)
That can be useful in many cases.
In your particular case, the subquery use for the second expression mod(rownum,2). Since there is no order by, that means that rownum will be in whichever order the database retrieves the rows - that might be a full table scan or a fast full index scan.
Then by using mod every other row in the subquery gets the value 0, every other row gets the value 1.
The IN clause then filters on second value in the subquery being equal to 0. The end result is that this query retrieves half of your employees. Which half will depend on which access path the optimizer chooses.

Not sure what dialect of sql you're using, but it appears that since the subquery in the IN clause has two columns in the select list, then the (ROWID,0) indicates which columns align with the subquery. I have never seen multiple columns in an IN statment's select list before.

This is a syntax used by some databases (but not all) that allows you to do in with multiple values.
With in, this is the same as:
where exists (select 1
from emp e2
where e2.rowid = emp.rowid and
mod(rownum, 2) = 0
)
I should note that if you are using Oracle (which allows this syntax), then you are using rownum in a subquery with no order by. The results are going to be rather arbitrary. However, the intention seems to be to return every other row, in some sense.

Related

Subquery returns more than one item - how can the result set be compared to one column?

SELECT
"employees"."FIRST_NAME",
"employees"."LAST_NAME",
"employees"."SALARY"
FROM
'employees'
WHERE
(("employees"."SALARY" > (SELECT "employees"."SALARY"
FROM 'employees'
WHERE (("employees"."FIRST_NAME" = "Alexander")))))`
The subquery returns 2 values. How can they be compared with "employees"."salary" ? That is, there are 2 employees with first name "Alexander"... Replacing the subquery with a 2 element tuple gives a query that is not accepted from the SQL client... That is this query should be equivalent to the one above but it does not execute correctly:
SELECT
"employees"."FIRST_NAME", "employees"."LAST_NAME",
"employees"."SALARY"
FROM
'employees'
WHERE
(("employees"."SALARY" > (3500, 9000)))
What is going on?
From SQL Language Expressions/11. Subquery Expressions:
A SELECT statement enclosed in parentheses is a subquery. All types of
SELECT statement, including aggregate and compound SELECT queries
(queries with keywords like UNION or EXCEPT) are allowed as scalar
subqueries. The value of a subquery expression is the first row of
the result from the enclosed SELECT statement. The value of a
subquery expression is NULL if the enclosed SELECT statement returns
no rows.
Your query would not run in any other database than SQLite.
But, SQLite as you can see from the documentation, instead of throwing an error like the subquery returns more than 1 rows, allows the subquery by keeping only the 1st of the rows that it returns.
This is one of SQLite's non-standard sql features and in your case it leads to wrong results.
What you would want, I believe, is to compare each employee's salary to the max salary of all employees named 'Alexander'.
You can do this by changing the subquery to:
SELECT MAX(SALARY) FROM employees WHERE FIRST_NAME = 'Alexander'
This is a not correlated scalar subquery, so there is no need for any aliases.
Note assuming that the SQLite tag is correct , i.e. there are many flavours of SQL and that the database manager being is used is therefore important and relevant.
the subquery returns 2 values. How can they be compared with "employees"."salary" ?
You compare multiple values using a function that can take multiple values such as max, which could be what you require.
e.g.
SELECT "employees"."FIRST_NAME", "employees"."LAST_NAME", "employees"."SALARY" FROM 'employees' WHERE (("employees"."SALARY" > max(3500, 9000)))
What is going on?
The first is using a WHERE clause that is a valid expression that is either true or false. The second is misusing values i.e. a list of values is provided where a single value is expected.
First, writing your queries, you should not have to "quote" every part, it gets cluttered and bloated. Also, you can use aliases to help readability. you'll see soon. If you use quotes, use the single quotes around specific values such as a date like > '2022-02-22'.
Now on to your query. Your query is looking for salaries greater than a given person (Alexander), but there are multiple people by that name. To get ONE answer, you might need the MAX() salary for the critiera. So this essentially becomes TWO queries... one relying upon the other.
So, to get you an answer, the outer query is what you will get as the results, the WHERE query is pre-qualifying that one salary you are interested in.
Select
e.first_name,
e.last_name,
e.salary
from
employees e
where
e.salary > ( select max( e2.salary )
from employees e2
where e2.first_name = 'Alexander' )
Notice the where clause is getting whatever the MAX() salary value is from the employee table for the employee 'Alexander'. So now, that ONE value comes back as the basis for the outer query.
Notice the indentation, you can better see how the outer query is reliant on that. Also makes for more readable queries.

Get min/max value by key - different approaches

I have a table with two columns namely ID and KEY (let key here be an integer) such as
ID KEY
ABC 6
DEF 1
GHI 12
TASK: Get the ID of the MAX key
Solution 1:
Select Top(1) ID
from TABLE
order by KEY desc
Solution 2:
Select ID
from TABLE
where ID = MAX(ID)
EDIT: The query was invalid. This is what I meant:
Select ID
from TABLE
where KEY = (select max(KEY) from TABLE)
Is one of these solutions categorically better than the other? What are the advantages/disvantages of each solution.
EDIT:
Assume there is no index.
Case 1 - large table
Case 2 - small table
Background:
I am doing code review and I have found both solutions multiple times in different context - sometimes with indices, sometimes without, sometimes for large tables, sometimes for small.
The two queries are different (after your edits fixing the second one).
The first necessarily returns a single row.
The second returns all matching rows.
The first returns a row even when key is NULL.
The second does not.
You should use the logic that does what you want.
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list..
Solution 1 will be the best. A subquery in a where clause will be less optimal.
There really are lots of design techniques to look at for performance which I am not going to go into with this answer. I found this article yesterday which gave me more perspective https://www.red-gate.com/simple-talk/sql/database-administration/sql-server-storage-internals-101/
In Solution 1, the order by clause will just sort your query result.
Query execution order:
FROM clause ON clause OUTER clause WHERE clause GROUP BY clause HAVING clause SELECT clause DISTINCT clause ORDER BY clause TOP clause
You can use the following query:
Select ID,
RANK() OVER (ORDER BY KEY DESC) AS KeyRank
from table1
HAVING keyRank = 1
Solution 1 will work but Solution 2 will throw exception like bellow
Msg 147, Level 15, State 1, Line 22 An aggregate may not appear in the
WHERE clause unless it is in a subquery contained in a HAVING clause
or a select list, and the column being aggregated is an outer
reference.
You can go with query 1 ,
You cannot use query 2 because you cannot use aggregate function like that if you want to use where clause and aggregate function in your query you have to go with as below :
Select id from table where key in (select max(key) from test);
reference only using aggregate function and having clause
Select ID ,max(key)
from test
group by ID,key
having (key) >= 12
order by 1

Why are aggregate functions not allowed in where clause

I am looking for clarification on this. I am writing two queries below:
We have a table of employee name with columns ID , name , salary
1. Select name from employee
where sum(salary) > 1000 ;
2. Select name from employee
where substring_index(name,' ',1) = 'nishant' ;
Query 1 doesn't work but Query 2 does work. From my development experience, I feel the possible explanation to this is:
The sum() works on a set of values specified in the argument. Here
'salary' column is passed , so it must add up all the values of this
column. But inside where clause, the records are checked one by one ,
like first record 1 is checked for the test and so on. Thus
sum(salary) will not be computed as it needs access to all the column
values and then only it will return a value.
Query 2 works as substring_index() works on a single value and hence here it works on the value supplied to it.
Can you please validate my understanding.
The reason you can't use SUM() in the WHERE clause is the order of evaluation of clauses.
FROM tells you where to read rows from. Right as rows are read from disk to memory, they are checked for the WHERE conditions. (Actually in many cases rows that fail the WHERE clause will not even be read from disk. "Conditions" are formally known as predicates and some predicates are used - by the query execution engine - to decide which rows are read from the base tables. These are called access predicates.) As you can see, the WHERE clause is applied to each row as it is presented to the engine.
On the other hand, aggregation is done only after all rows (that verify all the predicates) have been read.
Think about this: SUM() applies ONLY to the rows that satisfy the WHERE conditions. If you put SUM() in the WHERE clause, you are asking for circular logic. Does a new row pass the WHERE clause? How would I know? If it will pass, then I must include it in the SUM, but if not, it should not be included in the SUM. So how do I even evaluate the SUM condition?
Why can't we use aggregate function in where clause
Aggregate functions work on sets of data. A WHERE clause doesn't have access to entire set, but only to the row that it is currently working on.
You can of course use HAVING clause:
select name from employee
group by name having sum(salary) > 1000;
If you must use WHERE, you can use a subquery:
select name from (
select name, sum(salary) total_salary from employee
group by name
) t where total_salary > 1000;
sum() is an aggregation function. In general, you would expect it to work with group by. Hence, your first query is missing a group by. In a group by query, having is used for filtering after the aggregation:
Select name
from employee
group by name
having sum(salary) > 1000 ;
Using having works since the query goes direct to the rows in that column while where fails since the query keep looping back and forth whenever conditions is not met.

Clarification on "rownum"

I have a table Table1
Name Date
A 01-jun-2010
B 03-dec-2010
C 12-may-2010
When i query this table with the following query
select * from table1 where rownum=1
i got output as
Name Date
A 01-jun-2010
But in the same way when i use the following queries i do not get any output.
select * from table1 where rownum=2
select * from table1 where rownum=3
Someone please give me guidance why it works like that, and how to use the rownum.
Tom has an answer for many Oracle related questions
In short, rownum is available after the where clause has been applied and before the order by clause is applied.
In the case of RowNum=2, the predicate in the where clause will never evaluate to true as RowNum starts at 1 and only increases if records matching the predicate can be found.
Adding rownums is one of the last things done after the result set has been fetched from the database. This means that the first row will always have rownum 1. Rownum is better used when you want to limit the result set, for instance when doing paging.
See this for more: http://www.orafaq.com/wiki/ROWNUM
(Not an Oracle expert by any means)
From what I understand, rownum numbers the rows in a result set.
So, in your example:
select * from table1 where rownum=2
How many rows are there going to be in the result set? Therefore, what rownum would be assigned to such a row? Can you see now why no result is actually returned?
In general, you should avoid relying on rownum, or any features that imply an order to results. Try to think about working with the entire set of results.
With that being said, I believe the following would work:
select * from (select rownum as rn,table1.* from table1) as t where t.rn = 2
Because in that case, you're numbering the rows within the subquery.

What is the difference between HAVING and WHERE in SQL?

What is the difference between HAVING and WHERE in an SQL SELECT statement?
EDIT: I have marked Steven's answer as the correct one as it contained the key bit of information on the link:
When GROUP BY is not used, HAVING behaves like a WHERE clause
The situation I had seen the WHERE in did not have GROUP BY and is where my confusion started. Of course, until you know this you can't specify it in the question.
HAVING: is used to check conditions after the aggregation takes place.
WHERE: is used to check conditions before the aggregation takes place.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Gives you a table of all cities in MA and the number of addresses in each city.
This code:
select City, CNT=Count(1)
From Address
Where State = 'MA'
Group By City
Having Count(1)>5
Gives you a table of cities in MA with more than 5 addresses and the number of addresses in each city.
HAVING specifies a search condition for a
group or an aggregate function used in SELECT statement.
Source
Number one difference for me: if HAVING was removed from the SQL language then life would go on more or less as before. Certainly, a minority queries would need to be rewritten using a derived table, CTE, etc but they would arguably be easier to understand and maintain as a result. Maybe vendors' optimizer code would need to be rewritten to account for this, again an opportunity for improvement within the industry.
Now consider for a moment removing WHERE from the language. This time the majority of queries in existence would need to be rewritten without an obvious alternative construct. Coders would have to get creative e.g. inner join to a table known to contain exactly one row (e.g. DUAL in Oracle) using the ON clause to simulate the prior WHERE clause. Such constructions would be contrived; it would be obvious there was something was missing from the language and the situation would be worse as a result.
TL;DR we could lose HAVING tomorrow and things would be no worse, possibly better, but the same cannot be said of WHERE.
From the answers here, it seems that many folk don't realize that a HAVING clause may be used without a GROUP BY clause. In this case, the HAVING clause is applied to the entire table expression and requires that only constants appear in the SELECT clause. Typically the HAVING clause will involve aggregates.
This is more useful than it sounds. For example, consider this query to test whether the name column is unique for all values in T:
SELECT 1 AS result
FROM T
HAVING COUNT( DISTINCT name ) = COUNT( name );
There are only two possible results: if the HAVING clause is true then the result with be a single row containing the value 1, otherwise the result will be the empty set.
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions.
Check out this w3schools link for more information
Syntax:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value
A query such as this:
SELECT column_name, COUNT( column_name ) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
HAVING COUNT( column_name ) >= 3;
...may be rewritten using a derived table (and omitting the HAVING) like this:
SELECT column_name, column_name_tally
FROM (
SELECT column_name, COUNT(column_name) AS column_name_tally
FROM table_name
WHERE column_name < 3
GROUP
BY column_name
) pointless_range_variable_required_here
WHERE column_name_tally >= 3;
The difference between the two is in the relationship to the GROUP BY clause:
WHERE comes before GROUP BY; SQL evaluates the WHERE clause before it groups records.
HAVING comes after GROUP BY; SQL evaluates HAVING after it groups records.
References
SQLite SELECT Statement Syntax/Railroad Diagram
Informix SELECT Statement Syntax/Railroad Diagram
HAVING is used when you are using an aggregate such as GROUP BY.
SELECT edc_country, COUNT(*)
FROM Ed_Centers
GROUP BY edc_country
HAVING COUNT(*) > 1
ORDER BY edc_country;
WHERE is applied as a limitation on the set returned by SQL; it uses SQL's built-in set oeprations and indexes and therefore is the fastest way to filter result sets. Always use WHERE whenever possible.
HAVING is necessary for some aggregate filters. It filters the query AFTER sql has retrieved, assembled, and sorted the results. Therefore, it is much slower than WHERE and should be avoided except in those situations that require it.
SQL Server will let you get away with using HAVING even when WHERE would be much faster. Don't do it.
WHERE clause does not work for aggregate functions
means : you should not use like this
bonus : table name
SELECT name
FROM bonus
GROUP BY name
WHERE sum(salary) > 200
HERE Instead of using WHERE clause you have to use HAVING..
without using GROUP BY clause, HAVING clause just works as WHERE clause
SELECT name
FROM bonus
GROUP BY name
HAVING sum(salary) > 200
Difference b/w WHERE and HAVING clause:
The main difference between WHERE and HAVING clause is, WHERE is used for row operations and HAVING is used for column operations.
Why we need HAVING clause?
As we know, aggregate functions can only be performed on columns, so we can not use aggregate functions in WHERE clause. Therefore, we use aggregate functions in HAVING clause.
One way to think of it is that the having clause is an additional filter to the where clause.
A WHERE clause is used filters records from a result. The filter occurs before any groupings are made. A HAVING clause is used to filter values from a group
In an Aggregate query, (Any query Where an aggregate function is used) Predicates in a where clause are evaluated before the aggregated intermediate result set is generated,
Predicates in a Having clause are applied to the aggregate result set AFTER it has been generated. That's why predicate conditions on aggregate values must be placed in Having clause, not in the Where clause, and why you can use aliases defined in the Select clause in a Having Clause, but not in a Where Clause.
I had a problem and found out another difference between WHERE and HAVING. It does not act the same way on indexed columns.
WHERE my_indexed_row = 123 will show rows and automatically perform a "ORDER ASC" on other indexed rows.
HAVING my_indexed_row = 123 shows everything from the oldest "inserted" row to the newest one, no ordering.
When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.
However, when GROUP BY is used:
The WHERE clause is used to filter records from a result. The
filtering occurs before any groupings are made.
The HAVING clause is used to filter values from a group (i.e., to
check conditions after aggregation into groups has been performed).
Resource from Here
From here.
the SQL standard requires that HAVING
must reference only columns in the
GROUP BY clause or columns used in
aggregate functions
as opposed to the WHERE clause which is applied to database rows
While working on a project, this was also my question. As stated above, the HAVING checks the condition on the query result already found. But WHERE is for checking condition while query runs.
Let me give an example to illustrate this. Suppose you have a database table like this.
usertable{ int userid, date datefield, int dailyincome }
Suppose, the following rows are in table:
1, 2011-05-20, 100
1, 2011-05-21, 50
1, 2011-05-30, 10
2, 2011-05-30, 10
2, 2011-05-20, 20
Now, we want to get the userids and sum(dailyincome) whose sum(dailyincome)>100
If we write:
SELECT userid, sum(dailyincome) FROM usertable WHERE
sum(dailyincome)>100 GROUP BY userid
This will be an error. The correct query would be:
SELECT userid, sum(dailyincome) FROM usertable GROUP BY userid HAVING
sum(dailyincome)>100
WHERE clause is used for comparing values in the base table, whereas the HAVING clause can be used for filtering the results of aggregate functions in the result set of the query
Click here!
When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.
However, when GROUP BY is used:
The WHERE clause is used to filter records from a result. The
filtering occurs before any groupings are made.
The HAVING clause is
used to filter values from a group (i.e., to check conditions after
aggregation into groups has been performed).
I use HAVING for constraining a query based on the results of an aggregate function. E.G. select * in blahblahblah group by SOMETHING having count(SOMETHING)>0