Here is the query:
SELECT * FROM customers
WHERE
NOT EXISTS
(
SELECT 1 FROM brochure_requests
WHERE brochure_requests.first_name = customers.customer_first_name AND
brochure_requests.last_name = customers.customer_last_name
)
This query works just fine, but I am not sure why it works. In the NOT EXISTS part, what is the 1 in SELECT 1 for? When I ran this query:
select 1 from test2
Here were the results:
1
-----
1
1
1
1
1
1
1
1
1
1
1
..
How does the not exists query work?
The optimizer is smart enough to ignore the select list of an EXISTS subquery. Basically, all it cares about is whether the subquery WOULD return rows because the filters match; the select list of the EXISTS subquery is never evaluated. The subquery is used only to test for the presence of rows.
I had this misconception for quite some time, since you will see this SELECT 1 a lot. But I have also seen 42, *, etc. It never actually cares about the result, only that there would be one :). The key thing to keep in mind is that SQL is a declarative language: the engine compiles the query into an execution plan, so it is free to optimize the select list away.
You could put a 1/0 there and, in SQL Server for example, it will not throw a divide-by-zero exception, which further shows that the select list is not evaluated. This is shown in this SQLFiddle.
Code from Fiddle:
CREATE TABLE test (i int);
CREATE TABLE test2 (i int);
INSERT INTO test VALUES (1);
INSERT INTO test2 VALUES (1);
SELECT i
FROM test
WHERE EXISTS
(
SELECT 1/0
FROM test2
WHERE test2.i = test.i
)
And finally, more to your point, the NOT simply negates the EXISTS: the outer query skips any rows for which the subquery finds a match.
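To make the "select list is ignored" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is a stand-in here, not the engine from the fiddle, and it evaluates 1/0 to NULL instead of raising an error, so the divide-by-zero trick cannot be reproduced with it. What it can show is that swapping the select list inside EXISTS does not change which outer rows come back. The extra row (2) in test is an assumption added so that there is something to filter out.

```python
import sqlite3

# In-memory SQLite database; table names mirror the fiddle above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test  (i INTEGER);
    CREATE TABLE test2 (i INTEGER);
    INSERT INTO test  VALUES (1), (2);  -- row 2 has no partner in test2
    INSERT INTO test2 VALUES (1);
""")

# Try several different select lists inside the same EXISTS subquery.
select_lists = ["1", "42", "NULL", "*"]
results = {}
for expr in select_lists:
    sql = f"""
        SELECT i FROM test
        WHERE EXISTS (SELECT {expr} FROM test2 WHERE test2.i = test.i)
    """
    results[expr] = [row[0] for row in conn.execute(sql)]

for expr, rows in results.items():
    print(f"EXISTS (SELECT {expr} ...) -> {rows}")
```

Every variant returns the same outer rows, including the one that selects NULL: only the presence of a subquery row matters, not its value.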
The subquery is a correlated subquery joining between the customers and brochure_requests tables on the selected fields.
The EXISTS clause is simply a predicate that is true for an outer row when the subquery returns at least one row for it (and the NOT negates that).
The query :
select 1 from test2
returns the value 1 once for every record in the test2 table.
Every SELECT query must have at least one column. I think that's why an unnamed column, which has the value 1, is used here.
The sub-query gives you the rows of the related Customers from the table brochure_requests.
NOT EXISTS causes the main query to return all the rows from the Customers table that have no matching row in the table brochure_requests.
The relational operator in question is known as 'antijoin' (alternatively 'not match' or 'semi difference'). In natural language: customers who do not match brochure_requests using the common attributes first_name and last_name.
A closely related operator is relational difference (alternatively 'minus' or 'except') e.g. in SQL
SELECT customer_last_name, customer_first_name
FROM customers
EXCEPT
SELECT last_name, first_name
FROM brochure_requests;
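The two operators are close but not interchangeable, and a small sketch may help show where they differ. This uses Python's sqlite3 module as a stand-in engine; the customer_id column and the sample names are assumptions added for illustration. NOT EXISTS keeps every non-matching customer row, while EXCEPT works on the projected name columns only and removes duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- customer_id is a hypothetical extra column, not part of the original question
    CREATE TABLE customers (customer_id INTEGER,
                            customer_first_name TEXT,
                            customer_last_name TEXT);
    CREATE TABLE brochure_requests (first_name TEXT, last_name TEXT);
    INSERT INTO customers VALUES
        (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Bob', 'Ray'), (4, 'Cy', 'Fox');
    INSERT INTO brochure_requests VALUES ('Ann', 'Lee');
""")

# Antijoin: one result row per non-matching customer row.
antijoin = conn.execute("""
    SELECT customer_id FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM brochure_requests b
                      WHERE b.first_name = c.customer_first_name
                        AND b.last_name  = c.customer_last_name)
    ORDER BY customer_id
""").fetchall()

# Relational difference: set semantics on the selected columns only.
difference = sorted(conn.execute("""
    SELECT customer_last_name, customer_first_name FROM customers
    EXCEPT
    SELECT last_name, first_name FROM brochure_requests
""").fetchall())

print(antijoin)    # three customer rows, both "Bob Ray" customers included
print(difference)  # two distinct names, "Bob Ray" appears once
```

So if two customers share a name, the antijoin returns both of them, while EXCEPT collapses them into a single (last_name, first_name) pair.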
If a customer requested a brochure, the subquery returns 1 for this customer, and because of the NOT EXISTS clause this customer is not added to the result set.
Note: I don't know Oracle, and am not especially expert in SQL.
However, SELECT 1 FROM simply returns a 1 for every row matching the FROM clause. So if the inner select can find a brochure_requests row whose name fields match those of the customer row currently being considered, it will produce a 1 result and fail the NOT EXISTS.
Hence the query selects all customers who do not have a brochure_request matching their name.
For each row of the table Customers, the query returns the row when the sub-query in NOT EXISTS returns no row.
If the sub-query in NOT EXISTS returns rows, then that row of the table Customers is not returned.
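The row-by-row description above can be checked with a small sketch. This is Python with the built-in sqlite3 module (a stand-in engine, with made-up sample names): it runs the NOT EXISTS query once, then repeats the same logic by hand, probing brochure_requests separately for each customer row. A real engine is free to pick a different execution strategy; the loop only mirrors the logical meaning.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_first_name TEXT, customer_last_name TEXT);
    CREATE TABLE brochure_requests (first_name TEXT, last_name TEXT);
    INSERT INTO customers VALUES ('Ann', 'Lee'), ('Bob', 'Ray'), ('Cy', 'Fox');
    INSERT INTO brochure_requests VALUES ('Ann', 'Lee'), ('Ann', 'Lee'), ('Cy', 'Fox');
""")

# The set-based query from the question.
sql_rows = conn.execute("""
    SELECT customer_first_name, customer_last_name FROM customers
    WHERE NOT EXISTS (
        SELECT 1 FROM brochure_requests
        WHERE brochure_requests.first_name = customers.customer_first_name
          AND brochure_requests.last_name  = customers.customer_last_name)
""").fetchall()

# The same logic spelled out: one probe of brochure_requests per customer row.
manual_rows = []
all_customers = conn.execute(
    "SELECT customer_first_name, customer_last_name FROM customers").fetchall()
for first, last in all_customers:
    probe = conn.execute(
        "SELECT 1 FROM brochure_requests "
        "WHERE first_name = ? AND last_name = ? LIMIT 1",
        (first, last)).fetchone()
    if probe is None:  # the subquery returned no row, so NOT EXISTS is true
        manual_rows.append((first, last))

print(sql_rows)
print(manual_rows)
```

Both lists contain only the customer with no brochure request. The duplicate request for Ann Lee makes no difference, because the probe only asks whether any row exists.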
Select * from FMN_XX.order odr where
exists(
select (1) from FMN_XX.order_expired exp
where odr.order_id = exp.order_id
);
Above is the example query for exists. I have tried looking around and reading about it but I just can't get my head wrapped around it.
When I run the query inside the EXISTS brackets on its own, it returns 1 as expected and no order_id from order_expired, since I didn't select that column there.
But when I run the whole query, it returns the correct number of rows! My question is, how does it know the order_ID from order_expired table when I don't even query for order_id from the order_expired table? How does it compare to get the right rows?
Extra note: Currently, the order table has 19779 rows and the order_expired table has 8506 rows. When I add a count at the outer query layer, the final result is 8506 rows, meaning that the EXISTS condition has somehow filtered the rows. If it just returns true when at least one order_id is hit, shouldn't the whole query return all 19779 rows?
how does it know the order_ID from order_expired table when I don't even query for order_id from the order_expired table? How does it compare to get the right rows?
The condition in the WHERE clause of the EXISTS subselect provides this information:
where odr.order_id = exp.order_id
Here odr.order_id is the column from the main SELECT, whereas exp.order_id is the column from the EXISTS subselect.
If the condition is TRUE for at least one row of order_expired, the order row appears in the result set.
https://en.wikipedia.org/wiki/Correlated_subquery
EXISTS is similar to a join: you limit your output based on values in another table (or even the same table with a different condition).
The difference in usability is that EXISTS does not care about duplicate values; it only checks whether any rows exist that satisfy your condition.
In other words, if order_id were unique in your table order_expired, then you should get the same result from your query as from this query:
Select odr.* from FMN_XX.order odr
join FMN_XX.order_expired exp on odr.order_id = exp.order_id;
However, if it is not unique, the join would still restrict your results, but it would also return an order once for every matching row in order_expired, producing duplicates.
One more difference is that with EXISTS you can't use any values from the table inside the EXISTS subquery in the outer select list; with a join you can use any columns from the joined tables.
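Here is a small sketch of the duplicate behaviour described above, using Python's sqlite3 module as a stand-in engine. The table is called orders instead of order (ORDER is a reserved word) and the schema prefix is dropped; both changes and the sample ids are assumptions for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders        (order_id INTEGER);
    CREATE TABLE order_expired (order_id INTEGER);
    INSERT INTO orders        VALUES (1), (2), (3);
    INSERT INTO order_expired VALUES (1), (1), (3);  -- order 1 is listed twice
""")

# EXISTS: each outer row is tested once, so it appears at most once.
exists_ids = [r[0] for r in conn.execute("""
    SELECT odr.order_id FROM orders odr
    WHERE EXISTS (SELECT 1 FROM order_expired exp
                  WHERE odr.order_id = exp.order_id)
    ORDER BY odr.order_id
""")]

# JOIN: one output row per matching pair, so order 1 is repeated.
join_ids = [r[0] for r in conn.execute("""
    SELECT odr.order_id FROM orders odr
    JOIN order_expired exp ON odr.order_id = exp.order_id
    ORDER BY odr.order_id
""")]

print(exists_ids)
print(join_ids)
```

This is also why the count in the question equals the size of order_expired only when every expired order_id is unique and present in the order table.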
You said:
When I query individually the query inside the EXISTS bracket, it returns 1 as expected and no order_id from order_expired since I didn't query for column there.
However, I guess that you haven't really run the EXISTS subquery exactly as written, since it would have been:
select (1) from FMN_XX.order_expired exp
where odr.order_id = exp.order_id
and it would return an error because it doesn't know what odr is.
The clause where odr.order_id = exp.order_id is exactly what gives the correlation between the main query and the EXISTS subquery.
So, the query would be roughly translated in natural language as:
select all the orders that exist in the expired orders table, looking them up by the order_id field
I'd like to query the database as to whether or not one or more rows exist that satisfy a given predicate. However, I am not interested in the distinction between there being one such row, two rows or a million - just if there are 'zero' or 'one or more'. And I do not want Postgres to waste time producing an exact count that I do not need.
In DB2, I would do it like this:
SELECT 1 FROM SYSIBM.SYSDUMMY1 WHERE EXISTS
(SELECT 1 FROM REAL_TABLE WHERE COLUMN = 'VALUE')
and then check whether zero rows or one row was returned from the query.
But Postgres has no dummy table available, so what is the best option?
If I create a one-row dummy table myself and use that in place of SYSIBM.SYSDUMMY1, will the query optimizer be smart enough to not actually read that table when running the query, and otherwise 'do the right thing'?
PostgreSQL doesn't have a dummy table because you don't need one.
SELECT 1 WHERE EXISTS
(SELECT 1 FROM REAL_TABLE WHERE COLUMN = 'VALUE')
Alternatively if you want a true/false answer:
SELECT EXISTS(SELECT 1 FROM REAL_TABLE WHERE COLUMN = 'VALUE')
How about just doing this?
SELECT (CASE WHEN EXISTS (SELECT 1 FROM REAL_TABLE WHERE COLUMN = 'VALUE') THEN 1 ELSE 0 END)
1 means there is a value. 0 means no value.
This will always return one row.
If you are happy with "no row" if no row matches, you can even just:
SELECT 1 FROM real_table WHERE column = 'VALUE' LIMIT 1;
Performance is basically the same as with EXISTS. Key to performance for big tables is a matching index.
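For comparison, here is a minimal sketch of the three shapes from the answers above (EXISTS as a value, EXISTS as a filter, and LIMIT 1). It runs against SQLite through Python's sqlite3 module, not against Postgres, so it only illustrates what each form returns, not how Postgres plans it. The column is named col because COLUMN is a reserved word; probe is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE real_table (col TEXT);
    INSERT INTO real_table VALUES ('VALUE'), ('VALUE'), ('OTHER');
""")

def probe(value):
    """Run the three existence checks for one value (hypothetical helper)."""
    # Always exactly one row, holding 1 or 0.
    flag = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM real_table WHERE col = ?)",
        (value,)).fetchone()[0]
    # One row if a match exists, otherwise no rows at all.
    where_rows = conn.execute(
        "SELECT 1 WHERE EXISTS (SELECT 1 FROM real_table WHERE col = ?)",
        (value,)).fetchall()
    # Same row-or-no-row contract, written with LIMIT.
    limit_rows = conn.execute(
        "SELECT 1 FROM real_table WHERE col = ? LIMIT 1",
        (value,)).fetchall()
    return flag, where_rows, limit_rows

present = probe("VALUE")
absent = probe("MISSING")
print(present)
print(absent)
```

The practical difference is the contract for the caller: the first form can be read as a value directly, while the other two require checking whether a row came back.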
Employee table has ID and NAME columns. Names can be repeated. I want to find out if there is at least one row with name like 'kaushik%'.
So query should return true/false or 1/0.
Is it possible to find this using a single query?
If we try something like
select count(1) from employee where name like 'kaushik%'
in this case it does not return true/false.
Also, we are iterating over all the records in the table. Is there a way in plain SQL to stop checking further records as soon as the first record that satisfies the condition is fetched?
Or can such a thing only be handled in a PL/SQL block?
EDIT *
The first approach provided by Justin looks like the correct answer:
SELECT COUNT(*) FROM employee WHERE name like 'kaushik%' AND rownum = 1
Commonly, you'd express this as either
SELECT COUNT(*)
FROM employee
WHERE name like 'kaushik%'
AND rownum = 1
where the rownum = 1 predicate allows Oracle to stop looking as soon as it finds the first matching row or
SELECT 1
FROM dual
WHERE EXISTS( SELECT 1
FROM employee
WHERE name like 'kaushik%' )
where the EXISTS clause allows Oracle to stop looking as soon as it finds the first matching row.
The first approach is a bit more compact but, to my eye, the second approach is a bit more clear since you really are looking to determine whether a particular row exists rather than trying to count something. But the first approach is pretty easy to understand as well.
How about:
select max(case when name like 'kaushik%' then 1 else 0 end)
from employee
Or, what might be more efficient since like can use indexes:
select count(x)
from (select 1 as x
from employee
where name like 'kaushik%'
) t
where rownum = 1
Since you require that the SQL query return 1 or 0, you can try the following query:
select count(1) from dual
where exists(SELECT 1
FROM employee
WHERE name like 'kaushik%')
Since the above query uses EXISTS, it can stop scanning the employee table as soon as it encounters the first record whose name starts with "kaushik", and it will return 1 (without scanning the rest of the table). If none of the records match, it will return 0.
select 1
from dual -- Oracle traditionally requires a FROM clause, hence dual
where exists ( select name
from employee
where name like 'kaushik%'
)
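One detail that separates these variants is what they return when nothing matches, or when the table is empty. The sketch below checks that with Python's sqlite3 module. SQLite is a stand-in: it has no dual or rownum, so the EXISTS check is written as SELECT EXISTS(...), which Oracle does not accept in that form. The sample names and the snapshot helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT)")

MAX_CASE = ("SELECT MAX(CASE WHEN name LIKE 'kaushik%' THEN 1 ELSE 0 END) "
            "FROM employee")
EXISTS_Q = "SELECT EXISTS (SELECT 1 FROM employee WHERE name LIKE 'kaushik%')"

def snapshot():
    """Return (max-case result, exists result) for the current table contents."""
    return (conn.execute(MAX_CASE).fetchone()[0],
            conn.execute(EXISTS_Q).fetchone()[0])

empty = snapshot()        # table has no rows at all

conn.execute("INSERT INTO employee VALUES (1, 'amit')")
no_match = snapshot()     # rows exist, none match

conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(2, 'kaushik patel'), (3, 'kaushik shah')])
with_match = snapshot()   # two matching rows

print(empty, no_match, with_match)
```

On an empty table the MAX(CASE ...) form yields NULL (None in Python) where the EXISTS form yields 0, so a caller that expects strictly 1/0 would need to handle that case. The MAX form also has to look at every row, while EXISTS may stop at the first match.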
I apologize in advance for my long-winded question and if the formatting isn't up to par (newbie), here goes.
I have a table MY_TABLE with the following schema -
MY_ID | TYPE | REC_COUNT
1 | A | 1
1 | B | 3
2 | A | 0
2 | B | 0
....
The first column corresponds to an ID, the second is some type and 3rd some count. NOTE that the MY_ID column is not the primary key, there could be many records having the same MY_ID.
I want to write a stored procedure which will take an array of IDs and return the subset of them that match the following criteria -
the ID should match the MY_ID field of at least 1 record in the table and at least 1 matching record should not have TYPE = A OR REC_COUNT = 0.
This is the procedure I came up with -
PROCEDURE get_id_subset(
iIds IN ID_ARRAY,
oMatchingIds OUT NOCOPY ID_ARRAY
)
IS
BEGIN
SELECT t.column_value
BULK COLLECT INTO oMatchingIds
FROM TABLE(CAST(iIds AS ID_ARRAY)) t
WHERE EXISTS (
SELECT /*+ NL_SJ */ 1
FROM MY_TABLE m
WHERE (m.my_id = t.column_value)
AND (m.type != 'A' OR m.rec_count != 0)
);
END get_id_subset;
But I really care about performance and some IDs could match 1000s of records in the table. There is an index on the MY_ID and TYPE column but no index on the REC_COUNT column. So I was thinking if there are more than 1000 rows that have a matching MY_ID field then I'll just return the ID without applying the TYPE and REC_COUNT predicates. Here's this version -
PROCEDURE get_id_subset(
iIds IN ID_ARRAY,
oMatchingIds OUT NOCOPY ID_ARRAY
)
IS
BEGIN
SELECT t.column_value
BULK COLLECT INTO oMatchingIds
FROM TABLE(CAST(iIds AS ID_ARRAY)) t, MY_TABLE m
WHERE (m.my_id = t.column_value)
AND ( ((SELECT COUNT(m.my_id) FROM m WHERE 1) >= 1000)
OR EXISTS (m.type != 'F' OR m.rec_count != 0)
);
END get_id_subset;
But this doesn't compile, I get the following error on the inner select -
PL/SQL: ORA-00936: missing expression
Is there another way of writing this? The inner select needs to work on the joined table.
And to clarify, I'm OK with the result set being different for this query. My assumption is that since there is an index on the my_id column, doing count(*) would be much cheaper than actually applying the rec_count predicate to 10000s of rows since there is no index on that column. Am I wrong?
I don't see your second query as being much if any improvement over the first. At best, the first subquery has to hit 1000 matching records in order to determine if the count is less than 1000, so I don't think it will save lots of work. Also it changes the actual result, and it's not clear from your description if you're saying that's OK as long as it's more efficient. (And if it is OK, then the business logic is very unclear -- why do the other conditions matter at all, if they don't matter when there's lots of records?)
You ask, "will the group by be applied before or after the predicate". I'm not clear what part of the query you're talking about, but logically speaking the order is always
Where predicates
Group By
Having predicates
The optimizer can change the order in which things are actually evaluated, but the result must always be logically equivalent to the above order of evaluation (barring optimizer bugs).
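A quick way to see the logical order is to run the same grouped query with and without a WHERE clause. This is a sketch in Python with the sqlite3 module (a stand-in engine); the table layout follows the question, and the extra row for id 3 is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (my_id INTEGER, type TEXT, rec_count INTEGER);
    INSERT INTO my_table VALUES
        (1, 'A', 1), (1, 'B', 3), (2, 'A', 0), (2, 'B', 0), (3, 'B', 5);
""")

# WHERE removes rows first, then groups are formed, then HAVING filters groups.
with_where = conn.execute("""
    SELECT my_id, COUNT(*) FROM my_table
    WHERE rec_count != 0
    GROUP BY my_id
    HAVING COUNT(*) >= 2
    ORDER BY my_id
""").fetchall()

# Without the WHERE clause, id 2 still has two rows when HAVING is applied.
without_where = conn.execute("""
    SELECT my_id, COUNT(*) FROM my_table
    GROUP BY my_id
    HAVING COUNT(*) >= 2
    ORDER BY my_id
""").fetchall()

print(with_where)
print(without_where)
```

Id 2 disappears from the first result because its rows were already removed by the WHERE predicate before the groups were counted.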
1000s of records is really not that much. Have you actually encountered a case where performance of the first query is unacceptable?
For either query, it may be better to rewrite the correlated EXISTS subquery as a non-correlated IN subquery. You need to test this.
You need to show actual execution plans to get more useful feedback.
Edit
For the kind of short-circuiting you're talking about, I think you need to rewrite your subquery (from the initial version of the query) like this (sorry, my first attempt at this wouldn't work because I tried to access a column from the top-level table in a sub-sub-query):
WHERE EXISTS (
SELECT /*+ NL_SJ */ 1
FROM MY_TABLE m
WHERE (m.my_id = t.column_value)
AND rownum <= 1000
HAVING MAX( CASE WHEN m.type != 'A' OR m.rec_count != 0 THEN 1 ELSE NULL END ) IS NOT NULL
OR MAX(rownum) >= 1000
)
That should force it to hit no more than 1,000 records per id, then return a row if either at least one row matches the conditions on type and rec_count, or the 1,000-record limit was reached. If you view the execution plan, you should expect to see a COUNT STOPKEY operation, which shows that Oracle is going to stop running a query block after a certain number of rows are returned.
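The capped check can be sketched outside Oracle as well. The version below uses Python's sqlite3 module, with LIMIT standing in for rownum and a plain Python loop standing in for the TABLE(CAST(...)) collection, so it illustrates the logic only and says nothing about Oracle's execution plan. The cap is lowered to 3 so that a handful of sample rows can hit it; the data and the names CAP and CAPPED_CHECK are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_id INTEGER, type TEXT, rec_count INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", [
    (1, 'A', 1), (1, 'B', 3),                            # has a qualifying row
    (2, 'A', 0), (2, 'A', 0),                            # no qualifying row, under the cap
    (3, 'A', 0), (3, 'A', 0), (3, 'A', 0), (3, 'A', 0),  # no qualifying row, hits the cap
])

CAP = 3  # stands in for the 1000-row limit discussed above

# Look at no more than CAP rows for one id, then answer 1 if either
# a qualifying row was seen or the cap was reached, otherwise 0.
CAPPED_CHECK = """
    SELECT (MAX(CASE WHEN type != 'A' OR rec_count != 0 THEN 1 END) IS NOT NULL)
           OR (COUNT(*) >= :cap)
    FROM (SELECT type, rec_count FROM my_table WHERE my_id = :id LIMIT :cap)
"""

ids = [1, 2, 3, 4]  # id 4 has no rows in the table
matching = [i for i in ids
            if conn.execute(CAPPED_CHECK, {"cap": CAP, "id": i}).fetchone()[0]]
print(matching)
```

Id 3 is returned only because it reached the cap, which is exactly the change in results that the relaxed version of the procedure accepts.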
I was looking at sql inner queries (bit like the sql equivalent of a C# anon method), and was wondering, can I return more than one value from a query?
For example, return the number of rows in a table as one output value, and also, as another output value, return the distinct number of rows?
Also, how does distinct work? Is this based on whether one field may be the same as another (thus classified as "distinct")?
I am using Sql Server 2005. Would there be a performance penalty if I return one value from one query, rather than two from one query?
Thanks
You could do your first question by doing this:
SELECT
COUNT(field1),
COUNT(DISTINCT field2)
FROM table
(For the first field you could do * if needed to count null values.)
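The difference between COUNT(*), COUNT(field) and COUNT(DISTINCT field) shows up as soon as NULLs are involved. Here is a minimal sketch with Python's sqlite3 module (a stand-in for SQL Server; these three aggregates treat NULLs the same way in both). The table name t and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (field1 INTEGER, field2 TEXT);
    INSERT INTO t VALUES (1, 'x'), (NULL, 'x'), (3, 'y'), (4, NULL);
""")

# One query, three values in a single result row.
total_rows, non_null_field1, distinct_field2 = conn.execute("""
    SELECT COUNT(*),                -- every row, NULLs included
           COUNT(field1),           -- rows where field1 is not NULL
           COUNT(DISTINCT field2)   -- different non-NULL values of field2
    FROM t
""").fetchone()

print(total_rows, non_null_field1, distinct_field2)
```

So one SELECT can return several values at once, as long as each of them is a single aggregate over the same set of rows.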
DISTINCT means what the word says: it eliminates duplicate rows from the result.
Returning 2 values instead of 1 would depend on what the values were, if they were indexed or not and other undetermined possible variables.
If you mean subqueries within the SELECT list, then no, each one can only return 1 value. If you want more than 1 value you will have to use the subquery as a join.
If the inner query is inline in the SELECT, you may struggle to select multiple values. However, it is often possible to JOIN to a sub-query instead; that way, the sub-query can be named and you can get multiple results
SELECT a.Foo, a.Bar, x.[Count], x.[Avg]
FROM a
INNER JOIN (SELECT Something, COUNT(1) AS [Count], AVG(SomeValue) AS [Avg]
            FROM b GROUP BY Something) x  -- b and SomeValue are placeholder names
ON x.Something = a.Something
Which might help.
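As a runnable illustration of joining to a named sub-query, here is a sketch using Python's sqlite3 module (SQLite rather than SQL Server, so the bracketed column names are replaced with plain aliases). The tables a and b, the column Val and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (Something TEXT, Foo TEXT);
    CREATE TABLE b (Something TEXT, Val INTEGER);
    INSERT INTO a VALUES ('k1', 'foo1'), ('k2', 'foo2');
    INSERT INTO b VALUES ('k1', 10), ('k1', 20), ('k2', 5);
""")

# The derived table x yields one row per key with two aggregate columns,
# so the outer query can pick up both values through a single join.
rows = conn.execute("""
    SELECT a.Foo, x.cnt, x.avg_val
    FROM a
    INNER JOIN (SELECT Something,
                       COUNT(*) AS cnt,
                       AVG(Val) AS avg_val
                FROM b
                GROUP BY Something) x
            ON x.Something = a.Something
    ORDER BY a.Foo
""").fetchall()

print(rows)
```

The GROUP BY inside the derived table is what makes the join key available to the ON clause next to the aggregates.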
DISTINCT does what it says. IIRC, you can SELECT COUNT(DISTINCT Foo) etc to query distinct data.
You can return multiple results in 3 ways (off the top of my head):
By having a select with multiple values, e.g. select col1, col2, col3
With multiple queries, e.g. select 1; select '2'; select colA. You would move from one result set to the next in a data reader by calling .NextResult()
Using output parameters: declare the parameters before executing the query, then read their values afterwards, e.g. set @param1 = '2', then string myparam2 = sqlCommand.Parameters["@param1"].Value.ToString()
DISTINCT filters the resulting rows to be unique.
Inner queries in the form:
SELECT * FROM tbl WHERE fld in (SELECT fld2 FROM tbl2 WHERE tbl.fld = tbl2.fld2)
cannot return multiple columns. When you need multiple columns from a secondary query, you usually need to do an inner join on the other query.
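To count rows, note that COUNT(DISTINCT *) is not accepted by SQL Server, so the distinct rows are counted through a derived table:
SELECT COUNT(*) AS total_rows, (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM table) d) AS distinct_rows FROM table
This will return a dataset with one row containing two columns. Column 1 is the total number of rows in the table. Column 2 counts only distinct rows.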
DISTINCT means the returned dataset will not have any duplicate rows. DISTINCT can appear only once in a select list, directly after the SELECT keyword. Thus a query such as:
SELECT distinct a, b, c FROM table
might have this result:
a1 b1 c1
a1 b1 c2
a1 b2 c2
a1 b3 c2
Note that values are duplicated across the whole result set but each row is unique.
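The "each row is unique, values may repeat" behaviour can be verified with a short sketch. It uses Python's sqlite3 module as a stand-in for SQL Server; the table name t and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a TEXT, b TEXT, c TEXT);
    INSERT INTO t VALUES
        ('a1', 'b1', 'c1'),
        ('a1', 'b1', 'c1'),   -- exact duplicate of the first row
        ('a1', 'b1', 'c2'),
        ('a1', 'b2', 'c2');
""")

# DISTINCT compares whole result rows, not individual columns.
distinct_rows = conn.execute(
    "SELECT DISTINCT a, b, c FROM t ORDER BY a, b, c").fetchall()

# Total rows and distinct rows in one result, the latter via a derived table.
total, distinct_count = conn.execute("""
    SELECT (SELECT COUNT(*) FROM t),
           (SELECT COUNT(*) FROM (SELECT DISTINCT a, b, c FROM t))
""").fetchone()

print(distinct_rows)
print(total, distinct_count)
```

Only the exact duplicate row is removed; 'a1' still appears in every remaining row because uniqueness is judged on the combination of all selected columns.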
I'm not sure what your last question means. You should return from a query all the data relevant to the query. As for faster, only benchmarking can tell you which approach is faster.