SQL Server, IN operator show different results - sql

Working on SQL server,i was trying to count the total number of rows based on some criteria as shown in the simple query bellow:
SELECT count(*) FROM my_table WHERE (supp=92 OR (supp=94 and organisation <> 'LDF'))
knowing that my_table do not contain any rows with supp=92 and organisation='LDF' i decided to make the query simpler like the following:
SELECT count(*) FROM my_table WHERE supp IN (92,94) and organisation <> 'LDF'
The results of the two queries were totally different.
Yet the queries look totally the same for me, i've been trying to figure out where the problem is, but i couldn't find an answer.
it's really confusing to me, thank you in advance for your answers.

If your organisation column contained any NULL values where supp=92, the first query would return them but the second query would exclude them. This is because this code:
organisation <> 'LDF'
would return 'NULL', not 'TRUE', if the organisation field is NULL.
I'd recommend you read Robert Shelden's excellent article on all the ways that NULL can trip you up; the entire article is worth reading, but the ninth point talks about this specific scenario.

They are not the same
First:
SELECT COUNT(*)
FROM my_table
WHERE supp=92
OR (supp=94 AND organisation <> 'LDF')
Second is equivalent to:
SELECT COUNT(*)
FROM my_table
WHERE (supp = 92 OR supp = 94)
AND organisation <> 'LDF'

Related

Selecting the biggest ZIP code from a column

I want to get the biggest ZIP code in DB. Normally I do this
SELECT *
FROM (
Select * From tbuser ORDER BY zip DESC
)
WHERE rownum = 1
with this code I can get the biggest zip code value without a duplicate row (since zip code is not a primary key).
But the main company at Japan said that I cant use it since when the connection is slow or the DB have very large data, you cant get the right row of it. It will be a great help for me if someone can helps.
I want to get the biggest ZIP code in DB.
If you really only want the zip code, try that:
SELECT MAX(zip) FROM TBUSER;
This will use the index on the zip column (if it exists).
That being said, Oracle is usually smart enough to properly optimize sub-query selection using ROWNUM. Maybe your main company is more concerned about the possible "full table" ̀€ORDER BY` in the subquery ? OTH, if the issue is really with "slow network", maybe worth taking some time with your DBA to look on the wire using a network analyzer or some other tool if your approach really leads to "excessive bandwidth consumption". I sincerely doubt about that...
If you want to retrieve the whole row having the maximum zip code here is a slight variation on an other answer (in my opinion, this is one of the rare case for using a NATURAL JOIN):
select * from t
natural join (select max(zip) zip from t);
Of course, in case of duplicates, this will return multiple rows. You will have to combine that with one of the several options posted in the various other answers to return only 1 row.
As an extra solution, and since you are not allowed to use ROWNUM (and assuming row_number is arbitrary forbidden too), you can achieve the desired result using something as contrived as:
select * from t
where rowid = (
select min(t.rowid) rid from t
natural join (select max(zip) zip from t)
);
See http://sqlfiddle.com/#!4/3bd63/5
But honestly, there isn't any serious reason to hope that such query will perform better than the simple ... ORDER BY something DESC) WHERE rownum <= 1 query.
This sounds to me like bad advice (masquerading as a rule) from a newbie data base administrator who doesn't understand what he's looking at. That insight isn't going to help you, though. Rarely does a conversation starting with "you're an obstructionist incompetent" achieve anything.
So, here's the thing. First of all, you need to make sure there's an index on your zip column. It doesn't have to be a primary key.
Second, you can try explaining that Oracle's table servers do, in fact, optimize the ... ORDER BY something DESC) WHERE rownum <= 1 style of query. Their servers do a good job of that. Your use case is very common.
But if that doesn't work on your DBA, try saying "I heard you" and do this.
SELECT * FROM (
SELECT a.*
FROM ( SELECT MAX(zip) zip FROM zip ) b
JOIN ZIP a ON (a.zip = b.zip)
) WHERE rownum <= 1
This will get one row with the highest numbered zip value without the ORDER BY that your DBA mistakenly believes is messing up his server's RAM pool. And, it's reasonably efficient. As long as zip has an index.
As you are looking for a way to get the desired record without rownum now, ...
... here is how to do it from Oracle 12c onward:
select *
from tbuser
order by zip desc fetch first 1 row only;
... and here is how to do it before Oracle 12c:
select *
from (select tbuser.*, row_number() over(order by zip desc) as rn from tbuser)
where rn = 1;
EDIT: As Sylvain Leroux pointed out, it is more work for the dbms to sort all records rather than just find the maximum. Here is a max query without rownum:
select *
from tbuser where rowid =
(select max(rowid) keep (dense_rank last order by zip) from tbuser);
But as Sylvain Leroux also mentioned, it makes also a difference whether there is an index on the column. Some tests I did show that with an index on the column, the analytic functions are slower than the traditional functions. Your original query would just get into the index, go to the highest value, pick the record and then stop. You won't get this any faster. My last mentioned query being quite fast on a none-indexed column is slower than yours on an indexed column.
Your requirements seem arbitrary, but this should give you the result you've requested.
SELECT *
FROM (SELECT * FROM tbuser
WHERE zip = (SELECT MAX(zip) FROM tbuser))
WHERE rownum = 1
OK - try something like this:
SELECT *
FROM TBUSER
WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER);
Fetch a single row from a cursor based on the above statement, then close the cursor. If you're using PL/SQL you could do it like this:
FOR aRow IN (SELECT *
FROM TBUSER
WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER))
LOOP
-- Do something with aRow
-- then force an exit from the loop
EXIT;
END LOOP;
Share and enjoy.
I was wondering that nobody posted this answer yet. I think that is the way, you should do something like that.
SELECT *
FROM (
Select a.*, max(zip) over () max_zip
From tbuser a
)
WHERE zip=max_zip
and rownum = 1
Your query gets exactly one random row of all records having the max zip code. So it cannot be the problem that you retrieve a record with another zip code or more than one record or zero records (as long as there is at least one record in the table).
Maybe Japan simply expects one of the other rows with that zip code? Then you may just have to add another order criteria to get that particular desired row.
Another thought: As they are talking about slow connection speed, it may also be that they enter a new max zip code on one session, query with another and get the old max zip, because the insert statement of the other session hasn't gone through yet. But well, that's just the way this works of course.
BTW: A strange thing to select a maximum zip code. I guess that's just an example to illustrate the problem?
IF you are getting multiple records using MAX function (which is not possible, but in your case you are getting, I don't know how until you post screenshot) then You can use DISTINCT in your sql query to get single record
SELECT DISTINCT MAX(zipcode) FROM TableUSER
SQL FIDDLE

efficiently find subset of records as well as total count

I'm writing a function in ColdFusion that returns the first couple of records that match the user's input, as well as the total count of matching records in the entire database. The function will be used to feed an autocomplete, so speed/efficiency are its top concerns. For example, if the function receives input "bl", it might return {sampleMatches:["blue", "blade", "blunt"], totalMatches:5000}
I attempted to do this in a single query for speed purposes, and ended up with something that looked like this:
select record, count(*) over ()
from table
where criteria like :criteria
and rownum <= :desiredCount
The problem with this solution is that count(*) over () always returns the value of :desiredCount. I saw a similar question to mine here, but my app will not have permissions to create a temp table. So is there a way to solve my problem in one query? Is there a better way to solve it? Thanks!
I'm writing this on top of my head, so you should definitely have to time this, but I believe that using following CTE
only requires you to write the conditions once
only returns the amount of records you specify
has the correct total count added to each record
and is evaluated only once
SQL Statement
WITH q AS (
SELECT record
FROM table
WHERE criteria like :criteria
)
SELECT q1.*, q2.*
FROM q q1
CROSS JOIN (
SELECT COUNT(*) FROM q
) q2
WHERE rownum <= :desiredCount
A nested subquery should return the results you want
select record, cnt
from (select record, count(*) over () cnt
from table
where criteria like :criteria)
where rownum <= :desiredCount
This will, however, force Oracle to completely process the query in order to generate the accurate count. This seems unlikely to be what you want if you're trying to do an autocomplete particularly when Oracle may decide that it would be more efficient to do a table scan on table if :criteria is just b since that predicate isn't selective enough. Are you really sure that you need a completely accurate count of the number of results? Are you sure that your table is small enough/ your system is fast enough/ your predicates are selective enough for that to be a requirement that you could realistically meet? Would it be possible to return a less-expensive (but less-accurate) estimate of the number of rows? Or to limit the count to something smaller (say, 100) and have the UI display something like "and 100+ more results"?

What is the meaning of a constant in a SELECT query?

Considering the 2 below queries:
1)
USE AdventureWorks
GO
SELECT a.ProductID, a.ListPrice
FROM Production.Product a
WHERE EXISTS (SELECT 1 FROM Sales.SalesOrderDetail b
WHERE b.ProductID = a.ProductID)
2)
USE AdventureWorks
GO
SELECT a.ProductID, a.Name, b.SalesOrderID
FROM Production.Product a LEFT OUTER JOIN Sales.SalesOrderDetail b
ON a.ProductID = b.ProductID
ORDER BY 1
My only question is know what is the meaning of the number 1 in those queries? How about if I change them to 2 or something else?
Thanks for helping
In the first case it does not matter; you can select a 2 or anything, really, because it is an existence query. In general selecting a constant can be used for other things besides existence queries (it just drops the constant into a column in the result set), but existence queries are where you are most likely to encounter a constant.
For example, given a table called person containing three columns, id, firstname, lastname, and birthdate, you can write a query like this:
select firstname, 'YAY'
from person
where month(birthdate) = 6;
and this would return something like
name 'YAY'
---------------
Ani YAY
Sipho YAY
Hiro YAY
It's not useful, but it is possible. The idea is that in a select statement you select expressions, which can be not only column names but constants and function calls, too. A more likely case is:
select lastname||','||firstname, year(birthday)
from person;
Here the || is the string concatenation operator, and year is a function I made up.
The reason you sometimes see 1 in existence queries is this. Suppose you only wanted to know whether there was a person whose name started with 'H', but you didn't care who this person was. You can say
select id
from person
where lastname like 'H%';
but since we don't need the id, you can also say
select 1
from person
where lastname like 'H%';
because all you care about is whether or not you get a non-empty result set or not.
In the second case, the 1 is a column number; it means you want your results sorted by the value in the first column. Changing that to a 2 would order by the second column.
By the way, another place where constants are selected is when you are dumping from a relational database into a highly denormalized CSV file that you will be processing in NOSQL-like systems.
In the second case the 1 is not a literal at all. Rather, it is an ordinal number, indicating that the resultset should be sorted by its first column. If you changed the 1 to 4 the query would fail with an error because the resultset only has three columns.
BTW, the reason you use a constant like 1 instead of using an actual column is you avoid the I/O of actually getting the column value. This may improve performance.

Why can't I GROUP BY 1 when it's OK to ORDER BY 1?

Why are column ordinals legal for ORDER BY but not for GROUP BY? That is, can anyone tell me why this query
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY OrgUnitID
cannot be written as
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY 1
When it's perfectly legal to write a query like
SELECT OrgUnitID FROM Employee AS e ORDER BY 1
?
I'm really wondering if there's something subtle about the relational calculus, or something, that would prevent the grouping from working right.
The thing is, my example is pretty trivial. It's common that the column that I want to group by is actually a calculation, and having to repeat the exact same calculation in the GROUP BY is (a) annoying and (b) makes errors during maintenance much more likely. Here's a simple example:
SELECT DATEPART(YEAR,LastSeenOn), COUNT(*)
FROM Employee AS e
GROUP BY DATEPART(YEAR,LastSeenOn)
I would think that SQL's rule of normalize to only represent data once in the database ought to extend to code as well. I'd want to only right that calculation expression once (in the SELECT column list), and be able to refer to it by ordinal in the GROUP BY.
Clarification: I'm specifically working on SQL Server 2008, but I wonder about an overall answer nonetheless.
One of the reasons is because ORDER BY is the last thing that runs in a SQL Query, here is the order of operations
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
so once you have the columns from the SELECT clause you can use ordinal positioning
EDIT, added this based on the comment
Take this for example
create table test (a int, b int)
insert test values(1,2)
go
The query below will parse without a problem, it won't run
select a as b, b as a
from test
order by 6
here is the error
Msg 108, Level 16, State 1, Line 3
The ORDER BY position number 6 is out of range of the number of items in the select list.
This also parses fine
select a as b, b as a
from test
group by 1
But it blows up with this error
Msg 164, Level 15, State 1, Line 3
Each GROUP BY expression must contain at least one column that is not an outer reference.
There is a lot of elementary inconsistencies in SQL, and use of scalars is one of them. For example, anyone might expect
select * from countries
order by 1
and
select * from countries
order by 1.00001
to be a similar queries (the difference between the two can be made infinitesimally small, after all), which are not.
I'm not sure if the standard specifies if it is valid, but I believe it is implementation-dependent. I just tried your first example with one SQL engine, and it worked fine.
use aliasses :
SELECT DATEPART(YEAR,LastSeenOn) as 'seen_year', COUNT(*) as 'count'
FROM Employee AS e
GROUP BY 'seen_year'
** EDIT **
if GROUP BY alias is not allowed for you, here's a solution / workaround:
SELECT seen_year
, COUNT(*) AS Total
FROM (
SELECT DATEPART(YEAR,LastSeenOn) as seen_year, *
FROM Employee AS e
) AS inline_view
GROUP
BY seen_year
databases that don't support this basically are choosing not to. understand the order of the processing of the various steps, but it is very easy (as many databases have shown) to parse the sql, understand it, and apply the translation for you. Where its really a pain is when a column is a long case statement. having to repeat that in the group by clause is super annoying. yes, you can do the nested query work around as someone demonstrated above, but at this point it is just lack of care about your users to not support group by column numbers.

Sql Distinct Count of resulting table with no conditionals

I want to count the number of accounts from the resulting table generated from this code. This way, I know how many people liked blue at one time.
Select Distinct PEOPLE.FullName, PEOPLE.FavColor From PEOPLE
Where FavColor='Blue'
Lets say this is a history accounting of what people said their favorite color when they were asked so there may be multiple records of the same full name if asked again at a much later time; hence the distinct.
The code I used may not be reusable in your answer so feel free to use what you think can work. I am sure I found a possible solution to my problem using declare and if statements but I lost that page... so I am left with no solution. However, I think there is a way to do it without using conditionals which is what I am asking and rather have. Thanks.
Edit: My question is: From the code above, is there a way to count the number of accounts in the resulting table?
If I understand what you are asking correctly (how many people liked blue at one time?), try this:
select count(*)
from PEOPLE
where FavColor = 'Blue'
group by FullName
If your question is in fact, how can I count the results of any select query?, you can do this:
Suppose your original query is:
select MyColumn
from MyTable
where MyOtherColumn = 26
You can wrap it in another query to get the count
select count(*)
from (
select MyColumn
from MyTable
where MyOtherColumn = 26
) a
Select Count (Distinct PEOPLE.FullName)
From PEOPLE
Where FavColor='Blue'
Are you saying that you want to generate the count with no WHERE clause?
How about this?
SELECT
count(*)
FROM
people
INNER JOIN (SELECT FavColor = 'Blue') col ON col.FavColor = people.FavColor
Edit: OK, I see what you want now.
You just need to wrap your query in SELECT count(*) FROM ( <your-query-goes-here> ).
I think this may be what you want.
Select Count(*) as 'NumberOfPeople'
From
(
Select Distinct PEOPLE.FullName, PEOPLE.FavColor From PEOPLE
Where FavColor='Blue'
)a
If the same person has multiple answers, that is, their favourite colour has changed over time - resulting in several colours, some repeating, for the same person (aka "account") then to find the number of accounts where the favourite colour is / was blue (oracle):
select count(*)
from (select FULLNAME
from PEOPLE
where FAVCOLOR = 'BLUE'
group by FULLNAME);