SQL: How to properly check if a record exists - sql

While reading some SQL Tuning-related documentation, I found this:
SELECT COUNT(*) :
Counts the number of rows.
Often is improperly used to verify the existence of a record.
Is SELECT COUNT(*) really that bad?
What's the proper way to verify the existence of a record?

It's better to use either of the following:
-- Method 1.
SELECT 1
FROM table_name
WHERE unique_key = value;
-- Method 2.
SELECT COUNT(1)
FROM table_name
WHERE unique_key = value;
The first alternative should give you no result or one result, the second count should be zero or one.
How old is the documentation you're using? Although you've read good advice, most query optimizers in recent RDBMS's optimize SELECT COUNT(*) anyway, so while there is a difference in theory (and older databases), you shouldn't notice any difference in practice.

I would prefer not use Count function at all:
IF [NOT] EXISTS ( SELECT 1 FROM MyTable WHERE ... )
<do smth>
For example if you want to check if user exists before inserting it into the database the query can look like this:
IF NOT EXISTS ( SELECT 1 FROM Users WHERE FirstName = 'John' AND LastName = 'Smith' )
BEGIN
INSERT INTO Users (FirstName, LastName) VALUES ('John', 'Smith')
END

You can use:
SELECT 1 FROM MyTable WHERE <MyCondition>
If there is no record matching the condition, the resulted recordset is empty.

You can use:
SELECT 1 FROM MyTable WHERE... LIMIT 1
Use select 1 to prevent the checking of unnecessary fields.
Use LIMIT 1 to prevent the checking of unnecessary rows.

SELECT COUNT(1) FROM MyTable WHERE ...
will loop thru all the records. This is the reason it is bad to use for record existence.
I would use
SELECT TOP 1 * FROM MyTable WHERE ...
After finding 1 record, it will terminate the loop.

The other answers are quite good, but it would also be useful to add LIMIT 1 (or the equivalent, to prevent the checking of unnecessary rows.

You can use:
SELECT COUNT(1) FROM MyTable WHERE ...
or
WHERE [NOT] EXISTS
( SELECT 1 FROM MyTable WHERE ... )
This will be more efficient than SELECT * since you're simply selecting the value 1 for each row, rather than all the fields.
There's also a subtle difference between COUNT(*) and COUNT(column name):
COUNT(*) will count all rows, including nulls
COUNT(column name) will only count non null occurrences of column name

Other option:
SELECT CASE
WHEN EXISTS (
SELECT 1
FROM [MyTable] AS [MyRecord])
THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT)
END

I'm using this way:
IF (EXISTS (SELECT TOP 1 FROM Users WHERE FirstName = 'John'), 1, 0) AS DoesJohnExist

Related

More than one row returned by a subquery used as an expression when UPDATE on multiple rows

I'm trying to update rows in a single table by splitting them into two "sets" of rows.
The top part of the set should have a status set to X and the bottom one should have a status set to status Y.
I've tried putting together a query that looks like this
WITH x_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
LIMIT 5
), y_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
((SELECT id from x_status), 'X'),
((SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (folks.ids);
When I run this query I get the following error:
pq: more than one row returned by a subquery used as an expression
This makes sense, folks.ids is expected to return a list of IDs, hence the IN clause in the UPDATE statement, but I suspect the problem is I can not return the list in the values statement in the FROM clause as it turns into something like this:
(1, 2, 3, 4, 5, 5)
(6, 7, 8, 9, 1)
Is there a way how this UPDATE can be done using a CTE query at all? I could split this into two separate UPDATE queries, but CTE query would be better and in theory faster.
I think I understand now... if I get your problem, you want to set the status to 'X' for the oldest five records and 'Y' for everything else?
In that case I think the row_number() analytic would work -- and it should do it in a single pass, two scans, and eliminating one order by. Let me know if something like this does what you seek.
with ranked as (
select
id, row_number() over (order by date_registered desc) as rn
from people
)
update people p
set
status = case when r.rn <= 5 then 'X' else 'Y' end
from ranked r
where
p.id = r.id
Any time you do an update from another data set, it's helpful to have a where clause that defines the relationship between the two datasets (the non-ANSI join syntax). This makes it iron-clad what you are updating.
Also I believe this code is pretty readable so it will be easier to build on if you need to make tweaks.
Let me know if I missed the boat.
So after more tinkering, I've come up with a solution.
The problem with why the previous query fails is we are not grouping the IDs in the subqueries into arrays so the result expands into a huge list as I suspected.
The solution is grouping the IDs in the subqueries into ARRAY -- that way they get returned as a single result (tuple) in ids value.
This is the query that does the job. Note that we must unnest the IDs in the WHERE clause:
WITH x_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
LIMIT 5
), y_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
(ARRAY(SELECT id from x_status), 'X'),
(ARRAY(SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (SELECT * from unnest(folks.ids));

Query to show all ID's except the ones that have a specific instrument

So I have a table which looks like this:
What would be the best query to exclude every "stuknr" that has atleast 1 "saxofoon" in the "instrumentnaam" column?
To select all stuknr where there is not stuknr with a saxofoon:
SELECT *
FROM table
WHERE stuknr not in (
SELECT stuknr
FROM TABLE
WHERE INSTRUMENTNAAM = ‘saxofoon’)
;
SELECT STUNKR
FROM TABLE
WHERE INSTRUMENTNAAM != ‘saxofoon’;
Here != (or <>, which are equivalent, see this) means "not equal".
You can do this with a NOT IN sub-select
select distinct
YT.stunkr
from
YourTable YT
where YT.stunkr NOT IN ( select distinct stunkr
from YourTable
where InstrumentNaam = 'saxofoon' )
So the sub-select is getting all IDs that DO have saxofoon and the primary select/from is getting where the ID is NOT in the secondary.

Why COUNT(*) is equal to 1 without FROM clause? [duplicate]

This question already has an answer here:
Why does "select count(*)" from nothing return 1
(1 answer)
Closed 7 years ago.
for a quick check I used a query
select COUNT(*) LargeTable
and was surprized to see
LargeTable
-----------
1
seconds later I realized my mistake, made it
select COUNT(*) from LargeTable
and got expected result
(No column name)
-----------
1.000.000+
but now I don't understand why COUNT(*) returned 1
it happens if I do select COUNT(*) or declare #x int = COUNT(*); select #x
another case
declare #EmptyTable table ( Value int )
select COUNT(*) from #EmptyTable
returns
(No column name)
-----------
0
I did't find explanation in SQL standard (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt, online source is given here https://stackoverflow.com/a/8949764/1506454)
why COUNT(*) returns 1?
In SQL Server a SELECT without a FROM clause works as though it operates against a single row table.
This is not standard SQL. Other RDBMSs provide a utility DUAL table with a single row.
So this would be treated effectively the same as
SELECT COUNT(*) AS LargeTable
FROM DUAL
A related Connect Item discussing
SELECT 'test'
WHERE EXISTS (SELECT *)
is https://connect.microsoft.com/SQLServer/feedback/details/671475/select-test-where-exists-select
Because without the FROM clause DBMS cannot know [LargeTable] is a table. You tricked it in guessing it's a COLUMN NAME alias
You can try it and see
select count(*) 'eklmnjdklfgm'
select count(*) eklmnjdklfgm
select count(*) [eklmnjdklfgm]
select count(*)
The first 3 examples returns eklmnjdklfgm as column name
Count(*) returned 1 because your sentence is not of SQL.
1) In the first sentences, you accounts a table empty, with an only row, since you don't put to which table want to access.
2) In the second case:
declare #EmptyTable table ( Value int )
select COUNT(*) from #EmptyTable
returns
(No column name)
-----------
0
You put the variable, but don't determine to which table implement him. For this, do a count and go out 0.

SQL get rows matching ALL conditions

I would like to retrieve all rows matching a set of conditions on the same column. But I would like the rows only if ALL the conditions are good, and no row if only one condition fails.
For example, taking this table:
|id|name|
---------
|1 |toto|
|2 |tata|
I would like to be able to request if "tata" && "toto" are in this table. But when asking if "tata" and "tuto" are in, I would like an empty response if one of argument is in not in the table, for example asking if "toto" && "tutu" are included in the table.
How can I do that ?
Currently, I'am doing one query per argument, which is not very efficient. I tried several solutions including a subselect or a group+having, but no one is working like I want.
thanks for your support !
cheers
This isn't the most efficient way, but this query would work.
SELECT * FROM table_name
WHERE (name = 'toto' OR name = 'tata')
AND ( SELECT COUNT(*) FROM table_name WHERE name = 'toto') > 0
AND ( SELECT COUNT(*) FROM table_name WHERE name = 'tata') > 0
This is a little vague. If the names are unique, you could count the matching rows that match a where clause:
where name='toto' or name='tata'
If the count is 2, then you know both matched. If name is not unique you could potentially select the first ID (select top 1 id ...) that matches each in a union and count those with an outer select.
Even if you had an arbitrary number of names to match, you could create a stored procedure or code in whatever top-level language you are using to build the select statement.
SELECT 1 AS found FROM hehe
WHERE 1 IN (SELECT 1 FROM hehe WHERE name='tata')
AND 1 IN (SELECT 1 FROM hehe WHERE name='toto')
If name is unique you can simplify to:
SELECT *
FROM tbl
WHERE name IN ('toto', 'tata')
AND (SELECT count(*) FROM tbl WHERE name IN ('toto', 'tata')) > 1;
If it isn't:
SELECT *
FROM tbl
WHERE name IN ('toto', 'tata')
AND EXISTS (SELECT * FROM tbl WHERE name = 'toto')
AND EXISTS (SELECT * FROM tbl WHERE name = 'tata');
Or, in PostgreSQL, MySQL and possibly others:
SELECT *
FROM tbl
WHERE name IN ('toto', 'tata')
AND (SELECT count(DISTINCT name) FROM tbl WHERE name IN ('toto', 'tata')) > 1;

SQL First Match or just First

Because this is part of a larger SQL SELECT statement, I want a wholly SQL query that selects the first item matching a criteria or, if no items match the criteria, just the first item.
I.e. using Linq I want:
Dim t1 = From t In Tt
Dim t2 = From t In t1 Where Criteria(t)
Dim firstIfAny = From t In If(t2.Any, t2, t1) Take 1 Select t
Because If is not part of Linq, LinqPad doesn't show a single SQL statement, but two, the second depending upon whether the Criteria matches any of the Tt values.
I know it will be SELECT TOP 1 etc. and I can add ORDER BY clauses to get the specific first one I want, but I'm having trouble thinking of the most straightforward way to get the first of two criteria. (It was at exactly this point when I was able to solve this myself.)
Seeing as I don't see an existing question for this, I will let it stand. I'm sure someone else will see the answer quickly.
select top 1 *
from (
select top 1 *, 1 as Rank from MyTable where SomeColumn = MyCriteria
union all
select top 1 *, 2 as Rank from MyTable order by MyOrderColumn
) a
order by Rank
I've gone with this:
SELECT TOP 1 *
FROM MyTable
WHERE SomeColumn = MyCriteria
OR NOT (EXISTS (SELECT NULL FROM MyTable WHERE SomeColumn = MyCriteria))
ORDER BY MyOrdering
My actual SomeColumn = MyCriteria is rather more complex of course, as well as other unrelated where clauses.