Best way to test if a row exists in a MySQL table - sql

I'm trying to find out if a row exists in a table. Using MySQL, is it better to do a query like this:
SELECT COUNT(*) AS total FROM table1 WHERE ...
and check to see if the total is non-zero or is it better to do a query like this:
SELECT * FROM table1 WHERE ... LIMIT 1
and check to see if any rows were returned?
In both queries, the WHERE clause uses an index.

You could also try EXISTS:
SELECT EXISTS(SELECT * FROM table1 WHERE ...)
and per the documentation, you can SELECT anything.
Traditionally, an EXISTS subquery starts with SELECT *, but it could
begin with SELECT 5 or SELECT column1 or anything at all. MySQL
ignores the SELECT list in such a subquery, so it makes no difference.
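For what it's worth, here is a minimal sketch of how the EXISTS check might be called from application code. It uses Python's built-in sqlite3 with an in-memory database purely as a stand-in for MySQL; the table contents and the `row_exists` helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, name) VALUES (?, ?)",
    [(1, "foo"), (2, "bar")],
)


def row_exists(conn, row_id):
    """Hypothetical helper: True if table1 has a row with this id."""
    # EXISTS always yields exactly one row holding 1 or 0,
    # so fetchone() never returns None here.
    (found,) = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM table1 WHERE id = ?)", (row_id,)
    ).fetchone()
    return bool(found)


print(row_exists(conn, 1))
print(row_exists(conn, 9))
```

Because the outer SELECT always returns a single row, the caller does not need a separate "no rows" branch.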

I researched this subject recently. The right approach differs when the field is a TEXT field, i.e. a non-unique field.
I ran some tests on a TEXT field, in a table with 1M entries where 37 entries match 'something':
SELECT * FROM test WHERE text LIKE '%something%' LIMIT 1 with mysql_num_rows(): 0.039061069488525s (fastest)
SELECT count(*) as count FROM test WHERE text LIKE '%something%': 16.028197050095s
SELECT EXISTS(SELECT 1 FROM test WHERE text LIKE '%something%'): 0.87045907974243s
SELECT EXISTS(SELECT 1 FROM test WHERE text LIKE '%something%' LIMIT 1): 0.044898986816406s
But now, with a BIGINT PK field, where only one entry is equal to '321321':
SELECT * FROM test2 WHERE id ='321321' LIMIT 1 with mysql_num_rows(): 0.0089840888977051s
SELECT count(*) as count FROM test2 WHERE id ='321321': 0.00033879280090332s
SELECT EXISTS(SELECT 1 FROM test2 WHERE id ='321321'): 0.00023889541625977s
SELECT EXISTS(SELECT 1 FROM test2 WHERE id ='321321' LIMIT 1): 0.00020313262939453s (fastest)

A short example of @ChrisThompson's answer
Example:
mysql> SELECT * FROM table_1;
+----+--------+
| id | col1 |
+----+--------+
| 1 | foo |
| 2 | bar |
| 3 | foobar |
+----+--------+
3 rows in set (0.00 sec)
mysql> SELECT EXISTS(SELECT 1 FROM table_1 WHERE id = 1);
+--------------------------------------------+
| EXISTS(SELECT 1 FROM table_1 WHERE id = 1) |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT EXISTS(SELECT 1 FROM table_1 WHERE id = 9);
+--------------------------------------------+
| EXISTS(SELECT 1 FROM table_1 WHERE id = 9) |
+--------------------------------------------+
| 0 |
+--------------------------------------------+
1 row in set (0.00 sec)
Using an alias:
mysql> SELECT EXISTS(SELECT 1 FROM table_1 WHERE id = 1) AS mycheck;
+---------+
| mycheck |
+---------+
| 1 |
+---------+
1 row in set (0.00 sec)

In my tests, I got the following timings:
select * from table where condition=value
(1 total, Query took 0.0052 sec)
select exists(select * from table where condition=value)
(1 total, Query took 0.0008 sec)
select count(*) from table where condition=value limit 1
(1 total, Query took 0.0007 sec)
select exists(select * from table where condition=value limit 1)
(1 total, Query took 0.0006 sec)

I feel it is worth pointing out, although it was touched on in the comments, that in this situation:
SELECT 1 FROM my_table WHERE *indexed_condition* LIMIT 1
Is superior to:
SELECT * FROM my_table WHERE *indexed_condition* LIMIT 1
This is because the first query can be satisfied by the index, whereas the second requires a row look up (unless possibly all the table's columns are in the index used).
Adding the LIMIT clause allows the engine to stop after finding any row.
The first query should be comparable to:
SELECT EXISTS(SELECT * FROM my_table WHERE *indexed_condition*)
Which sends the same signals to the engine (1/* makes no difference here), but I'd still write the 1 to reinforce the habit when using EXISTS:
SELECT EXISTS(SELECT 1 FROM my_table WHERE *indexed_condition*)
It may make sense to add the EXISTS wrapping if you require an explicit return when no rows match.
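To see the index-only effect for yourself, you can compare query plans. The sketch below does this with Python's sqlite3 as a stand-in, so the plan text is SQLite's and not MySQL's (in MySQL you would look for "Using index" in the Extra column of EXPLAIN). The table, the `code` and `payload` columns, and the index name are all invented for the example.

```python
import sqlite3

# SQLite stand-in (assumption); MySQL's EXPLAIN output looks different.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, code TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_code ON my_table (code)")


def plan(sql):
    """Return SQLite's query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# SELECT 1 needs nothing beyond the indexed column.
plan_one = plan("SELECT 1 FROM my_table WHERE code = 'x' LIMIT 1")
# SELECT * also needs payload, which lives only in the table row.
plan_star = plan("SELECT * FROM my_table WHERE code = 'x' LIMIT 1")

print(plan_one)
print(plan_star)
```

The `payload` column is there on purpose: without a column outside the index, even SELECT * could be answered from the index alone.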

I suggest not using COUNT, because it makes the database do extra work: it has to count every matching row. Use SELECT 1 instead. It returns 1 if the record is there; otherwise it returns no row (which most client code sees as null), and you can handle that.

At times it is quite handy to get the auto increment primary key (id) of the row if it exists and 0 if it doesn't.
Here's how this can be done in a single query:
SELECT IFNULL(`id`, COUNT(*)) FROM table1 WHERE ...
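A variant of the same idea, sketched with Python's sqlite3 as a stand-in for MySQL. Note that it swaps in a different technique: COALESCE over a scalar subquery instead of IFNULL(id, COUNT(*)), because mixing a plain column with an aggregate can be rejected when MySQL runs with ONLY_FULL_GROUP_BY. The `users` table and `id_or_zero` helper are hypothetical.

```python
import sqlite3

# SQLite stand-in (assumption); table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, email TEXT UNIQUE)"
)
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")


def id_or_zero(conn, email):
    """Hypothetical helper: the row's id if it exists, else 0."""
    # The scalar subquery evaluates to NULL when nothing matches;
    # COALESCE turns that NULL into 0.
    (row_id,) = conn.execute(
        "SELECT COALESCE((SELECT id FROM users WHERE email = ? LIMIT 1), 0)",
        (email,),
    ).fetchone()
    return row_id


print(id_or_zero(conn, "a@example.com"))
print(id_or_zero(conn, "missing@example.com"))
```

This only makes sense when 0 can never be a real id, which holds for auto-increment keys that start at 1.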

A COUNT query is faster, although maybe not noticeably, but as far as getting the desired result, both should be sufficient.

For non-InnoDB tables you could also use the information schema tables:
http://dev.mysql.com/doc/refman/5.1/en/tables-table.html

I'd go with COUNT(1), though mostly as a matter of style. In MySQL, COUNT(*) counts the matching rows and does not check whether any column is non-NULL; only COUNT(expr) skips rows where expr is NULL. InnoDB handles COUNT(*) and COUNT(1) the same way, so there is no performance difference between them, and with the WHERE clause already in place either form does the job.

Or you can insert raw sql part to conditions
so I have
'conditions'=>array('Member.id NOT IN (SELECT Membership.member_id FROM memberships AS Membership)')

COUNT(*) are optimized in MySQL, so the former query is likely to be faster, generally speaking.

Related

Where clause to select rows with only unique values

First, let me describe my problem. I need to ignore all repeated values in my select query. For example, if I have something like this:
| Other columns| THE COLUMN I'm working with |
| ............ | Value 1 |
| ............ | Value 2 |
| ............ | Value 2 |
I'd like to get the result containing only the row with "Value 1"
Now because of the specifics of my task I need to validate it with subquery.
So I've figured out something like this:
NOT EXISTS (SELECT 1 FROM TABLE fpd WHERE fpd.value = fp.value HAVING count(*) > 2)
It works like I want, but I'm aware of it being slow. I've also tried putting 1 instead of 2 in the HAVING comparison, but it just returns zero results. Could you explain where the value 2 comes from?
I would suggest window functions:
select t.*
from (select t.*, count(*) over (partition by value) as cnt
from fpd t
) t
where cnt = 1;
Alternatively, you can use not exists with a primary key:
where not exists (select 1
from fpd fpd2
where fpd2.value = fp.value and
fpd2.primarykey <> fp.primarykey
)
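Here is a small runnable sketch of the NOT EXISTS variant, using Python's sqlite3 as a stand-in database. The `primarykey` column and the three sample rows are assumptions modelled on the example in the question.

```python
import sqlite3

# SQLite stand-in (assumption); sample data mirrors the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fpd (primarykey INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO fpd (primarykey, value) VALUES (?, ?)",
    [(1, "Value 1"), (2, "Value 2"), (3, "Value 2")],
)

# Keep a row only if no OTHER row (different key) shares its value.
unique_rows = conn.execute(
    """
    SELECT fp.primarykey, fp.value
    FROM fpd fp
    WHERE NOT EXISTS (
        SELECT 1
        FROM fpd fpd2
        WHERE fpd2.value = fp.value
          AND fpd2.primarykey <> fp.primarykey
    )
    ORDER BY fp.primarykey
    """
).fetchall()

print(unique_rows)
```

Comparing on the primary key is what excludes the row from matching itself, so no HAVING threshold is needed.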
SELECT DISTINCT myColumn FROM myTable

SQL simplifying an except query

I have a database with around 50 million entries showing the status of a device for a given day, simplified to the form:
id | status
-------------
1 | Off
1 | Off
1 | On
2 | Off
2 | Off
3 | Off
3 | Off
3 | On
...
such that each id is guaranteed to have at least 2 rows with an 'off' status, but doesn't have to have an 'on' status. I'm trying to get a list of only the ids that do not have an 'On' status. For example, in the above data set I'd want a query returned with only '2'
The current query is:
SELECT DISTINCT id FROM table
EXCEPT
SELECT DISTINCT id FROM table WHERE status <> 'Off'
Which seems to work, but it's having to iterate over the entire table twice which ends up taking ~10-12 minutes to run per query. Is there a simpler way to do this with only a single query?
You can use WHERE NOT EXISTS instead:
Select Distinct Id
From Table A
Where Not Exists
(
Select *
From Table B
Where A.Id = B.Id
And B.Status = 'On'
)
I would also recommend looking at the indexes on the Status column. 10-12 minutes to run is excessively long. Even with 50m records, with proper indexing, a query like this shouldn't take longer than a second.
To add an index to the column, you can run this (I'm assuming SQL Server, your syntax may vary):
Create NonClustered Index Ix_YourTable_Status On YourTable (Status Asc);
You can use conditional aggregation.
select id
from table
group by id
having count(case when status='On' then 1 end)=0
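You can also use a self join (an anti-join). The status filter has to go in the ON clause so that ids without an 'On' row survive as NULLs:
SELECT DISTINCT A.Id
FROM Table A
LEFT JOIN Table B ON A.Id=B.Id AND B.Status='On'
WHERE B.Id IS NULL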
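If you want to check that the single-pass alternatives agree, here is a rough sketch using Python's sqlite3 as a stand-in engine with the sample data from the question. The table name `device_status` is invented, and the anti-join form shown keeps the status filter in the ON clause.

```python
import sqlite3

# SQLite stand-in (assumption); device_status is a made-up table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_status (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO device_status (id, status) VALUES (?, ?)",
    [(1, "Off"), (1, "Off"), (1, "On"),
     (2, "Off"), (2, "Off"),
     (3, "Off"), (3, "Off"), (3, "On")],
)

queries = {
    "not_exists": """
        SELECT DISTINCT a.id FROM device_status a
        WHERE NOT EXISTS (
            SELECT 1 FROM device_status b
            WHERE b.id = a.id AND b.status = 'On')
        ORDER BY a.id""",
    "conditional_aggregation": """
        SELECT id FROM device_status
        GROUP BY id
        HAVING COUNT(CASE WHEN status = 'On' THEN 1 END) = 0
        ORDER BY id""",
    "anti_join": """
        SELECT DISTINCT a.id FROM device_status a
        LEFT JOIN device_status b ON a.id = b.id AND b.status = 'On'
        WHERE b.id IS NULL
        ORDER BY a.id""",
}

# Run each variant and collect the ids it returns.
results = {
    name: [row[0] for row in conn.execute(sql)]
    for name, sql in queries.items()
}
print(results)
```

Which form is fastest on 50 million rows depends on the engine and the indexes, so it is worth comparing the actual plans.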

How to efficiently get a value from the last row in bulk on SQL Server

I have a table like so
Id | Type | Value
--------------------
0 | Big | 2
1 | Big | 3
2 | Small | 3
3 | Small | 3
I would like to get a table like this
Type | Last Value
--------------------
Small | 3
Big | 3
How can I do this? I understand there is an SQL Server function called LAST_VALUE(...) OVER (...), but I can't get it to work with GROUP BY.
I've also tried using SELECT MAX(ID) & SELECT TOP 1.. but this seems a bit inefficient since there would be a subquery for each value. The queries take too long when the table has a few million rows in it.
Is there a way to quickly get the last value for these, perhaps using LAST_VALUE?
You can do it using row_number():
select
type,
value
from
(
select
type,
value,
row_number() over (partition by type order by id desc) as RN
from yourtable -- placeholder: the original omitted the table name
) TMP
where RN = 1
Can't test this now since SQL Fiddle doesn't seem to work, but hopefully that's ok.
The most efficient method might be not exists, which uses an anti-join for the underlying operator:
select type, value
from likeso l
where not exists (select 1 from likeso l2 where l2.type = l.type and l2.id > l.id)
For performance, you want an index on likeso(type, id).
I really wonder if there is a more efficient solution, but I use the following query for such needs:
Select Id, Type, Value
From ( Select *, Max (Id) Over (Partition By Type) As LastId
From #Table) T
Where Id = LastId
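As a quick sanity check of the NOT EXISTS approach, here is a sketch with Python's sqlite3 standing in for SQL Server. The table name `likeso` is borrowed from the answer above, the data is from the question, and the index name is made up.

```python
import sqlite3

# SQLite stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE likeso (id INTEGER PRIMARY KEY, type TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO likeso (id, type, value) VALUES (?, ?, ?)",
    [(0, "Big", 2), (1, "Big", 3), (2, "Small", 3), (3, "Small", 3)],
)
# Composite index so the "is there a later row of this type?" probe is cheap.
conn.execute("CREATE INDEX idx_type_id ON likeso (type, id)")

# A row is the last of its type if no row of that type has a larger id.
last_values = conn.execute(
    """
    SELECT l.type, l.value
    FROM likeso l
    WHERE NOT EXISTS (
        SELECT 1 FROM likeso l2
        WHERE l2.type = l.type AND l2.id > l.id
    )
    ORDER BY l.type
    """
).fetchall()

print(last_values)
```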

How to get an array in postgres where the array size is greater than 1

I have a table that looks like this:
val | fkey | num
------------------
1 | 1 | 1
1 | 2 | 1
1 | 3 | 1
2 | 3 | 1
What I would like to do is return a set of rows in which values are grouped by 'val', with an array of fkeys, but only where the array of fkeys is greater than 1. So, in the above example, the return would look something like:
1 | [1,2,3]
I have the following query that aggregates the arrays:
SELECT val, array_agg(fkey)
FROM mytable
GROUP BY val;
But this returns something like:
1 | [1,2,3]
2 | [3]
What would be the best way of doing this? I guess one possibility would be to use my existing query as a subquery, and do a sum / count on that, but that seems inefficient. Any feedback would really help!
Use a HAVING clause to keep only the groups that have more than one fkey:
SELECT val, array_agg(fkey)
FROM mytable
GROUP BY val
Having Count(fkey) > 1
Using the HAVING clause as @Fireblade pointed out is probably more efficient, but you can also leverage subqueries:
SQLFiddle: Subquery
SELECT * FROM (
select val, array_agg(fkey) fkeys
from mytable
group by val
) array_creation
WHERE array_length(fkeys,1) > 1
You could also use the array_length function in the HAVING clause, but again, @Fireblade has used count(), which should be more efficient. Still:
SQLFiddle: Having Clause
SELECT val, array_agg(fkey) fkeys
FROM mytable
GROUP BY val
HAVING array_length(array_agg(fkey),1) > 1
This isn't a total loss, though. Using the array_length in the having can be useful if you want a distinct list of fkeys:
SELECT val, array_agg(DISTINCT fkey) fkeys
There may still be other ways, but this method is more descriptive, which may allow your SQL to be easier to understand when you come back to it, years from now.
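The HAVING filter itself can be tried out without a Postgres server. The sketch below uses Python's sqlite3, which has no array_agg, so group_concat plus a split in Python stands in for the array (that substitution is an assumption of this example, not how Postgres works).

```python
import sqlite3

# SQLite stand-in (assumption): group_concat replaces Postgres array_agg.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (val INTEGER, fkey INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO mytable (val, fkey, num) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (1, 3, 1), (2, 3, 1)],
)

rows = conn.execute(
    """
    SELECT val, group_concat(fkey) AS fkeys
    FROM mytable
    GROUP BY val
    HAVING COUNT(fkey) > 1
    ORDER BY val
    """
).fetchall()

# group_concat returns a comma-separated string; rebuild a sorted list
# because the concatenation order is not guaranteed.
grouped = {val: sorted(int(k) for k in fkeys.split(",")) for val, fkeys in rows}
print(grouped)
```

The group for val = 2 has a single fkey, so HAVING drops it before it is ever returned.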

oracle - getting 1 or 0 records based on the number of occurrences of a non-unique field

I have a table MYTABLE
N_REC | MYFIELD |
1 | foo |
2 | foo |
3 | bar |
where N_REC is the primary key and MYFIELD is a non-unique field.
I need to query this table on MYFIELD and extract the associated N_REC, but only if there is only one occurrence of MYFIELD; otherwise I need no records returned.
So if I go with MYFIELD='bar' I will get 3, if I go with MYFIELD='foo' I will get no records.
I went with the following query
select * from
(
select
n_rec,
( select count(*) from mytable where myfield=my.myfield ) as counter
from mytable my where myfield=?
)
where counter=1
While it gives me the desired result I feel like I'm running the same query twice.
Are there better ways to achieve what I'm doing?
I think that this should do what you want:
SELECT
my_field,
MAX(n_rec)
FROM
My_Table
GROUP BY
my_field
HAVING
COUNT(*) = 1
You might also try the analytic or windowing version of count(*) and compare plans to the other options:
select n_rec, my_field
from (select n_rec, my_field
, count(*) over (partition by my_field) as Counter
from myTable
where my_field = ?)
where Counter = 1
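For a quick check of the GROUP BY / HAVING idea, here is a sketch using Python's sqlite3 as a stand-in for Oracle. It filters on the field first, as in the question; the `single_rec` helper and the lowercase names are assumptions for the example.

```python
import sqlite3

# SQLite stand-in for Oracle (assumption); data mirrors the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (n_rec INTEGER PRIMARY KEY, myfield TEXT)")
conn.executemany(
    "INSERT INTO mytable (n_rec, myfield) VALUES (?, ?)",
    [(1, "foo"), (2, "foo"), (3, "bar")],
)


def single_rec(conn, value):
    """Hypothetical helper: [n_rec] if value occurs exactly once, else []."""
    rows = conn.execute(
        """
        SELECT MAX(n_rec)
        FROM mytable
        WHERE myfield = ?
        GROUP BY myfield
        HAVING COUNT(*) = 1
        """,
        (value,),
    ).fetchall()
    return [row[0] for row in rows]


print(single_rec(conn, "bar"))
print(single_rec(conn, "foo"))
```

With exactly one row in the group, MAX(n_rec) is simply that row's key, so the table is only read once.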