SQL Query: Which one should i use? count("columnname") or count(1) - sql

In my SQL query I just need to check whether data exists for a particular userid.
I always only want one row that will be returned when data exist.
I have two options
1. select count(columnname) from table where userid=:userid
2. select count(1) from tablename where userid=:userid
I am thinking second one is the one I should use because it may have a better response time as compared with first one.

There can be differences between count(*) and count(column). count(*) is often fastest for reasons discussed here. Basically, with count(column) the database has to check if column is null or not in each row. With count(column) it just returns the total number of rows in the table which is probably has on hand. The exact details may depend on the database and the version of the database.
Short answer: use count(*) or count(1). Hell, forget the count and select userid.
You should also make sure the where clause is performing well and that its using an index. Look into EXPLAIN.

I'd like to point out that this:
select count(*) from tablename where userid=:userid
has the same effect as your second solution, with th advantage that count(*) it unambigously means "count all rows".
The * in COUNT(*) will not expand into all columns - that is to say, the * in SELECT COUNT(*) is not the same as in SELECT *. So you need not worry about performance when writing COUNT(*)
The disadvantage of writing COUNT(1) is that it is less clear: what did you mean? A literal one (1) may look like a lower case L (this: l) in some fonts.

Will give different results if columnname can be NULL, otherwise identical performance.
The optimiser (SQL Server at least) realises COUNT(1) is trivial. You can also use COUNT(1/0)

It depends what you want to do.
The first one counts rows with non-null values of columnname. The second one counts ALL rows.
Which behaviour do you want? From the way your question is worded, I guess that you want the second one.

To count the number of records you should use the second option, or rather:
select count(*) from tablename where userid=:userid
You could also use the exists() function:
select case when exists(select * from tablename where userid=:userid) then 1 else 0 end
It might be possible for the database to do the latter more efficiently in some cases, as it can stop looking as soon as a match is found instead of comparing all records.

Hey how about Select count(userid) from tablename where userid=:userid ? That way the query looks more friendly.

Related

Postgres all subqueries in coalesce executed

COALESCE in Postgres is a function that returns the first parameter not null.
So I used coalesce in subqueries like:
SELECT COALESCE (
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...)
);
I change the where in any query and they contain lots of params and CASE, also different ORDER BY clauses.
This is because I always want to return something but giving priorities.
What I noticed while issuing EXPLAIN ANALYZE is that any query is executed despite the first one actually returns NOT a null value.
I would expect the engine to run only the first one query and not the following ones if it returns not null.
This way I could have a bad performance.
So am I doing any bad practice and is it better to run the queries separately for performance reason?
EDIT:
Sorry you where right I don’t select * but I select only one column. I didn’t post my code because I am not interested in my query but it’s a generic question to understand how the engine is working. So I reproduce a very simple fiddle here http://sqlfiddle.com/#!17/a8aa7/4
I may be wrong but I think it behaves as I was telling: it runs all the subqueries despite the first one already returns a not null value
EDIT 2: ok I read only now it says never executed. So the other two queries aren’t getting executed. What confused me was the fact they were included in the query plan.
Anyways it’s still important for my question. Is it better to run all the queries separately for performance reasons? Because it seems like that even if the first one returns a not null value the other two subqueries can slow down the performance
For separate SELECT queries, I suggest to use UNION ALL / LIMIT 1 instead. Based on your fiddle:
(select user_id from users order by age limit 1) -- misleading example, see below
UNION ALL
(select user_id from users where user_id=1)
UNION ALL
(select user_id from users order by user_id DESC limit 1)
LIMIT 1;
db<>fiddle here
For three reasons:
Works for any SELECT list: single expressions (your fiddle), multiple or whole row (your example in the question).
You can distinguish actual NULL values from "no row". Since user_id is the PK in the example (and hence, NOT NULL), the problem cannot surface in the example. But with an expression that can be NULL, COALESCE cannot distinguish between both, "no row" is coerced to NULL for the purpose of the query. See:
Return a value if no record is found
Faster.
Aside, your first SELECT in the example makes this a wild-goose chase. It returns a row if there is at least one. The rest is noise in this case.
Related:
PostgreSQL combine multiple select statements
SQL - does order of OR conditions matter?
Way to try multiple SELECTs till a result is available?

Most efficient way of getting mariaDB/SQL record count

A question on a simple SQL statement, but one which I sometimes wonder over. Thought I'd see if anyone knew the answer to.
When counting the records in a table using a simple SQL statement, which has the least overheard:
1) SELECT COUNT(single_primary_field) FROM table, i.e. SELECT COUNT(user_ID) FROM users;
2) SELECT COUNT(*) FROM table
I initially thought the first may be quickest. But perhaps not having a specific field to associate with makes the second quicker?
Probably makes very little difference speed wise either way.
Thanks
COUNT(column) counts only selected column and ignore the null values.
COUNT(*) count rows and don't care values in the columns.
Using COUNT(*) is a better way for counting rows.
Count(*) is most efficent way to count according to mysql:
Count
Have a read through https://mariadb.com/kb/en/library/explain/ to look at what type of indexing your query is using, it usually hints at its performance.
I think Count(*) is going to be the fastest because maria does store a running count.

Getting additional info on the result of a SQL max query

Say I want to do this with SQL (Sybase): Find all fields of the record with the latest timestamp.
One way to write that is like this:
select * from data where timestamp = (select max(timestamp) from data)
This is a bit silly because it causes two queries - first to find the max timestamp, and then to find all the data for that timestamp (assume it's unique, and yes - i do have an index on timestamp). More so it just seems unnecessary because max() has already found the row that I am interested in so looking for it again is wasteful.
Is there a way to directly access fields of the row that max() returns?
Edit: All answers I see are basically clever hacks - I was looking for a syntactic way of doing something like max(field1).field2 to access field2 of the row with max field1
SELECT TOP 1 * from data ORDER BY timestamp DESC
No, using an aggregate means that you are automatically grouping, so there isn't a single row to get data from even if the group happens to contain a single row.
You can order by the field and get the first row:
set rowcount 1
select * from data order by timestamp desc
(Note that you shouldn't use select *, but rather specify the fields that you want from the query. That makes the query less sensetive to changes in the database layout.)
Can you try this
SELECT TOP 1 *
FROm data
ORDER BY timestamp DESC
You're making assumptions about how Sybase optimizes queries. For all you know, it may do precisely what you want it to do - it may notice both queries are from "data" and that the condition is "where =", and may optimize as you suggest.
I know in the case of SQL Server, it's possible to configure indexes to include fields from the indexed row. Doing a select through such an index leaves those fields available.
This is SQL server, but you'll get the idea.
SELECT TOP(1) * FROM data
ORDER BY timestamp DESC;

In SQL is there a difference between count(*) and count(<fieldname>)

Pretty self explanatory question. Is there any reason to use one or the other?
Count(*) counts all records, including nulls, whereas Count(fieldname) does not include nulls.
Select count(*) selects any row, select count(field) selects rows where this field is not null.
If you want to improve performance (i.e. be a complete performance Nazi), you might want to do neither.
Example:
SELECT COUNT(1) FROM MyTable WHERE ...
This puzzled me for a while too.
In MySQL at least COUNT(*) counts the number of rows where every (*) value in the row is not null. Just COUNTing a column will count the number of rows where that column is not null.
In terms of performance using a single column would be slightly faster,
count(*) is faster if table type is MyISAM with no WHERE statement. With WHERE the speed will be the same for MyISAM and InnoDB.

In SQL, what’s the difference between count(*) and count('x')? [duplicate]

This question already has answers here:
In SQL, what's the difference between count(column) and count(*)?
(12 answers)
Closed 9 years ago.
I have the following code:
SELECT <column>, count(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;
Is there any difference to the results or performance if I replace the COUNT(*) with COUNT('x')?
(This question is related to a previous one)
To say that SELECT COUNT(*) vs COUNT(1) results in your DBMS returning "columns" is pure bunk. That may have been the case long, long ago but any self-respecting query optimizer will choose some fast method to count the rows in the table - there is NO performance difference between SELECT COUNT(*), COUNT(1), COUNT('this is a silly conversation')
Moreover, SELECT(1) vs SELECT(*) will NOT have any difference in INDEX usage -- most DBMS will actually optimize SELECT( n ) into SELECT(*) anyway. See the ASK TOM: Oracle has been optimizing SELECT(n) into SELECT(*) for the better part of a decade, if not longer:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1156151916789
problem is in count(col) to count()
conversion
**03/23/00 05:46 pm *** one workaround is to set event 10122 to
turn off count(col) ->count()
optimization. Another work around is
to change the count(col) to count(),
it means the same, when the col has a
NOT NULL constraint. The bug number is
1215372.
One thing to note - if you are using COUNT(col) (don't!) and col is marked NULL, then it will actually have to count the number of occurrences in the table (either via index scan, histogram, etc. if they exist, or a full table scan otherwise).
Bottom line: if what you want is the count of rows in a table, use COUNT(*)
The major performance difference is that COUNT(*) can be satisfied by examining the primary key on the table.
i.e. in the simple case below, the query will return immediately, without needing to examine any rows.
select count(*) from table
I'm not sure if the query optimizer in SQL Server will do so, but in the example above, if the column you are grouping on has an index the server should be able to satisfy the query without hitting the actual table at all.
To clarify: this answer refers specifically to SQL Server. I don't know how other DBMS products handle this.
This question is slightly different that the other referenced. In the referenced question, it was asked what the difference was when using count(*) and count(SomeColumnName), and SQLMenace's answer was spot on.
To address this question, essentially there is no difference in the result. Both count(*) and count('x') and say count(1) will return the same number. The difference is that when using " * " just like in a SELECT all columns are returned, then counted. When a constant is used (e.g. 'x' or 1) then a row with one column is returned and then counted. The performance difference would be seen when " * " returns many columns.
Update: The above statement about performance is probably not quite right as discussed in other answers, but does apply to subselect queries when using EXISTS and NOT EXISTS
MySQL: According to the MySQL website, COUNT(*) is faster for single table queries when using MyISAM:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_count
I'm guessing with a having clause with a count in it may change things.