Is there any difference between COUNT('') and COUNT(*) and COUNT(1) and COUNT(ColumnName)? What approach is faster?
Count(ColumnName) is influenced by the value of the column. The other variants do effectively the same.
Count(*) is slower in some databases (MySQL amongst others), because it retrieves all fields while it doesn't have to. That's whay often 'x' or 1 is used to be safe. SQL Server and Oracle are somewhat smarter and don't retrieve field values if they don't have to.
Note that '' equals NULL on Oracle (yes it does!), which may have an undesired effect there. Not a problem for SQL Server, but you can use 1 to be safe.
COUNT(''), COUNT(1) and COUNT(*) will return the same result. COUNT(ColumnName) might return a different value, because COUNT only counts non-null values.
Performance-wise they should be equivalent, at least on SQL-Server.
Related
Is there any difference between count(*) and count(column) in dolphindb?
What's the difference in their performance?
Although this is not a duplicate, the difference is explained in count(*) versus count(column) in dolphindb
count(*) will count ALL rows, but count(column) will only count the records that have a non-null value in the specified column.
In terms of performance, count(*) should perform better, because it does not need to evaluate the values in each row, but ultimately the SQL count() function expression will execute the rowCount() function and there is no mention of special handling for count(*) or that the syntax is even supported.
In this instance if you have a large enough table you should be able to observe a difference and prove this for yourself. If you run two variants (reference a unique column that does not have any null values for the column name version) The count(*) should be faster
In terms of general SQL Count() there is a good explanation here
There is the perception that in some older SQL databases there were performance gains if you specifyied an arbitrary value instead of *, as in Count(1), true or not, modern RDBMS implementations will not try to evaluate that all columns are not null, they will evaluates count(*) as a special case to mean "count all rows."
This behaviour between 1 and * has less relevance in DolphinDb because the SQL command syntax is only a wrapper to the internal rowCount() function that will accept one or multiple vectors, tuples of vectors, matrices or tables as arguments.
COALESCE in Postgres is a function that returns the first parameter not null.
So I used coalesce in subqueries like:
SELECT COALESCE (
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...),
( SELECT * FROM users WHERE... ORDER BY ...)
);
I change the where in any query and they contain lots of params and CASE, also different ORDER BY clauses.
This is because I always want to return something but giving priorities.
What I noticed while issuing EXPLAIN ANALYZE is that any query is executed despite the first one actually returns NOT a null value.
I would expect the engine to run only the first one query and not the following ones if it returns not null.
This way I could have a bad performance.
So am I doing any bad practice and is it better to run the queries separately for performance reason?
EDIT:
Sorry you where right I don’t select * but I select only one column. I didn’t post my code because I am not interested in my query but it’s a generic question to understand how the engine is working. So I reproduce a very simple fiddle here http://sqlfiddle.com/#!17/a8aa7/4
I may be wrong but I think it behaves as I was telling: it runs all the subqueries despite the first one already returns a not null value
EDIT 2: ok I read only now it says never executed. So the other two queries aren’t getting executed. What confused me was the fact they were included in the query plan.
Anyways it’s still important for my question. Is it better to run all the queries separately for performance reasons? Because it seems like that even if the first one returns a not null value the other two subqueries can slow down the performance
For separate SELECT queries, I suggest to use UNION ALL / LIMIT 1 instead. Based on your fiddle:
(select user_id from users order by age limit 1) -- misleading example, see below
UNION ALL
(select user_id from users where user_id=1)
UNION ALL
(select user_id from users order by user_id DESC limit 1)
LIMIT 1;
db<>fiddle here
For three reasons:
Works for any SELECT list: single expressions (your fiddle), multiple or whole row (your example in the question).
You can distinguish actual NULL values from "no row". Since user_id is the PK in the example (and hence, NOT NULL), the problem cannot surface in the example. But with an expression that can be NULL, COALESCE cannot distinguish between both, "no row" is coerced to NULL for the purpose of the query. See:
Return a value if no record is found
Faster.
Aside, your first SELECT in the example makes this a wild-goose chase. It returns a row if there is at least one. The rest is noise in this case.
Related:
PostgreSQL combine multiple select statements
SQL - does order of OR conditions matter?
Way to try multiple SELECTs till a result is available?
I need to write a query to fetch some data from table and need to use concat method. I end up preparing two different queries and I'm not sure which one will have better performance when we have huge amounts of data.
Please help me understand which query will perform better, and why.
I need to concat twice, one for displaying and one for where condition:
select
id, concat(boolean_value, long_value, string_value, double_value) as Value
from
table
where
concat(boolean_value, long_value, string_value, double_value) = 'XX'
Write the above query as a subquery and add the where condition
Select *
from
(select
id, concat(boolean_value, long_value, string_value, double_value) as Value
from
table) as output
where
Value = 'XX'
Note: this is sample query and the actual query will have multiple joins and the need to concat from multiple columns of different tables
SQL databases represent the result set being produced. You seem to be asking if there is common expression elimination -- that is, will the concat() be performed exactly once.
You have a better chance of this with the subquery than with the version that repeat the expression. But SQL Server could be smart enough to only evaluate it once.
If you want to guarantee single evaluation, then I think cross apply does that:
select t.id, v.value
from table t cross apply
(values (concat( boolean_value, long_value, string_value, double_value))
) as Value
where v.value = 'XX';
I would, however, question why you are comparing the result of the concat() rather than the base columns. Comparing the base columns would allow the optimizer to take advantage of indexes, partitions, and statistics.
You may run EXPLAIN on both queries to see what exactly is on the mind of SQL Server. I would expect that the first version would be preferable, because, unlike the second version, it does not force SQL Server to materialize an intermediate table. Regarding whether or not SQL Server would have to actually evaluate CONCAT twice in the first version, it may not even matter if the cost of the subquery be very high.
In my SQL query I just need to check whether data exists for a particular userid.
I always only want one row that will be returned when data exist.
I have two options
1. select count(columnname) from table where userid=:userid
2. select count(1) from tablename where userid=:userid
I am thinking second one is the one I should use because it may have a better response time as compared with first one.
There can be differences between count(*) and count(column). count(*) is often fastest for reasons discussed here. Basically, with count(column) the database has to check if column is null or not in each row. With count(column) it just returns the total number of rows in the table which is probably has on hand. The exact details may depend on the database and the version of the database.
Short answer: use count(*) or count(1). Hell, forget the count and select userid.
You should also make sure the where clause is performing well and that its using an index. Look into EXPLAIN.
I'd like to point out that this:
select count(*) from tablename where userid=:userid
has the same effect as your second solution, with th advantage that count(*) it unambigously means "count all rows".
The * in COUNT(*) will not expand into all columns - that is to say, the * in SELECT COUNT(*) is not the same as in SELECT *. So you need not worry about performance when writing COUNT(*)
The disadvantage of writing COUNT(1) is that it is less clear: what did you mean? A literal one (1) may look like a lower case L (this: l) in some fonts.
Will give different results if columnname can be NULL, otherwise identical performance.
The optimiser (SQL Server at least) realises COUNT(1) is trivial. You can also use COUNT(1/0)
It depends what you want to do.
The first one counts rows with non-null values of columnname. The second one counts ALL rows.
Which behaviour do you want? From the way your question is worded, I guess that you want the second one.
To count the number of records you should use the second option, or rather:
select count(*) from tablename where userid=:userid
You could also use the exists() function:
select case when exists(select * from tablename where userid=:userid) then 1 else 0 end
It might be possible for the database to do the latter more efficiently in some cases, as it can stop looking as soon as a match is found instead of comparing all records.
Hey how about Select count(userid) from tablename where userid=:userid ? That way the query looks more friendly.
This question already has answers here:
In SQL, what's the difference between count(column) and count(*)?
(12 answers)
Closed 9 years ago.
I have the following code:
SELECT <column>, count(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;
Is there any difference to the results or performance if I replace the COUNT(*) with COUNT('x')?
(This question is related to a previous one)
To say that SELECT COUNT(*) vs COUNT(1) results in your DBMS returning "columns" is pure bunk. That may have been the case long, long ago but any self-respecting query optimizer will choose some fast method to count the rows in the table - there is NO performance difference between SELECT COUNT(*), COUNT(1), COUNT('this is a silly conversation')
Moreover, SELECT(1) vs SELECT(*) will NOT have any difference in INDEX usage -- most DBMS will actually optimize SELECT( n ) into SELECT(*) anyway. See the ASK TOM: Oracle has been optimizing SELECT(n) into SELECT(*) for the better part of a decade, if not longer:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1156151916789
problem is in count(col) to count()
conversion
**03/23/00 05:46 pm *** one workaround is to set event 10122 to
turn off count(col) ->count()
optimization. Another work around is
to change the count(col) to count(),
it means the same, when the col has a
NOT NULL constraint. The bug number is
1215372.
One thing to note - if you are using COUNT(col) (don't!) and col is marked NULL, then it will actually have to count the number of occurrences in the table (either via index scan, histogram, etc. if they exist, or a full table scan otherwise).
Bottom line: if what you want is the count of rows in a table, use COUNT(*)
The major performance difference is that COUNT(*) can be satisfied by examining the primary key on the table.
i.e. in the simple case below, the query will return immediately, without needing to examine any rows.
select count(*) from table
I'm not sure if the query optimizer in SQL Server will do so, but in the example above, if the column you are grouping on has an index the server should be able to satisfy the query without hitting the actual table at all.
To clarify: this answer refers specifically to SQL Server. I don't know how other DBMS products handle this.
This question is slightly different that the other referenced. In the referenced question, it was asked what the difference was when using count(*) and count(SomeColumnName), and SQLMenace's answer was spot on.
To address this question, essentially there is no difference in the result. Both count(*) and count('x') and say count(1) will return the same number. The difference is that when using " * " just like in a SELECT all columns are returned, then counted. When a constant is used (e.g. 'x' or 1) then a row with one column is returned and then counted. The performance difference would be seen when " * " returns many columns.
Update: The above statement about performance is probably not quite right as discussed in other answers, but does apply to subselect queries when using EXISTS and NOT EXISTS
MySQL: According to the MySQL website, COUNT(*) is faster for single table queries when using MyISAM:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_count
I'm guessing with a having clause with a count in it may change things.