In SQL is there a difference between count(*) and count(<fieldname>) - sql

Pretty self explanatory question. Is there any reason to use one or the other?

Count(*) counts all records, including nulls, whereas Count(fieldname) does not include nulls.

Select count(*) selects any row, select count(field) selects rows where this field is not null.

If you want to improve performance (i.e. be a complete performance Nazi), you might want to do neither.
Example:
SELECT COUNT(1) FROM MyTable WHERE ...

This puzzled me for a while too.
In MySQL at least COUNT(*) counts the number of rows where every (*) value in the row is not null. Just COUNTing a column will count the number of rows where that column is not null.
In terms of performance using a single column would be slightly faster,

count(*) is faster if table type is MyISAM with no WHERE statement. With WHERE the speed will be the same for MyISAM and InnoDB.

Related

Most efficient way of getting mariaDB/SQL record count

A question on a simple SQL statement, but one which I sometimes wonder over. Thought I'd see if anyone knew the answer to.
When counting the records in a table using a simple SQL statement, which has the least overheard:
1) SELECT COUNT(single_primary_field) FROM table, i.e. SELECT COUNT(user_ID) FROM users;
2) SELECT COUNT(*) FROM table
I initially thought the first may be quickest. But perhaps not having a specific field to associate with makes the second quicker?
Probably makes very little difference speed wise either way.
Thanks
COUNT(column) counts only selected column and ignore the null values.
COUNT(*) count rows and don't care values in the columns.
Using COUNT(*) is a better way for counting rows.
Count(*) is most efficent way to count according to mysql:
Count
Have a read through https://mariadb.com/kb/en/library/explain/ to look at what type of indexing your query is using, it usually hints at its performance.
I think Count(*) is going to be the fastest because maria does store a running count.

What is the most efficient way to count rows in a table in SQLite?

I've always just used "SELECT COUNT(1) FROM X" but perhaps this is not the most efficient. Any thoughts? Other options include SELECT COUNT(*) or perhaps getting the last inserted id if it is auto-incremented (and never deleted).
How about if I just want to know if there is anything in the table at all? (e.g., count > 0?)
The best way is to make sure that you run SELECT COUNT on a single column (SELECT COUNT(*) is slower) - but SELECT COUNT will always be the fastest way to get a count of things (the database optimizes the query internally).
If you check out the comments below, you can see arguments for why SELECT COUNT(1) is probably your best option.
To follow up on girasquid's answer, as a data point, I have a sqlite table with 2.3 million rows. Using select count(*) from table, it took over 3 seconds to count the rows. I also tried using SELECT rowid FROM table, (thinking that rowid is a default primary indexed key) but that was no faster. Then I made an index on one of the fields in the database (just an arbitrary field, but I chose an integer field because I knew from past experience that indexes on short fields can be very fast, I think because the index is stored a copy of the value in the index itself). SELECT my_short_field FROM table brought the time down to less than a second.
If you are sure (really sure) that you've never deleted any row from that table and your table has not been defined with the WITHOUT ROWID optimization you can have the number of rows by calling:
select max(RowId) from table;
Or if your table is a circular queue you could use something like
select MaxRowId - MinRowId + 1 from
(select max(RowId) as MaxRowId from table) JOIN
(select min(RowId) as MinRowId from table);
This is really really fast (milliseconds), but you must pay attention because sqlite says that row id is unique among all rows in the same table. SQLite does not declare that the row ids are and will be always consecutive numbers.
The fastest way to get row counts is directly from the table metadata, if any. Unfortunately, I can't find a reference for this kind of data being available in SQLite.
Failing that, any query of the type
SELECT COUNT(non-NULL constant value) FROM table
should optimize to avoid the need for a table, or even an index, scan. Ideally the engine will simply return the current number of rows known to be in the table from internal metadata. Failing that, it simply needs to know the number of entries in the index of any non-NULL column (the primary key index being the first place to look).
As soon as you introduce a column into the SELECT COUNT you are asking the engine to perform at least an index scan and possibly a table scan, and that will be slower.
I do not believe you will find a special method for this. However, you could do your select count on the primary key to be a little bit faster.
sp_spaceused 'table_name' (exclude single quote)
this will return the number of rows in the above table, this is the most efficient way i have come across yet.
it's more efficient than select Count(1) from 'table_name' (exclude single quote)
sp_spaceused can be used for any table, it's very helpful when the table is exceptionally big (hundreds of millions of rows), returns number of rows right a way, whereas 'select Count(1)' might take more than 10 seconds. Moreover, it does not need any column names/key field to consider.

The faster of two SQL queries, sort and select top 1, or select MAX

Which one is faster of the following two queries?
1
SELECT TOP 1 order_date
FROM orders WITH (NOLOCK)
WHERE customer_id = 9999999
ORDER BY order_date DESC
2
SELECT MAX(order_date)
FROM orders WITH (NOLOCK)
WHERE customer_id = 9999999
With an index on order_date, they are of same performance.
Without an index, MAX is a little bit faster, since it will use Stream Aggregation rather than Top N Sort.
ORDER BY is almost always slowest. The table's data must be sorted.
Aggregate functions shouldn't slow things down as much as a sort.
However, some aggregate functions use a sort as part of their implementation. So a particular set of tables in a particular database product must be tested experimentally to see which is faster -- for that set of data.
I would say the first, because the second requires it to go through an aggregate function.
However, as marc_s said, test it.
You can't answer that without knowing at least something about your indexes (have you got one on customer_id, on order_date?), the amount of data in the table, the state of your statistics etc.etc.
it depends on how big your table is. On the second query, i guess there's no need for "TOP 1," since MAX only returns one value. If I were you, I would use the second query.

SQL Query: Which one should i use? count("columnname") or count(1)

In my SQL query I just need to check whether data exists for a particular userid.
I always only want one row that will be returned when data exist.
I have two options
1. select count(columnname) from table where userid=:userid
2. select count(1) from tablename where userid=:userid
I am thinking second one is the one I should use because it may have a better response time as compared with first one.
There can be differences between count(*) and count(column). count(*) is often fastest for reasons discussed here. Basically, with count(column) the database has to check if column is null or not in each row. With count(column) it just returns the total number of rows in the table which is probably has on hand. The exact details may depend on the database and the version of the database.
Short answer: use count(*) or count(1). Hell, forget the count and select userid.
You should also make sure the where clause is performing well and that its using an index. Look into EXPLAIN.
I'd like to point out that this:
select count(*) from tablename where userid=:userid
has the same effect as your second solution, with th advantage that count(*) it unambigously means "count all rows".
The * in COUNT(*) will not expand into all columns - that is to say, the * in SELECT COUNT(*) is not the same as in SELECT *. So you need not worry about performance when writing COUNT(*)
The disadvantage of writing COUNT(1) is that it is less clear: what did you mean? A literal one (1) may look like a lower case L (this: l) in some fonts.
Will give different results if columnname can be NULL, otherwise identical performance.
The optimiser (SQL Server at least) realises COUNT(1) is trivial. You can also use COUNT(1/0)
It depends what you want to do.
The first one counts rows with non-null values of columnname. The second one counts ALL rows.
Which behaviour do you want? From the way your question is worded, I guess that you want the second one.
To count the number of records you should use the second option, or rather:
select count(*) from tablename where userid=:userid
You could also use the exists() function:
select case when exists(select * from tablename where userid=:userid) then 1 else 0 end
It might be possible for the database to do the latter more efficiently in some cases, as it can stop looking as soon as a match is found instead of comparing all records.
Hey how about Select count(userid) from tablename where userid=:userid ? That way the query looks more friendly.

In SQL, what’s the difference between count(*) and count('x')? [duplicate]

This question already has answers here:
In SQL, what's the difference between count(column) and count(*)?
(12 answers)
Closed 9 years ago.
I have the following code:
SELECT <column>, count(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;
Is there any difference to the results or performance if I replace the COUNT(*) with COUNT('x')?
(This question is related to a previous one)
To say that SELECT COUNT(*) vs COUNT(1) results in your DBMS returning "columns" is pure bunk. That may have been the case long, long ago but any self-respecting query optimizer will choose some fast method to count the rows in the table - there is NO performance difference between SELECT COUNT(*), COUNT(1), COUNT('this is a silly conversation')
Moreover, SELECT(1) vs SELECT(*) will NOT have any difference in INDEX usage -- most DBMS will actually optimize SELECT( n ) into SELECT(*) anyway. See the ASK TOM: Oracle has been optimizing SELECT(n) into SELECT(*) for the better part of a decade, if not longer:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1156151916789
problem is in count(col) to count()
conversion
**03/23/00 05:46 pm *** one workaround is to set event 10122 to
turn off count(col) ->count()
optimization. Another work around is
to change the count(col) to count(),
it means the same, when the col has a
NOT NULL constraint. The bug number is
1215372.
One thing to note - if you are using COUNT(col) (don't!) and col is marked NULL, then it will actually have to count the number of occurrences in the table (either via index scan, histogram, etc. if they exist, or a full table scan otherwise).
Bottom line: if what you want is the count of rows in a table, use COUNT(*)
The major performance difference is that COUNT(*) can be satisfied by examining the primary key on the table.
i.e. in the simple case below, the query will return immediately, without needing to examine any rows.
select count(*) from table
I'm not sure if the query optimizer in SQL Server will do so, but in the example above, if the column you are grouping on has an index the server should be able to satisfy the query without hitting the actual table at all.
To clarify: this answer refers specifically to SQL Server. I don't know how other DBMS products handle this.
This question is slightly different that the other referenced. In the referenced question, it was asked what the difference was when using count(*) and count(SomeColumnName), and SQLMenace's answer was spot on.
To address this question, essentially there is no difference in the result. Both count(*) and count('x') and say count(1) will return the same number. The difference is that when using " * " just like in a SELECT all columns are returned, then counted. When a constant is used (e.g. 'x' or 1) then a row with one column is returned and then counted. The performance difference would be seen when " * " returns many columns.
Update: The above statement about performance is probably not quite right as discussed in other answers, but does apply to subselect queries when using EXISTS and NOT EXISTS
MySQL: According to the MySQL website, COUNT(*) is faster for single table queries when using MyISAM:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_count
I'm guessing with a having clause with a count in it may change things.