Most efficient way of getting mariaDB/SQL record count - sql

A question on a simple SQL statement, but one which I sometimes wonder over. Thought I'd see if anyone knew the answer to.
When counting the records in a table using a simple SQL statement, which has the least overheard:
1) SELECT COUNT(single_primary_field) FROM table, i.e. SELECT COUNT(user_ID) FROM users;
2) SELECT COUNT(*) FROM table
I initially thought the first may be quickest. But perhaps not having a specific field to associate with makes the second quicker?
Probably makes very little difference speed wise either way.
Thanks

COUNT(column) counts only selected column and ignore the null values.
COUNT(*) count rows and don't care values in the columns.
Using COUNT(*) is a better way for counting rows.

Count(*) is most efficent way to count according to mysql:
Count

Have a read through https://mariadb.com/kb/en/library/explain/ to look at what type of indexing your query is using, it usually hints at its performance.
I think Count(*) is going to be the fastest because maria does store a running count.

Related

using SQL COUNT function or executing search query directly which is more efficient

Let's say i have a very big database , if i execute a search query directly then count the returned rows would it be more faster ? Or using COUNT(searchquery) then start executing query like ->
SELECT *
FROM TABLE
WHERE bla='blabla'
OFFSET 0 FETCH NEXT 20 ROWS ONLY
I searched for it but i couldn't find any solution.
Do the count in the database! It will be much faster.
First, a count(*) only returns one row and one value. That is much, much less data -- and much faster -- than returning all the rows.
Second, a count(*) does not reference any columns in the select, so the query can be better optimized. It might be possible to get the count without ever looking at the data pages.
It looks like you are doing paging. You need the total count to do display the total count and calculate the total number of pages to the user, yes?
Than Gordon's answer is the one to use.

SQL Query: Which one should i use? count("columnname") or count(1)

In my SQL query I just need to check whether data exists for a particular userid.
I always only want one row that will be returned when data exist.
I have two options
1. select count(columnname) from table where userid=:userid
2. select count(1) from tablename where userid=:userid
I am thinking second one is the one I should use because it may have a better response time as compared with first one.
There can be differences between count(*) and count(column). count(*) is often fastest for reasons discussed here. Basically, with count(column) the database has to check if column is null or not in each row. With count(column) it just returns the total number of rows in the table which is probably has on hand. The exact details may depend on the database and the version of the database.
Short answer: use count(*) or count(1). Hell, forget the count and select userid.
You should also make sure the where clause is performing well and that its using an index. Look into EXPLAIN.
I'd like to point out that this:
select count(*) from tablename where userid=:userid
has the same effect as your second solution, with th advantage that count(*) it unambigously means "count all rows".
The * in COUNT(*) will not expand into all columns - that is to say, the * in SELECT COUNT(*) is not the same as in SELECT *. So you need not worry about performance when writing COUNT(*)
The disadvantage of writing COUNT(1) is that it is less clear: what did you mean? A literal one (1) may look like a lower case L (this: l) in some fonts.
Will give different results if columnname can be NULL, otherwise identical performance.
The optimiser (SQL Server at least) realises COUNT(1) is trivial. You can also use COUNT(1/0)
It depends what you want to do.
The first one counts rows with non-null values of columnname. The second one counts ALL rows.
Which behaviour do you want? From the way your question is worded, I guess that you want the second one.
To count the number of records you should use the second option, or rather:
select count(*) from tablename where userid=:userid
You could also use the exists() function:
select case when exists(select * from tablename where userid=:userid) then 1 else 0 end
It might be possible for the database to do the latter more efficiently in some cases, as it can stop looking as soon as a match is found instead of comparing all records.
Hey how about Select count(userid) from tablename where userid=:userid ? That way the query looks more friendly.

Multiple SQL searches vs searching through one returned array

Is it faster to do multiple SQL finds on one table with different conditions or to get all the items from the table in an array and then separate them out that way? I realize I'm not explaining my question that well, so here is an example:
I want to pull records on posts and display them in categories based on when they were posted, say within one year, within one month, one week, etc. The nature of the categories results in lower level categories being entirely contained within upper level ones.
Should I do a SQL find with different conditions for each category, resulting in multiple calls to the database, or should I do one search, returning all of the items and then sort them out from the array? Thanks for your responses, sorry I'm new at this.
Typically I would say that you are going to get better performance by letting your database engine do the sorting work. Each database engine has this functionality and typically it can do it faster than you can.
So I would vote to use the database to get your multiple groups rather than trying to do it yourself in memory.
I typically perform one large sql query and then break the array up in ruby to minimize the number or duration of database connections.
This isn't necessarily any faster, and I have never benchmarked it, but less reads to the db hopefully means it will scale longer.
Edit: Nevermind, I didn't quite understand the question. Just let SQL perform the ordering for you in a convenient fashion and then process the array yourself.
You can probably make it even easier if you let your SELECT statement generate helper columns to say which categories (e.g., based on the date) a record belongs to.
The simplest, and easiest to understand would be to perform multiple queries for each criteria, and then form each of those result sets into a group. I don't think you want to start traversing result sets and duplicating rows.
If you really want to do it in one query, you could try a UNION query.
SELECT *, 1 as group from posts WHERE date > '2009-07-24 00:00:00' ORDER BY date DESC
UNION ALL
SELECT *, 2 as group from posts WHERE date > '2009-07-17 00:00:00' ORDER BY date DESC
UNION ALL
SELECT *, 3 as group from posts WHERE date > '2009-06-24 00:00:00' ORDER BY date DESC
UNION ALL
SELECT *, 4 as group from posts WHERE date > '2008-07-24 00:00:00' ORDER BY date DESC
After that you just need to traverse the list once, and filter into new lists by the "group" column.
It depends. If you're using OR operators in your procedures, then things could get kind of slow. It would be better at that point to use multiple SQL statements.
But really, you need to analyze the query plans and decide for yourself if it is efficient enough or not. Run real world examples and TEST TEST TEST.

In SQL is there a difference between count(*) and count(<fieldname>)

Pretty self explanatory question. Is there any reason to use one or the other?
Count(*) counts all records, including nulls, whereas Count(fieldname) does not include nulls.
Select count(*) selects any row, select count(field) selects rows where this field is not null.
If you want to improve performance (i.e. be a complete performance Nazi), you might want to do neither.
Example:
SELECT COUNT(1) FROM MyTable WHERE ...
This puzzled me for a while too.
In MySQL at least COUNT(*) counts the number of rows where every (*) value in the row is not null. Just COUNTing a column will count the number of rows where that column is not null.
In terms of performance using a single column would be slightly faster,
count(*) is faster if table type is MyISAM with no WHERE statement. With WHERE the speed will be the same for MyISAM and InnoDB.

What's the most efficient way to check the presence of a row in a table?

Say I want to check if a record in a MySQL table exists. I'd run a query, check the number of rows returned. If 0 rows do this, otherwise do that.
SELECT * FROM table WHERE id=5
SELECT id FROM table WHERE id=5
Is there any difference at all between these two queries? Is effort spent in returning every column, or is effort spent in filtering out the columns we don't care about?
SELECT COUNT(*) FROM table WHERE id=5
Is a whole new question. Would the server grab all the values and then count the values (harder than usual), or would it not bother grabbing anything and just increment a variable each time it finds a match (easier than usual)?
I think I'm making a lot of false assumptions about how MySQL works, but that's the meat of the question! Where am I wrong? Educate me, Stack Overflow!
Optimizers are pretty smart (generally). They typically only grab what they need so I'd go with:
SELECT COUNT(1) FROM mytable WHERE id = 5
The most explicit way would be
SELECT WHEN EXISTS (SELECT 1 FROM table WHERE id = 5) THEN 1 ELSE 0 END
If there is an index on (or starting with) id, it will only search, with maximum efficiency, for the first entry in the index it can find with that value. It won't read the record.
If you SELECT COUNT(*) (or COUNT anything else) it will, under the same circumstances, count the index entries, but not read the records.
If you SELECT *, it will read all the records.
Limit your results to at most one row by appending LIMIT 1, if all you want to do is check the presence of a record.
SELECT id FROM table WHERE id=5 LIMIT 1
This will definitely ensure that no more than one row is returned or processed. In my experience, LIMIT 1 (or TOP 1 depending in the DB) to check for existence of a row makes a big difference in terms of performance for large tables.
EDIT: I think I misread your question, but I'll leave my answer here anyway if it's of any help.
I would think this
SELECT null FROM table WHERE id = 5 LIMIT 1;
would be faster than this
SELECT 1 FROM table WHERE id = 5 LIMIT 1;
but the timer says the winner is "SELECT 1".
For the first two queries, most people will generally say, always specify exactly what you need and leave the rest. Effort isn't all specific as bandwidth could be spent in returning data that you aren't even going to do anything with.
As for the previous answer will do for your result set, unless you're dealing with a language that supports affected rows. This can sometimes work when getting data to collect information on how many rows were returned in the last query. You'll need to look at your interface documentation as to how to get that information.
The difference between your 3 queries depends on how you've built your index. Only returning the primary key is likely to be faster as MySQL will have your index in memory, and not have to hit disk. Adding the LIMIT 1 is also a good trick that will speed up the optimizer significantly in early 5.0.x branches and earlier.
try EXPLAIN SELECT id FROM table WHERE id=5 and check the Extras column for the presence of USING INDEX. If its there, then you're query is coming straight from the index, and is going to be much faster.