SQL Best Practice: count(1) or count(*) [duplicate] - sql

This question already has answers here:
Count(*) vs Count(1)
Closed 11 years ago.
I remember anecdotally being told:
never use count(*) when count(1) will do
Recently I passed this advice on to another developer and was challenged to prove it was true. My argument was the reasoning I was given along with the advice: that the database would return only the first column, which would then be counted. The counterargument was that the database doesn't evaluate anything inside the brackets.
From some (unscientific) testing on small tables, there certainly seems to be no difference. I don't currently have access to any large tables to experiment on.
I was given this advice when I was using Sybase, and tables had hundreds of millions of rows. I'm now working with Oracle and considerably less data.
So I guess in summary, my two questions are:
Which is faster, count(1) or count(*)?
Would this vary in different database vendors?

According to another similar question (Count(*) vs Count(1)), they are the same.
In Oracle, according to Ask Tom, count(*) is the correct way to count the number of rows because the optimizer rewrites count(1) to count(*). count(1) literally means "count the rows where 1 is not null"; since 1 is never null, the optimizer makes that change for you.

See:
"What is better in MYSQL count(*) or count(1)?" for MySQL (no difference between count(*) and count(1))
"Count(*) vs Count(1)" and http://beyondrelational.com/blogs/dave_ballantyne/archive/2010/07/27/count-or-count-1.aspx for SQL Server (no difference)
http://dbaspot.com/sybase/349079-count-vs-count-1-a.html for Sybase (no difference)
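The equivalence is easy to check on whatever engine you have at hand. A minimal sketch using Python's built-in sqlite3 as a stand-in engine (table name and data invented for the demo): COUNT(*) and COUNT(1) both count rows, and only COUNT(column) skips NULLs, which is the semantic difference people worry about.

```python
import sqlite3

# Minimal check of the claim above, using SQLite as a stand-in engine
# (table name and data are invented). COUNT(*) and COUNT(1) both count
# rows; only COUNT(column) skips NULLs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (None,)])

count_star = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
count_one = conn.execute("SELECT COUNT(1) FROM t").fetchone()[0]
count_col = conn.execute("SELECT COUNT(val) FROM t").fetchone()[0]
print(count_star, count_one, count_col)  # 3 3 2
```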

In books specifically on T-SQL and Microsoft SQL Server, I have read that using * is better because it lets the optimizer decide what is best to do. I'll try to find the names of the specific books and post them here.

This is such a basic query pattern, and the meaning is identical. I've read more than once that the optimizer treats them identically; I can't find a specific reference right now, so put this in the category of "institutional knowledge".
(should have searched first...http://stackoverflow.com/questions/1221559/count-vs-count1)

I can only speak to SQL Server, but testing on a 5 GB table with 11 million records, both the number of reads and the execution plan were identical.
I'd say there is no difference.

As far as I know, count(*) should be fast because the engine can satisfy it by counting index entries alone. And the two forms probably compile to very similar plans anyway, so between count(*) and count(1) there should be no difference.

count(1)
No, generally speaking this will perform slightly better. The difference would only show up at a drastic scale, but it is good practice.

Related

Is COUNT(1) or COUNT(*) better for PostgreSQL

I've seen answers to this question for other databases (MySQL, SQL Server, etc.) but not for PostgreSQL. So, is COUNT(1) or COUNT(*) faster/better for selecting the row count of a table?
Benchmarking the difference
The last time I benchmarked the difference between COUNT(*) and COUNT(1), on PostgreSQL 11.3, I found that COUNT(*) was about 10% faster. Vik Fearing's explanation at the time was that the constant expression 1 (or at least its nullability) is evaluated for the entire count loop. I haven't checked whether this has been fixed in PostgreSQL 14.
Don't worry about this in real world queries
However, you shouldn't worry about such a performance difference. The 10% was measurable in a benchmark, but I doubt you could consistently measure it in an ordinary query. Also, ideally, all SQL vendors optimise the two forms the same way, given that 1 is a constant expression and can be eliminated. As mentioned in the above article, I couldn't find any difference in any other RDBMS I tested (MySQL, Oracle, SQL Server), and I wouldn't expect there to be any.
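For readers who want to reproduce this kind of measurement, here is a rough micro-benchmark sketch using Python's sqlite3. SQLite is not PostgreSQL, so the 10% gap is not expected to show up here; this only illustrates the method (best of several runs, same row count from both forms).

```python
import sqlite3
import time

# Micro-benchmark sketch: compare COUNT(*) and COUNT(1) on a synthetic
# table. SQLite stands in for PostgreSQL, so the 10% gap from the text
# is not expected to reproduce; only the method is illustrated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (n INTEGER)")
conn.executemany("INSERT INTO big VALUES (?)", ((i,) for i in range(100_000)))

def bench(query, runs=5):
    """Run the query several times and keep the best wall-clock time."""
    best = float("inf")
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = conn.execute(query).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return result, best

star_count, star_time = bench("SELECT COUNT(*) FROM big")
one_count, one_time = bench("SELECT COUNT(1) FROM big")
print(star_count, one_count)  # both 100000
print(f"count(*): {star_time:.6f}s  count(1): {one_time:.6f}s")
```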

How can I solve a performance issue in sql query?

All developers know that "IN" and DISTINCT create issues for all SQL queries. My colleague created the query below, but he no longer works at my company. Please take a look at the code below. How can I tune my query for high performance?
SELECT xxx
, COUNT(DISTINCT Id) AS Count
FROM Test WITH (NOLOCK)
WHERE IsDeleted = 0
AND xxx IN
(
SELECT CAST(value AS INT)
FROM STRING_SPLIT(@ProductIds, ',')
)
GROUP BY xxx
All developers know that "IN" and DISTINCT create issues for all SQL queries.
This is not necessarily true. They can hurt performance, but sometimes they are necessary.
The IN is probably not a big deal. It gets evaluated once. If you have another way to pass in a list -- say using a temporary table -- that is better.
The COUNT(DISTINCT id) is suspicious. I would expect id to already be unique. If so, then just use COUNT(*).
The WITH (NOLOCK) is not recommended unless you really know what you are doing. Working with data that might be inconsistent is dangerous.
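The temp-table suggestion can be sketched outside T-SQL as well. Below, Python's sqlite3 stands in for SQL Server (SQLite has no STRING_SPLIT or NOLOCK); the table and column names follow the question, the sample data is invented, and the idea is to load the id list into an indexed temp table once and join against it instead of splitting a CSV string inside the query.

```python
import sqlite3

# Sketch of the temp-table alternative suggested above. SQLite stands in
# for SQL Server (it has no STRING_SPLIT or NOLOCK); table and column
# names follow the question, and the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (Id INTEGER, xxx INTEGER, IsDeleted INTEGER)")
conn.executemany("INSERT INTO Test VALUES (?, ?, ?)",
                 [(1, 10, 0), (2, 10, 0), (3, 20, 0), (4, 30, 1)])

# Load the product ids into an indexed temp table once, instead of
# splitting a CSV string inside the query itself.
product_ids = [10, 20]
conn.execute("CREATE TEMP TABLE ProductIds (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO ProductIds VALUES (?)",
                 [(p,) for p in product_ids])

rows = conn.execute("""
    SELECT t.xxx, COUNT(DISTINCT t.Id) AS IdCount
    FROM Test t
    JOIN ProductIds p ON p.Id = t.xxx
    WHERE t.IsDeleted = 0
    GROUP BY t.xxx
""").fetchall()
print(sorted(rows))  # [(10, 2), (20, 1)]
```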
I have used SentryOne Plan Explorer to help find the tuning points of queries I am having performance issues with:
https://www.sentryone.com/plan-explorer
First you need to decide what good performance is in your environment, then find the worst parts of the query and optimize those first.
Last, consider how you are storing your data, look for places it makes sense to add an index if needed.
Better: you should create an index on the xxx column.

SQL select * vs. selecting specific columns [duplicate]

This question already has answers here:
Why is SELECT * considered harmful?
(16 answers)
Closed 9 years ago.
I was wondering which is best practice. Let's say I have a table with 10+ columns and I want to select data from it.
I've heard that 'select *' is better, since selecting specific columns makes the database look those columns up before selecting, while selecting all just grabs everything. On the other hand, what if the table has a lot of columns in it?
Is that true?
Thanks
It is best practice to explicitly name the columns you want to select.
As Mitch just said, the performance isn't different. I have even heard that looking up the actual column names when using * is slower.
But the advantage is that when your table changes, your select does not change if you name your columns.
I think these two questions here and here have satisfactory answers.
"* is not better; actually, it is slower" is one reason that select * is not good. In addition, according to OMG Ponies, select * is an anti-pattern. See the questions in the links for details.
Selecting specific columns is better, as it raises the probability that SQL Server can access the data from indexes rather than querying the table data.
It also requires fewer changes, since any code that consumes the data will keep getting the same data structure regardless of changes you make to the table schema in the future.
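The index point above is observable in any engine that can report its plan. A small sketch with Python's sqlite3 (table, column, and index names invented): when the query touches only indexed columns, SQLite's EXPLAIN QUERY PLAN reports a COVERING INDEX scan, meaning the table itself is never read.

```python
import sqlite3

# Sketch of the covering-index point. Table, column, and index names are
# invented; SQLite's EXPLAIN QUERY PLAN says "COVERING INDEX" when a
# query can be answered from the index without touching the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, bio TEXT)")
conn.execute("CREATE INDEX idx_users_id_name ON users (id, name)")

def plan(sql):
    """Concatenate the detail column of EXPLAIN QUERY PLAN output."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

narrow = plan("SELECT id, name FROM users ORDER BY id")  # index has everything
wide = plan("SELECT * FROM users ORDER BY id")           # bio forces table access
print("COVERING INDEX" in narrow)  # True
print("COVERING INDEX" in wide)    # False
```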
Definitely not. Try doing a SELECT * from a table which has millions of rows and tens of columns.
The performance with SELECT * will be worse.
It depends on what you're about to do with the result. Selecting unnecessary data is not a good practice either. You wouldn't create a bunch of variables with values you would never use. So selecting many columns you don't need is not a good idea either.
It depends.
Selecting all columns can make the query slower because all of them have to be read from disk; if there are a lot of string columns (which are not in an index), that can have a huge impact on query (IO) performance. And in my practice, you rarely need all columns.
On the other hand, for a small database with a few users and good enough hardware, it's much easier to just select all columns, especially if the schema changes often.
However, I would always recommend explicitly selecting columns, to make sure it doesn't hurt performance.

Group By making query astronomically longer

As a first note, I only have read access to my server. Just FYI, as it seems to come up a lot...
Server: DB2 (6.1) for i (IBM)
I have a query I'm running on a table that has 19 million rows in it (I don't design them, I just query them). I've been limiting my return data to 10 rows (*) until I get this query sorted out, so that return times are a bit more reasonable.
The basic design is that I need to get data about categories of products we sell on a week-by-week basis, using the columns WEEK_ID and CATEGORY. Here's example code (with some important bits blanked out with ####).
SELECT WEEK_ID, CATEGORY
FROM DWQ####.SLSCATW
INNER JOIN DW####.CATEGORY
ON DWQ####.SLSCATW.CATEGORY_NUMBER = DW####.CATEGORY.CATEGORY_NUMBER
WHERE WEEK_ID
BETWEEN 200952 AND 201230 --Format is year/week
GROUP BY WEEK_ID, CATEGORY
If I comment out that last line I can get back 100 rows in 254 ms. If I put that line back in my return takes longer than I've had patience to wait for :-). (Longest I've waited is 10 minutes.)
This question has two parts. The first is quite rudimentary: is this normal? There are roughly 50 categories and 140 or so weeks that I'm trying to condense down to. I realize that's a lot of info to condense off of 19 million rows, but I was hoping that limiting my query to 10 returned rows would keep the time down.
And, if I'm not just a complete n00b, and this in fact should not take several minutes, what exactly is wrong with my SQL?
I've Googled WHERE statement optimization and can't seem to find anything. All links and explanation are more than welcome.
Apologies for such a newbie post... we all have to start somewhere, right?
(*)using SQLExplorer, my IDE, an Eclipse implementation of Squirrel SQL.
I'm not sure how the server handles group by when there's no aggregating functions in the query. Based on your answers in the comments, I'd just try to add those:
SELECT
...,
SUM(SalesCost) as SalesCost,
SUM(SalesDollars) as SalesDollars
FROM
...
Leave the rest of the query as is.
If that doesn't solve the problem, you might have missing indexes. I would try to find out if there's an index where the WEEK_ID is the only column or where it is the first column. You could also check if you have another temporal column (i.e. TransactionDate or something similar) on the same table that already is indexed. If so, you could use that instead in the where clause.
Without correct indexes, the database server is forced to do a complete table scan, and that could explain your performance issues. 19 million rows takes a not insignificant amount of time to read from disk.
Also check that the data type of WEEK_ID is int or similar, just to avoid unnecessary casting in your query.
To avoid a table scan on the Category table, you need to make sure that Category_Number is indexed as well. (It probably already is, since I assume it is a key to that table.)
Indexes on WEEK_ID and CATEGORY (and possibly CATEGORY_NUMBER) are the only way to make it really fast, so you need to convince the DBO to introduce those.
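The two suggestions above (real aggregates in the grouped query, plus an index whose leading column is WEEK_ID) can be sketched as follows. SQLite stands in for DB2 for i here, and since SLSCATW's real schema isn't shown, the SALES column and the sample data are invented.

```python
import sqlite3

# Sketch of the advice above: put real aggregates in the grouped query
# and index the filter/grouping columns, WEEK_ID first. SQLite stands in
# for DB2 for i; SLSCATW's real schema isn't shown, so SALES is invented.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SLSCATW
                (WEEK_ID INTEGER, CATEGORY_NUMBER INTEGER, SALES REAL)""")
conn.executemany("INSERT INTO SLSCATW VALUES (?, ?, ?)",
                 [(200952, 1, 10.0), (200952, 1, 5.0), (201001, 2, 7.5),
                  (201300, 3, 99.0)])  # last row falls outside the range

# Composite index with WEEK_ID leading, so the BETWEEN filter and the
# GROUP BY can both use it.
conn.execute("CREATE INDEX idx_week_cat ON SLSCATW (WEEK_ID, CATEGORY_NUMBER)")

rows = conn.execute("""
    SELECT WEEK_ID, CATEGORY_NUMBER, SUM(SALES) AS SalesDollars
    FROM SLSCATW
    WHERE WEEK_ID BETWEEN 200952 AND 201230
    GROUP BY WEEK_ID, CATEGORY_NUMBER
""").fetchall()
print(sorted(rows))  # [(200952, 1, 15.0), (201001, 2, 7.5)]
```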

Is there a performance difference between select * from tablename and select column1, column2 from tablename? [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicates:
Select * vs Specifying Column Names
Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc.
Is there a performance difference between select * from tablename and select column1, column2 from tablename?
With select * from, the database pulls out all fields/columns, which is more than two. So does the first query cost more time/resources?
If you do select * from, there are two performance issues:
the database has to determine which columns exist in the table
there is more data sent from the server to the client (all columns instead of only two)
The answer is YES in general! For small databases you won't see a performance difference, but with bigger databases there can be relevant differences if you use the unqualified * selector as shorthand!
In general, it's better to explicitly name each column from which you want to retrieve data!
I can also suggest reading the official documentation about how to optimize SELECT and other statements!
In any case, you should always test your changes. You can use a profiler to do that.
For mysql see : http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html
There is a difference, especially in the case when the other columns are BLOB or (big) TEXT fields. If your table contains just the two columns, there is no difference.
I've checked with the profiler, and it seems that the answer is no: both queries took the same time to execute.
The table had relatively small fields, so the result set wasn't bloated with large quantities of data I potentially wouldn't need.
If you have large data fields you don't need in your result set, don't include them in your query.
And while this may not make a big difference for one query run once, if you use select * throughout the application, and especially in joins where you are definitely returning unneeded data, you are clearly slowing down the system for no good reason other than developer laziness. Select * should almost never be used on a production system.
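The join point is easy to see concretely. A small sketch with Python's sqlite3 (table names invented): SELECT * on a join drags back every column of both tables, join key included twice, while an explicit column list returns only what the application needs.

```python
import sqlite3

# Small illustration of the join point above (table names invented):
# SELECT * on a join returns every column of both tables, including the
# duplicated join key, while an explicit list returns only what the
# application actually needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, address TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 7, 9.99)")
conn.execute("INSERT INTO customers VALUES (7, 'Ann', '1 Main St')")

star = conn.execute("""SELECT * FROM orders
                       JOIN customers ON customers.id = orders.customer_id""")
explicit = conn.execute("""SELECT orders.id, customers.name FROM orders
                           JOIN customers ON customers.id = orders.customer_id""")
print(len(star.description), len(explicit.description))  # 6 2
```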
Well, I guess it would depend on which type of performance you're talking about: Database or Programmer.
Speaking as someone who has had to clean up database queries that were written in the
select * from foo
format, it's a total nightmare. The database can and does change, so now the neat and tidy normalized database has become denormalized for performance reasons, and the host of sloppy queries written to grab everything are now slogging tons more data back.
If you don't need it, don't ask for it. Take a moment, think of the people who will follow after you and need to deal with your choices.
I've still got an entire section in our LIMS to uncluster, thanks for reminding me. -- Sigh