Different faces of COUNT - sql

I would like to know the difference between the following 4 simple queries in terms of result and functionality:
SELECT COUNT(*) FROM employees;
SELECT COUNT(0) FROM employees;
SELECT COUNT(1) FROM employees;
SELECT COUNT(2) FROM employees;

The four examples all evaluate to the same number - there is no difference.
What might give a different answer would be:
SELECT COUNT(middle_initial) FROM employees;
If there are any entries with a NULL in the middle_initial column, then the count returned will be different from COUNT(*) because it will be just the number of non-null values in the column.
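The behaviour described above can be checked with a small sketch. It uses Python's built-in sqlite3 module and a made-up employees table (the column names and sample rows are assumptions for illustration, not the asker's data):

```python
import sqlite3

# Hypothetical in-memory table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, first_name TEXT, middle_initial TEXT)"
)
conn.executemany(
    "INSERT INTO employees (first_name, middle_initial) VALUES (?, ?)",
    [("Ann", "J"), ("Bob", None), ("Cy", "K"), ("Dee", None)],
)

# COUNT(*) and COUNT(<constant>) count every row;
# COUNT(middle_initial) skips the rows where the column is NULL.
counts = conn.execute(
    """
    SELECT COUNT(*), COUNT(0), COUNT(1), COUNT(2), COUNT(middle_initial)
    FROM employees
    """
).fetchone()

print(counts)
```

With two of the four sample rows holding a NULL middle_initial, the first four counts agree and only the last one is smaller.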

No difference in terms of result, they all return the number of rows in employees.
COUNT(expression) simply means "for each row in this table, if expression evaluates to a non-null value, count this row".
However, * means "count every row", while n is a constant numeric value and is therefore never null. Neither form looks at the actual row data, so both return the total number of rows in the table.

SELECT COUNT(x) FROM employees will give you the number of rows where x is not null.

Count(*) :
Specifies that all rows should be counted to return the total number of rows in a table. COUNT(*) takes no parameters and cannot be used with DISTINCT. COUNT(*) does not require an expression parameter because, by definition, it does not use information about any particular column. COUNT(*) returns the number of rows in a specified table without getting rid of duplicates. It counts each row separately. This includes rows that contain null values.
Hence, COUNT(*) returns the number of items in a group. This includes NULL values and duplicates.
To sum up, COUNT(*) returns the number of rows in your query that match your WHERE clause.
So if you go
SELECT COUNT(*) FROM EMPLOYEES
the row count will be returned the same as if you went
SELECT * FROM employees
All rows will be returned from the table.
COUNT(1), COUNT(2), COUNT(3) and COUNT(4)
COUNT(*) counts the number of rows produced by the query, whereas COUNT(1) counts the rows for which the expression 1 is not null. When you include a literal such as a number or a string in a query, that literal is evaluated for every row produced by the FROM clause. This also applies to literals inside aggregate functions, such as COUNT(1). Because a literal is never null, every row is counted. The same holds for COUNT(2), COUNT(3) and COUNT(4).
So if you go
SELECT COUNT(1) FROM employees
it will return the same row count as
SELECT COUNT(*) FROM employees
Note that the 1 here is a literal value, not a reference to column no. 1 of the table.
You can also write
SELECT COUNT(DISTINCT 1) FROM employees
but that counts the distinct non-null values of the constant 1, so it returns 1 for any non-empty table. To count the unique values in a column, name the column instead, as in COUNT(DISTINCT first_name).
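A quick way to see what COUNT(1) and the DISTINCT variants return is to run them side by side. This is a minimal sketch using Python's sqlite3 module; the table contents are invented for illustration:

```python
import sqlite3

# Hypothetical sample data: one duplicate name and one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?)",
    [("Ann",), ("Bob",), ("Ann",), (None,)],
)

# COUNT(1): the literal 1 is never NULL, so every row is counted.
# COUNT(DISTINCT 1): there is only one distinct value of the constant 1.
# COUNT(DISTINCT first_name): distinct non-NULL names in the column.
count_one, count_distinct_one, count_distinct_name = conn.execute(
    "SELECT COUNT(1), COUNT(DISTINCT 1), COUNT(DISTINCT first_name) "
    "FROM employees"
).fetchone()

print(count_one, count_distinct_one, count_distinct_name)
```

The literal inside COUNT is a value, not a column position, so only the form that names a column says anything about that column's contents.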

Related

Why does MAX statement require a Group By?

I understand why the first query needs a GROUP BY, as it doesn't know which date to apply the sum to, but I don't understand why this is the case with the second query. The value that ultimately is the max amount is already contained in the table - it is not calculated like SUM is. thank you
-- First Query
select
sum(OrderSales),OrderDates
From Orders
-- Second Query
select
max(FilmOscarWins),FilmName
From tblFilm
It is not the SUM and MAX that require the GROUP BY, it is the unaggregated column.
If you just write this, you will get a single row, for the maximum value of the FilmOscarWins column across the whole table:
select
max(FilmOscarWins)
From
tblFilm
If the most Oscars any film won was 12, that one row will say 12. But there could be multiple films, all of which won 12 Oscars, so if we ask for the FilmName alongside that 12, there is no single answer.
By adding the Group By, we fundamentally change the query: instead of returning one number for the whole table, it will return one row for each group - which in this case, means one row for each film.
If you do want to get a list of all those films which had the maximum 12 Oscars, you have to do something more complicated, such as using a sub-query to first find that single number (12) and then find all the rows matching it:
select
FilmOscarWins,
FilmName
From
tblFilm
Where FilmOscarWins = (
select
max(FilmOscarWins)
From
tblFilm
)
If you want the film with the most Oscar wins, then use select top:
select top (1) f.*
From tblFilm f
order by FilmOscarWins desc;
In an aggregation query, the select columns need to be consistent with the group by columns -- the unaggregated columns in the select must match the group by.
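As a rough illustration of the two approaches above, the sketch below runs them against an in-memory SQLite database through Python's sqlite3 module. The film rows are invented, and because SQLite has no TOP, the "top one" query is written with LIMIT 1 instead (an assumption about dialect, not the original SQL Server syntax):

```python
import sqlite3

# Hypothetical data: two films tie for the most wins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblFilm (FilmName TEXT, FilmOscarWins INTEGER)")
conn.executemany(
    "INSERT INTO tblFilm VALUES (?, ?)",
    [("Film A", 11), ("Film B", 11), ("Film C", 4), ("Film D", 0)],
)

# One number for the whole table: no GROUP BY needed.
max_wins = conn.execute("SELECT MAX(FilmOscarWins) FROM tblFilm").fetchone()[0]

# Every film that matches that maximum, found via a sub-query.
top_films = [
    row[0]
    for row in conn.execute(
        """
        SELECT FilmName
        FROM tblFilm
        WHERE FilmOscarWins = (SELECT MAX(FilmOscarWins) FROM tblFilm)
        ORDER BY FilmName
        """
    )
]

# SQLite equivalent of SELECT TOP (1): returns a single row,
# and which of the tied films comes back is not specified.
single = conn.execute(
    "SELECT FilmName, FilmOscarWins FROM tblFilm "
    "ORDER BY FilmOscarWins DESC LIMIT 1"
).fetchone()

print(max_wins, top_films, single)
```

The sub-query form returns all tied films, while the top-one form returns just one of them, so the choice depends on whether ties matter.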

How can BigQuery SQL give different DISTINCT and GROUP BY results?

We seem to be getting two different, mutually incompatible results from legacy SQL and standard SQL in Google Big Query.
Here is our standard SQL Query...which gives an answer with 218,529 rows.
SELECT DISTINCT(EID)
FROM test.ourBQtable
Here is our legacy SQL Query...
SELECT COUNT(EID) AS Total, EID
FROM [ourBQproject:test.ourBQtable]
GROUP BY EID
ORDER BY Total DESC
This shows results that look like the table below but yet also shows 218,529 rows of results:
Total EID
376 jb+qLvHMm5JrMkNybAi6uC75FzgsGcNQhJ19IeWFDcQ=
352 JGqNBgicm+mpcYBS4K7AI2WXI3xaSgMkktb+7oOjjnQ=
How is it possible to have what appears to be duplicate EIDs (376 of them as shown in one case in the table) - but when using the DISTINCT(EID) command - the number of rows doesn't decrease? Shouldn't DISTINCT be filtering out all the duplicate rows? Do we really have duplicate rows?
What are we missing in our understanding?
Your code appears to be working exactly correctly.
DISTINCT EID is saying that there are 218,529 different values of EID. This should be returning one row for each of the 218,529 different EIDs.
When you use GROUP BY, you are getting one row for each of the EIDs. In this case, you get the same number.
Try running this query:
SELECT COUNT(*) as num_rows, COUNT(DISTINCT EID) as num_eids
FROM test.ourBQtable;
This will show the number of rows in the table and the number of distinct values of EID (ignoring NULL values).
The two queries below are equivalent and return the same number of rows, one for each unique EID:
SELECT DISTINCT EID
FROM test.ourBQtable
and
SELECT EID
FROM test.ourBQtable
GROUP BY EID
That explains why the number of output rows is the same.
Now, in the second query you added COUNT(EID):
SELECT COUNT(EID) AS Total, EID
FROM test.ourBQtable
GROUP BY EID
This does not change the number of output rows; it adds, for each EID, the count of rows in test.ourBQtable with that EID (if you sum all these counts, you get the total number of rows in the original table, assuming no EID is NULL).
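The relationship between the three queries can be reproduced on a toy table. The sketch below uses SQLite through Python's sqlite3 module rather than BigQuery, with invented EID values, so it only illustrates the general SQL behaviour:

```python
import sqlite3

# Hypothetical stand-in for test.ourBQtable with six rows and three EIDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ourBQtable (EID TEXT)")
conn.executemany(
    "INSERT INTO ourBQtable VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("b",), ("c",)],
)

# One row per distinct EID.
distinct_rows = conn.execute("SELECT DISTINCT EID FROM ourBQtable").fetchall()

# Also one row per EID, plus how many table rows carry that EID.
grouped_rows = conn.execute(
    "SELECT COUNT(EID) AS Total, EID FROM ourBQtable "
    "GROUP BY EID ORDER BY Total DESC"
).fetchall()

# Total rows versus distinct EIDs in a single query.
num_rows, num_eids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT EID) FROM ourBQtable"
).fetchone()

print(len(distinct_rows), grouped_rows, num_rows, num_eids)
```

Both the DISTINCT and the GROUP BY query yield three rows here, while the per-group totals add up to the six rows in the table.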

sql server: How can I show the category name even though no record exists

Can anyone help me find a way to show the category name, age, and a total of 0 even if no record exists? At the moment, when I run the SQL below, it returns nothing. Thanks.
SELECT
'ADMISSION: ' AS CATEGORY_NAME
,AGE
,COUNT(ID) AS COUNTS
FROM table
GROUP BY AGE
Leave out the GROUP BY:
SELECT 'ADMISSION: ' AS CATEGORY_NAME,
COUNT(ID) AS COUNTS
FROM admission;
An aggregation query with no GROUP BY always returns one row. If you have a GROUP BY, then such a query will return no rows for an empty table (or if all rows are filtered out).
Also COUNT() doesn't return NULL. It returns 0 in this case.
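The difference between the two forms on an empty table can be checked with a short sketch. It uses Python's sqlite3 module and an empty, hypothetical admission table (the column types are assumptions):

```python
import sqlite3

# Hypothetical table, deliberately left empty.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admission (ID INTEGER, AGE INTEGER)")

# Aggregate without GROUP BY: always exactly one row, with a count of 0.
without_group_by = conn.execute(
    "SELECT 'ADMISSION: ' AS CATEGORY_NAME, COUNT(ID) AS COUNTS FROM admission"
).fetchall()

# Aggregate with GROUP BY: no groups exist, so no rows come back.
with_group_by = conn.execute(
    "SELECT 'ADMISSION: ' AS CATEGORY_NAME, AGE, COUNT(ID) AS COUNTS "
    "FROM admission GROUP BY AGE"
).fetchall()

print(without_group_by, with_group_by)
```

Dropping AGE from the select list is the price of the single guaranteed row, since there is no age value to report when the table has no rows.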

How to insert a count column into a sql query

I need the second column of the table retrieved from a query to have a count of the number of rows, so row one would have a 1, row 2 would have a 2 and so on. I am not very proficient with sql so I am sorry if this is a simple task.
A basic example of what I am doing is:
SELECT [Name], [I_NEED_ROW_COUNT_HERE],[Age],[Gender]
FROM [customer]
The row count must be the second column and will act as an ID for each row. It must be the second column because the text file it generates will be sent to the state, which requires a specific format.
Thanks for any help.
With your edit, I see that you want a row ID (normally called row number rather than "count") which is best gathered from a unique ID in the database (person_id or some other unique field). If that isn't possible, you can make one for this report with ROW_NUMBER() OVER (ORDER BY EMPLOYEE_ID DESC) AS ID, in your select statement.
select Name, ROW_NUMBER() OVER (ORDER BY Name DESC) AS ID,
Age, Gender
from customer
This function adds a field to the output called ID (see my tips at the bottom to describe aliases). Since this isn't in the database, it needs a method to determine how it will increment. After the over keyword it orders by Name in descending order.
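To see the numbering in action, here is a minimal sketch that runs the same ROW_NUMBER() query through Python's sqlite3 module. The customer rows are invented, and window functions require SQLite 3.25 or newer, which is an assumption about the local installation:

```python
import sqlite3

# Hypothetical customers, inserted in no particular order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (Name TEXT, Age INTEGER, Gender TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("Cara", 41, "F"), ("Abe", 30, "M"), ("Bea", 25, "F")],
)

# ROW_NUMBER() numbers the rows in descending Name order;
# the outer ORDER BY just lists them by that generated ID.
rows = conn.execute(
    """
    SELECT Name, ROW_NUMBER() OVER (ORDER BY Name DESC) AS ID, Age, Gender
    FROM customer
    ORDER BY ID
    """
).fetchall()

for row in rows:
    print(row)
```

The generated ID sits in the second position of each row, which matches the column layout the question asks for.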
Information on Counting follows (won't be unique by row):
If each customer has multiple entries but the selected fields are the same for that user and you are counting that user's records (summed in one result record for the user) then you would write:
select Name, count(*), Age, Gender
from customer
group by name, age, gender
This will count (see MSDN) all the user's records as grouped by the name, age and gender (if they match, it's a single record).
However, if you are counting all records so that your whole report has the grand total on every line, then you want:
select Name, (select count(*) from customer) as "count", Age, Gender
from customer
TIP: If you're using something like SSMS to write a query, dragging in columns will put brackets around the columns. This is only necessary if you have spaces in column names, but a DBA will tend to avoid that like the plague. Also, if you need a column header to be something specific, you can use the as keyword like in my first example.
W3Schools has a good tutorial on count()
The COUNT(column_name) function returns the number of values (NULL values will not be counted) of the specified column:
SELECT COUNT(column_name) FROM table_name;
The COUNT(*) function returns the number of records in a table:
SELECT COUNT(*) FROM table_name;
The COUNT(DISTINCT column_name) function returns the number of distinct values of the specified column:
SELECT COUNT(DISTINCT column_name) FROM table_name;
COUNT(DISTINCT) works with ORACLE and Microsoft SQL Server, but not with Microsoft Access.
It's odd to repeat the same number in every row but it sounds like this is what you're asking for. And note that this might not work in your flavor of SQL. MS Access?
SELECT [Name], (select count(*) from [customer]), [Age], [Gender]
FROM [customer]

Is GROUP BY needed in the following correlated subquery?

Given scenario:
table fd
(cust_id, fd_id) primary-key and amount
table loan
(cust_id, l_id) primary-key and amount
I want to list all customers who have a fixed deposit with an amount less than the sum of all their loans.
Query:
SELECT cust_id
FROM fd
WHERE amount
<
(SELECT sum(amount)
FROM loan
WHERE fd.cust_id = loan.cust_id);
OR should we use
SELECT cust_id
FROM fd
WHERE amount
<
(SELECT sum(amount)
FROM loan
WHERE fd.cust_id = loan.cust_id group by cust_id);
A customer can have multiple loans but one FD is considered at a time.
GROUP BY can be omitted in this case, because the SELECT list contains only an aggregate function and all rows are guaranteed to belong to the same cust_id group (by the WHERE clause).
The aggregation will be over all rows with matching cust_id in both cases. So both queries are correct.
This would be another, cleaner way to implement the same thing:
SELECT fd.cust_id
FROM fd
JOIN loan USING (cust_id)
GROUP BY fd.cust_id, fd.amount
HAVING fd.amount < sum(loan.amount)
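There is one difference: rows with identical (cust_id, amount) in fd only appear once in the result of my query, while they would appear multiple times in the original. Note that such duplicates also repeat the joined loan rows within the group, which inflates sum(loan.amount).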
Either way, if a customer has no matching row with a non-null amount in table loan, no row is returned for that customer. I assume you are aware of that.
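The three variants discussed here can be compared on a small data set. The sketch below uses Python's sqlite3 module; the amounts are invented, and customer 1 is given two fixed deposits with the same amount on purpose to expose the duplicate-row difference:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fd (cust_id INTEGER, fd_id INTEGER, amount INTEGER,
                     PRIMARY KEY (cust_id, fd_id));
    CREATE TABLE loan (cust_id INTEGER, l_id INTEGER, amount INTEGER,
                       PRIMARY KEY (cust_id, l_id));
    -- Hypothetical rows: customer 3 has a deposit but no loans.
    INSERT INTO fd VALUES (1, 1, 100), (1, 2, 100), (2, 1, 500), (3, 1, 50);
    INSERT INTO loan VALUES (1, 1, 80), (1, 2, 70), (2, 1, 300);
    """
)

correlated = """
    SELECT cust_id FROM fd
    WHERE amount < (SELECT sum(amount) FROM loan
                    WHERE fd.cust_id = loan.cust_id {group_by})
    ORDER BY cust_id
"""
no_group_by = conn.execute(correlated.format(group_by="")).fetchall()
with_group_by = conn.execute(
    correlated.format(group_by="GROUP BY cust_id")
).fetchall()

# Join form: one row per (cust_id, amount) group. Duplicate fd rows also
# repeat the loan rows inside the group, so the sum there is larger.
join_version = conn.execute(
    """
    SELECT fd.cust_id
    FROM fd JOIN loan USING (cust_id)
    GROUP BY fd.cust_id, fd.amount
    HAVING fd.amount < sum(loan.amount)
    """
).fetchall()

print(no_group_by, with_group_by, join_version)
```

Both correlated forms list customer 1 once per qualifying deposit, the join form lists that customer once, and customer 3 never appears because the sum over no loans is NULL.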
There is no need for GROUP BY since you already filter the data by cust_id. In either case the inner query returns the same result.
No, it isn't, because you calculate sum(amount) for customer with id = fd.cust_id, so for a single customer.
However, if your subquery somehow calculated the sum for more than one customer, the group by would make the subquery return more than one row, which would cause the condition (<) to fail, and thus the query to fail.
A query with an aggregate like sum but without a group by will output one group. The aggregates will be computed over all matching rows.
A subquery in a condition clause is only allowed to return one row. If the subquery returned multiple rows, what would the following expression mean?
where 1 > (... subquery ...)
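So the group by is best omitted. In your second query it happens to be harmless, because the WHERE clause limits the subquery to a single cust_id and therefore to at most one group, but a GROUP BY that produced several groups would make the comparison invalid.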
N.B. When you specify all, any, or some a subquery can return multiple rows:
where 1 > ALL (... subquery ...)
But it's easy to see why that doesn't make sense in your case; you'd compare one customer's data to that of another.