sqlite SELECT AVG returns null - sql

Does anyone know why a SQL SELECT query returns no rows when SELECTing from an empty table, but when trying to SELECT the AVG from a column in an empty table it returns < null >? The difference in behavior just seems odd to me. I’m using a sqlite database if that makes any difference.
Here are the two queries:
Normal select: SELECT a FROM table1
If table1 is empty I get no rows back
Avg select: SELECT AVG(a) FROM table1
If table1 is empty I get back a < null > row.

From the ANSI 92 spec
b) If AVG, MAX, MIN, or SUM is
specified, then
Case:
i) If TXA is empty, then the result is the null value.
Read more at: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

I'm not positive, but to determine average, you must divide by the number of rows. If the number of rows is zero, dividing by it would be undefined. Thus, the NULL return. Just a guess.

You're doing an aggregate. Since the aggregate is defined for 0-n rows (in this case, 0 rows yields null), you will always get one result back (exactly one in this case).
To put it another way, you're not asking for rows from the table--you're asking for the average of one column in the table and that's what you're getting back. Getting anything other than one row in this case would be weirder.
If you had asked for non-aggregated columns, too, e.g.
SELECT Salesperson, AVG(Sale)
FROM Sales
GROUP BY Salesperson
then I would expect you to get no rows back because there wouldn't be anything to satisfy the non-aggregate selects.
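The three cases described above can be checked directly. This is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the column types and the empty tables are assumptions made for illustration, not details from the original post.

```python
import sqlite3

# Hypothetical empty tables that reuse the names from the question and answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER)")
conn.execute("CREATE TABLE Sales (Salesperson TEXT, Sale REAL)")

# Plain SELECT on an empty table: no rows at all.
plain_rows = conn.execute("SELECT a FROM table1").fetchall()

# Aggregate without GROUP BY: exactly one row, holding NULL (None in Python).
avg_rows = conn.execute("SELECT AVG(a) FROM table1").fetchall()

# Aggregate with GROUP BY: one row per group, and an empty table has no groups.
grouped_rows = conn.execute(
    "SELECT Salesperson, AVG(Sale) FROM Sales GROUP BY Salesperson"
).fetchall()

print(plain_rows)    # []
print(avg_rows)      # [(None,)]
print(grouped_rows)  # []
```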

AVG is an aggregate function similar to COUNT. If you do:
SELECT COUNT(a) FROM table1 you'd expect to get one row containing zero.
It's the same with AVG, SUM, etc. You get one row with the result of the aggregate function.
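To see the comparison with COUNT side by side, the sketch below runs several aggregates over one empty table in SQLite. The table definition is assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER)")  # left empty on purpose

# One result row comes back even though the table has no rows:
# the COUNT forms yield 0, the other aggregates yield NULL (None).
empty_aggregates = conn.execute(
    "SELECT COUNT(a), COUNT(*), SUM(a), MIN(a), MAX(a) FROM table1"
).fetchall()

print(empty_aggregates)  # [(0, 0, None, None, None)]
```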

Related

What happens when you use DISTINCT * in COUNT() in SQL?

I've just learned about the COUNT() function, and how it is possible to get the number of rows in a column by passing * as the argument.
SELECT COUNT(*) FROM table;
I've also learned that we can get the number of distinct rows of a column in a table by using DISTINCT.
SELECT COUNT(DISTINCT column) FROM table;
I've noticed that the following returns nothing.
SELECT COUNT(DISTINCT *) FROM table;
Why is this?
I suppose the root of my issue is that I don't quite fully understand what the COUNT() function with * as the argument does exactly. My resource says that the COUNT() function takes a column as an argument and counts how many non-NULL rows there are. So say we have a table that has a column in which some rows are NULL and others are not. If COUNT(column) skips the NULL rows, what happens differently in COUNT(*) so that all the rows are counted? And by extension, what happens during COUNT(DISTINCT *)?
This would be a syntax error in most databases. If it were allowed, it would probably be equivalent to:
select count(*)
from (select distinct * from t) t
However, NULL values might throw it off.
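The sketch below tries the subquery form in SQLite and compares it with the other COUNT variants from the question. The table t, its columns x and y, and the sample rows are all made up for illustration. In SQLite, SELECT DISTINCT treats two NULLs as equal when detecting duplicate rows, and rows containing NULL are still counted; other databases may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10), (1, 10), (2, None), (2, None), (None, None)],
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

all_rows = scalar("SELECT COUNT(*) FROM t")             # every row, NULLs included
non_null_y = scalar("SELECT COUNT(y) FROM t")           # rows where y is not NULL
distinct_y = scalar("SELECT COUNT(DISTINCT y) FROM t")  # distinct non-NULL y values
# Whole-row distinct count, written as a subquery.
distinct_rows = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT * FROM t) AS d")

print(all_rows, non_null_y, distinct_y, distinct_rows)  # 5 2 1 3
```

Note that the whole-row count (3) includes the rows containing NULL, while COUNT(DISTINCT y) skips NULL entirely, which is one way NULL values can make the two forms disagree.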

Why does this query return the first row instead of the name and country of the player with the most yellow cards

I have a database of soccer teams.
My query looks like this:
SELECT
players.Name,
players.Country
FROM
players
WHERE
players.Player_id = (SELECT player_cards.Player_id
FROM player_cards
HAVING MAX(player_cards.Yellow_Cards));
But it only returns the name and country of the player in the first row instead of the info of the player with the most yellow cards.
Why is this happening? How should I fix it?
The subquery that you want is:
players.Player_id = (SELECT pc.Player_id
FROM player_cards pc
ORDER BY pc.Yellow_Cards DESC
LIMIT 1
);
Note that if several players are tied for the most yellow cards, this returns only one of them.
Why doesn't yours work? First, it is not really valid SQL, because the HAVING clause makes this an aggregation query . . . and without a GROUP BY there should be no unaggregated columns in the SELECT. You have one, so I am guessing that you are using an older version of MySQL.
What does the query do? It returns one row because of the MAX() in the HAVING clause. The one row has an arbitrary value of Player_id -- which might be the "first" row or any row. Which row the value comes from is undefined.
The HAVING clause serves two purposes. It makes the query an aggregation query that returns exactly one row. And it validates that the maximum value of the column is not 0 or NULL.
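As a rough illustration, the sketch below runs the ORDER BY ... LIMIT 1 subquery in SQLite and, as one possible alternative that is not part of the answer above, a variant that compares against MAX(Yellow_Cards) so that tied players are all returned. The player names, countries, and card counts are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data: Ben and Cho are tied for the most yellow cards.
conn.executescript("""
CREATE TABLE players (Player_id INTEGER PRIMARY KEY, Name TEXT, Country TEXT);
CREATE TABLE player_cards (Player_id INTEGER, Yellow_Cards INTEGER);
INSERT INTO players VALUES (1, 'Ana', 'Brazil'), (2, 'Ben', 'Chile'), (3, 'Cho', 'Korea');
INSERT INTO player_cards VALUES (1, 2), (2, 7), (3, 7);
""")

# The answer's approach: exactly one player, even when there is a tie.
top_one = conn.execute("""
SELECT p.Name, p.Country
FROM players p
WHERE p.Player_id = (SELECT pc.Player_id
                     FROM player_cards pc
                     ORDER BY pc.Yellow_Cards DESC
                     LIMIT 1)
""").fetchall()

# Alternative: every player whose count equals the maximum.
top_all = conn.execute("""
SELECT p.Name, p.Country
FROM players p
JOIN player_cards pc ON pc.Player_id = p.Player_id
WHERE pc.Yellow_Cards = (SELECT MAX(Yellow_Cards) FROM player_cards)
ORDER BY p.Name
""").fetchall()

print(top_one)  # one of the tied players; which one is not guaranteed
print(top_all)  # [('Ben', 'Chile'), ('Cho', 'Korea')]
```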

SQL returning 0 for count(), but returning multiple rows with simple SELECT

Sorry if the phrasing of my question was not very clear.
I am running this simple query below
SELECT count(cg)
FROM all_data
WHERE cg is null
and am getting 0 as the result. When I run this query
SELECT cg
FROM all_data
WHERE cg is null
I get a bunch of records that fit the criteria. There are very obviously many records that have a cg value of null, but the count() query does not count them.
Is there a reason for this? Am I doing something wrong?
Thanks for any help
Aggregates (COUNT(), SUM() etc.) ignore NULL values.
Use COUNT(*) to count all rows matching your condition.
SELECT COUNT(*)
FROM all_data
WHERE cg IS NULL
Further reading - Count Function (Microsoft Access SQL):
The Count function does not count records that have Null fields unless expr is the asterisk (*) wildcard character. If you use an asterisk, Count calculates the total number of records, including those that contain Null fields. Count(*) is considerably faster than Count([Column Name]).
If you want to count the number of null values, use the following query:
SELECT
SUM(CASE WHEN CG IS NULL THEN 1 END) AMOUNT_CG
FROM all_data
Otherwise, follow the advice in the answer above.
According to the SQL Reference Manual section on Aggregate Functions:
All aggregate functions except COUNT(*) and GROUPING ignore nulls. You can use the NVL function in the argument to an aggregate function to substitute a value for a null. COUNT never returns null, but returns either a number or zero. For all the remaining aggregate functions, if the data set contains no rows, or contains only rows with nulls as arguments to the aggregate function, then the function returns null.
So, to solve your problem, use count(*) instead of count(cg).
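The answers above can be compared on a small example. This sketch uses SQLite through Python's sqlite3 module; the id column and the sample values are assumptions added for illustration, and only all_data and cg come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_data (id INTEGER, cg TEXT)")  # id is a made-up column
conn.executemany(
    "INSERT INTO all_data VALUES (?, ?)",
    [(1, "a"), (2, None), (3, None), (4, "b")],
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# COUNT(cg) skips NULLs, and the WHERE clause keeps only NULLs, so nothing is counted.
count_cg = scalar("SELECT COUNT(cg) FROM all_data WHERE cg IS NULL")
# COUNT(*) counts rows regardless of their contents.
count_star = scalar("SELECT COUNT(*) FROM all_data WHERE cg IS NULL")
# Conditional sum over the whole table.
sum_case = scalar("SELECT SUM(CASE WHEN cg IS NULL THEN 1 END) FROM all_data")
# With no matching rows, SUM sees only NULLs and returns NULL rather than 0.
sum_case_none = scalar(
    "SELECT SUM(CASE WHEN cg IS NULL THEN 1 END) FROM all_data WHERE cg IS NOT NULL"
)

print(count_cg, count_star, sum_case, sum_case_none)  # 0 2 2 None
```

If a 0 is preferred over NULL in the last case, adding ELSE 0 to the CASE expression is one way to get it.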

Count of 2 columns by GROUP BY and catx giving different outputs

I have to find distinct count of combination of 2 variables. I used the following 2 queries to find the count:
select count(*) from
( select V1, V2
from table1
group by 1,2
) a
select count(distinct catx('-', V1, V2))
from table1
Logically, both the above queries should give the same count but I am getting different counts. Note that
both V1 and V2 are integers
Both variables can have null values, though there are no null values in my table
There are no negative values
Any idea why I might be getting different outputs? And which is the best way to find the count of distinct combinations of 2 or more columns?
Thanks.
The SAS log gives the answer when you run the first sql code. Using 'group by' requires a summary function, otherwise it is ignored. The count will therefore return the overall number of rows instead of a distinct count of the 2 variables combined.
Just add count(*) to the subquery and you will get the same answer with both methods.
select count(*) from
( select V1, V2, count(*)
from table1
group by 1,2
) a
Use DISTINCT in the subquery of the first query.
When you do a GROUP BY but don't include any aggregate function, SAS discards the GROUP BY,
so you will still have duplicate combinations of V1 and V2.
It seems that GROUP BY doesn't work that way in SAS. You can't use it to remove duplicates unless you have an aggregate function in your query. I found this in the log of my query output -
NOTE: A GROUP BY clause has been discarded because neither the SELECT
clause nor the optional HAVING clause of the associated
table-expression referenced a summary function.
This answers the question.
You can also drop the GROUP BY and just add DISTINCT to the subquery. Also, the second query you wrote is more efficient.
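The DISTINCT subquery and the GROUP BY with count(*) can be sketched as follows. The example runs in SQLite rather than SAS, so it only shows the result both forms are meant to produce; SQLite does not discard a GROUP BY the way the SAS log note describes. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (V1 INTEGER, V2 INTEGER)")
# Invented data: 5 rows, 3 distinct (V1, V2) combinations.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 2), (1, 2), (1, 3), (2, 3), (2, 3)],
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

total_rows = scalar("SELECT COUNT(*) FROM table1")

# DISTINCT in the subquery removes duplicate combinations before counting.
distinct_pairs = scalar(
    "SELECT COUNT(*) FROM (SELECT DISTINCT V1, V2 FROM table1) AS a"
)

# GROUP BY with a summary function in the subquery gives the same count.
grouped_pairs = scalar(
    "SELECT COUNT(*) FROM "
    "(SELECT V1, V2, COUNT(*) AS n FROM table1 GROUP BY V1, V2) AS a"
)

print(total_rows, distinct_pairs, grouped_pairs)  # 5 3 3
```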

Not getting the correct count in SQL

I am totally new to SQL. I have a simple select query similar to this:
SELECT COUNT(col1) FROM table1
There are some 120 records in the table and shown on the GUI.
For some reason, this query always returns a number which is less than the actual count.
Can somebody please help me?
Try
select count(*) from table1
Edit: To explain further, count(*) gives you the rowcount for a table, including duplicates and nulls. count(isnull(col1,0)) will do the same thing, but slightly slower, since isnull must be evaluated for each row.
You might have some null values in col1 column. Aggregate functions ignore nulls.
try this
SELECT COUNT(ISNULL(col1,0)) FROM table1
Slightly tangential, but there's also the useful
SELECT count(distinct cola) from table1
which gives you the number of distinct non-null values in cola.
You are getting the correct count
As per https://learn.microsoft.com
COUNT(*) returns the number of items in a group. This includes NULL values and duplicates.
COUNT(ALL expression) evaluates an expression for each row in a group and returns the number of nonnull values.
COUNT(DISTINCT expression) evaluates an expression for each row in a group and returns the number of unique, non null values.
In your case you passed a column name to COUNT, which is why you get the count of non-null records. Your table data probably has null values in that column (col1).
Hope this helps!
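The COUNT variants discussed in these answers can be compared on a small table. This is a sketch in SQLite with made-up values; COALESCE stands in for the ISNULL(col1, 0) call from the earlier answer because it is standard SQL and available in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER)")
# Made-up data: 5 rows, 2 of them NULL, with one duplicate value.
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [(5,), (5,), (None,), (7,), (None,)],
)

counts = conn.execute("""
SELECT COUNT(*),                  -- all rows, NULLs and duplicates included
       COUNT(col1),               -- non-NULL values only
       COUNT(COALESCE(col1, 0)),  -- NULLs replaced first, so every row counts
       COUNT(DISTINCT col1)       -- distinct non-NULL values
FROM table1
""").fetchone()

print(counts)  # (5, 3, 5, 2)
```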