Why doesn't the SQL standard allow COUNT(col1, col2, ..., colN)?

The question is simple: why doesn't the SQL standard allow COUNT(col1, col2, ..., colN)? What's the reason behind it?
It seems strange because, conversely, the SQL standard allows COUNT(DISTINCT col1, col2, ..., colN).

What do you want? Simply COUNT(*) gives you the 'desired' answer.
COUNT(*) is usually the thing to use. It counts the number of rows (after filtering by WHERE and subject to GROUP BY).
COUNT(col) is common, but often not necessary -- it counts rows with col IS NOT NULL.
COUNT(DISTINCT col) determines how many different values there are for col.
COUNT(DISTINCT col1, col2) determines how many different values there are for the combination of col1 and col2.
If you want to know why MySQL/MariaDB chose to leave out that syntax, you may have to ask the people who developed the SQL standard.
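The COUNT variants listed above can be compared side by side. This is only a sketch: it assumes Python's built-in sqlite3 module and a made-up people table (the table and its rows are not from the question). SQLite does not accept COUNT(DISTINCT col1, col2), so the two-column case is emulated with a subquery that skips NULLs, which is my understanding of how MySQL treats that form.

```python
import sqlite3

# Hypothetical sample data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Lee"), ("Ann", "Lee"), ("Bob", None), (None, "Kim"), ("Ann", "Kim")],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Counts every row, NULLs included.
count_rows = scalar("SELECT COUNT(*) FROM people")  # 5

# Counts only the rows where name IS NOT NULL.
count_name = scalar("SELECT COUNT(name) FROM people")  # 4

# Counts the different non-NULL names (Ann, Bob).
count_distinct_name = scalar("SELECT COUNT(DISTINCT name) FROM people")  # 2

# Emulation of COUNT(DISTINCT name, surname): distinct pairs with no NULL part.
count_distinct_pair = scalar(
    "SELECT COUNT(*) FROM ("
    "  SELECT DISTINCT name, surname FROM people"
    "  WHERE name IS NOT NULL AND surname IS NOT NULL"
    ")"
)  # 2: (Ann, Lee) and (Ann, Kim)

print(count_rows, count_name, count_distinct_name, count_distinct_pair)
```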

COUNT(NAME), COUNT(SURNAME) gets you ALL names and surnames. ALL is illegal in that context.

Related

What exactly is going on under the hood of WHERE clause in SQL when filtering based on two columns inequality

Let's assume we have the following table.
In short, there are unique ids in col1 and some non-unique corresponding values in col2.
Say we want to find the rows where the col2 values are not unique.
E.g., in the following example such rows are 1 and 4.
col1  col2
1     "a"
2     "b"
3     "c"
4     "a"
So I found the following cryptic-looking (for me) code that does the job (test is the name of the table above):
SELECT *
FROM test a
WHERE col2 IN (SELECT col2 FROM test b WHERE b.col1 <> a.col1);
Sure, one way to do the task is to group by col2 and filter out the values whose count(col1) equals 1, but what concerns me is not the task at hand but rather how the WHERE clause works in this context.
I am aware of how tables are explicitly joined with JOINs, and I also understand the common use of the WHERE clause, like WHERE somecol != value. Yet the way WHERE somecol != othercol works in this context is beyond me.
Could someone give me a clue as to how the code above works?
Maybe the question is stupid, sorry if that is the case.
Thanks!
In the absence of indexes, such a where clause is generally going to be implemented as a nested loop construct.
That is, for each row in the outer query, the engine is going to run the inner query. For each inner row, it compares col1 with the outer row's col1. When these are not equal, it checks whether the inner row's col2 is the same as col2 in the outer row.
Engines do have a variety of algorithms so this is not guaranteed. However, non-equality conditions are harder to optimize and less frequent.
That said, there are much more efficient ways to express the query. For instance, you can use window functions. I believe this is the same logic -- assuming the values in the columns are not NULL:
select t.*
from (select t.*,
             min(col1) over (partition by col2) as min_col1,
             max(col1) over (partition by col2) as max_col1
      from test t
     ) t
where min_col1 <> max_col1;
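To make the nested-loop description concrete, here is a rough sketch using Python's built-in sqlite3 module and the four-row test table from the question. The nested_loop function is a hypothetical illustration of the logic only, not how any particular engine actually executes the query, and the window-function part assumes SQLite 3.25 or newer.

```python
import sqlite3

rows = [(1, "a"), (2, "b"), (3, "c"), (4, "a")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (col1 INTEGER, col2 TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)", rows)


def nested_loop(table):
    """Naive reading of the correlated subquery as two nested loops."""
    result = []
    for a in table:          # outer query: FROM test a
        for b in table:      # inner query: FROM test b
            # b.col1 <> a.col1 selects the inner rows; IN then needs a
            # matching col2 among them.
            if b[0] != a[0] and b[1] == a[1]:
                result.append(a)
                break        # one match is enough to satisfy IN
    return result


# The query from the question, with ORDER BY added for a stable comparison.
correlated = conn.execute(
    "SELECT * FROM test a "
    "WHERE col2 IN (SELECT col2 FROM test b WHERE b.col1 <> a.col1) "
    "ORDER BY col1"
).fetchall()

# The window-function rewrite from the answer.
windowed = conn.execute(
    "SELECT col1, col2 FROM ("
    "  SELECT t.*,"
    "         MIN(col1) OVER (PARTITION BY col2) AS min_col1,"
    "         MAX(col1) OVER (PARTITION BY col2) AS max_col1"
    "  FROM test t"
    ") AS w "
    "WHERE min_col1 <> max_col1 ORDER BY col1"
).fetchall()

print(nested_loop(rows), correlated, windowed)
```

All three approaches should return rows 1 and 4 for this data.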

How does the GROUP BY statement in SQL affect the results?

Does including an extra column in GROUP BY change the number of rows in the results?
I was running a select query on a table A(col1, col2, ..., col9), and I first used
select col1,col2,col3
from A where col1 = (condition)
group by col1, col2, col3
which yielded a certain number of results.
Then I changed the query to this:
select col1,col2,col3, col8,col9
from A where col1=(condition)
group by col1,col2,col3, col8,col9
and I got a different number of rows in the results. What could be the possible explanation?
If the combination of col1, col2 and col3 is not unique, you can have more than one row with the same combination of those three.
If that happens, and those duplicates have different values for col8 and/or col9, then grouping by those extra columns will result in more rows.
Note that you can use select distinct to get the same results. group by is especially used if you want to aggregate over other columns, for instance, calculate a sum or a count, like so:
select
    col1, col2, col3,
    sum(col8) as total8
from A
group by col1, col2, col3
The query above will give you each unique combination of col1, col2 and col3 plus the sum over all col8's for each combination.
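A small sketch of this effect, assuming Python's built-in sqlite3 module and four invented rows for table A (only the columns used in the question are created, and the condition is replaced by col1 = 1 for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (col1, col2, col3, col8, col9)")
# Hypothetical data: three rows share (col1, col2, col3) but differ in col8/col9.
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", "y", 10, "p"),
        (1, "x", "y", 20, "p"),
        (1, "x", "y", 20, "q"),
        (1, "x", "z", 30, "p"),
    ],
)

# Grouping on three columns folds the duplicates together.
narrow = conn.execute(
    "SELECT col1, col2, col3 FROM A WHERE col1 = 1 "
    "GROUP BY col1, col2, col3"
).fetchall()

# Adding col8 and col9 splits those groups apart again.
wide = conn.execute(
    "SELECT col1, col2, col3, col8, col9 FROM A WHERE col1 = 1 "
    "GROUP BY col1, col2, col3, col8, col9"
).fetchall()

# Aggregating over col8 instead of grouping by it keeps one row per group.
sums = conn.execute(
    "SELECT col1, col2, col3, SUM(col8) AS total8 FROM A "
    "GROUP BY col1, col2, col3 ORDER BY col3"
).fetchall()

print(len(narrow), len(wide), sums)
```

With this data the first query returns 2 rows and the second returns 4, while the aggregate version returns totals of 50 and 30.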
By grouping on those columns you are, in essence, making the results distinct on the grouped columns. So if there were rows that had columns 1, 2, 3, 8, and 9 in common, they would be folded together.
Adding GROUP BY isn't really the correct way to go about this: instead of grouping by one column, it groups across all the listed columns, so you may end up with fewer or more rows depending on the data you're querying.
May I ask for what reason you're grouping the columns?

How to manipulate a column selected by * in SQLite?

I want a query to return all rows and all columns with one caveat: if, in a given row, colN is null, then instead return the string 'FOO'.
Why don't I just use SELECT col1, col2, ..., COALESCE(colN, 'FOO')?
I am implementing an abstract interface and thus I am required to use queries that SELECT * (because I cannot make assumptions about which columns there are). I can only assume that one column exists: colN.
What would this provide me?
I need this because this query is used in combination with a UNION and this allows me to keep track of the origin of the data.
Any ideas on how to do this?
One thing you could do is
SELECT *, COALESCE(colN, 'FOO') as CoalescedColN
if it's possible to adjust the other select(s) in the UNION accordingly
I don't know if SQLite can use this technique, but this is what I would do in most other databases:
select * from
    (SELECT col1, col2, ..., COALESCE(colN, 'FOO') AS colN from table) a
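For what it's worth, the SELECT *, COALESCE(...) form does run in SQLite. Below is a minimal sketch using Python's built-in sqlite3 module; the items table and its id column are invented for the example, and only colN comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the only column the question guarantees is colN.
conn.execute("CREATE TABLE items (id INTEGER, colN TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "bar"), (2, None)])

# SELECT * keeps every existing column; the extra expression is appended
# as one more column, so colN itself is still returned unchanged.
cur = conn.execute(
    "SELECT *, COALESCE(colN, 'FOO') AS CoalescedColN FROM items ORDER BY id"
)
columns = [d[0] for d in cur.description]
result = cur.fetchall()

print(columns)
print(result)
```

Note that this adds a column rather than replacing colN, so the other branches of the UNION would need a matching extra column.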

What do comma-separated integers in a GROUP BY statement accomplish?

I have a query like this:
SELECT col1, col2, col3, col4, col5, SUM(col6) AS total
FROM table_name
WHERE col1 < 99999
GROUP BY 1,2,3,4,5
What does the GROUP BY statement actually accomplish here? The query does not work properly without the comma-separated integers.
It is equivalent to writing:
SELECT col1, col2, col3, col4, col5, SUM(col6) AS total
FROM table_name
WHERE col1 < 99999
GROUP BY col1, col2, col3, col4, col5
The numbers are the values/columns in the select-list expressed by ordinal position in the list, starting with 1.
The numbers used to be mandatory; then the ability to use the expressions in the select-list was added. The expressions can get unwieldy, and not all DBMSs allow you to use 'display labels' or 'column aliases' from the select-list in the GROUP BY clause, so occasionally using the column numbers is helpful.
In your example, it would be better to use the names - they are simple. And, in general, use names rather than numbers whenever you can.
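The equivalence can be checked on an engine that accepts ordinal positions in GROUP BY. The sketch below assumes Python's built-in sqlite3 module (SQLite accepts them) and a few invented rows for table_name; other products may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (col1, col2, col3, col4, col5, col6)")
# Hypothetical rows; the last one is filtered out by col1 < 99999.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 1, 10),
        (1, 1, 1, 1, 1, 5),
        (2, 1, 1, 1, 1, 7),
        (100000, 1, 1, 1, 1, 99),
    ],
)

select_part = (
    "SELECT col1, col2, col3, col4, col5, SUM(col6) AS total "
    "FROM table_name WHERE col1 < 99999 "
)

# Ordinal positions refer to the first five select-list items.
by_position = conn.execute(
    select_part + "GROUP BY 1, 2, 3, 4, 5 ORDER BY 1"
).fetchall()

# The same grouping spelled out with column names.
by_name = conn.execute(
    select_part + "GROUP BY col1, col2, col3, col4, col5 ORDER BY col1"
).fetchall()

print(by_position)
print(by_name)
```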
My guess is that your database product allows for referencing columns in the Group By by position as opposed to only by column name (i.e., 1 for the first column, 2 for the second column etc.) If so, this is a proprietary feature and is not recommended because of portability and (arguably) readability issues (But can admittedly be handy for a quick and dirty query).
I tried a similar query in MS SQL Server 2005:
select distinct host from some_table group by 1,2,3
It errors out, saying:
Each GROUP BY expression must contain at least one column that is not an outer reference.
So SQL Server does not read 1,2,3 as select-list positions here; it treats them as constant expressions, which is why it complains that they contain no column.

Is there a difference between DISTINCT colname and DISTINCT(colname)?

I've seen both versions around. On iSeries DB2 you can use either and as far as I can tell they do the same thing. Is there a difference?
No, there is no difference because DISTINCT is a keyword and not a function call.
It's the same difference as between SOME_COLUMN and (SOME_COLUMN) (without any keyword in front)
If you have only one column in your select, then there is no difference.
However, when you put distinct in front of several columns, as in
select distinct col1, col2, col3 from table
it applies distinct to the whole tuple (col1, col2, col3).
In short, there is no difference between select distinct and select distinct().
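A quick way to check both points, sketched with Python's built-in sqlite3 module rather than iSeries DB2 (the table t and its rows are invented, and other engines may differ in details):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1, col2)")
# Hypothetical rows with one exact duplicate.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "a")],
)

# DISTINCT as a keyword in front of a single column.
plain = conn.execute("SELECT DISTINCT col1 FROM t ORDER BY col1").fetchall()

# Same query with parentheses: they only wrap the column expression.
parens = conn.execute("SELECT DISTINCT(col1) FROM t ORDER BY col1").fetchall()

# With two columns, DISTINCT still applies to the whole (col1, col2) tuple,
# even though the parentheses surround col1 only.
tuple_distinct = conn.execute(
    "SELECT DISTINCT (col1), col2 FROM t ORDER BY col1, col2"
).fetchall()

print(plain, parens, tuple_distinct)
```

The last query returns three rows, not two, which shows that the parentheses do not restrict DISTINCT to col1.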