Oracle query mistake - sql

I need to know where the mistake is in this Oracle query.
SELECT(KEY1),COUNT(*) FROM TABLE1 GROUP BY AGE

SELECT KEY1,COUNT(*) FROM TABLE1 GROUP BY KEY1
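There are two problems. First, the parentheses around KEY1 are unnecessary; Oracle accepts them, but they add nothing and should be removed. Second, every selected column that is not inside an aggregate function must also appear in the GROUP BY clause, in this case KEY1. The original query fails because it selects KEY1 while grouping by AGE. If you want to group by age, you have to select AGE instead: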
SELECT AGE,COUNT(*) FROM TABLE1 GROUP BY AGE
Your table naming is not very descriptive. I suggest having a look at a GROUP BY tutorial such as https://www.w3schools.com/sql/sql_groupby.asp or the general SQL tutorial at https://www.w3schools.com/sql/

Your query has an issue. Modify it as follows:
SELECT KEY1,COUNT(*) FROM TABLE1 GROUP BY KEY1
Observations:
All columns that appear in the select list alongside aggregate functions must be included in the GROUP BY columns.
The parentheses around your first column should be removed.
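To see the corrected queries in action, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. The sample rows are made up; only the names TABLE1, KEY1 and AGE come from the question. SQLite is more lenient than Oracle about mismatched GROUP BY columns, so only the corrected forms are shown.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (KEY1 TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?)",
    [("a", 30), ("a", 41), ("b", 30)],
)

# The selected column matches the GROUP BY column, so each row is one group.
by_key = conn.execute(
    "SELECT KEY1, COUNT(*) FROM TABLE1 GROUP BY KEY1 ORDER BY KEY1"
).fetchall()
by_age = conn.execute(
    "SELECT AGE, COUNT(*) FROM TABLE1 GROUP BY AGE ORDER BY AGE"
).fetchall()

print(by_key)  # [('a', 2), ('b', 1)]
print(by_age)  # [(30, 2), (41, 1)]
```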

Related

Group by and having trouble understanding

I was looking at some SQL query that I have in Access database that I did not make.
One of the SQL queries goes something like this:
select column1 from table1 group by column1 having count(*)>1
The purpose of this query is to find the value in column1 that appears more than once. I can verify that this query works correctly and returns the column value that appears more than once.
I however do not understand why this query works. As per my understanding using group by will remove duplicate fields. For instance if column1 had
column1
apple
mango
mango
Doing group by (column1) will result in
column1
apple
mango
At this point, if we perform having count(*)>1 or having count(column1)>1, it should return no result because group by has already removed the duplicate field. But clearly, I am wrong as the above SQL statement does give the accurate result.
Would you please let me know the problem in my understanding?
Edit 1:
Besides the accepted answer, I found that this article, which deals with the order of SQL operations, really helped my understanding.
You are misunderstanding how HAVING works. In fact, you can think of it by using subqueries. Your query is equivalent to:
select column1
from (select column1, count(*) as cnt
from table1
group by column1
) as t
where cnt > 1;
That is, having filters an aggregation query after the aggregation has taken place. However, the aggregation functions are applied per group. So count(*) is counting the number of rows in each group. That is why it is identifying duplicates.
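As a rough check of this, the sketch below runs both forms against the apple/mango data from the question, using Python's built-in sqlite3 module (an assumption; the question is about Access). The intermediate `groups` result shows that each group still carries its row count after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)", [("apple",), ("mango",), ("mango",)]
)

# Aggregation keeps one row per distinct value, together with the group size.
groups = conn.execute(
    "SELECT column1, COUNT(*) AS cnt FROM table1 "
    "GROUP BY column1 ORDER BY column1"
).fetchall()

# HAVING filters the groups by their aggregate value.
having_rows = conn.execute(
    "SELECT column1 FROM table1 GROUP BY column1 HAVING COUNT(*) > 1"
).fetchall()

# The same filter as a subquery, with WHERE applied to the precomputed count.
subquery_rows = conn.execute(
    "SELECT column1 FROM ("
    "  SELECT column1, COUNT(*) AS cnt FROM table1 GROUP BY column1"
    ") AS t WHERE cnt > 1"
).fetchall()

print(groups)         # [('apple', 1), ('mango', 2)]
print(having_rows)    # [('mango',)]
print(subquery_rows)  # [('mango',)]
```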
group by doesn't just remove duplicate values: it returns one row per distinct value of the group by clause, and allows you to apply aggregate functions per such unique value.
In this query, you actually query the values of column1 and the result of count(*) per value of column1, then, you use the having clause to return only the values of column1 that have a count(*) greater than 1.
The GROUP BY clause groups the selection by the fields you mention, in this case column1, but it can also be a combination of columns (e.g. column1, column2).
By the way, I think running the following will help you understand how grouping works:
SELECT column1, Count(*) AS [Count], MIN(column2) AS MinColumn2, MAX(column2) AS MaxColumn2
FROM table1
GROUP BY column1;
To filter on a column directly, use a WHERE condition; to filter on a value calculated from the grouping, you need the HAVING clause.

SQL GROUP BY 1 2 3 and SQL Order of Execution

This may be a dumb question but I am really confused. According to the SQL Query Order of Execution, the GROUP BY clause is executed before the SELECT clause. However, it allows you to do something like:
SELECT field_1, SUM(field_2) FROM myTable GROUP BY 1
My confusion is that if GROUP BY clause happens before SELECT, in this scenario I provided, how does SQL know what 1 is? It works with ORDER BY clause and it makes sense to me because ORDER BY clause happens after SELECT.
Can someone help me out? Thanks in advance!
https://www.periscopedata.com/blog/sql-query-order-of-operations
My understanding is that this works because it's ordinal notation: for the SELECT statement to pass syntax validation you must have selected at least one column, so the 1 refers to the first column in the select list.
EDIT:
I see people saying you can't use ordinal notation and they are right if you're using SQL Server. You can use it in MySQL though.
select a,b,c from emp group by 1,2,3 groups first by column a, then by b, then by c. The numbers refer to the positions of the columns listed after SELECT.
SQL Server rejects a constant here with the error "Each GROUP BY expression must contain at least one column that is not an outer reference." You cannot group by 1 there, because 1 is not a column in your table.
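For engines that do accept ordinals, a quick way to convince yourself that the number is just shorthand for a select-list position is to compare it with the named form. This is a minimal sketch with made-up rows, using Python's built-in sqlite3 module (SQLite is one of the engines that resolves GROUP BY 1 to the first selected column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (field_1 TEXT, field_2 INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)", [("x", 1), ("x", 2), ("y", 5)]
)

# Ordinal form: 1 means "the first expression in the select list".
by_position = conn.execute(
    "SELECT field_1, SUM(field_2) FROM myTable GROUP BY 1 ORDER BY 1"
).fetchall()

# Named form: the same grouping spelled out.
by_name = conn.execute(
    "SELECT field_1, SUM(field_2) FROM myTable "
    "GROUP BY field_1 ORDER BY field_1"
).fetchall()

print(by_position)  # [('x', 3), ('y', 5)]
print(by_position == by_name)  # True
```

The engine parses the whole statement before running any of it, so it can resolve the ordinal against the select list even though grouping logically happens before projection.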

Select query group by alias

I have this query:
SELECT * FROM TABLE1 WHERE AREA_CODE IN ('929', '718', '347', '646') GROUP BY AREA_CODE
Is it possible to get only one row named 'NEW_YORK_AREA' that covers all four of these area codes? To be clearer, say the table has 4 records, one for each area code listed above, but you want a single result row with the alias 'NEW_YORK_AREA'. I hope that is clear; let me know if you have any questions and I will edit the question. Thank you all and have a great day.
UPDATE: requirements have changed and it is no longer needed. Thank you all for your help! :)
DB2 supports listagg(). So:
SELECT 'NEW_YORK_AREA' as cityname,
LISTAGG(AREA_CODE, ',') WITHIN GROUP (ORDER BY AREA_CODE) as areacodes
FROM TABLE1
WHERE AREA_CODE IN ('212', '929', '718', '347', '646') ;
I helpfully added 212, the most famous NYC area code ;)
If you have duplicates, then you need to use a subquery to remove them before aggregating.
Logically, what you want to do is group everything into the same category. You could do this by explicitly grouping all rows by a single value:
select 'NEW_YORK_AREA',
--whatever functions you need to aggregate the data here.
count(var1),
max(var2)
from table1
where area_code in ('929', '718', '347', '646')
group by 1
However, if the only functions that refer to the data in the table are aggregate functions, DB2 lets you omit the group by, and it will automatically group everything into a single row. The following is equivalent to the above query:
select 'NEW_YORK_AREA',
count(var1),
max(var2)
from table1
where area_code in ('929', '718', '347', '646')
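A small sketch of the "aggregates only, no GROUP BY" form, run with Python's built-in sqlite3 module rather than DB2 (an assumption made so the example is runnable). The rows are invented, and group_concat is SQLite's rough counterpart to LISTAGG; its output order is not guaranteed, so the codes are sorted in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (AREA_CODE TEXT)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?)",
    [("929",), ("718",), ("347",), ("646",), ("415",)],  # 415 is filtered out
)

# Only a constant and aggregate functions are selected, so the whole
# filtered set collapses into a single row without any GROUP BY.
summary_rows = conn.execute(
    """
    SELECT 'NEW_YORK_AREA' AS cityname, COUNT(*), MIN(AREA_CODE), MAX(AREA_CODE)
    FROM TABLE1
    WHERE AREA_CODE IN ('929', '718', '347', '646')
    """
).fetchall()

# group_concat joins the values with ',' by default.
joined = conn.execute(
    "SELECT group_concat(AREA_CODE) FROM TABLE1 "
    "WHERE AREA_CODE IN ('929', '718', '347', '646')"
).fetchone()[0]
codes = sorted(joined.split(","))

print(summary_rows)  # [('NEW_YORK_AREA', 4, '347', '929')]
print(codes)         # ['347', '646', '718', '929']
```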
What about creating an AREA_CODE_GROUP table
AREA_GROUP,AREA_CODE
'NEW_YORK_AREA','929'
'NEW_YORK_AREA','718'
'NEW_YORK_AREA','347'
'NEW_YORK_AREA','646'
that you can join:
SELECT t.* FROM TABLE1 t
INNER JOIN AREA_CODE_GROUP g
ON t.AREA_CODE = g.AREA_CODE
WHERE g.AREA_GROUP = 'NEW_YORK_AREA'
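If a single row per group is still the goal, the lookup table can be combined with GROUP BY on the group name. This is only a sketch with invented rows, run through Python's built-in sqlite3 module instead of DB2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (AREA_CODE TEXT)")
conn.execute("CREATE TABLE AREA_CODE_GROUP (AREA_GROUP TEXT, AREA_CODE TEXT)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?)",
    [("929",), ("718",), ("347",), ("646",), ("415",)],  # 415 has no group
)
conn.executemany(
    "INSERT INTO AREA_CODE_GROUP VALUES ('NEW_YORK_AREA', ?)",
    [("929",), ("718",), ("347",), ("646",)],
)

# Join each row to its group name, then aggregate per group name.
grouped = conn.execute(
    """
    SELECT g.AREA_GROUP, COUNT(*)
    FROM TABLE1 t
    INNER JOIN AREA_CODE_GROUP g ON t.AREA_CODE = g.AREA_CODE
    GROUP BY g.AREA_GROUP
    """
).fetchall()

print(grouped)  # [('NEW_YORK_AREA', 4)]
```

Adding another region later then only means inserting rows into the lookup table, not editing the query.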

How to SELECT columns without including them in GROUP BY access sql

My sample SQL query:
SELECT EID,p,p1,p2,p3 FROM table1 GROUP BY EID;
It gives the error "not part of an aggregate function". I want to group by EID only, not by p, p1, p2, p3. How do I specify that in the SQL query?
In most dialects of SQL, you have to specify which column you want, if the column is not in the group by clause. For instance, maybe you want the minimum value:
SELECT EID, min(p), min(p1), min(p2), min(p3)
FROM table1
GROUP BY EID;
Or, if you wanted all the values from a particular record, use first or last:
SELECT EID, first(p), first(p1), first(p2), first(p3)
FROM table1
GROUP BY EID;
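Here is a minimal sketch of the MIN variant with made-up rows, using Python's built-in sqlite3 module as a stand-in for Access (SQLite has no first/last aggregates, so only min is shown, and only two of the p columns are included to keep it short).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (EID INTEGER, p INTEGER, p1 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, 10, 7), (1, 4, 9), (2, 3, 3)],
)

# One row per EID; every other column is reduced by an aggregate.
rows = conn.execute(
    "SELECT EID, MIN(p), MIN(p1) FROM table1 GROUP BY EID ORDER BY EID"
).fetchall()

print(rows)  # [(1, 4, 7), (2, 3, 3)]
```

Note that for EID 1 the two minimums come from different source rows (4 from the second row, 7 from the first): each aggregate is computed independently, so the output row is not any single original record.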

"group by" needed in count(*) SQL statement?

The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
The reason why I am confused is because I have often seen count(*) as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an "aggregate" function. So you need to tell it which field to aggregate by, which is done with the GROUP BY clause.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you only use COUNT(*), you are asking for the total number of rows rather than aggregating by another condition. Your question whether GROUP BY is implicit in that case could be answered with "sort of": not specifying anything is a bit like saying "group by nothing", which means you get one huge aggregate covering the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You should also take into account that * means count every row, including rows that contain NULLs! If you want to count only the non-NULL values of something, use COUNT(expression)! See the docs on aggregate functions for more details on this topic.
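The difference is easy to see on a tiny table. This sketch uses Python's built-in sqlite3 module with invented rows; the table is called my_table without the my_schema prefix because SQLite has no such schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?)", [("x",), (None,), ("y",)]
)

# COUNT(*) counts rows; COUNT(column_a) skips rows where column_a is NULL.
all_rows, non_null = conn.execute(
    "SELECT COUNT(*), COUNT(column_a) FROM my_table"
).fetchone()

print(all_rows, non_null)  # 3 2
```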
Without a GROUP BY clause, count(*) collapses the whole table into a single row, so there is no single value of column_a that could be shown next to it; that is why the second statement raises an error. Adding GROUP BY 1 groups the rows by column_a, so every output row has exactly one column_a value together with the count for that value.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This is why it works on your server. However, it is not good practice in SQL.
You should mention the column name instead, because the order of columns in the select list may change, which makes positional references hard to maintain.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;