Group by clause not working in DB2 (sql code -119)

I want to get the number of results for each day of the past week. Unfortunately, I got this error for the query:
An expression starting with "APP_ID" specified in a SELECT clause,
HAVING clause, or ORDER BY clause is not specified in the GROUP BY
clause or it is in a SELECT clause, HAVING clause, or ORDER BY clause
with a column function and no GROUP BY clause is specified..
SQLCODE=-119, SQLSTATE=42803, DRIVER=3.67.27
The query:
SELECT DAYNAME(created), app_id
FROM Annotation
WHERE app_id = 1 AND (created < CURRENT DATE - 7 DAYS)
GROUP BY DAYNAME(created) ORDER BY created
The problem has something to do with the GROUP BY statement. What is wrong with it?

I think the error is pretty clear: app_id is in the SELECT but not in the GROUP BY. Since you want a count per day, the solution is to use an aggregation function instead. I would expect something like this:
SELECT DAYNAME(created), COUNT(*)
FROM Annotation a
WHERE app_id = 1 AND (created < CURRENT DATE - 7 DAYS)
GROUP BY DAYNAME(created)
ORDER BY MIN(created);

If you want to use group by, then every column must either be in your group by clause or be aggregated.
select col1, col2, 'same-for-every-row', sum(col3) as col3_sum, avg(col4) as col4_avg
from schema.table
group by col1, col2
This works because col1 and col2 are in the group by, and every other column has an aggregation that says how to combine all the values in each group.
Your current statement won't work because, although you've grouped by date, you haven't specified how to combine the rows for app_id. You need to say whether the values in each group should be summed, averaged, reduced to their minimum, or aggregated in some other way.
The exception is a constant column such as the string literal 'same-for-every-row'. It doesn't need to be aggregated because it is the same for every row.
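A minimal sketch of this rule, using Python's built-in sqlite3 module as a stand-in for DB2 (an assumption made so the example is runnable; the annotation rows and the score column are invented for illustration). Every selected column is either grouped, aggregated, or a constant. Note that SQLite itself is lenient about bare columns, so the sketch only shows the valid form rather than reproducing the -119 error.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotation (app_id INTEGER, created TEXT, score INTEGER)"
)
# Hypothetical sample rows: (app_id, created, score).
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 10),
        (1, "2024-01-01", 20),
        (1, "2024-01-02", 30),
        (2, "2024-01-02", 40),
    ],
)

rows = conn.execute(
    """
    SELECT created,                 -- grouped column
           'same-for-every-row',    -- constant, needs no aggregation
           COUNT(*)   AS n,         -- aggregated
           SUM(score) AS score_sum  -- aggregated
    FROM annotation
    WHERE app_id = 1
    GROUP BY created
    ORDER BY created
    """
).fetchall()

for row in rows:
    print(row)
```

app_id can stay out of the select list here because the WHERE clause already pins it to a single value.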

Related

Selecting distinct values from database

I have a table as follows:
ParentActivityID | ActivityID | Timestamp
1 A1 T1
2 A2 T2
1 A1 T1
1 A1 T5
I want to select unique ParentActivityID's along with Timestamp. The time stamp can be the most recent one or the first one as is occurring in the table.
I tried to use DISTINCT, but I came to realise that it doesn't work on individual columns. I am new to SQL. Any help in this regard will be highly appreciated.
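DISTINCT applies to the whole select list, not to an individual column, so it only removes rows that are identical in every selected column. To control which columns the rows are collapsed by, use GROUP BY: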
SELECT ParentActivityID, Timestamp
FROM MyTable
GROUP BY ParentActivityID, Timestamp
Actually, I want only one row per ParentActivityID. Your solution will give each pair of ParentActivityID and Timestamp. E.g., if I have [1, T1], [2, T2], [1, T3], then I want the result to be [1, T3] and [2, T2].
You need to decide which of the many timestamps to pick. If you want the earliest one, use MIN:
SELECT ParentActivityID, MIN(Timestamp)
FROM MyTable
GROUP BY ParentActivityID
Try this:
SELECT [ParentActivityId],
MIN([Timestamp]) AS [FirstTimestamp],
MAX([Timestamp]) AS [RecentTimestamp]
FROM [Table]
GROUP BY [ParentActivityId]
This will give you the first timestamp and the most recent timestamp for each ParentActivityId present in your table. You can pick whichever one you need.
"Group by" is what you need here. Just do "group by ParentActivityID" and say that you want the most recent timestamp among all rows with the same ParentActivityID:
SELECT ParentActivityID, MAX(Timestamp) FROM Table GROUP BY ParentActivityID
The "group by" operator is like taking rows from a table and putting them in a map whose key is defined by the group by clause (ParentActivityID in this example). You have to define how the grouping handles rows with duplicate keys. For this you have various aggregate functions, which you apply to the columns you want to select that are not part of the key (not listed in the group by clause; think of them as the values in the map).
Some databases (such as MySQL when ONLY_FULL_GROUP_BY is disabled) also allow you to select columns that are not part of the group by clause (not in the key) without applying an aggregate function to them. In that case you get an arbitrary value for the column (this is like blindly overwriting the value in a map with a new value every time). The SQL standard and most databases out there will not allow this, so use an aggregate function such as min() or max() (or first() or last() where the database provides them) instead.
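The map analogy above can be sketched roughly as follows. This is an illustration under assumptions: the timestamps T1, T2, T5 are replaced by the integers 1, 2, 5, the table and column names are made up, and SQLite (via Python's sqlite3 module) stands in for whatever database is actually used.

```python
import sqlite3

# Sample rows from the question: (ParentActivityID, ActivityID, Timestamp),
# with timestamps simplified to integers (assumption).
rows = [(1, "A1", 1), (2, "A2", 2), (1, "A1", 1), (1, "A1", 5)]

# "Group by" as a map: the key is ParentActivityID, and max() decides
# what to keep when two rows share the same key.
latest = {}
for parent_id, _activity_id, ts in rows:
    latest[parent_id] = max(ts, latest.get(parent_id, ts))

# The same thing expressed in SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity (parent_activity_id INTEGER, activity_id TEXT, ts INTEGER)"
)
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
grouped = dict(
    conn.execute(
        "SELECT parent_activity_id, MAX(ts) FROM activity GROUP BY parent_activity_id"
    ).fetchall()
)

print(latest)
print(grouped)
```

Both approaches keep one entry per key; swapping max for min would give the earliest timestamp instead.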
Use a CTE to get the latest row per parent id from your table; you can then choose any columns from the entire row in the output.
;With cte_parent
As
(SELECT ParentActivityId,ActivityId,TimeStamp
, ROW_NUMBER() OVER(PARTITION BY ParentActivityId ORDER BY TimeStamp desc) RNO
FROM YourTable )
SELECT *
FROM cte_parent
WHERE RNO =1

How to SELECT columns without including them in GROUP BY access sql

My sample sql query
SELECT EID,p,p1,p2,p3 FROM table 1 GROUP BY EID;
This gives an error saying the columns are not part of an aggregate function. I want to group by EID only, not by p, p1, p2, p3. How do I specify that in the SQL query?
In most dialects of SQL, you have to specify which column you want, if the column is not in the group by clause. For instance, maybe you want the minimum value:
SELECT EID, min(p), min(p1), min(p2), min(p3)
FROM table 1
GROUP BY EID;
Or, if you wanted all the values from a particular record, use first or last:
SELECT EID, first(p), first(p1), first(p2), first(p3)
FROM table 1
GROUP BY EID;

"group by" needed in count(*) SQL statement?

The following statement works in my database:
select column_a, count(*) from my_schema.my_table group by 1;
but this one doesn't:
select column_a, count(*) from my_schema.my_table;
I get the error:
ERROR: column "my_table.column_a" must appear in the GROUP BY clause
or be used in an aggregate function
Helpful note: This thread: What does SQL clause "GROUP BY 1" mean? discusses the meaning of "group by 1".
Update:
The reason why I am confused is because I have often seen count(*) as follows:
select count(*) from my_schema.my_table
where there is no group by statement. Is COUNT always required to be followed by group by? Is the group by statement implicit in this case?
This error makes perfect sense. COUNT is an aggregate function, so every other column in the SELECT list has to appear in the GROUP BY clause, which tells the database which field to aggregate by.
The one which probably makes most sense in your case would be:
SELECT column_a, COUNT(*) FROM my_schema.my_table GROUP BY column_a;
If you only select COUNT(*), you are asking for the total number of rows instead of aggregating by another condition. Your question of whether GROUP BY is implicit in that case could be answered with "sort of": not specifying anything is a bit like asking to "group by nothing", which means you get one huge aggregate over the whole table.
As an example, executing:
SELECT COUNT(*) FROM table;
will show you the number of rows in that table, whereas:
SELECT col_a, COUNT(*) FROM table GROUP BY col_a;
will show you the number of rows per value of col_a. Something like:
col_a | COUNT(*)
---------+----------------
value1 | 100
value2 | 10
value3 | 123
You should also take into account that the * means count every row, including rows that contain NULLs! If you want to count only the rows where a specific expression is not NULL, use COUNT(expression). See the docs about aggregate functions for more details on this topic.
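A small sketch of the difference between COUNT(*) and COUNT(expression), again using Python's sqlite3 module as a stand-in database (an assumption). The column column_b and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT, column_b INTEGER)")
# Hypothetical rows; one column_b value is NULL (None in Python).
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("x", 1), ("x", None), ("y", 2)],
)

# One aggregate over the whole table: counts every row.
total = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

# COUNT(expression) skips rows where the expression is NULL.
non_null_b = conn.execute("SELECT COUNT(column_b) FROM my_table").fetchone()[0]

# One aggregate per distinct value of column_a.
per_value = conn.execute(
    """
    SELECT column_a, COUNT(*), COUNT(column_b)
    FROM my_table
    GROUP BY column_a
    ORDER BY column_a
    """
).fetchall()

print(total, non_null_b)
print(per_value)
```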
If you don't use the GROUP BY clause at all, the database cannot return column_a alongside count(*): the count collapses the whole table into a single row, while column_a has a value for every row. By adding GROUP BY 1 you categorize the rows by column_a, so each distinct value is returned once together with its own count.
When you have a function like count, sum etc. you need to group the other columns. This would be equivalent to your query:
select column_a, count(*) from my_schema.my_table group by column_a;
When you use count(*) with no other column, you are counting all rows from SELECT * from the table. When you use count(*) alongside another column, you are counting the number of rows for each different value of that other column. So in this case you need to group the results, in order to show each value and its count only once.
group by 1 in this case refers to column_a which has the column position 1 in your query.
This is why it works on your server. However, it is not good practice in SQL.
You should use the column name instead, because the position refers to the select list: if the order of the selected columns changes, the grouping changes with it, which makes the code hard to maintain.
The best solution is:
select column_a, count(*) from my_schema.my_table group by column_a;

Group by or Distinct - But several fields

How can I use a Distinct or Group by statement on 1 field with a SELECT of All or at least several ones?
Example: Using SQL SERVER!
SELECT id_product,
description_fr,
DiffMAtrice,
id_mark,
id_type,
NbDiffMatrice,
nom_fr,
nouveaute
From C_Product_Tempo
And I want Distinct or Group By nom_fr
JUST GOT THE ANSWER:
select id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
from (
SELECT rn = row_number() over (partition by [nom_fr] order by id_mark)
, id_product, description_fr, DiffMAtrice, id_mark, id_type, NbDiffMatrice, nom_fr, nouveaute
From C_Product_Tempo
) d
where rn = 1
And this works perfectly!
If I'm understanding you correctly, you just want the first row per nom_fr. If so, you can simply use a subquery to get the lowest id_product per nom_fr, and just get the corresponding rows;
SELECT * FROM C_Product_Tempo WHERE id_product IN (
SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
);
You need to decide what to do with the other fields. For example, for numeric fields, do you want a sum? Average? Max? Min? For non-numeric fields, do you want the values from a particular record if there is more than one with the same nom_fr?
Some SQL Systems allow you to get a "random" record when you do a GROUP BY, but SQL Server will not - you must define the proper aggregation for columns that are not in the GROUP BY.
GROUP BY is used to group in conjunction with an aggregate function (see http://www.w3schools.com/sql/sql_groupby.asp), so there is little point in grouping without counting, summing, and so on. DISTINCT eliminates duplicate rows, but I can't see how that fits with the other columns you want to extract, because some rows will be removed from the result.
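The "first row per nom_fr" subquery approach from the answers above can be tried out with a sketch like this. It assumes SQLite (through Python's sqlite3 module) instead of SQL Server, keeps only three of the columns, and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE C_Product_Tempo (id_product INTEGER, nom_fr TEXT, id_mark INTEGER)"
)
# Hypothetical rows: two products per nom_fr.
conn.executemany(
    "INSERT INTO C_Product_Tempo VALUES (?, ?, ?)",
    [(1, "chaise", 10), (2, "chaise", 11), (3, "table", 12), (4, "table", 9)],
)

# Keep the full row with the lowest id_product for each nom_fr.
first_rows = conn.execute(
    """
    SELECT id_product, nom_fr, id_mark
    FROM C_Product_Tempo
    WHERE id_product IN (
        SELECT MIN(id_product) FROM C_Product_Tempo GROUP BY nom_fr
    )
    ORDER BY id_product
    """
).fetchall()

print(first_rows)
```

With this data, "first" means lowest id_product. The row_number() variant that orders by id_mark would instead pick product 4 for "table", so the two answers can return different rows depending on the ordering chosen.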

difference between usage of having clause and where clause [duplicate]

What is the difference between using the HAVING clause and the WHERE clause? Could anyone explain in detail?
HAVING filters grouped elements,
WHERE filters ungrouped elements.
Example 1:
SELECT col1, col2 FROM table
WHERE col1 = #id
Example 2:
SELECT SUM(col1), col2 FROM table
GROUP BY col2
HAVING SUM(col1) > 10
Because the HAVING condition can only be applied in the second example AFTER the grouping has occurred, you could not rewrite it as a WHERE clause.
Example 3:
SELECT SUM(col1), col2 FROM table
WHERE col1 = #id
GROUP BY col2
HAVING SUM(col1) > 10
demonstrates how you might use both WHERE and HAVING together:
The table data is first filtered by col1 = #id
then the filtered data is grouped
then the grouped data is filtered again by SUM(col1) > 10
WHERE filters rows before they are grouped in GROUP BY clause
while HAVING filters the aggregate values after GROUP BY takes place
HAVING specifies a search condition on groups, typically involving an aggregate that is also used in the SELECT statement.
In other words.
HAVING applies to groups.
WHERE applies to rows.
Without a GROUP BY, there is no difference (but HAVING looks strange then)
With a GROUP BY
HAVING is for testing condition on the aggregate (MAX, SUM, COUNT etc)
HAVING column = 1 is the same as WHERE column = 1 (no aggregate on column )
WHERE COUNT(*) = 1 is not allowed.
HAVING COUNT(*) = 1 is allowed
Having is for use with an aggregate such as Sum. Where is for all other cases.
Both specify a search condition, but HAVING specifies it for a group or an aggregate. HAVING can be used only with the SELECT statement and is typically paired with a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause. In short, HAVING is applied to the groups produced by GROUP BY, whereas WHERE is applied to each row before the rows are grouped.
As others have already said, HAVING is used with GROUP BY. The reason is the order of execution: WHERE is executed before GROUP BY, and HAVING is executed after it.
Think of it as a matter of where the filtering happens.
When you specify a where clause you filter input rows to your aggregate function (ie: I only want to get the average age on persons living in a specific city.) When you specify a having constraint you specify that you only want a certain subset of the averages. (I only want to see cities with an average age of 70 years or above.)
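The city and average-age example could look roughly like this. It is a sketch under assumptions: the person table, its rows, and the city names are invented, and SQLite (via Python's sqlite3 module) stands in for the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (city TEXT, age INTEGER)")
# Hypothetical rows: (city, age).
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [
        ("Oslo", 72), ("Oslo", 80),
        ("Bergen", 30), ("Bergen", 40),
        ("Lund", 75), ("Lund", 69),
    ],
)

# WHERE filters the input rows: average age of persons living in one city.
oslo_avg = conn.execute(
    "SELECT AVG(age) FROM person WHERE city = 'Oslo'"
).fetchone()[0]

# HAVING filters the groups: only cities whose average age is 70 or above.
old_cities = conn.execute(
    """
    SELECT city, AVG(age)
    FROM person
    GROUP BY city
    HAVING AVG(age) >= 70
    ORDER BY city
    """
).fetchall()

# Both together: rows are filtered first, then grouped, then groups filtered.
both = conn.execute(
    """
    SELECT city, COUNT(*)
    FROM person
    WHERE age >= 70
    GROUP BY city
    HAVING COUNT(*) >= 2
    ORDER BY city
    """
).fetchall()

print(oslo_avg, old_cities, both)
```

In the last query Lund drops out because only one of its rows survives the WHERE filter, so its group no longer satisfies the HAVING condition.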
Having is for aggregate functions, e.g.
SELECT baz, COUNT(*)
FROM foo
GROUP BY baz
HAVING COUNT(*) > 8