Adding a constant value column in the group by clause - sql

Netezza SQL gives an error on this query: Cause: Invalid column name 'dummy'.
select col1,col2, '' as dummy, max(col3) from table1 group by col1,col2,dummy
If I remove dummy from the GROUP BY clause, it works fine. But as per SQL syntax, I am supposed to include all non-aggregate columns in the GROUP BY.

Why do you need it in your GROUP BY? You can wrap it in an aggregate function instead, and the result will always be right because the value is constant. For example:
select col1,col2, min(' ') as dummy, max(col3) from table1 group by col1,col2

"dummy" is a static column (not in the table), so it does not need to be in the group by because it is an external column.
SELECT col1,
col2,
cast(5 as int) AS [dummy],
max(col3)
FROM test_1
GROUP BY col1,
col2,
col3,
'dummy'
This code produces outer reference error 164.
Take a look at these links
http://www.sql-server-helper.com/error-messages/msg-164.aspx
http://www.sql-server-helper.com/error-messages/msg-1-500.aspx

It's due to the order of operations...
FROM
JOIN
WHERE
GROUP BY
...
SELECT
When using GROUP BY, only the fields remaining from the previous step are available. Since you do not declare your "dummy" column until the SELECT clause, the GROUP BY doesn't know it exists and therefore doesn't need to account for it.

Going back to basics: GROUP BY is executed after the underlying JOIN operations (file I/O), and only then does the SELECTed result set become available.
You defined dummy in the SELECT, so the database does not know about it while grouping, because it is not available at the table level.
Try your query using GROUP BY your_column, ' ' and it should work, because you mention the constant directly instead of referring to an alias.
Finally, when GROUP BY is used, you can specify constants in the SELECT or the GROUP BY, because they are included in your SELECTed result without any table operation involved. So the database lets them through.

To resolve the issue, group it at an outer layer:
SELECT X.col1, X.col2, X.dummy, max(X.col3)
FROM (
SELECT col1,
col2,
cast(5 as int) AS [dummy],
col3
FROM test_1
) AS X
GROUP BY X.col1,
X.col2,
X.dummy
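The derived-table approach above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in (not Netezza or SQL Server), and the rows in test_1 are made-up sample data. SQLite is more lenient about aliases than the engines discussed here, so this only shows that the outer-layer grouping returns the expected shape.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_1 (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO test_1 VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "x", 7), ("b", "y", 3)],
)

# The constant is defined in the inner query, so the outer GROUP BY
# can refer to it by name like any other column of X.
rows = conn.execute(
    """
    SELECT X.col1, X.col2, X.dummy, MAX(X.col3) AS max_col3
    FROM (
        SELECT col1, col2, CAST(5 AS INTEGER) AS dummy, col3
        FROM test_1
    ) AS X
    GROUP BY X.col1, X.col2, X.dummy
    ORDER BY X.col1
    """
).fetchall()

for row in rows:
    print(row)
```

Each group carries the constant 5 alongside its maximum col3, one row per (col1, col2) pair.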

Related

In SQL Server, how to concat --> group by using the concat column

SELECT
'dbo.our_table' as table_name,
CONCAT(col1, '-', col2, '-', col3) as table_id,
COUNT(*) as ct
FROM dbo.our_table
group by table_name, table_id
-- group by 1, 2 -- this doesn't work either...
order by ct desc
This does not work in SQL Server because it does not recognize table_name or table_id. I understand that this can be done by nesting a second SELECT inside the FROM, so that table_name and table_id are explicitly available. However, I am trying to understand whether it is possible to achieve this output without a nested SELECT, keeping the structure of my current query and making only a small tweak.
Thanks
As mentioned in the comments, you need to put your 3 columns (col1, col2 & col3) into the GROUP BY. Unlike these 3 columns, 'dbo.our_table' is not needed in the GROUP BY because it is a string literal, that is, a constant.
SQL Server executes the components of SELECT queries in a particular order starting with FROM, then WHERE, GROUP BY, etc. In this case, SQL Server doesn't recognize the aliases table_name & table_id in the GROUP BY because they are not set until the SELECT, which is executed after the GROUP BY.
Googling "SQL Server SELECT query execution order" should give you a number of resources which will explain the order of execution in more detail.
You need to specify the full calculation for the GROUP BY as well as for the SELECT. This is because GROUP BY is logically considered before SELECT, so cannot access those calculations.
You could do it like this (table_name does not need to be in the GROUP BY because it is a constant):
SELECT
'dbo.our_table' as table_name,
CONCAT(col1, '-', col2, '-', col3) as table_id,
COUNT(*) as ct
FROM dbo.our_table
group by CONCAT(col1, '-', col2, '-', col3)
order by ct desc;
But it is much better to place the calculation in a CROSS APPLY, which makes it accessible later by name, as you wished:
SELECT
'dbo.our_table' as table_name,
v.table_id,
COUNT(*) as ct
FROM dbo.our_table
CROSS APPLY (VALUES (CONCAT(col1, '-', col2, '-', col3) ) ) as v(table_id)
group by v.table_id
order by ct desc;
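To see that repeating the expression and naming it once give the same groups, here is a rough sketch using Python's sqlite3 module. Several things in it are assumptions rather than SQL Server behaviour: SQLite has no CROSS APPLY, so a derived table stands in for it; the || operator stands in for CONCAT (unlike CONCAT, it yields NULL if any operand is NULL); and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE our_table (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO our_table VALUES (?, ?, ?)",
    [("a", "b", "c"), ("a", "b", "c"), ("x", "y", "z")],
)

# Option 1: repeat the full expression in the GROUP BY.
repeated = conn.execute(
    """
    SELECT 'dbo.our_table' AS table_name,
           col1 || '-' || col2 || '-' || col3 AS table_id,
           COUNT(*) AS ct
    FROM our_table
    GROUP BY col1 || '-' || col2 || '-' || col3
    ORDER BY ct DESC
    """
).fetchall()

# Option 2: compute the expression once, give it a name, group by the name.
# (A derived table is used here only because SQLite lacks CROSS APPLY.)
named = conn.execute(
    """
    SELECT 'dbo.our_table' AS table_name, v.table_id, COUNT(*) AS ct
    FROM (
        SELECT col1 || '-' || col2 || '-' || col3 AS table_id
        FROM our_table
    ) AS v
    GROUP BY v.table_id
    ORDER BY ct DESC
    """
).fetchall()

print(repeated)
print(named)
```

Both queries return the same rows; the second just avoids writing the expression twice.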

SQL Server Group by clause issues

Suppose I have a query
select id, sum(col1), col2, col3, ......... col10
from table
If I run this without group by clause it gives an error
Column 'dbo.table.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If I use a GROUP BY clause
select id, sum(col1), col2, col3, ......... col10
from table
group by col4
again the same error
Column 'dbo.table.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
The error persists until I list in the GROUP BY every column that has no aggregate function applied to it.
So I either have to apply an aggregate function to all of my columns, or explicitly include all of them in the GROUP BY clause.
I'm not sure what you are trying to achieve.
As for your first query though, you could get the SUM without grouping the rows, if you use an analytic function:
select id,
sum(col1) over () as sum_col1, -- here you have the analytic function
col2,
col3,
......... col10
from table
This way, you still get all the rows in the table, but on each row you will have the sum of col1.
You could also have the sum over col4 (as for your second query), if you add a partition by clause to the analytic function:
select id,
sum(col1) over (partition by col4) as sum_col1,
col2,
col3,
......... col10
from table
You will still get the same number of rows, but the sum will be grouped by col4.
You can use join to get all Columns
Select id, col2,col3, ......... col10,sumcol1
From table t1 inner join
(
select sum(col1) as sumcol1, col4 as coln4 from table
group by col4
) t2
on
t1.col4 =t2.coln4
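The window-function answer and the join answer can be compared side by side. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or newer); the table name t, the reduced column list, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 INTEGER, col4 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 20, "a"), (3, 5, "b")],
)

# Analytic function: every row is kept, with the per-col4 sum attached.
windowed = conn.execute(
    """
    SELECT id, col4, SUM(col1) OVER (PARTITION BY col4) AS sum_col1
    FROM t
    ORDER BY id
    """
).fetchall()

# Join against a grouped subquery: same idea, written without window functions.
joined = conn.execute(
    """
    SELECT t1.id, t1.col4, t2.sumcol1
    FROM t AS t1
    INNER JOIN (
        SELECT SUM(col1) AS sumcol1, col4 AS coln4
        FROM t
        GROUP BY col4
    ) AS t2 ON t1.col4 = t2.coln4
    ORDER BY t1.id
    """
).fetchall()

print(windowed)
print(joined)
```

On this data the two results match. One difference to keep in mind: the inner join drops rows whose col4 is NULL, while the window version keeps them.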

Perform Union/OR Operation between where clause and having Clause

I am working on an implementation of a SQL statement that should display results using a union between the WHERE and HAVING clauses.
For example,
Select * from table where col1= 'get' group by col2 (OR/UNION) having avg(col3) >30 . This is not valid SQL, but it illustrates the use case.
The purpose of the SQL statement is to return a result set that satisfies both the WHERE and HAVING conditions.
Let's say I have table1 with columns col1, col2, col3, col4 and a large amount of data. In one use case, the user selects filters with specific criteria, col1 ='Y', avg(col2) >10, avg(col3*col4) =30, and wants to see the matching results. I have to build a query that returns all results satisfying col1 ='Y' OR avg(col2) >10 OR avg(col3*col4) =30, as we would with the OR operator in a WHERE clause, except that here both a WHERE clause and a HAVING clause are involved.
Like, the below query
resultset1 <= select * from table1 where col1= 'get';
resultset2 <= select * from table1 group by col2 having avg(col3) >30
final results = resultset1+ resultset2
Does anyone have a better approach or ideas for implementing such a scenario?
Let's say I have filter combinations as below
col1 =23
OR
avg(col2) >30
AND
avg(col3) =10
OR
avg(col1) <10
AND
col2 =10
I need to display results satisfying these criteria in SQL
It's not clear what you want from this quasi-SQL. I guess you need to select records matching two conditions, col1= 'get' AND/OR avg(col3) >30. If so, here is the solution:
Select * from table
where (col1= 'get')
OR
col2 in (SELECT col2 FROM table GROUP BY col2 HAVING avg(col3) >30)
If you need both conditions to be true, replace OR with AND.
If you need to compute the AVG only over rows with col1 = 'get', add this condition to the subquery:
Select * from table
where (col1= 'get')
OR
col2 in (SELECT col2 FROM table WHERE (col1= 'get')
GROUP BY col2
HAVING avg(col3) >30)
SELECT <resultset1> --resultset based on a WHERE clause
UNION
SELECT <resultset2> --resultset based on HAVING
In general, if you want a union of resultsets, use ... UNION.
Using OR in a condition is equivalent to UNION (because the UNION operator is the relational algebra equivalent of logical disjunction), but it requires the scope of the involved conditions to be identical.
In this case, this is impossible because a HAVING condition applies not to the table mentioned in the SELECT, but to an intermediate table that is "silently" created by the GROUP BY clause. This is inevitably so because things like AVG, SUM, ... only make sense once it is determined which set of rows they are computed over, and that is what the GROUP BY specification does.
EDIT
In SQL, UNION comes in two flavours, UNION DISTINCT and UNION ALL. The first eliminates duplicates, the second does not. If you want the same behaviour as OR, you need the one that eliminates duplicates from its result set.
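The two answers above, OR with an IN subquery and a UNION of two result sets, can be checked against each other with a short sketch. It uses Python's sqlite3 module as a stand-in engine; the id primary key and the sample rows are assumptions added so that each row is unique and the duplicate-eliminating UNION cannot drop anything the OR version keeps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "get", "g1", 10),
        (2, "put", "g1", 20),  # avg(col3) for g1 is 15, fails the HAVING
        (3, "put", "g2", 40),
        (4, "get", "g2", 50),  # avg(col3) for g2 is 45, passes the HAVING
        (5, "put", "g3", 5),
    ],
)

# Groups that pass the HAVING condition, reused by both queries.
passing_groups = "SELECT col2 FROM table1 GROUP BY col2 HAVING AVG(col3) > 30"

with_or = conn.execute(
    f"""
    SELECT * FROM table1
    WHERE col1 = 'get' OR col2 IN ({passing_groups})
    ORDER BY id
    """
).fetchall()

with_union = conn.execute(
    f"""
    SELECT * FROM table1 WHERE col1 = 'get'
    UNION
    SELECT * FROM table1 WHERE col2 IN ({passing_groups})
    ORDER BY id
    """
).fetchall()

print([row[0] for row in with_or])
print([row[0] for row in with_union])
```

Row 4 satisfies both conditions but appears once in either result, which is the duplicate elimination described in the EDIT above.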

db2 select distinct rows, but select all columns

Experts, I have a single table with multiple columns: col1, col2, col3, col4, col5, col6.
I need to select distinct (col4), but I also need all the other columns in my output.
If I run this ( select distinct(col4 ) from table1 ), I get only col4 in my output.
How can I do this on DB2?
Thank you
You simply do this...
Select * From Table1 Where col4 In (Select Distinct(col4) From Table1)
I'm not sure if you will be able to do this.
You might try to run group by on this column. You will be able to run some aggregate functions on other columns.
select count(col1), col4 from table1 group by (col4);
None of the answers worked for me, so here is one that I got working: use GROUP BY on col4 while taking the max values of the other columns.
select max(col1) as col1,max(col2) as col2,max(col3) as col3
, col4
from
table1
group by col4
At least in DB2, you can execute
SELECT
DISTINCT *
FROM
<YOUR TABLE>
Which will give you every distinct combination of your (in this case) 6 columns.
Otherwise, you'll have to specify what columns you want to include. If you do that, you can either use select distinct or group by.
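The answers above behave quite differently, which a small experiment makes visible. This is only a sketch on Python's sqlite3 module, not DB2, with a cut-down table1 (three columns) and invented rows, so treat it as an illustration of the query shapes rather than of DB2 itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT, col4 TEXT)")
all_rows = [(1, "z", "k"), (2, "a", "k"), (3, "m", "j"), (3, "m", "j")]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", all_rows)

# IN (SELECT DISTINCT ...) matches every row with a non-NULL col4,
# so it does not reduce the row count.
in_count = conn.execute(
    "SELECT COUNT(*) FROM table1 WHERE col4 IN (SELECT DISTINCT col4 FROM table1)"
).fetchone()[0]

# GROUP BY col4 with MAX on the other columns: one row per col4,
# but each MAX is taken independently, so values may come from different rows.
grouped = conn.execute(
    """
    SELECT MAX(col1) AS col1, MAX(col2) AS col2, col4
    FROM table1
    GROUP BY col4
    ORDER BY col4
    """
).fetchall()

# DISTINCT * removes only rows that are identical in every column.
distinct_rows = conn.execute(
    "SELECT DISTINCT * FROM table1 ORDER BY col1"
).fetchall()

print(in_count)
print(grouped)
print(distinct_rows)
```

Note that the grouped row for col4 = 'k' combines col1 from one source row and col2 from another, so it is not a row that exists in the table.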

GROUP BY and ORDER BY on different columns

I want to execute a query on a PostgreSQL server whose structure is as below:
SELECT col1, SUM(col2) GROUP BY col1 ORDER BY colNotInSelect;
I have tried to include the colNotInSelect in the GROUP BY clause but since it is a column with a distinct value, it defeats the purpose of using GROUP BY in the first place.
Any help is appreciated.
You cannot order by that column because it potentially has many values for each value of col1.
However you can apply an aggregate function to the column, and order by that.
for example:
SELECT col1,
SUM(col2)
FROM your_table -- placeholder table name
GROUP BY col1
ORDER BY MIN(colNotInSelect);
Your question does not quite make sense, because the rows are grouped by col1, so there is no single colNotInSelect value in the grouped rows. Try to aggregate colNotInSelect before ordering, for example:
SELECT col1, SUM(col2), AVG(colNotInSelect) as col3 FROM your_table GROUP BY col1 ORDER BY col3; -- your_table is a placeholder
If this doesn't fit your needs, maybe you should clarify what you're doing.
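Ordering grouped rows by an aggregate of a column that is not in the select list can be sketched as follows. Python's sqlite3 module stands in for PostgreSQL here, and the events table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (col1 TEXT, col2 INTEGER, colNotInSelect INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a", 1, 30), ("a", 2, 20), ("b", 5, 10), ("c", 7, 25)],
)

# Each group has many colNotInSelect values, so one is picked per group
# with MIN and the groups are sorted by that single value.
rows = conn.execute(
    """
    SELECT col1, SUM(col2) AS total
    FROM events
    GROUP BY col1
    ORDER BY MIN(colNotInSelect)
    """
).fetchall()

print(rows)
```

The per-group minimums are 20 for a, 10 for b and 25 for c, so the groups come back in the order b, a, c. Swapping MIN for MAX or AVG changes which value represents each group.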