GROUP BY and ORDER BY on different columns - sql

I want to execute a query on a PostgreSQL server with the structure below:
SELECT col1, SUM(col2) GROUP BY col1 ORDER BY colNotInSelect;
I have tried to include colNotInSelect in the GROUP BY clause, but since every value in that column is distinct, doing so defeats the purpose of using GROUP BY in the first place.
Any help is appreciated.

You cannot order by that column because it potentially has many values for each value of col1.
However you can apply an aggregate function to the column, and order by that.
For example:
SELECT col1,
SUM(col2)
GROUP BY col1
ORDER BY MIN(colNotInSelect);
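
To make the ordering concrete, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained), and the table name t and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical table "t"; column names follow the question.
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER, colNotInSelect INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 1, 30), ("a", 2, 40), ("b", 5, 10), ("b", 6, 50), ("c", 7, 20)],
)

# Each group is ordered by the smallest colNotInSelect value it contains.
rows = conn.execute(
    """
    SELECT col1, SUM(col2)
    FROM t
    GROUP BY col1
    ORDER BY MIN(colNotInSelect)
    """
).fetchall()
print(rows)  # [('b', 11), ('c', 7), ('a', 3)]
```

Group b comes first because its smallest colNotInSelect value (10) is the lowest, followed by c (20) and a (30).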

Your question does not quite make sense as asked: the rows are grouped by col1, so the grouped rows have no single colNotInSelect value. Aggregate colNotInSelect before ordering by it, for example:
SELECT col1, SUM(col2), AVG(colNotInSelect) as col3 GROUP BY col1 ORDER BY col3;
If that doesn't fit your needs, please clarify what you're trying to do.

Related

SQL Server Query - How to append row showing total record count?

What is the best approach to append a row to a SQL Server query showing the total count of rows resulting from the query? UNION is one way, but seems very inefficient:
SELECT col1, col2 FROM tbl1
UNION ALL
SELECT STR(COUNT(col1)), NULL FROM tbl1
ROLLUP isn't an option because it requires GROUP BY, which we're not using for the queries in question.
You can use GROUPING SETS for this:
SELECT
CASE WHEN GROUPING(col1) = 0 THEN col1 ELSE CAST(COUNT(*) AS varchar(30)) END AS col1,
col2
FROM tbl1
GROUP BY GROUPING SETS (
(col1, col2),
()
);
The GROUPING function tells you whether a row is the total row or not.
Note that this groups by the listed columns, which could change the result and may be less efficient. If you include a unique or primary key as the first column in the grouping list, though, it shouldn't make a difference, and should perform almost as well as the original query.
You can also use a window function, which returns the total on every row as an additional column:
SELECT
col1,
col2,
COUNT(*) OVER ()
FROM tbl1;
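
A small runnable sketch of the window-function variant follows. It uses Python's sqlite3 module in place of SQL Server (an assumption for self-containment; SQLite needs version 3.25 or later for window functions), and the total_rows alias, the ORDER BY, and the sample rows are illustrative additions.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO tbl1 VALUES (?, ?)",
    [("x", "p"), ("y", "q"), ("z", "r")],
)

# COUNT(*) OVER () counts all rows of the result and repeats that
# total on every row; no GROUP BY is involved.
rows = conn.execute(
    """
    SELECT col1, col2, COUNT(*) OVER () AS total_rows
    FROM tbl1
    ORDER BY col1
    """
).fetchall()
print(rows)  # [('x', 'p', 3), ('y', 'q', 3), ('z', 'r', 3)]
```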

Pick duplicate record within grouping records?

I have a history table into which duplicate records have been entered because of the history_date_time column.
The duplicates come from some process or loading issue.
I used a query like:
SELECT (col1, col2), COUNT(*)
FROM table_name
GROUP BY (col1, col2)
HAVING COUNT(*) >1;
I grouped the records by col1 and col2, but the problem is that the other columns may hold different values across those records. I want to pick the unique records within each group by comparing all columns.
How can I achieve this with an Oracle SQL query?
Sorry, I don't have a proper table structure right now.
It should be:
SELECT col1, col2, COUNT(*)
FROM table_name
GROUP BY col1, col2
HAVING COUNT(*) > 1;
Alternatively, if it must be a "group" of columns, perhaps you meant something like this?
SELECT col1 ||'-'|| col2 col, COUNT(*)
FROM table_name
GROUP BY col1 ||'-'|| col2
HAVING COUNT(*) >1;
Sample data would help.
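
In the absence of sample data, here is a minimal sketch of the corrected GROUP BY / HAVING query on made-up rows. It runs on Python's sqlite3 module rather than Oracle (an assumption for the sake of a self-contained example); the data, the dup_count alias, and the ORDER BY are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (3, "c"), (3, "c"), (3, "c")],
)

# Keep only the (col1, col2) combinations that occur more than once.
rows = conn.execute(
    """
    SELECT col1, col2, COUNT(*) AS dup_count
    FROM table_name
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
    ORDER BY col1
    """
).fetchall()
print(rows)  # [(1, 'a', 2), (3, 'c', 3)]
```

To compare on every column instead of just two, the same pattern applies with all columns listed in both the SELECT and the GROUP BY.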

SQL Server Group by clause issues

Suppose I have a query
select id, sum(col1), col2, col3, ......... col10
from table
If I run this without a GROUP BY clause, it gives an error:
Column 'dbo.table.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If I use a GROUP BY clause:
select id, sum(col1), col2, col3, ......... col10
from table
group by col4
I get the same error again:
Column 'dbo.table.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
The error persists until I list in the GROUP BY clause every column that has no aggregate function applied to it.
I can't apply an aggregate function to all of my columns, so it seems I have to explicitly include all of them in the GROUP BY clause.
I'm not sure what you are trying to achieve.
As for your first query though, you could get the SUM without grouping the rows, if you use an analytic function:
select id,
sum(col1) over () as sum_col1, -- here you have the analytic function
col2,
col3,
......... col10
from table
This way, you still get all the rows in the table, but on each row you will have the sum of col1.
You could also have the sum over col4 (as for your second query), if you add a partition by clause to the analytic function:
select id,
sum(col1) over (partition by col4) as sum_col1,
col2,
col3,
......... col10
from table
You will still get the same number of rows, but the sum will be grouped by col4.
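
The two analytic variants above can be compared side by side in a short runnable sketch. It uses Python's sqlite3 module instead of SQL Server (an assumption; window functions need SQLite 3.25 or later). The table is named t here because "table" is a reserved word, and the sample rows and column aliases are made up.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical table "t" with only the columns needed for the demo.
conn.execute("CREATE TABLE t (id INTEGER, col1 INTEGER, col4 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, "g1"), (2, 20, "g1"), (3, 5, "g2")],
)

# sum_all: total over the whole table, repeated on every row.
# sum_by_col4: total over the rows sharing the same col4 value.
rows = conn.execute(
    """
    SELECT id,
           SUM(col1) OVER ()                  AS sum_all,
           SUM(col1) OVER (PARTITION BY col4) AS sum_by_col4
    FROM t
    ORDER BY id
    """
).fetchall()
print(rows)  # [(1, 35, 30), (2, 35, 30), (3, 35, 5)]
```

All three rows survive, which is the difference from GROUP BY: the aggregate is attached to each row rather than collapsing the rows.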
You can use a join to get all the columns:
SELECT id, col2, col3, ......... col10, sumcol1
FROM table t1
INNER JOIN (
SELECT SUM(col1) AS sumcol1, col4 AS coln4
FROM table
GROUP BY col4
) t2
ON t1.col4 = t2.coln4

db2 select distinct rows, but select all columns

Experts, I have a single table with multiple columns: col1, col2, col3, col4, col5, col6.
I need to select distinct col4 values, but I also need all the other columns in my output.
If I run select distinct(col4) from table1, I get only col4 in my output.
How can I do this on DB2?
Thank you
You simply do this...
Select * From Table1 Where col4 In (Select Distinct(col4) From Table1)
I'm not sure if you will be able to do this.
You might try a GROUP BY on this column; you can then apply aggregate functions to the other columns.
select count(col1), col4 from table1 group by (col4);
None of the answers worked for me, so here is one that I got working: use GROUP BY on col4 while taking the max values of the other columns.
select max(col1) as col1,max(col2) as col2,max(col3) as col3
, col4
from
table1
group by col4
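
One thing worth checking with this approach is that each MAX is computed independently, so an output row can combine values from different source rows. The sketch below illustrates that on made-up data, using Python's sqlite3 module as a stand-in for DB2 (an assumption); the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 TEXT, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, 9, "a", "k"), (5, 2, "b", "k"), (3, 3, "c", "m")],
)

# One output row per distinct col4; every other column is reduced with MAX.
rows = conn.execute(
    """
    SELECT MAX(col1) AS col1, MAX(col2) AS col2, MAX(col3) AS col3, col4
    FROM table1
    GROUP BY col4
    ORDER BY col4
    """
).fetchall()
print(rows)  # [(5, 9, 'b', 'k'), (3, 3, 'c', 'm')]
```

The row (5, 9, 'b', 'k') does not exist in the table: 5 and 'b' come from one source row and 9 from another. That is fine if any representative value per column will do, but not if whole original rows are needed.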
At least in DB2, you can execute
SELECT
DISTINCT *
FROM
<YOUR TABLE>
Which will give you every distinct combination of your (in this case) 6 columns.
Otherwise, you'll have to specify what columns you want to include. If you do that, you can either use select distinct or group by.

Adding a constant value column in the group by clause

Netezza SQL gives an error on this query. Cause: Invalid column name 'dummy'.
select col1,col2, '' as dummy, max(col3) from table1 group by col1,col2,dummy
If I remove dummy from the GROUP BY clause, it works fine. But as per SQL syntax, I am supposed to include all non-aggregate columns in the GROUP BY.
Why do you need it in your GROUP BY? You can use an aggregate function instead, and its result will always be right because the value is constant. For example:
select col1,col2, min(' ') as dummy, max(col3) from table1 group by col1,col2
"dummy" is a static column (not in the table), so it does not need to be in the group by because it is an external column.
SELECT col1,
col2,
cast(5 as int) AS [dummy],
max(col3)
FROM test_1
GROUP BY col1,
col2,
col3,
'dummy'
The code produces an outer reference error # 164.
Take a look at these links
http://www.sql-server-helper.com/error-messages/msg-164.aspx
http://www.sql-server-helper.com/error-messages/msg-1-500.aspx
It's due to the order of operations...
FROM
JOIN
WHERE
GROUP BY
...
SELECT
When using GROUP BY, only the fields remaining from the previous step are available. Since your "dummy" column is not declared until the SELECT step, the GROUP BY doesn't know it exists and therefore doesn't need to account for it.
Going back to basics: GROUP BY is executed after the underlying JOIN operations (file I/O), and only after that is the SELECTed result set available.
Because you defined dummy in the SELECT, the database does not know about it while grouping: it is not available at the table level.
Try your query with GROUP BY your_column, ' ' and it should work, because you have specified the constant directly instead of referring to an alias.
Finally, when GROUP BY is used, you can specify any constants in SELECT or GROUP BY, because they are included in your SELECTed result without any table operation involved, so the database allows them.
To resolve the issue, group it at an outer layer:
SELECT X.col1, X.col2, X.dummy, max(X.col3)
FROM (
SELECT col1,
col2,
cast(5 as int) AS [dummy],
col3
FROM test_1
) X
GROUP BY X.col1,
X.col2,
X.dummy
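
The outer-layer grouping can be tried out with a minimal runnable sketch. It uses Python's sqlite3 module rather than Netezza or SQL Server (an assumption for self-containment), so the bracketed [dummy] is written as a plain alias and int as INTEGER; the sample rows and the ORDER BY are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_1 (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO test_1 VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "x", 4), ("b", "y", 2)],
)

# The constant is materialised as a real column of the derived table X,
# so the outer GROUP BY can refer to it like any other column.
rows = conn.execute(
    """
    SELECT X.col1, X.col2, X.dummy, MAX(X.col3)
    FROM (
        SELECT col1, col2, CAST(5 AS INTEGER) AS dummy, col3
        FROM test_1
    ) AS X
    GROUP BY X.col1, X.col2, X.dummy
    ORDER BY X.col1, X.col2
    """
).fetchall()
print(rows)  # [('a', 'x', 5, 4), ('b', 'y', 5, 2)]
```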