SQL Server Group by clause issues

Suppose I have a query
select id, sum(col1), col2, col3, ......... col10
from table
If I run this without a group by clause, it gives an error:
Column 'dbo.table.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If I use a group by clause:
select id, sum(col1), col2, col3, ......... col10
from table
group by col4
I get the same error again:
Column 'dbo.table.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
The error persists until I list in the group by clause every column that has no aggregate function applied to it.
I can't apply an aggregate function to all of my columns, so it seems I have to explicitly include all of them in the group by clause.

I'm not sure what you are trying to achieve.
As for your first query though, you could get the SUM without grouping the rows, if you use an analytic function:
select id,
sum(col1) over () as sum_col1, -- here you have the analytic function
col2,
col3,
......... col10
from table
This way, you still get all the rows in the table, but on each row you will have the sum of col1.
You could also have the sum over col4 (as for your second query), if you add a partition by clause to the analytic function:
select id,
sum(col1) over (partition by col4) as sum_col1,
col2,
col3,
......... col10
from table
You will still get the same number of rows, but the sum will be computed separately for each value of col4.
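To see what the two window sums produce side by side, here is a small sketch that runs against made-up inline data rather than the real table. The three sample rows and the aliases sum_all and sum_per_col4 are assumptions for illustration only.

```sql
-- Hypothetical sample rows stand in for the real table.
SELECT id,
       col4,
       col1,
       SUM(col1) OVER ()                  AS sum_all,      -- total over every row
       SUM(col1) OVER (PARTITION BY col4) AS sum_per_col4  -- total within each col4 value
FROM (VALUES (1, 'a', 10),
             (2, 'a', 20),
             (3, 'b', 5)) AS t(id, col4, col1);
-- All three rows are returned. sum_all is 35 on each row;
-- sum_per_col4 is 30 on the two 'a' rows and 5 on the 'b' row.
```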

You can use a join to get all the columns:
Select id, col2,col3, ......... col10,sumcol1
From table t1 inner join
(
select sum(col1) as sumcol1, col4 as coln4 from table
group by col4
) t2
on
t1.col4 =t2.coln4

Related

Select multiple columns but distinct only one in SQL?

Let's say I have a table called TABLE with the columns col1, col2, col3 and col4.
I want to select col1, col2 and col3, but with only one row per distinct col2 value, and I can't get it to work.
I tried something like this:
SELECT DISTINCT "col1", "col2", "col3" FROM [Table] WHERE col1 = Values
But the output contains more than one record with the same col2 value.
I know that is because DISTINCT applies to all the columns that I specified, but I don't know how to get all the columns and filter only on the values of col2.
Is it possible to SELECT more than 1 column but filter only one of them with SELECT DISTINCT ?
As you said, DISTINCT eliminates duplicates across the full set of selected columns. Instead, I'd just use an aggregate function with a GROUP BY clause.
SELECT MAX(col1) AS col1, col2,
MAX(col3) AS col3
FROM tbl
GROUP BY col2
That will take the highest value in sort order of col1 and of col3 for each col2. Or, to list all values separated by commas:
SELECT STRING_AGG(col1,',') AS col1, col2,
STRING_AGG(col3,',') AS col3
FROM tbl
GROUP BY col2
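One thing to keep in mind with the MAX version is that each column is aggregated independently, so the col1 and col3 values in a result row may come from different source rows. If a complete row per col2 is wanted, a ROW_NUMBER() sketch along these lines could work. The aliases rn and ranked are hypothetical, and ordering by col1 is only an assumption about which row should survive.

```sql
-- Sketch: keep exactly one complete row per col2 value.
-- The ORDER BY inside OVER() decides which row is kept; col1 is an assumed choice.
SELECT col1, col2, col3
FROM (
    SELECT col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col2 ORDER BY col1) AS rn
    FROM tbl
) AS ranked
WHERE rn = 1;
```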

De-duplicating rows in a table with respect to certain columns and retaining the corresponding values in the other columns in HIVE

I need to create a temporary table in HIVE using an existing table that has 7 columns. I just want to get rid of duplicates with respect to the first three columns and also retain the corresponding values in the other 4 columns. I don't care which row is actually dropped while de-duplicating on the first three columns alone.
You could use something like the following if you are not concerned about ordering:
create table table2 as
select col1, col2, col3
-- split() takes a regular expression, so the pipe must be escaped
,split(agg_col,'\\|')[0] as col4
,split(agg_col,'\\|')[1] as col5
,split(agg_col,'\\|')[2] as col6
,split(agg_col,'\\|')[3] as col7
from (Select col1, col2, col3,
max(concat(cast(col4 as string),"|",
cast(col5 as string),"|",
cast(col6 as string),"|",
cast(col7 as string))) as agg_col
from table1
group by col1,col2,col3 ) A;
Below is another approach, which gives much more control over ordering but is slower than the approach above:
create table table2 as
select col1, col2, col3, max(col4) as col4, max(col5) as col5, max(col6) as col6, max(col7) as col7
from (Select col1, col2, col3,col4, col5, col6, col7,
rank() over ( partition by col1, col2, col3
order by col4 desc, col5 desc, col6 desc, col7 desc ) as col_rank
from table1 ) A
where A.col_rank = 1
GROUP BY col1, col2, col3;
The rank() over (...) function assigns rank 1 to more than one row if the order by columns are all equal. In our case, if two rows have exactly the same values in all seven columns, filtering on col_rank = 1 still leaves duplicates. These duplicates are eliminated by the max and group by clauses in the query above.
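As a possible variation, row_number() numbers the rows within a partition without ties, so filtering on 1 leaves exactly one row per group and the outer max and group by are not needed. This is only a sketch that reuses the table and column names from the query above; the alias rn is hypothetical.

```sql
-- Hive sketch: row_number() never assigns the same number twice within a partition.
create table table2 as
select col1, col2, col3, col4, col5, col6, col7
from (select col1, col2, col3, col4, col5, col6, col7,
             row_number() over (partition by col1, col2, col3
                                order by col4 desc, col5 desc, col6 desc, col7 desc) as rn
      from table1) A
where A.rn = 1;
```

When several rows tie on all of the order by columns, which one gets number 1 is arbitrary, which matches the requirement that it does not matter which duplicate is dropped.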

db2 select distinct rows, but select all columns

Experts, I have a single table with multiple columns. col1, col2, col3, col4, col5, col6
I need to select distinct (col4), but I need all other columns also on my output.
If I run this ( select distinct(col4 ) from table1 ), then I get only col4 in my output.
How can I do this on DB2?
Thank you
You simply do this...
Select * From Table1 Where col4 In (Select Distinct(col4) From Table1)
I'm not sure if you will be able to do this.
You might try to run group by on this column. You will be able to run some aggregate functions on other columns.
select count(col1), col4 from table1 group by (col4);
None of the answers worked for me, so here is one that I got working: use group by on col4 while taking the max values of the other columns.
select max(col1) as col1,max(col2) as col2,max(col3) as col3
, col4
from
table1
group by col4
At least in DB2, you can execute
SELECT
DISTINCT *
FROM
<YOUR TABLE>
Which will give you every distinct combination of your (in this case) 6 columns.
Otherwise, you'll have to specify what columns you want to include. If you do that, you can either use select distinct or group by.

SQL: windowed aggregate functions for DB2 iSeries

Is count not a valid aggregation function for row partitions for SQL DB2 on the iSeries?
This query works:
select ROW_NUMBER() over (partition by COL1, COL2 order by COL3 asc)
from MyTable
And this query gives a syntax error:
select COUNT(1) over (partition by COL1, COL2)
from MyTable
The error message is pointing at the parenthesis before the word partition:
[Message SQL0401] Token ( is not a valid token. A partial list of valid tokens is , FROM INTO.
I'm aware I can rewrite the query to avoid the row partition, but I'd like to know why this isn't working.
No, COUNT() is not the same type of function as ROW_NUMBER().
If you want the number of rows per (col1,col2) then you could simply use
select COL1, COL2, count(*)
from MyTable
group by col1, col2
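If the goal was to have the per-group count on every row, as COUNT(1) OVER (PARTITION BY COL1, COL2) would give, one way to sketch it without a windowed aggregate is to join the table to its grouped counts. The aliases t, c and CNT are assumptions, and rows where COL1 or COL2 is NULL will not match in the join.

```sql
-- Sketch: attach the (COL1, COL2) group size to every row without a windowed COUNT.
select t.COL1, t.COL2, t.COL3, c.CNT
from MyTable t
join (select COL1, COL2, count(*) as CNT
      from MyTable
      group by COL1, COL2) c
  on t.COL1 = c.COL1
 and t.COL2 = c.COL2
```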

GROUP BY and ORDER BY on different columns

I want to execute a query on POSTGRESQL server whose structure is as below:
SELECT col1, SUM(col2) GROUP BY col1 ORDER BY colNotInSelect;
I have tried to include the colNotInSelect in the GROUP BY clause but since it is a column with a distinct value, it defeats the purpose of using GROUP BY in the first place.
Any help is appreciated.
You cannot order by that column because it potentially has many values for each value of col1.
However you can apply an aggregate function to the column, and order by that.
For example:
SELECT col1,
SUM(col2)
GROUP BY col1
ORDER BY MIN(colNotInSelect);
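A small sketch with made-up inline data shows the effect. The three sample rows and the alias total are assumptions, and a FROM clause is supplied here because the query structure above leaves the table out.

```sql
-- Hypothetical sample rows stand in for the real table.
SELECT col1,
       SUM(col2) AS total
FROM (VALUES ('a', 1, 30),
             ('a', 2, 10),
             ('b', 5, 5)) AS t(col1, col2, colNotInSelect)
GROUP BY col1
ORDER BY MIN(colNotInSelect);
-- 'b' (total 5) sorts first because its smallest colNotInSelect is 5;
-- 'a' (total 3) follows because its smallest colNotInSelect is 10.
```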
Your question doesn't quite make sense as asked: the rows are grouped by col1, so there is no single colNotInSelect value in the grouped rows. Try aggregating colNotInSelect before ordering, for example:
SELECT col1, SUM(col2), AVG(colNotInSelect) as col3 GROUP BY col1 ORDER BY col3;
If that doesn't fit your needs, maybe you should clarify what you're trying to do.