SQL: windowed aggregate functions for DB2 iSeries - sql

Is count not a valid aggregation function for row partitions for SQL DB2 on the iSeries?
This query works:
select ROW_NUMBER() over (partition by COL1, COL2 order by COL3 asc)
from MyTable
And this query gives a syntax error:
select COUNT(1) over (partition by COL1, COL2)
from MyTable
The error message is pointing at the parenthesis before the word partition:
[Message SQL0401] Token ( is not a valid token. A partial list of valid tokens is , FROM INTO.
I'm aware I can rewrite the query to avoid the row partition, but I'd like to know why this isn't working.

No, COUNT() is not the same type of function as ROW_NUMBER().
If you want the number of rows per (col1,col2) then you could simply use
select COL1, COL2, count(*)
from MyTable
group by col1, col2

Related

SQL filtering for table based on 3rd column's value

How should my SQL be so that the table mentioned above should have the expected output using SQL. The filtering happens using the col 3 value. Row with the highest col 3 value for a particular col 1 value is selected in the output.
You can use ROW_NUMBER() window function
SELECT col1, col2, col3 from (
SELECT row_number() over (partition by col1 order by col3 desc) sn, * from your_table_name
) a WHERE sn=1;
I assume you require the usage of the WHERE clause in SQL which acts as a method to filter results based on a conditional.
Considering you want all entries rather than any given entry one that matches, you can use the following (written in MySQL) which uses a nested SELECT statement:
SELECT *
FROM Table
WHERE col3=(SELECT MAX(col3) FROM Table);

Distinct Values but get all the columns

In SSMS we have a view, which is returning distinct columns by the following command,
Create View [VIEWNAME] As
`Select distinct [Col1],[Col2], Max(TimeDate) as TimeDate
from [Table]
Group By [Col1],[Col2]`
I want the column [Col3] as well from the table in view.
I tried the following so far but unfortunately, it didn't work for me
Select Distint on [Col1],[Col2] * from [Table]
Error: Incorrect syntax near 'on'.
Also,
select [Col1],[Col2],[Col3],Max[TimeDate] from [Table]
Group by [Col1],[Col2],[Col3],[TimeDate]
Error: Column TABLE.Col3 is invalid in the select list because it is not contained in either an aggregated function or GROUP BY clause.
Below is the original table sample.
enter image description here
Desired table image
enter image description here
Thanks.
You can use distinct on in Postgresql.
select distinct on (col1, col2) *
from the_table
order by col1, col2, timedate desc;
I guess that your RDBMS is SQL Server. The query below uses the SQL server equivalent of Postgresql distinct on ().
select col1, col2, col3, timedate
from
(
select *, row_number() over (partition by col1,col2 order by timedate desc) as rn
from the_table
) as t
where rn = 1;

BigQuery: Use COUNT as LIMIT

I want to select everything from mytable1 and combine that with just as many rows from mytable2. In my case mytable1 always has fewer rows than mytable2 and I want the final table to be a 50-50 mix of data from each table. While I feel like the following code expresses what I want logically, it doesn't work syntax wise:
Syntax error: Expected "#" or integer literal or keyword CAST but got
"(" at [3:1]
(SELECT * FROM `mytable1`)
UNION ALL (
SELECT * FROM `mytable2`
LIMIT (SELECT COUNT(*) FROM`mytable1`)
)
Using standard SQL in bigquery
The docs state that LIMIT clause accept only literal or parameter values. I think you can ROW_NUMBER() the rows from second table and limit based on that:
SELECT col1, col2, col3
FROM mytable1
UNION ALL
SELECT col1, col2, col3
FROM (
SELECT col1, col2, col3, ROW_NUMBER() OVER () AS rn
FROM mytable2
) AS x
WHERE x.rn <= (SELECT COUNT(*) FROM mytable1)
Each SELECT statement within UNION must have the same number of
columns
The columns must also have similar data types
The columns in each SELECT statement must also be in the same order
As your mytable1 always less column than mytable2 so you have to put same number of column by selection
select col1,col2,col3,'' as col4 from mytable1 --in case less column you can use alias
union all
select col1,col2,col3,col4 from mytable2

SQL Server Group by clause issues

Suppose I have a query
select id, sum(col1), col2, col3, ......... col10
from table
If I run this without group by clause it gives an error
Column 'dbo.table.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If i use group by clause
select id, sum(col1), col2, col3, ......... col10
from table
group by col4
again the same error
Column 'dbo.table.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Until i haven't specified all those columns that hasn't implemented any aggregate function on it.
Now i cant apply an aggregate function on my all columns or i have to explicitly include all of my columns in group by clause
I'm not sure what you are trying to achieve.
As for your first query though, you could get the SUM without grouping the rows, if you use an analytic function:
select id,
sum(col1) over () as sum_col1, -- here you have the analytic function
col2,
col3,
......... col10
from table
This way, you still get all the rows in the table, but on each row you will have the sum of col1.
You could also have the sum over col4 (as for your second query), if you add a partition by clause to the analytic function:
select id,
sum(col1) over (partition by col4) as sum_col1,
col2,
col3,
......... col10
from table
You will still get the same number of rows, but the sum will be grouped by col4.
You can use join to get all Columns
Select id, col2,col3, ......... col10,sumcol1
From table t1 inner join
(
select sum(col1) as sumcol1, col4 as coln4 from table
group by col4
) t2
on
t1.col4 =t2.coln4

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. Doing this I have to perform a max() and a groupby- which prevents me from retrieving the other columns from the row which was the max (beause they were not contained in a GROUP BY or aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (which match the largest value for column2) I have a bit of a headache.
If there was an extra column- col3 and for the rows returned by the above query, I only wanted to return the one which was, say the minimum Col3 value- how would I do this?
If you are using SQL Server 2005+. Then you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
As I got it can be something like this
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2 and Col3 = (select min(Col3) from table )
Assuming you are using SQL-Server 2005 or later You can make use of Window functions here. I have chosen ROW_NUMBER() but it is not hte only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to your group by in your subquery, then your ORDER BY determines the order in which to start numbering the rows. In this case Col2 DESC to start with the highest value of col2 (Equivalent to your MAX statement).