Numbering series of data in SQL - sql

I have a small problem with an SQL SELECT. I want to number contiguous groups of the same value in the second column:
1,'a'
2,'a'
3,'b'
4,'c'
5,'a'
6,'a'
7,'e'
8,'e'
The output I want:
1,'a',1
2,'a',1
3,'b',2
4,'c',3
5,'a',4
6,'a',4
7,'e',5
8,'e',5
Is it possible to do this with just a SELECT? I must do it in Vertica's SQL, which does not support operations on variables in a SELECT, so I can't just declare a variable beforehand and increment it.

You could use CONDITIONAL_CHANGE_EVENT(), which is pretty simple. You pass in the column whose changes should trigger the sequence increment, and you order the window the way you need it. It's a Vertica analytic function. Its counter starts at 0, so add 1 to match the numbering in the question:
SELECT col1,
col2,
CONDITIONAL_CHANGE_EVENT(col2) OVER ( ORDER BY col1 ) + 1 AS newcol
FROM mytable

You can do this with window functions. One method uses lag() and then does a cumulative sum of when the value changes:
select t.col1, t.col2,
sum(case when col2 = prev_col2 then 0 else 1 end) over (order by col1) as newcol
from (select t.*,
lag(col2) over (order by col1) as prev_col2
from t
) t
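To check the lag()-based query against the sample data from the question, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite (3.25 or later, for window function support) stands in for Vertica here purely as an assumption for illustration; the table name t and the columns col1/col2 follow the answer above, and the subquery alias s is mine.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "c"),
     (5, "a"), (6, "a"), (7, "e"), (8, "e")],
)

# Flag each row whose value differs from the previous row (the first row has
# no previous value, so the comparison is NULL and falls to ELSE 1), then
# take a running sum of those flags to number the groups.
query = """
SELECT col1, col2,
       SUM(CASE WHEN col2 = prev_col2 THEN 0 ELSE 1 END)
           OVER (ORDER BY col1) AS newcol
FROM (SELECT t.*,
             LAG(col2) OVER (ORDER BY col1) AS prev_col2
      FROM t) AS s
ORDER BY col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The second run of 'a' (rows 5 and 6) gets its own number, 4, because the counter depends only on changes between neighbouring rows, not on the value itself.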

Related

SQL filtering for table based on 3rd column's value

How should I write my SQL so that the table mentioned above produces the expected output? The filtering is based on the col 3 value: for each col 1 value, the row with the highest col 3 value is selected in the output.
You can use ROW_NUMBER() window function
SELECT col1, col2, col3 from (
SELECT row_number() over (partition by col1 order by col3 desc) sn, * from your_table_name
) a WHERE sn=1;
I assume you want a WHERE clause, which filters results based on a condition.
If you want every row that has the highest col3 for its col1 (rather than a single arbitrary one), you can use a correlated nested SELECT (written for MySQL):
SELECT *
FROM your_table_name t1
WHERE col3 = (SELECT MAX(col3) FROM your_table_name t2 WHERE t2.col1 = t1.col1);
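The difference between ROW_NUMBER() and a per-group MAX() subquery only shows up when two rows tie for the highest col3. The sketch below illustrates this with made-up rows (the original table is not shown, so the data is hypothetical), using Python's sqlite3 module as a stand-in engine; the aliases t, a, t1 and t2 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table_name (col1 TEXT, col2 TEXT, col3 INTEGER)")
# Hypothetical data: group 'x' has two rows tied on the highest col3.
conn.executemany(
    "INSERT INTO your_table_name VALUES (?, ?, ?)",
    [("x", "p", 1), ("x", "q", 5), ("x", "r", 5), ("y", "s", 3)],
)

# ROW_NUMBER(): exactly one row per col1; which tied row wins is arbitrary.
top_one = conn.execute("""
    SELECT col1, col2, col3
    FROM (SELECT ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col3 DESC) AS sn,
                 t.*
          FROM your_table_name AS t) AS a
    WHERE sn = 1
    ORDER BY col1
""").fetchall()

# Correlated MAX(): every row that reaches the maximum of its group.
all_ties = conn.execute("""
    SELECT col1, col2, col3
    FROM your_table_name AS t1
    WHERE col3 = (SELECT MAX(col3)
                  FROM your_table_name AS t2
                  WHERE t2.col1 = t1.col1)
    ORDER BY col1, col2
""").fetchall()

print(top_one)
print(all_ties)
```

Which form fits depends on whether tied rows should all be reported or collapsed to one.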

How to express "either the single resulting record or NULL", without an inner-query LIMIT?

Consider the following query:
SELECT (SELECT MIN(col1) FROM table1) = 7;
Assuming col1 is non-NULLable, this will yield either true or false, or NULL when table1 is empty.
But now suppose I have:
SELECT (
SELECT
FIRST_VALUE (col2) OVER (
ORDER BY col1
) AS col2_for_first_col1
FROM table1
) = 7;
(and assume col2 is also non-NULLable for simplicity.)
If there is a unique col2 value for the lowest col1 value, or the table is empty, then this works just like before. But if there are multiple col2 values for the lowest col1, I'm going to get a query runtime error.
My question: What is a short, elegant way to get NULL from this last query also in the case of multiple inner-query results? I could of course duplicate it and check the count, but I would rather avoid that.
Important caveat: I'm using MonetDB, and it doesn't seem to support ORDER BY ... LIMIT 1 on inner queries.
Without the MonetDB limitation, you would seem to want:
SELECT (SELECT col2
FROM table1
ORDER BY col1
LIMIT 1
) = 7;
With the limitation, you can use window functions differently:
SELECT (SELECT col2
FROM (SELECT col2, ROW_NUMBER() OVER (ORDER BY col1) as seqnum
FROM table1
) t
WHERE seqnum = 1
) = 7;
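Note that the ROW_NUMBER() version picks one of the tied rows rather than returning NULL. One possible way to get NULL when several rows share the lowest col1, sketched here as an alternative that is not taken from the answer above, is to aggregate over the rows with the minimum col1 and keep the value only when exactly one such row exists. The sketch runs on SQLite through Python's sqlite3 module; whether MonetDB accepts the same form is an assumption that has not been verified here.

```python
import sqlite3

# An aggregate without GROUP BY always yields exactly one row, so the scalar
# subquery can never fail with "more than one row". COUNT(*) = 1 is true only
# when a single row has the lowest col1; otherwise CASE yields NULL.
QUERY = """
SELECT (SELECT CASE WHEN COUNT(*) = 1 THEN MIN(col2) END
        FROM table1
        WHERE col1 = (SELECT MIN(col1) FROM table1)) = 7
"""

def check(rows):
    """Load the given (col1, col2) rows into a fresh table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (col1 INTEGER NOT NULL, col2 INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    return conn.execute(QUERY).fetchone()[0]

print(check([(1, 7), (2, 9)]))  # a single row has the lowest col1
print(check([(1, 7), (1, 9)]))  # two rows share the lowest col1
print(check([]))                # empty table
```

SQLite reports true as 1 and NULL as Python's None, so the three calls return 1, None and None respectively.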

SQL: windowed aggregate functions for DB2 iSeries

Is count not a valid aggregation function for row partitions for SQL DB2 on the iSeries?
This query works:
select ROW_NUMBER() over (partition by COL1, COL2 order by COL3 asc)
from MyTable
And this query gives a syntax error:
select COUNT(1) over (partition by COL1, COL2)
from MyTable
The error message is pointing at the parenthesis before the word partition:
[Message SQL0401] Token ( is not a valid token. A partial list of valid tokens is , FROM INTO.
I'm aware I can rewrite the query to avoid the row partition, but I'd like to know why this isn't working.
No, COUNT() is not the same type of function as ROW_NUMBER().
If you want the number of rows per (col1,col2) then you could simply use
select COL1, COL2, count(*)
from MyTable
group by col1, col2
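If the per-group count is needed on every row, which is what COUNT(1) OVER (PARTITION BY ...) would have produced, one workaround is to join the grouped counts back to the table. The following is a minimal sketch under assumptions: it runs on SQLite via Python's sqlite3 module rather than on DB2 for iSeries, and the sample rows and the aliases m, c and cnt are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (COL1 TEXT, COL2 TEXT, COL3 INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "x", 2), ("b", "y", 3)],
)

# Compute one count per (COL1, COL2) group, then attach it to each row.
query = """
SELECT m.COL1, m.COL2, m.COL3, c.cnt
FROM MyTable AS m
JOIN (SELECT COL1, COL2, COUNT(*) AS cnt
      FROM MyTable
      GROUP BY COL1, COL2) AS c
  ON c.COL1 = m.COL1 AND c.COL2 = m.COL2
ORDER BY m.COL3
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows whose COL1 or COL2 is NULL would drop out of this join, so the sketch assumes both columns are non-NULL.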

Oracle Lag- determine if previous row columns are the same as current row

Is it possible using analytical functions in oracle (lag for instance) to check the previous row, and based on 2 columns, determine if these values are exactly the same as the current row. If they are, then output the letter 'Y', else 'N'.
Something like:
IF prev.col1 = curr.col1 and prev.col2 = curr.col2 THEN 'Y' ELSE 'N'
I want to use this Y or N to filter out these records in a Crystal Report that I am writing.
Yes, but because analytic functions are evaluated fairly late in query processing, you need to write the query with the analytic functions in a subquery:
SELECT CASE WHEN col1 = prev_col1 AND col2 = prev_col2 THEN 'Y' ELSE 'N' END AS yesno
FROM (
SELECT col1, col2,
LAG( col1, 1 ) OVER ( ORDER BY col1, col2 ) AS prev_col1,
LAG( col2, 1 ) OVER ( ORDER BY col1, col2 ) AS prev_col2
FROM mytable
)
You'll need to adjust the ORDER BY clause depending on how you're defining the "previous" row. You may also want to add a PARTITION BY clause if you don't want to treat the whole table as a single group of rows.
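As a quick check of the LAG() comparison, here is a small sketch with made-up rows. It uses SQLite through Python's sqlite3 module as a stand-in for Oracle, which is an assumption for illustration only; the sample data and the subquery alias s are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 TEXT)")
# Hypothetical data containing two pairs of duplicate rows.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "b"), (2, "b")],
)

# Fetch the previous row's values in a subquery, then compare them with the
# current row. The first row has NULL "previous" values, so it yields 'N'.
query = """
SELECT col1, col2,
       CASE WHEN col1 = prev_col1 AND col2 = prev_col2
            THEN 'Y' ELSE 'N' END AS yesno
FROM (SELECT col1, col2,
             LAG(col1, 1) OVER (ORDER BY col1, col2) AS prev_col1,
             LAG(col2, 1) OVER (ORDER BY col1, col2) AS prev_col2
      FROM mytable) AS s
ORDER BY col1, col2, yesno
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the second row of each duplicate pair is marked 'Y', so filtering on yesno = 'N' keeps one row per distinct (col1, col2) combination.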

Multiple rows match, but I only want one?

Sometimes I want to perform a join in which I take the largest value of one column. To do this I have to use MAX() and GROUP BY, which prevents me from retrieving the other columns of the row that held the maximum (because they are not contained in the GROUP BY or in an aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (which match the largest value for column2) I have a bit of a headache.
If there was an extra column- col3 and for the rows returned by the above query, I only wanted to return the one which was, say the minimum Col3 value- how would I do this?
If you are using SQL Server 2005 or later, you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
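As I understand it, it could be something like this, taking the minimum Col3 among the rows that share the largest Col2 for each Col1:
SELECT tab2.* FROM
(SELECT Col1, Max(Col2) AS Col2 FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2, Col3 FROM Table) tab2
ON tab1.Col1 = tab2.Col1 AND tab1.Col2 = tab2.Col2
WHERE tab2.Col3 = (SELECT MIN(t.Col3) FROM Table t
WHERE t.Col1 = tab2.Col1 AND t.Col2 = tab2.Col2)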
Assuming you are using SQL Server 2005 or later, you can make use of window functions here. I have chosen ROW_NUMBER(), but it is not the only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to your group by in your subquery, then your ORDER BY determines the order in which to start numbering the rows. In this case Col2 DESC to start with the highest value of col2 (Equivalent to your MAX statement).
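The question also asks for the row with the minimum Col3 when several rows share the largest Col2. One way to express that, sketched here as an extension of the ROW_NUMBER() answers rather than something they state, is to add Col3 ASC as a second sort key. The example runs on SQLite through Python's sqlite3 module instead of SQL Server, and the table name my_table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (Col1 TEXT, Col2 INTEGER, Col3 INTEGER)")
# Hypothetical data: group 'a' has two rows tied on the largest Col2 (10).
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("a", 10, 3), ("a", 10, 1), ("a", 5, 0), ("b", 7, 2)],
)

# Number the rows per Col1 by largest Col2 first, breaking ties by the
# smallest Col3, then keep only the first row of each partition.
query = """
WITH T AS (
    SELECT m.*,
           ROW_NUMBER() OVER (PARTITION BY Col1
                              ORDER BY Col2 DESC, Col3 ASC) AS RowNumber
    FROM my_table AS m
)
SELECT Col1, Col2, Col3
FROM T
WHERE RowNumber = 1
ORDER BY Col1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Without the second sort key, either of the two tied rows in group 'a' could be numbered 1.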