update value in row based on value in another row - sql

I have a table with timestamped data.
t_stamp | col1 | col2 | col3
I want to update the value in say col3 based on values in col1 and col2 from the previous row in the table. By previous row I mean the row with the next lowest timestamp value. I also want to do this for every row in the table.
For example:
col3 = col1.prev + col2
Note: The operation here is only provided as an example. I want to calculate a value for col3 given a function of col1, col2 and/or previous values of either.
I was able to use a window function to create a SELECT query to give me the desired values for col3
SELECT lag(col1) OVER (ORDER BY t_stamp ASC) + col2 AS col3
FROM table1
but this does not update the values in the table. Can I somehow apply this to the original table? Or is there a way to format an update query in the same way?

You just need to use the FROM clause along with the query you already have:
UPDATE test set col3 = prev_col1 + prev_col2
FROM (
SELECT t_stamp,
lag(col1) OVER (ORDER BY t_stamp ASC) prev_col1,
lag(col2) OVER (ORDER BY t_stamp ASC) prev_col2
FROM test) prev
WHERE prev.t_stamp = test.t_stamp;

I think you want a cumulative sum:
SELECT (sum(col2 + co1) OVER (ORDER BY t_stamp ASC) - col1) AS col3
FROM table1

Create the first 3 columns and than use Lag to create the last col.
How to get previous row data in sql server

Use Sub queries.. will solve.. here is a sample
select col2+p_col1 from
(
SELECT col1, col2, lag(col1) OVER (ORDER BY t_stamp ASC) as p_col1
FROM table1
) t

Related

SQL DISTINCT based on a single column, but keep all columns as output

--mytable
col1 col2 col3
1 A red
2 A green
3 B purple
4 C blue
Let's call the table above mytable. I want to select only distinct values from col2:
SELECT DISTINCT
col2
FROM
mytable
When I do this the output looks like this, which is expected:
col2
A
B
C
but how do I perform the same type of query, yet keep all columns? The output would look like below. In essence I'm going through mytable looking at col2, and when there's multiple occurrences of col2 I'm only keeping the first row.
col1 col2 col3
1 A red
3 B purple
4 C blue
Do SQL functions (eg DISTINCT) have arguments I could set? I could imagine it to be something like KeepAllColumns = TRUE for this DISTINCT function? Or do I need to perform JOINs to get what I want?
You can use window functions, particularly row_number():
select t.*
from (select t.*, row_number() over (partition by col2 order by col2) as seqnum
from mytable t
) t
where seqnum = 1;
row_number() enumerates the rows, starting with "1". You can control whether you get the oldest, earliest, biggest, smallest . . .
You can use the QUALIFY clause in Teradata:
SELECT col1, col2, col3
FROM mytable
QUALIFY ROW_NUMBER() OVER(PARTITION BY col2 ORDER BY col2) = 1 -- Get 1st row per group
If you want to change the ordering for how to determine which col2 row to get, just change the expression in the ORDER BY.
With NOT EXISTS:
select m.* from mytable m
where not exists (
select 1 from mytable
where col2 = m.col2 and col1 < m.col1
)
This code will return the rows for which there is not another row with the same col2 and a smaller value in col1.

how to select min value from table if table has two unique values with rest of columns are identical

ex:Input
ID Col1 Col2 Col3
-- ---- ---- ----
1 a a sql
2 a a hive
Out put
ID Col1 Col2 Col3
-- ---- ---- ----
1 a a sql
Here my id value and Col3 values are unique but i need to filter on min id and populate all records.
I know below approach will work, but any best approach other than this please suggest
select Col1,Col2,min(ID) from table group by Col1,Col2;
and join this on ID,Col1,Col2
I think you want row_number():
select t.*
from (select t.*, row_number() over (partition by col1, col2 order by id) as seqnum
from t
) t
where seqnum = 1
It appears that Hive supports ROW_NUMBER. Though I’ve never used hive, other rdbms would use it like this to get the entire contents of the min row without needing to join (doesn’t suffer problems if there are repeated minimum values)
SELECT a.* FROM
(
SELECT *, ROW_NUMBER() OVER(ORDER BY id) rn FROM yourtable
) a
WHERE a.rn = 1
The inner query selects all the table data and establishes an incrementing counter in order of ID. It could be based on any column, the min ID (in this case) being row number 1. If you wanted the max, order by ID desc
If you want the number to restart for different values of another column (eg of ten of your Col3 were “sql” and twenty rows had “hive”) you an say PARTITION BY col3 ORDER BY id, and the row number will be a counter that increments for identical values of col3, restarting from 1 for each distinct value of col3

How to get min value from multiple columns for a row in SQL

I need to get to first (min) date from a set of 4 (or more) columns.
I tried
select min (col1, col2, col3) from tbl
which is obviouslly wrong.
let's say I have these 4 columns
col1 | col2 | col3 | col4
1/1/17 | 2/2/17 | | 3/3/17
... in this case what I want to get is the value in col1 (1/1/17). and Yes, these columns can include NULLs.
I am running this in dashDB
the columns are Date data type,
there is no ID nor Primary key column in this table,
and I need to do this for ALL rows in my query,
the columns are NOT in order. meaning that col1 does NOT have to be before col2 or it has to be null AND col2 does NOT have to be before col3 or it has to be NULL .. and so on
If your DB support least function, it is the best approach
select
least
(
nvl(col1,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col2,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col3,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col4,TO_DATE('2901-01-01','YYYY-MM-DD'))
)
from tbl
Edit: If all col(s) are null, then you can hardcode the output as null. The below query should work. I couldn't test it but this should work.
select
case when
least
(
nvl(col1,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col2,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col3,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col4,TO_DATE('2901-01-01','YYYY-MM-DD'))
)
= TO_DATE('2901-01-01','YYYY-MM-DD')
then null
else
least
(
nvl(col1,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col2,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col3,TO_DATE('2901-01-01','YYYY-MM-DD')),
nvl(col4,TO_DATE('2901-01-01','YYYY-MM-DD'))
)
end
as min_date
from tbl
If a id column in your table. Then
Query
select t.id, min(t.col) as min_col_value from(
select id, col1 as col from your_table
union all
select id, col2 as col from your_table
union all
select id, col3 as col from your_table
union all
select id, col4 as col from your_table
)t
group by t.id;
If you want the first date, then use coalesce():
select coalesce(col1, col2, col3, col4)
from t;
This returns the first non-NULL value (which is one way that I interpret the question). This will be the minimum date, if the dates are in order.
Select Id, CaseWhen (Col1 <= Col2 OR Col2 is null) And (Col1 <= Col3 OR Col3 is null) Then Col1 When (Col2 <= Col1 OR Col1 is null) And (Col2 <= Col3 OR Col3 is null) Then Col2 Else Col3 End As Min From YourTable
This is for 3 Column, Same way you can write for 4 - or more column.

Oracle Lag- determine if previous row columns are the same as current row

Is it possible using analytical functions in oracle (lag for instance) to check the previous row, and based on 2 columns, determine if these values are exactly the same as the current row. If they are, then output the letter 'Y', else 'N'.
Something like:
IF prev.col1 = curr.col1 and prev.col2 = curr.col2 THEN 'Y' ELSE 'N'
I want to use this Y or N to filter out these records in a Crystal Report that I am writing.
Yes, but as analytics are processed fairly late, you need write the query with the analytic as a subquery:
SELECT CASE WHEN col1 = prev_col1 AND col2 = prev_col2 THEN 'Y' ELSE 'N' END as yesno
FROM (
SELECT col1, col2,
LAG( col1, 1 ) OVER ( ORDER BY col1, col2 ) AS prev_col1
LAG( col2, 1 ) OVER ( ORDER BY col1, col2 ) AS prev_col2
FROM mytable
)
You'll need to adjust the ORDER BY clause depending on how you're defining the "previous" row. You may also want to add a PARTITION BY clause if you don't want to treat the whole table as a single group of rows.

Multiple rows match, but I only want one?

Sometimes I wish to perform a join whereby I take the largest value of one column. Doing this I have to perform a max() and a groupby- which prevents me from retrieving the other columns from the row which was the max (beause they were not contained in a GROUP BY or aggregate function).
To fix this, I join the max value back on the original data source, to get the other columns. However, my problem is that this sometimes returns more than one row.
So, so far I have something like:
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2
If the above query now returns three rows (which match the largest value for column2) I have a bit of a headache.
If there was an extra column- col3 and for the rows returned by the above query, I only wanted to return the one which was, say the minimum Col3 value- how would I do this?
If you are using SQL Server 2005+. Then you can do it like this:
CTE way
;WITH CTE
AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
)
SELECT
*
FROM
CTE
WHERE
CTE.RowNbr=1
Subquery way
SELECT
*
FROM
(
SELECT
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) AS RowNbr,
table.*
FROM
table
) AS T
WHERE
T.RowNbr=1
As I got it can be something like this
SELECT * FROM
(SELECT Col1, Max(Col2) FROM Table GROUP BY Col1) tab1
JOIN
(SELECT Col1, Col2 FROM Table) tab2
ON tab1.Col2 = tab2.Col2 and Col3 = (select min(Col3) from table )
Assuming you are using SQL-Server 2005 or later You can make use of Window functions here. I have chosen ROW_NUMBER() but it is not hte only option.
;WITH T AS
( SELECT *,
ROW_NUMBER() OVER(PARTITION BY Col1 ORDER BY Col2 DESC) [RowNumber]
FROM Table
)
SELECT *
FROM T
WHERE RowNumber = 1
The PARTITION BY within the OVER clause is equivalent to your group by in your subquery, then your ORDER BY determines the order in which to start numbering the rows. In this case Col2 DESC to start with the highest value of col2 (Equivalent to your MAX statement).