Partition date on a new timestamp to get previous timestamp - sql

I have a table which looks like:
col1 | col2 | col3 | timestamp
----------------------------------------
1 | 2 | 3 | 2020-1-16 16:11:10
----------------------------------------
1 | 2 | 3 | 2020-1-16 16:13:20
----------------------------------------
1 | 2 | 3 | 2020-1-24 09:29:24
I want to create another column which gives me the previous date but partitions it by day. Also if it has no previous date, then it should return the same date. It should look like this:
col1 | col2 | col3 | timestamp | prev_timestamp
------------------------------------------------------------
1 | 2 | 3 | 2020-1-16 16:11:10 | 2020-1-16 16:11:10
------------------------------------------------------------
1 | 2 | 3 | 2020-1-16 16:13:20 | 2020-1-16 16:11:10
-------------------------------------------------------------
1 | 2 | 3 | 2020-1-24 9:29:24 | 2020-1-24 09:29:24
I know I can use lag and partition by, but then for the timestamp 2020-1-24 9:29:24 it gives me a previous timestamp of 2020-1-16 16:13:20, which I do not want.

You can do what you want just with lag():
select t.*,
lag(timestamp, 1, timestamp) over (partition by col1, col2, col3, date(timestamp)
order by timestamp
) as prev_timestamp
from t;
No conditional logic or subquery is necessary. You simply want to get the previous timestamp on the same day and lag() does that.
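As a quick check of the lag() approach, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. SQLite stands in for whatever database the question targets (an assumption; window functions need SQLite 3.25 or later), and the timestamps are rewritten as zero-padded ISO-8601 text so that date() can parse them.

```python
import sqlite3

# Assumption: SQLite replaces the asker's (unspecified) database.
# Timestamps are stored as zero-padded ISO-8601 text so date() works on them.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 int, col2 int, col3 int, timestamp text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, 2, 3, "2020-01-16 16:11:10"),
        (1, 2, 3, "2020-01-16 16:13:20"),
        (1, 2, 3, "2020-01-24 09:29:24"),
    ],
)

# The third argument of lag() is the default used when the partition
# (here: same col1, col2, col3 and same calendar day) has no previous row.
query = """
select t.timestamp,
       lag(timestamp, 1, timestamp) over (
           partition by col1, col2, col3, date(timestamp)
           order by timestamp
       ) as prev_timestamp
from t
order by t.timestamp
"""
lag_rows = conn.execute(query).fetchall()
for row in lag_rows:
    print(row)
```

The first row of each day falls back to its own timestamp, so the row for 2020-01-24 no longer picks up a value from 2020-01-16.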

You can use lag() and conditional logic:
select
col1,
col2,
col3,
timestamp,
case when date_trunc('day', prev_timestamp) = date_trunc('day', timestamp)
then prev_timestamp
else timestamp
end prev_timestamp
from (
select
t.*,
lag(timestamp) over(partition by col1, col2, col3 order by timestamp) prev_timestamp
from mytable t
) t
You can remove the nested query by repeating the lag() expression like so:
select
t.*,
case when date_trunc('day', lag(timestamp) over(partition by col1, col2, col3 order by timestamp)) = date_trunc('day', timestamp)
then lag(timestamp) over(partition by col1, col2, col3 order by timestamp)
else timestamp
end prev_timestamp
from mytable t

Related

Using CTE to calculate cumulative sum

I want to write a non-recursive common table expression (CTE) in Postgres to calculate a cumulative sum. Here's an example input table:
----------------------
1 | A | 0 | -1
1 | B | 3 | 1
2 | A | 1 | 0
2 | B | 3 | 2
An output should look like this:
----------------------
1 | A | 0 | -1
1 | B | 3 | 1
2 | A | 1 | -1
2 | B | 6 | 3
As you can see, the cumulative sums of columns 3 and 4 are calculated. This is easy to do using a recursive CTE, but how is it done with a non-recursive one?
Use window functions. Assuming that your table has columns col1, col2, col3 and col4, that would be:
select
t.*,
sum(col3) over(partition by col2 order by col1) new_col3,
sum(col4) over(partition by col2 order by col1) new_col4
from mytable t
You would use a window function for a cumulative sum. I don't see what the sum is in your example, but the syntax is something like:
select t.*, sum(x) over (order by y) as cumulative_sum
from t;
For your example, this would seem to be:
select t.*,
sum(col3) over (partition by col2 order by col1) as new_col3,
sum(col4) over (partition by col2 order by col1) as new_col4
from t;
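To see the windowed sum reproduce the expected output, here is a small sketch using Python's sqlite3 module with the sample rows from the question. SQLite is used instead of Postgres purely so the example is self-contained (an assumption; it needs SQLite 3.25 or later), and the column names col1 to col4 are the ones assumed in the answers.

```python
import sqlite3

# Assumption: in-memory SQLite instead of Postgres; column names are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (col1 int, col2 text, col3 int, col4 int)")
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [(1, "A", 0, -1), (1, "B", 3, 1), (2, "A", 1, 0), (2, "B", 3, 2)],
)

# With an order by in the window, sum() becomes a running total
# within each col2 group, accumulated in col1 order.
query = """
select col1, col2,
       sum(col3) over (partition by col2 order by col1) as new_col3,
       sum(col4) over (partition by col2 order by col1) as new_col4
from mytable
order by col1, col2
"""
cumulative_rows = conn.execute(query).fetchall()
for row in cumulative_rows:
    print(row)
```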

Select most recent rows - last 24 hours

I have a table that looks like this:
col1 | col2 | col3 | t_insert
---------------------------------
1 | z | |2018-04-25 17:23:46.686816+10
1 | zy | |2018-04-26 18:53:46.686816+10
2 | f | |2018-04-26 19:23:46.686816+10
3 | g | |2018-04-27 17:23:46.686816+10
2 | z | |2018-04-27 18:23:46.686816+10
4 | z | |2018-04-27 20:13:46.686816+10
Where there are duplicate values in col1 I want to select by most recent timestamp and create a new column (col4) and insert the string 'update'.
Where there are not duplicate values in col1 I want to select the value and insert the string 'new' into col4.
Also I only want to select rows that have a timestamp from the last 24 hours.
The expected result (this result doesn't apply the last-24-hours filter):
col1 | col2 | col3 | t_insert | col4 |
-------------------------------------------------------------
1 | zy | |2018-04-26 18:53:46.686816+10 |update |
3 | g | |2018-04-27 17:23:46.686816+10 |new |
2 | z | |2018-04-27 18:23:46.686816+10 |update |
4 | z | |2018-04-27 20:13:46.686816+10 |new |
Thanks in advance,
Hmmm, window functions can help here:
select col1, col2, col3, t_insert,
(case when cnt > 1 then 'update' else 'new' end) as col4
from (select t.*,
count(*) over (partition by col1) as cnt,
row_number() over (partition by col1 order by t_insert desc) as seqnum
from t
where t_insert >= now() - interval '24 hour'
) t
where seqnum = 1;
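The following sketch runs this query shape on the sample data with Python's sqlite3 module. Several things here are assumptions made for the demo: SQLite (3.25 or later) replaces the original database, the timestamps are simplified to ISO-8601 text without fractional seconds or time zone, and a fixed, hypothetical "current time" is passed in as a cutoff parameter instead of calling now().

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 int, col2 text, col3 text, t_insert text)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        (1, "z", None, "2018-04-25 17:23:46"),
        (1, "zy", None, "2018-04-26 18:53:46"),
        (2, "f", None, "2018-04-26 19:23:46"),
        (3, "g", None, "2018-04-27 17:23:46"),
        (2, "z", None, "2018-04-27 18:23:46"),
        (4, "z", None, "2018-04-27 20:13:46"),
    ],
)

# Hypothetical "current" time, fixed so the result is reproducible.
now = datetime(2018, 4, 27, 21, 0, 0)
cutoff = (now - timedelta(hours=24)).strftime("%Y-%m-%d %H:%M:%S")

# The where clause runs before the window functions, so cnt only
# counts rows inside the 24-hour window.
query = """
select col1, col2, t_insert,
       case when cnt > 1 then 'update' else 'new' end as col4
from (select t.*,
             count(*) over (partition by col1) as cnt,
             row_number() over (partition by col1 order by t_insert desc) as seqnum
      from t
      where t_insert >= ?
     ) as ranked
where seqnum = 1
order by t_insert
"""
recent_rows = conn.execute(query, (cutoff,)).fetchall()
for row in recent_rows:
    print(row)
```

With this cutoff, col1 = 2 comes back as 'new' because its earlier row is older than 24 hours and is filtered out before the count. If duplicates should be detected across the whole table, the count would have to be computed before the time filter is applied.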

In SQL is there a way to partition by a value if it's not continuous

I would like to rank the values over a partition with two columns. col1 will be the key and col2 will be some value that is also going to be used in ORDER BY. I would like to start a new partition only when col2 is discontinued. For example, I would like to do the following:
+------+------+------+
| col1 | col2 | rank |
+------+------+------+
| a | 1 | 1 |
| a | 2 | 2 |
| a | 3 | 3 |
| a | 9 | 1 |
| a | 10 | 2 |
| b | 1 | 1 |
| b | 2 | 2 |
| b | 8 | 1 |
+------+------+------+
I am thinking of something along the lines of:
SELECT col1, RANK() OVER (PARTITION BY col1, SOMETHING HERE??? ORDER BY col2 DESC)
Does anyone have any ideas?
If I understand correctly, you want to enumerate by "islands" of adjoining sequential values. You can do so with a simple observation: subtracting a sequence from col2 will be constant for each group. So, let's use this observation:
select t.*,
row_number() over (partition by col1, grp order by col2) as rnk
from (select t.*,
(col2 - row_number() over (partition by col1 order by col2)) as grp
from t
) t
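The "subtract a sequence" observation can be checked on the sample data. Below is a minimal sketch using Python's sqlite3 module (an assumption made for the demo; any database with window functions should behave the same, and SQLite needs version 3.25 or later). The outer row_number() is ordered by col2 so that the numbering inside each island is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 text, col2 int)")
conn.executemany(
    "insert into t values (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("a", 9), ("a", 10),
     ("b", 1), ("b", 2), ("b", 8)],
)

# Inner query: col2 minus its position within col1 is constant for
# consecutive values (a: 1,2,3 -> 0; 9,10 -> 5), which identifies the island.
# Outer query: number the rows inside each (col1, grp) island.
query = """
select col1, col2,
       row_number() over (partition by col1, grp order by col2) as rnk
from (select t.*,
             col2 - row_number() over (partition by col1 order by col2) as grp
      from t
     ) as islands
order by col1, col2
"""
island_rows = conn.execute(query).fetchall()
for row in island_rows:
    print(row)
```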

Trying to write a query that will display duplicates results as null

I have a table that looks like the first example.
I'm trying to write a MSSQL2012 statement that will display results like the second example.
Basically I want null values instead of duplicate values in columns 1 and 2. This is for readability purposes during reporting.
This seems like it should be possible, but I'm drawing a blank. No amount of joins or unions I've written has rendered the results I need.
| Col1 | Col2 | Col3 |
+------+------+------+
| 1 | 2 | 4 |
| 1 | 2 | 5 |
| 1 | 3 | 6 |
| 1 | 3 | 7 |
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 1 | 2 | 4 |
| Null | null | 5 |
| null | 3 | 6 |
| null | null | 7 |
+------+------+------+
I would do this with no subqueries at all:
select (case when row_number() over (partition by col1 order by col2, col3) = 1
then col1
end) as col1,
(case when row_number() over (partition by col1, col2 order by col3) = 1
then col2
end) as col2,
col3
from t
order by t.col1, t.col2, t.col3;
Note that the order by at the end of the query is very important. The result set that you want depends critically on the ordering of the rows. Without the order by, the result set could be in any order. So, the query might look like it works, and then suddenly fail one day or on a slightly different set of data.
Using a common table expression with row_number():
;with cte as (
select *
, rn_1 = row_number() over (partition by col1 order by col2, col3)
, rn_2 = row_number() over (partition by col1, col2 order by col3)
from t
)
select
col1 = case when rn_1 > 1 then null else col1 end
, col2 = case when rn_2 > 1 then null else col2 end
, col3
from cte
Without the CTE:
select
col1 = case when rn_1 > 1 then null else col1 end
, col2 = case when rn_2 > 1 then null else col2 end
, col3
from (
select *
, rn_1 = row_number() over (partition by col1 order by col2, col3)
, rn_2 = row_number() over (partition by col1, col2 order by col3)
from t
) sub
rextester demo: http://rextester.com/UYA17142
returns:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 2 | 4 |
| NULL | NULL | 5 |
| NULL | 3 | 6 |
| NULL | NULL | 7 |
+------+------+------+
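For readers without SQL Server at hand, the row_number() idea can also be tried in SQLite through Python's sqlite3 module. This is only a sketch under assumptions: SQLite (3.25 or later) replaces MSSQL2012, the T-SQL "alias = expression" form is rewritten with "as", and the output aliases shown_col1 and shown_col2 are hypothetical names chosen so that the final order by still refers to the underlying columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 int, col2 int, col3 int)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, 2, 4), (1, 2, 5), (1, 3, 6), (1, 3, 7)],
)

# rn_1 numbers rows within each col1 value, rn_2 within each (col1, col2) pair.
# Only the first row of each group keeps its value; the case expression
# yields NULL for the rest because it has no else branch.
query = """
select case when rn_1 = 1 then col1 end as shown_col1,
       case when rn_2 = 1 then col2 end as shown_col2,
       col3
from (select t.*,
             row_number() over (partition by col1 order by col2, col3) as rn_1,
             row_number() over (partition by col1, col2 order by col3) as rn_2
      from t
     ) as numbered
order by col1, col2, col3
"""
blanked_rows = conn.execute(query).fetchall()
for row in blanked_rows:
    print(row)
```

Python shows the SQL NULLs as None. The explicit order by matters here for the same reason given above: without it the blanked rows could appear in any order.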

SQL select distinct rows

I have data like this (col2 is of type Date)
| col1 | col2 |
------------------------------
| 1 | 17/10/2007 07:19:07 |
| 1 | 17/10/2007 07:18:56 |
| 1 | 31/12/2070 |
| 2 | 28/11/2008 15:23:14 |
| 2 | 31/12/2070 |
How would I select the rows where col1 is distinct and the value of col2 is the greatest? Like this:
| col1 | col2 |
------------------------------
| 1 | 31/12/2070 |
| 2 | 31/12/2070 |
SELECT col1, MAX(col2) FROM some_table GROUP BY col1;
select col1, max(col2)
from table
group by col1
I reckon it would be
select col1, max(col2)
from DemoTable
group by col1
unless I've missed something obvious
select col1, max(col2) from MyTable
group by col1
SELECT Col1, MAX(Col2) FROM YourTable GROUP BY Col1
In Oracle and MS SQL:
SELECT *
FROM (
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2 DESC) rn
FROM table t
) q
WHERE rn = 1
This will select other columns along with col1 and col2
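The two approaches above (max() with group by, and row_number() filtered to 1) can be compared side by side. The sketch below uses Python's sqlite3 module rather than Oracle or MS SQL, which is an assumption made to keep it self-contained (SQLite 3.25 or later). The dates are rewritten as ISO-8601 text so that they sort correctly as strings, and the table name some_table is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table some_table (col1 int, col2 text)")
conn.executemany(
    "insert into some_table values (?, ?)",
    [
        (1, "2007-10-17 07:19:07"),
        (1, "2007-10-17 07:18:56"),
        (1, "2070-12-31 00:00:00"),
        (2, "2008-11-28 15:23:14"),
        (2, "2070-12-31 00:00:00"),
    ],
)

# Approach 1: one aggregate row per col1.
grouped_rows = conn.execute(
    "select col1, max(col2) from some_table group by col1 order by col1"
).fetchall()

# Approach 2: rank rows per col1 by col2 descending and keep the first,
# which would also carry along any other columns of the winning row.
ranked_rows = conn.execute(
    """
    select col1, col2
    from (select t.*,
                 row_number() over (partition by col1 order by col2 desc) as rn
          from some_table t
         ) as q
    where rn = 1
    order by col1
    """
).fetchall()

print(grouped_rows)
print(ranked_rows)
```

On this data both queries return the same two rows. The row_number() form is the one to reach for when the table has additional columns that should come from the same row as the greatest col2.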