IF with timestamp bigquery - sql

I need to add an attribute that indicates whether each version is an original or a copy: if it is the first version of the site, it is the original; otherwise it is a copy.
The table:
id_site id_version timestamp_version
1 5589 2/3/2022
1 2030 10/7/2022
1 1560 10/8/2022
2 6748 2/3/2022
2 7890 2/4/2022
3 4532 2/3/2022
The expected result:
id_site id_version timestamp_version type_version
1 5589 2/3/2022 original
1 2030 10/7/2022 copy
1 1560 10/8/2022 copy
2 6748 2/3/2022 original
2 7890 2/4/2022 copy
3 4532 2/3/2022 original

You can use an IF or CASE here. They are mostly interchangeable, but my preference is CASE since it's portable to nearly any other RDBMS, whereas IF is supported in only a few.
CASE WHEN ROW_NUMBER() OVER (PARTITION BY id_site ORDER BY timestamp_version ASC) = 1 THEN 'original' ELSE 'copy' END AS type_version
Inside the CASE expression, the ROW_NUMBER() window function partitions the result set by id_site and numbers the rows of each id_site sequentially, ordered by timestamp_version ascending. We then test whether that ROW_NUMBER() is 1: if so, the row is labelled original, otherwise copy.
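A minimal runnable sketch of the CASE/ROW_NUMBER() expression, using Python's built-in sqlite3 module instead of BigQuery. This is an assumption made only so the example runs anywhere (SQLite 3.25+ is needed for window functions). The table name versions is hypothetical, and the question's dates are read as day/month/year and stored as ISO strings, matching the dates used in the BigQuery answer that follows.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE versions (id_site INTEGER, id_version INTEGER, timestamp_version TEXT)"
)
# Sample data from the question; ISO strings so text order equals date order.
conn.executemany(
    "INSERT INTO versions VALUES (?, ?, ?)",
    [
        (1, 5589, "2022-03-02"),
        (1, 2030, "2022-07-10"),
        (1, 1560, "2022-08-10"),
        (2, 6748, "2022-03-02"),
        (2, 7890, "2022-04-02"),
        (3, 4532, "2022-03-02"),
    ],
)

QUERY = """
SELECT id_site,
       id_version,
       timestamp_version,
       -- row 1 of each id_site partition (earliest timestamp) is the original
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY id_site
                                    ORDER BY timestamp_version ASC) = 1
            THEN 'original' ELSE 'copy' END AS type_version
FROM versions
ORDER BY id_site, timestamp_version
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

If two versions of a site could share the same earliest timestamp, add a tie-breaker (for example id_version) to the ORDER BY so that exactly one row is deterministically labelled original.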

You can use a window function inside an IF expression for that:
with test as (
select * from unnest([
struct(1 as id_site, 5589 as id_version, timestamp(date "2022-03-02") as timestamp_version),
(1, 2030, timestamp(date "2022-07-10")),
(1, 1560, timestamp(date "2022-08-10")),
(2, 6748, timestamp(date "2022-03-02")),
(2, 7890, timestamp(date "2022-04-02")),
(3, 4532, timestamp(date "2022-03-02"))
])
)
select
*,
IF(timestamp_version = min(timestamp_version) over (partition by id_site), "original", "copy") AS type_version
from test
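The MIN() OVER version behaves differently from ROW_NUMBER() = 1 when two versions of a site share the earliest timestamp: every tied row is labelled original. The sketch below checks that on hypothetical data (site 1 and id_versions 101 to 103 are made up), running in SQLite via Python's sqlite3 module and using CASE in place of BigQuery's IF; both substitutions are assumptions for portability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE versions (id_site INTEGER, id_version INTEGER, timestamp_version TEXT)"
)
# Hypothetical data: versions 101 and 102 share the site's earliest timestamp.
conn.executemany(
    "INSERT INTO versions VALUES (?, ?, ?)",
    [(1, 101, "2022-03-02"), (1, 102, "2022-03-02"), (1, 103, "2022-07-10")],
)

QUERY = """
SELECT id_version,
       -- same test as the IF(...) above: is this row at the site's earliest timestamp?
       CASE WHEN timestamp_version = MIN(timestamp_version) OVER (PARTITION BY id_site)
            THEN 'original' ELSE 'copy' END AS type_version
FROM versions
ORDER BY id_version
"""

labels = dict(conn.execute(QUERY).fetchall())
print(labels)
```

Whether two "originals" per site is acceptable depends on the data; if timestamps are unique per site, the two approaches give the same labels.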

Consider the option below:
select *,
if(lag(id_version) over prev is null, 'original', 'copy') type_version,
from your_table
window prev as (partition by id_site order by timestamp_version)
Applied to the sample data in your question, this labels the earliest version of each site original and every later version copy.


For each unique item in a redshift sql column, get the last rows based on a looking/scanning window

patient_id alert_id alert_timestamp
3 xyz 2022-10-10
1 anp 2022-10-12
1 gfe 2022-10-10
2 fgy 2022-10-02
2 gpl 2022-10-03
1 gdf 2022-10-13
2 mkd 2022-10-23
1 liu 2022-10-01
I have an SQL table (see the simplified version above) where, for each patient_id, I want to keep only the latest alert (i.e. the last one) sent within a given window period, e.g. window_size = 7.
Note that the window needs to look at consecutive days, i.e. from day 1 to day 1 + window_size. The ranges of alert_timestamp for each patient_id vary and usually extend well beyond the window_size.
Note also that the table above is a very simple example: the real one has many more patient_ids, and its rows are in mixed order of alert_timestamp and alert_id.
The approach is to start from the last alert_timestamp for a given patient_id and work backwards using the window_size, selecting the alert that was the last one in each window time frame.
In other words, the idea is a scanning window, e.g. window_size = 7 days, that moves across the timestamps of each patient.
The end result I want is a table with the filtered alerts.
Expected output for (this example) window_size = 7:
patient_id alert_id alert_timestamp
1 liu 2022-10-01
1 gdf 2022-10-13
2 gpl 2022-10-03
2 mkd 2022-10-23
3 xyz 2022-10-10
What's the most efficient way to solve for this?
This can be done with the LAST_VALUE window function, but you need to prep your data a bit. Here's an example of what this could look like:
create table test (
patient_id int,
alert_id varchar(8),
alert_timestamp date);
insert into test values
(3, 'xyz', '2022-10-10'),
(1, 'anp', '2022-10-12'),
(1, 'gfe', '2022-10-10'),
(2, 'fgy', '2022-10-02'),
(2, 'gpl', '2022-10-03'),
(1, 'gdf', '2022-10-13'),
(2, 'mkd', '2022-10-23'),
(1, 'liu', '2022-10-01');
WITH RECURSIVE dates (dt) AS
(
SELECT '2022-09-30'::DATE AS dt UNION ALL SELECT dt + 1
FROM dates d
WHERE dt < '2022-10-31'::DATE
),
p_dates AS
(
SELECT pid,
dt
FROM dates d
CROSS JOIN (SELECT DISTINCT patient_id AS pid FROM test) p
),
combined AS
(
SELECT *
FROM p_dates d
LEFT JOIN test t
ON d.dt = t.alert_timestamp
AND d.pid = t.patient_id
),
latest AS
(
SELECT patient_id,
pid,
alert_id,
dt,
alert_timestamp,
LAST_VALUE(alert_id IGNORE NULLS) OVER (PARTITION BY pid ORDER BY dt ROWS BETWEEN CURRENT ROW AND 7 following) AS at
FROM combined
)
SELECT patient_id,
alert_id,
alert_timestamp
FROM latest
WHERE patient_id IS NOT NULL
AND alert_id = at
ORDER BY patient_id,
alert_timestamp;
This produces the results you are looking for with the test data, but there are a few assumptions. The big one is that there is at most 1 alert per patient per day. If this isn't true, some more data massaging will be needed. Either way, this should give you an outline of how to do this.
The first step is to ensure that there is 1 row per patient per day, so that the window function can operate on rows, which are then equivalent to days (for each patient). The date range is generated by a recursive CTE (hard-coded here to 2022-09-30 through 2022-10-31 to cover the test data), cross joined with the distinct patients, and left joined to the test data to get 1 row per day per patient.
The IGNORE NULLS option is used in the LAST_VALUE window function to skip the "extra" rows created by that process, so the frame ROWS BETWEEN CURRENT ROW AND 7 FOLLOWING covers the current day plus the next 7 days. The last step is to prune out all the unneeded rows so that only the latest alert of each window is returned.
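If building a date spine is inconvenient, the same rule can be expressed as an anti-join: keep an alert unless the same patient has a later alert within window_size days after it. This is a swapped-in technique (NOT EXISTS instead of LAST_VALUE over a date spine), sketched in SQLite through Python's sqlite3 module rather than Redshift; it assumes ISO yyyy-mm-dd date strings so that text comparison matches date order.

```python
import sqlite3

WINDOW_DAYS = 7  # window_size from the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (patient_id INTEGER, alert_id TEXT, alert_timestamp TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (3, "xyz", "2022-10-10"),
        (1, "anp", "2022-10-12"),
        (1, "gfe", "2022-10-10"),
        (2, "fgy", "2022-10-02"),
        (2, "gpl", "2022-10-03"),
        (1, "gdf", "2022-10-13"),
        (2, "mkd", "2022-10-23"),
        (1, "liu", "2022-10-01"),
    ],
)

# Keep an alert only if the same patient has no later alert in the days
# after it up to day + WINDOW_DAYS (the same span as
# ROWS BETWEEN CURRENT ROW AND 7 FOLLOWING over a one-row-per-day spine).
QUERY = """
SELECT a.patient_id, a.alert_id, a.alert_timestamp
FROM test AS a
WHERE NOT EXISTS (
    SELECT 1
    FROM test AS b
    WHERE b.patient_id = a.patient_id
      AND b.alert_timestamp > a.alert_timestamp
      AND b.alert_timestamp <= date(a.alert_timestamp, ?)
)
ORDER BY a.patient_id, a.alert_timestamp
"""

rows = conn.execute(QUERY, (f"+{WINDOW_DAYS} days",)).fetchall()
for row in rows:
    print(row)
```

This version needs no hard-coded date range. If a patient has two alerts on the same day, both survive, so the one-alert-per-day assumption still matters.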

pivot function in BigQuery for transpose, wrong value falling in

I'm trying to use the PIVOT operator to transpose rows; however, action_type = 4 keeps falling into the wrong column after I run my query. Below is the sample data:
SessionId action_type products
122 3 5
122 4 1
127 3 2
189 4 1
The ideal output looks like this:
SessionId action_type_1 products_1 action_type_2 products_2
122 3 5 4 1
127 3 2 null null
189 null null 4 1
I have written the query below to do the transpose:
select * from
(select * except (SessionId),
max(SessionId) over win SessionId,
row_number() over (win order by SessionId, action_type, products) tab
from
`xxx.sample.xxx`
window win as (partition by SessionId)
)
pivot (
any_value (action_type) as action_type ,
any_value(products) as products for tab in (1,2))
However, this returns some strange results: for example, I see the value 4 under action_type_1, which is not what I expected. action_type_1 should only ever hold 3, because I want action_type_1 = 3 and action_type_2 = 4. Can anyone help me look at my query? Any advice is appreciated!
I think the query below is what you are looking for:
select * from your_table
pivot (
any_value (action_type) as action_type ,
any_value(products) as products
for action_type in (3,4)
)
The output columns are then named action_type_3, products_3, action_type_4 and products_4.
So instead of relying on a row offset, you pivot directly on action_type!
If for some reason you need the output columns suffixed _1 and _2, use the trick below:
select * from your_table
pivot (
any_value (action_type) as action_type ,
any_value(products) as products
for case action_type when 3 then 1 when 4 then 2 end in (1,2)
)
This produces the columns action_type_1, products_1, action_type_2 and products_2.
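PIVOT is essentially shorthand for conditional aggregation, so one way to check the expected layout is to write the aggregation out by hand. The sketch below does this in SQLite via Python's sqlite3 module (an assumption for runnability; the same MAX(CASE ...) pattern is standard SQL). The table name sessions is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (SessionId INTEGER, action_type INTEGER, products INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [(122, 3, 5), (122, 4, 1), (127, 3, 2), (189, 4, 1)],
)

# Each output column only "sees" rows with one action_type, so action_type 3
# always lands in the _1 columns and action_type 4 in the _2 columns.
QUERY = """
SELECT SessionId,
       MAX(CASE WHEN action_type = 3 THEN action_type END) AS action_type_1,
       MAX(CASE WHEN action_type = 3 THEN products END)    AS products_1,
       MAX(CASE WHEN action_type = 4 THEN action_type END) AS action_type_2,
       MAX(CASE WHEN action_type = 4 THEN products END)    AS products_2
FROM sessions
GROUP BY SessionId
ORDER BY SessionId
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Session 189 has only action_type 4, so its _1 columns come back NULL instead of being shifted left, which is exactly what the row_number()-based pivot got wrong.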

Exclude group of records—if number ever goes up

I have a road inspection table:
INSPECTION_ID ROAD_ID INSP_DATE CONDITION_RATING
--------------------- ------------- --------- ----------------
506411 3040 01-JAN-81 15
508738 3040 14-APR-85 15
512461 3040 22-MAY-88 14
515077 3040 17-MAY-91 14 -- all ok
505967 3180 01-MAY-81 11
507655 3180 13-APR-85 9
512374 3180 11-MAY-88 17 <-- goes up; NOT ok
515626 3180 25-APR-91 16.5
502798 3260 01-MAY-83 14
508747 3260 13-APR-85 13
511373 3260 11-MAY-88 12
514734 3260 25-APR-91 12 -- all ok
I want to write a query that will exclude the entire road -- if the road's condition ever goes up over time. For example, exclude road 3180, since the condition goes from 9 to 17 (an anomaly).
Question:
How can I do that using Oracle SQL?
Here's one option:
find the "next" condition_rating value (within the same road_id, which is what the partition by clause does, sorted by insp_date)
return the road_ids for which the "current" condition_rating minus the "next" one is less than zero, i.e. the rating goes up
with temp as
  (select road_id,
          condition_rating,
          nvl(lead(condition_rating) over (partition by road_id order by insp_date),
              condition_rating) next_cr
   from test
  )
select distinct road_id
from temp
where condition_rating - next_cr < 0;

   ROAD_ID
----------
      3180
Based on the OP's own answer, which makes the expected outcome clearer.
Because I always try to avoid self-joins, I'd go for nested window functions:
SELECT road_id, condition_rating, insp_date
FROM ( SELECT prev.*
, COUNT(CASE WHEN condition_rating < next_cr THEN 1 END) OVER(PARTITION BY road_id) bad
FROM (select t.*
, lead(condition_rating) over (partition by road_id order by insp_date) next_cr
from t
) prev
) tagged
WHERE bad = 0
ORDER BY road_id, insp_date
NOTE
lead() returns null for the last row of each road. The case expression that marks bad rows handles this: if next_cr is null, condition_rating < next_cr is not true, so the case maps the row to "not bad".
The case is just to mimic the filter clause: https://modern-sql.com/feature/filter
MATCH_RECOGNIZE might be another option for this problem, but due to the lack of '^' and '$' I'm worried that the backtracking might cause more problems than it is worth.
Nested window functions are typically no big performance hit if they use compatible OVER clauses, like in this query.
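To try the nested-window query without an Oracle instance, here is a sketch that runs the same SQL against SQLite through Python's sqlite3 module. The DD-MON-YY dates are rewritten as ISO strings, assuming the two-digit years are 19xx; the columns of table t are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (inspection_id INTEGER, road_id INTEGER, insp_date TEXT, condition_rating REAL)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (506411, 3040, "1981-01-01", 15),
        (508738, 3040, "1985-04-14", 15),
        (512461, 3040, "1988-05-22", 14),
        (515077, 3040, "1991-05-17", 14),
        (505967, 3180, "1981-05-01", 11),
        (507655, 3180, "1985-04-13", 9),
        (512374, 3180, "1988-05-11", 17),
        (515626, 3180, "1991-04-25", 16.5),
        (502798, 3260, "1983-05-01", 14),
        (508747, 3260, "1985-04-13", 13),
        (511373, 3260, "1988-05-11", 12),
        (514734, 3260, "1991-04-25", 12),
    ],
)

QUERY = """
SELECT road_id, condition_rating, insp_date
FROM (SELECT prev.*,
             -- number of rows in this road whose next rating is higher
             COUNT(CASE WHEN condition_rating < next_cr THEN 1 END)
                 OVER (PARTITION BY road_id) AS bad
      FROM (SELECT t.*,
                   LEAD(condition_rating) OVER (PARTITION BY road_id
                                                ORDER BY insp_date) AS next_cr
            FROM t) AS prev
     ) AS tagged
WHERE bad = 0
ORDER BY road_id, insp_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Road 3180 gets bad = 1 (the 9 to 17 step), so all four of its rows are dropped while roads 3040 and 3260 are returned in full.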
Here's an answer that's similar to @Littlefoot's answer:
with insp as (
select
road_id,
condition_rating,
insp_date,
case when condition_rating > lag(condition_rating,1) over(partition by road_id order by insp_date) then 'Y' end as condition_goes_up
from
test_data
)
select
insp.*
from
insp
left join
(
select distinct
road_id,
condition_goes_up
from
insp
where
condition_goes_up = 'Y'
) insp_flag
on insp.road_id = insp_flag.road_id
where
insp_flag.condition_goes_up is null
--Note: I removed the ORDER BY. The window function's ORDER BY does not guarantee the order of the final result, so add ORDER BY road_id, insp_date if row order matters.
Edit:
Here's a version that's similar to what @Markus Winand did:
with insp as (
select
road_id,
condition_rating,
insp_date,
case when condition_rating > lag(condition_rating,1) over(partition by road_id order by insp_date) then 'Y' end as condition_goes_up
from
test_data
)
select
insp_tagged.*
from
(
select
insp.*,
count(condition_goes_up) over(partition by road_id) as condition_goes_up_count
from
insp
) insp_tagged
where
condition_goes_up_count = 0
I ended up going with that option.

datediff for row that meets my condition only once per row

I want to do a datediff between 2 dates on different rows, but only if the rows meet a condition.
My table looks like the following, with additional columns (like guid):
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | with this
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
With this example, I would like to have 2 rows in my result, representing the difference between the dates of ids 5-3 and of ids 2-1.
So far I have come up with a query that gives me the differences between ids 5-3, 5-1 and 2-1:
with t as (
SELECT TOP (100000)
*
FROM mydatatable
order by CreateDateAndTime desc)
select
DATEDIFF(SECOND, f.CreateDateAndTime, s.CreateDateAndTime) time
from t f
join t s on (f.[guid] = s.[guid] )
where f.condition like '%I need to compare this state%'
and s.condition like '%with this%'
and (f.id - s.id) < 0
My problem is that I cannot fix f.id - s.id to a specific value, since other rows can sit between the ones I want to diff.
How can I make the datediff only on the first rows that meet my conditions?
EDIT: To make it clearer:
My condition is an event name, and I want to calculate the time between the occurrence of event 1 and event 2 and put it in a column named, for example, time.
@Salman A's answer is really close to what I want, except that it does not work when event 2 does not happen (which was not in my initial example).
For example, in a table like the following, it computes the datediff between row id 5 and row id 2:
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | state 3
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
The code I modified:
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id desc ) AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this ')
)
SELECT *
,DATEDIFF(second, currdate, prevdate) time
FROM cte
WHERE condition = 'I need to compare this state '
and DATEDIFF(second, currdate, prevdate) != 0
order by id desc
Perhaps you want to match ids with the nearest smaller id. You can use window functions for this:
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, CASE WHEN LAG(condition) OVER (PARTITION BY guid ORDER BY id) = 'with this'
THEN LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id) END AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT *
, DATEDIFF(second, currdate, prevdate)
FROM cte
WHERE condition = 'I need to compare this state'
The CASE expression matches each 'I need to compare this state' row with the immediately preceding 'with this' row. If you have mismatched pairs, it returns NULL.
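A sketch of the LAG()/CASE query above, run in SQLite via Python's sqlite3 module against both example tables from the question. Assumptions: all rows share one guid (so PARTITION BY guid is dropped), timestamps are truncated to whole seconds as in the CROSS APPLY answer's test data, and subtracting strftime('%s', ...) values stands in for T-SQL's DATEDIFF(second, ...). The helper pair_events is hypothetical.

```python
import sqlite3

QUERY = """
WITH cte AS (
    SELECT id,
           CreateDateAndTime AS currdate,
           CASE WHEN LAG(condition) OVER (ORDER BY id) = 'with this'
                THEN LAG(CreateDateAndTime) OVER (ORDER BY id) END AS prevdate,
           condition
    FROM t
    WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT id,
       -- stand-in for DATEDIFF(second, currdate, prevdate); NULL if unpaired
       CAST(strftime('%s', prevdate) AS INTEGER)
         - CAST(strftime('%s', currdate) AS INTEGER) AS seconds
FROM cte
WHERE condition = 'I need to compare this state'
ORDER BY id
"""


def pair_events(rows):
    """Run the LAG/CASE pairing query over (id, CreateDateAndTime, condition) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, CreateDateAndTime TEXT, condition TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


original = [
    (1, "2018-12-11 12:07:55", "with this"),
    (2, "2018-12-11 12:07:53", "I need to compare this state"),
    (3, "2018-12-11 12:07:53", "with this"),
    (4, "2018-12-11 12:06:40", "state 3"),
    (5, "2018-12-11 12:06:39", "I need to compare this state"),
]
# Edited example from the question: id 3 is 'state 3', so id 5 has no partner.
edited = [row if row[0] != 3 else (3, row[1], "state 3") for row in original]

print(pair_events(original))
print(pair_events(edited))
```

For the edited table, id 5's preceding row after filtering is id 2 (another 'I need to compare this state'), so its difference is NULL instead of being computed against id 2.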
Try using the analytic function LEAD():
with cte as
(
select 1 as id, '2018-12-11 12:07:55.273' as CreateDateAndTime,'with this' as condition union all
select 2,'2018-12-11 12:07:53.550','I need to compare this state' union all
select 3,'2018-12-11 12:07:53.550','with this' union all
select 4,'2018-12-11 12:06:40.780','state 3' union all
select 5,'2018-12-11 12:06:39.317','I need to compare this state'
) select *,
DATEDIFF(SECOND,CreateDateAndTime,lead(CreateDateAndTime) over(order by Id))
from cte
where condition in ('with this','I need to compare this state')
You ideally want LEADIF/LAGIF functions, because you are looking for the previous row where condition = 'with this'. Since there are no LEADIF/LAGIF functions, I think the best option is to use OUTER/CROSS APPLY with TOP 1, e.g.:
CREATE TABLE #T (Id INT, CreateDateAndTime DATETIME, condition VARCHAR(28));
INSERT INTO #T (Id, CreateDateAndTime, condition)
VALUES
(1, '2018-12-11 12:07:55', 'with this'),
(2, '2018-12-11 12:07:53', 'I need to compare this state'),
(3, '2018-12-11 12:07:53', 'with this'),
(4, '2018-12-11 12:06:40', 'state 3'),
(5, '2018-12-11 12:06:39', 'I need to compare this state');
SELECT ID1 = t1.ID,
Date1 = t1.CreateDateAndTime,
ID2 = t2.ID,
Date2 = t2.CreateDateAndTime,
Difference = DATEDIFF(SECOND, t1.CreateDateAndTime, t2.CreateDateAndTime)
FROM #T AS t1
CROSS APPLY
( SELECT TOP 1 t2.CreateDateAndTime, t2.ID
FROM #T AS t2
WHERE t2.Condition = 'with this'
AND t2.CreateDateAndTime > t1.CreateDateAndTime
--AND t2.GUID = t1.GUID
ORDER BY CreateDateAndTime
) AS t2
WHERE t1.Condition = 'I need to compare this state';
Which Gives:
ID1 Date1 ID2 Date2 Difference
-------------------------------------------------------------------------------
2 2018-12-11 12:07:53.000 1 2018-12-11 12:07:55.000 2
5 2018-12-11 12:06:39.000 3 2018-12-11 12:07:53.000 74
I would enumerate the values with a window function and then aggregate to get the difference.
select min(id), max(id),
datediff(second, min(CreateDateAndTime), max(CreateDateAndTime)) as seconds
from (select t.*,
row_number() over (partition by condition order by CreateDateAndTime) as seqnum
from t
where condition in ('I need to compare this state', 'with this')
) t
group by seqnum;
I cannot tell what you want the results to look like. This version only outputs the differences, with the ids of the rows you care about. The difference can also be applied to the original rows, rather than put into summary rows.
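The enumerate-then-group idea can be checked the same way. This SQLite sketch (via Python's sqlite3 module; whole-second timestamps and the strftime('%s', ...) stand-in for DATEDIFF are assumptions) pairs the n-th 'I need to compare this state' row with the n-th 'with this' row in time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, CreateDateAndTime TEXT, condition TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2018-12-11 12:07:55", "with this"),
        (2, "2018-12-11 12:07:53", "I need to compare this state"),
        (3, "2018-12-11 12:07:53", "with this"),
        (4, "2018-12-11 12:06:40", "state 3"),
        (5, "2018-12-11 12:06:39", "I need to compare this state"),
    ],
)

QUERY = """
SELECT MIN(id) AS min_id,
       MAX(id) AS max_id,
       -- whole-second stand-in for DATEDIFF(second, MIN(...), MAX(...))
       CAST(strftime('%s', MAX(CreateDateAndTime)) AS INTEGER)
         - CAST(strftime('%s', MIN(CreateDateAndTime)) AS INTEGER) AS seconds
FROM (SELECT t.*,
             -- the n-th event of each type gets seqnum = n
             ROW_NUMBER() OVER (PARTITION BY condition
                                ORDER BY CreateDateAndTime) AS seqnum
      FROM t
      WHERE condition IN ('I need to compare this state', 'with this')) AS x
GROUP BY seqnum
ORDER BY seqnum
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

This pairing relies on the two event types alternating: if one event is missing, every later sequence number shifts and the pairs no longer line up, which is the scenario described in the question's edit.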

SQL aggregate using DISTINCT on ID by latest date

Request
I have a section of data below, and my goal is to make the agent column distinct, keeping for each agent only the row with the latest modified date.
Existing Data
modified agent rank
2016-10-18 346502 0
2013-06-04 346502 41
2011-10-31 346503 0
2012-08-13 346505 0
2016-04-18 346506 66
2015-01-27 346506 1
2016-01-21 346507 103
2015-01-27 346507 130
2012-01-30 346508 0
I tried to use this answer https://stackoverflow.com/a/29912858/461887 as a basis, but I can't work out where to put the aggregation.
SQL not working
SELECT DISTINCT
FLiex.agtprof.modify_date_time
,FLiex.agtprof.agent_id
,FLiex.agtprof.rank
,FLiex.agtprof.external_id
WHERE
FLiex.agtprof.modify_date_time = MAX( FLiex.agtprof.modify_date_time)
FROM
FLiex.agtprof
Desired Output
modify agent rank
18/10/2016 346502 0
18/04/2016 346506 66
21/01/2016 346507 103
13/08/2012 346505 0
30/01/2012 346508 0
31/10/2011 346503 0
You're attempting to get single row data, but based on the other rows. While this may be possible with aggregate functions, it's much easier to do with window (analytic) functions:
SELECT [modified], [agent], [rank], [id]
FROM (SELECT [modified], [agent], [rank], [id],
ROW_NUMBER() OVER (PARTITION BY [agent]
ORDER BY [modified] DESC) AS rn
FROM [agtprof]) t
WHERE rn = 1
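A runnable sketch of the ROW_NUMBER() approach, using SQLite via Python's sqlite3 module and the question's sample rows. The [id] column is left out because the sample data doesn't include it, and rank is double-quoted as a precaution; both are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE agtprof (modified TEXT, agent INTEGER, "rank" INTEGER)')
conn.executemany(
    "INSERT INTO agtprof VALUES (?, ?, ?)",
    [
        ("2016-10-18", 346502, 0),
        ("2013-06-04", 346502, 41),
        ("2011-10-31", 346503, 0),
        ("2012-08-13", 346505, 0),
        ("2016-04-18", 346506, 66),
        ("2015-01-27", 346506, 1),
        ("2016-01-21", 346507, 103),
        ("2015-01-27", 346507, 130),
        ("2012-01-30", 346508, 0),
    ],
)

QUERY = """
SELECT modified, agent, "rank"
FROM (SELECT modified, agent, "rank",
             -- rn = 1 marks the most recently modified row of each agent
             ROW_NUMBER() OVER (PARTITION BY agent ORDER BY modified DESC) AS rn
      FROM agtprof) AS x
WHERE rn = 1
ORDER BY modified DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The result has one row per agent, in the same order as the desired output in the question.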
SELECT DISTINCT max(id_date), agent, rank, id
FROM fliex.agtprof
GROUP BY 2,3,4;
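Try this. The idea is to select the max id_date and group by the rest. Note, though, that because rank and id are also in the GROUP BY, an agent whose rank changed between modifications (such as 346502, with ranks 0 and 41) still appears once per distinct combination.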
Try this:
SELECT
FLiex.agtprof.modify_date_time
,FLiex.agtprof.agent_id
,FLiex.agtprof.rank
,FLiex.agtprof.external_id
FROM
FLiex.agtprof
INNER JOIN (
SELECT
Max(FLiex.agtprof.modify_date_time) as max_mod_date_time
,FLiex.agtprof.agent_id as agent_id
FROM
FLiex.agtprof
GROUP BY FLiex.agtprof.agent_id
) Filter
ON FLiex.agtprof.agent_id = Filter.agent_id
AND FLiex.agtprof.modify_date_time = Filter.max_mod_date_time