BigQuery: CASE WHEN - SQL

I am working on BigQuery.
I need to create a query that determines whether each version is a fix or not. The first version is not a fix and the ones that follow are fixes, unless the first version was created by system; in that case, the first and second versions are not fixes.
Currently my query counts only the first version as a non-fix, regardless of who created it.
My query is:
Select
CASE WHEN ROW_NUMBER()
OVER (PARTITION BY id_site ORDER BY timestamp ASC) = 1 THEN false ELSE true END is_fix,
FROM table1
Table1
ID ID_site Timestamp Version version_created_by
1 1 1/1/2022 4624653 system
2 1 1/2/2022 4624651 1234
3 1 1/3/2022 4624655 4567
4 2 5/2/2022 4567830 1234
5 2 5/3/2022 4567835 5678
6 3 5/4/2022 7567836 8907
7 3 5/5/2022 7890000 9807
Expected result:
ID ID_site Timestamp Version version_created_by is_fix
1 1 1/1/2022 4624653 system FALSE
2 1 1/2/2022 4624651 1234 FALSE
3 1 1/3/2022 4624655 4567 TRUE
4 2 5/2/2022 4567830 1234 FALSE
5 2 5/3/2022 4567835 5678 TRUE
6 3 5/4/2022 7567836 8907 FALSE
7 3 5/5/2022 7890000 9807 TRUE

Try the approach below:
SELECT
*,
CASE WHEN ROW_NUMBER() OVER (
PARTITION BY id_site
ORDER BY timestamp ASC
) > t.cnt
THEN true ELSE false
END is_fix
FROM (
SELECT
*,
-- cnt = number of leading non-fix versions: 2 if the site's first version was created by 'system', otherwise 1
CASE WHEN FIRST_VALUE(version_created_by) OVER (
PARTITION BY id_site
ORDER BY timestamp ASC
) = 'system' THEN 2 ELSE 1 END cnt
FROM table1
) AS t
ORDER BY id
My test data and result are:
id id_site timestamp version version_created_by
1 1 2022-01-01T00:00:00Z 4624653 system
2 1 2022-01-02T00:00:00Z 4624651 1234
3 1 2022-01-03T00:00:00Z 4624655 4567
4 2 2022-05-02T00:00:00Z 4567830 1234
5 2 2022-05-03T00:00:00Z 4567835 5678
6 3 2022-05-04T00:00:00Z 7567836 8907
7 3 2022-05-05T00:00:00Z 7890000 9807
id id_site timestamp version version_created_by cnt is_fix
1 1 2022-01-01T00:00:00Z 4624653 system 2 false
2 1 2022-01-02T00:00:00Z 4624651 1234 2 false
3 1 2022-01-03T00:00:00Z 4624655 4567 2 true
4 2 2022-05-02T00:00:00Z 4567830 1234 1 false
5 2 2022-05-03T00:00:00Z 4567835 5678 1 true
6 3 2022-05-04T00:00:00Z 7567836 8907 1 false
7 3 2022-05-05T00:00:00Z 7890000 9807 1 true
To match the format of the expected result, exclude the helper cnt column:
SELECT * EXCEPT (cnt)
FROM (
SELECT
*,
CASE WHEN ROW_NUMBER() OVER (
PARTITION BY id_site
ORDER BY timestamp ASC
) > t.cnt
THEN true ELSE false
END is_fix
FROM (
SELECT
*,
CASE WHEN FIRST_VALUE(version_created_by) OVER (
PARTITION BY id_site
ORDER BY timestamp ASC
) = 'system' THEN 2 ELSE 1 END cnt
FROM table1
) AS t
)
ORDER BY id

You can check whether the first version was created by 'system' with the FIRST_VALUE() window function:
SELECT *,
CASE ROW_NUMBER() OVER (PARTITION BY id_site ORDER BY timestamp)
WHEN 1 THEN false
WHEN 2 THEN FIRST_VALUE(version_created_by) OVER (PARTITION BY id_site ORDER BY timestamp) <> 'system'
ELSE true
END is_fix
FROM table1;
See the demo (for MySQL).
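Here is a minimal sketch for checking the FIRST_VALUE() query locally. It assumes Python's built-in sqlite3 module running SQLite 3.25 or newer, which is needed for window functions, as a stand-in for BigQuery. The sample dates are rewritten as ISO strings so they sort chronologically. SQLite returns 0/1 in place of false/true, so the helper converts them. The names SAMPLE_ROWS, IS_FIX_QUERY and is_fix_by_id are illustrative only.

```python
import sqlite3

# Sample rows from the question as (id, id_site, timestamp, version, version_created_by).
# Dates are rewritten as ISO strings so that text ordering is chronological.
SAMPLE_ROWS = [
    (1, 1, "2022-01-01", 4624653, "system"),
    (2, 1, "2022-01-02", 4624651, "1234"),
    (3, 1, "2022-01-03", 4624655, "4567"),
    (4, 2, "2022-05-02", 4567830, "1234"),
    (5, 2, "2022-05-03", 4567835, "5678"),
    (6, 3, "2022-05-04", 7567836, "8907"),
    (7, 3, "2022-05-05", 7890000, "9807"),
]

# Same logic as the FIRST_VALUE() answer; SQLite yields 0/1 instead of false/true.
IS_FIX_QUERY = """
SELECT id,
       CASE ROW_NUMBER() OVER (PARTITION BY id_site ORDER BY "timestamp")
         WHEN 1 THEN 0
         WHEN 2 THEN FIRST_VALUE(version_created_by)
                       OVER (PARTITION BY id_site ORDER BY "timestamp") <> 'system'
         ELSE 1
       END AS is_fix
FROM table1
ORDER BY id
"""


def is_fix_by_id(rows):
    """Load rows into an in-memory table1 and return {id: is_fix}."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            'CREATE TABLE table1 (id INTEGER, id_site INTEGER, "timestamp" TEXT, '
            "version INTEGER, version_created_by TEXT)"
        )
        con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
        return {row_id: bool(flag) for row_id, flag in con.execute(IS_FIX_QUERY)}
    finally:
        con.close()


if __name__ == "__main__":
    print(is_fix_by_id(SAMPLE_ROWS))
```

Calling is_fix_by_id(SAMPLE_ROWS) should reproduce the is_fix column of the expected result. To check that every version after the first is flagged as a fix, you can add a hypothetical site with three versions whose first version was not created by 'system'.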


How to select an AlgoKey when the [UserSelect] column has a row with value 1, otherwise switch to the [SystemSelect] column having a row with value 1

I have a table with Scenario, Product, AlgoKey, UserSelect and SystemSelect columns. I have to select the AlgoKey for each scenario: the first priority goes to the user-selected row, otherwise the system-selected one.
I have shared my input and output below. Could you please help me write a query for this?
Scenario Product AlgoKey UserSelect SystemSelect
1 P101 1 0 1
1 P102 2 1 0
2 P101 1 0 1
2 P102 2 0 0
3 P101 1 1 1
3 P102 2 0 0
4 P101 1 0 0
4 P102 2 0 1
Output:
Scenario AlgoKey Columnselected
1 2 User
2 1 System
3 1 User
4 2 System
Here is how you can do it:
select scenario, AlgoKey, case when Userselect = 1 then 'User' else 'System' end Columnselected
from (
select *, ROW_NUMBER() over (partition by scenario order by userselect desc, systemselect desc) rn
from tableName
) t
where rn = 1
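You can try the query with Python's sqlite3 module, as sketched below. This assumes SQLite 3.25+ as a stand-in for the original database, which the question does not name. The sketch partitions by scenario only. If you partitioned by scenario and product instead, each product would get its own row number 1, and every scenario would come back once per product. Also note that a scenario where neither flag is 1 would still be labelled 'System'.

```python
import sqlite3

# (Scenario, Product, AlgoKey, UserSelect, SystemSelect) rows from the question.
ROWS = [
    (1, "P101", 1, 0, 1), (1, "P102", 2, 1, 0),
    (2, "P101", 1, 0, 1), (2, "P102", 2, 0, 0),
    (3, "P101", 1, 1, 1), (3, "P102", 2, 0, 0),
    (4, "P101", 1, 0, 0), (4, "P102", 2, 0, 1),
]

# One row per scenario: the user-selected product wins, otherwise the system-selected one.
PICK_QUERY = """
SELECT scenario, algokey,
       CASE WHEN userselect = 1 THEN 'User' ELSE 'System' END AS columnselected
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY scenario
                            ORDER BY userselect DESC, systemselect DESC) AS rn
  FROM tablename
) t
WHERE rn = 1
ORDER BY scenario
"""


def pick_algokeys(rows):
    """Return [(scenario, algokey, columnselected), ...] for the given rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE tablename (scenario INTEGER, product TEXT, algokey INTEGER, "
            "userselect INTEGER, systemselect INTEGER)"
        )
        con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(PICK_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in pick_algokeys(ROWS):
        print(row)
```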

Retrieving last record in each group from database with order by

There is a table ticket that contains data as shown below:
Id Impact group create_date
------------------------------------------
1 3 ABC 2020-07-28 00:42:00.0
1 2 ABC 2020-07-28 00:45:00.0
1 3 ABC 2020-07-28 00:48:00.0
1 3 ABC 2020-07-28 00:52:00.0
1 3 XYZ 2020-07-28 00:55:00.0
1 3 XYZ 2020-07-28 00:59:00.0
Expected result:
Id Impact group create_date
------------------------------------------
1 3 ABC 2020-07-28 00:42:00.0
1 2 ABC 2020-07-28 00:45:00.0
1 3 ABC 2020-07-28 00:52:00.0
1 3 XYZ 2020-07-28 00:59:00.0
At present, this is the query that I use:
WITH final AS (
SELECT p.*,
ROW_NUMBER() OVER(PARTITION BY p.id,p.group,p.impact
ORDER BY p.create_date desc, p.impact) AS rk
FROM ticket p
)
SELECT f.*
FROM final f
WHERE f.rk = 1
The result I am getting is:
Id Impact group create_date
-----------------------------------------
1 2 ABC 2020-07-28 00:45:00.0
1 3 ABC 2020-07-28 00:52:00.0
1 3 XYZ 2020-07-28 00:59:00.0
It seems that PARTITION BY takes precedence over the ORDER BY values. Is there another way to achieve the expected result? I am running these queries on Amazon Redshift.
You could use LEAD() to check if the Impact changes between rows, taking only the rows where the value will change.
WITH
look_forward AS
(
SELECT
*,
LEAD(impact) OVER (PARTITION BY id, "group" ORDER BY create_date) AS lead_impact
FROM
ticket
)
SELECT
*
FROM
look_forward
WHERE
lead_impact IS NULL
OR lead_impact <> impact
You seem to want rows where id/impact/group changes relative to the next row. A simple way is to compare the next create_date overall with the next create_date within the same id/impact/group. If these are the same, the row is not the last of its run, so filter it out:
select t.*
from (select t.*,
lead(create_date) over (order by create_date) as next_create_date,
lead(create_date) over (partition by id, impact, "group" order by create_date) as next_create_date_img
from ticket t
) t
where next_create_date_img is null or next_create_date_img <> next_create_date;
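You can compare both answers on the question's sample with a small Python harness. This assumes SQLite 3.25+, through the built-in sqlite3 module, in place of Redshift. The column group is a reserved word, so the sketch double-quotes it. The constant and function names are illustrative.

```python
import sqlite3

# (id, impact, group, create_date) rows from the question.
TICKETS = [
    (1, 3, "ABC", "2020-07-28 00:42:00.0"),
    (1, 2, "ABC", "2020-07-28 00:45:00.0"),
    (1, 3, "ABC", "2020-07-28 00:48:00.0"),
    (1, 3, "ABC", "2020-07-28 00:52:00.0"),
    (1, 3, "XYZ", "2020-07-28 00:55:00.0"),
    (1, 3, "XYZ", "2020-07-28 00:59:00.0"),
]

# First answer: keep a row when the next impact in the same id/group differs or is missing.
LEAD_IMPACT_QUERY = """
WITH look_forward AS (
  SELECT *,
         LEAD(impact) OVER (PARTITION BY id, "group" ORDER BY create_date) AS lead_impact
  FROM ticket
)
SELECT id, impact, "group", create_date
FROM look_forward
WHERE lead_impact IS NULL OR lead_impact <> impact
ORDER BY create_date
"""

# Second answer: keep a row when the next row overall is not the next row of its id/impact/group.
NEXT_DATE_QUERY = """
SELECT id, impact, "group", create_date
FROM (SELECT t.*,
             LEAD(create_date) OVER (ORDER BY create_date) AS next_create_date,
             LEAD(create_date) OVER (PARTITION BY id, impact, "group"
                                     ORDER BY create_date) AS next_create_date_img
      FROM ticket t) t
WHERE next_create_date_img IS NULL OR next_create_date_img <> next_create_date
ORDER BY create_date
"""


def run(query, rows):
    """Load rows into an in-memory ticket table and return the query result."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            'CREATE TABLE ticket (id INTEGER, impact INTEGER, "group" TEXT, create_date TEXT)'
        )
        con.executemany("INSERT INTO ticket VALUES (?, ?, ?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for query in (LEAD_IMPACT_QUERY, NEXT_DATE_QUERY):
        print(run(query, TICKETS))
```

On this sample, both queries should return the four rows listed under "Expected result" in the question.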

Get most frequent value from a windowing function

I have a SQL table that looks like:
user_id role date
1 1 2019-11-26 21:20:54.397+00
1 2 2019-11-27 22:46:28.923+00
2 1 2019-12-06 22:17:53.925+00
2 3 2019-12-13 00:12:28.006+00
3 1 2019-11-25 21:57:17.701+00
3 1 2019-12-06 20:48:28.314+00
3 1 2019-12-15 23:59:06.81+00
4 3 2019-12-04 15:26:10.639+00
4 3 2019-11-22 19:20:01.025+00
4 3 2019-11-25 12:38:53.169+00
For each row, I would like to get the user's most frequent role over their earlier dates. The result should look like:
user_id role date most_frequent_role
1 1 2019-11-26 21:20:54.397+00 NULL
1 2 2019-11-27 22:46:28.923+00 1
2 1 2019-12-06 22:17:53.925+00 NULL
2 3 2019-12-13 00:12:28.006+00 1
3 1 2019-11-25 21:57:17.701+00 NULL
3 1 2019-12-06 20:48:28.314+00 1
3 1 2019-12-15 23:59:06.81+00 1
4 3 2019-12-04 15:26:10.639+00 NULL
4 3 2019-11-22 19:20:01.025+00 3
4 3 2019-11-25 12:38:53.169+00 3
The following query works for the sample data:
select test.user_id,test.role,test.role_date,
case when test.role_date =
(select min(t2.role_date) from test t2 where t2.user_id = test.user_id) then NULL
else t.role end as MOST_FREQUENT_ROLE
from
(select user_id,min(role) as role from test group by user_id
)t
join test on t.user_id=test.user_id
order by user_id,role_date
Output
USER_ID ROLE ROLE_DATE MOST_FREQUENT_ROLE
1 1 26-NOV-19 -
1 2 27-NOV-19 1
2 1 06-DEC-19 -
2 3 13-DEC-19 1
3 1 25-NOV-19 -
3 1 06-DEC-19 1
3 1 15-DEC-19 1
4 3 22-NOV-19 -
4 3 25-NOV-19 3
4 3 04-DEC-19 3
If you strictly want to use window functions, try the following:
SELECT user_id
,role
,date
,CASE WHEN date = MIN(date) OVER(PARTITION BY user_id ORDER BY date)
THEN NULL
ELSE MIN(role) OVER(PARTITION BY user_id) END MOST_FREQUENT_ROLE
FROM YOUR_TABLE;
Technically, what you are trying to calculate is the mode (this is a statistical term).
Postgres has a built-in mode() aggregate function. Alas, it cannot be used as a window function, so it provides little help here.
I would recommend using a lateral join:
select t.*, m.role as most_frequent_role
from t left join lateral
(select t2.role
from t t2
where t2.user_id = t.user_id and
t2.date < t.date
group by t2.role
order by count(*) desc,
max(date) desc -- in the event of ties, use the most recent
limit 1
) m
on 1=1
order by user_id, date;
This will not be particularly efficient, but an index on (user_id, date, role) should help.
If you have just a handful of roles there are probably more efficient solutions. If that is the case and performance is an issue, ask a new question.
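SQLite has no LATERAL join, so this sketch swaps in a correlated scalar subquery that computes the same running mode. For each row, it finds the user's most frequent role on strictly earlier dates, breaking ties by the most recent occurrence. The sketch runs on Python's built-in sqlite3 module. The table name t follows the answer; the aliases cur and prev and the helper names are hypothetical.

```python
import sqlite3

# (user_id, role, date) rows from the question.
ROLE_ROWS = [
    (1, 1, "2019-11-26 21:20:54.397+00"),
    (1, 2, "2019-11-27 22:46:28.923+00"),
    (2, 1, "2019-12-06 22:17:53.925+00"),
    (2, 3, "2019-12-13 00:12:28.006+00"),
    (3, 1, "2019-11-25 21:57:17.701+00"),
    (3, 1, "2019-12-06 20:48:28.314+00"),
    (3, 1, "2019-12-15 23:59:06.81+00"),
    (4, 3, "2019-12-04 15:26:10.639+00"),
    (4, 3, "2019-11-22 19:20:01.025+00"),
    (4, 3, "2019-11-25 12:38:53.169+00"),
]

# Correlated subquery in place of LEFT JOIN LATERAL: count earlier roles per user,
# highest count first, most recent occurrence breaking ties; NULL when there is no history.
RUNNING_MODE_QUERY = """
SELECT cur.user_id, cur.role, cur."date",
       (SELECT prev.role
        FROM t AS prev
        WHERE prev.user_id = cur.user_id AND prev."date" < cur."date"
        GROUP BY prev.role
        ORDER BY COUNT(*) DESC, MAX(prev."date") DESC
        LIMIT 1) AS most_frequent_role
FROM t AS cur
ORDER BY cur.user_id, cur."date"
"""


def running_modes(rows):
    """Return the most_frequent_role column in user_id/date order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE t (user_id INTEGER, role INTEGER, "date" TEXT)')
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return [row[3] for row in con.execute(RUNNING_MODE_QUERY)]
    finally:
        con.close()


if __name__ == "__main__":
    print(running_modes(ROLE_ROWS))
```

For the question's sample, each user's earliest row gets None (SQL NULL), and every later row gets the role that user had most often before it.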

Add position column based on order by - Oracle

I have this table with the following records:
table1
id ele_id_1 ele_val ele_id_2
1 2 123 1
1 1 abc 1
1 4 xyz 2
1 4 456 1
2 5 22 1
2 4 344 1
2 3 6 1
2 2 Test Name 1
2 1 Hello 1
I am trying to add a position for each id, ordering by ele_id_1 and then ele_id_2 in ascending order.
Here is the output:
id ele_id_1 ele_val ele_id_2 position
1 2 123 1 2
1 1 abc 1 1
1 4 xyz 2 4
1 4 456 1 3
2 5 22 1 5
2 4 344 1 4
2 3 6 1 3
2 2 Test Name 1 2
2 1 Hello 1 1
I have 34 million rows in table1, so I would like an efficient way of doing this.
Any idea on how I can add position with values?
I think you want row_number() used like this:
select t.*,
row_number() over (partition by id
order by ele_id_1, ele_id_2
) as position
from table1 t
Oracle can use an index for this, on (id, ele_id_1, ele_id_2).
I should note that for your example data order by ele_id_1, ele_id_2 and order by ele_id_2, ele_id_1 produce the same result. Your question suggests that you want the first.
For example, if the row with ele_val 123 instead had ele_id_1 = 1 and ele_id_2 = 2, you would get:
id ele_id_1 ele_val ele_id_2 position
1 1 123 2 2
1 1 abc 1 1
1 4 xyz 2 4
1 4 456 1 3
Rather than:
id ele_id_1 ele_val ele_id_2 position
1 1 123 2 3
1 1 abc 1 1
1 4 xyz 2 4
1 4 456 1 2
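You can check the difference between the two orderings with a short Python sketch over SQLite. This assumes SQLite 3.25+ via the built-in sqlite3 module, not Oracle. VARIANT_ROWS copies the rows from the comparison tables in this answer, where the 123 row has ele_id_1 = 1 and ele_id_2 = 2. On the question's own data, both orderings agree.

```python
import sqlite3

# (id, ele_id_1, ele_val, ele_id_2) rows from the question.
QUESTION_ROWS = [
    (1, 2, "123", 1), (1, 1, "abc", 1), (1, 4, "xyz", 2), (1, 4, "456", 1),
    (2, 5, "22", 1), (2, 4, "344", 1), (2, 3, "6", 1),
    (2, 2, "Test Name", 1), (2, 1, "Hello", 1),
]

# Rows from the answer's comparison tables: the "123" row has ele_id_1 = 1, ele_id_2 = 2.
VARIANT_ROWS = [(1, 1, "123", 2), (1, 1, "abc", 1), (1, 4, "xyz", 2), (1, 4, "456", 1)]

# Only these fixed strings are ever placed into the SQL text.
ORDERINGS = {
    "ele_id_1 first": "ele_id_1, ele_id_2",
    "ele_id_2 first": "ele_id_2, ele_id_1",
}


def positions(rows, ordering):
    """Return {(id, ele_val): position} using ROW_NUMBER() with the chosen sort keys."""
    order_by = ORDERINGS[ordering]
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE table1 (id INTEGER, ele_id_1 INTEGER, ele_val TEXT, ele_id_2 INTEGER)"
        )
        con.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", rows)
        query = (
            "SELECT id, ele_val, "
            f"ROW_NUMBER() OVER (PARTITION BY id ORDER BY {order_by}) AS position "
            "FROM table1"
        )
        return {(row_id, val): pos for row_id, val, pos in con.execute(query)}
    finally:
        con.close()


if __name__ == "__main__":
    for name in ORDERINGS:
        print(name, positions(VARIANT_ROWS, name))
```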
EDIT:
If you want to update the data, then merge is probably the best approach.
MERGE INTO <yourtable> dest
USING (select t.*,
row_number() over (partition by id
order by ele_id_1, ele_id_2
) as new_position
from <yourtable> t
) src
ON dest.id = src.id AND
dest.ele_id_1 = src.ele_id_1 AND
dest.ele_id_2 = src.ele_id_2
WHEN MATCHED THEN UPDATE
SET dest.position = src.new_position;
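SQLite has no MERGE, so the sketch below expresses the same matching as a correlated UPDATE, using Python's sqlite3 module and assuming SQLite 3.25+. Each row looks up its ROW_NUMBER() by (id, ele_id_1, ele_id_2). Like the MERGE, it assumes that this combination is unique per row. It illustrates the logic only and is not meant as a way to update 34 million rows.

```python
import sqlite3

# (id, ele_id_1, ele_val, ele_id_2) rows from the question; position starts out NULL.
ROWS = [
    (1, 2, "123", 1), (1, 1, "abc", 1), (1, 4, "xyz", 2), (1, 4, "456", 1),
    (2, 5, "22", 1), (2, 4, "344", 1), (2, 3, "6", 1),
    (2, 2, "Test Name", 1), (2, 1, "Hello", 1),
]

# Correlated UPDATE standing in for MERGE: match each target row to its ROW_NUMBER()
# on (id, ele_id_1, ele_id_2), the same keys the MERGE's ON clause uses.
UPDATE_POSITIONS = """
UPDATE table1
SET position = (
  SELECT src.new_position
  FROM (SELECT id, ele_id_1, ele_id_2,
               ROW_NUMBER() OVER (PARTITION BY id
                                  ORDER BY ele_id_1, ele_id_2) AS new_position
        FROM table1) AS src
  WHERE src.id = table1.id
    AND src.ele_id_1 = table1.ele_id_1
    AND src.ele_id_2 = table1.ele_id_2
)
"""


def assign_positions(rows):
    """Run the update on an in-memory table1 and return {(id, ele_val): position}."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE table1 (id INTEGER, ele_id_1 INTEGER, ele_val TEXT, "
            "ele_id_2 INTEGER, position INTEGER)"
        )
        con.executemany(
            "INSERT INTO table1 (id, ele_id_1, ele_val, ele_id_2) VALUES (?, ?, ?, ?)", rows
        )
        con.execute(UPDATE_POSITIONS)
        cur = con.execute("SELECT id, ele_val, position FROM table1")
        return {(row_id, val): pos for row_id, val, pos in cur}
    finally:
        con.close()


if __name__ == "__main__":
    print(assign_positions(ROWS))
```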
Note that updating all the rows in a table is an expensive operation. Truncating the table and recreating it might be easier:
create table temp_t as
select t.*,
row_number() over (partition by id
order by ele_id_1, ele_id_2
) as new_position
from t;
truncate table t;
insert into t ( . . . )
select . . . -- all columns but position
from temp_t;
However, be very careful if you truncate the table. Be sure to back it up first!

SQL Server: SELECT value with multiple criteria

Looking for a SQL solution to the following problem
Return USER and NUMBER combination WHERE PRIORITY = MIN(PRIORITY) [NULL is equivalent to MAX(PRIORITY + 1)] ... in the case of ties in PRIORITY, break using lowest LINEITEM
FIELDS: USER, LINEITEM, NUMBER, PRIORITY
VALUES: ('X' signifies desired combination)
USER LINEITEM NUMBER PRIORITY
-------------------------------------
1 1 12345 NULL
1 2 23456 2
1 3 34567 1 X
2 1 9876 3
2 2 98765 1 X
2 3 12345 2
2 4 23456 1
3 1 23456 NULL X
3 2 12345 NULL
4 1 34567 NULL
4 2 45678 NULL
4 3 12345 1 X
4 4 12345 2
4 5 23456 1
Thanks in advance.
In response to PM 77-1,
My current method:
SELECT table1.user,table1.number
FROM table1
JOIN (
SELECT user,
CAST(MIN((COALESCE(priority,999) *
(10 ^ (5 - LEN(COALESCE(CAST(priority AS VARCHAR),'999'))))) +
lineitem) AS VARCHAR) AS selector
FROM table1 GROUP BY user
) AS table2
ON table1.user = table2.user
AND table1.lineitem = CAST(RIGHT(table2.selector, 1) AS int)
ORDER BY table1.user;
Use ROW_NUMBER:
;WITH Cte AS(
SELECT *,
ROW_NUMBER() OVER(
PARTITION BY [User]
ORDER BY
CASE WHEN Priority IS NULL THEN 1 ELSE 0 END,
Priority,
LineItem
) AS rn
FROM tbl
)
SELECT
[User], LineItem, Number, Priority
FROM Cte
WHERE rn = 1
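You can try the same ROW_NUMBER() query on the sample with Python's sqlite3 module. This assumes SQLite 3.25+ as a stand-in for SQL Server. Like SQL Server, SQLite sorts NULLs first in ascending order, so the CASE that pushes NULL priorities to the end is needed there too. The user column is double-quoted to be safe, and the helper names are illustrative.

```python
import sqlite3

# (user, lineitem, number, priority) rows from the question; None is NULL.
LINE_ITEMS = [
    (1, 1, 12345, None), (1, 2, 23456, 2), (1, 3, 34567, 1),
    (2, 1, 9876, 3), (2, 2, 98765, 1), (2, 3, 12345, 2), (2, 4, 23456, 1),
    (3, 1, 23456, None), (3, 2, 12345, None),
    (4, 1, 34567, None), (4, 2, 45678, None), (4, 3, 12345, 1),
    (4, 4, 12345, 2), (4, 5, 23456, 1),
]

# Same ordering as the Cte: non-NULL priorities first, then lowest priority, then lowest lineitem.
PICK_QUERY = """
WITH cte AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY "user"
           ORDER BY CASE WHEN priority IS NULL THEN 1 ELSE 0 END, priority, lineitem
         ) AS rn
  FROM tbl
)
SELECT "user", lineitem, number, priority
FROM cte
WHERE rn = 1
ORDER BY "user"
"""


def pick_rows(rows):
    """Return the chosen (user, lineitem, number, priority) row for each user."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            'CREATE TABLE tbl ("user" INTEGER, lineitem INTEGER, number INTEGER, priority INTEGER)'
        )
        con.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", rows)
        return con.execute(PICK_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in pick_rows(LINE_ITEMS):
        print(row)
```

The rows returned should be the ones marked with X in the question.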