sql how to assign the same ID for the same group

I have a dataset like this:
ID  SESSION  DATE
1   A        2021/1/1
1   A        2021/1/2
1   B        2021/1/3
1   B        2021/1/4
1   A        2021/1/5
1   A        2021/1/6
I want to create a GROUP column that assigns the same number to each consecutive run of rows where the ID and SESSION columns are the same, as below:
ID  SESSION  DATE      GROUP
1   A        2021/1/1  1
1   A        2021/1/2  1
1   B        2021/1/3  2
1   B        2021/1/4  2
1   A        2021/1/5  3
1   A        2021/1/6  3
Does anyone know how to do this efficiently in SQL? I have about 5 billion rows. Thank you in advance!

This is a kind of gaps-and-islands problem. You can create the groupings by using lag to detect where the session changes and keeping a running count of those changes. The first row of each Id has no previous session, so the comparison is not true and that row starts group 1:
select Id, Session, Date,
       Sum(case when session = prevSession then 0 else 1 end) over(partition by Id order by date) "Group"
from (
  select *,
         Lag(Session) over(partition by Id order by date) prevSession
  from t
) t;
The example fiddle uses MySQL, but this is ANSI SQL that should work in most DBMSs.
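To try the lag-and-running-sum query on the question's sample rows, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or later), stores the dates as ISO strings so that they sort correctly, and reuses the table name t from the answer. It says nothing about performance at 5 billion rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (Id integer, Session text, Date text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "A", "2021-01-01"),
        (1, "A", "2021-01-02"),
        (1, "B", "2021-01-03"),
        (1, "B", "2021-01-04"),
        (1, "A", "2021-01-05"),
        (1, "A", "2021-01-06"),
    ],
)

# Inner query: previous session per Id. Outer query: running count of changes.
query = """
select Id, Session, Date,
       sum(case when Session = prevSession then 0 else 1 end)
           over (partition by Id order by Date) as "Group"
from (
    select *, lag(Session) over (partition by Id order by Date) as prevSession
    from t
) s
order by Id, Date
"""
rows = conn.execute(query).fetchall()
groups = [row[3] for row in rows]
for row in rows:
    print(row)
```

The last column printed for the six rows is 1, 1, 2, 2, 3, 3, matching the GROUP column the question asks for.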

Related

SQL Query getting the latest record of the Group and calculate the value of those particular records

I have the following table (just a sample) and would like to subtract the Points of Record 1 from the Points of Record 2 (Record 2 - Record 1), using the latest record of each. Records are entered per match: one match consists of two records, Record 1 and Record 2.
The output should be 3, because the newest records are IDs 3 and 4, from match 2.
ID  Name      Points  TimeRecorded  Match
1   Record 1  3       2-Mar 2pm     1
2   Record 2  5       2-Mar 2pm     1
3   Record 1  5       4-Mar 5pm     2
4   Record 2  8       4-Mar 5pm     2
I tried to get the value by subtracting two subqueries, as below. But I feel this is not a good approach, because the match and the record names are hard-coded. How can I construct a better query that gets the latest records of the grouped match and calculates the points by subtracting Record 1 from Record 2?
SELECT
  (select Points from RunRecord where Name= 'Record2' AND Match = 2)
  - (select Points from RunRecord where Name= 'Record1' AND Match = 2)

You could use:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeRecorded DESC) rn
    FROM yourTable
)
SELECT
    MAX(CASE WHEN Name = 'Record 2' THEN Points END) -
    MAX(CASE WHEN Name = 'Record 1' THEN Points END) AS diff
FROM cte
WHERE rn = 1;
The CTE assigns a row number for each group of records of the same name, with 1 being assigned to the most recent record. Then, we aggregate over the entire table and pivot out the points to find the difference.
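As a quick check of the ROW_NUMBER-plus-pivot idea, the sketch below runs the same query against the question's four sample rows using Python's sqlite3 module. It assumes SQLite 3.25 or later for window functions. The ISO timestamps (including the year) are made up so that TimeRecorded sorts correctly, and the Match column is quoted only because MATCH is also a SQLite keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'create table yourTable (ID integer, Name text, Points integer, '
    'TimeRecorded text, "Match" integer)'
)
conn.executemany(
    "insert into yourTable values (?, ?, ?, ?, ?)",
    [
        (1, "Record 1", 3, "2021-03-02 14:00", 1),  # hypothetical year
        (2, "Record 2", 5, "2021-03-02 14:00", 1),
        (3, "Record 1", 5, "2021-03-04 17:00", 2),
        (4, "Record 2", 8, "2021-03-04 17:00", 2),
    ],
)

# rn = 1 marks the most recent row for each Name; the outer query pivots them.
query = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeRecorded DESC) rn
    FROM yourTable
)
SELECT MAX(CASE WHEN Name = 'Record 2' THEN Points END) -
       MAX(CASE WHEN Name = 'Record 1' THEN Points END) AS diff
FROM cte
WHERE rn = 1
"""
diff = conn.execute(query).fetchone()[0]
print(diff)
```

With this data the latest Record 2 has 8 points and the latest Record 1 has 5, so the query returns 3.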

You can use the rank() window function to rank the records by match descending. Then take the top of the ranked records and use conditional aggregation to control the sign of the points added.
SELECT sum(CASE x.name
             WHEN 'Record2' THEN
               x.points
             WHEN 'Record1' THEN
               -x.points
           END)
       FROM (SELECT rr.name,
                    rr.points,
                    rank() OVER (ORDER BY rr.match DESC) r
                    FROM runrecord rr
                    WHERE name IN ('Record1',
                                   'Record2')) x
       WHERE x.r = 1;
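The rank-based variant can be tried the same way. This is only a sketch with Python's sqlite3 module (SQLite 3.25 or later assumed). It stores the names as 'Record1' and 'Record2', as this answer's query spells them, although the question's table shows them with a space, and it quotes the match column because MATCH is a SQLite keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table runrecord (id integer, name text, points integer, "match" integer)')
conn.executemany(
    "insert into runrecord values (?, ?, ?, ?)",
    [
        (1, "Record1", 3, 1),
        (2, "Record2", 5, 1),
        (3, "Record1", 5, 2),
        (4, "Record2", 8, 2),
    ],
)

# rank() gives 1 to every row of the highest match number; the signed sum
# then adds Record2's points and subtracts Record1's.
query = """
SELECT sum(CASE x.name
             WHEN 'Record2' THEN x.points
             WHEN 'Record1' THEN -x.points
           END)
FROM (SELECT rr.name, rr.points,
             rank() OVER (ORDER BY rr."match" DESC) r
      FROM runrecord rr
      WHERE rr.name IN ('Record1', 'Record2')) x
WHERE x.r = 1
"""
signed_sum = conn.execute(query).fetchone()[0]
print(signed_sum)
```

Unlike the ROW_NUMBER version, this one picks the latest match by its match number rather than by TimeRecorded, so the two agree only when higher match numbers are also more recent.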

Oracle SQL - select last 3 rows after a specific row

Below is my data:
My requirement is to get the first 3 consecutive approvals. From the above data, IDs 4, 5 and 6 are the rows that I need to select. IDs 1 and 2 are not eligible, because ID 3 is a rejection and so breaks the run of consecutive approvals. Basically, I am looking for the last rejection in the list and then the 3 consecutive approvals after it.
Also, if there are no rejections in the chain of actions, then the first 3 actions should be the result. For the data below:
my output should be IDs 11, 12 and 13.
And if there are fewer than 3 approvals, then the output should be the list of approvals. For the data below:
the output should be IDs 21 and 22.
Is there any way to achieve this with a SQL query only, i.e. without PL/SQL code?

Here is one method that uses window functions:
Find the first row where there are three approvals.
Find the minimum action_at among the rows with three approvals
Filter
Keep the three rows you want
This version uses fetch which is in Oracle 12+:
select t.*
from (select t.*,
             min(case when has_approval_3 = 3 then action_at end) over () as first_action_at
      from (select t.*,
                   sum(case when action = 'APPROVAL' then 1 else 0 end) over (order by action_at rows between current row and 2 following) as has_approval_3
            from t
           ) t
     ) t
where action = 'APPROVAL' and
      (action_at >= first_action_at or first_action_at is null)
order by action_at
fetch first 3 rows only;
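The question's data did not survive on this page, so the sketch below rebuilds small hypothetical tables from its description (IDs 1 and 2 approved, ID 3 rejected, IDs 4 to 6 approved, and so on) and runs the query above through Python's sqlite3 module. The table name actions, the 'REJECTION' value and the action_at dates are assumptions. SQLite has no FETCH FIRST, so LIMIT 3 stands in for it, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

QUERY = """
select id
from (select w.*,
             min(case when has_approval_3 = 3 then action_at end) over () as first_action_at
      from (select a.*,
                   sum(case when action = 'APPROVAL' then 1 else 0 end)
                       over (order by action_at
                             rows between current row and 2 following) as has_approval_3
            from actions a) w) f
where action = 'APPROVAL'
  and (action_at >= first_action_at or first_action_at is null)
order by action_at
limit 3
"""

def first_three_approvals(rows):
    """Load (id, action, action_at) rows into a fresh table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table actions (id integer, action text, action_at text)")
    conn.executemany("insert into actions values (?, ?, ?)", rows)
    return [r[0] for r in conn.execute(QUERY)]

# Hypothetical data following the three cases described in the question.
with_rejection = [
    (1, "APPROVAL", "2021-01-01"), (2, "APPROVAL", "2021-01-02"),
    (3, "REJECTION", "2021-01-03"), (4, "APPROVAL", "2021-01-04"),
    (5, "APPROVAL", "2021-01-05"), (6, "APPROVAL", "2021-01-06"),
]
no_rejection = [(i, "APPROVAL", f"2021-02-{i:02d}") for i in range(11, 15)]
few_approvals = [(21, "APPROVAL", "2021-03-01"), (22, "APPROVAL", "2021-03-02")]

result_with_rejection = first_three_approvals(with_rejection)
result_no_rejection = first_three_approvals(no_rejection)
result_few = first_three_approvals(few_approvals)
print(result_with_rejection, result_no_rejection, result_few)
```

On these rows the three calls return IDs 4, 5, 6, then 11, 12, 13, then 21, 22. Other layouts of approvals and rejections are not covered by this check.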

You can use a scalar subquery and the ROW_NUMBER analytic function as follows:
SELECT * FROM
( SELECT
    Y.*,
    ROW_NUMBER() OVER(ORDER BY Y.ACTION_AT) AS RN
  FROM YOUR_TABLE Y
  WHERE Y.ACTION = 'APPROVE'
    AND Y.ACTION_AT >= COALESCE(
          (SELECT MAX(YIN.ACTION_AT)
           FROM YOUR_TABLE YIN
           WHERE YIN.ACTION = 'REJECT'
          ), Y.ACTION_AT) )
WHERE RN <= 3;
Cheers!!

How can I create this conditional grouped field on SQL Server 2008?

Sorry for this question, but I cannot work out this simple query.
I have this table:
ID_Type Item
-----------------
A 1
P 2
P 3
A 4
P 5
A 6
I need to calculate an incremental "group" counter based on the ID_Type field: the counter increases whenever this field has the value "A". This is the expected result:
ID_Type Item Counter
-----------------------------
A 1 1
P 2 1
P 3 1
A 4 2
P 5 2
A 6 3
So every time a record with ID_Type='A' appears, I need to increment the counter. Any help will be appreciated.

In SQL Server 2012+, you can use a cumulative sum:
select t.*,
       sum(case when id_type = 'A' then 1 else 0 end) over (order by item) as counter
from t;
This will be much more efficient than a correlated subquery approach, particularly on larger data sets.
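The cumulative sum is easy to check on the question's six rows. The sketch below uses Python's sqlite3 module instead of SQL Server, assuming SQLite 3.25 or later for window functions, and reuses the table name t from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id_type text, item integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [("A", 1), ("P", 2), ("P", 3), ("A", 4), ("P", 5), ("A", 6)],
)

# Running total of 'A' rows, in item order.
query = """
select id_type, item,
       sum(case when id_type = 'A' then 1 else 0 end) over (order by item) as counter
from t
order by item
"""
counters = [row[2] for row in conn.execute(query)]
print(counters)
```

This prints [1, 1, 1, 2, 2, 3], the Counter column from the expected result.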

One way is a subquery:
SELECT ID_Type, Item, (
    SELECT COUNT(*) FROM MyTable t2
    WHERE t2.Item <= t1.Item
    AND t2.ID_Type='A'
) AS Counter
FROM MyTable t1
ORDER BY Item ASC
This will work on any version of SQL Server.
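For comparison, here is the correlated-subquery form run on the same six rows, again as a sketch with Python's sqlite3 module rather than SQL Server. It needs no window-function support, which is the point of this variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table MyTable (ID_Type text, Item integer)")
conn.executemany(
    "insert into MyTable values (?, ?)",
    [("A", 1), ("P", 2), ("P", 3), ("A", 4), ("P", 5), ("A", 6)],
)

# For each row, count the 'A' rows at or before it.
query = """
SELECT ID_Type, Item, (
    SELECT COUNT(*) FROM MyTable t2
    WHERE t2.Item <= t1.Item
      AND t2.ID_Type = 'A'
) AS Counter
FROM MyTable t1
ORDER BY Item ASC
"""
subquery_counters = [row[2] for row in conn.execute(query)]
print(subquery_counters)
```

It produces the same counters, 1, 1, 1, 2, 2, 3, but the inner query is evaluated once per outer row.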

How to identify subsequent user actions based on prior visits

I want to identify the users who visited section a and then subsequently visited section b, given the following data structure. The table contains 300,000 rows and grows by approx. 8,000 rows daily:
USERID  VISITID  SECTION  Conversion (desired)
1       1        a        0
1       2        a        0
2       1        b        0
2       1        b        0
2       1        b        0
1       3        b        1
Ideally I want a new column that flags the visit to section b. For example, on the third visit, User 1 visited section b for the first time. I was attempting to do this with a CASE WHEN statement, but after many failed attempts I am not sure it is even possible that way and feel that I should take a different approach. I am just not sure what that approach should be. I also have a date column at my disposal.
Any suggestions on a new way to approach the problem would be appreciated. Thanks!

Correlated sub-queries should be avoided at all costs when working with Redshift. Keep in mind that Redshift has no indexes, so you'd have to rescan and restitch the column data back together for each value in the parent, resulting in an O(n^2) operation (in this particular case, going from 300 thousand values scanned to 90 billion).
The best approach when you are looking to span a series of rows is to use an analytic function. There are a couple of options depending on how your data is structured, but in the simplest case you could use something like:
select case
         when section != lag(section) over (partition by userid order by visitid)
         then 1
         else 0
       end
from ...
This assumes that your data for userid 2 increments the visitid as below. If not, you could also order by your timestamp column.
USERID  VISITID  SECTION  Conversion (desired)
1       1        a        0
1       2        a        0
2       1        b        0
2       *2*      b        0
2       *3*      b        0
1       3        b        1
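To see what the lag comparison yields on this adjusted data, here is a small sketch using Python's sqlite3 module in place of Redshift (SQLite 3.25 or later assumed). The table name tbl and the conversion alias are illustrative choices, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (userid integer, visitid integer, section text)")
conn.executemany(
    "insert into tbl values (?, ?, ?)",
    [(1, 1, "a"), (1, 2, "a"), (2, 1, "b"), (2, 2, "b"), (2, 3, "b"), (1, 3, "b")],
)

# A row is flagged when its section differs from the user's previous visit.
# On each user's first visit lag() is NULL, so the comparison falls to ELSE 0.
query = """
select userid, visitid,
       case
         when section != lag(section) over (partition by userid order by visitid)
         then 1
         else 0
       end as conversion
from tbl
order by userid, visitid
"""
flags = conn.execute(query).fetchall()
print(flags)
```

Only user 1's third visit is flagged. Note that this expression flags any change of section, so a move from b back to a would be flagged as well; it matches the desired column here because the sample contains only a-to-b changes.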

select t.*, case when v.ts is null then 0 else 1 end as conversion
  from tbl t
  left join (select *
               from tbl x
              where section = 'b'
                and exists (select 1
                              from tbl y
                             where y.userid = x.userid
                               and y.section = 'a'
                               and y.ts < x.ts)) v
    on t.userid = v.userid
   and t.visitid = v.visitid
   and t.section = v.section
Fiddle:
http://sqlfiddle.com/#!15/5b954/5/0
I added sample timestamp data as that field is necessary to determine whether a comes before b or after b.
To incorporate analytic functions you could use:
(I've also made it so that only the first occurrence of B (after an A) will get flagged with the 1)
select t.*,
       case
         when v.first_b_after_a is not null
         then 1
         else 0
       end as conversion
  from tbl t
  left join (select userid, min(ts) as first_b_after_a
               from (select t.*,
                            sum( case when t.section = 'a' then 1 end)
                            over( partition by userid
                                      order by ts ) as a_sum
                       from tbl t) x
              where section = 'b'
                and a_sum is not null
              group by userid) v
    on t.userid = v.userid
   and t.ts = v.first_b_after_a
Fiddle: http://sqlfiddle.com/#!1/fa88f/2/0
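The fiddle's data is not reproduced here, so the following sketch rebuilds the question's six rows with made-up ts values and runs the analytic version through Python's sqlite3 module (SQLite 3.25 or later assumed). The timestamps are hypothetical; they are chosen only so that user 1's visit to b comes after the visits to a.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl (userid integer, visitid integer, section text, ts text)")
conn.executemany(
    "insert into tbl values (?, ?, ?, ?)",
    [
        (1, 1, "a", "2021-01-01"),
        (1, 2, "a", "2021-01-02"),
        (2, 1, "b", "2021-01-03"),
        (2, 1, "b", "2021-01-04"),
        (2, 1, "b", "2021-01-05"),
        (1, 3, "b", "2021-01-06"),
    ],
)

# a_sum is NULL until the user has visited 'a'; the earliest 'b' row with a
# non-NULL a_sum is that user's first b-after-a, and only it gets flagged.
query = """
select t.userid, t.visitid, t.section,
       case when v.first_b_after_a is not null then 1 else 0 end as conversion
from tbl t
left join (select userid, min(ts) as first_b_after_a
           from (select i.*,
                        sum(case when i.section = 'a' then 1 end)
                            over (partition by userid order by ts) as a_sum
                 from tbl i) x
           where section = 'b'
             and a_sum is not null
           group by userid) v
  on t.userid = v.userid
 and t.ts = v.first_b_after_a
order by t.ts
"""
conversions = [row[3] for row in conn.execute(query)]
print(conversions)
```

User 2 never visits a, so none of its rows are flagged; only user 1's visit to b, the last row by ts, gets a 1.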

SQL Select with Group By and Order By Date

I am using SQL Server 2008, and I am wondering if I can accomplish my query in one select statement, without a sub-query.
I want to set a variable to true if a field is true in all of the last 10 created records. If the field is false in any of them, the variable should be false, and if the total number of records is less than 10, the variable should be false too.
My problem is that, to get the latest 10 created records, I need to use ORDER BY descending and filter on the top 10, so my query would look like the following, which is not a valid query:
declare @MyVar bit
set @MyVar = 0
select top(10) @MyVar = 1 from MyTable
where SomeId = 1000 and SomeFlag = 1
group by SomeId
having count(SomeId) >= 10
order by CreatedDate
Please provide me with your suggestions.
Here is an example. Say we have the following table, and I want to check the latest 3 records for each id:
ID  Joined  CreatedDate
1   true    03/27/2013
1   false   03/26/2013
1   false   03/25/2013
1   true    03/24/2013
1   true    03/23/2013
2   true    03/22/2013
2   true    03/21/2013
2   true    03/20/2013
2   false   03/19/2013
3   true    03/18/2013
3   true    03/17/2013
For id="1", the result will be FALSE, as the latest 3 created records do not all have the value true in the JOINED field.
For id="2", the result will be TRUE, as the latest 3 created records all have true in the JOINED field.
For id="3", the result will be FALSE, as there must be at least 3 records to check.

(Answer given before OP specified 2008. The below only works on 2012)
This query gives (for each ID value) the number of rows in the last 10 for which flag is equal to 1. It should be simple enough (if required) to filter this to only rows for which the count is 10, and to restrict it to a single ID value.
Without better sample data, I'll leave it at that for now:
;with Vals as (
    select
        *,
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate DESC) as rn,
        SUM(CASE WHEN Flag = 1 THEN 1 ELSE 0 END)
            OVER (PARTITION BY ID
                  ORDER BY CreatedDate ASC
                  ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) as Cnt
    from
        T1
)
select * from Vals where rn = 1
(This does depend on the SQL Server 2012 version of the OVER clause - but you didn't specify which version)
Result:
ID Flag CreatedDate rn Cnt
----------- ----- ----------------------- -------------------- -----------
1 1 2012-01-12 00:00:00.000 1 10
2 1 2012-01-12 00:00:00.000 1 9
3 1 2012-01-12 00:00:00.000 1 6
(Only ID 1 meets your criteria)
Sample data:
create table T1 (ID int not null,Flag bit not null,CreatedDate datetime not null)
insert into T1 (ID,Flag,CreatedDate) values
(1,1,'20120101'),
(1,0,'20120102'),
(1,1,'20120103'),
(1,1,'20120104'),
(1,1,'20120105'),
(1,1,'20120106'),
(1,1,'20120107'),
(1,1,'20120108'),
(1,1,'20120109'),
(1,1,'20120110'),
(1,1,'20120111'),
(1,1,'20120112'),
(2,1,'20120101'),
(2,1,'20120102'),
(2,1,'20120103'),
(2,1,'20120104'),
(2,1,'20120105'),
(2,1,'20120106'),
(2,0,'20120107'),
(2,1,'20120108'),
(2,1,'20120109'),
(2,1,'20120110'),
(2,1,'20120111'),
(2,1,'20120112'),
(3,1,'20120107'),
(3,1,'20120108'),
(3,1,'20120109'),
(3,1,'20120110'),
(3,1,'20120111'),
(3,1,'20120112')
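The sliding-window count above can also be reproduced outside SQL Server. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later assumed), generates the same 30 sample rows in a loop instead of listing them, and stores Flag as an integer and CreatedDate as ISO text, since SQLite has no bit or datetime types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table T1 (ID integer not null, Flag integer not null, CreatedDate text not null)"
)

# Rebuild the sample data: IDs 1 and 2 have rows for 1-12 Jan 2012, ID 3 for
# 7-12 Jan; the only 0 flags are (ID 1, 2 Jan) and (ID 2, 7 Jan).
zero_flags = {(1, 2), (2, 7)}
rows = [
    (id_, 0 if (id_, day) in zero_flags else 1, f"2012-01-{day:02d}")
    for id_, first_day in ((1, 1), (2, 1), (3, 7))
    for day in range(first_day, 13)
]
conn.executemany("insert into T1 values (?, ?, ?)", rows)

# Cnt counts flagged rows among the current row and the 9 before it;
# rn = 1 keeps that count for the newest row of each ID.
query = """
with Vals as (
    select *,
           row_number() over (partition by ID order by CreatedDate desc) as rn,
           sum(case when Flag = 1 then 1 else 0 end)
               over (partition by ID
                     order by CreatedDate asc
                     rows between 9 preceding and current row) as Cnt
    from T1
)
select ID, Cnt from Vals where rn = 1 order by ID
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

The counts come out as 10, 9 and 6 for IDs 1, 2 and 3, the same Cnt values as in the result table, so filtering on Cnt = 10 keeps only ID 1.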

In SQL Server 2008, instead of a subquery you can use a CTE with the ROW_NUMBER() ranking function:
;WITH cte AS
 (
  SELECT ID, CAST(Joined AS int) AS Flag,
         ROW_NUMBER() OVER(PARTITION BY ID ORDER BY CreatedDate DESC) AS rn
  FROM dbo.test63 t
  )
  SELECT ID, CASE WHEN SUM(Flag) != 3 THEN 0 ELSE 1 END AS Flag
  FROM cte
  WHERE rn <= 3
  GROUP BY ID
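Here is the same CTE run against the question's 3-record example, as a sketch with Python's sqlite3 module (SQLite 3.25 or later assumed). The dbo schema prefix is dropped because SQLite has no such schema, Joined is stored as 0/1, and the dates are rewritten as ISO strings so that they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test63 (ID integer, Joined integer, CreatedDate text)")
conn.executemany(
    "insert into test63 values (?, ?, ?)",
    [
        (1, 1, "2013-03-27"), (1, 0, "2013-03-26"), (1, 0, "2013-03-25"),
        (1, 1, "2013-03-24"), (1, 1, "2013-03-23"),
        (2, 1, "2013-03-22"), (2, 1, "2013-03-21"), (2, 1, "2013-03-20"),
        (2, 0, "2013-03-19"),
        (3, 1, "2013-03-18"), (3, 1, "2013-03-17"),
    ],
)

# Keep the 3 newest rows per ID; the result is 1 only if all 3 exist and are set.
query = """
WITH cte AS (
    SELECT ID, CAST(Joined AS int) AS Flag,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate DESC) AS rn
    FROM test63
)
SELECT ID, CASE WHEN SUM(Flag) != 3 THEN 0 ELSE 1 END AS Flag
FROM cte
WHERE rn <= 3
GROUP BY ID
ORDER BY ID
"""
result = dict(conn.execute(query).fetchall())
print(result)
```

ID 2 is the only one that comes out as 1: ID 1 has two false rows among its latest three, and ID 3 has only two rows, so its sum cannot reach 3. For the 10-record case in the question, both occurrences of 3 would become 10.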
