Teradata/SQL, select all rows until a certain value is reached per partition - sql

I'd like to select all rows from a table until (and including) a certain value is reached per partition. In this case, that means all rows per id up to and including the last row where status has the value 'b'. Note: the timestamps are in ascending order per id.
id  name    status  status  timestamp
---------------------------------------------
1   Sta     open    a       10:50:09.000000
1   Danny   open    c       10:50:19.000000
1   Elle    closed  b       10:50:39.000000
2   anton   closed  a       16:00:09.000000
2   jill    done    b       16:00:19.000000
2   tom     open    b       16:05:09.000000
2   bill    open    c       16:07:09.000000
3   ann     done    b       08:00:13.000000
3   stef    done    b       08:12:13.000000
3   martin  open    b       08:25:13.000000
3   jeff    open    a       09:00:13.000000
3   luke    open    c       09:07:13.000000
3   karen   open    c       09:15:13.000000
3   lucy    open    a       10:00:13.000000
The output would look like this:
id  name    status  status  timestamp
---------------------------------------------
1   Sta     open    a       10:50:09.000000
1   Danny   open    c       10:50:19.000000
1   Elle    closed  b       10:50:39.000000
2   anton   closed  a       16:00:09.000000
2   jill    done    b       16:00:19.000000
2   tom     open    b       16:05:09.000000
3   ann     done    b       08:00:13.000000
3   stef    done    b       08:12:13.000000
3   martin  open    b       08:25:13.000000
I've tried to solve this using QUALIFY with RANK etc., but unfortunately with no success. I would appreciate it if somebody could help me!

Selecting all rows per id up to the last time status has the value 'b' is the same as dropping all rows before 'b' occurs for the first time once you reverse the sort order:
SELECT *
FROM tab
QUALIFY -- tag the last 'b'
   Count(CASE WHEN status = 'b' THEN 1 END)
   Over (PARTITION BY id
         ORDER BY timestamp DESC
         ROWS Unbounded Preceding) > 0
ORDER BY id, timestamp
;
This will not return ids where no 'b' exists.
If you want to return those, too, add another condition to QUALIFY:
   OR -- no 'b' found
   Count(CASE WHEN status = 'b' THEN 1 END)
   Over (PARTITION BY id) = 0
As both counts share the same partition, it's still a single STAT step in Explain.
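The reversed running count can be tried outside Teradata as well. The following is a minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in engine; SQLite has no QUALIFY, so the windowed count is computed in a derived table and filtered in the outer WHERE. The column names ts and b_at_or_after are made up for the illustration, and only ids 1 and 2 of the sample data are loaded.

```python
import sqlite3

# Sample rows for ids 1 and 2 (only the status column the filter uses is kept).
rows = [
    (1, "Sta", "a", "10:50:09"), (1, "Danny", "c", "10:50:19"),
    (1, "Elle", "b", "10:50:39"),
    (2, "anton", "a", "16:00:09"), (2, "jill", "b", "16:00:19"),
    (2, "tom", "b", "16:05:09"), (2, "bill", "c", "16:07:09"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (id INTEGER, name TEXT, status TEXT, ts TEXT)")
con.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", rows)

# Running count of 'b' rows, walking each id from the newest row backwards.
# A row is kept when at least one 'b' sits at or after it in time.
query = """
SELECT id, name, status, ts
FROM (
    SELECT tab.*,
           COUNT(CASE WHEN status = 'b' THEN 1 END)
               OVER (PARTITION BY id
                     ORDER BY ts DESC
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS b_at_or_after
    FROM tab
) AS x
WHERE b_at_or_after > 0
ORDER BY id, ts
"""
kept = [name for (_id, name, _status, _ts) in con.execute(query)]
print(kept)
```

Only bill is dropped here, because no 'b' occurs at or after his row within id 2.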

Related

Assign incremental id based on number series in ordered sql table

My table of interview candidates has three columns and looks like this (attempt is what I want to calculate):
candidate_id  interview_stage  stage_reached_at  attempt <- want to calculate
------------------------------------------------------------------------------
1             1                2019-01-01        1
1             2                2019-01-02        1
1             3                2019-01-03        1
1             1                2019-11-01        2
1             2                2019-11-02        2
1             1                2021-01-01        3
1             2                2021-01-02        3
1             3                2021-01-03        3
1             4                2021-01-04        3
The table represents candidate_id 1, who has had 3 separate interview attempts at a company:
Made it to interview_stage 3 on the 1st attempt
Made it to interview_stage 2 on the 2nd attempt
Made it to interview_stage 4 on the 3rd attempt
Question: Can I somehow use the number series if I order by stage_reached_at? As soon as the next step for a particular candidate_id is lower than the row before, I know it's a new process.
I want to be able to group on candidate_id and process_grouping at the end of the day.
Thx in advance.
You can use lag() and then a cumulative sum:
select t.*,
       1 + sum(case when prev_interview_stage >= interview_stage then 1 else 0 end)
               over (partition by candidate_id order by stage_reached_at) as attempt
from (select t.*,
             lag(interview_stage) over (partition by candidate_id order by stage_reached_at) as prev_interview_stage
      from t
     ) t;
The 1 + makes the numbering start at 1, as in your example; without it the first attempt would be numbered 0.
Note: Your question specifically says "lower", but this query also treats an equal stage as the start of a new attempt. If you really mean strictly lower, change the >= to >.
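To check the lag-plus-running-sum idea against the sample rows, here is a small sketch, assuming SQLite through Python's sqlite3 module as the engine. It adds 1 to the running sum so that the numbering starts at 1 like the attempt column in the question; the alias prev_stage is just an illustrative name.

```python
import sqlite3

data = [
    (1, 1, "2019-01-01"), (1, 2, "2019-01-02"), (1, 3, "2019-01-03"),
    (1, 1, "2019-11-01"), (1, 2, "2019-11-02"),
    (1, 1, "2021-01-01"), (1, 2, "2021-01-02"),
    (1, 3, "2021-01-03"), (1, 4, "2021-01-04"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE t (candidate_id INTEGER, interview_stage INTEGER, "
    "stage_reached_at TEXT)"
)
con.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

# Inner query: fetch the previous stage per candidate.
# Outer query: every row whose stage did not increase starts a new attempt;
# the running sum of those flags (plus 1) is the attempt number.
query = """
SELECT candidate_id, interview_stage, stage_reached_at,
       1 + SUM(CASE WHEN prev_stage >= interview_stage THEN 1 ELSE 0 END)
               OVER (PARTITION BY candidate_id
                     ORDER BY stage_reached_at) AS attempt
FROM (
    SELECT t.*,
           LAG(interview_stage) OVER (PARTITION BY candidate_id
                                      ORDER BY stage_reached_at) AS prev_stage
    FROM t
) AS s
ORDER BY candidate_id, stage_reached_at
"""
attempts = [row[3] for row in con.execute(query)]
print(attempts)
```

For the first row of each candidate prev_stage is NULL, so the comparison is not true and the CASE falls through to 0.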

How to count distinct a field cumulatively using recursive cte or other method in SQL?

Using the example below, Day 1 will have 1, 1, 3 distinct name(s) for A, B, C respectively.
When calculating distinct name(s) for each house on Day 2, data up to Day 2 is used.
When calculating distinct name(s) for each house on Day 3, data up to Day 3 is used.
Can recursive cte be used?
Data:
Day  House  Name
-------------------
1    A      Jack
1    B      Pop
1    C      Anna
1    C      Dew
1    C      Franco
2    A      Jon
2    B      May
2    C      Anna
3    A      Jon
3    B      Ken
3    C      Dew
3    C      Dew
Result:
Day  House  Distinct names
-----------------------------
1    A      1
1    B      1
1    C      3
2    A      2 (jack and jon)
2    B      2
2    C      3
3    A      2 (jack and jon)
3    B      3
3    C      3
Without knowing the need and size of data it'll be hard to give an ideal/optimal solution. Assuming a small dataset needing a quick and dirty way to calculate, just use sub query like this...
SELECT p.[Day]
     , p.House
     , (SELECT COUNT(DISTINCT([Name]))
        FROM #Bing
        WHERE [Day] <= p.[Day] AND House = p.House) DistinctNames
FROM #Bing p
GROUP BY [Day], House
ORDER BY 1
There is no need for a recursive CTE. Just mark the first time a name is seen in a house and use a cumulative sum:
select day, house,
       sum(sum(case when seqnum = 1 then 1 else 0 end)) over (partition by house order by day) as num_unique_names
from (select t.*,
             row_number() over (partition by house, name order by day) as seqnum
      from t
     ) t
group by day, house
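The "mark the first sighting, then take a running sum" idea can be checked on the sample data with a short sketch, assuming SQLite through Python's sqlite3 module. As a deliberate simplification, the nested sum(sum(...)) is split into two named steps (the CTE names marked and new_per_day are made up here), which computes the same result in a form that is easier to read.

```python
import sqlite3

data = [
    (1, "A", "Jack"), (1, "B", "Pop"), (1, "C", "Anna"), (1, "C", "Dew"),
    (1, "C", "Franco"), (2, "A", "Jon"), (2, "B", "May"), (2, "C", "Anna"),
    (3, "A", "Jon"), (3, "B", "Ken"), (3, "C", "Dew"), (3, "C", "Dew"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (day INTEGER, house TEXT, name TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", data)

query = """
WITH marked AS (
    -- seqnum = 1 marks the first time a name shows up in a house
    SELECT day, house,
           ROW_NUMBER() OVER (PARTITION BY house, name ORDER BY day) AS seqnum
    FROM t
),
new_per_day AS (
    -- number of first sightings per day and house
    SELECT day, house,
           SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END) AS new_names
    FROM marked
    GROUP BY day, house
)
-- running total of first sightings = distinct names seen so far
SELECT day, house,
       SUM(new_names) OVER (PARTITION BY house ORDER BY day) AS num_unique_names
FROM new_per_day
ORDER BY day, house
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

The output matches the Result table in the question, including house C staying at 3 on every day.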

SQL to pick the next value

I have a table of values. Each value may have 1 or more entries, but only 1 should be active at any one time. The table has an INT primary key, ID.
I need a method to make the 'current' value inactive and make the 'next' value the active value. If the current active value is the last entry, make the first entry active instead. Values with only 1 entry will always be active.
The sequence should work like below
Is anyone able to provide a way to achieve this?
You should not be showing runs in separate columns. Your data should put this information in separate rows. So your data should have a separate set of rows for each run:
id value run active
1 Apple 1 1
2 Apple 1 0
3 Apple 1 0
4 Banana 1 1
5 Banana 1 0
6 Cherry 1 1
1 Apple 2 0
2 Apple 2 1
3 Apple 2 0
4 Banana 2 0
5 Banana 2 1
6 Cherry 2 1
You can add the next run as:
with r as (
      select t.*, max(run) over () as max_run,
             row_number() over (partition by run, value order by id) as seqnum,
             lag(active) over (partition by run, value order by id) as prev_active,
             first_value(active) over (partition by run, value order by id desc) as last_active
      from runs t
     )
insert into runs (id, value, run, active)
    select id, value, max_run + 1,
           (case when prev_active = 1 then 1                 -- the previous entry was active
                 when seqnum = 1 and last_active = 1 then 1  -- wrap around from the last entry
                 else 0
            end) as active
    from r
    where run = max_run;
Simply check whether the currently active id is the max(id) for that value. If it is not, update the active row to inactive and then update the next id (id+1) to active.
If the active id is the max(id), update that row to inactive and update the min(id) to active instead.
Build the query, it'll be fun.
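The rotation rule (the entry after the active one becomes active, and the first entry takes over when the last one was active) can be sketched against the runs layout shown above. This is an illustration only, assuming SQLite through Python's sqlite3 module; the helper names advance and flags and the alias last_active are made up, and the next run is fetched first and inserted afterwards to keep the two steps separate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE runs (id INTEGER, value TEXT, run INTEGER, active INTEGER)")
con.executemany(
    "INSERT INTO runs VALUES (?, ?, 1, ?)",
    [(1, "Apple", 1), (2, "Apple", 0), (3, "Apple", 0),
     (4, "Banana", 1), (5, "Banana", 0), (6, "Cherry", 1)],
)

NEXT_RUN = """
WITH r AS (
    SELECT id, value, run,
           MAX(run) OVER () AS max_run,
           ROW_NUMBER() OVER (PARTITION BY run, value ORDER BY id) AS seqnum,
           LAG(active) OVER (PARTITION BY run, value ORDER BY id) AS prev_active,
           FIRST_VALUE(active) OVER (PARTITION BY run, value
                                     ORDER BY id DESC) AS last_active
    FROM runs
)
SELECT id, value, max_run + 1,
       CASE WHEN prev_active = 1 THEN 1                 -- entry before this one was active
            WHEN seqnum = 1 AND last_active = 1 THEN 1  -- wrap around to the first entry
            ELSE 0
       END AS active
FROM r
WHERE run = max_run
"""

def advance(con):
    """Compute the next run from the latest one and append it."""
    next_rows = con.execute(NEXT_RUN).fetchall()
    con.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", next_rows)

def flags(con, run):
    """Active flags of one run, ordered by id."""
    cur = con.execute("SELECT active FROM runs WHERE run = ? ORDER BY id", (run,))
    return [active for (active,) in cur]

for _ in range(3):
    advance(con)

for run in range(1, 5):
    print(run, flags(con, run))
```

After three steps Apple is back at its first entry, Banana has flipped three times, and the single Cherry entry has stayed active throughout.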

SQL Server : how can I get difference between counts of total rows and those with only data

I have a table with data as shown below (the table is built every day with current date, but I left off that field for ease of reading).
This table keeps track of people and the doors they enter on a daily basis.
Table entrance_t:
id entrance entered
------------------------
1 a 0
1 b 0
1 c 0
1 d 0
2 a 1
2 b 0
2 c 0
2 d 0
3 a 0
3 b 1
3 c 1
3 d 1
My goal is to report on people and count the entrances they did not use (grouping on people), but ONLY for people who entered at least once (entered = 1).
So using the above table, I would like the results of query to be...
id count
----------
2 3
3 1
(id=2 did not use 3 of the entrances and id=3 did not use 1)
I tried queries(some with inner joins on two instances of same table) and I can get the entrances not used, but it's always for everybody. Like this...
id count
----------
1 4
2 3
3 1
How do I not display results id=1 since they did not enter at all?
Thank you,
You could use conditional aggregation:
SELECT id, count(CASE WHEN entered = 0 THEN 1 END) AS cnt
FROM entrance_t
GROUP BY id
HAVING count(CASE WHEN entered = 1 THEN 1 END) > 0;
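A quick way to confirm the conditional aggregation on the sample table is the sketch below, assuming SQLite through Python's sqlite3 module; the query is the same apart from an added ORDER BY for a stable result.

```python
import sqlite3

rows = [
    (1, "a", 0), (1, "b", 0), (1, "c", 0), (1, "d", 0),
    (2, "a", 1), (2, "b", 0), (2, "c", 0), (2, "d", 0),
    (3, "a", 0), (3, "b", 1), (3, "c", 1), (3, "d", 1),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entrance_t (id INTEGER, entrance TEXT, entered INTEGER)")
con.executemany("INSERT INTO entrance_t VALUES (?, ?, ?)", rows)

# COUNT ignores NULLs, so each CASE counts only the rows it matches:
# the SELECT counts unused entrances, the HAVING requires at least one used.
query = """
SELECT id, COUNT(CASE WHEN entered = 0 THEN 1 END) AS cnt
FROM entrance_t
GROUP BY id
HAVING COUNT(CASE WHEN entered = 1 THEN 1 END) > 0
ORDER BY id
"""
result = con.execute(query).fetchall()
print(result)
```

id 1 never entered, so its group fails the HAVING condition and is left out.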

Conditional Row Deleting in SQL

I have a table that contains 4 columns. I need to remove some of the rows based on the Code and ID columns. A code of 1 initiates the process I'm trying to track and a code of 2 terminates it. I would like to remove all rows for a specific ID when a code of 2 comes after a code of 1 and there is not an additional code 1. For example, my current data set looks like this:
Code Deposit Date ID
1 $100 3/2/2016 5
2 $0 3/1/2016 5
1 $120 2/8/2016 5
1 $120 3/22/2016 4
2 $70 2/8/2016 3
1 $120 1/3/2016 3
2 $0 6/15/2015 2
1 $120 3/22/2016 2
1 $50 8/15/2015 1
2 $200 8/1/2015 1
After I run my script I would like it to look like this:
Code Deposit Date ID
1 $100 3/2/2016 5
2 $0 3/1/2016 5
1 $120 2/8/2016 5
1 $120 3/22/2016 4
1 $50 8/15/2015 1
2 $200 8/1/2015 1
In all I have about 150,000 ID's in my actual table but this is the general idea.
You can get the ids using logic like this:
select t.id
from t
group by t.id
having max(case when code = 2 then date end) > min(case when code = 1 then date end) and  -- code 2 after code 1
       max(case when code = 2 then date end) > max(case when code = 1 then date end)      -- no code 1 after code 2
It is then easy enough to incorporate this into a query to get the rest of the details:
select t.*
from t
where t.id not in (select t.id
                   from t
                   group by t.id
                   having max(case when code = 2 then date end) > min(case when code = 1 then date end) and  -- code 2 after code 1
                          max(case when code = 2 then date end) > max(case when code = 1 then date end)      -- no code 1 after code 2
                  );
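To see which ids this rule actually picks on the sample data, here is a sketch assuming SQLite through Python's sqlite3 module. The dates are rewritten as ISO strings (an assumption, so that text comparison orders them correctly) and the deposit is stored as a plain integer. It also turns the selection into a DELETE, which the answer above does not show.

```python
import sqlite3

rows = [
    (1, 100, "2016-03-02", 5), (2, 0, "2016-03-01", 5), (1, 120, "2016-02-08", 5),
    (1, 120, "2016-03-22", 4),
    (2, 70, "2016-02-08", 3), (1, 120, "2016-01-03", 3),
    (2, 0, "2015-06-15", 2), (1, 120, "2016-03-22", 2),
    (1, 50, "2015-08-15", 1), (2, 200, "2015-08-01", 1),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (code INTEGER, deposit INTEGER, date TEXT, id INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# ids whose latest code 2 is later than every code 1 row
FINISHED_IDS = """
SELECT id
FROM t
GROUP BY id
HAVING MAX(CASE WHEN code = 2 THEN date END) > MIN(CASE WHEN code = 1 THEN date END)
   AND MAX(CASE WHEN code = 2 THEN date END) > MAX(CASE WHEN code = 1 THEN date END)
"""

finished = sorted(i for (i,) in con.execute(FINISHED_IDS))
con.execute(f"DELETE FROM t WHERE id IN ({FINISHED_IDS})")
remaining = [i for (i,) in con.execute("SELECT DISTINCT id FROM t ORDER BY id")]
print(finished, remaining)
```

With these dates only ID 3 qualifies. ID 2 stays, because its code 2 row is dated before its code 1 row, even though the desired output in the question drops it; ids without any code 2 row (ID 4) are never selected, since the comparison with NULL is not true.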
The approach I took was to add up the Code per each ID. If it equals 3 exactly, it should be removed.
;WITH keepID as (
    Select
        ID
        ,SUM(code) as 'sumCode'
    From #testInit
    Group by ID
    HAVING SUM(code) <> 3
)
Select *
From #testInit
Where ID IN (Select ID from keepID)
Your post showed keeping ID = 1, which does not seem to fit the criteria. Are you sure you would be keeping ID = 1? It only has 2 records, one with a code of 1 and one with a code of 2, which adds up to 3 ... thus, remove it.
I just showed the approach in logic ... let me know if you need help with the delete code.
delete from t
where t.id in
  (select B.id
   from (select id, max(date) as date from t where code = 1 group by id) as A,
        (select id, max(date) as date from t where code = 2 group by id) as B
   where A.id = B.id and B.date > A.date)
Explanation (t stands for your table):
select id, max(date) as date from t where code = 1 group by id (as A)
fetches the latest date of code 1 for each id.
select id, max(date) as date from t where code = 2 group by id (as B)
fetches the latest date of code 2 for each id.
The outer select with where A.id = B.id and B.date > A.date selects all the ids for which the latest code 2 date is later than the latest code 1 date.