Conditional Row Deleting in SQL - sql

I have a table that contains 4 columns. I need to remove some of the rows based on the Code and ID columns. A code of 1 initiates the process I'm trying to track and a code of 2 terminates it. I would like to remove all rows for a specific ID when a code of 2 comes after a code of 1 and there is not an additional code 1. For example, my current data set looks like this:
Code Deposit Date ID
1 $100 3/2/2016 5
2 $0 3/1/2016 5
1 $120 2/8/2016 5
1 $120 3/22/2016 4
2 $70 2/8/2016 3
1 $120 1/3/2016 3
2 $0 6/15/2015 2
1 $120 3/22/2016 2
1 $50 8/15/2015 1
2 $200 8/1/2015 1
After I run my script I would like it to look like this:
Code Deposit Date ID
1 $100 3/2/2016 5
2 $0 3/1/2016 5
1 $120 2/8/2016 5
1 $120 3/22/2016 4
1 $50 8/15/2015 1
2 $200 8/1/2015 1
In all, I have about 150,000 IDs in my actual table, but this is the general idea.

You can get the ids using logic like this:
select t.id
from t
group by t.id
having max(case when code = 2 then date end) > min(case when code = 1 then date end) and -- code 2 after code 1
max(case when code = 2 then date end) > max(case when code = 1 then date end) -- no code 1 after code2
It is then easy enough to incorporate this into a query to get the rest of the details:
select t.*
from t
where t.id not in (select t.id
from t
group by t.id
having max(case when code = 2 then date end) > min(case when code = 1 then date end) and -- code 2 after code 1
max(case when code = 2 then date end) > max(case when code = 1 then date end)
);
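As a quick check of the grouping logic above, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and turns the id list into a DELETE. The table name t matches the answer; storing the dates as ISO-8601 text (so that string comparison orders them chronologically) and the helper name delete_terminated_ids are assumptions made only for this illustration.

```python
import sqlite3

# Sample rows from the question as (code, deposit, date, id).
# Dates are rewritten as ISO-8601 text (an assumption) so string comparison works.
ROWS = [
    (1, 100, "2016-03-02", 5),
    (2, 0,   "2016-03-01", 5),
    (1, 120, "2016-02-08", 5),
    (1, 120, "2016-03-22", 4),
    (2, 70,  "2016-02-08", 3),
    (1, 120, "2016-01-03", 3),
    (2, 0,   "2015-06-15", 2),
    (1, 120, "2016-03-22", 2),
    (1, 50,  "2015-08-15", 1),
    (2, 200, "2015-08-01", 1),
]

DELETE_SQL = """
DELETE FROM t
WHERE id IN (
    SELECT id
    FROM t
    GROUP BY id
    HAVING MAX(CASE WHEN code = 2 THEN date END) > MIN(CASE WHEN code = 1 THEN date END)  -- code 2 after code 1
       AND MAX(CASE WHEN code = 2 THEN date END) > MAX(CASE WHEN code = 1 THEN date END)  -- no code 1 after code 2
)
"""

def delete_terminated_ids(rows):
    """Hypothetical helper: run the delete and return the ids that remain."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (code INTEGER, deposit INTEGER, date TEXT, id INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    con.execute(DELETE_SQL)
    remaining = [r[0] for r in con.execute("SELECT DISTINCT id FROM t ORDER BY id")]
    con.close()
    return remaining

remaining_ids = delete_terminated_ids(ROWS)
print(remaining_ids)
```

On this sample only ID 3 is deleted. ID 2, which the question's expected output also drops, has its code 2 dated before its code 1, so the condition as written leaves it in place.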

The approach I took was to sum the Code values for each ID. If the sum is exactly 3, the ID should be removed.
;WITH keepID as (
Select
ID
,SUM(code) as 'sumCode'
From #testInit
Group by ID
HAVING SUM(code) <> 3
)
Select *
From #testInit
Where ID IN (Select ID from keepID)
Your post showed ID = 1 being kept, which does not seem to fit the criteria. Are you sure you want to keep ID = 1? It only has 2 records, one with a code of 1 and one with a code of 2, which add up to 3, so it would be removed.
I have only shown the logic of the approach; let me know if you need help with the delete code.
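To see what the sum-based filter keeps on the question's sample, here is a small sketch using Python's sqlite3 module. The table name test_init stands in for the #testInit temp table (temp-table syntax with # is SQL Server specific), and only the Code and ID columns are loaded because the filter uses nothing else; both choices are assumptions for this illustration.

```python
import sqlite3

# (code, id) pairs from the question's sample data.
ROWS = [
    (1, 5), (2, 5), (1, 5),
    (1, 4),
    (2, 3), (1, 3),
    (2, 2), (1, 2),
    (1, 1), (2, 1),
]

KEEP_SQL = """
WITH keepID AS (
    SELECT id, SUM(code) AS sumCode
    FROM test_init
    GROUP BY id
    HAVING SUM(code) <> 3   -- exactly one code 1 plus one code 2 sums to 3
)
SELECT DISTINCT id
FROM test_init
WHERE id IN (SELECT id FROM keepID)
ORDER BY id
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_init (code INTEGER, id INTEGER)")
con.executemany("INSERT INTO test_init VALUES (?, ?)", ROWS)
kept_ids = [r[0] for r in con.execute(KEEP_SQL)]
con.close()
print(kept_ids)
```

Only IDs 4 and 5 survive here. The filter never looks at the Date column, so a code 2 dated before a code 1 is treated the same as one dated after it.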

delete from t
where t.id in
(select B.id
from
(select id, max(date) as date from t where code = 1 group by id) as A
join
(select id, max(date) as date from t where code = 2 group by id) as B
on A.id = B.id
where B.date > A.date)
Explanation (t stands for your table name):
select id, max(date) as date from t where code = 1 group by id
fetches the latest date for each id with code 1 (aliased as A).
select id, max(date) as date from t where code = 2 group by id
fetches the latest date for each id with code 2 (aliased as B).
The outer select joins A and B on id and returns the ids whose latest code 2 date is later than their latest code 1 date.

Related

How to check the count of each values repeating in a row

I have two tables. Data in the first table is:
ID Username
1 Dan
2 Eli
3 Sean
4 John
Second Table Data:
user_id Status_id
1 2
1 3
4 1
3 2
2 3
1 1
3 3
3 3
3 3
. .
goes on goes on
These are both of my tables.
I want to count how many times each user has each status_id.
My expected result is:
username status_id(1) status_id(2) status_id(3)
Dan 1 1 1
Eli 0 0 1
Sean 0 1 2
John 1 0 0
My current code is:
SELECT b.username , COUNT(a.status_id)
FROM masterdb.auth_user b
left outer join masterdb.xmlform_joblist a
on a.user1_id = b.id
GROUP BY b.username, b.id, a.status_id
This gives me the separate count but in a single row without mentioning which status_id each column represents
This is called a pivot, and it works in two steps:
1. extract the data for each specific field value using a CASE expression
2. aggregate the data by user, so that every field value lands on the same record for each user
SELECT Username,
       SUM(CASE WHEN status_id = 1 THEN 1 ELSE 0 END) AS status_id_1,
       SUM(CASE WHEN status_id = 2 THEN 1 ELSE 0 END) AS status_id_2,
       SUM(CASE WHEN status_id = 3 THEN 1 ELSE 0 END) AS status_id_3
FROM t2
INNER JOIN t1
        ON t2.user_id = t1.ID
GROUP BY Username
ORDER BY Username
Note: This solution assumes that there are exactly 3 status_id values. If you need to generalize to an arbitrary number of status IDs, you would require a dynamic query. In any case, it's better to avoid dynamic queries if you can.
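The two pivot steps can be tried out with a short sketch that uses Python's sqlite3 module and the rows shown in the question (the table names t1 and t2 follow the answer; the column name ID follows the question's first table). The ELSE 0 branch is included so that a user with no rows for a given status shows 0 rather than NULL. This is only an illustration on the visible sample rows, not the asker's real schema.

```python
import sqlite3

USERS = [(1, "Dan"), (2, "Eli"), (3, "Sean"), (4, "John")]
# (user_id, status_id) rows exactly as listed in the question.
STATUSES = [(1, 2), (1, 3), (4, 1), (3, 2), (2, 3), (1, 1), (3, 3), (3, 3), (3, 3)]

PIVOT_SQL = """
SELECT Username,
       SUM(CASE WHEN status_id = 1 THEN 1 ELSE 0 END) AS status_id_1,
       SUM(CASE WHEN status_id = 2 THEN 1 ELSE 0 END) AS status_id_2,
       SUM(CASE WHEN status_id = 3 THEN 1 ELSE 0 END) AS status_id_3
FROM t2
INNER JOIN t1 ON t2.user_id = t1.ID
GROUP BY Username
ORDER BY Username
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (ID INTEGER, Username TEXT)")
con.execute("CREATE TABLE t2 (user_id INTEGER, status_id INTEGER)")
con.executemany("INSERT INTO t1 VALUES (?, ?)", USERS)
con.executemany("INSERT INTO t2 VALUES (?, ?)", STATUSES)
pivot_rows = con.execute(PIVOT_SQL).fetchall()
con.close()

for row in pivot_rows:
    print(row)
```

Because the join is an INNER JOIN, a user with no rows at all in t2 would not appear in the result; a LEFT JOIN starting from t1 would be one way to keep such users.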

SQL Query to get multiple resultant on single column

I have a table that looks something like this:
id name status
2 a 1
2 a 2
2 a 3
2 a 2
2 a 1
3 b 2
3 b 1
3 b 2
3 b 1
and the result I want is:
id name total count count(status3) count(status2) count(status1)
2 a 5 1 2 2
3 b 4 0 2 2
Please help me get this result. I can only get id, name, or one of the counts at a time, and I don't know how to write a clause that produces this whole table at once.
Here's a simple solution using group by and case when.
select id
,count(*) as 'total count'
,count(case status when 3 then 1 end) as 'count(status3)'
,count(case status when 2 then 1 end) as 'count(status2)'
,count(case status when 1 then 1 end) as 'count(status1)'
from t
group by id
id total count count(status3) count(status2) count(status1)
2 5 1 2 2
3 4 0 2 2
Here's a way to solve it using pivot.
select *
from (select status,id, count(*) over (partition by id) as "total count" from t) tmp
pivot (count(status) for status in ([1],[2],[3])) pvt
id total count 1 2 3
3 4 2 2 0
2 5 2 2 1

Teradata/SQL, select all rows until a certain value is reached per partition

I'd like to select all rows from a table until (and including) a certain value is reached per partition. In this case, I want all rows per id up to and including the last row where status has the value 'b'. Note: the timestamps are in order per id.
id name status status timestamp
1 Sta open a 10:50:09.000000
1 Danny open c 10:50:19.000000
1 Elle closed b 10:50:39.000000
2 anton closed a 16:00:09.000000
2 jill done b 16:00:19.000000
2 tom open b 16:05:09.000000
2 bill open c 16:07:09.000000
3 ann done b 08:00:13.000000
3 stef done b 08:12:13.000000
3 martin open b 08:25:13.000000
3 jeff open a 09:00:13.000000
3 luke open c 09:07:13.000000
3 karen open c 09:15:13.000000
3 lucy open a 10:00:13.000000
The output would look like this:
id name status status timestamp
1 Sta open a 10:50:09.000000
1 Danny open c 10:50:19.000000
1 Elle closed b 10:50:39.000000
2 anton closed a 16:00:09.000000
2 jill done b 16:00:19.000000
2 tom open b 16:05:09.000000
3 ann done b 08:00:13.000000
3 stef done b 08:12:13.000000
3 martin open b 08:25:13.000000
I've tried to solve this using QUALIFY with RANK etc., but unfortunately with no success. I would appreciate it if somebody could help me!
Selecting all rows per id up to the last occurrence of 'b' is the same as excluding the rows before the first occurrence of 'b' once you reverse the sort order:
SELECT *
FROM tab
QUALIFY -- tag the last 'b'
Count(CASE WHEN status = 'b' THEN 1 end)
Over (PARTITION BY id
ORDER BY timestamp DESC
ROWS Unbounded Preceding) > 0
ORDER BY id, timestamp
;
This will not return ids where no 'b' exists.
If you want to return those, too, add another condition to QUALIFY:
OR -- no 'b' found
Count(CASE WHEN status = 'b' THEN 1 end)
Over (PARTITION BY id) = 0
As both counts share the same partition, it's still a single STAT step in Explain.
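For readers without a Teradata system, the same running-count idea can be tried in SQLite through Python's sqlite3 module. SQLite has no QUALIFY clause, so this sketch moves the window expression into a subquery and filters on it in the outer WHERE; that rewrite, the column name ts (in place of timestamp), the alias b_seen, and keeping only the a/b/c status column are all assumptions made for this illustration. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# (id, name, status, ts) using the second status column (a/b/c) from the question.
ROWS = [
    (1, "Sta", "a", "10:50:09"), (1, "Danny", "c", "10:50:19"), (1, "Elle", "b", "10:50:39"),
    (2, "anton", "a", "16:00:09"), (2, "jill", "b", "16:00:19"),
    (2, "tom", "b", "16:05:09"), (2, "bill", "c", "16:07:09"),
    (3, "ann", "b", "08:00:13"), (3, "stef", "b", "08:12:13"), (3, "martin", "b", "08:25:13"),
    (3, "jeff", "a", "09:00:13"), (3, "luke", "c", "09:07:13"),
    (3, "karen", "c", "09:15:13"), (3, "lucy", "a", "10:00:13"),
]

QUERY = """
SELECT id, name, status, ts
FROM (
    SELECT tab.*,
           -- running count of 'b' rows, walking each id from newest to oldest
           COUNT(CASE WHEN status = 'b' THEN 1 END)
               OVER (PARTITION BY id
                     ORDER BY ts DESC
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS b_seen
    FROM tab
)
WHERE b_seen > 0          -- at or before the last 'b'
ORDER BY id, ts
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tab (id INTEGER, name TEXT, status TEXT, ts TEXT)")
con.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", ROWS)
names = [row[1] for row in con.execute(QUERY)]
con.close()
print(names)
```

Rows after the last 'b' (bill for id 2; jeff, luke, karen and lucy for id 3) have a running count of 0 and are filtered out.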

Best way to by column and aggregation on another column

I want to create a rank column using existing rank and binary columns. Suppose, for example, a table with ID, RISK, CONTACT, DATE. The existing rank is RISK, say 1, 2, 3, NULL, with 3 being the highest. The binary-valued column is CONTACT, with 0/1 or FAILURE/SUCCESS. I want to create a new RANK that orders by RISK only once a minimum number of successful contacts has been reached.
For example, suppose the constraint is a minimum of 2 successful contacts. Then the rank should be created as follows in the two instances below:
Instance 1. Three ID, all have a min of two successful contacts. In that case the rank mirrors the risk:
ID risk contact date rank
1 3 S 1 3
1 3 S 2 3
1 3 F 3 3
1 3 F 4 3
2 2 S 1 2
2 2 S 2 2
2 2 F 3 2
2 2 F 4 2
3 1 S 1 1
3 1 S 2 1
3 1 S 3 1
Instance 2. Suppose ID=1 has only one successful contact. In that case it is relegated to the lowest rank, rank=1, while ID=2 gets the highest value, rank=3, and ID=3 maps to rank=2 because it satisfies the constraint but has a lower risk value than ID=2:
ID risk contact date rank
1 3 S 1 1
1 3 F 2 1
1 3 F 3 1
1 3 F 4 1
2 2 S 1 3
2 2 S 2 3
2 2 F 3 3
2 2 F 4 3
3 1 S 1 2
3 1 S 2 2
3 1 S 3 2
This is SQL, specifically Hive. Thanks in advance.
Edit - I think Gordon Linoff's code does it correctly. In the end, I used three interim tables. The code looks like this:
First,
--numerize risk, contact
select A.* ,
case when A.risk = 'H' then 3
when A.risk = 'M' then 2
when A.risk = 'L' then 1
when A.risk is NULL then NULL
when A.risk = 'NULL' then NULL
else -999 end as RISK_RANK,
case when A.contact = 'Successful' then 1
else NULL end as success
from T as A -- FROM clause was missing; T is assumed, as in the steps below
Second,
-- sum_successes_by_risk
select A.* ,
B.sum_successes_by_risk
from T as A
inner join
(select A.person, A.program, A.risk, sum(a.success) as sum_successes_by_risk
from T as A
group by A.person, A.program, A.risk
) as B
on A.program = B.program
and A.person = B.person
and A.risk = B.risk
Third,
--Create table that contains only max risk category
select A.* ,
B.max_risk_rank
from T as A
inner join
(select A.person, max(A.risk_rank) as max_risk_rank
from T as A
group by A.person
) as B
on A.person = B.person
and A.risk_rank = B.max_risk_rank
This is hard to follow, but I think you just want window functions:
select t.*,
(case when sum(case when contact = 'S' then 1 else 0 end) over (partition by id) >= 2
then risk
else 1
end) as new_risk
from t;
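To see what this window expression returns, here is a minimal sketch that runs it on the question's Instance 2 rows in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). The table name t and the new_risk alias come from the answer; collapsing the output to one row per id with DISTINCT is an addition made only to keep the printed result short.

```python
import sqlite3

# Instance 2 from the question as (id, risk, contact, date).
ROWS = [
    (1, 3, "S", 1), (1, 3, "F", 2), (1, 3, "F", 3), (1, 3, "F", 4),
    (2, 2, "S", 1), (2, 2, "S", 2), (2, 2, "F", 3), (2, 2, "F", 4),
    (3, 1, "S", 1), (3, 1, "S", 2), (3, 1, "S", 3),
]

QUERY = """
SELECT DISTINCT id, new_risk
FROM (
    SELECT t.*,
           CASE WHEN SUM(CASE WHEN contact = 'S' THEN 1 ELSE 0 END)
                     OVER (PARTITION BY id) >= 2   -- at least 2 successful contacts
                THEN risk
                ELSE 1
           END AS new_risk
    FROM t
)
ORDER BY id
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, risk INTEGER, contact TEXT, date INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
new_risk_by_id = dict(con.execute(QUERY).fetchall())
con.close()
print(new_risk_by_id)
```

On these rows ID 1 falls to 1 because it has a single success, while IDs 2 and 3 keep their original risk of 2 and 1. The question's Instance 2 table asks for 3 and 2 there, so a further re-ranking step over the qualifying IDs would be needed to reproduce it exactly.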

SQL Server : how can I get difference between counts of total rows and those with only data

I have a table with data as shown below (the table is built every day with current date, but I left off that field for ease of reading).
This table keeps track of people and the doors they enter on a daily basis.
Table entrance_t:
id entrance entered
------------------------
1 a 0
1 b 0
1 c 0
1 d 0
2 a 1
2 b 0
2 c 0
2 d 0
3 a 0
3 b 1
3 c 1
3 d 1
My goal is to report on people and count the entrances they did not use (grouping on people), but ONLY for people who entered at least once (entered=1).
So using the above table, I would like the results of query to be...
id count
----------
2 3
3 1
(id=2 did not use 3 of the entrances and id=3 did not use 1)
I tried several queries (some with inner joins on two instances of the same table) and I can get the entrances not used, but it's always for everybody. Like this...
id count
----------
1 4
2 3
3 1
How do I exclude id=1 from the results, since they did not enter at all?
Thank you,
You could use conditional aggregation:
SELECT id, count(CASE WHEN entered = 0 THEN 1 END) AS cnt
FROM entrance_t
GROUP BY id
HAVING count(CASE WHEN entered = 1 THEN 1 END) > 0;
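The conditional aggregation can be checked against the question's table with a short sketch using Python's sqlite3 module. The table and column names come from the question; sorting the rows in Python is an addition, since the query itself has no ORDER BY.

```python
import sqlite3

# (id, entrance, entered) rows from the question's entrance_t table.
ROWS = [
    (1, "a", 0), (1, "b", 0), (1, "c", 0), (1, "d", 0),
    (2, "a", 1), (2, "b", 0), (2, "c", 0), (2, "d", 0),
    (3, "a", 0), (3, "b", 1), (3, "c", 1), (3, "d", 1),
]

QUERY = """
SELECT id, COUNT(CASE WHEN entered = 0 THEN 1 END) AS cnt   -- entrances not used
FROM entrance_t
GROUP BY id
HAVING COUNT(CASE WHEN entered = 1 THEN 1 END) > 0          -- person entered at least once
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entrance_t (id INTEGER, entrance TEXT, entered INTEGER)")
con.executemany("INSERT INTO entrance_t VALUES (?, ?, ?)", ROWS)
unused_counts = sorted(con.execute(QUERY).fetchall())
con.close()
print(unused_counts)
```

The HAVING clause drops id=1 because none of its rows has entered = 1, while the SELECT still counts the unused entrances for the ids that remain.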