Group Matching Values and Finding Which Ones Are Missing an Associated Value? - sql

I need help writing a query that returns all record numbers NOT assigned to Group D, from a table structured like the example below. From this table, my desired result would be record number "3".
Record_Number Assigned_To_Group
1 A
1 B
1 C
1 D
2 A
2 E
2 D
3 A
3 B
3 E

One method uses aggregation:
select Record_Number
from t
group by Record_Number
having sum(case when Assigned_To_Group = 'D' then 1 else 0 end) = 0;
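To check the query, here is a minimal sketch. It loads the sample rows into an in-memory SQLite database through Python's standard `sqlite3` module, with SQLite standing in for whatever engine is actually used. It then runs the same aggregation; the only addition is an `ORDER BY` so the output order is deterministic.

```python
import sqlite3

# Sample rows from the question: (Record_Number, Assigned_To_Group).
rows = [(1, "A"), (1, "B"), (1, "C"), (1, "D"),
        (2, "A"), (2, "E"), (2, "D"),
        (3, "A"), (3, "B"), (3, "E")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Record_Number INTEGER, Assigned_To_Group TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# A record qualifies only if none of its rows is in group D,
# i.e. the conditional sum over its rows is 0.
query = """
SELECT Record_Number
FROM t
GROUP BY Record_Number
HAVING SUM(CASE WHEN Assigned_To_Group = 'D' THEN 1 ELSE 0 END) = 0
ORDER BY Record_Number
"""
missing_d = [r[0] for r in conn.execute(query)]
print(missing_d)  # [3]
```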


How to check the count of each value repeating in a row

I have two tables. Data in the first table is:
ID Username
1 Dan
2 Eli
3 Sean
4 John
Second Table Data:
user_id Status_id
1 2
1 3
4 1
3 2
2 3
1 1
3 3
3 3
3 3
. . . (the table continues)
These are my two tables.
I want to find how many times each user has each status_id.
My expected result is:
username status_id(1) status_id(2) status_id(3)
Dan 1 1 1
Eli 0 0 1
Sean 0 1 2
John 1 0 0
My current code is:
SELECT b.username , COUNT(a.status_id)
FROM masterdb.auth_user b
left outer join masterdb.xmlform_joblist a
on a.user1_id = b.id
GROUP BY b.username, b.id, a.status_id
This gives me the separate counts, but as several rows in a single column, without showing which status_id each count belongs to.
This is called a pivot, and it works in two steps:
1. extract the data for each specific status_id with a CASE expression;
2. aggregate the data per user, so that all of a user's status counts lie on the same record.
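SELECT t1.Username,
SUM(CASE WHEN t2.status_id = 1 THEN 1 ELSE 0 END) AS status_id_1,
SUM(CASE WHEN t2.status_id = 2 THEN 1 ELSE 0 END) AS status_id_2,
SUM(CASE WHEN t2.status_id = 3 THEN 1 ELSE 0 END) AS status_id_3
FROM t1
LEFT JOIN t2 -- LEFT JOIN keeps users that have no status rows
ON t2.user_id = t1.ID
GROUP BY t1.ID, t1.Username -- grouping by ID too keeps users with the same name apart
ORDER BY t1.ID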
Note: this solution assumes there are exactly three status_id values. To handle an arbitrary number of status IDs, you would need a dynamic query; where possible, though, it's better to avoid dynamic queries.
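Here is a minimal sketch of the two-step pivot, run in an in-memory SQLite database via Python's `sqlite3`. It uses the tables from the question and makes two adjustments: `ELSE 0`, so that a missing status shows as 0 rather than NULL, and a `LEFT JOIN` from the users table, so that users with no status rows are kept. Only the rows actually listed in the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (ID INTEGER, Username TEXT)")
conn.execute("CREATE TABLE t2 (user_id INTEGER, status_id INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                 [(1, "Dan"), (2, "Eli"), (3, "Sean"), (4, "John")])
# Only the rows shown in the question; the real table continues.
conn.executemany("INSERT INTO t2 VALUES (?, ?)",
                 [(1, 2), (1, 3), (4, 1), (3, 2), (2, 3),
                  (1, 1), (3, 3), (3, 3), (3, 3)])

# Step 1: each CASE isolates one status_id; step 2: SUM per user.
query = """
SELECT t1.Username,
       SUM(CASE WHEN t2.status_id = 1 THEN 1 ELSE 0 END) AS status_id_1,
       SUM(CASE WHEN t2.status_id = 2 THEN 1 ELSE 0 END) AS status_id_2,
       SUM(CASE WHEN t2.status_id = 3 THEN 1 ELSE 0 END) AS status_id_3
FROM t1
LEFT JOIN t2 ON t2.user_id = t1.ID
GROUP BY t1.ID, t1.Username
ORDER BY t1.ID
"""
pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
```

Because only the listed rows are loaded, Sean's `status_id_3` count here is 3, not the 2 shown in the expected output.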

How to check the value of any row in a group after a previous one fulfils a condition?

I have a dataset, grouped by test subject, that is filled in according to the actions each subject performs. I need to find which subjects do A and then, at some later point, do B; B doesn't have to be the very next action/row. Doing B first and then A doesn't count; it has to be in that specific order. For example, I have this table:
Subject ActionID ActionOrder
1 A 1
1 C 2
1 D 3
1 B 4
1 C 5
2 D 1
2 A 2
2 C 3
2 B 4
3 B 1
3 D 2
3 A 3
4 A 1
Here subjects 1 and 2 are the ones that meet the action-order condition. Subject 3 does not, because it performs the actions in reverse order, and subject 4 only performs action A.
How can I get only subjects 1 and 2 as results? Thank you very much
Use conditional aggregation:
SELECT Subject
FROM tablename
WHERE ActionID IN ('A', 'B')
GROUP BY Subject
HAVING MAX(CASE WHEN ActionID = 'A' THEN ActionOrder END) <
MIN(CASE WHEN ActionID = 'B' THEN ActionOrder END)
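A minimal sketch that runs the conditional-aggregation query against the sample table in an in-memory SQLite database, via Python's `sqlite3`. An `ORDER BY` has been added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (Subject INTEGER, ActionID TEXT, ActionOrder INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", [
    (1, "A", 1), (1, "C", 2), (1, "D", 3), (1, "B", 4), (1, "C", 5),
    (2, "D", 1), (2, "A", 2), (2, "C", 3), (2, "B", 4),
    (3, "B", 1), (3, "D", 2), (3, "A", 3),
    (4, "A", 1),
])

# The last A must come before the first B. A subject with no B gets
# MIN(...) = NULL, the comparison yields NULL, and HAVING drops it.
query = """
SELECT Subject
FROM tablename
WHERE ActionID IN ('A', 'B')
GROUP BY Subject
HAVING MAX(CASE WHEN ActionID = 'A' THEN ActionOrder END) <
       MIN(CASE WHEN ActionID = 'B' THEN ActionOrder END)
ORDER BY Subject
"""
subjects = [r[0] for r in conn.execute(query)]
print(subjects)  # [1, 2]
```

Comparing the latest A with the earliest B means every A must precede every B. A subject whose sequence is A, B, A would therefore be excluded.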
Consider the option below (BigQuery Standard SQL):
select Subject
from (
select Subject,
regexp_replace(string_agg(ActionID, '' order by ActionOrder), r'[^AB]', '') check
from `project.dataset.table`
group by Subject
)
where not starts_with(check, 'B')
and check like '%AB%'
The query above assumes that a subject can perform the same action more than once, which is why the WHERE clause has a few extra checks. Otherwise it would simply be check = 'AB'.
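The same string-based idea can be sketched in plain Python, without BigQuery, to show what each step does. The actions are concatenated in ActionOrder, everything except A and B is stripped (as `regexp_replace` does), and then the two WHERE checks are applied. The function name `subjects_a_then_b` is hypothetical.

```python
import re
from collections import defaultdict


def subjects_a_then_b(rows):
    """rows: iterable of (subject, action_id, action_order) tuples."""
    actions = defaultdict(list)
    for subject, action_id, order in rows:
        actions[subject].append((order, action_id))
    result = []
    for subject in sorted(actions):
        # Like string_agg(ActionID, '' ORDER BY ActionOrder) ...
        seq = "".join(a for _, a in sorted(actions[subject]))
        # ... followed by regexp_replace(..., r'[^AB]', '').
        check = re.sub(r"[^AB]", "", seq)
        # Same WHERE clause: must not start with B and must contain "AB".
        if not check.startswith("B") and "AB" in check:
            result.append(subject)
    return result


rows = [
    (1, "A", 1), (1, "C", 2), (1, "D", 3), (1, "B", 4), (1, "C", 5),
    (2, "D", 1), (2, "A", 2), (2, "C", 3), (2, "B", 4),
    (3, "B", 1), (3, "D", 2), (3, "A", 3),
    (4, "A", 1),
]
print(subjects_a_then_b(rows))  # [1, 2]
```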

Best way to rank by column and aggregation on another column

I want to create a rank column using existing rank and binary columns. Suppose, for example, a table with ID, RISK, CONTACT and DATE. The existing rank is RISK, say 1, 2, 3 or NULL, with 3 being the highest. The binary column is CONTACT, with values 0/1 or FAILURE/SUCCESS. I want to create a new RANK that orders by RISK once a certain number of successful contacts has been reached.
For example, suppose the constraint is a minimum of 2 successful contacts. Then the rank should be created as follows in the two instances below:
Instance 1. Three IDs, all with at least two successful contacts. In that case the rank mirrors the risk:
ID risk contact date rank
1 3 S 1 3
1 3 S 2 3
1 3 F 3 3
1 3 F 4 3
2 2 S 1 2
2 2 S 2 2
2 2 F 3 2
2 2 F 4 2
3 1 S 1 1
3 1 S 2 1
3 1 S 3 1
Instance 2. Suppose ID=1 has only one successful contact. In that case it is relegated to the lowest rank, rank=1, while ID=2 gets the highest value, rank=3, and ID=3 maps to rank=2 because it satisfies the constraint but has a lower risk value than ID=2:
ID risk contact date rank
1 3 S 1 1
1 3 F 2 1
1 3 F 3 1
1 3 F 4 1
2 2 S 1 3
2 2 S 2 3
2 2 F 3 3
2 2 F 4 3
3 1 S 1 2
3 1 S 2 2
3 1 S 3 2
This is SQL, specifically Hive. Thanks in advance.
Edit: I think Gordon Linoff's code does it correctly. In the end, I used three interim tables. The code looks like this:
First,
--numerize risk, contact
select A.* ,
case when A.risk = 'H' then 3
when A.risk = 'M' then 2
when A.risk = 'L' then 1
when A.risk is NULL then NULL
when A.risk = 'NULL' then NULL
else -999 end as RISK_RANK,
case when A.contact = 'Successful' then 1
else NULL end as success
from T as A
Second,
-- sum_successes_by_risk
select A.* ,
B.sum_successes_by_risk
from T as A
inner join
(select A.person, A.program, A.risk, sum(a.success) as sum_successes_by_risk
from T as A
group by A.person, A.program, A.risk
) as B
on A.program = B.program
and A.person = B.person
and A.risk = B.risk
Third,
--Create table that contains only max risk category
select A.* ,
B.max_risk_rank
from T as A
inner join
(select A.person, max(A.risk_rank) as max_risk_rank
from T as A
group by A.person
) as B
on A.person = B.person
and A.risk_rank = B.max_risk_rank
This is hard to follow, but I think you just want window functions:
select t.*,
(case when sum(case when contact = 'S' then 1 else 0 end) over (partition by id) >= 2
then risk
else 1
end) as new_risk
from t;
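On Instance 2, the window-function query above returns 1 for ID=1, 2 for ID=2 and 1 for ID=3 (the risk, or 1). The question asks for 1/3/2. One way to get that re-ranking is to sort the IDs first by whether they meet the threshold and then by risk, and number them with DENSE_RANK. The sketch below assumes contact is coded 'S'/'F' as in the example tables. It runs in SQLite through Python's `sqlite3`, and window functions need SQLite 3.25 or later. Hive also supports `SUM(...) OVER` and `DENSE_RANK`, but this exact query has not been tested there.

```python
import sqlite3

# Instance 2 from the question: (id, risk, contact, date).
rows = [
    (1, 3, "S", 1), (1, 3, "F", 2), (1, 3, "F", 3), (1, 3, "F", 4),
    (2, 2, "S", 1), (2, 2, "S", 2), (2, 2, "F", 3), (2, 2, "F", 4),
    (3, 1, "S", 1), (3, 1, "S", 2), (3, 1, "S", 3),
]
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id INTEGER, risk INTEGER, contact TEXT, "date" INTEGER)')
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Threshold of 2 successful contacts, as in the example.
query = """
WITH s AS (
  SELECT t.*,
         SUM(CASE WHEN contact = 'S' THEN 1 ELSE 0 END)
           OVER (PARTITION BY id) AS n_success
  FROM t
)
SELECT id, risk,
       -- IDs below the threshold sort first (lowest ranks); within each
       -- group, a lower risk gives a lower rank.
       DENSE_RANK() OVER (
         ORDER BY CASE WHEN n_success >= 2 THEN 1 ELSE 0 END, risk
       ) AS new_rank
FROM s
"""
# Every row of an ID gets the same rank, so one entry per ID is enough.
ranks = {row[0]: row[2] for row in conn.execute(query)}
print(ranks)  # {1: 1, 2: 3, 3: 2}
```

Because DENSE_RANK numbers consecutively, the ranks follow the order of the risk values rather than copying them. In Instance 1, where all three IDs qualify with risks 1, 2 and 3, the two coincide. If several IDs fail the threshold, they receive distinct low ranks ordered by risk; whether that matches the intended rule is an assumption.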

How to pull one record out of multiple records that share the same value in a field, based on other fields

INPUT TABLE:
A | B | C | D | E
a y 6 12 21
b n 3 10 5
c n 4 12 12
c n 7 12 2
c y 1 12 22
d n 6 10 32
d n 7 10 32
OUTPUT TABLE:
A | B | C | F
a y 6 21
b n 3 10
c y 1 22
d n 6 10
I have a table that contains certain fields. From that table I want to remove the duplicate values in column A and produce the output table.
When a value in A is not duplicated, field F is calculated from field C. For example, record a appears only once in A and has C > 5, so column F of the output table takes the value from column E. Record b has C < 5, so column F takes the value from column D. I have been able to achieve this using a CASE expression.
However, when a value in column A is duplicated, I want only one of its records, chosen based on column B: the record with 'y' in column B should be pulled, and column F should contain its value from column E. If none of the duplicate records has 'y' in column B, pull any one of them and use its column D as column F in the output table. I am not able to figure out this part.
Please let me know if anything is not clear.
Code I am using:
SELECT A, B, C,
CASE
WHEN (SELECT COUNT(*) FROM MyTable t2 WHERE t1.A = t2.A) > 1
THEN (SELECT TOP 1 CASE WHEN B = 'y' THEN E ELSE D END
FROM MyTable t3
WHERE t3.A = t1.A
ORDER BY CASE WHEN B = 'y' THEN 0 ELSE 1 END)
ELSE
CASE WHEN CAST(C AS float) >= 5.00 THEN (CASE WHEN E = '0.00' THEN D ELSE E END)
WHEN CAST(C AS float) < 5.00 THEN D END
END AS F
FROM MyTable t1
You might want to encapsulate this logic in a function to make it cleaner, but the logic would go like this:
IF the record count of rows in the table with the same value for A as the current row is greater than 1, THEN SELECT the TOP 1 record with this value for A ORDER BY CASE WHEN b='y' THEN 0 ELSE 1 END
Use another CASE WHEN b='y' to determine if you will use column E or D for output column F.
And ELSE (the record count is not greater than 1), use your existing CASE expression.
EDIT: Here is a more concrete explanation:
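WITH cte AS (
SELECT A, B, C, D, E,
ROW_NUMBER() OVER (PARTITION BY A ORDER BY CASE WHEN B = 'y' THEN 0 ELSE 1 END) AS rn
FROM MyTable
)
SELECT A, B, C,
CASE
-- duplicated A: the 'y' row (if any) sorts first; use E for it, otherwise D
WHEN (SELECT COUNT(*) FROM MyTable t2 WHERE t2.A = t1.A) > 1
THEN CASE WHEN B = 'y' THEN E ELSE D END
-- single A: your existing CASE expression
ELSE CASE WHEN CAST(C AS float) >= 5.00 THEN (CASE WHEN E = '0.00' THEN D ELSE E END)
WHEN CAST(C AS float) < 5.00 THEN D END
END AS F
FROM cte t1
WHERE rn = 1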
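A minimal sketch that runs the CTE approach against the sample data in an in-memory SQLite database, via Python's `sqlite3`. Window functions need SQLite 3.25 or later. The query differs from the answer's outline in three ways: D and E are selected inside the CTE, the stray comma after `PARTITION BY A` is removed, and the asker's CASE expression is filled in for the single-row case. An `ORDER BY` is added for deterministic output. Under the stated rule (C < 5 uses D), record b gets F = 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (A TEXT, B TEXT, C REAL, D INTEGER, E INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)", [
    ("a", "y", 6, 12, 21), ("b", "n", 3, 10, 5),
    ("c", "n", 4, 12, 12), ("c", "n", 7, 12, 2), ("c", "y", 1, 12, 22),
    ("d", "n", 6, 10, 32), ("d", "n", 7, 10, 32),
])

query = """
WITH cte AS (
  SELECT A, B, C, D, E,
         -- within each A, the 'y' row (if any) gets rn = 1
         ROW_NUMBER() OVER (PARTITION BY A
                            ORDER BY CASE WHEN B = 'y' THEN 0 ELSE 1 END) AS rn
  FROM MyTable
)
SELECT A, B, C,
       CASE
         WHEN (SELECT COUNT(*) FROM MyTable t2 WHERE t2.A = t1.A) > 1
           THEN CASE WHEN B = 'y' THEN E ELSE D END
         ELSE CASE WHEN CAST(C AS float) >= 5.00
                     THEN (CASE WHEN E = '0.00' THEN D ELSE E END)
                   WHEN CAST(C AS float) < 5.00 THEN D
              END
       END AS F
FROM cte t1
WHERE rn = 1
ORDER BY A
"""
# Keep (A, B, F); for 'd' either duplicate row may be picked, so C can vary.
results = [(a, b, f) for a, b, _c, f in conn.execute(query)]
print(results)
```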

Count rows that do not exist in a child table

Essentially, what I'm trying to do is count the rows for which something doesn't exist in an audit/history table. I'd like the following query to return a count of one per detail; currently it gives me one per row in the history table.
--Detail Table
ID DETAIL_GROUP
1 A
2 B
3 B
--Detail History Table
DETAIL_ID_FK VALUE1
1 NOT_MATCH
1 NOT_MATCH
2 MATCH
2 NOT_MATCH
3 MATCH
3 NOT_MATCH
SELECT D.DETAIL_GROUP, COUNT(*)
FROM DETAIL D
WHERE (NOT EXISTS(
SELECT NULL
FROM DETAIL_HISTORY HI
WHERE HI.DETAIL_ID_FK = D.ID
AND HI.VALUE1 = 'MATCH'))
GROUP BY D.DETAIL_GROUP;
I'd like to see the following result:
DETAIL_GROUP COUNT(*)
A 1
but I'm receiving the following result:
DETAIL_GROUP COUNT(*)
A 2
Thank you in advance for any assistance provided.
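A minimal sketch that loads the two sample tables into an in-memory SQLite database via Python's `sqlite3` and runs the NOT EXISTS query. It uses `DETAIL_ID_FK`, the column name from the table listing. NOT EXISTS is evaluated once per DETAIL row, so `COUNT(*)` counts details rather than history rows. With only the listed rows, the query as written returns A 1. One way a count of 2 can arise is if DETAIL_HISTORY is also joined in the outer FROM clause, so it may be worth checking the exact query being run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DETAIL (ID INTEGER, DETAIL_GROUP TEXT)")
conn.execute("CREATE TABLE DETAIL_HISTORY (DETAIL_ID_FK INTEGER, VALUE1 TEXT)")
conn.executemany("INSERT INTO DETAIL VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "B")])
conn.executemany("INSERT INTO DETAIL_HISTORY VALUES (?, ?)", [
    (1, "NOT_MATCH"), (1, "NOT_MATCH"),
    (2, "MATCH"), (2, "NOT_MATCH"),
    (3, "MATCH"), (3, "NOT_MATCH"),
])

# One test per DETAIL row: detail 1 is the only one with no 'MATCH' entry.
query = """
SELECT D.DETAIL_GROUP, COUNT(*)
FROM DETAIL D
WHERE NOT EXISTS (
  SELECT NULL
  FROM DETAIL_HISTORY HI
  WHERE HI.DETAIL_ID_FK = D.ID
    AND HI.VALUE1 = 'MATCH')
GROUP BY D.DETAIL_GROUP
"""
result = conn.execute(query).fetchall()
print(result)  # [('A', 1)]
```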
Assuming that your detail history table is as follows:
D_ID VALUE1
1 MATCH
1 NOT_MATCH
2 MATCH
2 NOT_MATCH
3 MATCH
3 NOT_MATCH
The below query:
SELECT d.detail_group, count(*)
FROM detail d
JOIN detail_history dh ON dh.d_id = d.id
WHERE dh.value1 = 'MATCH'
GROUP BY d.detail_group
Would produce:
DETAIL_GROUP COUNT(*)
A 1
B 2
The query above joins each detail to its history rows and keeps only the rows whose value1 is 'MATCH'; the WHERE clause is applied before grouping. It then counts the remaining rows per detail_group.
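A minimal sketch that checks this answer's join query against its own assumed history data (where detail 1 has a MATCH row), using an in-memory SQLite database via Python's `sqlite3`. An `ORDER BY` has been added for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (id INTEGER, detail_group TEXT)")
conn.execute("CREATE TABLE detail_history (d_id INTEGER, value1 TEXT)")
conn.executemany("INSERT INTO detail VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "B")])
# The answer's assumed history rows.
conn.executemany("INSERT INTO detail_history VALUES (?, ?)", [
    (1, "MATCH"), (1, "NOT_MATCH"),
    (2, "MATCH"), (2, "NOT_MATCH"),
    (3, "MATCH"), (3, "NOT_MATCH"),
])

# WHERE filters joined rows before GROUP BY, so only MATCH rows are counted.
query = """
SELECT d.detail_group, COUNT(*)
FROM detail d
JOIN detail_history dh ON dh.d_id = d.id
WHERE dh.value1 = 'MATCH'
GROUP BY d.detail_group
ORDER BY d.detail_group
"""
counts = conn.execute(query).fetchall()
print(counts)  # [('A', 1), ('B', 2)]
```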