Best way to rank by column and aggregation on another column - sql

I want to create a rank column using existing rank and binary columns. Suppose for example a table with ID, RISK, CONTACT, DATE. The existing rank is RISK, say 1,2,3,NULL, with 3 being the highest. The binary-valued column is CONTACT with 0,1 or FAILURE/SUCCESS. I want to create a new RANK that will order by RISK once a certain number of successful contacts has been reached.
For example, suppose the constraint is a minimum of 2 successful contacts. Then the rank should be created as follows in the two instances below:
Instance 1. Three IDs, all with at least two successful contacts. In that case the rank mirrors the risk:
ID risk contact date rank
1 3 S 1 3
1 3 S 2 3
1 3 F 3 3
1 3 F 4 3
2 2 S 1 2
2 2 S 2 2
2 2 F 3 2
2 2 F 4 2
3 1 S 1 1
3 1 S 2 1
3 1 S 3 1
Instance 2. Suppose ID=1 has only one successful contact. In that case it is relegated to the lowest rank, rank=1, while ID=2 gets the highest value, rank=3, and ID=3 maps to rank=2 because it satisfies the constraint but has a lower risk value than ID=2:
ID risk contact date rank
1 3 S 1 1
1 3 F 2 1
1 3 F 3 1
1 3 F 4 1
2 2 S 1 3
2 2 S 2 3
2 2 F 3 3
2 2 F 4 3
3 1 S 1 2
3 1 S 2 2
3 1 S 3 2
This is SQL, specifically Hive. Thanks in advance.
Edit - I think Gordon Linoff's code does it correctly. In the end, I used three interim tables. The code looks like this:
First,
--numerize risk, contact
select A.* ,
case when A.risk = 'H' then 3
when A.risk = 'M' then 2
when A.risk = 'L' then 1
when A.risk is NULL then NULL
when A.risk = 'NULL' then NULL
else -999 end as RISK_RANK,
case when A.contact = 'Successful' then 1
else NULL end as success
from T as A
Second,
-- sum_successes_by_risk
select A.* ,
B.sum_successes_by_risk
from T as A
inner join
(select A.person, A.program, A.risk, sum(a.success) as sum_successes_by_risk
from T as A
group by A.person, A.program, A.risk
) as B
on A.program = B.program
and A.person = B.person
and A.risk = B.risk
Third,
--Create table that contains only max risk category
select A.* ,
B.max_risk_rank
from T as A
inner join
(select A.person, max(A.risk_rank) as max_risk_rank
from T as A
group by A.person
) as B
on A.person = B.person
and A.risk_rank = B.max_risk_rank

This is hard to follow, but I think you just want window functions:
select t.*,
(case when sum(case when contact = 'S' then 1 else 0 end) over (partition by id) >= 2
then risk
else 1
end) as new_risk
from t;
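The expected output in Instance 2 re-ranks the IDs instead of reusing RISK, so the conditional window sum on its own may not reproduce it. Below is a minimal sketch of one possible reading: aggregate per ID, flag whether the ID has at least 2 successful contacts, then apply dense_rank over (flag, risk). It runs on SQLite through Python's sqlite3 module purely so it can be executed; it is not Hive, and the names per_id, eligible and new_rank are illustrative assumptions. How several ineligible IDs should be ordered among themselves is not specified in the question; here they are ordered by risk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, risk int, contact text, date int)")

# Rows from Instance 2 of the question: ID 1 has only one successful contact.
rows = [
    (1, 3, "S", 1), (1, 3, "F", 2), (1, 3, "F", 3), (1, 3, "F", 4),
    (2, 2, "S", 1), (2, 2, "S", 2), (2, 2, "F", 3), (2, 2, "F", 4),
    (3, 1, "S", 1), (3, 1, "S", 2), (3, 1, "S", 3),
]
conn.executemany("insert into t values (?, ?, ?, ?)", rows)

query = """
with per_id as (
    select id,
           max(risk) as risk,  -- risk is constant per id in the sample
           case when sum(case when contact = 'S' then 1 else 0 end) >= 2
                then 1 else 0 end as eligible
    from t
    group by id
)
select id,
       dense_rank() over (order by eligible, risk) as new_rank
from per_id
order by id
"""

# Maps each id to its new rank; ineligible ids sort first and get the lowest ranks.
new_rank = dict(conn.execute(query).fetchall())
print(new_rank)
```

Joining new_rank back to t on id would attach the rank to every row, as in the tables in the question.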

Related

How to check the count of each values repeating in a row

I have two tables. Data in the first table is:
ID Username
1 Dan
2 Eli
3 Sean
4 John
Second Table Data:
user_id Status_id
1 2
1 3
4 1
3 2
2 3
1 1
3 3
3 3
3 3
. .
goes on goes on
These are both of my tables.
I want to find how often each user appears with each status_id.
My expected result is:
username status_id(1) status_id(2) status_id(3)
Dan 1 1 1
Eli 0 0 1
Sean 0 1 2
John 1 0 0
My current code is:
SELECT b.username , COUNT(a.status_id)
FROM masterdb.auth_user b
left outer join masterdb.xmlform_joblist a
on a.user1_id = b.id
GROUP BY b.username, b.id, a.status_id
This gives me the separate count but in a single row without mentioning which status_id each column represents
This is called a pivot, and it works in two steps:
1. Extract the data for each specific field value using a CASE expression.
2. Aggregate the data per user, so that every field value lands on the same record for each user.
SELECT Username,
SUM(CASE WHEN status_id = 1 THEN 1 ELSE 0 END) AS status_id_1,
SUM(CASE WHEN status_id = 2 THEN 1 ELSE 0 END) AS status_id_2,
SUM(CASE WHEN status_id = 3 THEN 1 ELSE 0 END) AS status_id_3
FROM t2
INNER JOIN t1
ON t2.user_id = t1.ID
GROUP BY Username
ORDER BY Username
Note: This solution assumes that there are 3 status_id values. If you need to generalize on the amount of status ids, you would require a dynamic query. In any case, it's better to avoid dynamic queries if you can.
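As a runnable illustration of the two steps, here is a minimal sketch on SQLite via Python's sqlite3. The table names users and user_status are placeholders, and the fifth user 'Zoe' is a hypothetical addition (not in the question) included only to show that a LEFT JOIN with ELSE 0 keeps users who have no status rows. Only the nine status rows listed in the question are loaded, so Sean's status 3 count comes out as 3 here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (id integer primary key, username text);
create table user_status (user_id integer, status_id integer);
-- 'Zoe' is a hypothetical user with no status rows.
insert into users values (1,'Dan'),(2,'Eli'),(3,'Sean'),(4,'John'),(5,'Zoe');
insert into user_status values
    (1,2),(1,3),(4,1),(3,2),(2,3),(1,1),(3,3),(3,3),(3,3);
""")

query = """
select u.username,
       sum(case when s.status_id = 1 then 1 else 0 end) as status_id_1,
       sum(case when s.status_id = 2 then 1 else 0 end) as status_id_2,
       sum(case when s.status_id = 3 then 1 else 0 end) as status_id_3
from users u
left join user_status s on s.user_id = u.id
group by u.id, u.username
order by u.id
"""

# One row per user, one column per status_id value.
counts = conn.execute(query).fetchall()
for row in counts:
    print(row)
```

With an INNER JOIN the user without status rows would disappear from the result; with SUM and no ELSE branch the empty cells would be NULL rather than 0.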

SQL Query to get multiple resultant on single column

I have a table that looks something like this:
id name status
2 a 1
2 a 2
2 a 3
2 a 2
2 a 1
3 b 2
3 b 1
3 b 2
3 b 1
and the result I want is:
id name total count count(status3) count(status2) count(status1)
2 a 5 1 2 2
3 b 4 0 2 2
Please help me get this result. I can only get id, name, or one of the counts at a time, and I don't know how to write a query that returns this whole table at once.
Here's a simple solution using group by and case when.
select id
,count(*) as 'total count'
,count(case status when 3 then 1 end) as 'count(status3)'
,count(case status when 2 then 1 end) as 'count(status2)'
,count(case status when 1 then 1 end) as 'count(status1)'
from t
group by id
id  total count  count(status3)  count(status2)  count(status1)
2   5            1               2               2
3   4            0               2               2
Here's a way to solve it using pivot.
select *
from (select status,id, count(*) over (partition by id) as "total count" from t) tmp
pivot (count(status) for status in ([1],[2],[3])) pvt
id  total count  1  2  3
3   4            2  2  0
2   5            2  2  1

Identify a FK which has the highest value from a list of values in its source table

I have following tables.
Part
id  name
1   Part 1
2   Part 2
3   Part 3
Operation
id  name  part_id  order
1   Op 1  1        10
2   Op 2  1        20
3   Op 3  1        30
4   Op 1  2        10
5   Op 2  2        20
6   Op 1  3        10
Lot
id  part_id  Operation_id
10  1        2
11  2        5
12  3        6
I am selecting the results from Lot table and I want to select a column last_Op which is based on the order value of the operation_id. If value of order for the operation_id is the highest for the respective part_id, return 1 else return 0
SELECT
id,
part_id,
operation_id,
last_Op
FROM Lot
expected result set based on the tables above.
id  part_id  operation_id  last_op
10  1        2             0
11  2        5             1
12  3        6             1
In the above example, the first row returns last_op = 0 because operation_id = 2 belongs to part_id = 1, whose highest order value is 30 (operation 3), while operation 2 has order = 20. Since the lot's operation_id does not point to the operation with the highest order value, 0 is returned.
The other two rows return 1 because operation_id 5 and 6 are associated with part_id 2 and 3 respectively and they are pointing towards the highest 'order' value.
If value of order for the operation_id is the highest for the respective part_id, return 1 else return 0
This sounds like window functions will help:
select l.*,
(case when o.order = o.max_order then 1 else 0 end) as last_op
from lot l left join
(select o.*,
max(o.order) over (partition by o.part_id) as max_order
from operations o
) o
on l.operation_id = o.id;
Note: order is a very poor name for a column because it is a SQL keyword.
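Because order is a keyword, the column has to be quoted wherever it is referenced. A minimal sketch of the same window-function approach, run on SQLite through Python's sqlite3 with the sample rows from the question; the double-quote quoting and the singular table name operation are choices made for this sketch, and other engines may need backticks or brackets instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table operation (id integer primary key, name text,
                        part_id integer, "order" integer);
create table lot (id integer primary key, part_id integer,
                  operation_id integer);
insert into operation values
    (1,'Op 1',1,10),(2,'Op 2',1,20),(3,'Op 3',1,30),
    (4,'Op 1',2,10),(5,'Op 2',2,20),(6,'Op 1',3,10);
insert into lot values (10,1,2),(11,2,5),(12,3,6);
""")

query = """
select l.id, l.part_id, l.operation_id,
       case when o."order" = o.max_order then 1 else 0 end as last_op
from lot l
left join (
    -- attach each part's highest "order" value to every operation row
    select op.*,
           max(op."order") over (partition by op.part_id) as max_order
    from operation op
) o on l.operation_id = o.id
order by l.id
"""

last_ops = conn.execute(query).fetchall()
for row in last_ops:
    print(row)
```

Lot 10 gets 0 because operation 2 has order 20 while part 1's maximum is 30; lots 11 and 12 point at the last operation of their parts and get 1.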

SQL Query - Convert data values into attributes in another table.

I am building a report and I am stuck formulating a query. I am bringing the following data from multiple tables after a lot of joins.
ID TYPE RATING
----- ---- ------
ID_R1 A 1
ID_R1 B 3
ID_R2 A 2
ID_R2 B 1
ID_R3 A 4
ID_R3 B 4
ID_R4 A 2
ID_R4 B 3
ID_R5 A 2
ID_R5 B 3
Every ID will have a Rating for Type A and Type B, so what I need to do is transform the above into the following
ID Type_A_Rating Type_B_Rating
----- ------------- -------------
ID_R1 1 3
ID_R2 3 1
ID_R3 4 4
ID_R4 2 3
ID_R5 2 3
I have tried GROUP BY and other techniques, but so far I am unable to come up with a solution. Need help F1! F1!
p.s just for the record my end game is getting the count of (A,B) combinations
Type_A_Rating Type_B_Rating Count
------------- ------------- -----
1 1 0
1 2 0
1 3 1
1 4 0
2 1 0
2 2 0
2 3 2
2 4 0
3 1 1
3 2 0
3 3 0
3 4 0
4 1 0
4 2 0
4 3 0
4 4 1
From this you can see that a simple GROUP BY with any combination of AND/OR conditions doesn't suffice until I get the data into the shape shown above. I could use two intermediate/temp tables, one holding Type_A_Rating per ID and the other Type_B_Rating per ID, and then combine them in a third step, but isn't there a better way?
This should work as an engine-agnostic SQL solution (provided that there is exactly one row with type A and exactly one row with type B for each ID):
select
TA.ID,
TA.RATING as Type_A_Rating,
TB.RATING as Type_B_Rating
from
(select ID, RATING
from T where TYPE = 'A') as TA
inner join
(select ID, RATING
from T where TYPE = 'B') as TB
on TA.ID = TB.ID
Related SQL Fiddle: http://sqlfiddle.com/#!9/7e6fd9/2
Alternative (simpler) solution:
select
ID,
sum(case when TYPE = 'A' then RATING else 0 end) as Type_A_Rating,
sum(case when TYPE = 'B' then RATING else 0 end) as Type_B_Rating
from
T
group by
ID
Fiddle: http://sqlfiddle.com/#!9/7e6fd9/3
EDIT:
The above is correct but both can be simplified a bit:
select TA.ID, TA.RATING as Type_A_Rating, TB.RATING as Type_B_Rating
from T TA join
T TB
on TA.ID = TB.ID and TA.type = 'A' and TB.type = 'B';
And (because I prefer NULL when there are no matches):
select ID,
max(case when TYPE = 'A' then RATING end) as Type_A_Rating,
max(case when TYPE = 'B' then RATING end) as Type_B_Rating
from T
group by ID
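For the stated end goal (counting the (A, B) combinations), the pivoted result can be grouped once more. A minimal sketch on SQLite via Python's sqlite3, loading the input rows exactly as listed in the question; the table name ratings and the aliases a_rating, b_rating and n are placeholders chosen for this sketch. It returns only combinations that actually occur; the zero-count rows in the asker's target table would need an extra cross join against the possible rating values, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ratings (id text, type text, rating integer)")
conn.executemany(
    "insert into ratings values (?, ?, ?)",
    [
        ("ID_R1", "A", 1), ("ID_R1", "B", 3),
        ("ID_R2", "A", 2), ("ID_R2", "B", 1),
        ("ID_R3", "A", 4), ("ID_R3", "B", 4),
        ("ID_R4", "A", 2), ("ID_R4", "B", 3),
        ("ID_R5", "A", 2), ("ID_R5", "B", 3),
    ],
)

query = """
with pivoted as (
    -- one row per id, with the A and B ratings side by side
    select id,
           max(case when type = 'A' then rating end) as a_rating,
           max(case when type = 'B' then rating end) as b_rating
    from ratings
    group by id
)
select a_rating, b_rating, count(*) as n
from pivoted
group by a_rating, b_rating
order by a_rating, b_rating
"""

combos = conn.execute(query).fetchall()
for row in combos:
    print(row)
```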

Count occurrences of field values as they are displayed in order

thanks in advance for the help and sorry for how the "table" looks. Here's my question...
Let's say I have a subquery with this table (imagine the bold as column headers) as its output -
id 1 1 2 3 3 3 3 4 5 6 6 6
action o c o c c o c o o c c c
I would like my new query to output -
id 1 1 2 3 3 3 3 4 5 6 6 6
action o c o c c o c o o c c c
ct 1 2 1 1 2 3 4 1 1 1 2 3
#c 0 1 0 1 2 2 3 0 0 1 2 3
#o 1 1 1 0 0 1 1 1 1 0 0 0
where ct stands for count. Basically, I want to count (for each id) the occurrences of consecutive id and action as they happen. Let me know if this makes sense, and if not, how I can clarify my question.
Note: I realize the lag/lead functions may be helpful in this situation, along with the row_number() function. Looking for as many creative solutions as possible!
You are looking for the row_number() analytic function:
select id, action, row_number() over (partition by id order by id) as ct
from table t;
For #c and #o, you want cumulative sum:
select id, action, row_number() over (partition by id order by id) as ct,
sum(case when action = 'c' then 1 else 0 end) over
(partition by id order by <some column here>) as "#c",
sum(case when action = 'o' then 1 else 0 end) over
(partition by id order by <some column here>) as "#o"
from table t;
The one caveat is that you need a way to specify the order of the rows -- an id or date time stamp or something. SQL result sets and tables are inherently unordered, so there is no inherent notion of one row coming before or after another.
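To make the ordering caveat concrete, here is a minimal sketch on SQLite via Python's sqlite3 in which an explicit seq column stands in for "some column here". The seq column, the table name events and the aliases n_c and n_o are assumptions added for this sketch; the question's data has no ordering column of its own.

```python
import sqlite3

ids = [1, 1, 2, 3, 3, 3, 3, 4, 5, 6, 6, 6]
actions = "o c o c c o c o o c c c".split()

conn = sqlite3.connect(":memory:")
conn.execute("create table events (seq integer, id integer, action text)")
# seq records the position of each row in the question's listing.
conn.executemany(
    "insert into events values (?, ?, ?)",
    [(seq, i, a) for seq, (i, a) in enumerate(zip(ids, actions), start=1)],
)

query = """
select id, action,
       row_number() over (partition by id order by seq) as ct,
       sum(case when action = 'c' then 1 else 0 end)
           over (partition by id order by seq) as n_c,
       sum(case when action = 'o' then 1 else 0 end)
           over (partition by id order by seq) as n_o
from events
order by seq
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With an ORDER BY inside the window and no explicit frame, each sum runs from the start of the partition up to the current row, which is what makes it cumulative.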
SQL> select id, action,
row_number() over(partition by id order by rowid) ct,
sum(decode(action,'c',1,0)) over(partition by id order by rowid) c#,
sum(decode(action,'o',1,0)) over(partition by id order by rowid) o#
from t1
/
ID A CT C# O#
---------- - ---------- ---------- ----------
1 o 1 0 1
1 c 2 1 1
2 o 1 0 1
3 c 1 1 0
3 c 2 2 0
3 o 3 2 1
3 c 4 3 1
4 o 1 0 1
5 o 1 0 1
6 c 1 1 0
6 c 2 2 0
6 c 3 3 0
P.S. Sorry Gordon, didn't see your post.