I have three recommendation models, each of which writes its output to a separate table.
Recommendation 1: Ideally, I want to take the top 2 recommendations per user from this table, ordered by ProductRecommendation ascending.
Recommendation 2: Ideally, I want to take the top 3 recommendations per user from this table, based on the highest score.
Recommendation 3: Take the remaining recommendations from this table to bring the total up to 5 recommendations per user.
In the end, I want a final output that merges all the recommendations into one table, which would look like this.
I want to take the top 5 recommendations across the 3 tables. Note that not every user ID appears in every table. Ideally, I want the top 2 from Recommendation1 and the top 3 from Recommendation2. Recommendation3 is only a fallback: if the first two tables do not provide enough recommendations, Recommendation3 fills the gap so that I end up with 5 results per user ID. I don't need Recommendation3 at all if I can get 5 recommendations from the first two tables (2 from Recommendation1 and 3 from Recommendation2). When Recommendation1 has fewer than 2 recommendations for a user, I want to take the remainder from Recommendation2. For example, if there is 1 recommendation in Recommendation1, take 4 from Recommendation2; if there are 0 recommendations in Recommendation1, take 5 from Recommendation2. Only if Recommendation1 and Recommendation2 together don't add up to 5 do I need to refer to Recommendation3. I need to do this in BigQuery SQL. Can you please help?
Thanks for your help.
Consider the approach below:
with output1 as (
select *, null as Score, row_number() over win pos
from Recommendation1
where true
qualify row_number() over win <= 2
window win as (partition by UserID order by ProductRecommendation)
), output2 as (
select *, 2 + row_number() over win pos
from Recommendation2
where not (UserID, ProductRecommendation) in (select as struct UserID, ProductRecommendation from output1)
qualify row_number() over win <= 5
window win as (partition by UserID order by Score desc)
), output3 as (
select *, 7 + row_number() over win pos
from Recommendation3
where not (UserID, ProductRecommendation) in (select as struct UserID, ProductRecommendation from output1)
and not (UserID, ProductRecommendation) in (select as struct UserID, ProductRecommendation from output2)
qualify row_number() over win <= 5
window win as (partition by UserID order by Score desc)
)
select * except(pos) from (
select * from output1 union all
select * from output2 union all
select * from output3
)
where true
qualify row_number() over win <=5
window win as (partition by UserID order by pos)
# order by UserID, pos
If applied to the sample data in your question, the output is
Your description is a bit unclear. The following takes 2 rows from the first table for each user, 3 from the second, and additional rows from the third. The outer query then ensures that there are 5 rows (if available) for each user:
select r.*
from ((select userid, recommendation, 1 as which
from recommendation1
where 1=1
qualify row_number() over (partition by userid order by recommendation) <= 2
) union all
(select userid, recommendation, 2 as which
from recommendation2
where 1=1
qualify row_number() over (partition by userid order by score desc) <= 3
) union all
(select userid, recommendation, 3 as which
from recommendation3
)
) r
where 1=1
qualify row_number() over (partition by userid order by which) <= 5;
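To check either query against the rule described in the question, it can help to have the expected result computed independently. The following is a minimal plain-Python sketch of that rule, not a translation of either query. The function name `merge_recommendations` and its parameters are hypothetical, and skipping products already picked from an earlier table is an assumption borrowed from the first answer.

```python
from collections import defaultdict

def merge_recommendations(rec1, rec2, rec3, total=5, first_quota=2):
    """rec1: list of (user, product); rec2 and rec3: list of (user, product, score).

    Hypothetical helper: returns {user: [products]} with at most `total` items.
    """
    by_user = defaultdict(lambda: ([], [], []))
    for user, product in rec1:
        by_user[user][0].append(product)
    for user, product, score in rec2:
        by_user[user][1].append((score, product))
    for user, product, score in rec3:
        by_user[user][2].append((score, product))

    result = {}
    for user, (r1, r2, r3) in by_user.items():
        # Up to 2 from the first table, ordered by product ascending.
        picked = sorted(r1)[:first_quota]
        # Then fill from the second table, then the third, by score descending.
        for source in (r2, r3):
            for score, product in sorted(source, key=lambda sp: -sp[0]):
                if len(picked) == total:
                    break
                # Assumption: a product is not recommended twice to one user.
                if product not in picked:
                    picked.append(product)
        result[user] = picked
    return result
```

With this rule a user with only one row in the first table receives four rows from the second, and the third table is consulted only when the first two together yield fewer than five.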
Related
I'm running a query like this:
SELECT id FROM table
WHERE table.type IN (1, 2, 3)
LIMIT 15
This returns a random sampling. I might have 7 items from class_1 and 3 items from class_2. I would like to return exactly 5 items from each class, and the following code works:
SELECT id FROM (
SELECT id, type FROM table WHERE type = 1 LIMIT 5
UNION
SELECT id, type FROM table WHERE type = 2 LIMIT 5
UNION ...
ORDER BY type ASC)
This gets unwieldy if I want a random sampling from ten classes, instead of only three. What is the best way to do this?
(I'm using Presto/Hive, so any tips for those engines would be appreciated).
Use a function like row_number to do this. This makes the selection independent of the number of types.
SELECT id,type
FROM (SELECT id, type, row_number() over(partition by type order by id) as rnum --adjust the partition by and order by columns as needed
FROM table
) T
WHERE rnum <= 5
I would strongly suggest adding ORDER BY. Anyway, you can do something like:
with
x as (
select
id,
type,
row_number() over(partition by type order by id) as rn
from table
)
select * from x where rn <= 5
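For readers who want to see what the `row_number() over (partition by type order by id)` filter does step by step, here is a small plain-Python sketch of the same idea. The function name `top_n_per_type` and the `(id, type)` tuple layout are assumptions made for illustration only.

```python
from itertools import groupby
from operator import itemgetter

def top_n_per_type(rows, n=5):
    """rows: iterable of (id, type). Returns up to n rows per type, lowest ids first."""
    # Sorting by (type, id) plays the role of PARTITION BY type ORDER BY id.
    ordered = sorted(rows, key=itemgetter(1, 0))
    result = []
    for _, group in groupby(ordered, key=itemgetter(1)):
        # enumerate(..., start=1) numbers the rows like row_number().
        for rnum, row in enumerate(group, start=1):
            if rnum <= n:
                result.append(row)
    return result
```

The number of types never appears in the code, which is the point made above: adding a tenth class needs no extra branch.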
In my table I have 4 columns: Id, Type, InitialRanking and FinalRanking. Based on certain criteria I've managed to apply InitialRanking to the records (1-20). I now need to apply FinalRanking by identifying the top 7 of Type 1 followed by the top 3 of Type 2. Then I need to repeat the above until all records have a FinalRanking. My goal would be to achieve the output in the final column of the attached image.
The 7 & 3 will vary over time but for the purposes of this example let’s say they are fixed.
You can try something like this:
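-- Sketch in SQL Server syntax (TOP); "table" is a placeholder name.
-- This only picks the first block of 7 and 3; it does not repeat the pattern.
SELECT ID, TYPE, INITIALRANK, FINALRANK
FROM (SELECT TOP 7 ID, TYPE, INITIALRANK, FINALRANK
      FROM table WHERE TYPE = 1
      ORDER BY INITIALRANK) t1
UNION ALL
SELECT ID, TYPE, INITIALRANK, FINALRANK
FROM (SELECT TOP 3 ID, TYPE, INITIALRANK, FINALRANK
      FROM table WHERE TYPE = 2
      ORDER BY INITIALRANK) t2
UNION ALL
SELECT ID, TYPE, INITIALRANK, FINALRANK
FROM table WHERE TYPE NOT IN (1, 2)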
A simple (or simplistic) approach to your Final Rank would be the following:
row_number() over (partition by type order by initrank) +
case type
when 1 then (ceil((row_number() over (partition by type order by initrank))/7)-1)*(10-7)
when 2 then (ceil((row_number() over (partition by type order by initrank))/3)-1)*(10-3)+7
end FinalRank
This can be generalized to more than two groups. For example, with three groups of size 7, 3 and 2, the pattern size is 7+3+2=12. The general form is PartitionedRowNum + (Ceil(PartitionedRowNum/GroupSize)-1)*(PatternSize-GroupSize) + Offset, where the offset is the sum of the preceding group sizes:
row_number() over (partition by type order by initrank) +
case type
when 1 then (ceil((row_number() over (partition by type order by initrank))/7)-1)*(12-7)
when 2 then (ceil((row_number() over (partition by type order by initrank))/3)-1)*(12-3)+7
when 3 then (ceil((row_number() over (partition by type order by initrank))/2)-1)*(12-2)+7+3
end FinalRank
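The general form can be checked numerically outside the database. Below is a minimal Python sketch for the two-group case (7 and 3, pattern size 10); `final_rank` is a hypothetical helper name, and `n` stands for the partitioned row number.

```python
def final_rank(n, group_size, pattern_size, offset):
    """n is the 1-based row number within the type (the partitioned row_number)."""
    # (n - 1) // group_size equals ceil(n / group_size) - 1 for positive integers.
    blocks_before = (n - 1) // group_size
    return n + blocks_before * (pattern_size - group_size) + offset

# 14 rows of type 1 and 6 rows of type 2, pattern 7 + 3 = 10.
type1 = [final_rank(n, 7, 10, 0) for n in range(1, 15)]
type2 = [final_rank(n, 3, 10, 7) for n in range(1, 7)]
print(type1)  # [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 16, 17]
print(type2)  # [8, 9, 10, 18, 19, 20]
```

Together the two lists cover 1 to 20 with no gaps or collisions. One caveat when porting the SQL: in dialects where dividing two integers truncates (SQL Server, for instance), `ceil(row_number/7)` would need a decimal divisor such as `7.0`, whereas the integer form used in the sketch avoids the issue.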
I have a table like the one below:
Id RC_CLASS RC_DATE RC_TYPE
14 FI-321619 22-Jan-16 S
14 FI-399481 29-Jan-16 D
14 FI-321619 20-Jan-17 S
Here is what I tried:
SELECT *
FROM (SELECT rc.*,
RANK() OVER (PARTITION BY ID,RC_CLASS order by rc__date) AS LATEST_VERSION
FROM table
)
WHERE LATEST_VERSION = 1
ORDER BY rc_vendorid;
Expected output
Id RC_CLASS RC_DATE RC_TYPE
14 FI-399481 29-Jan-16 D
14 FI-321619 20-Jan-17 S
I want to group by ID and class and return the top row sorted by RC_DATE. What I am getting is always the top one based on date; the partition does not seem to work. What is missing?
I think you are very close. Basically, you just need a descending sort to get the latest version:
SELECT rc.*
FROM (SELECT rc.*,
RANK() OVER (PARTITION BY ID, RC_CLASS ORDER BY rc_date DESC) AS LATEST_VERSION
FROM table rc
) rc
WHERE LATEST_VERSION = 1
ORDER BY rc_vendorid;
I note that you use RANK() for this. This can return duplicates, if you have two rows on the same date. If that is not desirable, you can use ROW_NUMBER() which would arbitrarily choose one (if all the other keys are the same).
I'd like to select the 3 best results of a rank() function for each partition.
For instance, in this query:
SELECT id, rank() over (PARTITION BY year order by ...) as rank
FROM table1
GROUP BY year
I'd like to have the 3 best ranked rows for every year.
I can manage that by wrapping it in a new query:
Select *
from ...
where rank <= 3
but then if I have ties, I'll get more than 3 rows per year.
Does anyone have an idea how to solve that?
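We don't have much information about your table and query structures, but as a generic solution I'd suggest adding row_number() over (partition by year order by ... desc) as rn and filtering on it with where rn <= 3. Unlike rank(), row_number() never assigns the same number twice within a partition, so ties cannot push the result above 3 rows per year.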
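The difference between the two functions on tied values can be seen in a small plain-Python sketch for a single partition. The helper name `rank_and_row_number` and the sample scores are made up for illustration.

```python
def rank_and_row_number(scores):
    """scores: values of one partition. Returns (score, rank, row_number), best first."""
    ordered = sorted(scores, reverse=True)
    out = []
    for i, s in enumerate(ordered, start=1):
        # Like RANK(): tied values share the position of the first equal value.
        rank = ordered.index(s) + 1
        out.append((s, rank, i))
    return out

rows = rank_and_row_number([90, 85, 85, 85, 70])
by_rank = [r for r in rows if r[1] <= 3]        # filter on rank
by_row_number = [r for r in rows if r[2] <= 3]  # filter on row_number
print(len(by_rank), len(by_row_number))  # 4 3
```

Filtering on the rank keeps all three tied rows plus the leader, while filtering on the row number keeps exactly three. Which of the tied rows survive is arbitrary unless the ordering includes a tie-breaker column.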
I have 3 tables in my database:
table 1 (users)
userid(PK)
EmployeeName
table 2 (SubDept)
SubDeptID(PK)
Department
table 3 (SubDeptTransfer)
TransferID(PK)
userid(FK)
SubDeptID(FK)
Here is my example table for Table 3:
What I want to do is print the SubDeptID of user 100. The problem is that there are two rows with userid 100, so it prints both. The goal is to print only the one row with the later TransferID. What would be the best select statement for this?
The best way to do this is using the window function row_number():
select transferId, userId, subDeptId
from (select t.*,
row_number() over (partition by userid order by TransferId desc) as seqnum
from t
) t
where seqnum = 1
I would do it like so:
SELECT subDeptId FROM SubDeptTransfer WHERE userId = 100 ORDER BY transferId DESC LIMIT 1
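The two answers differ in scope: the window-function query returns the latest transfer for every user, while the LIMIT 1 query answers the question for user 100 only. A minimal Python sketch of the "latest row per user" logic follows. The function name and the sample rows are hypothetical, since the original example table is not reproduced here.

```python
def latest_transfer_per_user(transfers):
    """transfers: iterable of (transfer_id, user_id, sub_dept_id).

    Returns {user_id: row}, keeping the row with the highest transfer_id per user.
    """
    latest = {}
    for transfer_id, user_id, sub_dept_id in transfers:
        if user_id not in latest or transfer_id > latest[user_id][0]:
            latest[user_id] = (transfer_id, user_id, sub_dept_id)
    return latest

# Hypothetical sample data: user 100 was transferred twice.
sample = [(1, 100, 10), (2, 101, 11), (3, 100, 12)]
result = latest_transfer_per_user(sample)
print(result[100][2])  # 12
```

Looking up a single key in the result corresponds to the second query; the full dictionary corresponds to the first.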