Selecting highest number (of 2 letters and number) - hive

Hello,
I would like to create a rule that will run only on class_id=12 and the output shows all the class_ids I have in the table - what is the right way to achieve it?
Each book_id consists of two letters followed by a number.
For each Student_id I would like to pick the book_id with the highest number (NV5602 in the example above). How can I do this?
HQL
Thanks

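Your first point is still not clear. For the second point, you can try the query below. It partitions by Student_id so that each student gets its own ranking, and it casts the numeric part of Book_id to an integer so that the comparison is numeric rather than alphabetical:
SELECT Student_id, Book_id
FROM (SELECT Student_id,
             Book_id,
             ROW_NUMBER() OVER (PARTITION BY Student_id
                                ORDER BY CAST(SUBSTR(Book_id, 3) AS INT) DESC) AS RN
      FROM YOUR_TABLE) X
WHERE RN = 1;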
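To see why the partition and the numeric cast matter, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions) rather than Hive. The sample rows are hypothetical, apart from NV5602 which comes from the question; with a plain string sort, ZZ99 would outrank NV5602 because "99" sorts after "5602".

```python
import sqlite3

# Hypothetical sample data; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (student_id INTEGER, book_id TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "AB123"), (1, "NV5602"), (1, "ZZ99"), (2, "CD7"), (2, "EF4500")],
)

query = """
SELECT student_id, book_id
FROM (SELECT student_id,
             book_id,
             ROW_NUMBER() OVER (
                 PARTITION BY student_id                            -- rank per student
                 ORDER BY CAST(SUBSTR(book_id, 3) AS INTEGER) DESC  -- numeric, not text
             ) AS rn
      FROM your_table) x
WHERE rn = 1
ORDER BY student_id
"""
highest_books = conn.execute(query).fetchall()
print(highest_books)
```

Each student contributes exactly one row, the one whose numeric suffix is largest.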

Related

SQL numbering with different Code restarted

I have a different code for each record, let's say S, W and I.
When the first record is created for code S, the Id column should be S001, and the next one S002. If a new record is then created for W, the numbering starts again from 001, so its Id would be W001, and so on.
How can I create this type of ID? The table looks like:
Id GroupCode Address
---------------------------
S001 S Brisbane
Ff.
You can use row_number():
select t.*,
       concat(GroupCode,
              format(row_number() over (partition by GroupCode order by (select null)), '000')
             ) as new_id
from t;
Note that your table has not specified an ordering, so the assignment of values is arbitrary.
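As a rough illustration of the per-group numbering, the sketch below runs an equivalent query in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later). Several things here are assumptions and not part of the original question: the `seq` column, added to give the rows a defined order, the lowercase `group_code` column name, the use of SQLite's `printf` in place of `format`, and every address other than Brisbane.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "seq" is a hypothetical insertion-order column so the numbering is deterministic.
conn.execute(
    "CREATE TABLE t (seq INTEGER PRIMARY KEY, group_code TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO t (group_code, address) VALUES (?, ?)",
    [("S", "Brisbane"), ("W", "Sydney"), ("S", "Perth"),
     ("I", "Darwin"), ("W", "Hobart")],
)

query = """
SELECT group_code || printf('%03d', ROW_NUMBER() OVER (
           PARTITION BY group_code   -- numbering restarts for each code
           ORDER BY seq)) AS new_id,
       address
FROM t
ORDER BY seq
"""
new_ids = conn.execute(query).fetchall()
print(new_ids)
```

Note that this computes the Id at query time; it does not store it or protect against gaps when rows are deleted.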

How do I grab each student’s 3rd max assignment mark in each subject

I am trying to write a SQL query that selects each student's 3rd best assignment mark in each subject. I have tried the query below but it isn't working. I am getting this error: [Code: 0, SQL State: 21000] ERROR: more than one row returned by a subquery used as an expression. I would be grateful for any answers.
The table structure is: Students, Courses(Id), a bridging table called StudentsCourses(ID, StudentID, CourseID), and an assignment table which has StudentsCourse(FK) and Grade.
select max(Assignments.Grade)
from Assignments
where grade < (select max(Assignments.Grade)
from Assignments
where grade < (select max(Assignments.Grade)
from Assignments
group by Assignments.StudentCourseID))
You can use window functions:
select *
from (
    select a.*, row_number() over(partition by student_id, subject_id order by grade desc) as rn
from assignments a
) a
where rn = 3
Your question is a bit unclear about the structure of table assignments. This assumes that a student is identified by student_id and a subject by subject_id; you may need to adjust that to your actual column names.
Use row_number():
select a.*
from (select a.*,
row_number() over (partition by student_id, StudentCourseID order by grade desc) as seqnum
from assignments a
) a
where seqnum = 3;
Note: If all the assignments have the same value, this will return the highest value.
If you want the third highest distinct score, then use dense_rank() instead of row_number().
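The difference between the two functions is easiest to see with tied grades. Here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25 or later); the `student_course_id` column name and the grades are made up for illustration, and the `third_best` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assignments (student_course_id INTEGER, grade INTEGER)")
conn.executemany(
    "INSERT INTO assignments VALUES (?, ?)",
    # Course 1 has a tie for the top grade; course 2 has only two assignments.
    [(1, 90), (1, 90), (1, 85), (1, 70), (2, 60), (2, 55)],
)

def third_best(rank_function):
    """Return the rows ranked 3rd per student_course_id by the given window function."""
    query = f"""
    SELECT DISTINCT student_course_id, grade
    FROM (SELECT a.*,
                 {rank_function}() OVER (PARTITION BY student_course_id
                                         ORDER BY grade DESC) AS seqnum
          FROM assignments a) ranked
    WHERE seqnum = 3
    ORDER BY student_course_id
    """
    return conn.execute(query).fetchall()

by_row_number = third_best("ROW_NUMBER")  # third row, ties counted separately
by_dense_rank = third_best("DENSE_RANK")  # third distinct grade
print(by_row_number, by_dense_rank)
```

With row_number() the two 90s occupy positions 1 and 2, so 85 comes third; with dense_rank() they share rank 1, so the third distinct grade is 70. Course 2 returns nothing either way because it has fewer than three assignments.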

Snowflake SQL code to show only second record for items with duplicate ID

I'm trying to get my head around SQL and am using Snowflake as a testbed to do this. I have a table with products which have multiple reviews against them. I am trying to structure a query to only show products with 2 or more reviews and then only show the second review. As I say, this is merely me trying to better understand SQL so selecting the second review is a random ask. The table is made up of 4 columns. 1 is Product ID, 2 is Product Name, 3 is Review and 4 is Date Review was posted.
Thanks in advance for any help.
You use row_number() for this type of query:
select t.*
from (select t.*,
row_number() over (partition by product_id order by date_review asc) as seqnum
from t
) t
where seqnum = 2;
You can use a windowing function like ROW_NUMBER() to make numbered groupings, e.g.:
WITH Review_Sequence AS (
SELECT r.*,
ROW_NUMBER() OVER (PARTITION BY Product_ID ORDER BY Review_Date) Review_No
FROM Reviews r
)
SELECT * FROM Review_Sequence WHERE Review_No = 2

Unable to find Max Age of a Player

I am new to SQL.
I want to find out which player is the oldest.
Here is my table.
My query gives an error.
Can you please tell me what I am doing wrong?
Thanks.
select * from players
where age = (select max(age) as Oldest_Player from players);
limit 1
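SQL Server has a SELECT TOP clause, which lets you retrieve a set number of rows. You can do SELECT TOP 1 name AS 'Oldest Person' FROM players ORDER BY age DESC. In databases that use LIMIT instead (as your query does), the equivalent is SELECT name AS 'Oldest Person' FROM players ORDER BY age DESC LIMIT 1.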
What this will do is: first retrieve all the players, sort them by age descending (oldest first), then take the first one.
You can use row_number as below:
Select * from (
Select *, RowN = Row_Number() over(order by age desc) from Players
) a Where a.RowN = 1
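The query in the question most likely fails because the semicolon after the subquery ends the statement, leaving limit 1 as a separate, invalid statement. The sketch below, using Python's sqlite3 module with made-up players, shows two working variants: the question's subquery without the stray semicolon, which returns every player tied for the maximum age, and an ORDER BY with LIMIT, which returns exactly one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, age INTEGER)")
# Hypothetical rows; Bob and Cy are tied for oldest.
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Ann", 31), ("Bob", 38), ("Cy", 38), ("Dee", 25)],
)

# Variant 1: the question's approach, minus the premature semicolon.
all_oldest = conn.execute(
    """
    SELECT name, age
    FROM players
    WHERE age = (SELECT MAX(age) FROM players)
    ORDER BY name
    """
).fetchall()

# Variant 2: sort and keep one row; the name tie-breaker makes it deterministic.
one_oldest = conn.execute(
    "SELECT name, age FROM players ORDER BY age DESC, name LIMIT 1"
).fetchall()

print(all_oldest, one_oldest)
```

Which variant is right depends on whether ties should all be reported or only one row is wanted.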

SQL random sample with groups

I have a university graduate database and would like to extract a random sample of data of around 1000 records.
I want to ensure the sample is representative of the population so would like to include the same proportions of courses eg
I could do this using the following:
select top 500 id from degree where coursecode = 1 order by newid()
union
select top 300 id from degree where coursecode = 2 order by newid()
union
select top 200 id from degree where coursecode = 3 order by newid()
but we have hundreds of course codes, so this would be time consuming. I would also like to be able to reuse this code for different sample sizes, and I don't particularly want to go through the query and hard-code the sample sizes.
Any help would be greatly appreciated
You want a stratified sample. I would recommend doing this by sorting the data by course code and doing an nth sample. Here is one method that works best if you have a large population size:
select d.*
from (select d.*,
row_number() over (order by coursecode, newid()) as seqnum,
count(*) over () as cnt
from degree d
) d
where seqnum % (cnt / 500) = 1;
EDIT:
You can also calculate the population size for each group "on the fly":
select d.*
from (select d.*,
row_number() over (partition by coursecode order by newid()) as seqnum,
count(*) over () as cnt,
count(*) over (partition by coursecode) as cc_cnt
from degree d
) d
where seqnum <= 500 * (cc_cnt * 1.0 / cnt)
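A quick way to check the proportional approach is to run it on a toy population. This sketch uses Python's sqlite3 module (assuming SQLite 3.25 or later), with RANDOM() standing in for newid() and a hypothetical population of 500, 300 and 200 graduates in three course codes. The filter is written with integer arithmetic (`seqnum * cnt <= n * cc_cnt`), which is the same condition as `seqnum <= n * cc_cnt / cnt` without floating-point rounding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE degree (id INTEGER PRIMARY KEY, coursecode INTEGER)")
# Hypothetical population: 500 / 300 / 200 graduates in course codes 1 / 2 / 3.
population = [
    (code,) for code, size in ((1, 500), (2, 300), (3, 200)) for _ in range(size)
]
conn.executemany("INSERT INTO degree (coursecode) VALUES (?)", population)

SAMPLE_SIZE = 100  # reusable parameter instead of hard-coded per-course sizes

query = """
SELECT id, coursecode
FROM (SELECT d.*,
             ROW_NUMBER() OVER (PARTITION BY coursecode
                                ORDER BY RANDOM()) AS seqnum,   -- shuffle within course
             COUNT(*) OVER () AS cnt,                           -- total population
             COUNT(*) OVER (PARTITION BY coursecode) AS cc_cnt  -- course population
      FROM degree d) d
WHERE seqnum * cnt <= ? * cc_cnt
"""
sample = conn.execute(query, (SAMPLE_SIZE,)).fetchall()

# Count how many sampled rows fall into each course code.
counts = {}
for _id, code in sample:
    counts[code] = counts.get(code, 0) + 1
print(counts)
```

Which rows are picked changes on every run, but the per-course counts stay proportional to the population. When the proportions do not divide evenly, each course is rounded down, so the total can come out slightly below the requested sample size.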
Add a table for storing population.
I think it should be like this:
SELECT *
FROM (
SELECT id, coursecode, ROW_NUMBER() OVER (PARTITION BY coursecode ORDER BY NEWID()) AS rn
FROM degree) t
LEFT OUTER JOIN
population p ON t.coursecode = p.coursecode
WHERE
rn <= p.SampleSize
It is not necessary to partition the population at all.
If you are taking a sample of 1000 from a population among hundreds of course codes, it stands to reason that many of those course codes will not be selected at all in any one sampling.
If the population is uniform (say, a continuous sequence of student IDs), a uniformly-distributed sample will automatically be representative of population weighting by course code. Since newid() is a uniform random sampler, you're good to go out of the box.
The only wrinkle that you might encounter is if a student ID is associated with multiple course codes. In this case, make a unique list (temporary table or subquery) containing a sequential id, student id and course code, then sample the sequential id from it, grouping by student id to remove duplicates.
I've done similar queries (but not on MS SQL) using a ROW_NUMBER approach:
select ...
from
( select ...
,row_number() over (partition by coursecode order by newid()) as rn
from degree
) as d
join sample_size as s
on d.coursecode = s.coursecode
and d.rn <= s.samplesize