SQL query to take the top elements of an ordered list on Apache Hive

I have the table below in an SQL database.
user rating
1 10
1 7
1 6
1 2
2 8
2 3
2 2
2 2
I would like to keep only the two best ratings for each user, giving:
user rating
1 10
1 7
2 8
2 3
What would be the SQL query to do that? I am not sure how to do it.

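Number each user's ratings from highest to lowest with row_number() and keep the first two. Hive supports both CTEs and windowing functions. The leading semicolon is a T-SQL habit that Hive does not need, and `user` is backquoted because recent Hive versions treat USER as a reserved keyword:
with cte as
(select `user`, rating,
        -- 1 = best rating for this user, 2 = second best, ...
        row_number() over (partition by `user` order by rating desc) as maxval
 from yourtable)
select `user`, rating
from cte
where maxval in (1, 2)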
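To check the row_number() pattern without a Hive cluster, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for Hive here (window functions need SQLite 3.25 or later), and the table name `ratings` and the helper `top_n_per_user` are hypothetical.

```python
import sqlite3

# Sample data from the question: (user, rating)
ROWS = [(1, 10), (1, 7), (1, 6), (1, 2), (2, 8), (2, 3), (2, 2), (2, 2)]

# Same ROW_NUMBER() idea as the Hive answer; SQLite stands in for Hive.
TOP_N_SQL = """
WITH ranked AS (
    SELECT user, rating,
           ROW_NUMBER() OVER (PARTITION BY user ORDER BY rating DESC) AS rn
    FROM ratings
)
SELECT user, rating
FROM ranked
WHERE rn <= ?
ORDER BY user, rating DESC
"""


def top_n_per_user(rows, n):
    """Return the n highest ratings for each user, best first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE ratings (user INTEGER, rating INTEGER)")
        conn.executemany("INSERT INTO ratings VALUES (?, ?)", rows)
        return conn.execute(TOP_N_SQL, (n,)).fetchall()
    finally:
        conn.close()


result = top_n_per_user(ROWS, 2)
print(result)
```

On the question's data this returns [(1, 10), (1, 7), (2, 8), (2, 3)]; binding n as a parameter keeps the cut-off adjustable.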

Related

Recursive query with CTE

I need some help with a query.
I already have a CTE with the following data:
ApplicationID CandidateId JobId Row
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 1
5 2 2 2
6 2 5 3
7 3 2 1
8 3 6 2
9 3 3 3
I need to assign one job per candidate so that no JobId appears more than once in the result.
For each candidate, in order, the query should select the first available JobId that has not already been taken by a previous candidate. I expect this result:
ApplicationID CandidateId JobId Row
1 1 1 1
5 2 2 2
8 3 6 2
I have never written a recursive CTE, and even after reading about them I don't fully understand how to apply one here. Any help would be appreciated.
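The following query (SQL Server) returns the expected result.
WITH CTE AS
(
-- anchor: the application with the lowest ApplicationID
-- (ORDER BY is not allowed before UNION ALL, so MIN() replaces TOP 1 ... ORDER BY)
SELECT *, ROW_NUMBER() OVER(ORDER BY ApplicationID) N,
CONVERT(varchar(max), CONCAT(',',JobId,',')) Jobs
FROM ApplicationCandidateCTE
WHERE ApplicationID = (SELECT MIN(ApplicationID) FROM ApplicationCandidateCTE)
UNION ALL
-- recursive step: later candidates whose JobId is not yet in the Jobs list;
-- ROW_NUMBER() is evaluated per recursion level, so N = 1 is the earliest match
SELECT a.*, ROW_NUMBER() OVER(ORDER BY a.ApplicationID),
CONCAT(b.Jobs, a.JobId, ',') Jobs
FROM ApplicationCandidateCTE a JOIN CTE b
ON a.ApplicationID > b.ApplicationID AND
a.CandidateId > b.CandidateId AND
CHARINDEX(CONCAT(',',a.JobId,','), b.Jobs) = 0 AND
b.N = 1
)
SELECT * FROM CTE WHERE N = 1
OPTION (MAXRECURSION 0); -- the default limit is 100 recursion levels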
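However, note the following caveats:
Each recursion level joins every matching later application before filtering to N = 1, so the intermediate row count can grow large on big tables.
The concatenated JobId list grows with every assigned candidate; varchar(max) holds up to 2 GB, so this only matters for very large inputs.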
See dbfiddle.
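A portable sketch of the same greedy assignment, run on SQLite through Python's sqlite3 module (SQLite is a stand-in for SQL Server here). SQLite does not allow window functions in the recursive member, so this version swaps ROW_NUMBER() for a correlated subquery with ORDER BY ... LIMIT 1 that picks the earliest application of a later candidate whose job is still free. The table name `apps`, the CTE name `pick` and the helper `assign_jobs` are hypothetical, and the Row column is omitted because the logic does not need it.

```python
import sqlite3

# (ApplicationID, CandidateId, JobId) from the question
APPLICATIONS = [
    (1, 1, 1), (2, 1, 2), (3, 1, 3),
    (4, 2, 1), (5, 2, 2), (6, 2, 5),
    (7, 3, 2), (8, 3, 6), (9, 3, 3),
]

GREEDY_SQL = """
WITH RECURSIVE pick(app_id, cand, job, taken) AS (
    -- anchor: the application with the lowest ApplicationID
    SELECT ApplicationID, CandidateId, JobId, ',' || JobId || ','
    FROM apps
    WHERE ApplicationID = (SELECT MIN(ApplicationID) FROM apps)
    UNION ALL
    -- step: earliest application of a later candidate whose job is not taken yet;
    -- when no such row exists the subquery yields NULL and the recursion stops
    SELECT a.ApplicationID, a.CandidateId, a.JobId, p.taken || a.JobId || ','
    FROM pick AS p, apps AS a
    WHERE a.ApplicationID = (
        SELECT x.ApplicationID
        FROM apps AS x
        WHERE x.CandidateId > p.cand
          AND instr(p.taken, ',' || x.JobId || ',') = 0
        ORDER BY x.ApplicationID
        LIMIT 1
    )
)
SELECT app_id, cand, job FROM pick ORDER BY app_id
"""


def assign_jobs(rows):
    """Greedily give each candidate the first job no earlier candidate took."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE apps (ApplicationID INTEGER, CandidateId INTEGER, JobId INTEGER)"
        )
        conn.executemany("INSERT INTO apps VALUES (?, ?, ?)", rows)
        return conn.execute(GREEDY_SQL).fetchall()
    finally:
        conn.close()


result = assign_jobs(APPLICATIONS)
print(result)
```

Because each recursion step emits at most one row, the working set stays at one row per assigned candidate; on the question's data the result is applications 1, 5 and 8 with jobs 1, 2 and 6. A candidate with no free job is simply skipped.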

How to COUNT in a specific column after GROUP BY

I'm stuck writing an SQL statement and would appreciate some help.
Current data
items table
id session_id item_id competition_id
1 1 2 1
2 1 3 1
2 1 2 1
2 1 2 1
2 1 5 2
3 1 7 2
4 1 4 2
5 1 5 2
What I want:
group by competition_id, count how many times each item_id occurs, and extract the most common item_id together with its count.
For example:
competition_id 1: item_id 2, count 3
competition_id 2: item_id 5, count 2
competition_id 3: ...
competition_id 4: ...
Environment
macOS Big Sur
Ruby 2.7.0
Rails 6.1.1
SQLite
In statistics, what you are asking for is the mode, the most common value.
You can use aggregation and row_number() (SQLite supports window functions from version 3.25.0):
select ci.*
from (select competition_id, item_id, count(*) as cnt,
             row_number() over (partition by competition_id order by count(*) desc) as seqnum
      from items
      group by competition_id, item_id
     ) ci
where seqnum = 1;
If there are ties, this returns only one of the tied values, chosen arbitrarily. If you want all modes when there are ties, use rank() instead of row_number().
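A minimal sketch of the rank() variant on SQLite (which the question uses), driven from Python's sqlite3 module with the question's sample rows. The counting is moved into its own CTE so the window only orders by a plain column; the helper name `modes_per_competition` is hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# items table from the question: (id, session_id, item_id, competition_id)
ITEMS = [
    (1, 1, 2, 1), (2, 1, 3, 1), (2, 1, 2, 1), (2, 1, 2, 1),
    (2, 1, 5, 2), (3, 1, 7, 2), (4, 1, 4, 2), (5, 1, 5, 2),
]

MODE_SQL = """
WITH counts AS (
    -- how often each item_id occurs per competition
    SELECT competition_id, item_id, COUNT(*) AS cnt
    FROM items
    GROUP BY competition_id, item_id
), ranked AS (
    -- RANK() gives tied counts the same rank, so every mode survives
    SELECT competition_id, item_id, cnt,
           RANK() OVER (PARTITION BY competition_id ORDER BY cnt DESC) AS rnk
    FROM counts
)
SELECT competition_id, item_id, cnt
FROM ranked
WHERE rnk = 1
ORDER BY competition_id, item_id
"""


def modes_per_competition(rows):
    """Return (competition_id, item_id, count) for every mode per competition."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE items (id INTEGER, session_id INTEGER,"
            " item_id INTEGER, competition_id INTEGER)"
        )
        conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
        return conn.execute(MODE_SQL).fetchall()
    finally:
        conn.close()


result = modes_per_competition(ITEMS)
print(result)
```

On the sample rows this gives [(1, 2, 3), (2, 5, 2)], matching the expected output in the question.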

How to find the most frequently repeated column?

ID UserID LevelID
1 1 1
2 1 2
3 1 2
4 1 2
5 2 1
6 2 3
7 3 2
8 4 1
9 4 1
The query should return LevelID 1 (3 times): the LevelID value shared by the largest number of different users (UserID).
I have the following query:
SELECT LevelID, COUNT(LevelID) AS 'Occurrence'
FROM
(
SELECT DISTINCT * FROM
(
SELECT UserID, LevelID
FROM SampleTable
) cv
) levels
GROUP BY LevelID
ORDER BY 'Occurrence' DESC
Which returns:
LevelID Occurrence
1 3
2 2
3 1
But it doesn't let me add LIMIT 1; at the end to retrieve only the top row. What's wrong with the query?
SQL Server does not support LIMIT; its row-limiting clauses are TOP and OFFSET ... FETCH. There is also no need for several levels of nesting: use aggregation with count(distinct ...), order the results, and use TOP to keep only the top record:
select top(1) levelID, count(distinct userID) cnt
from mytable
group by levelID
order by cnt desc
If you want to allow possible top ties, then use top (1) with ties instead of just top (1).
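For comparison, a sketch of the same query on SQLite (via Python's sqlite3 module), which uses LIMIT instead of TOP. SQLite has no WITH TIES option, so the second query emulates it by keeping every level whose distinct-user count equals the maximum. The helper name `top_levels` is hypothetical.

```python
import sqlite3

# (ID, UserID, LevelID) from the question
SAMPLE = [
    (1, 1, 1), (2, 1, 2), (3, 1, 2), (4, 1, 2), (5, 2, 1),
    (6, 2, 3), (7, 3, 2), (8, 4, 1), (9, 4, 1),
]

# SQLite/MySQL-style row limiting: LIMIT instead of TOP (1)
TOP_ONE_SQL = """
SELECT LevelID, COUNT(DISTINCT UserID) AS cnt
FROM SampleTable
GROUP BY LevelID
ORDER BY cnt DESC
LIMIT 1
"""

# Emulates TOP (1) WITH TIES: keep every level whose count equals the maximum
WITH_TIES_SQL = """
WITH counts AS (
    SELECT LevelID, COUNT(DISTINCT UserID) AS cnt
    FROM SampleTable
    GROUP BY LevelID
)
SELECT LevelID, cnt
FROM counts
WHERE cnt = (SELECT MAX(cnt) FROM counts)
ORDER BY LevelID
"""


def top_levels(rows):
    """Return (single top level, all levels tied for the top count)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE SampleTable (ID INTEGER, UserID INTEGER, LevelID INTEGER)"
        )
        conn.executemany("INSERT INTO SampleTable VALUES (?, ?, ?)", rows)
        top = conn.execute(TOP_ONE_SQL).fetchall()
        ties = conn.execute(WITH_TIES_SQL).fetchall()
        return top, ties
    finally:
        conn.close()


top, ties = top_levels(SAMPLE)
print(top, ties)
```

On the question's data both queries return level 1 with 3 distinct users; the second one only differs when several levels share the top count.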

Retrieve unique rows based on id

I have two tables:
Report
ReportId CreatedDate
1 2018-01-12
2 2018-02-12
3 2018-03-12
ReportSpecialty
SpecialtyId ReportId IsPrimarySpecialty
1 1 1
2 2 1
3 3 1
1 2 0
1 3 0
I am trying to write a query that retrieves the last 10 published reports, but I need to get 1 report from each specialty. Assume there are 100 specialties; I can pass any number of specialties as an argument (10, 20, 5, 2, etc.).
I'm trying to find a way so that if I pass it all specialties, it returns the last 10 reports by CreatedDate without returning 2 reports from the same specialty. If I pass 10 specialties, I get 1 of each. If I pass 5, I get 2 of each. If I pass 3, I get 4 from one and 3 from each of the other two.
I may need to write multiple queries for this, but is there a way to do it on the SQL side? If not, how would I break it down into multiple queries to get the result I want?
This is what I have tried, but it returns multiple reports with the same specialty:
SELECT TOP 10 r.ReportId, rs.SpecialtyId, r.CreatedDate
FROM Report r
INNER JOIN ReportSpecialty rs ON r.ReportId = rs.ReportId AND rs.IsPrimarySpecialty = 1
GROUP BY rs.SpecialtyId, r.ReportId, r.CreatedDate
ORDER BY r.CreatedDate DESC
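Number each specialty's reports from newest to oldest with ROW_NUMBER(), then take the top 10 ordered by that number, so every specialty contributes its newest report before any contributes a second one:
with cte as (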
SELECT R.ReportId, R.CreatedDate, RS.SpecialtyId,
ROW_NUMBER() OVER (PARTITION BY RS.SpecialtyId
ORDER BY R.CreatedDate DESC) as rn
FROM Report R
JOIN ReportSpecialty RS
ON R.ReportId = RS.ReportId
AND RS.IsPrimarySpecialty = 1
WHERE RS.SpecialtyId IN ( .... ids ... )
)
SELECT TOP 10 *
FROM cte
ORDER BY rn, CreatedDate DESC
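ROW_NUMBER() numbers the reports of each specialty from newest to oldest, so if you pass 3 specialties, ordering by rn gives the round-robin sequence below. TOP 10 then keeps the first 10 rows: 4 reports from one specialty and 3 from each of the other two.
rn SpecialtyId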
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
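A small sketch of the round-robin ordering on SQLite via Python's sqlite3 module, with LIMIT in place of TOP 10. The data is hypothetical: 15 reports cycling through specialties 1, 2 and 3, five each, all marked as primary. The helper names `build_db` and `latest_reports` are also hypothetical; the specialty ids are bound as parameters instead of being pasted into the IN list.

```python
import sqlite3
from collections import Counter
from datetime import date, timedelta


def build_db():
    """Hypothetical data: reports 1..15 on consecutive days, specialties 1, 2, 3 in turn."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Report (ReportId INTEGER PRIMARY KEY, CreatedDate TEXT)")
    conn.execute(
        "CREATE TABLE ReportSpecialty (SpecialtyId INTEGER, ReportId INTEGER,"
        " IsPrimarySpecialty INTEGER)"
    )
    start = date(2018, 1, 1)
    for report_id in range(1, 16):
        created = (start + timedelta(days=report_id)).isoformat()  # ISO dates sort as text
        conn.execute("INSERT INTO Report VALUES (?, ?)", (report_id, created))
        specialty = (report_id - 1) % 3 + 1
        conn.execute("INSERT INTO ReportSpecialty VALUES (?, ?, 1)", (specialty, report_id))
    return conn


def latest_reports(conn, specialty_ids, limit=10):
    """Newest reports, taking one per specialty per round (rn = 1, then 2, ...)."""
    placeholders = ", ".join("?" for _ in specialty_ids)
    sql = f"""
    WITH cte AS (
        SELECT r.ReportId, r.CreatedDate, rs.SpecialtyId,
               ROW_NUMBER() OVER (PARTITION BY rs.SpecialtyId
                                  ORDER BY r.CreatedDate DESC) AS rn
        FROM Report AS r
        JOIN ReportSpecialty AS rs
          ON r.ReportId = rs.ReportId AND rs.IsPrimarySpecialty = 1
        WHERE rs.SpecialtyId IN ({placeholders})
    )
    SELECT ReportId, SpecialtyId, rn
    FROM cte
    ORDER BY rn, CreatedDate DESC
    LIMIT ?
    """
    return conn.execute(sql, (*specialty_ids, limit)).fetchall()


conn = build_db()
rows = latest_reports(conn, [1, 2, 3])
per_specialty = Counter(spec for _, spec, _ in rows)
print(per_specialty)
```

With three specialties the 10 rows split 4/3/3; specialty 3 gets the extra report because its fourth-newest report is the most recent of the three, which matches the behaviour described in the question.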

How to select a random row when 2 rows have an equal property

I have a table of items with a priority, like this:
id priority
1 1
2 2
3 3
4 8
5 3
6 4
Currently I retrieve the items (SQL Server) in priority order, returning rows that share a priority in random order, with the following query:
select id, priority
from dbo.[table]
order by priority, newid()
This will return
id priority
1 1
2 2
3 3
5 3
6 4
4 8
or
id priority
1 1
2 2
5 3
3 3
6 4
4 8
So each of the two orders comes back roughly half the time.
I now need to return only one of the rows when two rows share a priority, for example:
id priority
1 1
2 2
3 3
6 4
4 8
or
id priority
1 1
2 2
5 3
6 4
4 8
You can use ROW_NUMBER, presuming SQL Server (because of NEWID):
WITH CTE AS
(
SELECT t.*, RN = ROW_NUMBER() OVER (PARTITION BY Priority
ORDER BY ID)
FROM dbo.[table] t -- TABLE is a reserved word, so the placeholder name needs brackets
)
SELECT * FROM CTE WHERE RN = 1
If these are the only columns, you could also use this simpler query:
SELECT MIN(t.ID) AS ID, t.Priority
FROM dbo.[table] t
GROUP BY t.Priority
Update: the asker replied, "No, I need to be able to get a random row when two (or more) priorities match."
Then I misunderstood your requirement. Put NEWID() in the window's ORDER BY instead:
WITH CTE AS
(
SELECT t.*, RN = ROW_NUMBER() OVER (PARTITION BY Priority
ORDER BY NEWID())
FROM dbo.[table] t -- TABLE is a reserved word, so the placeholder name needs brackets
)
SELECT * FROM CTE WHERE RN = 1
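A sketch of the NEWID() version on SQLite via Python's sqlite3 module, where SQLite's random() plays the role of NEWID() as the per-partition sort key. The table name `items` and the helper `one_per_priority` are hypothetical, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# (id, priority) from the question
ITEMS = [(1, 1), (2, 2), (3, 3), (4, 8), (5, 3), (6, 4)]

# random() stands in for SQL Server's NEWID(): it shuffles rows within a priority
ONE_PER_PRIORITY_SQL = """
WITH cte AS (
    SELECT id, priority,
           ROW_NUMBER() OVER (PARTITION BY priority ORDER BY random()) AS rn
    FROM items
)
SELECT id, priority
FROM cte
WHERE rn = 1
ORDER BY priority
"""


def one_per_priority(rows):
    """Return one randomly chosen (id, priority) row per priority, in priority order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE items (id INTEGER, priority INTEGER)")
        conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
        return conn.execute(ONE_PER_PRIORITY_SQL).fetchall()
    finally:
        conn.close()


result = one_per_priority(ITEMS)
print(result)
```

Priorities 1, 2, 4 and 8 always return ids 1, 2, 6 and 4; for priority 3, repeated calls return id 3 or id 5 at random.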