Select record with max value from each group with Query DSL - sql

I have a score table that stores players' scores, and I want to select one record per player: the one with that player's highest score.
Here is the table:
id | player_id | score | ...
1 | 1 | 10 | ...
2 | 2 | 21 | ...
3 | 3 | 9 | ...
4 | 1 | 30 | ...
5 | 3 | 2 | ...
Expected result:
id | player_id | score | ...
2 | 2 | 21 | ...
3 | 3 | 9 | ...
4 | 1 | 30 | ...
I can achieve that with pure SQL like this:
SELECT *
FROM player_score ps
WHERE ps.score =
(
SELECT max(ps2.score)
FROM player_score ps2
WHERE ps2.player_id = ps.player_id
)
Can you tell me how to write the same query with Querydsl? I found some solutions that use JPASubQuery, but that class doesn't work for me (my IDE cannot resolve it). I am using Querydsl 4.x. Thank you in advance.

JPASubQuery was removed in Querydsl 4. Use JPAExpressions.select instead. The subquery's own where(...) must be chained onto the subquery, inside eq(...). Your WHERE clause should look something like this, where playerScore2 is a second alias for the same entity:
.where(playerScore.score.eq(
    JPAExpressions.select(playerScore2.score.max())
        .from(playerScore2)
        .where(playerScore2.playerId.eq(playerScore.playerId))))
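
The correlated subquery from the question can be checked against the sample data before porting it to Querydsl. This is only a minimal sketch using Python's built-in sqlite3 module with an in-memory table; the table layout is an assumption based on the three columns shown in the question, and it says nothing about the exact SQL Querydsl emits.

```python
import sqlite3

# In-memory copy of the sample table (only the three visible columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_score (id INTEGER PRIMARY KEY, player_id INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO player_score VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 2, 21), (3, 3, 9), (4, 1, 30), (5, 3, 2)],
)

# Keep a row only if its score equals the maximum score of the same player.
top_scores = conn.execute(
    """
    SELECT ps.id, ps.player_id, ps.score
    FROM player_score ps
    WHERE ps.score = (
        SELECT MAX(ps2.score)
        FROM player_score ps2
        WHERE ps2.player_id = ps.player_id
    )
    ORDER BY ps.id
    """
).fetchall()

print(top_scores)  # [(2, 2, 21), (3, 3, 9), (4, 1, 30)]
```

Note that a player whose top score appears in two rows would get both rows back, since the filter compares scores rather than ids.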

Related

Select max value from column for every value in other two columns

I'm working on a web app that tracks TV shows, and I need the ids of all episodes that are season finales, that is, the episode with the highest episode number in each season of each TV show.
This is a simplified version of my "episodes" table.
id | tvshow_id | season | epnum
---|-----------|--------|-------
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 1 | 1 | 3
4 | 1 | 2 | 1
5 | 1 | 2 | 2
6 | 2 | 1 | 1
7 | 2 | 1 | 2
8 | 2 | 1 | 3
9 | 2 | 1 | 4
10 | 2 | 2 | 1
11 | 2 | 2 | 2
The expected output:
id
---|
3 |
5 |
9 |
11 |
I've managed to get this working for the latest season but I can't make it work for all seasons.
I've also tried to take some ideas from this but I can't seem to find a way to add the tvshow_id in there.
I'm using Postgres v10
SELECT id
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY tvshow_id, season ORDER BY epnum DESC) AS ranking
    FROM tbl
) c
WHERE ranking = 1
You can use the SQL below to get your result, using GROUP BY in a subquery:
select id from tab_x
where (tvshow_id,season,epnum) in (
select tvshow_id,season,max(epnum)
from tab_x
group by tvshow_id,season)
Below is a simple query that gets the desired result. It should also perform well, thanks to the DISTINCT ON clause:
select
distinct on (tvshow_id,season)
id
from your_table
order by tvshow_id,season ,epnum desc
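
The window-function and GROUP BY variants above can be compared on the sample rows. The sketch below uses Python's sqlite3 module rather than Postgres, so it assumes a bundled SQLite recent enough for window functions and row values (3.25 or later), and it leaves out DISTINCT ON because that clause is Postgres-specific. The table name episodes is taken from the question's description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE episodes (id INTEGER PRIMARY KEY, tvshow_id INTEGER,"
    " season INTEGER, epnum INTEGER)"
)
conn.executemany(
    "INSERT INTO episodes VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 1, 3), (4, 1, 2, 1), (5, 1, 2, 2),
        (6, 2, 1, 1), (7, 2, 1, 2), (8, 2, 1, 3), (9, 2, 1, 4),
        (10, 2, 2, 1), (11, 2, 2, 2),
    ],
)

# Variant 1: rank episodes inside each (tvshow_id, season) and keep rank 1.
window_ids = [row[0] for row in conn.execute(
    """
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY tvshow_id, season
                                  ORDER BY epnum DESC) AS ranking
        FROM episodes
    ) AS ranked
    WHERE ranking = 1
    ORDER BY id
    """
)]

# Variant 2: match each row against the per-season maximum episode number.
group_ids = [row[0] for row in conn.execute(
    """
    SELECT id FROM episodes
    WHERE (tvshow_id, season, epnum) IN (
        SELECT tvshow_id, season, MAX(epnum)
        FROM episodes
        GROUP BY tvshow_id, season
    )
    ORDER BY id
    """
)]

print(window_ids, group_ids)  # [3, 5, 9, 11] [3, 5, 9, 11]
```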

Aggregate values of grouped SQL results

I have a query like the below:
SELECT value
FROM people
GROUP BY id
With people table structure like:
... | id | value
----------------
... | 1 | 5.43
... | 1 | 4.92
... | 1 | 1.22
... | 2 | 2.11
... | 2 | 1.00
... | 3 | 4.33
... | 4 | 9.12
... | 5 | 4.43
... | 5 | 5.09
... |...| ...
This would return a result set like the below:
id | value
----------
1 | 5.43
2 | 2.11
3 | 4.33
4 | 9.12
5 | 4.43
...| ...
It only takes the first value per id, but I want to aggregate them; e.g. the value for the grouped id = 1 would be 3.86. I'm not sure what the SQL for this is, or even whether it is possible. Any ideas?
Do you mean average?
SELECT id,avg(value)
FROM people
GROUP BY id
Looks like you're trying to get an average.
SELECT id, avg(value)
FROM people
GROUP BY id
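
To confirm that AVG with GROUP BY gives the 3.86 mentioned in the question, here is a small sketch using Python's sqlite3 module. It loads only the rows visible in the question (the elided rows are left out) and rounds to two decimals for display, which is an addition for readability and not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        (1, 5.43), (1, 4.92), (1, 1.22),
        (2, 2.11), (2, 1.00),
        (3, 4.33),
        (4, 9.12),
        (5, 4.43), (5, 5.09),
    ],
)

# One output row per id, holding the mean of that id's values.
averages = dict(
    conn.execute("SELECT id, ROUND(AVG(value), 2) FROM people GROUP BY id")
)

print(averages[1])  # 3.86
```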

Error in executing two groupbys in sparkSQL

I am new to Spark SQL and was experimenting with some queries.
This is the query I am trying to execute:
sqlContext.sql("SELECT id, category, AVG(mark) FROM data GROUP BY id, category")
I am not getting the proper output when I run the query.
Instead of the actual value of category, I am getting values such as 1, 2, 3.
I have been stuck on this weird error for a long time,
but when I run a simple select statement or a single group by, it works perfectly:
sqlContext.sql("SELECT id, category FROM data")
sqlContext.sql("SELECT id, AVG(mark) FROM data GROUP BY id")
What is wrong? Does Spark SQL have a problem with grouping by multiple columns?
Right now I am running this more complex query:
sqlContext.sql("SELECT data.id, data.category, AVG(id_avg.met_avg) FROM (SELECT id, AVG(mark) AS met_avg FROM data GROUP BY id) AS id_avg, data GROUP BY data.category, data.id")
This works, but takes longer to execute.
Please help.
Sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output should be:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
Please try this query:
SELECT
data.id
, data.category
, AVG(mark)
FROM data
GROUP BY
data.id
, data.category
Based on this sample data:
|id | category | marks
| 1 | a | 40
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
| 1 | a | 30
The output WILL be this:
|id | category | avg
| 1 | a | 35
| 2 | b | 44
| 3 | a | 50
| 4 | b | 40
and, the following expected row cannot be produced using group by:
| 5 | a | 30
That is a bug in Spark SQL.
Try the next version; it's fixed there.
I got the proper output using spark-1.0.2.
It also worked with pure Scala code. Try either of them :)
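
As a reference point for what the two-column GROUP BY should return, the same query can be run on the sample data outside Spark. The sketch below uses Python's sqlite3 module, so it only illustrates the expected SQL semantics and does not reproduce or test the Spark behaviour described above. The column is named mark as in the query (the sample table header says marks).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, category TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(1, "a", 40), (2, "b", 44), (3, "a", 50), (4, "b", 40), (1, "a", 30)],
)

# Group on both columns; category comes back as its original text value.
result = conn.execute(
    """
    SELECT id, category, AVG(mark)
    FROM data
    GROUP BY id, category
    ORDER BY id
    """
).fetchall()

print(result)  # [(1, 'a', 35.0), (2, 'b', 44.0), (3, 'a', 50.0), (4, 'b', 40.0)]
```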

SQL query: please help weird output on SQL query. (Counting each client latest status)

I asked this question on SO before and got a solution; however, that SQL query produced only partially correct results. I have tried to figure it out, but it seems too complicated for me to understand. Would you kindly help me? Thank you.
For the table tbl_sActivity:
act_id| Client_id | act_status| user_id | act_date
1 | 7 | warm | 1 | 19/7/12
2 | 7 | dealed | 1 | 30/7/12 <- latest status of client(7)
3 | 8 | hot | 1 | 6/8/12 <- latest status of client(8)
4 | 5 | cold | 22 | 7/8/12 <- latest status of client(5)
5 | 6 | cold | 1 | 16/7/12
6 | 6 | warm | 1 | 18/7/12
7 | 6 | dealed | 1 | 7/8/12 <- latest status of client(6)
8 | 9 | warm | 26 | 2/8/12
9 | 10 | warm | 26 | 2/8/12
10 | 9 | hot | 26 | 4/8/12 <- latest status of client(9)
11 | 10 | hot | 26 | 4/8/12
12 | 10 | dealed | 26 | 10/8/12 <- latest status of client(10)
13 | 13 | dealed | 26 | 8/8/12 <- latest status of client(13)
14 | 11 | hot | 25 | 8/8/12
15 | 11 | dealed | 25 | 14/8/12 <- latest status of client(11)
I want to produce a user progress report that shows how each user is following up with his/her clients.
The report must therefore show, for each user, the number of clients grouped by their latest status.
The correct output should look like this:
user_id | act_status | Count(act_status)
1 | dealed | 2
1 | hot | 1
22 | cold | 1
25 | dealed | 1
26 | hot | 1
26 | dealed | 2
The SQL query that I had is below:
select user_id, act_status, count(act_Status)
from your_table
where act_date in (
select max(act_date)
from your_table
group by Client_id
)
group by user_id, act_status
When I add more transactions to the database, the output goes wrong, like this:
user_id | act_status | Count(act_status)
1 | dealed | 2
1 | hot | 1
22 | cold | 1
25 | hot | 1 ** (this one shouldn't show up)
25 | dealed | 1
26 | hot | 2 ** (should be only '1')
26 | dealed | 2
Tried and tested:
SELECT S.user_id, S.act_status, Count(S.act_status) AS CountOfact_status
FROM tbl_sActivity AS S INNER JOIN [SELECT A.client_id, Max(A.act_date) AS max_act
FROM tbl_sActivity AS A
GROUP BY A.client_id]. AS S2 ON (S.act_date = S2.max_act) AND (S.client_id = S2.client_id)
GROUP BY S.user_id, S.act_status
This is similar to a request I answered yesterday. Please try to read through my query as well as just using it; this will hopefully help you understand it, and it is the only way to learn : )
Wouldn't it be better to use the primary key column, act_id, instead of the date column, and thus get the max primary key?
What I meant was, instead of this:
where act_date in (
select max(act_date)
from your_table
group by Client_id
)
something like this:
where act_id in (
select max(act_id)-- for each user, can't think of syntax now, but this could point you into the right direction
)
BugFinder's answer looks better.
select user_id, act_status, count(act_status)
from your_table inner join
(select client_id, max(act_date) as act_date
from your_table
group by client_id
) as t on t.client_id = your_table.client_id and t.act_date = your_table.act_date
group by user_id, act_status
This should give you what you wanted.
That is, make a table of the max dates by client, then join it back to your table on both fields to be precise.
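
The sample data in the question already reproduces the wrong counts: filtering on act_date alone also keeps rows whose date happens to equal some other client's latest date (for example, client 10's "hot" row on 4/8/12 matches client 9's latest date). The sketch below shows both queries side by side using Python's sqlite3 module. Storing the dates as ISO strings is an assumption made here so that MAX() orders them correctly; the original column type is not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_sActivity (act_id INTEGER PRIMARY KEY, client_id INTEGER,"
    " act_status TEXT, user_id INTEGER, act_date TEXT)"
)
# Dates rewritten as ISO strings (an assumption) so text comparison is chronological.
conn.executemany(
    "INSERT INTO tbl_sActivity VALUES (?, ?, ?, ?, ?)",
    [
        (1, 7, "warm", 1, "2012-07-19"), (2, 7, "dealed", 1, "2012-07-30"),
        (3, 8, "hot", 1, "2012-08-06"), (4, 5, "cold", 22, "2012-08-07"),
        (5, 6, "cold", 1, "2012-07-16"), (6, 6, "warm", 1, "2012-07-18"),
        (7, 6, "dealed", 1, "2012-08-07"), (8, 9, "warm", 26, "2012-08-02"),
        (9, 10, "warm", 26, "2012-08-02"), (10, 9, "hot", 26, "2012-08-04"),
        (11, 10, "hot", 26, "2012-08-04"), (12, 10, "dealed", 26, "2012-08-10"),
        (13, 13, "dealed", 26, "2012-08-08"), (14, 11, "hot", 25, "2012-08-08"),
        (15, 11, "dealed", 25, "2012-08-14"),
    ],
)

# Original approach: the filter ignores which client a max date belongs to.
date_only = conn.execute(
    """
    SELECT user_id, act_status, COUNT(act_status)
    FROM tbl_sActivity
    WHERE act_date IN (SELECT MAX(act_date) FROM tbl_sActivity GROUP BY client_id)
    GROUP BY user_id, act_status
    ORDER BY user_id, act_status
    """
).fetchall()

# Join on client_id AND date, so only each client's own latest row survives.
client_and_date = conn.execute(
    """
    SELECT s.user_id, s.act_status, COUNT(s.act_status)
    FROM tbl_sActivity AS s
    INNER JOIN (
        SELECT client_id, MAX(act_date) AS max_act
        FROM tbl_sActivity
        GROUP BY client_id
    ) AS s2 ON s.client_id = s2.client_id AND s.act_date = s2.max_act
    GROUP BY s.user_id, s.act_status
    ORDER BY s.user_id, s.act_status
    """
).fetchall()

print(date_only)
print(client_and_date)
```

With this data the first query returns the extra (25, 'hot', 1) row and counts (26, 'hot') twice, while the second returns the six expected rows. If a client can have two activities on the same date, the join would still keep both, which is where the max(act_id) idea may help.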

How to count distinct records

Could anybody please help me with an SQL command?
I have a table (tbl_sActivity) that has the data below:
user_id | client_id | act_status
1 | 7 | cold
1 | 7 | dealed
22 | 5 | cold
1 | 6 | cold
1 | 6 | warm
1 | 6 | hot
1 | 6 | dealed
1 | 8 | warm
1 | 8 | dealed
21 | 4 | warm
21 | 4 | dealed
The output should be:
user_id | Count_C_id
1 | 3
21 | 1
22 | 1
I've searched the net and learnt that MS Access does not support COUNT(DISTINCT), so I've been stuck at this stage for days.
Try this one. The "trick" is to have a subquery first to get all the distinct combinations of user and client IDs and then do the grouping per user:
SELECT
user_id
, COUNT(*) AS count_distinct_clients
FROM
( SELECT DISTINCT
user_id,
client_id
FROM tbl_sActivity
) AS tmp
GROUP BY
user_id ;
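
The DISTINCT-subquery approach can be tried on the sample rows as follows. This is a sketch using Python's sqlite3 module and not MS Access, so it only checks the logic of the query; Access-specific syntax details are not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_sActivity (user_id INTEGER, client_id INTEGER, act_status TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_sActivity VALUES (?, ?, ?)",
    [
        (1, 7, "cold"), (1, 7, "dealed"), (22, 5, "cold"),
        (1, 6, "cold"), (1, 6, "warm"), (1, 6, "hot"), (1, 6, "dealed"),
        (1, 8, "warm"), (1, 8, "dealed"),
        (21, 4, "warm"), (21, 4, "dealed"),
    ],
)

# Inner query collapses duplicates to one row per (user, client) pair;
# the outer query then counts those pairs per user.
client_counts = conn.execute(
    """
    SELECT user_id, COUNT(*) AS count_distinct_clients
    FROM (
        SELECT DISTINCT user_id, client_id
        FROM tbl_sActivity
    ) AS tmp
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()

print(client_counts)  # [(1, 3), (21, 1), (22, 1)]
```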
An alternative is to write the query without a subquery.
Note that the code below is T-SQL for SQL Server, where COUNT(DISTINCT ...) is available; it does not apply to MS Access:
-- Temp table
CREATE TABLE #TempStudent(userId int, c_id int, Name varchar(MAX))
SELECT userId AS UserId, COUNT(DISTINCT c_id) AS C_ID FROM #TempStudent
GROUP BY userId