SQL Server Group By which counts occurrences of a score

This might be a bit difficult to explain, but I have two tables in my SQL Server database, which I have simplified:
Items: ID, itemName, voteCount, score
Votes: itemID, score
So, basically I am storing every vote that is placed in one table, but also a count of the number of votes for each item in the Items table along with its average score (out of 10) (which I know is duplicating data, but it makes things easier).
Anyway, I want to create a SQL query which finds the 2 items which have the lowest score. You would think this would be easy, as you'd just do this...
SELECT TOP 2 itemName FROM Items ORDER BY score ASC;
However, the client has added the following complication.
When 2 or more items have the same score then the item with the highest number of 10/10 votes would be placed above. If 2 or more items have the same score AND the same number of 10/10 votes then it would rank the item with the most 9/10 votes above the others and so on, right down to the number of 0/10 votes if everything else is equal.
So, the challenge is to rank all the items by these criteria then pick off the bottom 2. I have tried every combination of grouping, aggregating and "sub-querying" to work this out but I think I need the help of somebody much cleverer than me.
Any help would really be appreciated.
Clarification
The average score for an item is stored in the item table and the score cast against each vote is kept in the votes table. Initially we need to rank by the average score (I.score) and where 2 items have the same score we need to start counting the number of 10/10's in the votes linked to that item (v.score).
So, we might have an item called "t-shirt" which has an average score of 5/10. This comes from 6 votes with the following scores 5,5,5,5,5,5.
The next item is called "Ferrari" and also has an average score of 5/10, but this item only has 4 votes with the following scores 6,5,5,4
Clearly, the Ferrari should win because the SQL would see that it has no 10's, no 9's, no 8's and no 7's, but it does have a vote of 6, which trumps the t-shirt.

SELECT TOP 2 i.itemName
FROM Items i
left outer join (
select ItemID,
sum(case when score = 10 then 1 end) as Score10,
sum(case when score = 9 then 1 end) as Score9,
sum(case when score = 8 then 1 end) as Score8,
sum(case when score = 7 then 1 end) as Score7,
sum(case when score = 6 then 1 end) as Score6,
sum(case when score = 5 then 1 end) as Score5,
sum(case when score = 4 then 1 end) as Score4,
sum(case when score = 3 then 1 end) as Score3,
sum(case when score = 2 then 1 end) as Score2,
sum(case when score = 1 then 1 end) as Score1
from Votes
group by ItemID
) v on i.ID = v.ItemID
ORDER BY i.score,
v.Score10,
v.Score9,
v.Score8,
v.Score7,
v.Score6,
v.Score5,
v.Score4,
v.Score3,
v.Score2,
v.Score1
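To check the tie-break against the t-shirt / Ferrari example from the question, here is a minimal sketch. It makes several assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server (so TOP 2 becomes LIMIT 2), it generates the eleven count columns (10 down to 0) in a loop instead of typing them out, it wraps them in COALESCE so items without votes sort as zero, and the "Mug" item is invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (ID INTEGER PRIMARY KEY, itemName TEXT, voteCount INTEGER, score REAL);
    CREATE TABLE Votes (itemID INTEGER, score INTEGER);
    -- 'Mug' is hypothetical; the other two items come from the question.
    INSERT INTO Items VALUES (1, 't-shirt', 6, 5.0), (2, 'Ferrari', 4, 5.0), (3, 'Mug', 1, 9.0);
    INSERT INTO Votes VALUES (1,5),(1,5),(1,5),(1,5),(1,5),(1,5),
                             (2,6),(2,5),(2,5),(2,4),
                             (3,9);
""")

levels = range(10, -1, -1)  # 10, 9, ..., 0
counts = ",\n".join(
    f"COUNT(CASE WHEN score = {n} THEN 1 END) AS Score{n}" for n in levels
)
order = ", ".join(f"COALESCE(v.Score{n}, 0)" for n in levels)

query = f"""
    SELECT i.itemName
    FROM Items i
    LEFT JOIN (SELECT itemID, {counts} FROM Votes GROUP BY itemID) v
           ON i.ID = v.itemID
    ORDER BY i.score, {order}   -- everything ascending: worst item first
    LIMIT 2
"""
bottom_two = [row[0] for row in conn.execute(query)]
print(bottom_two)
```

Because every sort column is ascending, the item with the fewest high votes comes first, so the t-shirt ranks below the Ferrari.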

Related

Order competitors by multiple conditions

I use a concrete, but hypothetical, example.
Consider a database containing the results of a shooting competition, where each competitor made several series of shots. DB contains 3 tables: Competitors, Series and Shots.
Competitors:
id name
1 A
2 B
Series:
id competitorId
1 1
2 1
3 1
4 2
5 2
6 2
Shots:
id serieId score
1 1 8
2 1 8
3 1 8
4 2 10
5 2 7
6 2 6
7 3 10
8 3 8
9 3 6
10 4 8
11 4 8
12 4 7
13 5 7
14 5 10
15 5 7
16 6 7
17 6 10
18 6 7
(DDL with the above statements: dbfiddle)
What I need is to order competitors by multiple conditions, which are:
Total score of all series
Number of center hits (center hit has 10 points score)
The next step to order by is:
Highest score on last serie
Highest score on next to last serie
Highest score on next to next to last serie
...
and so on for the number of series in the competition.
The query that uses the first two order conditions is quite straightforward:
SELECT comp.name,
SUM(shots.score) AS score,
SUM(IIF(shots.score = 10, 1, 0)) AS centerHits
FROM Shots shots
INNER JOIN Series series ON series.id = shots.serieId
INNER JOIN Competitors comp ON comp.id = series.competitorId
GROUP BY comp.name
ORDER BY score DESC, centerHits DESC
It produces following results:
name score centerHits
A 71 2
B 71 2
With the 3rd order condition I expect B competitor to be above A, because both have the same total score, the same centerHits and the same score for the last serie (24), but the score of next to last serie of B is 24 while A's score is only 23.
I wonder if it's possible to make a query that uses the third and following order conditions.
You should be able to do this pretty simply, as your requirements can be done with normal aggregation and window functions.
For each level of ordering:
"Total score of all series" can be satisfied by summing all scores.
"Number of center hits (center hit has 10 points score)" can be satisfied with a conditional count.
To order by each series working backwards by date, we can aggregate the total score per series (which we calculate using a window function) using STRING_AGG, ordering the aggregation by date (or id). Then if we order the final query by that aggregation, the later series will be sorted first.
This method allows you to order by an arbitrary number of series, as opposed to the other answer.
It's unclear how you define "later" and "earlier" as you have no date column, but I've used series.id as a proxy for that.
SELECT
comp.name,
SUM(shots.score) as totalScore,
COUNT(CASE WHEN shots.score = 10 THEN 1 END) AS centerHits,
STRING_AGG(NCHAR(shots.MaxScore + 65), ',') WITHIN GROUP (ORDER BY series.id DESC) as AllShots
FROM (
SELECT *,
SUM(shots.score) OVER (PARTITION BY shots.serieID) MaxScore
FROM Shots shots
) shots
INNER JOIN Series series ON series.id = shots.serieId
INNER JOIN Competitors comp ON comp.id = series.competitorId
GROUP BY
comp.id,
comp.name
ORDER BY
totalScore DESC,
centerHits DESC,
AllShots DESC;
Note that when grouping by name, you should also add in the primary key to the GROUP BY as the name might not be unique.
A similar, but slightly more complex query, is to pre-aggregate shots in the derived table. This is likely to perform better than using a window function.
SELECT
comp.name,
SUM(shots.totalScore) as totalScore,
SUM(centerHits) AS centerHits,
STRING_AGG(NCHAR(shots.totalScore + 65), ',') WITHIN GROUP (ORDER BY series.id DESC) as AllShots
FROM (
SELECT
shots.serieId,
SUM(shots.score) as totalScore,
COUNT(CASE WHEN shots.score = 10 THEN 1 END) AS centerHits
FROM Shots shots
GROUP BY
shots.serieId
) shots
INNER JOIN Series series ON series.id = shots.serieId
INNER JOIN Competitors comp ON comp.id = series.competitorId
GROUP BY
comp.id,
comp.name
ORDER BY
totalScore DESC,
centerHits DESC,
AllShots DESC;
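As a cross-check of the expected ranking, here is a sketch of the same ordering idea using a different technique: the per-series totals are aggregated in SQL, but the "latest series first" tie-break is done client-side by comparing Python lists, not with STRING_AGG. It assumes SQLite via Python's sqlite3 module and, like the answer above, uses series.id as a proxy for time.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Competitors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Series (id INTEGER PRIMARY KEY, competitorId INTEGER);
    CREATE TABLE Shots (id INTEGER PRIMARY KEY, serieId INTEGER, score INTEGER);
    INSERT INTO Competitors VALUES (1,'A'),(2,'B');
    INSERT INTO Series VALUES (1,1),(2,1),(3,1),(4,2),(5,2),(6,2);
    INSERT INTO Shots VALUES
      (1,1,8),(2,1,8),(3,1,8),(4,2,10),(5,2,7),(6,2,6),(7,3,10),(8,3,8),(9,3,6),
      (10,4,8),(11,4,8),(12,4,7),(13,5,7),(14,5,10),(15,5,7),(16,6,7),(17,6,10),(18,6,7);
""")

# One row per series, latest series first within each competitor.
rows = conn.execute("""
    SELECT c.id, c.name,
           SUM(sh.score) AS totalScore,
           SUM(CASE WHEN sh.score = 10 THEN 1 ELSE 0 END) AS centerHits
    FROM Shots sh
    JOIN Series se ON se.id = sh.serieId
    JOIN Competitors c ON c.id = se.competitorId
    GROUP BY c.id, c.name, se.id
    ORDER BY c.id, se.id DESC
""").fetchall()

series_by_competitor = defaultdict(list)
for comp_id, name, total, hits in rows:
    series_by_competitor[(comp_id, name)].append((total, hits))

def sort_key(entry):
    _competitor, series = entry
    totals = [total for total, _ in series]      # latest series first
    center_hits = sum(hits for _, hits in series)
    # Tuples compare element by element; the list compares series by series.
    return (sum(totals), center_hits, totals)

ranked = sorted(series_by_competitor.items(), key=sort_key, reverse=True)
ranking = [name for (_id, name), _series in ranked]
print(ranking)
```

Both competitors total 71 with 2 center hits and 24 on the last series, so the order is decided by the next-to-last series (24 for B against 23 for A).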
It appears you need a multi-level query, each building on the one prior.
The INNER-MOST query with alias PQ is a simple sum on a per SerieID which gets the total Center Hits and total points for each respective set. Similar to what you had for the counting.
From that, you need to know which series is the latest (most recent) and work your way backwards to the prior and again prior to that. By using the OVER / PARTITION, I am joining to the series table to get the competitor ID and name while I'm at it.
By partitioning the data by competitor and ordering by SerieID descending, ROW_NUMBER() assigns 1 to the most recent series, 2 to the one before it, and so on. For Competitor A, who had SerieIDs 1, 2 and 3, the "MostRecent" column is 3, 2 and 1 respectively, so SerieID 3 = 1 (the most recent) and SerieID 1 = 3 (the oldest series for the competitor).
Similarly for the second competitor B, SerieIDs 4, 5 and 6 become 3, 2, 1 respectively. So now, you have a basis to know what was the latest (1 = most recent), the next to last (2 = next most recent), and next to next to last (3...)
Now that these two parts are all set, I can sum the respective totals and center hits, and I explicitly know which series is the most recent (1), the second latest (2) and the third from last (3) for sorting. These are added to the order by.
Now, if one competitor has 6 shooting series vs another having 4 series (not that it will happen in a real competition, but to understand the context), the 6th series will be the first competitor's MostRecent = 1, and similarly the 4th series will be the other competitor's MostRecent = 1.
So with the final group by at the COMPETITOR level, you can assess all the parts in question.
select
F.Name,
F.CompetitorID,
sum( F.SeriesTotalScore ) TotalScore,
sum( F.CenterHits ) CenterHits,
sum( case when F.MostRecent = 1
then F.SeriesTotalScore else 0 end ) MostRecentScore,
sum( case when F.MostRecent = 2
then F.SeriesTotalScore else 0 end ) SecondToMostRecentScore,
sum( case when F.MostRecent = 3
then F.SeriesTotalScore else 0 end ) ThirdToMostRecentScore
from
( select
c.Name,
Se.CompetitorID,
PQ.SerieId,
PQ.CenterHits,
PQ.SeriesTotalScore,
ROW_NUMBER() OVER( PARTITION BY Se.CompetitorID
order by PQ.SerieId DESC) AS MostRecent
from
( select
s.serieId,
sum( case when s.score = 10 then 1 else 0 end ) as CenterHits,
sum( s.Score ) SeriesTotalScore
from
Shots s
group by
s.SerieID ) PQ
Join Series Se
on PQ.SerieID = se.id
JOIN Competitors c
on Se.CompetitorID = c.id
) F
group by
F.Name,
F.CompetitorID
order by
sum( F.SeriesTotalScore ) desc,
sum( F.CenterHits ) desc,
sum( case when F.MostRecent = 1
then F.SeriesTotalScore else 0 end ) desc,
sum( case when F.MostRecent = 2
then F.SeriesTotalScore else 0 end ) desc,
sum( case when F.MostRecent = 3
then F.SeriesTotalScore else 0 end ) desc

Creating row with different where

I have this code to get the number of users of all items in the list and the average level.
select itemId,count(c.characterid) as numberOfUse, avg(maxUpgrade) as averageLevel
from items i inner join characters c on i.characterId=c.characterId
where itemid in (22001,22002,22003,22004,22005,22006,22007,22008,22009,22010,22011,22012,22013,22014,22015,22016,22030,22031,22032,22033,22034,22035,22036,22037,22038,22039,22040,22041,22042,22050,22051,22052,22053,22054,22055,22056,22057,22058,22059,22060,22070,22071,22072,22073,22074,22075,22076,22077,22085,22086,22087,22091,22092)
and attached>0
group by itemId
It returns one column for the item (rune) ID, one for the number of users, and one for the average upgrade level of the players who use it, and it does that for all players on the server.
I would like to create a new pair of columns for every 10 levels, so I can see which item is used most depending on player level. The item level depends on the player level, so to select only a certain level range I use WHERE itemid>0 and itemid<10, repeat that for every 10 levels, copy the data, and paste it into a Google Sheet.
So I would like a result with these columns:
itemid use_1-10 avg_level_1-10 use_11-20 avg_level_11-20 etc...
That way I could copy all the results at once and not have to repeat the same process 15 times.
If I am following this correctly, you can do conditional aggregation. Assuming that a "level" is stored in column level in table characters, you would do:
select i.itemId,
sum(case when c.level between 1 and 10 then 1 else 0 end) as use_1_10,
avg(case when c.level between 1 and 10 then maxUpgrade end) as avg_level_1_10,
sum(case when c.level between 11 and 20 then 1 else 0 end) as use_11_20,
avg(case when c.level between 11 and 20 then maxUpgrade end) as avg_level_11_20,
...
from items i
inner join characters c on i.characterId = c.characterId
where i.itemid in (...) and attached > 0
group by i.itemId
Note: consider prefixing column attached in the where clause with the table it belongs to, in order to avoid ambiguity.
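Since the question mentions around 15 level bands, the repeated column pairs can be generated, not typed by hand. The following is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, it assumes (as the answer does) a level column on characters, it assumes maxUpgrade and attached live on items, and the helper name band_columns and the sample rows are hypothetical.

```python
import sqlite3

def band_columns(max_level, width=10):
    """Build one use_/avg_level_ column pair per level band (hypothetical helper)."""
    cols = []
    for lo in range(1, max_level + 1, width):
        hi = lo + width - 1
        cond = f"c.level BETWEEN {lo} AND {hi}"
        cols.append(f"SUM(CASE WHEN {cond} THEN 1 ELSE 0 END) AS use_{lo}_{hi}")
        cols.append(f"AVG(CASE WHEN {cond} THEN i.maxUpgrade END) AS avg_level_{lo}_{hi}")
    return ",\n           ".join(cols)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE characters (characterId INTEGER PRIMARY KEY, level INTEGER);
    CREATE TABLE items (itemId INTEGER, characterId INTEGER, maxUpgrade INTEGER, attached INTEGER);
    -- Invented sample data: three characters at levels 5, 8 and 15.
    INSERT INTO characters VALUES (1, 5), (2, 8), (3, 15);
    INSERT INTO items VALUES (22001, 1, 2, 1), (22001, 2, 4, 1), (22001, 3, 7, 1),
                             (22002, 3, 3, 1), (22002, 1, 9, 0);
""")

query = f"""
    SELECT i.itemId,
           {band_columns(20)}
    FROM items i
    INNER JOIN characters c ON i.characterId = c.characterId
    WHERE i.itemId IN (22001, 22002) AND i.attached > 0
    GROUP BY i.itemId
    ORDER BY i.itemId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A band with no matching characters yields 0 for the use count and NULL (None in Python) for the average, because AVG ignores the NULLs produced by the CASE without an ELSE.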

Add a Total Row and converting into percentages

I have a database with information about people's work condition and neighbourhood.
I have to display a chart of information in percentages like this:
Neighbourhood Total Employed Unemployed Inactive
Total 100 50 25 25
1 100 45 30 25
2 100 55 20 25
To do that, the code that I've made so far is:
select neighbourhood, Count (*) as Total,
Count(Case when (condition = 1) then 'employed' end) as employed,
Count (case when (condition = 2) then 'unemployed' end) as unemployed,
Count (Case when (condition =3) then 'Inactive' end) as Inactive
from table
group by neighbourhood
order by neighbourhood
The output of that code is (the absolute numbers are made up; they don't produce the percentages above):
Neighbourhood Total Employed Unemployed Inactive
1 600 300 200 100
2 450 220 159 80
So, I have to turn the absolute numbers into percentages and add the Total row (summing the values from the neighbourhoods), but all my efforts have failed. I can't work out how to add that Total row, nor how to get the total for each neighbourhood in order to calculate the percentages.
I started studying SQL just two weeks ago so I apologize for any inconvenience. I tried my best to keep it simple (in my database are 15 neighbourhoods and it's ok if they are labeled by numbers)
Thanks
You need a UNION ALL to add the total row:
select 'All' as neighbourhood, Count (*) as Total,
Count(Case when (condition = 1) then 1 end) as employed,
Count (case when (condition = 2) then 1 end) as unemployed,
Count (Case when (condition =3) then 1 end) as Inactive
from table
UNION all
select neighbourhood, Count (*) as Total,
Count(Case when (condition = 1) then 1 end) as employed,
Count (case when (condition = 2) then 1 end) as unemployed,
Count (Case when (condition =3) then 1 end) as Inactive
from table
group by neighbourhood
order by neighbourhood
You can add the total rows using grouping sets:
select neighbourhood, Count(*) as Total,
sum((condition = 1)::int) as employed,
sum((condition = 2)::int) as unemployed,
sum((condition = 3)::int) as Inactive
from table
group by grouping sets ( (neighbourhood), () )
order by neighbourhood;
If you want averages within each row, then use avg() rather than sum().
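To show the total row and the percentages together, here is a small sketch. It is written for SQLite via Python's sqlite3 module, which is an assumption; SQLite has no GROUPING SETS, so it uses the UNION ALL form, and the table name people, the column name cond and the sample rows are all hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (neighbourhood INTEGER, cond INTEGER);
    -- Invented data: 1 = employed, 2 = unemployed, 3 = inactive.
    INSERT INTO people VALUES (1,1),(1,1),(1,2),(1,3),
                              (2,1),(2,1),(2,1),(2,2);
""")

# The average of a 0/1 flag is the fraction of rows with that condition.
pct = ", ".join(
    f"100.0 * AVG(CASE WHEN cond = {code} THEN 1.0 ELSE 0.0 END) AS {name}"
    for code, name in [(1, "employed"), (2, "unemployed"), (3, "inactive")]
)

query = f"""
    SELECT label, employed, unemployed, inactive
    FROM (
        SELECT 'Total' AS label, 0 AS is_detail, {pct}
        FROM people
        UNION ALL
        SELECT CAST(neighbourhood AS TEXT), 1, {pct}
        FROM people
        GROUP BY neighbourhood
    )
    ORDER BY is_detail, label   -- total row first, then the neighbourhoods
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The neighbourhood is cast to text so that both branches of the UNION ALL return the same type in the label column, and the is_detail flag keeps the total row on top regardless of how the labels sort.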

Calculate percentages of columns in Oracle SQL

I have three columns, all consisting of 1's and 0's. For each of these columns, how can I calculate the percentage of people (one person is one row/ id) who have a 1 in the first column and a 1 in the second or third column in oracle SQL?
For instance:
id marketing_campaign personal_campaign sales
1 1 0 0
2 1 1 0
1 0 1 1
4 0 0 1
So in this case, of all the people who were subjected to a marketing_campaign, 50 percent were subjected to a personal campaign as well, but zero percent is present in sales (no one bought anything).
Ultimately, I want to find out the order in which people get to the sales moment. Do they first go from marketing campaign to a personal campaign and then to sales, or do they buy anyway regardless of these channels.
This is a fictional example, so I realize that in this example there are many other ways to do this, but I hope anyone can help!
The outcome that I'm looking for is something like this:
percentage marketing_campaign/ personal campaign = 50 %
percentage marketing_campaign/sales = 0%
etc (for all the three column combinations)
Use count, sum and case expressions, together with basic arithmetic operators +,/,*
COUNT(*) gives a total count of people in the table
SUM(column) gives a sum of 1 in given column
case expressions make possible to implement more complex conditions
The common pattern is X / COUNT(*) * 100 which is used to calculate a percent of given value ( val / total * 100% )
An example:
SELECT
-- percentage of people that have 1 in marketing_campaign column
SUM( marketing_campaign ) / COUNT(*) * 100 As marketing_campaign_percent,
-- percentage of people that have 1 in sales column
SUM( sales ) / COUNT(*) * 100 As sales_percent,
-- complex condition:
-- percentage of people (one person is one row/ id) who have a 1
-- in the first column and a 1 in the second or third column
COUNT(
CASE WHEN marketing_campaign = 1
AND ( personal_campaign = 1 OR sales = 1 )
THEN 1 END
) / COUNT(*) * 100 As complex_condition_percent
FROM table;
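Note that the figures the question expects (50% and 0%) use the rows with marketing_campaign = 1 as the denominator, whereas X / COUNT(*) divides by all rows. A minimal sketch of the conditional-denominator variant, assuming SQLite via Python's sqlite3 module instead of Oracle and a hypothetical table name campaigns, with the four sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaigns (id INTEGER, marketing_campaign INTEGER,
                            personal_campaign INTEGER, sales INTEGER);
    INSERT INTO campaigns VALUES (1,1,0,0),(2,1,1,0),(1,0,1,1),(4,0,0,1);
""")

# AVG skips NULLs, so a CASE without ELSE restricts the denominator
# to the rows where the first column is 1.
row = conn.execute("""
    SELECT 100.0 * AVG(CASE WHEN marketing_campaign = 1 THEN personal_campaign END)
               AS marketing_to_personal,
           100.0 * AVG(CASE WHEN marketing_campaign = 1 THEN sales END)
               AS marketing_to_sales,
           100.0 * AVG(CASE WHEN personal_campaign = 1 THEN sales END)
               AS personal_to_sales
    FROM campaigns
""").fetchone()
print(row)
```

This treats every row as one person, as the question states; it does not merge the two rows that share id 1.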
You can get your percentages like this :
SELECT COUNT(*),
ROUND(100*(SUM(personal_campaign) / sum(count(*)) over ()),2) perc_personal_campaign,
ROUND(100*(SUM(sales) / sum(count(*)) over ()),2) perc_sales
FROM (
SELECT ID,
CASE
WHEN SUM(personal_campaign) > 0 THEN 1
ELSE 0
end AS personal_campaign,
CASE
WHEN SUM(sales) > 0 THEN 1
ELSE 0
end AS sales
FROM the_table
WHERE ID IN
(SELECT ID FROM the_table WHERE marketing_campaign = 1)
GROUP BY ID
)
I have overcomplicated things a bit because your data is still unclear to me. The subquery ensures that all duplicates are cleaned up and that each person has only a 1 or 0 in personal_campaign and sales.
About your second question :
Ultimately, I want to find out the order in which people get to the
sales moment. Do they first go from marketing campaign to a personal
campaign and then to sales, or do they buy anyway regardless of these
channels.
This is impossible to do in this state because you don't have in your table, either :
a unique row identifier that would keep the order in which the rows were inserted
a timestamp column that would tell when the rows were inserted.
Without this, the order of rows returned from your table will be unpredictable, or if you prefer, effectively random.

Selecting count by row combinations

I'm struggling with what at first sight appeared to be a simple SQL query :)
So I have the following table, which has three columns: PlayerId, Gender, Result (all of type integer).
What I'm trying to do is select distinct players of gender 2 (male) with the count of each result.
There are about 50 possible results, so new table should have 51 columns:
|PlayerId | 1 | 2 | 3 | ... | 50 |
So I would like to see how many times each individual male (gender 2) player got specific result.
*** In case the question is still not entirely clear: after each game I insert a row with the player ID, gender and result (from 1 - 50) the player achieved in that game. Now I'd like to see how many times each player achieved specific results.
If there are 50 results and you want them in columns, then you are talking about a pivot. I tend to do these with conditional aggregation:
select player,
sum(case when result = 0 then 1 else 0 end) as result_00,
sum(case when result = 1 then 1 else 0 end) as result_01,
. . .
sum(case when result = 50 then 1 else 0 end) as result_50
from t
group by player;
You can choose a particular gender if you like, with where gender = 2. But why not calculate all at the same time?
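Typing 50 near-identical columns by hand is error-prone, so the column list can be generated. The following is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, it uses the column names from the question (PlayerId, Gender, Result) on the table t, it covers results 1 to 50 as the question describes, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (PlayerId INTEGER, Gender INTEGER, Result INTEGER);
    -- Invented games: player 3 has gender 1 and is filtered out below.
    INSERT INTO t VALUES (1,2,3),(1,2,3),(1,2,50),(2,2,1),(3,1,3);
""")

# One conditional-count column per possible result, 1..50.
columns = ",\n           ".join(
    f"SUM(CASE WHEN Result = {n} THEN 1 ELSE 0 END) AS result_{n:02d}"
    for n in range(1, 51)
)

query = f"""
    SELECT PlayerId,
           {columns}
    FROM t
    WHERE Gender = 2
    GROUP BY PlayerId
    ORDER BY PlayerId
"""
rows = conn.execute(query).fetchall()

# Column 0 is PlayerId, so row[n] is the count for result n.
for row in rows:
    print(row[0], {n: row[n] for n in range(1, 51) if row[n]})
```

Each output row has 51 columns (PlayerId plus one count per result), which matches the layout the question asks for.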
try
select player, result, count(*)
from your_table
where Gender = 2
group by player, result;
select PlayerId from tablename where result = 'specific result you want' and gender = 2 group by PlayerId
The easiest way is to use pivoting:
;with cte as(Select * from t
Where gender = 2)
Select * from cte
Pivot(count(gender) for result in([1],[2],[3],....,[50]))p
Fiddle http://sqlfiddle.com/#!3/8dad5/3
One note: keeping gender in scores table is a bad idea. Better make a separate table for players and keep gender there.