How to Rank Based on Multiple Columns - sql

I'm trying to score people in Microsoft Access based on the count they have for a particular category.
There are 7 possible categories a person can have against them, and I want to assign each person a score from 1 to 7 per category, with 1 going to the highest-scoring category and 7 to the lowest. They might not have an answer for every category, in which case that category can be ignored.
The aim would be to have an output result as shown in this image:
I've tried a few different things, including partition over and joins, but none have worked. To be honest I think I'm way off the mark with the queries I've been trying. I've tried to write the code in SQL from scratch, and used query builder.
Any help is really appreciated!

As an email can have duplicate counts, you will need two subqueries for this:
SELECT
Score.email,
Score.category,
Score.[Count],
(Select Count(*) From Score As T Where
T.email = Score.email And
T.[Count] >= Score.[Count])-
(Select Count(*) From Score As S Where
S.email = Score.email And
S.[Count] = Score.[Count] And
S.category > Score.category) AS Rank
FROM
Score
ORDER BY
Score.email,
Score.[Count] DESC,
Score.category;

For categories with equal Count values for the same email, the following will rank the records alphabetically descending by Category name (since this is what is shown in your example):
select t.email, t.category, t.count,
(
select count(*) from YourTable u
where t.email = u.email and
((t.count = u.count and t.category <= u.category) or t.count < u.count)
) as rank
from YourTable t
order by t.email, t.count desc, t.category desc
Change both references of YourTable to the name of your table.
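To see how the tie-breaking works, here is a minimal sketch that runs the same correlated-subquery idea in Python's built-in sqlite3 module instead of Access. The table name score, the column name cnt and the sample rows are assumptions made up for illustration; they are not from the question.

```python
import sqlite3

# Hypothetical sample data: (email, category, cnt).
ROWS = [
    ("a@x.com", "Sales", 5),
    ("a@x.com", "Admin", 3),
    ("a@x.com", "Tech", 3),
    ("b@x.com", "Sales", 2),
]

# Rank = number of rows for the same email that sort at or before this row:
# either a strictly higher count, or an equal count with a category name
# that is alphabetically the same or later (descending tie-break).
RANK_SQL = """
SELECT t.email, t.category, t.cnt,
       (SELECT COUNT(*)
          FROM score AS u
         WHERE u.email = t.email
           AND (u.cnt > t.cnt
                OR (u.cnt = t.cnt AND u.category >= t.category))) AS rnk
  FROM score AS t
 ORDER BY t.email, t.cnt DESC, t.category DESC
"""

def rank_scores(rows):
    """Load rows into an in-memory table and return the ranked result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE score (email TEXT, category TEXT, cnt INTEGER)")
    con.executemany("INSERT INTO score VALUES (?, ?, ?)", rows)
    result = con.execute(RANK_SQL).fetchall()
    con.close()
    return result

ranked = rank_scores(ROWS)
for row in ranked:
    print(row)
```

Because the tie-break compares category names, two rows can only share a rank if they have the same email, count and category.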

Related

Selecting 1 column's value in a group after grouping by another column

How would I include the name of any one of the books that belong to that particular type in the below query?
select distinct
(select sum(ob.Balance)),
ob.BookType
from orders.OrderBooks ob
group by ob.BookType
In its current state it does what I need it to and groups books by BookType and sums their balances, as seen below.
However I need the name of any book that belongs to that BookType as part of the result.
If I select the BookName column and then group by it like below, it results in more unique entries and to an extent undoes the original grouping.
select distinct
(select sum(ob.Balance)),
ob.BookType,
ob.BookName
from orders.OrderBooks ob
group by ob.BookType, ob.BookName
;WITH x AS
(
SELECT
Balance = SUM(Balance) OVER (PARTITION BY BookType),
BookType,
BookName,
rn = ROW_NUMBER() OVER (PARTITION BY BookType ORDER BY BookName DESC)
FROM orders.OrderBooks
)
SELECT Balance, BookType, BookName
FROM x
WHERE rn = 1;
ORDER BY BookName DESC was dealer's choice. If you truly don't care which title shows up in the result, you can use any ordering you like. If you want the results to be random every time, you can use ORDER BY NEWID().
In general I like this flexibility better than the TOP (1) subquery approach, in addition to a single scan instead of an additional table access per row. But you can also do it a different way; just take min/max of the bookname, too:
SELECT Balance = SUM(Balance),
BookType,
BookName = MIN(BookName) -- or MAX()
FROM orders.OrderBooks
GROUP BY BookType;
You can see these give similar results in this db<>fiddle. Plan is simpler, too; most notably: no spools. However when you use an aggregate function against that column, it makes it harder to provide arbitrary/random results, and if you intend to add other columns pulled from the right row, you'll need to go back to the row_number solution.
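As a rough illustration of the two approaches above, here is a sketch in Python using the standard sqlite3 module rather than SQL Server. The table order_books, its columns and the sample rows are assumptions for the demo, and the window-function query needs SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical rows: (book_type, book_name, balance).
BOOKS = [
    ("Fiction", "Dune", 10),
    ("Fiction", "Emma", 5),
    ("Tech", "SQL Basics", 7),
]

# Approach 1: plain GROUP BY, picking the alphabetically first name.
MIN_SQL = """
SELECT book_type, SUM(balance) AS balance, MIN(book_name) AS book_name
  FROM order_books
 GROUP BY book_type
 ORDER BY book_type
"""

# Approach 2: windowed sum plus ROW_NUMBER, keeping one row per type.
ROW_NUMBER_SQL = """
WITH x AS (
    SELECT book_type,
           book_name,
           SUM(balance) OVER (PARTITION BY book_type) AS balance,
           ROW_NUMBER() OVER (PARTITION BY book_type
                              ORDER BY book_name DESC) AS rn
      FROM order_books
)
SELECT book_type, balance, book_name
  FROM x
 WHERE rn = 1
 ORDER BY book_type
"""

def run(sql):
    """Build the demo table in memory and return the rows produced by sql."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE order_books (book_type TEXT, book_name TEXT, balance INTEGER)"
    )
    con.executemany("INSERT INTO order_books VALUES (?, ?, ?)", BOOKS)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

by_min = run(MIN_SQL)
by_row_number = run(ROW_NUMBER_SQL)
print(by_min)
print(by_row_number)
```

Both queries return the same totals per type; they differ only in which title represents the group, since MIN picks the first name alphabetically and the ROW_NUMBER version here orders names descending.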
You can use a correlated subquery to get a single book name of that type. This assumes there's an ID field and you want to pull the most recent one:
select
Balance = sum(ob.Balance),
ob.BookType,
BookName = (SELECT TOP(1) ob2.BookName FROM orders.OrderBooks ob2 WHERE ob2.BookType = ob.BookType ORDER BY ob2.ID DESC)
from orders.OrderBooks ob
group by ob.BookType

MS Access TRIMMEAN how to

I need to perform TRIMMEAN in Access, which does not have this function.
In a table I have many Employees, each has many records.
I need to TRIMMEAN Values for each Employee separately.
Following queries perform TOP 10 percent for all records:
qry_data_TOP10_ASC
qry_data_TOP10_DESC
unionqry_TOP10_ASCandDESC
qry_data_ALL_minus_union_qry
After that, I can use Avg (Average).
But I don't know how to do it for each employee.
Note:
This question is edited to simplify problem.
Your pseudocode doesn't give any information about your data fields, but going by your example, which does include basic field information, the following should work as you described.
It assumes field1 is your unique record ID, since you don't mention which fields are keys:
SELECT AVG(q.field2)
FROM qry_data AS q
WHERE q.field1 NOT IN
(SELECT TOP 10 PERCENT a.field1
FROM qry_data AS a
ORDER BY a.field2 ASC)
AND q.field1 NOT IN
(SELECT TOP 10 PERCENT b.field1
FROM qry_data AS b
ORDER BY b.field2 DESC)
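This should give you what you want. The two subqueries are correlated on Employee, so each returns the TOP 10 rows (ascending and descending, respectively) for the current employee. The two NOT IN conditions then remove those rows from Table1, and the remaining rows are grouped by Employee and averaged. Note that TOP 10 trims ten rows from each end; use TOP 10 PERCENT instead to trim by percentage, as in your original queries.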
SELECT Table1.Employee, AVG(Table1.Score) AS AvgScore
FROM Table1
WHERE ID NOT IN
(
SELECT TOP 10 ID
FROM Table1 a
WHERE a.Employee = Table1.Employee
ORDER BY Score ASC, Employee, ID
)
AND ID NOT IN
(
SELECT TOP 10 ID
FROM Table1 b
WHERE b.Employee = Table1.Employee
ORDER BY Score DESC, Employee, ID
)
GROUP BY Table1.Employee;
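The per-employee trimming can be tried out with a small sketch in Python's sqlite3 module. This is an illustration under assumptions, not Access SQL: SQLite uses LIMIT where Access uses TOP, the sample scores are invented, and the parameter k (rows trimmed from each end) is hypothetical.

```python
import sqlite3

# Hypothetical sample data: (employee, score).
SCORES = [
    ("A", 1), ("A", 5), ("A", 6), ("A", 7), ("A", 100),
    ("B", 2), ("B", 4), ("B", 9),
]

# For each row, the two correlated subqueries list the k lowest and the
# k highest ids of the same employee; rows in either list are dropped
# before averaging.
TRIM_SQL = """
SELECT t.employee, AVG(t.score) AS avg_score
  FROM table1 AS t
 WHERE t.id NOT IN (SELECT a.id FROM table1 AS a
                     WHERE a.employee = t.employee
                     ORDER BY a.score ASC, a.id LIMIT ?)
   AND t.id NOT IN (SELECT b.id FROM table1 AS b
                     WHERE b.employee = t.employee
                     ORDER BY b.score DESC, b.id LIMIT ?)
 GROUP BY t.employee
 ORDER BY t.employee
"""

def trimmed_means(rows, k):
    """Return (employee, mean) after dropping k rows from each end."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, employee TEXT, score INTEGER)"
    )
    con.executemany("INSERT INTO table1 (employee, score) VALUES (?, ?)", rows)
    result = con.execute(TRIM_SQL, (k, k)).fetchall()
    con.close()
    return result

means = trimmed_means(SCORES, 1)
print(means)
```

Including the id in each ORDER BY makes the choice of trimmed rows deterministic when several rows share the same score.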

Get n grouped categories and sum others into one

I have a table with the following structure:
Contents (
id
name
desc
tdate
categoryid
...
)
I need to compute some statistics from the data in this table. For example, I want to group rows by category and get the number of rows per category along with the id of that category. I also want to limit the result to n rows in descending order of count, and if there are more categories, combine them into a single row labelled "Others". So far I have come up with 2 queries:
Select n rows in descending order:
SELECT COALESCE(ca.NAME, 'Unknown') AS label
,ca.id AS catid
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
GROUP BY label
,catid
ORDER BY data DESC LIMIT 7
Select other rows as one:
SELECT 'Others' AS label
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
WHERE c.categoryid NOT IN ($INCONDITION)
But when I have no category groups left in db table I still get an "Others" record. Is it possible to make it in one query and make the "Others" record optional?
The specific difficulty here: Queries with one or more aggregate functions in the SELECT list and no GROUP BY clause produce exactly one row, even if no row is found in the underlying table.
There is nothing you can do in the WHERE clause to suppress that row. You have to exclude such a row after the fact, i.e. in the HAVING clause, or in an outer query.
Per documentation:
If a query contains aggregate function calls, but no GROUP BY clause,
grouping still occurs: the result is a single group row (or perhaps no
rows at all, if the single row is then eliminated by HAVING). The same
is true if it contains a HAVING clause, even without any aggregate
function calls or GROUP BY clause.
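The behaviour is easy to reproduce. The following sketch uses Python's sqlite3 module rather than PostgreSQL, with an invented contents table, and filters in an outer query (one of the two options mentioned above) because older SQLite versions do not accept HAVING without GROUP BY.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contents (id INTEGER PRIMARY KEY, categoryid INTEGER)")
# Hypothetical data: every row belongs to category 1 or 2.
con.executemany("INSERT INTO contents (categoryid) VALUES (?)", [(1,), (1,), (2,)])

# The WHERE clause matches no rows, yet the aggregate still yields one row.
OTHERS_SQL = """
SELECT 'Others' AS label, COUNT(id) AS data
  FROM contents
 WHERE categoryid NOT IN (1, 2)
"""
always_one_row = con.execute(OTHERS_SQL).fetchall()

# Wrapping the aggregate in an outer query lets us drop the empty row.
filtered = con.execute(
    f"SELECT label, data FROM ({OTHERS_SQL}) WHERE data > 0"
).fetchall()

print(always_one_row)
print(filtered)
```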
It should be noted that adding a GROUP BY clause with only a constant expression (which is otherwise completely pointless!) works, too. See example below. But I'd rather not use that trick, even if it's short, cheap and simple, because it's hardly obvious what it does.
The following query only needs a single table scan and returns the top 7 categories ordered by count. If (and only if) there are more categories, the rest is summarized into 'Others':
WITH cte AS (
SELECT categoryid, count(*) AS data
, row_number() OVER (ORDER BY count(*) DESC, categoryid) AS rn
FROM contents
GROUP BY 1
)
( -- parentheses required again
SELECT categoryid, COALESCE(ca.name, 'Unknown') AS label, data
FROM cte
LEFT JOIN category ca ON ca.id = cte.categoryid
WHERE rn <= 7
ORDER BY rn
)
UNION ALL
SELECT NULL, 'Others', sum(data)
FROM cte
WHERE rn > 7 -- only take the rest
HAVING count(*) > 0; -- only if there actually is a rest
-- or: HAVING sum(data) > 0
You need to break ties if multiple categories can have the same count across the 7th / 8th rank. In my example, categories with the smaller categoryid win such a race.
Parentheses are required to apply a LIMIT or ORDER BY clause to an individual leg of a UNION query.
You only need to join to table category for the top 7 categories, and it's generally cheaper to aggregate first and join later in this scenario. So don't join in the base query in the CTE (common table expression) named cte; only join in the first SELECT of the UNION query.
Not sure why you need the COALESCE. If you have a foreign key in place from contents.categoryid to category.id and both contents.categoryid and category.name are defined NOT NULL (like they probably should be), then you don't need it.
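For experimenting with the "top n plus Others" pattern, here is a sketch in Python's sqlite3 module. It is an adaptation under assumptions, not the PostgreSQL query above: SQLite does not allow parenthesized UNION legs, so a hypothetical sort_key column orders the combined result, and the empty 'Others' row is removed with an outer filter instead of HAVING. The sample data is invented, and window functions need SQLite 3.25 or later.

```python
import sqlite3

TOP_N_SQL = """
WITH counts AS (
    SELECT categoryid, COUNT(*) AS data
      FROM contents
     GROUP BY categoryid
), ranked AS (
    -- categoryid breaks ties so the cut-off is deterministic
    SELECT categoryid, data,
           ROW_NUMBER() OVER (ORDER BY data DESC, categoryid) AS rn
      FROM counts
)
SELECT r.categoryid, COALESCE(ca.name, 'Unknown') AS label, r.data,
       r.rn AS sort_key
  FROM ranked AS r
  LEFT JOIN category AS ca ON ca.id = r.categoryid
 WHERE r.rn <= :n
UNION ALL
SELECT NULL, 'Others', o.total, :n + 1
  FROM (SELECT SUM(data) AS total FROM ranked WHERE rn > :n) AS o
 WHERE o.total IS NOT NULL          -- only if there actually is a rest
 ORDER BY sort_key
"""

def make_db():
    """Create an in-memory database with hypothetical sample data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE contents (id INTEGER PRIMARY KEY, categoryid INTEGER)")
    con.executemany(
        "INSERT INTO category VALUES (?, ?)",
        [(1, "News"), (2, "Blog"), (3, "Docs"), (4, "Misc")],
    )
    con.executemany(
        "INSERT INTO contents (categoryid) VALUES (?)",
        [(1,), (1,), (1,), (2,), (2,), (3,), (4,)],
    )
    return con

def top_n_with_others(con, n):
    """Return (categoryid, label, data) for the top n plus an optional rest."""
    rows = con.execute(TOP_N_SQL, {"n": n}).fetchall()
    return [row[:3] for row in rows]

con = make_db()
top2 = top_n_with_others(con, 2)
top7 = top_n_with_others(con, 7)
print(top2)
print(top7)
```

With n = 2 the two smallest categories are folded into 'Others'; with n = 7 there is no rest, so no 'Others' row appears.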
The odd GROUP BY true
This would work, too:
...
UNION ALL
SELECT NULL , 'Others', sum(data)
FROM cte
WHERE rn > 7
GROUP BY true;
And I even get slightly faster query plans. But it's a rather odd hack ...
SQL Fiddle demonstrating all.
Related answer with more explanation for the UNION ALL / LIMIT technique:
Sum results of a few queries and then find top 5 in SQL
The quick fix, to make the 'Others' row conditional would be to add a simple HAVING clause to that query.
HAVING COUNT(c.id) > 0
(If there are no other rows in the contents table, then COUNT(c.id) is going to be zero.)
That only answers half the question, how to make the return of that row conditional.
The second half of the question is a little more involved.
To get the whole resultset in one query, you could do something like this
(This is not tested yet, only desk checked. I'm not sure whether PostgreSQL accepts a LIMIT clause in an inline view; if it doesn't, we'd need a different mechanism to limit the number of rows returned.)
SELECT COALESCE(t.name,'Others') AS name
, t.catid AS catid
, COUNT(o.id) AS data
FROM contents o
LEFT
JOIN category oa
ON oa.id = o.categoryid
LEFT
JOIN ( SELECT COALESCE(ca.name,'Unknown') AS name
, ca.id AS catid
, COUNT(c.id) AS data
FROM contents c
LEFT
JOIN category ca
ON ca.id = c.categoryid
GROUP
BY COALESCE(ca.name,'Unknown')
, ca.id
ORDER
BY COUNT(c.id) DESC
, ca.id DESC
LIMIT 7
) t
ON ( t.catid = oa.id OR (t.catid IS NULL AND oa.id IS NULL))
GROUP
BY ( t.catid = oa.id OR (t.catid IS NULL AND oa.id IS NULL))
, t.catid
ORDER
BY COUNT(o.id) DESC
, ( t.catid = oa.id OR (t.catid IS NULL AND oa.id IS NULL)) DESC
, t.catid DESC
LIMIT 7
The inline view t basically gets the same result as the first query, a list of (up to) 7 id values from category table, or 6 id values from category table and a NULL.
The outer query basically does the same thing, joining contents with category, but also checks whether there's a matching row from t. Because t might return a NULL, we need a slightly more complicated comparison in which a NULL value matches a NULL value. (MySQL conveniently gives us a shorthand for this, the null-safe comparison operator <=>. PostgreSQL has no <=> operator, though it does offer IS NOT DISTINCT FROM; the portable way to express it is:)
a = b OR (a IS NULL AND b IS NULL)
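A quick way to convince yourself that this expression treats two NULLs as equal is to evaluate it directly. The sketch below does so with Python's sqlite3 module; the helper names are made up for the demo.

```python
import sqlite3

con = sqlite3.connect(":memory:")

PLAIN_SQL = "SELECT :a = :b"
NULL_SAFE_SQL = "SELECT (:a = :b) OR (:a IS NULL AND :b IS NULL)"

def plain_equal(a, b):
    """Raw result of a = b: 1, 0, or None when either side is NULL."""
    return con.execute(PLAIN_SQL, {"a": a, "b": b}).fetchone()[0]

def null_safe_equal(a, b):
    """True only if both values are equal or both are NULL.

    With exactly one NULL the SQL expression evaluates to NULL, which a
    WHERE or ON clause treats as "no match"; bool(None) mirrors that.
    """
    return bool(con.execute(NULL_SAFE_SQL, {"a": a, "b": b}).fetchone()[0])

print(plain_equal(None, None))      # NULL = NULL is NULL, not true
print(null_safe_equal(None, None))
print(null_safe_equal(1, None))
```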
The next bit is getting a GROUP BY to work, we want to group by the 7 values returned by the inline view t, or, if there's not matching value from t, group the "other" rows together. We can get that to happen by using a boolean expression in the GROUP BY clause.
We're basically saying "group by 'if there was a matching row from t'" (true or false) and then group by the row from 't'. Get a count, and then order by the count descending.
This isn't tested, only desk checked.
You can approach this with nested aggregation. The inner aggregation calculates the counts along with a sequential number. You want to take everything whose number is 7 or less and then combine everything else into the others category:
SELECT (case when seqnum <= 7 then label else 'others' end) as label,
(case when seqnum <= 7 then catid end) as catid, sum(cnt) as cnt
FROM (SELECT ca.name AS label, ca.id AS catid, COUNT(c.id) AS cnt,
row_number() over (order by count(c.id) desc, ca.id) as seqnum
FROM contents c LEFT OUTER JOIN
category ca
ON ca.id = c.categoryid
GROUP BY ca.name, ca.id
) t
GROUP BY (case when seqnum <= 7 then label else 'others' end),
(case when seqnum <= 7 then catid end)
ORDER BY cnt DESC ;

SQL Finding maximum value without top command

Let's say I have a database with a table:
-courses (key: name [ofthecourse], other attributes: year in which the course takes place)
I want to complete a query looking for an answer to the question:
Which year of study has the maximum number of courses?
Normally, the query would be:
SELECT TOP 1 STUDYEAR
FROM COURSES
GROUP BY STUDYEAR
ORDER BY COUNT(CNO) DESC;
But my question is, which query could complete this without using the TOP 1 phrase?
You can use an inner query to get the maximum count. The only difference, though, is that it can return more than one record if several years share the same maximum count.
SELECT STUDYEAR
FROM COURSES
GROUP BY STUDYEAR
HAVING COUNT(CNO) = (SELECT MAX(CNOCount) FROM
(SELECT COUNT(CNO) CNOCount
FROM COURSES
GROUP BY STUDYEAR) X)
Another version with only one inner query:
SELECT STUDYEAR
FROM
(SELECT STUDYEAR, ROW_NUMBER() OVER (ORDER BY COUNT(CNO) DESC) RowNumber
FROM COURSES
GROUP BY STUDYEAR) X
WHERE RowNumber = 1
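The tie behaviour of the HAVING version can be checked with a small sketch in Python's sqlite3 module. The course rows are invented so that two years share the maximum count.

```python
import sqlite3

# Hypothetical data: (cno, studyear); years 2 and 3 both have 3 courses.
COURSES = [
    ("C1", 1), ("C2", 1),
    ("C3", 2), ("C4", 2), ("C5", 2),
    ("C6", 3), ("C7", 3), ("C8", 3),
]

# Keep every year whose course count equals the largest per-year count.
MAX_SQL = """
SELECT studyear
  FROM courses
 GROUP BY studyear
HAVING COUNT(cno) = (SELECT MAX(cno_count)
                       FROM (SELECT COUNT(cno) AS cno_count
                               FROM courses
                              GROUP BY studyear) AS x)
 ORDER BY studyear
"""

def busiest_years(rows):
    """Return all study years that have the maximum number of courses."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE courses (cno TEXT PRIMARY KEY, studyear INTEGER)")
    con.executemany("INSERT INTO courses VALUES (?, ?)", rows)
    result = [year for (year,) in con.execute(MAX_SQL)]
    con.close()
    return result

years = busiest_years(COURSES)
print(years)
```

Unlike TOP 1 or ROW_NUMBER, which return a single year, this form returns every year tied for the maximum.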

Fetch one row per account id from list, part 2

Not sure how to ask a followup on SO, but this is in reference to an earlier question:
Fetch one row per account id from list
The query I'm working with is:
SELECT *
FROM scores s1
WHERE accountid NOT IN (SELECT accountid FROM scores s2 WHERE s1.score < s2.score)
ORDER BY score DESC
This selects the top scores, and limits results to one row per accountid; their top score.
The last hurdle is that this query is returning multiple rows for accountids that have multiple occurrences of their top score. So if accountid 17 has scores of 40, 75, 30, 75 the query returns both rows with scores of 75.
Can anyone modify this query (or provide a better one) to fix this case, and truly limit it to one row per account id?
Thanks again!
If you're only interested in the accountid and the score, then you can use the simple GROUP BY query given by Paul above.
SELECT accountid, MAX(score)
FROM scores
GROUP BY accountid;
If you need other attributes from the scores table, then you can get other attributes from the row with a query like the following:
SELECT s1.*
FROM scores AS s1
LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid
AND s1.score < s2.score)
WHERE s2.accountid IS NULL;
But this still gives multiple rows, in your example where a given accountid has two scores matching its maximum value. To further reduce the result set to a single row, for example the row with the latest gamedate, try this:
SELECT s1.*
FROM scores AS s1
LEFT OUTER JOIN scores AS s2 ON (s1.accountid = s2.accountid
AND s1.score < s2.score)
LEFT OUTER JOIN scores AS s3 ON (s1.accountid = s3.accountid
AND s1.score = s3.score AND s1.gamedate < s3.gamedate)
WHERE s2.accountid IS NULL
AND s3.accountid IS NULL;
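Here is a sketch of the double anti-join run against the example from the question (accountid 17 with scores 40, 75, 30, 75), using Python's sqlite3 module. The scoreid and gamedate values are assumptions added for the demo.

```python
import sqlite3

# Hypothetical rows: (scoreid, accountid, score, gamedate).
SCORES = [
    (1, 17, 40, "2020-01-01"),
    (2, 17, 75, "2020-01-02"),
    (3, 17, 30, "2020-01-03"),
    (4, 17, 75, "2020-01-04"),
    (5, 5, 10, "2020-01-01"),
]

# s2 finds any higher score for the account; s3 finds any equal score
# with a later gamedate. A row survives only if neither exists.
TOP_SQL = """
SELECT s1.scoreid, s1.accountid, s1.score, s1.gamedate
  FROM scores AS s1
  LEFT OUTER JOIN scores AS s2
    ON s1.accountid = s2.accountid AND s1.score < s2.score
  LEFT OUTER JOIN scores AS s3
    ON s1.accountid = s3.accountid AND s1.score = s3.score
   AND s1.gamedate < s3.gamedate
 WHERE s2.accountid IS NULL
   AND s3.accountid IS NULL
 ORDER BY s1.accountid
"""

def top_score_rows(rows):
    """Return one top-score row per account (latest gamedate wins ties)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE scores (scoreid INTEGER PRIMARY KEY, accountid INTEGER, "
        "score INTEGER, gamedate TEXT)"
    )
    con.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", rows)
    result = con.execute(TOP_SQL).fetchall()
    con.close()
    return result

top_rows = top_score_rows(SCORES)
print(top_rows)
```

If two rows for an account tie on both score and gamedate, both still come back, so a unique column such as the id would be needed as a final tie-breaker.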
select accountid, max(score) from scores group by accountid;
If your RDBMS supports them, then an analytic function would be a good approach particularly if you need all the columns of the row.
select ...
from (
select accountid,
score,
...
row_number() over
(partition by accountid
order by score desc) score_rank
from scores)
where score_rank = 1;
The row returned is indeterminate in the case you describe, but you can easily modify the analytic function, for example by ordering on (score desc, test_date desc) to get the more recent of two matching high scores.
Other analytic functions based on rank will achieve a similar purpose.
If you don't mind duplicates then the following would probably be more efficient than your current method:
select ...
from (
select accountid,
score,
...
max(score) over (partition by accountid) max_score
from scores)
where score = max_score;
If you are selecting a subset of columns then you can use the DISTINCT keyword to filter results.
SELECT DISTINCT accountid, score
FROM scores s1
WHERE accountid NOT IN (SELECT accountid FROM scores s2 WHERE s1.score < s2.score)
ORDER BY score DESC
Does your database support distinct? As in select distinct x from y?
This solution works in MS SQL, giving you the whole row.
SELECT *
FROM scores
WHERE scoreid in
(
SELECT max(scoreid)
FROM scores as s2
JOIN
(
SELECT max(score) as maxscore, accountid
FROM scores s1
GROUP BY accountid
) sub ON s2.score = sub.maxscore AND s2.accountid = sub.accountid
GROUP BY s2.score, s2.accountid
)