Looking to select values grouped by one column but create a hierarchy of the different columns to find "the best" column - sql

Mightn't make much sense but let's try.
I have a fairly large dataset with a few "duplicates" in one column. I want to group by that column and, for each group, select the row that is the "best fit" based on the max/sum of other columns. Is this possible in SQL?
Input:
Name  Transactions  Date       Apple #  Orange #
----  ------------  ---------  -------  --------
John  10            today      10       10
John  15            Yesterday  10       10
Jack  10            Today      5        5
Output I expect:
Name  Transactions  Date       Apple #  Orange #  Total #
----  ------------  ---------  -------  --------  -------
John  15            Yesterday  10       10        20
Jack  10            Today      5        5         10
The hierarchy would be: max(Transactions) first, then max(Date), and then sum(Apple, Orange).
I want to apply this to every unique name.

If I understand correctly, you can use row_number(). The key is setting up the order by to reflect the conditions you want:
select t.*
from (select t.*,
             row_number() over (partition by name
                                order by transactions desc, date desc, apple + orange desc) as seqnum
      from t
     ) t
where seqnum = 1;
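
To try the query above on the sample data, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or newer). The table name t, the column name txn_date, and the ISO date strings are illustrative assumptions, and the total column is added only to match the expected output in the question.

```python
import sqlite3

# Hypothetical schema: dates are stored as ISO strings so that
# "order by txn_date desc" sorts chronologically (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (name text, transactions int, txn_date text, apple int, orange int)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        ("John", 10, "2024-01-02", 10, 10),
        ("John", 15, "2024-01-01", 10, 10),
        ("Jack", 10, "2024-01-02", 5, 5),
    ],
)

# The order by inside the window encodes the hierarchy:
# transactions first, then date, then apple + orange.
query = """
select name, transactions, txn_date, apple, orange, apple + orange as total
from (select t.*,
             row_number() over (partition by name
                                order by transactions desc, txn_date desc,
                                         apple + orange desc) as seqnum
      from t
     ) t
where seqnum = 1
order by name
"""
best_rows = conn.execute(query).fetchall()
for row in best_rows:
    print(row)
```

Each later key in the order by only matters when all earlier keys tie, which is what makes it behave like a hierarchy.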

Related

Order table by the total count but do not lose the order by names

I have a table, consisting of 3 columns (Person, Year and Count), so for each person, there are several rows with different years and counts and the final row with total count. I want to keep the table ordered by Name, but also order it by the total count.
So the rows should be ordered by sum, but also grouped by the Person and ordered by year. When I am trying to order by sum, of course, both person and years are messed up. Is there a way to sort like this?
You've stored those "total" rows as well? Gosh! Why did you do that?
Anyway: if you compute a rank for the rows whose year column is equal to 'total', and add a case expression to the order by clause, you might get what you want:
SQL> with sorter as
       (select name, cnt,
               rank() over (order by cnt) rnk
        from test
        where year = 'total'
       )
     select t.*
     from test t join sorter s on s.name = t.name
     order by s.rnk, case when year = 'total' then '9'
                          else year
                     end;

NAME YEAR         CNT
---- ----- ----------
John 2018           3
John 2019           2
John total          5
Bob  2017           2
Bob  2019           4
Bob  total          6

6 rows selected.

SQL>
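
The same idea can be checked outside Oracle. This is a rough sketch using Python's sqlite3 module (assuming SQLite 3.25+ for window functions); the table name test and the sample rows follow the output above, and storing year as text is an assumption made so that the 'total' marker fits in the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table test (name text, year text, cnt int)")
conn.executemany(
    "insert into test values (?, ?, ?)",
    [
        ("Bob", "2017", 2), ("Bob", "2019", 4), ("Bob", "total", 6),
        ("John", "2018", 3), ("John", "2019", 2), ("John", "total", 5),
    ],
)

# Rank each person by their stored total, then sort every row by that rank;
# the case expression pushes the 'total' row after the year rows.
query = """
with sorter as
  (select name, rank() over (order by cnt) as rnk
   from test
   where year = 'total'
  )
select t.name, t.year, t.cnt
from test t join sorter s on s.name = t.name
order by s.rnk, case when t.year = 'total' then '9' else t.year end
"""
ordered = conn.execute(query).fetchall()
for row in ordered:
    print(row)
```

If two people have the same total they share a rank and their rows can interleave; adding s.name to the order by right after s.rnk is one way to keep each person's rows together.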

Running Total by Year in SQL

I have a table broken out into a series of numbers by year, and need to build a running total column but restart during the next year.
The desired outcome is below
Amount | Year | Running Total
-----------------------------
1        2000   1
5        2000   6
10       2000   16
5        2001   5
10       2001   15
3        2001   18
I can do an ORDER BY to get a standard running total, but can't figure out how to base it just on the year such that it does the running total for each unique year.
SQL tables represent unordered sets. You need a column to specify the ordering. Once you have this, it is a simple cumulative sum:
select amount, year,
       sum(amount) over (partition by year order by <ordering column>) as running_total
from t;
Without a column that specifies ordering, "cumulative sum" does not make sense on a table in SQL.
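
As an illustration, here is a small sketch with Python's sqlite3 module (assuming SQLite 3.25+). The id column is a hypothetical ordering column introduced for the example, since the question does not show one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is an assumed ordering column; it records insertion order here.
conn.execute("create table t (id integer primary key, amount int, year int)")
conn.executemany(
    "insert into t (amount, year) values (?, ?)",
    [(1, 2000), (5, 2000), (10, 2000), (5, 2001), (10, 2001), (3, 2001)],
)

# partition by year restarts the sum for each year;
# order by id defines which rows count as "earlier".
query = """
select amount, year,
       sum(amount) over (partition by year order by id) as running_total
from t
order by id
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```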

SQL field default "count(another_field) +1"

I need to create a field COUNT whose default value is the automatically generated count of times NAME has appeared in that table so far, as shown in the example below. Since I am adding the field to an existing table, I also need to populate the existing rows. What is the best way to go about this?
ID  NAME   COUNT
1   peter  1
2   jane   1
3   peter  2
4   peter  3
5   frank  1
6   jane   2
7   peter  4
You would do this when you are querying the table, using the ANSI-standard row-number function:
select id, name, row_number() over (partition by name order by id) as seqnum
from t;
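
A quick way to see this on the sample rows, sketched with Python's sqlite3 module (assuming SQLite 3.25+; the table name t is taken from the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, name text)")
conn.executemany(
    "insert into t values (?, ?)",
    [(1, "peter"), (2, "jane"), (3, "peter"), (4, "peter"),
     (5, "frank"), (6, "jane"), (7, "peter")],
)

# row_number() restarts at 1 for each name and counts upward by id,
# so it equals "how many times this name has appeared so far".
query = """
select id, name, row_number() over (partition by name order by id) as seqnum
from t
order by id
"""
counted = conn.execute(query).fetchall()
for row in counted:
    print(row)
```

Because the value is computed at query time, nothing needs to be stored or backfilled for existing rows.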

Retrieve highest value from sql table

How can I query this data:
Name     Title      Profit
Peter    CEO        2
Robert   A.D        3
Michael  Vice       5
Peter    CEO        4
Robert   Admin      5
Robert   CEO        13
Adrin    Promotion  8
Michael  Vice       21
Peter    CEO        3
Robert   Admin      15
to get this:
Peter........4
Robert.......15
Michael......21
Adrin........8
I want to get the highest profit value from each name.
If there are multiple equal names always take the highest value.
select name,max(profit) from table group by name
Since this type of request almost always follows with "now can I include the title?" - here is a query that gets the highest profit for each name but can include all the other columns without grouping or applying arbitrary aggregates to those other columns:
;WITH x AS
(
    SELECT Name, Title, Profit,
           rn = ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Profit DESC)
    FROM dbo.table
)
SELECT Name, Title, Profit
FROM x
WHERE rn = 1;
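
For a quick check of that pattern on the question's data, here is a sketch using Python's sqlite3 module (assuming SQLite 3.25+). The table name profits is a hypothetical stand-in, and the alias uses the portable "as rn" form rather than the T-SQL "rn =" form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table profits (name text, title text, profit int)")
conn.executemany(
    "insert into profits values (?, ?, ?)",
    [
        ("Peter", "CEO", 2), ("Robert", "A.D", 3), ("Michael", "Vice", 5),
        ("Peter", "CEO", 4), ("Robert", "Admin", 5), ("Robert", "CEO", 13),
        ("Adrin", "Promotion", 8), ("Michael", "Vice", 21),
        ("Peter", "CEO", 3), ("Robert", "Admin", 15),
    ],
)

# Number the rows of each name from highest profit down, then keep row 1.
# The title comes from the same row as the top profit, with no aggregate needed.
query = """
with x as
  (select name, title, profit,
          row_number() over (partition by name order by profit desc) as rn
   from profits
  )
select name, title, profit
from x
where rn = 1
order by name
"""
top_rows = conn.execute(query).fetchall()
for row in top_rows:
    print(row)
```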

selecting top N rows for each group in a table

I am facing a very common issue regarding "Selecting top N rows for each group in a table".
Consider a table with id, name, hair_colour, score columns.
I want a resultset such that, for each hair colour, get me top 3 scorer names.
To solve this, I found exactly what I needed in Rick Osborne's blog post "sql-getting-top-n-rows-for-a-grouped-query".
However, that solution doesn't work as expected when scores are equal.
In the example above, the result is as follows.
id  name     hair      score  ranknum
---------------------------------
12  Kit      Blonde    10     1
9   Becca    Blonde    9      2
8   Katie    Blonde    8      3
3   Sarah    Brunette  10     1
4   Deborah  Brunette  9      2    -------> if
1   Kim      Brunette  8      3
Consider the row 4 Deborah Brunette 9 2. If this row also had the same score (10) as Sarah, then ranknum would be 2,2,3 for the "Brunette" hair type.
What's the solution to this?
If you're using SQL Server 2005 or newer, you can use the ranking functions and a CTE to achieve this:
;WITH HairColors AS
(
    SELECT id, name, hair, score,
           ROW_NUMBER() OVER (PARTITION BY hair ORDER BY score DESC) AS 'RowNum'
    FROM dbo.YourTable  -- placeholder name: the original snippet omitted the FROM clause
)
SELECT id, name, hair, score
FROM HairColors
WHERE RowNum <= 3
This CTE will "partition" your data by the value of the hair column, and each partition is then ordered by score (descending) and gets a row number; the highest score in each partition gets 1, the next 2, and so on.
So if you want the top 3 of each group, select only those rows from the CTE that have a RowNum of 3 or less (1, 2, 3) --> there you go!
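
Since the question is specifically about equal scores, it may help to compare how the three standard ranking functions treat a tie. This is a sketch using Python's sqlite3 module (assuming SQLite 3.25+); the table name girls and the tied sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table girls (id int, name text, hair text, score int)")
conn.executemany(
    "insert into girls values (?, ?, ?, ?)",
    [(3, "Sarah", "Brunette", 10), (4, "Deborah", "Brunette", 10),
     (1, "Kim", "Brunette", 8)],
)

# row_number() always produces 1, 2, 3, ... (id is added as a tie-breaker
# so the numbering is deterministic); rank() gives tied rows the same
# number and then skips; dense_rank() gives the same number without skipping.
query = """
select name, score,
       row_number() over (partition by hair order by score desc, id) as row_num,
       rank()       over (partition by hair order by score desc)     as rnk,
       dense_rank() over (partition by hair order by score desc)     as dense_rnk
from girls
order by score desc, id
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

Filtering on row_num <= 3 returns at most three rows per group, while filtering on rnk <= 3 keeps every row tied at the cutoff and can return more than three.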
The way the algorithm comes up with the rank, is to count the number of rows in the cross-product with a score equal to or greater than the girl in question, in order to generate rank. Hence in the problem case you're talking about, Sarah's grid would look like
a.name | a.score | b.name | b.score
-------+---------+---------+--------
Sarah | 9 | Sarah | 9
Sarah | 9 | Deborah | 9
and similarly for Deborah, which is why both girls get a rank of 2 here.
The problem is that when there's a tie, all girls take the lowest value in the tied range due to this count, when you'd want them to take the highest value instead. I think a simple change can fix this:
Instead of a greater-than-or-equal comparison, use a strict greater-than comparison to count the number of girls who are strictly better. Then, add one to that and you have your rank (which will deal with ties as appropriate). So the inner select would be:
SELECT a.id, COUNT(b.id) + 1 AS ranknum
FROM girl AS a
-- LEFT JOIN keeps the top scorers, who have no strictly better row to match
LEFT JOIN girl AS b ON (a.hair = b.hair) AND (a.score < b.score)
GROUP BY a.id
HAVING COUNT(b.id) < 3
Can anyone see any problems with this approach that have escaped my notice?
Use this query with a correlated subquery, which handles the OP's problem properly:
SELECT g.* FROM girls AS g
WHERE g.score > IFNULL((SELECT g2.score FROM girls AS g2
                        WHERE g.hair = g2.hair
                        ORDER BY g2.score DESC LIMIT 3, 1), 0)
Note that you need IFNULL here to handle the case where the girls table has fewer rows for some hair type than the number we want to return (3 in the OP's case).
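
Here is a rough sketch of that subquery approach using Python's sqlite3 module. The sample rows are assumptions, and "limit 1 offset 3" is used as the spelled-out equivalent of MySQL's "LIMIT 3, 1".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table girls (id int, name text, hair text, score int)")
conn.executemany(
    "insert into girls values (?, ?, ?, ?)",
    [
        (12, "Kit", "Blonde", 10), (9, "Becca", "Blonde", 9),
        (8, "Katie", "Blonde", 8), (7, "Amy", "Blonde", 7),
        # Only two Brunette rows: the subquery finds no 4th score here,
        # so ifnull() falls back to 0 and both rows are kept.
        (3, "Sarah", "Brunette", 10), (4, "Deborah", "Brunette", 10),
    ],
)

# Keep rows whose score beats the 4th-highest score for the same hair colour.
query = """
select g.name, g.hair, g.score
from girls as g
where g.score > ifnull((select g2.score
                        from girls as g2
                        where g.hair = g2.hair
                        order by g2.score desc
                        limit 1 offset 3), 0)
order by g.hair, g.score desc, g.name
"""
top3 = conn.execute(query).fetchall()
for row in top3:
    print(row)
```

One thing to watch: if the 3rd and 4th scores in a group are equal, the strict comparison drops both, so that group returns fewer than three rows.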