SQL: How to get the AVG(MIN(number))? - sql

I am looking for the AVERAGE (overall) of the MINIMUM number (grouped by person).
My table looks like this:
Rank Name
1 Amy
2 Amy
3 Amy
2 Bart
1 Charlie
2 David
5 David
1 Ed
2 Frank
4 Frank
5 Frank
I want to know the AVERAGE of the lowest scores. For these people, the lowest scores are:
Rank Name
1 Amy
2 Bart
1 Charlie
2 David
1 Ed
2 Frank
Giving me a final answer of 1.5 - because three people have a MIN(Rank) of 1 and the other three have a MIN(Rank) of 2. That's what I'm looking for - a single number.
My real data has a couple hundred rows, so it's not terribly big. But I can't figure out how to do this in a single, simple statement. Thank you for any help.

Try this:
;WITH MinScores
AS
(
SELECT
"Rank",
Name,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY "Rank") row_num
FROM Table1
)
SELECT
CAST(SUM("Rank") AS DECIMAL(10, 2)) /
COUNT("Rank")
FROM MinScores
WHERE row_num = 1;

Selecting the set of minimum values is straightforward. The cast() is necessary to avoid integer division later. You could also avoid integer division by casting to float instead of decimal. (But you should be aware that floats are "useful approximations".)
select name, cast(min(rank) as decimal) as min_rank
from Table1
group by name
Now you can use the minimums as a common table expression, and select from it.
with minimums as (
select name, cast(min(rank) as decimal) as min_rank
from Table1
group by name
)
select avg(min_rank) avg_min_rank
from minimums
If you happen to need to do the same thing on a platform that doesn't support common table expressions, you can a) create a view of minimums, and select from that view, or b) use the minimums as a derived table.

You might try using a derived table to get the minimums, then get the average minimum in the outer query, as in:
-- Get the avg min rank as a decimal
select avg(MinRank * 1.0) as AvgRank
from (
-- Get everyone's min rank
select min([Rank]) as MinRank
from MyTable
group by Name
) as a
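To check the derived-table approach against the sample data in the question, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for the real database here, which is an assumption; the table name MyTable is taken from the answer above.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MyTable ("Rank" INTEGER, Name TEXT)')
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "Amy"), (2, "Amy"), (3, "Amy"), (2, "Bart"), (1, "Charlie"),
     (2, "David"), (5, "David"), (1, "Ed"),
     (2, "Frank"), (4, "Frank"), (5, "Frank")],
)

# Inner query: one minimum per person. Outer query: average of those minimums.
avg_min_rank = conn.execute("""
    SELECT AVG(MinRank * 1.0) AS AvgRank
    FROM (
        SELECT MIN("Rank") AS MinRank
        FROM MyTable
        GROUP BY Name
    ) AS a
""").fetchone()[0]

print(avg_min_rank)  # 1.5
```

Multiplying by 1.0 keeps the average from being computed with integer arithmetic on engines such as SQL Server; SQLite's AVG already returns a floating-point value, so it is harmless there.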

I think the easiest one will be
for max
select name , max_rank = max(rank)
from table
group by name;
for average
select name , avg_rank = avg(rank)
from table
group by name;

Related

How can I find the variation in strings in a single column using Snowflake SQL?

Say I have a table like this:
Person1 Person2
Dave Fred
Dave Dave
Dave Mike
Fred Dave
Dave Mike
Dave Jeff
In column 'Person1' clearly Dave is the most popular input, so I'd like to produce a 'similarity score' or 'variation within column' score that would reflect that in SQL (Snowflake).
In contrast, for the column 'Person2' there is more variation between the strings and so the similarity score would be lower, or variation within column higher. So you might end up with a similarity score output as something like: 'Person1': 0.9, 'Person2': 0.4.
If this is just row-wise Levenshtein Distance (LD), how can I push EDITDISTANCE across these to get a score for each column please? At the moment I can only see how to get the LD between 'Person1' and 'Person2', rather than within 'Person1' and 'Person2'.
Many thanks
Your proposed values of 0.9 and 0.4 look like ratios of sameness, so they can be calculated with a count and ratio_to_report like so:
with data(person1, person2) as (
select * from values
('Dave','Fred'),
('Dave','Dave'),
('Dave','Mike'),
('Fred','Dave'),
('Dave','Mike'),
('Dave','Jeff')
), p1 as (
select
person1
,count(*) as c_p1
,ratio_to_report(c_p1) over () as q
from data
group by 1
qualify row_number() over(order by c_p1 desc) = 1
), p2 as (
select
person2
,count(*) as c_p2
,ratio_to_report(c_p2) over () as q
from data
group by 1
qualify row_number() over(order by c_p2 desc) = 1
)
select
p1.q as p1_same,
p2.q as p2_same
from p1
cross join p2
;
giving:
P1_SAME P2_SAME
0.833333 0.333333
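The same "share of the most common value" idea can be sketched outside Snowflake. The example below is an assumption-laden stand-in: it uses an in-memory SQLite database via Python's sqlite3 module, and because SQLite has no ratio_to_report or qualify, it divides the largest group count by the total count instead. The helper name top_share is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (person1 TEXT, person2 TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("Dave", "Fred"), ("Dave", "Dave"), ("Dave", "Mike"),
     ("Fred", "Dave"), ("Dave", "Mike"), ("Dave", "Jeff")],
)

def top_share(column):
    """Share of rows taken by the most frequent value in `column`.

    `column` is interpolated into the SQL text, so it must come from a
    fixed list of known column names, never from user input.
    """
    sql = (
        "SELECT MAX(c) * 1.0 / SUM(c) "
        f"FROM (SELECT COUNT(*) AS c FROM data GROUP BY {column})"
    )
    return conn.execute(sql).fetchone()[0]

shares = {col: round(top_share(col), 6) for col in ("person1", "person2")}
print(shares)  # {'person1': 0.833333, 'person2': 0.333333}
```

For person1 the most common value (Dave) covers 5 of 6 rows; for person2 the most common values (Dave and Mike) each cover 2 of 6 rows.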
Editdistance:
So using a full cross join, we can calculate the editdistance of all values, and find the ratio of this to the total count:
with data(person1, person2) as (
select * from values
('Dave','Fred'),
('Dave','Dave'),
('Dave','Mike'),
('Fred','Dave'),
('Dave','Mike'),
('Dave','Jeff')
), combo as (
select
editdistance(da.person1, db.person1) as p1_dist
,editdistance(da.person2, db.person2) as p2_dist
from data as da
cross join data as db
)
select count(*) as c
,sum(p1_dist) as s_p1_dist
,sum(p2_dist) as s_p2_dist
,c / s_p1_dist as p1_same
,c / s_p2_dist as p2_same
from combo
;
But given editdistance gives a result of zero for same and positive value for difference, the scaling of these does not align with the desired result...
JAROWINKLER_SIMILARITY:
Given that the Jaro-Winkler similarity result is already scaled between 0 and 100, it makes more sense to average it:
select
avg(JAROWINKLER_SIMILARITY(da.person1, db.person1)/100) as p1_dist
,avg(JAROWINKLER_SIMILARITY(da.person2, db.person2)/100) as p2_dist
from data as da
cross join data as db;
P1_DIST P2_DIST
0.861111111111 0.527777777778

How can I choose the max decision number of two different columns?

The same client can have multiple decision numbers. I need to choose the max decision number for each client. Please help.
sample data
CNO DNO
1 1
1 2
3 3
You can use window function like:
max(decisionNumber) over (partition by ClientName) as 'MAXofDecisionNumber'
SELECT DECNO,MAX(CNO) FROM TABLE1 GROUP BY DECNO
FOR BOTH MAX
SELECT DECNO,MAX(CNO) FROM TABLE1
WHERE DECNO = (SELECT MAX(DECNO) FROM TABLE1)
GROUP BY DECNO
Live Demo
http://sqlfiddle.com/#!18/3cdd0/3
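As a rough illustration of the two approaches above (a window function versus GROUP BY), here is a sketch using the sample data from the question in an in-memory SQLite database. The table name decisions and the lower-case column names cno and dno are assumptions, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE decisions (cno INTEGER, dno INTEGER)")
conn.executemany("INSERT INTO decisions VALUES (?, ?)", [(1, 1), (1, 2), (3, 3)])

# Window function: keeps every row and attaches the client's max decision number.
windowed = conn.execute("""
    SELECT cno, dno, MAX(dno) OVER (PARTITION BY cno) AS max_dno
    FROM decisions
    ORDER BY cno, dno
""").fetchall()

# GROUP BY: collapses to one row per client.
grouped = conn.execute("""
    SELECT cno, MAX(dno) AS max_dno
    FROM decisions
    GROUP BY cno
    ORDER BY cno
""").fetchall()

print(windowed)  # [(1, 1, 2), (1, 2, 2), (3, 3, 3)]
print(grouped)   # [(1, 2), (3, 3)]
```

Use the windowed form when the other columns of each row are still needed, and the grouped form when one row per client is enough.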

Complex SQL query or queries

I looked at other examples, but I don't know enough about SQL to adapt it to my needs. I have a table that looks like this:
ID Month NAME COUNT First LAST TOTAL
------------------------------------------------------
1 JAN2013 fred 4
2 MAR2013 fred 5
3 APR2014 fred 1
4 JAN2013 Tom 6
5 MAR2014 Tom 1
6 APR2014 Tom 1
This could be in separate queries, but I need 'First' to equal the first month in which a particular name is used, so every row with fred would have JAN2013 in the First field, for example. I need the 'Last' column to equal the month of the last record for each name, and finally I need the 'Total' column to be the sum of all the counts for each name, so every row with fred would have a total of 10 in this sample data. This is over my head. Can one of you assist?
This is crude but should do the trick. I renamed your fields a bit because you are using a bunch of "RESERVED" sql words and that is bad form.
;WITH cte as
(
Select
[NAME]
,[txtMONTH]
,[nmCOUNT]
-- txtMONTH must sort chronologically (a date, or text like '2013-01');
-- text like 'JAN2013' would sort alphabetically instead
,ROW_NUMBER() over (partition by NAME order by txtMONTH ASC) as 'FirstMonth'
,ROW_NUMBER() over (partition by NAME order by txtMONTH DESC) as 'LastMonth'
-- windowed SUM puts the per-name total on every row
,SUM([nmCOUNT]) over (partition by NAME) as 'TotNameCount'
From Table
)
,cteFirst as
(
Select
NAME
,[nmCOUNT]
,[TotNameCount]
,[txtMONTH] as 'ansFirst'
From cte
Where FirstMonth = 1
)
,cteLast as
(
Select
NAME
,[txtMONTH] as 'ansLast'
From cte
Where LastMonth = 1
)
Select c.NAME, c.nmCount, c.ansFirst, l.ansLast, c.TotNameCount
From cteFirst c
LEFT JOIN cteLast l on c.NAME = l.NAME
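Since the question asks for First, Last and Total on every row, a variant using only window aggregates may be simpler. The sketch below is not the answer's method; it assumes the month is stored as sortable text in 'YYYY-MM' form (so JAN2013 becomes '2013-01'), and the table and column names (monthly_counts, month_key, cnt) are hypothetical. It runs on an in-memory SQLite database, version 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monthly_counts (id INTEGER, month_key TEXT, name TEXT, cnt INTEGER)"
)
conn.executemany(
    "INSERT INTO monthly_counts VALUES (?, ?, ?, ?)",
    [(1, "2013-01", "fred", 4), (2, "2013-03", "fred", 5), (3, "2014-04", "fred", 1),
     (4, "2013-01", "Tom", 6), (5, "2014-03", "Tom", 1), (6, "2014-04", "Tom", 1)],
)

# Each window aggregate is computed per name and repeated on every row of that name.
rows = conn.execute("""
    SELECT id, name, cnt,
           MIN(month_key) OVER (PARTITION BY name) AS first_month,
           MAX(month_key) OVER (PARTITION BY name) AS last_month,
           SUM(cnt)       OVER (PARTITION BY name) AS total
    FROM monthly_counts
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
# First row: (1, 'fred', 4, '2013-01', '2014-04', 10)
```

MIN and MAX only give the first and last month because 'YYYY-MM' text sorts in chronological order; with values like 'JAN2013' the month would have to be converted to a date first.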

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID DATE Random
-- -------- ---------
1 4/12/2015 2
2 4/15/2015 2
3 3/12/2015 2
4 9/16/2015 3
5 1/12/2015 3
6 2/12/2015 3
ID is the primary key
Random is a foreign key, but I am not actually using the table it points to.
I am trying to design a query that groups the results by Random, selects the MAX Date within each group, and then gives me the associated ID.
If I run the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms, since ID is the primary key and will always be unique.
The results I need would look something like this
ID DATE Random
-- -------- ---------
2 4/15/2015 2
4 9/16/2015 3
Another question: there could be times when many rows share the same date. What will MAX do in that case?
You can use NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE s.random = t.random
AND s.date > t.date)
This will select only those rows that have no later date for the same random value.
It can also be done using IN() on databases that support row value comparisons (SQL Server does not):
SELECT * FROM YourTable t
WHERE (t.random,t.date) in (SELECT s.random,max(s.date)
FROM YourTable s
GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random,max(s.date) as max_date
FROM YourTable s
GROUP BY s.random) tt
ON(t.date = tt.max_date and tt.random = t.random)
In SQL Server you could do something like the following,
select a.* from DateBase a inner join
(select Random,
MAX(dt) as dt from DateBase group by Random) as x
on a.dt =x.dt and a.random = x.random
This method will work in all versions of SQL as there are no vendor specifics (you'll need to format the dates using your vendor specific syntax)
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
,e.DateField AS DateField
,e.Random
FROM Example AS e
INNER JOIN (
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
) data
ON data.MaxDateField = e.DateField
AND data.Random = e.Random
GROUP BY e.DateField, e.Random
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.
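Another way to get one row per Random, with an explicit tie-break when several rows share the latest date, is ROW_NUMBER(). The sketch below is one possible approach rather than any of the answers' methods. It assumes the dates are stored as ISO 'YYYY-MM-DD' text in a column named dt, and it runs on an in-memory SQLite database (3.25 or later) with the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DateBase (ID INTEGER PRIMARY KEY, dt TEXT, Random INTEGER)")
conn.executemany(
    "INSERT INTO DateBase VALUES (?, ?, ?)",
    [(1, "2015-04-12", 2), (2, "2015-04-15", 2), (3, "2015-03-12", 2),
     (4, "2015-09-16", 3), (5, "2015-01-12", 3), (6, "2015-02-12", 3)],
)

# Number the rows within each Random, latest date first; on equal dates the
# higher ID wins. Keeping rn = 1 leaves exactly one row per Random.
latest = conn.execute("""
    SELECT ID, dt, Random
    FROM (
        SELECT ID, dt, Random,
               ROW_NUMBER() OVER (
                   PARTITION BY Random
                   ORDER BY dt DESC, ID DESC
               ) AS rn
        FROM DateBase
    ) AS ranked
    WHERE rn = 1
    ORDER BY Random
""").fetchall()

print(latest)  # [(2, '2015-04-15', 2), (4, '2015-09-16', 3)]
```

Changing ID DESC to ID ASC in the window's ORDER BY would pick the lowest ID among tied dates instead.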

In SQL, I need to generate a ranking (1st, 2nd, 3rd) column, getting stuck on "ties"

I have a query that calculates points based on multiple criteria, and then orders the result set based on those points.
SELECT * FROM (
SELECT
dbo.afunctionthatcalculates(Something, Something) AS Points1
,dbo.anotherone(Something, Something) AS Points2
,dbo.anotherone(Something, Something) AS Points3
,[TotalPoints] = dbo.function(something) + dbo.function(something)
) AS MyData
ORDER BY MyData.TotalPoints
So my first stab at adding placement, rankings.. was this:
SELECT ROW_NUMBER() OVER(ORDER BY MyData.TotalPoints) AS Ranking, * FROM (
SELECT same as above
) AS MyData
ORDER BY MyData.TotalPoints
This adds the Rankings column, but doesn't work when the points are tied.
Rank | TotalPoints
--------------------
1 100
2 90
3 90
4 80
Should be:
Rank | TotalPoints
--------------------
1 100
2 90
2 90
3 80
Not really sure about how to resolve this.
Thank you for your help.
You should use the DENSE_RANK() function which takes the ties into account, as described here: http://msdn.microsoft.com/en-us/library/ms173825.aspx
DENSE_RANK() instead of ROW_NUMBER()
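To see the difference on the tied scores from the question, here is a small sketch comparing ROW_NUMBER(), RANK() and DENSE_RANK() side by side. It uses an in-memory SQLite database (3.25 or later) as a stand-in for SQL Server, and the table name results is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (total_points INTEGER)")
conn.executemany("INSERT INTO results VALUES (?)", [(100,), (90,), (90,), (80,)])

ranking = conn.execute("""
    SELECT total_points,
           ROW_NUMBER() OVER (ORDER BY total_points DESC) AS row_num,
           RANK()       OVER (ORDER BY total_points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY total_points DESC) AS dense_rnk
    FROM results
    ORDER BY total_points DESC, row_num
""").fetchall()

for row in ranking:
    print(row)
# (100, 1, 1, 1)
# (90, 2, 2, 2)
# (90, 3, 2, 2)
# (80, 4, 4, 3)
```

ROW_NUMBER() always counts 1, 2, 3, 4; RANK() gives ties the same number but leaves a gap after them; DENSE_RANK() gives ties the same number with no gap, which matches the desired output.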