SQL Query with ORDER BY Part 2

This is a followup question to:
SQL Query with ORDER BY
But I think the SQL logic is going to be quite different, so I am posting it as separate question.
I am trying to extend my SQL SELECT query and am having some trouble.
I have the table:
id  type   radius
-------------------------
1   type1  0.25
2   type2  0.59
3   type1  0.26
4   type1  0.78
5   type3  0.12
6   type2  0.45
7   type3  0.22
8   type3  0.98
and I am trying to learn how to SELECT the second smallest radius for each given type. So the returned recordset should look like:
id  type   radius
-------------------------
3   type1  0.26
2   type2  0.59
7   type3  0.22
(Note: in the referenced question, I was looking for the lowest radius, not the second lowest radius).
I am assuming I have to use LIMIT and OFFSET, but if I use the MIN() won't that return a distinct record containing the minimum radius?
Does anyone have any thoughts on how to attack this?
Many thanks,
Brett

You didn't mention your DBMS, so I'll post a solution that works with DBMS that support the standard windowing functions:
SELECT *
FROM (
    SELECT id,
           type,
           radius,
           dense_rank() OVER (PARTITION BY type ORDER BY radius ASC) as radius_rank
    FROM radius_table
) t
WHERE radius_rank = 2
You can easily pick the 3rd lowest or 14th lowest as well by adjusting the WHERE condition.
This solution also works if more than one row qualifies as 2nd lowest (the LIMIT solutions would show only one of them).
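As a rough way to try the window-function approach without a full database server, here is a minimal sketch that runs the same DENSE_RANK query through Python's built-in sqlite3 module. It assumes a SQLite build of 3.25 or newer (the first release with window functions); the sample rows come from the question and the table name radius_table from the answer above.

```python
import sqlite3

# Sample rows copied from the question; the table name follows the answer above.
rows = [
    (1, "type1", 0.25), (2, "type2", 0.59), (3, "type1", 0.26),
    (4, "type1", 0.78), (5, "type3", 0.12), (6, "type2", 0.45),
    (7, "type3", 0.22), (8, "type3", 0.98),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE radius_table (id INTEGER, type TEXT, radius REAL)")
conn.executemany("INSERT INTO radius_table VALUES (?, ?, ?)", rows)

# Rank radii within each type, then keep only the rows ranked second.
second_smallest = conn.execute(
    """
    SELECT id, type, radius
    FROM (
        SELECT id, type, radius,
               DENSE_RANK() OVER (PARTITION BY type ORDER BY radius) AS radius_rank
        FROM radius_table
    ) AS t
    WHERE radius_rank = 2
    ORDER BY type
    """
).fetchall()

for row in second_smallest:
    print(row)
```

Changing the constant in `WHERE radius_rank = 2` selects a different position, as the answer describes.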

This query gives you the row in 2nd position for a given type:
SELECT *
FROM `test`.`rads`
WHERE type = 'type wanted'
ORDER BY `radius` ASC
LIMIT 1, 1
You can embed it in a correlated subquery to fetch the whole list, like this:
SELECT id, type, radius
FROM `test`.`rads` t
WHERE id = (
    SELECT id
    FROM `test`.`rads` ti
    WHERE ti.type = t.type
    ORDER BY `radius` ASC
    LIMIT 1, 1)
ORDER BY radius ASC, id DESC
With this query you can change the position by adjusting the first LIMIT parameter (the offset).

I would take the SQL query from the answer to your previous question and add a WHERE condition that excludes every record whose id holds the lowest radius of its type. The minimum of what remains is the second lowest radius. (The table name radius_table is a placeholder.)
SELECT t1.id, t1.type, t1.radius
FROM radius_table t1
WHERE t1.radius = (
    SELECT MIN(t2.radius)
    FROM radius_table t2
    WHERE t2.type = t1.type
    AND t2.id NOT IN (
        SELECT t3.id
        FROM radius_table t3
        WHERE t3.radius = (
            SELECT MIN(t4.radius)
            FROM radius_table t4
            WHERE t4.type = t3.type
        )
    )
)

Related

Using SQL, how do I select which column to add a value to, based on the contents of the row?

I'm having a difficult time phrasing the question, so I think the best thing to do is to give some example tables. I have a table, Attribute_history, I'm trying to pull data from that looks like this:
ID Attribute_Name Attribute_Val Time Stamp
--- -------------- ------------- ----------
1 Color Red 2022/09/28 01:00
2 Color Blue 2022/09/28 01:30
1 Length 3 2022/09/28 01:00
2 Length 4 2022/09/28 01:30
1 Diameter 5 2022/09/28 01:00
2 Diameter 10 2022/09/28 01:30
2 Diameter 11 2022/09/28 01:32
I want to create a table that pulls the attributes of each ID, and if the same ID and attribute_name has been updated, pull the latest info based on Time Stamp.
ID Color Length Diameter
---- ------ ------- --------
1 Red 3 5
2 Blue 4 11
I've achieved this by nesting several select statements, adding one column at a time. I achieved selecting the latest date using this stack overflow post. However, this code seems inefficient, since I'm selecting from the same table multiple times. It also only chooses the latest value for an attribute I know is likely to have been updated multiple times, not all the values I'm interested in.
SELECT
COLOR, DIAMETER, DATE_
FROM
(
SELECT
COLORS.COLOR, ATTR.ATTRIBUTE_NAME AS DIAMETER, ATTR.TIME_STAMP AS DATE_, RANK() OVER (PARTITION BY COLORS.COLOR ORDER BY ATTR.TIME_STAMP DESC) DATE_RANK -- https://stackoverflow.com/questions/3491329/group-by-with-maxdate
FROM
(
SELECT
ATTRIBUTE_HISTORY.ATTRIBUTE_VAL
FROM
ATTRIBUTE_HISTORY
WHERE
ATTRIBUTE_HISTORY.ATTRIBUTE_NAME = 'Color'
GROUP BY ATTRIBUTE_HISTORY.ID
) COLORS
INNER JOIN ATTRIBUTE_HISTORY ATTR ON COLORS.ID = ATTR.ID
WHERE
ATTR.ATTRIBUTE_NAME = 'DIAMETER'
)
WHERE
DATE_RANK = 1
(I copied my real query and renamed values with Find+Replace to obscure the data so this code might not be perfect, but it gets across the idea of how I'm achieving my goal now.)
How can I rewrite this query to be more concise, and pull the latest date entry for each attribute?

For MS SQL Server
Your problem has 2 parts:
1. Identify the latest attribute value based on the Time Stamp column.
2. Convert the attribute names to columns (pivoting) in the final result.
Solution:
;with CTEx as
(
    select
        row_number() over(partition by id, Attr_name order by Time_Stamp desc) rnum,
        id, Attr_name, Attr_value, time_stamp
    from #temp
)
SELECT * FROM
(
    SELECT id, Attr_name, Attr_value
    FROM CTEx
    where rnum = 1
) t
PIVOT(
    max(Attr_value)
    FOR Attr_name IN (Color, Diameter, [Length])
) AS pivot_table;
The first part of the problem is handled by the CTE with the help of the ROW_NUMBER() function. The second part is achieved with the PIVOT operator.
Definition of #temp for reference
Create table #temp(id int, Attr_name varchar(200), Attr_value varchar(200), Time_Stamp datetime)
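PIVOT is specific to SQL Server (and a few other engines), so for anyone on a different DBMS, here is a hedged sketch of the same two steps using ROW_NUMBER() plus conditional aggregation (MAX(CASE ...)) in place of PIVOT. It runs on SQLite 3.25+ through Python's sqlite3 module. The table name attribute_history, the lowercase column names, and the ISO-style timestamp strings (which sort correctly as text) are assumptions made for this illustration.

```python
import sqlite3

# Sample rows from the question; timestamps rewritten as ISO-style text (an assumption).
history = [
    (1, "Color", "Red", "2022-09-28 01:00"),
    (2, "Color", "Blue", "2022-09-28 01:30"),
    (1, "Length", "3", "2022-09-28 01:00"),
    (2, "Length", "4", "2022-09-28 01:30"),
    (1, "Diameter", "5", "2022-09-28 01:00"),
    (2, "Diameter", "10", "2022-09-28 01:30"),
    (2, "Diameter", "11", "2022-09-28 01:32"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute_history "
    "(id INTEGER, attr_name TEXT, attr_value TEXT, time_stamp TEXT)"
)
conn.executemany("INSERT INTO attribute_history VALUES (?, ?, ?, ?)", history)

latest_per_id = conn.execute(
    """
    WITH ranked AS (
        -- Step 1: number the rows of each (id, attribute) pair, newest first.
        SELECT id, attr_name, attr_value,
               ROW_NUMBER() OVER (
                   PARTITION BY id, attr_name ORDER BY time_stamp DESC
               ) AS rnum
        FROM attribute_history
    )
    -- Step 2: keep the newest row per pair and spread attributes into columns.
    SELECT id,
           MAX(CASE WHEN attr_name = 'Color' THEN attr_value END) AS color,
           MAX(CASE WHEN attr_name = 'Length' THEN attr_value END) AS length,
           MAX(CASE WHEN attr_name = 'Diameter' THEN attr_value END) AS diameter
    FROM ranked
    WHERE rnum = 1
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in latest_per_id:
    print(row)
```

As with PIVOT, the attribute names have to be listed explicitly; a new attribute means a new CASE expression.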

How can I find the variation in strings in a single column using Snowflake SQL?

Say I have a table like this:
Person1  Person2
-------  -------
Dave     Fred
Dave     Dave
Dave     Mike
Fred     Dave
Dave     Mike
Dave     Jeff
In column 'Person1' clearly Dave is the most popular input, so I'd like to produce a 'similarity score' or 'variation within column' score that would reflect that in SQL (Snowflake).
In contrast, for the column 'Person2' there is more variation between the strings and so the similarity score would be lower, or variation within column higher. So you might end up with a similarity score output as something like: 'Person1': 0.9, 'Person2': 0.4.
If this is just row-wise Levenshtein Distance (LD), how can I push EDITDISTANCE across these to get a score for each column please? At the moment I can only see how to get the LD between 'Person1' and 'Person2', rather than within 'Person1' and 'Person2'.
Many thanks

Your proposed values of 0.9 and 0.4 look like ratios of sameness, so they can be calculated with a count and ratio_to_report, like so:
with data(person1, person2) as (
select * from values
('Dave','Fred'),
('Dave','Dave'),
('Dave','Mike'),
('Fred','Dave'),
('Dave','Mike'),
('Dave','Jeff')
), p1 as (
select
person1
,count(*) as c_p1
,ratio_to_report(c_p1) over () as q
from data
group by 1
qualify row_number() over(order by c_p1 desc) = 1
), p2 as (
select
person2
,count(*) as c_p2
,ratio_to_report(c_p2) over () as q
from data
group by 1
qualify row_number() over(order by c_p2 desc) = 1
)
select
p1.q as p1_same,
p2.q as p2_same
from p1
cross join p2
;
giving:
P1_SAME   P2_SAME
--------  --------
0.833333  0.333333
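For readers without a Snowflake account, the "share of the most common value" idea can be approximated in plain SQL as MAX(count) / SUM(count) over the per-value counts. The sketch below is only an illustration of that ratio, run on SQLite through Python's sqlite3 module; the helper name top_value_share is hypothetical and not part of any library.

```python
import sqlite3

# Same six rows as the data CTE in the answer.
pairs = [
    ("Dave", "Fred"), ("Dave", "Dave"), ("Dave", "Mike"),
    ("Fred", "Dave"), ("Dave", "Mike"), ("Dave", "Jeff"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (person1 TEXT, person2 TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", pairs)


def top_value_share(column):
    """Return the fraction of rows holding the most frequent value in `column`."""
    # The column name is interpolated into the SQL, so only accept known columns.
    if column not in ("person1", "person2"):
        raise ValueError(f"unexpected column: {column}")
    query = f"""
        SELECT MAX(c) * 1.0 / SUM(c)
        FROM (SELECT COUNT(*) AS c FROM data GROUP BY {column})
    """
    return conn.execute(query).fetchone()[0]


p1_same = top_value_share("person1")
p2_same = top_value_share("person2")
print(round(p1_same, 6), round(p2_same, 6))
```

Here 'Dave' fills 5 of 6 rows in person1 and 2 of 6 rows in person2, which matches the 0.833333 and 0.333333 shown above.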
Editdistance:
So using a full cross join, we can calculate the editdistance of all values, and find the ratio of this to the total count:
with data(person1, person2) as (
select * from values
('Dave','Fred'),
('Dave','Dave'),
('Dave','Mike'),
('Fred','Dave'),
('Dave','Mike'),
('Dave','Jeff')
), combo as (
select
editdistance(da.person1, db.person1) as p1_dist
,editdistance(da.person2, db.person2) as p2_dist
from data as da
cross join data as db
)
select count(*) as c
,sum(p1_dist) as s_p1_dist
,sum(p2_dist) as s_p2_dist
,c / s_p1_dist as p1_same
,c / s_p2_dist as p2_same
from combo
;
But given editdistance gives a result of zero for same and positive value for difference, the scaling of these does not align with the desired result...
JAROWINKLER_SIMILARITY:
Given that the Jaro-Winkler similarity result is already scaled between 0 and 100, it makes more sense to average it (reusing the data CTE from above):
select
avg(JAROWINKLER_SIMILARITY(da.person1, db.person1)/100) as p1_dist
,avg(JAROWINKLER_SIMILARITY(da.person2, db.person2)/100) as p2_dist
from data as da
cross join data as db;
P1_DIST         P2_DIST
--------------  --------------
0.861111111111  0.527777777778

Oracle select similar values [closed]

I have a database table with a lot of values like 340.13 and 232.89.
Now I want to select the value that best matches a given comparison value.
Is this possible without great effort?

This will match values that are within +-10% of the search value and, if there are multiple values, will find the closest match by absolute difference.
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE TABLE_NAME ( VALUE ) AS
SELECT 340.13 FROM DUAL UNION ALL
SELECT 232.89 FROM DUAL UNION ALL
SELECT 224.73 FROM DUAL UNION ALL
SELECT 100.00 FROM DUAL;
Query 1:
WITH search_values ( search_value ) AS (
SELECT 330 FROM DUAL UNION ALL
SELECT 230 FROM DUAL
)
SELECT search_value,
value
FROM (
SELECT search_value,
value,
RANK() OVER ( PARTITION BY Search_value
ORDER BY ABS( value - search_value ) ) AS rnk
FROM table_name t
INNER JOIN
search_values v
ON ( t.value BETWEEN search_value * 0.9 AND search_value * 1.1 )
)
WHERE Rnk = 1
Results:
| SEARCH_VALUE | VALUE |
|--------------|--------|
| 230 | 232.89 |
| 330 | 340.13 |

This is a pretty basic and common task so here is the general approach.
First you need to decide on the best-match criterion. Basically, it is a function of the value stored in the row and the input value. So you can implement this function and evaluate it for each row by calling something like MATCH_RATING(COLUMN, :value). Once you have this rating for every row, you can sort the rows any way you like and keep the best-fitting one (ROWNUM is great for this, as are analytic functions like RANK or ROW_NUMBER).
SELECT *
FROM (
SELECT VALUE,
MATCH_RATING(VALUE, :input_value) RATING
FROM YOUR_TABLE
ORDER BY RATING DESC)
WHERE ROWNUM = 1
It is then a good idea to check whether your chosen criterion can be expressed with built-in SQL functions, because if it can, using them will very likely perform better than a custom function.
For example, if distance between two numbers is the only thing that concerns you, SQL will look something like this.
SELECT VALUE
FROM (
SELECT VALUE,
ABS(VALUE - :input_value) DISTANCE
FROM YOUR_TABLE
ORDER BY DISTANCE)
WHERE ROWNUM = 1
If your function returns 0 on some interval, meaning some rows should never appear in your result set, then you should also add a WHERE clause that filters out those rows (WHERE MATCH_RATING(COLUMN, :value) > 0).
Back to our distance example: let's accept distance not more than 5% of input value.
SELECT VALUE
FROM (
SELECT VALUE,
ABS(VALUE - :input_value) DISTANCE
FROM YOUR_TABLE
WHERE VALUE BETWEEN 0.95 * :input_value AND 1.05 * :input_value
ORDER BY DISTANCE)
WHERE ROWNUM = 1
By the way, an index on YOUR_TABLE.VALUE will most likely help in this example, since the WHERE clause filters on a range of VALUE.
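The distance-based lookup can be tried outside Oracle as well. The following is a minimal sketch on SQLite through Python's sqlite3 module, where LIMIT 1 plays the role of ROWNUM = 1. The helper name closest_value and its tolerance parameter are hypothetical, the sample values come from the schema setup in the other answer, and the range filter assumes a positive input value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (value REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?)",
    [(340.13,), (232.89,), (224.73,), (100.00,)],
)


def closest_value(input_value, tolerance=0.05):
    """Return the stored value nearest to input_value, or None if none is within tolerance.

    Assumes input_value is positive, so the lower bound is below the upper bound.
    """
    row = conn.execute(
        """
        SELECT value
        FROM your_table
        WHERE value BETWEEN :lo AND :hi      -- discard rows outside the tolerance band
        ORDER BY ABS(value - :target)        -- smallest distance first
        LIMIT 1
        """,
        {
            "lo": input_value * (1 - tolerance),
            "hi": input_value * (1 + tolerance),
            "target": input_value,
        },
    ).fetchone()
    return row[0] if row else None


print(closest_value(330))
print(closest_value(230))
print(closest_value(500))
```

With the default 5% band, 330 and 230 each find a neighbour, while 500 has no stored value close enough and yields None.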

In SQL, I need to generate a ranking (1st, 2nd, 3rd) column, getting stuck on "ties"

I have a query that calculates points based on multiple criteria, and then orders the result set based on those points.
SELECT * FROM (
SELECT
dbo.afunctionthatcalculates(Something, Something) AS Points1
,dbo.anotherone(Something, Something) AS Points2
,dbo.anotherone(Something, Something) AS Points3
,[TotalPoints] = dbo.function(something) + dbo.function(something)
) AS MyData
ORDER BY MyData.TotalPoints
So my first stab at adding placements (rankings) was this:
SELECT ROW_NUMBER() OVER(ORDER BY MyData.TotalPoints) AS Ranking, * FROM (
SELECT same as above
) AS MyData
ORDER BY MyData.TotalPoints
This adds the Rankings column, but doesn't work when the points are tied.
Rank | TotalPoints
--------------------
1 100
2 90
3 90
4 80
Should be:
Rank | TotalPoints
--------------------
1 100
2 90
2 90
3 80
Not really sure about how to resolve this.
Thank you for your help.

You should use the DENSE_RANK() function which takes the ties into account, as described here: http://msdn.microsoft.com/en-us/library/ms173825.aspx

DENSE_RANK() instead of ROW_NUMBER()
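To make the difference concrete, here is a small sketch that computes ROW_NUMBER(), RANK() and DENSE_RANK() side by side on the four totals from the question. It uses SQLite 3.25+ through Python's sqlite3 module rather than SQL Server, and the table name my_data and column total_points are placeholders chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_data (total_points INTEGER)")
conn.executemany("INSERT INTO my_data VALUES (?)", [(100,), (90,), (90,), (80,)])

# All three functions use the same ordering; only their treatment of ties differs.
ranking = conn.execute(
    """
    SELECT ROW_NUMBER() OVER (ORDER BY total_points DESC) AS row_num,
           RANK()       OVER (ORDER BY total_points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY total_points DESC) AS dense_rnk,
           total_points
    FROM my_data
    ORDER BY total_points DESC, row_num
    """
).fetchall()

for row in ranking:
    print(row)
```

ROW_NUMBER() numbers tied rows consecutively, RANK() gives ties the same number but then skips ahead (1, 2, 2, 4), and DENSE_RANK() gives ties the same number without a gap (1, 2, 2, 3), which is the output the question asks for.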

SQL: How to get the AVG(MIN(number))?

I am looking for the AVERAGE (overall) of the MINIMUM number (grouped by person).
My table looks like this:
Rank Name
1 Amy
2 Amy
3 Amy
2 Bart
1 Charlie
2 David
5 David
1 Ed
2 Frank
4 Frank
5 Frank
I want to know the AVERAGE of the lowest scores. For these people, the lowest scores are:
Rank Name
1 Amy
2 Bart
1 Charlie
2 David
1 Ed
2 Frank
Giving me a final answer of 1.5 - because three people have a MIN(Rank) of 1 and the other three have a MIN(Rank) of 2. That's what I'm looking for - a single number.
My real data has a couple hundred rows, so it's not terribly big. But I can't figure out how to do this in a single, simple statement. Thank you for any help.

Try this:
;WITH MinScores
AS
(
SELECT
"Rank",
Name,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY "Rank") row_num
FROM Table1
)
SELECT
CAST(SUM("Rank") AS DECIMAL(10, 2)) /
COUNT("Rank")
FROM MinScores
WHERE row_num = 1;

Selecting the set of minimum values is straightforward. The cast() is necessary to avoid integer division later. You could also avoid integer division by casting to float instead of decimal. (But you should be aware that floats are "useful approximations".)
select name, cast(min(rank) as decimal) as min_rank
from Table1
group by name
Now you can use the minimums as a common table expression, and select from it.
with minimums as (
select name, cast(min(rank) as decimal) as min_rank
from Table1
group by name
)
select avg(min_rank) avg_min_rank
from minimums
If you happen to need to do the same thing on a platform that doesn't support common table expressions, you can a) create a view of minimums, and select from that view, or b) use the minimums as a derived table.

You might try using a derived table to get the minimums, then get the average minimum in the outer query, as in:
-- Get the avg min rank as a decimal
select avg(MinRank * 1.0) as AvgRank
from (
-- Get everyone's min rank
select min([Rank]) as MinRank
from MyTable
group by Name
) as a
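The derived-table approach can be checked against the sample data with a short script. This sketch uses SQLite through Python's sqlite3 module; the table name my_table follows the answer above, and the column is quoted as "rank" as a precaution because RANK is a reserved word in some databases. SQLite's AVG() returns a floating-point result, so no cast is needed here, unlike in SQL Server.

```python
import sqlite3

# Rows copied from the question: (rank, name).
scores = [
    (1, "Amy"), (2, "Amy"), (3, "Amy"), (2, "Bart"), (1, "Charlie"),
    (2, "David"), (5, "David"), (1, "Ed"), (2, "Frank"), (4, "Frank"),
    (5, "Frank"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE my_table ("rank" INTEGER, name TEXT)')
conn.executemany("INSERT INTO my_table VALUES (?, ?)", scores)

# Inner query: one minimum per person. Outer query: average of those minimums.
avg_min_rank = conn.execute(
    """
    SELECT AVG(min_rank)
    FROM (
        SELECT MIN("rank") AS min_rank
        FROM my_table
        GROUP BY name
    ) AS minimums
    """
).fetchone()[0]

print(avg_min_rank)  # 1.5
```

The six per-person minimums are 1, 2, 1, 2, 1 and 2, so the average is 9 / 6 = 1.5, the value the question expects.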

I think the easiest one will be
for max
select name , max_rank = max(rank)
from table
group by name;
for average
select name , avg_rank = avg(rank)
from table
group by name;
