Look up values based solely on the minimum value of a range - sql

I want to place values within ranges given only each range's minimum value, similar to a VLOOKUP/HLOOKUP in Excel with the range_lookup argument set to TRUE (approximate match).
As seen below, TableScore lists the low-end cutpoints (CutpointVal) at which a value is assigned a specific number of points (the minimum value of each range). The SQL below does this in two steps: the first query generates a datasheet that adds a high value for each low value, creating a full range.
However, this is a somewhat clunky way of doing it, especially when it has to be repeated many times. The original table (TableScore) cannot be altered to include high values. Is there a way to do the same thing with a single query?
Main
ID Score
72625 2.5
78261 3.2
82766 4.7
58383 0.3
TableScore
CutpointVal Points
0 0
0.3 1
1.2 2
2.7 3
3.4 4
Upper and lower range query (RangeQry):
SELECT a.CutpointVal AS LowVal, Val(Nz((SELECT TOP 1 [CutpointVal]-0.001
FROM TableScore b
WHERE b.Points > a.Points
ORDER BY b.Points ASC),9999)) AS HighVal, a.Points
FROM TableScore AS a
ORDER BY a.Points;
Range assignment query:
SELECT Main.ID, Main.Score, RangeQry.LowVal, RangeQry.HighVal, RangeQry.Points AS PTS
FROM RangeQry, Main
WHERE (((Main.Score) Between [RangeQry].[LowVal] And [RangeQry].[HighVal]));
Desired output:
ID Score Points
72625 2.5 2
78261 3.2 3
82766 4.7 4
58383 0.3 1

Consider:
SELECT Main.ID, Main.Score, (
SELECT Max(Points) FROM TableScore WHERE CutpointVal<=Main.Score) AS Pts
FROM Main;
Or
SELECT Main.ID, Main.Score, (
SELECT TOP 1 Points FROM TableScore
WHERE CutpointVal <= Main.Score
ORDER BY Points DESC) AS Pts
FROM Main;
Or
SELECT Main.ID, Main.Score, DMax("Points","TableScore","CutpointVal<=" & [Score]) AS Pts
FROM Main;
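A quick way to sanity-check the correlated-subquery version is the sketch below, which runs it against the question's sample data with Python's built-in sqlite3 module. SQLite is only a stand-in for Access here (DMax has no SQLite counterpart, and TOP 1 would be written LIMIT 1). The approach assumes, as in TableScore, that Points rises with CutpointVal, so the highest qualifying Points belongs to the highest qualifying cutpoint.

```python
import sqlite3

# Sample data copied from the question.
MAIN = [(72625, 2.5), (78261, 3.2), (82766, 4.7), (58383, 0.3)]
TABLE_SCORE = [(0, 0), (0.3, 1), (1.2, 2), (2.7, 3), (3.4, 4)]


def assign_points(conn):
    """Return {ID: Pts}, taking the highest Points whose cutpoint is <= the score."""
    sql = """
        SELECT m.ID,
               (SELECT MAX(Points) FROM TableScore
                WHERE CutpointVal <= m.Score) AS Pts
        FROM Main AS m
    """
    return dict(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Main (ID INTEGER, Score REAL)")
conn.execute("CREATE TABLE TableScore (CutpointVal REAL, Points INTEGER)")
conn.executemany("INSERT INTO Main VALUES (?, ?)", MAIN)
conn.executemany("INSERT INTO TableScore VALUES (?, ?)", TABLE_SCORE)
print(assign_points(conn))
```

The returned mapping should match the Desired output table: 72625 gets 2, 78261 gets 3, 82766 gets 4 and 58383 gets 1.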

Related

Fetching a minimum of N rows, plus all peers of the last row

I have a sample table named assets which looks like this:
id name block_no
1 asset1 2
2 asset2 2
3 asset3 3
There can be any number of assets in a given block. I need at least 100 rows from the table, and the result must contain every row of each block_no it touches. For example, if block_no 2 has 95 rows and block_no 3 has around 20, I need all 20 rows of block_no 3, as if I were fetching data in packets by block_no.
Is this possible and feasible?
Postgres 13 or later
There is a dead simple solution using WITH TIES in Postgres 13 or later:
SELECT *
FROM assets
WHERE block_no >= 2 -- your starting block
ORDER BY block_no
FETCH FIRST 100 ROWS WITH TIES;
This will return at least 100 rows (if enough qualify), plus all peers of the 100th row.
If your table isn't trivially small, an index on (block_no) is essential for performance.
See:
Get top row(s) with highest value, with ties
Older versions
Use the window function rank() in a subquery:
SELECT (a).*
FROM (
SELECT a, rank() OVER (ORDER BY block_no) AS rnk
FROM assets a
WHERE block_no >= 2 -- your starting block
) sub
WHERE rnk <= 100;
Same result.
I use a little trick with the row type to strip the added rnk from the result. That's an optional addition.
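To see the rank() fallback in action without a Postgres server, here is a minimal sketch using Python's sqlite3 module (SQLite has supported rank() since version 3.25). The extra asset rows and the helper name fetch_blocks are hypothetical, and the columns are listed explicitly because the (a).* row-type trick is Postgres-specific. A small n makes the tie behavior visible.

```python
import sqlite3

# Hypothetical sample rows: (id, name, block_no).
ASSETS = [
    (1, "asset1", 2), (2, "asset2", 2),
    (3, "asset3", 3), (4, "asset4", 3), (5, "asset5", 3),
    (6, "asset6", 4),
]


def fetch_blocks(conn, start_block, n):
    """Return at least n rows from start_block on, plus every peer of the n-th row."""
    sql = """
        SELECT id, name, block_no
        FROM (
            SELECT id, name, block_no,
                   rank() OVER (ORDER BY block_no) AS rnk
            FROM assets
            WHERE block_no >= ?
        ) AS sub
        WHERE rnk <= ?
        ORDER BY block_no, id
    """
    return conn.execute(sql, (start_block, n)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (id INTEGER PRIMARY KEY, name TEXT, block_no INTEGER)")
conn.executemany("INSERT INTO assets VALUES (?, ?, ?)", ASSETS)

# n = 3: block 2 supplies only 2 rows, so all 3 rows of block 3 come in as ties.
print([row[0] for row in fetch_blocks(conn, 2, 3)])  # [1, 2, 3, 4, 5]
```

With n = 2 the same call returns just ids 1 and 2, because block 2 alone satisfies the minimum.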
See:
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?

Find neighboring polygons with maximum of 3 other polygons

I have a case like the following picture
Say I have 9 polygons and want to find the polygons that touch at most 3 other polygons, such as polygons 1, 3, 7 and 9 (yellow).
I think this can be done with ST_Touches in PostGIS, but so far I have only come up with this PostGIS query:
select a.poly_name, b.poly_name from tb a, tb b where ST_Touches(a.geom, b.geom)
I want the output to look like this:
poly_name poly_name
1 2
1 4
1 5
How can I do this?
Your hint with ST_Touches is correct. However, to count how many cells touch each record of the same table, you need either a subquery or a second reference to the table in the FROM clause.
Given the following grid in a table called tb, you can filter the cells with three or fewer neighbor cells like this:
SELECT * FROM tb q1
WHERE (
SELECT count(*) FROM tb q2
WHERE ST_Touches(q2.geom,q1.geom)) <=3;
If you also want to list the neighbor cells, first join the cells that touch in the WHERE clause, then count the results per cell in a subquery or CTE:
WITH j AS (
SELECT
q1.poly_name AS p1,q2.poly_name p2,
COUNT(*) OVER (PARTITION BY q1.poly_name) AS qt
FROM tb q1, tb q2
WHERE ST_Touches(q2.geom,q1.geom))
SELECT * FROM j
WHERE qt <= 3;
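PostGIS isn't needed to try out the windowed-count idea. The sketch below, using Python's sqlite3, replaces ST_Touches with a hypothetical grid stand-in: each cell of a 3x3 grid stores its row r and column c, and two cells are treated as touching when they share an edge or a corner, which is what ST_Touches reports for adjacent squares in a regular grid. The CTE and COUNT(*) OVER (PARTITION BY ...) are the same as in the query above.

```python
import sqlite3

# Hypothetical 3x3 grid: poly_name 1..9 numbered row by row, with (r, c) coordinates.
GRID = [(r * 3 + c + 1, r, c) for r in range(3) for c in range(3)]


def few_neighbors(conn, max_neighbors=3):
    """List (p1, p2, qt) pairs for cells touching at most max_neighbors cells."""
    sql = """
        WITH j AS (
            SELECT q1.poly_name AS p1, q2.poly_name AS p2,
                   COUNT(*) OVER (PARTITION BY q1.poly_name) AS qt
            FROM tb q1, tb q2
            -- stand-in for ST_Touches: the cells share an edge or a corner
            WHERE max(abs(q1.r - q2.r), abs(q1.c - q2.c)) = 1
        )
        SELECT p1, p2, qt FROM j
        WHERE qt <= ?
        ORDER BY p1, p2
    """
    return conn.execute(sql, (max_neighbors,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (poly_name INTEGER, r INTEGER, c INTEGER)")
conn.executemany("INSERT INTO tb VALUES (?, ?, ?)", GRID)
for row in few_neighbors(conn):
    print(row)
```

Only the four corner cells (1, 3, 7 and 9) survive, and the first three rows, (1, 2, 3), (1, 4, 3) and (1, 5, 3), match the pairs in the question's expected output.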
Demo: db<>fiddle
Further reading:
Create Hexagons (maybe relevant for your project)
Window Functions

fetch aggregate value along with data

I have a table with the following fields
ID,Content,QuestionMarks,TypeofQuestion
350, What is the symbol used to represent Bromine?,2,MCQ
758,What is the symbol used to represent Bromine? ,2,MCQ
2425,What is the symbol used to represent Bromine?,3,Essay
2080,A quadrilateral has four sides, four angles ,1,MCQ
2614,A circular cone has a curved surface area of ,2,MCQ
2520,Two triangles have sides 5 cm, 11 cm, 2 cm . ,2,MCQ
2196,Life supporting process mediated by water? ,2,Essay
I would like to get a random set of questions whose total marks equal an input number.
For example, if the input is 25, the result should be a random set of questions whose SUM(QuestionMarks) is 25 (+/-1).
Is this possible using SQL?
select content,id,questionmarks,sum(questionmarks) from quiz_question
group by content,id,questionmarks;
Expected Input 25
Expected Result (Sum of Question Marks =25)
Update:
How do I ensure that I get at least 2 Essay-type questions? (This is just an example; I would extend this to other conditions.) Thank you for all the help.
S-Man's cumulative sum is the right approach. For your logic, though, I think you want the rows up to and including the first one where the running total reaches 24 or more. That logic is:
where total - questionmark < 24
If you have enough questions, then you could get exactly 25 using:
with q25 as (
select *
from (select t.*,
sum(QuestionMarks) over (order by random()) as running_questionmark
from t
) t
where running_questionmark < 25
)
select q.ID, q.Content, q.QuestionMarks, q.TypeofQuestion
from q25 q
union all
(select t.ID, t.Content, t.QuestionMarks, t.TypeofQuestion
from t cross join
(select sum(QuestionMarks) as questionmark_25 from q25) x
where not exists (select 1 from q25 where q25.id = t.id)
order by abs(QuestionMarks - (25 - questionmark_25))
limit 1
)
This selects questions up to 25 but not at 25. It then tries to find one more to make the total 25.
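The logic of that query, a random prefix that stays under the target followed by the single remaining question that best closes the gap, can be restated in plain Python for experimentation. This is a sketch, not a replacement for the SQL: the pool of 75 questions and the name pick_about are hypothetical, and random.Random stands in for random().

```python
import random

# Hypothetical pool: 75 (id, marks) pairs, 25 questions each worth 1, 2 and 3 marks.
POOL = [(qid, qid % 3 + 1) for qid in range(75)]


def pick_about(questions, target, rng):
    """Random prefix with running total < target, then the best-fitting extra question."""
    order = list(questions)
    rng.shuffle(order)
    picked, total = [], 0
    for qid, marks in order:
        if total + marks >= target:  # mirrors: where running_questionmark < 25
            break
        picked.append((qid, marks))
        total += marks
    taken = {qid for qid, _ in picked}
    remaining = [q for q in questions if q[0] not in taken]
    if remaining:
        gap = target - total
        # mirrors: order by abs(QuestionMarks - (25 - questionmark_25)) limit 1
        picked.append(min(remaining, key=lambda q: abs(q[1] - gap)))
    return picked


chosen = pick_about(POOL, 25, random.Random(7))
print(sum(marks for _, marks in chosen))  # 25
```

With 25 questions at each mark value the total always lands on exactly 25: the prefix ends between 22 and 24, and a question with the missing 1, 2 or 3 marks is always left over. With a small pool like the one in the question, the extra question may overshoot or undershoot.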
Supposing questionmark is of type integer, you want to get some records in random order whose questionmark sum is not more than 25.
You can use the cumulative SUM() window function with a random order. The cumulative SUM() adds each current value to the previous sum, so you can filter where SUM() <= <your value>:
demo:db<>fiddle
SELECT
*
FROM (
SELECT
*,
SUM(questionmark) OVER (ORDER BY random()) as total
FROM
t
)s
WHERE total <= 25
Note:
This returns a list of records whose total is no more than 25, in random order. It is not guaranteed to be as close to 25 as possible: it stops at the first randomly ordered record that would push the total over 25.
Finding an exact match for your value is a kind of combinatorial problem that shouldn't be solved in a database, especially when there's a random factor. What if your current SUM is 22 and the next randomly chosen value is 4? Would you retry, maybe forever, to randomly find a value of 3? Or would you remove an already counted record with value 1?
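Here is a minimal sketch running the cumulative-SUM query with Python's sqlite3 module on the question's rows. The table name t and column questionmark follow this answer (the question calls it QuestionMarks); the target of 5 is only there because the seven sample questions total 14, well under 25.

```python
import sqlite3

# Rows from the question as (ID, questionmark, TypeofQuestion).
QUESTIONS = [
    (350, 2, "MCQ"), (758, 2, "MCQ"), (2425, 3, "Essay"), (2080, 1, "MCQ"),
    (2614, 2, "MCQ"), (2520, 2, "MCQ"), (2196, 2, "Essay"),
]


def random_prefix(conn, target):
    """Return (id, questionmark, total) rows in random order with total <= target."""
    sql = """
        SELECT id, questionmark, total
        FROM (
            SELECT id, questionmark,
                   SUM(questionmark) OVER (ORDER BY random()) AS total
            FROM t
        ) AS s
        WHERE total <= ?
        ORDER BY total  -- marks are positive, so totals grow and keep the random order
    """
    return conn.execute(sql, (target,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, questionmark INTEGER, kind TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", QUESTIONS)

rows = random_prefix(conn, 5)
print(rows)
```

Because every mark is between 1 and 3, each run returns a random set totaling 3, 4 or 5.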

More efficient query to count group where values in range

I am working with data that essentially looks like this.
table:processed_data
sensor_id, reading, time_stamp
1,0.1,1234567890
1,0.3,1234567891
1,0.9,1234567892
1,0.32,1234567893
...
What I want is a query that makes a single pass over the data and counts how many readings fall into each category. A simple example:
categories (0-0.5, 0.5-0.7, 0.7-1) (I am actually planning to break them into 10 categories in 0.1 increments).
This is essentially what I want, even though it isn't valid sql:
select count(reading between 0 and 0.5), count(reading between 0.5 and 0.7), count(reading between 0.7 and 1) from processed_data;
The only way I can think of, though, is to scan the table once per category (N passes) rather than in a single loop:
select count(*) as low from processed_data where reading between 0 and 0.5
union all
select count(*) as med from processed_data where reading between 0.5 and 0.7
union all
select count(*) as high from processed_data where reading between 0.7 and 1;
I might just resort to processing the data in PHP with a single scan, but I would prefer to have SQL do it, if it can be smart enough.
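You can derive the category from the value and use that for grouping. This relies on CAST truncating toward zero, as it does in SQLite; in engines where casting to integer rounds instead (PostgreSQL, for example), use FLOOR(reading * 10):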
SELECT CAST(reading * 10 AS INTEGER),
COUNT(*)
FROM processed_data
GROUP BY CAST(reading * 10 AS INTEGER);
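The sketch below runs the bucketing query with Python's sqlite3 module on the question's sample rows. It also shows conditional aggregation (SUM over CASE), a different single-pass technique not taken from the answer above, which handles arbitrary ranges such as the question's 0-0.5, 0.5-0.7 and 0.7-1 in one row. The half-open ranges are an assumption, chosen so that readings of exactly 0.5 or 0.7 are not counted twice as they would be with BETWEEN.

```python
import sqlite3

# Sample rows from the question: (sensor_id, reading, time_stamp).
ROWS = [
    (1, 0.1, 1234567890),
    (1, 0.3, 1234567891),
    (1, 0.9, 1234567892),
    (1, 0.32, 1234567893),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_data (sensor_id INTEGER, reading REAL, time_stamp INTEGER)")
conn.executemany("INSERT INTO processed_data VALUES (?, ?, ?)", ROWS)

# One pass, 0.1-wide buckets: SQLite's CAST truncates, so 0.32 * 10 becomes 3.
buckets = dict(conn.execute("""
    SELECT CAST(reading * 10 AS INTEGER) AS bucket, COUNT(*)
    FROM processed_data
    GROUP BY CAST(reading * 10 AS INTEGER)
""").fetchall())

# One pass, arbitrary half-open ranges via conditional aggregation.
low, med, high = conn.execute("""
    SELECT SUM(CASE WHEN reading < 0.5 THEN 1 ELSE 0 END),
           SUM(CASE WHEN reading >= 0.5 AND reading < 0.7 THEN 1 ELSE 0 END),
           SUM(CASE WHEN reading >= 0.7 THEN 1 ELSE 0 END)
    FROM processed_data
""").fetchone()

print(buckets, (low, med, high))
```

Bucket 3 holds both 0.3 and 0.32; the three range counts come out as 3, 0 and 1.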

Trouble writing a query to select one row per "date", given certain conditions

I am having trouble writing a query to select one row per "date", given certain conditions. My table has this structure:
ID date expiration callput iv delta
1 1/1/2009 1/20/2009 C 0.4 0.61
2 1/1/2009 1/20/2009 C 0.3 0.51
3 1/1/2009 2/20/2009 C 0.2 0.41
I would like to write a query with the following characteristics:
For each row, calculate the "days", i.e. the expiration date minus the date. For instance, for row one, the "days" is 19 (1/20 minus 1/1)
The result set should only have rows with a "days" of between 15 and 50
The "callput" value must be "C"
For each date, show only one row. That row should have the following characteristics:
The delta should be greater than 0.5
The delta should be the smallest number greater than 0.5
If there are two rows, the row with the lower days should be selected
Here is 'days' for the sample data above:
ID date expiration days callput iv delta
1 1/1/2009 1/20/2009 19 C 0.4 0.61
2 1/1/2009 1/20/2009 19 C 0.3 0.51
3 1/1/2009 2/20/2009 50 C 0.2 0.41
For my sample dataset, the answer should be row 2, because row 2's "delta" is above 0.5, row 2's delta of 0.51 is closer to 0.5 than row 1's 0.61, and row 2's "days" of 19 is less than row 3's "days" of 50.
This is the query I've written so far:
SELECT date, Min(delta) AS MaxOfdelta, [expiration]-[date] AS days
FROM RAWDATA
WHERE (((delta)>0.5) AND ((callput)="C") AND (([expiration]-[date])>=15 And ([expiration]-[date])<=50))
GROUP BY date, [expiration]-[date]
ORDER BY date;
This works somewhat, but sometimes, there are multiple rows for one date, because two rows on a given day can have a "days" between 15 and 50. I can't get my query to obey the rule "If there are two rows, the row with the lower days should be selected". I would also like the "iv" value for that row to be present in my query result set.
I happen to be using Microsoft Access, but syntax for any SQL engine would be appreciated! :-)
What you can do is select the right rows in a subquery. This query should find the rows you're looking for:
select [date], min([expiration]-[date])
from rawdata
where delta > 0.5
and callput = 'C'
and [expiration]-[date] between 15 and 50
group by [date]
To find the delta that belongs to these rows, put it in a subquery and join on it:
select *
from rawdata
inner join (
select [date]
, min([expiration]-[date]) as days
from rawdata
where delta > 0.5
and callput = 'C'
and [expiration]-[date] between 15 and 50
group by [date]
) as filter
on filter.date = rawdata.date
and filter.days = rawdata.[expiration] - rawdata.[date]
where delta > 0.5
and callput = 'C'
To search for the lowest delta within rows with identical "days", you could add another subquery:
select
SubDaysDelta.date
, SubDaysDelta.MinDays
, SubDaysDelta.MinDelta
, min(rawdata.iv) as MinIv
from rawdata
inner join (
select
SubDays.date
, SubDays.MinDays
, min(delta) as MinDelta
from rawdata
inner join (
select [date]
, min([expiration]-[date]) as MinDays
from rawdata
where delta > 0.5
and callput = 'C'
and [expiration]-[date] between 15 and 50
group by [date]
) as SubDays
on SubDays.date = rawdata.date
and SubDays.MinDays = rawdata.[expiration] - rawdata.[date]
where delta > 0.5
and callput = 'C'
group by SubDays.date, SubDays.MinDays
) as SubDaysDelta
on SubDaysDelta.date = rawdata.date
and SubDaysDelta.MinDays = rawdata.[expiration] - rawdata.[date]
and SubDaysDelta.MinDelta = rawdata.delta
where delta > 0.5
and callput = 'C'
group by SubDaysDelta.date, SubDaysDelta.MinDays, SubDaysDelta.MinDelta
The first subquery "SubDays" searches for rows with the lowest "days". The second subquery "SubDaysDelta" searches for the lowest delta within the "SubDays" set. The outer query filters any duplicates remaining.
It would be more readable and maintainable if you used views (saved queries in Access). The first view could filter on callput and the 15-50 "days" limit, which would make the rest a lot easier.
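In engines with window functions (Access has none), ROW_NUMBER() can replace the nested subqueries while keeping the same priority: lowest days, then lowest delta, then lowest iv. The sketch below runs it with Python's sqlite3 module; dates are stored as ISO strings so that julianday() can compute days, and rows 4 and 5 are a hypothetical second date.

```python
import sqlite3

# Rows 1-3 are the question's sample; rows 4-5 are a hypothetical second date.
RAWDATA = [
    (1, "2009-01-01", "2009-01-20", "C", 0.4, 0.61),
    (2, "2009-01-01", "2009-01-20", "C", 0.3, 0.51),
    (3, "2009-01-01", "2009-02-20", "C", 0.2, 0.41),
    (4, "2009-01-02", "2009-02-10", "C", 0.25, 0.55),
    (5, "2009-01-02", "2009-01-20", "C", 0.35, 0.70),
]


def best_per_date(conn):
    """One row per date: lowest days first, then lowest delta, then lowest iv."""
    sql = """
        SELECT id, date, days, iv, delta
        FROM (
            SELECT id, date, iv, delta,
                   julianday(expiration) - julianday(date) AS days,
                   ROW_NUMBER() OVER (
                       PARTITION BY date
                       ORDER BY julianday(expiration) - julianday(date), delta, iv
                   ) AS rn
            FROM rawdata
            WHERE delta > 0.5
              AND callput = 'C'
              AND julianday(expiration) - julianday(date) BETWEEN 15 AND 50
        ) AS ranked
        WHERE rn = 1
        ORDER BY date
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rawdata (id INTEGER, date TEXT, expiration TEXT,"
    " callput TEXT, iv REAL, delta REAL)"
)
conn.executemany("INSERT INTO rawdata VALUES (?, ?, ?, ?, ?, ?)", RAWDATA)
print([row[0] for row in best_per_date(conn)])  # [2, 5]
```

Row 2 wins for 2009-01-01 as in the question, and row 5 (18 days) beats row 4 (39 days) for 2009-01-02 even though row 4's delta is closer to 0.5.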
VBA!
I wish I could be as thorough, dedicated and helpful a servant as Andomar. I can only up-vote his answer in sheer awe of him.
However, I would point out that there may be compelling reasons to switch to VBA. Even if you are new to VBA, the benefits in control and troubleshooting may put you ahead, and I'd guess any new learning will help elsewhere in your project.
I wish I could provide as complete an answer as Andomar did. But give it a whack.