SQLite query compare distance within rows - sql

I would like to compare stations and get only those that are within a given distance of ONE ANOTHER. Let's say I have 3 stations A, B and C, and each has a position x-y-z. I would like to get the stations that are less than 30 meters apart (I have a function to compute the distance, so let's call it distance(x,y)).
SELECT * FROM Station WHERE distance(Station1, Station2) < 30
My problem: how can I compare the distance between two different rows, Station1 and Station2?
Thanks!!!

You could do something like this:
select a.*
from station a
inner join station b
on a.station_id <> b.station_id -- skip pairing a station with itself (distance 0)
and distance(a.station_id, b.station_id) < 30;
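A minimal sketch of the self-join idea, run through Python's sqlite3 module so it can be tried directly. The table layout (station_id, x, y, z), the sample coordinates and the six-argument distance function are assumptions made for illustration; the question does not show the real schema or function. The join condition uses a.station_id < b.station_id so that each pair is reported once.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical Euclidean distance on coordinates, registered as a SQL function.
def distance(x1, y1, z1, x2, y2, z2):
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)

conn.create_function("distance", 6, distance)

# Assumed schema and sample data: A and B are 10 m apart, C is far away.
conn.execute(
    "CREATE TABLE station (station_id TEXT PRIMARY KEY, x REAL, y REAL, z REAL)"
)
conn.executemany(
    "INSERT INTO station VALUES (?, ?, ?, ?)",
    [("A", 0, 0, 0), ("B", 10, 0, 0), ("C", 100, 0, 0)],
)

# Self-join: every station is compared with every other station exactly once.
pairs = conn.execute(
    """
    SELECT a.station_id, b.station_id
    FROM station a
    JOIN station b
      ON a.station_id < b.station_id
     AND distance(a.x, a.y, a.z, b.x, b.y, b.z) < 30
    ORDER BY a.station_id, b.station_id
    """
).fetchall()
print(pairs)  # [('A', 'B')]
```

Note that a self-join compares every pair of rows, so the cost grows quadratically with the number of stations.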

Related

Calculating the mode/median/most frequent observation in categorical variables in SQL impala

I would like to calculate the mode/median or, better, the most frequent observation of a categorical variable within my query.
E.g., if the variable has the following string values:
dog, dog, dog, cat, cat, I want to get dog, since it appears 3 times vs. 2.
Is there any function that does that? I tried APPX_MEDIAN(), but it only returns the first 10 characters as the median, and I do not want that.
Also, if there is a tie, I would like to break it using the date.
Thank you!
The most frequent observation is the mode, and you can calculate it like this.
A single-value mode can be calculated on a value column: get the count per value and pick the row with the highest count.
select count(*),value from mytable group by value order by 1 desc limit 1
Now, in case you have multiple modes, you need to join the per-value counts back to the maximum count to find all matches.
select orig.v as value from
(select count(*) c, value v from mytable group by value) orig
join (select count(*) cmode from mytable group by value order by 1 desc limit 1) cmode
ON orig.c = cmode.cmode
This gets the count of every value and then matches those counts against the maximum count. If one value's count matches the maximum, you get 1 row; if two values' counts match the maximum, you get 2 rows, and so on.
Calculating the median is a little trickier: it gives you the middle value, which is not necessarily the most frequent one.
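The multiple-mode join can be tried on a small table. This is a sketch that uses SQLite through Python's sqlite3 module as a stand-in for Impala, with sample rows chosen here so that dog and cat tie; the table and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (value TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("dog",), ("dog",), ("cat",), ("cat",), ("bird",)],
)

# Count each value, find the highest count, and keep every value reaching it.
modes = [
    row[0]
    for row in conn.execute(
        """
        SELECT orig.v
        FROM (SELECT count(*) AS c, value AS v
              FROM mytable GROUP BY value) orig
        JOIN (SELECT count(*) AS cmode
              FROM mytable GROUP BY value
              ORDER BY 1 DESC LIMIT 1) m
          ON orig.c = m.cmode
        ORDER BY orig.v
        """
    )
]
print(modes)  # ['cat', 'dog']
```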

Find neighboring polygons with maximum of 3 other polygons

I have a case like the following picture
Say I have 9 polygons and want to get the polygons that have at most 3 neighboring polygons, such as polygons 1, 3, 7, 9 (yellow).
I think this can be done using ST_Touches in PostGIS, but so far I have only come up with code like
select a.poly_name, b.poly_name from tb a, tb b where ST_Touches(a.geom, b.geom)
And say I want to output this like:
poly_name poly_name
1 2
1 4
1 5
So how can I get this done?
Your hint with ST_Touches is correct. However, to get the number of neighbor cells for each record relative to the other records in the same table, you need to either run a subquery or reference the table twice in the FROM clause.
Given a grid stored in a table called tb, you can filter the cells with three neighbor cells or fewer like this:
SELECT * FROM tb q1
WHERE (
SELECT count(*) FROM tb q2
WHERE ST_Touches(q2.geom,q1.geom)) <=3;
If you also want to list which cells are the neighbors, you can first join the cells that touch in the WHERE clause and then count the results in a subquery or CTE:
WITH j AS (
SELECT
q1.poly_name AS p1,q2.poly_name p2,
COUNT(*) OVER (PARTITION BY q1.poly_name) AS qt
FROM tb q1, tb q2
WHERE ST_Touches(q2.geom,q1.geom))
SELECT * FROM j
WHERE qt <= 3;
Demo: db<>fiddle
Further reading:
Create Hexagons (maybe relevant for your project)
Window Functions
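The counting logic can be checked without PostGIS. The sketch below is an assumption-laden stand-in: it models a 3x3 grid numbered 1 to 9 row by row, precomputes which cells touch (sharing an edge or a corner, which is how ST_Touches treats adjacent squares) into a hypothetical touches table, and then applies the same "count neighbors, keep those with at most 3" filter in SQLite.

```python
import sqlite3

# Cell number -> (row, column) on a 3x3 grid numbered row by row.
cells = {n: divmod(n - 1, 3) for n in range(1, 10)}

# Stand-in for ST_Touches: two distinct cells touch if they share an edge or corner.
touching = [
    (a, b)
    for a, (ra, ca) in cells.items()
    for b, (rb, cb) in cells.items()
    if a != b and abs(ra - rb) <= 1 and abs(ca - cb) <= 1
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE touches (p1 INTEGER, p2 INTEGER)")
conn.executemany("INSERT INTO touches VALUES (?, ?)", touching)

# Keep the neighbor pairs of every cell that has three neighbors or fewer.
rows = conn.execute(
    """
    SELECT p1, p2
    FROM touches t
    WHERE (SELECT count(*) FROM touches t2 WHERE t2.p1 = t.p1) <= 3
    ORDER BY p1, p2
    """
).fetchall()
corner_cells = sorted({p1 for p1, _ in rows})
print(corner_cells)  # [1, 3, 7, 9]
print(rows[:3])      # [(1, 2), (1, 4), (1, 5)]
```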

fetch aggregate value along with data

I have a table with the following fields
ID,Content,QuestionMarks,TypeofQuestion
350, What is the symbol used to represent Bromine?,2,MCQ
758,What is the symbol used to represent Bromine? ,2,MCQ
2425,What is the symbol used to represent Bromine?,3,Essay
2080,A quadrilateral has four sides, four angles ,1,MCQ
2614,A circular cone has a curved surface area of ,2,MCQ
2520,Two triangles have sides 5 cm, 11 cm, 2 cm . ,2,MCQ
2196,Life supporting process mediated by water? ,2,Essay
I would like to get random questions whose total marks equal an input number.
For example, if I say 25, the result should be random questions whose Sum(QuestionMarks) is 25 (+/-1).
Is this really possible using SQL?
select content,id,questionmarks,sum(questionmarks) from quiz_question
group by content,id,questionmarks;
Expected Input 25
Expected Result (Sum of Question Marks =25)
Update:
How do I ensure I get at least 2 Essay-type questions? (This is just an example; I would extend this to other conditions.) Thank you for all the help.
S-Man's cumulative sum is the right approach. For your logic, though, I think you want to get up to the first row that is 24 or more. That logic is:
where total - questionmark < 24
If you have enough questions, then you could get exactly 25 using:
with q25 as (
select *
from (select t.*,
sum(questionmark) over (order by random()) as running_questionmark
from t
) t
where running_questionmark < 25
)
select q.ID, q.Content, q.QuestionMarks, q.TypeofQuestion
from q25 q
union all
(select t.ID, t.Content, t.QuestionMarks, t.TypeofQuestion
from t cross join
(select sum(questionmark) as questionmark_25 from q25) x
where not exists (select 1 from q25 where q25.id = t.id)
order by abs(questionmark - (25 - questionmark_25))
limit 1
)
This selects questions up to 25 but not at 25. It then tries to find one more to make the total 25.
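The two-step idea (take a random prefix whose running sum stays below the target, then add the single remaining question that best fills the gap) can be sketched outside the database. The function name pick_questions, the small target of 8 and the use of Python instead of SQL are choices made here for illustration; the marks come from the sample rows in the question.

```python
import random

def pick_questions(marks_by_id, target, rng):
    """Return (chosen ids, total marks), aiming for `target` marks."""
    ids = list(marks_by_id)
    rng.shuffle(ids)  # plays the role of ORDER BY random()

    chosen, total = [], 0
    for qid in ids:
        # Keep the prefix whose running sum stays strictly below the target.
        if total + marks_by_id[qid] >= target:
            break
        chosen.append(qid)
        total += marks_by_id[qid]

    # Add the one remaining question whose marks are closest to the gap.
    rest = [qid for qid in ids if qid not in chosen]
    if rest:
        gap = target - total
        best = min(rest, key=lambda qid: abs(marks_by_id[qid] - gap))
        chosen.append(best)
        total += marks_by_id[best]
    return chosen, total

# ID -> QuestionMarks, taken from the sample table in the question.
marks = {350: 2, 758: 2, 2425: 3, 2080: 1, 2614: 2, 2520: 2, 2196: 2}
chosen, total = pick_questions(marks, target=8, rng=random.Random(0))
print(chosen, total)
```

The final total is only as close to the target as the leftover questions allow, so an exact hit is not guaranteed in general.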
Supposing questionmark is of type integer, you want to get some records in random order whose questionmark sum is not more than 25:
You can use a cumulative SUM() window function over a random order. The cumulative SUM() adds each current value to the previous sum, so you can filter where SUM() <= <your value>:
demo:db<>fiddle
SELECT
*
FROM (
SELECT
*,
SUM(questionmark) OVER (ORDER BY random()) as total
FROM
t
)s
WHERE total <= 25
Note:
This returns a list of records, in random order, whose total is no more than 25 but as close to it as that order allows.
Finding an exact match for your value is a combinatorial problem that shouldn't be solved in a database, especially when there's a random factor. What if your current SUM is 22 and the next randomly chosen value is 4? Would you retry, possibly forever, until you randomly find a value = 3? Or would you try to remove an already counted record with value = 1?
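A small worked example of the cumulative-sum filter and of the overshoot case described above. The marks list is made up for illustration and stands for the values in the order the random sort happened to produce.

```python
from itertools import accumulate

marks = [10, 12, 4, 3]  # marks in the (already randomized) draw order
target = 25

# Running totals, as SUM(...) OVER (ORDER BY ...) would produce them.
running = list(accumulate(marks))
print(running)  # [10, 22, 26, 29]

# Equivalent of WHERE total <= 25: keep rows whose running total fits.
kept = [m for m, total in zip(marks, running) if total <= target]
print(kept, sum(kept))  # [10, 12] 22
```

The result stops at 22 because the next value (4) overshoots, even though the later 3 would have reached 25 exactly.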

Postgres SQL How can I use group by and exclude the order by clause in it

I created a similar post and thought I had the answer, but I didn't. What I would like to do is get the nearest cities/towns for a given latitude and longitude while making the location column distinct. As you can see in my screenshot below, the location Orlando pops up twice; I would like to make that column distinct so that the second Orlando record is ignored. I am getting the nearest cities/towns correctly for the point (28.458414,-81.395258). The issue is that I have many records for big cities with different coordinates within the same city; as you can see, the two Orlando records have slightly different coordinates. Any suggestion would be great. I am using Postgres 10.
SELECT location,ABS(28.458414 - latitudes) + ABS(-81.395258-longitudes) as distance FROM zips
group by location,latitudes,longitudes
ORDER BY distance asc limit 5
I have also done
SELECT distinct location, ABS(28.458414 - latitudes) + ABS(-81.395258-longitudes) as distance FROM zips
ORDER BY distance asc limit 5
This is used to show users nearby towns as options, and it would look wrong to show the same town/city twice.
It is not clear on what basis you want to ignore the second Orlando record. If there were an ID column indicating the order of rows, you could pick the highest or lowest using row_number() or DISTINCT ON. Another option you could try is MIN or MAX, removing latitudes and longitudes from the GROUP BY:
SELECT location,MIN(distance) as distance
FROM
(
SELECT location, ABS(28.458414 - latitudes) + ABS(-81.395258 - longitudes)
as distance FROM zips
) d -- Postgres requires an alias for a subquery in FROM
group by location
ORDER BY distance asc limit 5;
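The MIN-per-location approach can be tried on a few rows. This sketch uses SQLite through Python's sqlite3 module instead of Postgres, and the three sample rows (two Orlando records with slightly different coordinates, one Kissimmee record) are made-up illustrative values, not data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zips (location TEXT, latitudes REAL, longitudes REAL)")
conn.executemany(
    "INSERT INTO zips VALUES (?, ?, ?)",
    [
        ("Orlando", 28.46, -81.40),    # illustrative coordinates
        ("Orlando", 28.50, -81.38),
        ("Kissimmee", 28.29, -81.41),
    ],
)

# Compute the distance per row, then keep the smallest distance per location.
rows = conn.execute(
    """
    SELECT location, MIN(distance) AS distance
    FROM (
        SELECT location,
               ABS(28.458414 - latitudes) + ABS(-81.395258 - longitudes) AS distance
        FROM zips
    ) AS d
    GROUP BY location
    ORDER BY distance ASC
    LIMIT 5
    """
).fetchall()
names = [name for name, _ in rows]
print(names)  # ['Orlando', 'Kissimmee']
```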

QGis SQL Query - "Deleting almost duplicates entries"

I have a table with a distance matrix between all the points of another table. In the distance matrix, I kept only the rows with a distance of less than 100 m.
I call points placed less than 100 m away from each other duplicate entries. But in the distance matrix, each duplicate entry takes 2 rows.
The distance matrix looks like this:
InputID TargetID Distance
1 2 75
1 3 35
2 1 75
3 1 35
I'd like to keep just one of those duplicate entries, which means that in the previous example I'd like to keep only the rows for point 1, because points 2 and 3 are less than 100 m away from point 1. But if I only keep point 1 in the distance matrix, I also need to keep only point 1 in my original table.
I use the SQL query tool of QGIS, but I don't really know how to program. Can anyone help me, please?
Thanks !
You could use a subquery in a join to retrieve the rows to delete:
delete from my_table m2
inner join (
select m.distance, min(m.InputId) min_id
from my_table m
inner join (
select distance, count(*)
from my_table
group by Distance
having count(*) > 1
) t on t.distance = m.distance
group by m.distance
) t2 on t2.distance = m2.distance and t2.min_id = m2.InputId
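As an alternative sketch (not the query above), the mirrored rows can be dropped by keeping only the rows where InputID is lower than TargetID, and the original table can then be cleaned by deleting every point that still appears as a TargetID. The table names matrix and points, the id column and the extra point 4 are assumptions for illustration; the example runs in SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE matrix (InputID INTEGER, TargetID INTEGER, Distance REAL)")
conn.executemany("INSERT INTO points VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany(
    "INSERT INTO matrix VALUES (?, ?, ?)",
    [(1, 2, 75), (1, 3, 35), (2, 1, 75), (3, 1, 35)],
)

# Each close pair is stored twice; keep only the row with the lower InputID.
conn.execute("DELETE FROM matrix WHERE InputID > TargetID")

# Remove from the original table every point that is a near-duplicate target.
conn.execute("DELETE FROM points WHERE id IN (SELECT TargetID FROM matrix)")

remaining_pairs = conn.execute(
    "SELECT InputID, TargetID FROM matrix ORDER BY 1, 2"
).fetchall()
remaining_points = [r[0] for r in conn.execute("SELECT id FROM points ORDER BY id")]
print(remaining_pairs)   # [(1, 2), (1, 3)]
print(remaining_points)  # [1, 4]
```

This is a greedy rule: in a chain where 1 is near 2 and 2 is near 3 but 1 is not near 3, point 3 is still removed, so it is worth checking the result on real data.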