QGIS SQL Query - "Deleting almost-duplicate entries"

I have a table with a distance matrix between all the points of another table. In the distance matrix, I kept only the lines with a distance of less than 100 m.
I call points placed less than 100 m away from each other duplicate entries. But in the distance matrix, each duplicate pair takes up 2 lines.
The distance matrix looks like this:
InputID TargetID Distance
1 2 75
1 3 35
2 1 75
3 1 35
I'd like to keep just one of those duplicate entries, which means that in the previous example I'd like to keep only the line of point 1, because points 2 and 3 are placed less than 100 m away from point 1. But if I only keep point 1 in the distance matrix, I also need to keep only point 1 in my original table.
I use the SQL Query tool of QGIS but I don't really know how to program. Can anyone help me, please?
Thanks!

You could use a subquery in a join to retrieve the values to delete:
delete m2 from my_table m2
inner join (
    select m.distance, min(m.InputId) as min_id
    from my_table m
    inner join (
        select distance, count(*) as cnt
        from my_table
        group by distance
        having count(*) > 1
    ) t on t.distance = m.distance
    group by m.distance
) t2 on t2.distance = m2.distance and t2.min_id = m2.InputId;
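Since each near-duplicate pair appears twice in the matrix with InputID and TargetID swapped, another way to keep a single point per pair is to filter on InputID < TargetID. A minimal sketch, assuming the matrix table is called my_table as above, and that the original point layer is a table called points with an id column (both of those names are placeholders):
-- Drop the mirrored direction of each pair, keeping one matrix row per pair
DELETE FROM my_table
WHERE InputID > TargetID;
-- Remove the near-duplicate points from the original table:
-- every remaining TargetID lies within 100 m of a point with a smaller ID
DELETE FROM points
WHERE id IN (SELECT TargetID FROM my_table WHERE InputID < TargetID);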


Find neighboring polygons with maximum of 3 other polygons

I have a case like the following picture
Say I have 9 polygons, and I want to get the polygons that have at most 3 neighboring polygons, such as polygons 1, 3, 7, 9 (yellow).
I think this is done using ST_Touches in PostGIS, but so far I have only come up with representing it in PostGIS code like:
select a.poly_name, b.poly_name from tb a, tb b where ST_Touches(a.geom, b.geom)
And say I want the output to look like:
poly_name poly_name
1 2
1 4
1 5
So how do I go about doing this?
Your hint with ST_Touches is correct; however, to get the number of neighboring cells of each record related to other records in the same table, you either need to run a subquery or call the table twice in the FROM clause.
Given the following grid on a table called tb ..
.. you can filter the cells with three neighbor cells or less like this:
SELECT * FROM tb q1
WHERE (
    SELECT count(*) FROM tb q2
    WHERE ST_Touches(q2.geom, q1.geom)
) <= 3;
If you also want to list which cells are the neighbors, you might want to first join the cells that touch in the WHERE clause, and count the results in a subquery or CTE:
WITH j AS (
    SELECT
        q1.poly_name AS p1, q2.poly_name AS p2,
        COUNT(*) OVER (PARTITION BY q1.poly_name) AS qt
    FROM tb q1, tb q2
    WHERE ST_Touches(q2.geom, q1.geom)
)
SELECT * FROM j
WHERE qt <= 3;
Demo: db<>fiddle
Further reading:
Create Hexagons (maybe relevant for your project)
Window Functions

SQL Server: Two COUNTs in one query multiplying with one another in output

I have a query that is used to display information in a queue, and part of that information is the number of child entities (packages and labs) that belong to the parent entity (change). However, instead of showing the individual counts of each type of child, they multiply with one another.
In the case below, there are supposed to be 3 labs and 18 packages; however, they multiply with one another and the output is 54 for each.
Below is the offending portion of the query.
SELECT cef.ChangeId, COUNT(pac.PackageId) AS 'Packages', COUNT(lab.LabRequestId) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN dbo.Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN dbo.Package pac
ON (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
WHERE cef.ChangeId = 255
GROUP BY cef.ChangeId
I feel like this is obvious, but it's not occurring to me how to fix it so the two counts are independent of one another, as they should be. There doesn't seem to be a scenario like this in any of my research either. Can anyone guide me in the right direction?
That is because each LEFT JOIN multiplies the source rows, so what you effectively have here is closer to a cross join.
SELECT cef.ChangeId, p.Packages, l.Labs
FROM dbo.ChangeEvaluationForm cef
OUTER APPLY(
SELECT COUNT(*) as Labs
FROM dbo.Lab
WHERE cef.ChangeId = Lab.ChangeId
) l
OUTER APPLY(
SELECT COUNT(*) AS Packages
FROM dbo.Package pac
WHERE (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus !=7)
) p
WHERE cef.ChangeId = 255
The GROUP BY from the original query is no longer needed here, since the counts come from the APPLY subqueries.
From your question it is difficult to derive what result you expect, so I presume you want the following result:
+----------+----------+------+
| ChangeId | Packages | Labs |
+----------+----------+------+
|      255 |       18 |    3 |
+----------+----------+------+
Try the query below if you are looking for the result mentioned above.
SELECT cef.ChangeId, ISNULL(pac.PacCount, 0) AS 'Packages', ISNULL(Lab.LabCount, 0) AS 'Labs'
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN (SELECT Lab.ChangeId, COUNT(*) LabCount FROM dbo.Lab GROUP BY Lab.ChangeId) Lab
ON cef.ChangeId = Lab.ChangeId
LEFT JOIN (SELECT pac.ChangeId, COUNT(*) PacCount FROM dbo.Package pac WHERE pac.PackageStatus != 6 AND pac.PackageStatus !=7 GROUP BY pac.ChangeId) pac
ON cef.ChangeId = pac.ChangeId
WHERE cef.ChangeId = 255
Query Explanation:
In your query, the two LEFT JOINs multiply the rows before the aggregation, so the count ended up as 54, which is the Cartesian product of 3 labs × 18 packages.
In this query I group by ChangeId and compute the aggregates before joining the tables, so the 3 labs and 18 packages are counted before the join.
You will also notice that I have moved the PackageStatus filter before the GROUP BY in the pac subquery, so unwanted records won't distort the count.
You start with a particular ChangeId from the dbo.ChangeEvaluationForm table (ChangeId = 255 from your example), then join to the dbo.Lab table. This join makes your result go from 1 row to 3, considering there are 3 Labs with ChangeId = 255. Your problem is on the next join, you are joining all 3 resulting rows from the previous join with the dbo.Package table, which has 18 rows for ChangeId = 255. The resulting count for columns pac.PackageId and lab.LabRequestId will then be 3 x 18 = 54.
To get what you want, there are 2 easy solutions:
Use COUNT(DISTINCT ...) instead of COUNT. This will count only the distinct values of pac.PackageId and lab.LabRequestId, not the repeated ones (see the sketch after this list).
Split the joins into 2 subqueries and join their results (by ChangeId).
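For the first option, here is a minimal sketch of the original query rewritten with COUNT(DISTINCT ...), assuming PackageId and LabRequestId uniquely identify their rows, as in the question:
SELECT cef.ChangeId,
       COUNT(DISTINCT pac.PackageId) AS 'Packages',   -- count each package once
       COUNT(DISTINCT lab.LabRequestId) AS 'Labs'     -- count each lab once
FROM dbo.ChangeEvaluationForm cef
LEFT JOIN dbo.Lab lab
    ON cef.ChangeId = lab.ChangeId
LEFT JOIN dbo.Package pac
    ON (cef.ChangeId = pac.ChangeId AND pac.PackageStatus != 6 AND pac.PackageStatus != 7)
WHERE cef.ChangeId = 255
GROUP BY cef.ChangeId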

How to filter the max value and write to row?

Postgres 9.3.5, PostGIS 2.1.4.
I have two tables (polygons and points) in a database.
I want to find out how many points are in each polygon. There can be 0 points per polygon or more than 200,000. The little hiccup is the following.
My point table looks like the following:
x y lan
10 11 en
10 11 fr
10 11 en
10 11 es
10 11 en
- #just for demonstration/clarification purposes
13 14 fr
13 14 fr
13 14 es
-
15 16 ar
15 16 ar
15 16 ps
I do not simply want to count the number of points per polygon. I want to know the most frequently occurring lan in each polygon. So, assuming each - indicates that the points fall into a new polygon, my results would look like the following:
Polygon table:
polygon Count lan
1 3 en
2 2 fr
3 2 ar
This is what I got so far.
SELECT count(*), pts.language AS language, hexagons.gid AS hexagon
FROM hexagonslan AS hexagons,
     points_count AS pts
WHERE ST_Within(pts.geom, hexagons.geom)
GROUP BY language, hexagon
ORDER BY hexagon DESC;
It gives me the following:
Polygon Count language
1 3 en
1 1 fr
1 1 es
2 2 fr
2 1 es
3 2 ar
3 1 ps
Two things remain unclear:
1. How do I get only the max value?
2. How will cases be treated where the max values happen to be identical?
Answer to 1.
To get the most common language and its count per Polygon, you could use a simple DISTINCT ON query:
SELECT DISTINCT ON (h.gid)
h.gid AS polygon, count(c.geom) AS ct, c.language
FROM hexagonslan h
LEFT JOIN points_count c ON ST_Within(c.geom, h.geom)
GROUP BY h.gid, c.language
ORDER BY h.gid, count(c.geom) DESC, c.language; -- language name is tiebreaker
Select first row in each GROUP BY group?
But for the data distribution you describe (up to 200,000 points per polygon), this should be substantially faster (hoping to make better use of an index on c.geom):
SELECT h.gid AS polygon, c.ct, c.language
FROM hexagonslan h
LEFT JOIN LATERAL (
SELECT c.language, count(*) AS ct
FROM points_count c
WHERE ST_Within(c.geom, h.geom)
GROUP BY 1
ORDER BY 2 DESC, 1 -- again, language name is tiebreaker
LIMIT 1
) c ON true
ORDER BY 1;
Optimize GROUP BY query to retrieve latest record per user
LEFT JOIN LATERAL .. ON true preserves polygons not containing any points.
Call a set-returning function with an array argument multiple times
In cases where the max values happen to be identical, the alphabetically first language is picked in the example, by way of the added ORDER BY item. If you want all languages that happen to share the maximum count, you have to do more:
Answer to 2.
SELECT h.gid AS polygon, c.ct, c.language
FROM hexagonslan h
LEFT JOIN LATERAL (
SELECT c.language, count(*) AS ct
, rank() OVER (ORDER BY count(*) DESC) AS rnk
FROM points_count c
WHERE ST_Within(c.geom, h.geom)
GROUP BY 1
) c ON c.rnk = 1
ORDER BY 1, 3 -- language only as additional sort criteria
Using the window function rank() here (not row_number()!), we can get the count of points and the ranking of the count in a single SELECT. Consider the sequence of events:
Best way to get result count before LIMIT was applied

Please help me in building a query for the below-mentioned table in SQL

I have a table named conversion with the columns mentioned below. I want to multiply the length and width (l*w) 'dimensions' values for each area and display them in another new table.
Please let me know if anything changes for the same logic in MS Access.
It is probably simple, but I don't know the exact query to solve the problem. Waiting for your solutions.
ID  area  length/width  dimensions  new column (L*W) here
1   1     l             3           3*5 = 15
2   1     w             5
3   2     l             4
4   2     w             8
5   3     l             6
6   3     w             10
7   4     l             12
8   4     w             13
9   4     W             10
Waiting for your reply.
You could query the table twice: once for lengths and once for widths and then join by area and multiply the values:
select length.area, length.dimension * width.dimension
from
(select area, dimension from conversion where lenwidth = 'l') length
inner join
(select area, dimension from conversion where lenwidth = 'w') width
on length.area = width.area;
Two remarks:
I suppose it is a typo that you have two width entries for area 4? Otherwise you would have to decide which value to take in the above select statement.
It would not be a good idea to keep the old table and have a new table holding the results. What if you change a value? You would have to remember to update the result accordingly every time. So either ditch the old table or use a view instead of a new table (see the sketch after these remarks).
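For the view option, here is a minimal sketch. The column names lenwidth and dimension are the same assumptions used in the query above, and area_dimensions is just a placeholder name; in MS Access you would typically save the SELECT part as a named query instead of running CREATE VIEW.
-- A view recomputes the product on the fly, so it never goes stale
CREATE VIEW area_dimensions AS
SELECT l.area, l.dimension * w.dimension AS product
FROM conversion l
INNER JOIN conversion w ON l.area = w.area
WHERE l.lenwidth = 'l'
  AND w.lenwidth = 'w';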
Try this
select *,
dimensions*(lead(dimensions) over(order by id)) product
from table1;
Or if you want for the set of area then
select *,
case when length_width='l' and (lead(length_width) over(order by id))='w'
then dimensions*(lead(dimensions) over(order by id))
else 0
end as product
from table1;
fiddle

Calculating relative frequencies in SQL

I am working on a tag recommendation system that takes metadata strings (e.g. text descriptions) of an object and splits them into 1-, 2- and 3-grams.
The data for this system is kept in 3 tables:
The "object" table (e.g. what is being described),
The "token" table, filled with all 1-, 2- and 3-grams found (examples below), and
The "mapping" table, which maintains associations between (1) and (2), as well as a frequency count for these occurrences.
I am therefore able to construct a table via a LEFT JOIN, that looks somewhat like this:
SELECT mapping.object_id, mapping.token_id, mapping.freq, token.token_size, token.token
FROM mapping LEFT JOIN
token
ON (mapping.token_id = token.id)
WHERE mapping.object_id = 1;
 object_id | token_id | freq | token_size | token
-----------+----------+------+------------+---------------
         1 |        1 |    1 |          2 | 'a big'
         1 |        2 |    1 |          1 | 'a'
         1 |        3 |    1 |          1 | 'big'
         1 |        4 |    2 |          3 | 'a big slice'
         1 |        5 |    1 |          1 | 'slice'
         1 |        6 |    3 |          2 | 'big slice'
Now I'd like to be able to get the relative probability of each term within the context of a single object ID, so that I can sort them by probability and see which terms are most probable (e.g. ORDER BY rel_prob DESC LIMIT 25).
For each row, I'm envisioning the addition of a column which gives the result of freq/sum of all freqs for that given token_size. In the case of 'a big', for instance, that would be 1/(1+3) = 0.25. For 'a', that's 1/3 = 0.333, etc.
I can't, for the life of me, figure out how to do this. Any help is greatly appreciated!
If I understood your problem, here's the query you need
select
m.object_id, m.token_id, m.freq,
t.token_size, t.token,
cast(m.freq as decimal(29, 10)) / sum(m.freq) over (partition by t.token_size, m.object_id)
from mapping as m
left outer join token as t on m.token_id = t.id
where m.object_id = 1;
sql fiddle example
hope that helps
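As a follow-up, to get the sorted top terms the question mentions (ORDER BY rel_prob DESC LIMIT 25), the same calculation can be wrapped in a derived table so the relative probability gets a name to sort on. A sketch, assuming a DBMS that supports LIMIT; rel_prob is just a made-up alias:
select *
from (
    select
        m.object_id, m.token_id, m.freq,
        t.token_size, t.token,
        -- relative probability of the token within its token_size group
        cast(m.freq as decimal(29, 10))
            / sum(m.freq) over (partition by t.token_size, m.object_id) as rel_prob
    from mapping as m
    left outer join token as t on m.token_id = t.id
    where m.object_id = 1
) as ranked
order by rel_prob desc
limit 25;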