How to get around a GEOS error when doing st_union? - sql

I have a big layer with lines, and a view that needs to calculate the length of these lines without counting their overlaps.
A working query that does half the job (but does not account for the overlap, so overestimates the number):
select name, sum(st_length(t.geom)) from mytable t where st_isvalid(t.geom) group by name
The intended query that returns SQL Error [XX000]: ERROR: GEOSUnaryUnion: TopologyException: found non-noded intersection between LINESTRING (446659 422287, 446661 422289) and LINESTRING (446659 422288, 446660 422288) at 446659.27944086661 422288.0015405959
select name,st_length(st_union(t.geom)) from mytable t where st_isvalid(t.geom) group by name
The thing is that the latter works fine for the first 200 rows; it's only when I try to export the entire view that I get the error.
Would there be a way to use the preferred query first, and if it returns an error on a row use the other one? Something like:
case when st_length(st_union(t.geom)) = error then sum(st_length(t.geom))
else st_length(st_union(t.geom)) end

Make sure your geometries are valid before the union by wrapping them in ST_MakeValid(). You can also query their individual validity using select id, ST_IsValid(t.geom) from mytable; to filter out or correct the affected ones. In cases where one of your geometries is itself invalid, this will help. It will still leave cases where the invalidity only appears after combining multiple valid geometries.
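For example, the failing query could be rewritten along these lines (just a sketch; as noted, it may not cure invalidity that only shows up during the union itself):
select name, st_length(st_union(st_makevalid(t.geom))) from mytable t group by name;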
See if ST_UnaryUnion(ST_Collect(ST_MakeValid(t.geom))) changes anything. It will try to dissolve and node the component linestrings.
When really desperate, you can make a PL/pgSQL wrapper around both of your functions and switch to the backup one in the exception block.
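A minimal sketch of such a wrapper, reusing the table and columns from the question (the function name safe_union_length is made up):
create or replace function safe_union_length(geoms geometry[])
returns double precision as $$
begin
    -- preferred path: dissolve overlaps, then measure
    return st_length(st_union(geoms));
exception when others then
    -- fallback: plain sum of lengths (will overestimate where lines overlap)
    return (select sum(st_length(g.geom)) from unnest(geoms) as g(geom));
end;
$$ language plpgsql;

-- usage: aggregate the geometries per name and let the function decide
select name, safe_union_length(array_agg(t.geom)) from mytable t where st_isvalid(t.geom) group by name;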
At the expense of some precision, and with the benefit of somewhat higher performance, you could try snapping the geometries to a grid with ST_Union(ST_SnapToGrid(t.geom,1e-7)), gradually increasing the grid size to 1e-6, 1e-5. Some geometries might not actually intersect, but be so close that PostGIS can't tell them apart at the precision it operates at. You can also try applying this only to your problematic geometries, if you can pinpoint them.
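Applied to the original query, that could look like this (same columns, smallest grid size first):
select name, st_length(st_union(st_snaptogrid(t.geom, 1e-7))) from mytable t where st_isvalid(t.geom) group by name;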
As reminded by @dr_jts, PostGIS 3.1.0 includes a new overlay engine, so if your select postgis_full_version(); shows anything below that and GEOS 3.9.0, it might be worth upgrading. The upcoming PostGIS 3.2.0 with GEOS 3.10.1 should also provide some improvement in validity checks.
Here's a related thread.

Related

sdo_relate ORA-13343: a polygon geometry has fewer than four coordinates

I am trying to do a SDO_relate however it is returning an error.
My code:
ON sdo_relate (f.tls_da_location, ntp.boundary, 'MASK=ANYINTERACT') = 'TRUE'
WHERE ntp.boundary IS NOT NULL
I have tried adding AND sdo_util.getnumvertices(ntp.boundary) > 4
However it still returns the error below:
ORA-29902: error in executing ODCIIndexStart() routine
ORA-13249: Internal error: Memory Resident R-tree
ORA-13343: a polygon geometry has fewer than four coordinates
ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 333
12801. 00000 - "error signaled in parallel query server %s"
*Cause: A parallel query server reached an exception condition.
*Action: Check the following error message for the cause, and consult
your error manual for the appropriate action.
*Comment: This error can be turned off with event 10397, in which
case the server's actual error is signaled instead.
Does anyone have any other suggestions to ignore these polygons that don't have 4 points?
There are several aspects here ...
First: the optimizer is free to apply the predicates in any order. In your case, it looks like it applies the spatial filter first, then applies the selector on number of vertices. Which means you get the exception before the test on number of vertices.
That the optimizer does this is natural: changing the order would mean a full table scan to only return the geometries with 4 points or more, then pass the result through the spatial filter. That would be very slow, and the optimizer rightfully prefers using the index first.
There is no mechanism (hints or otherwise) to control this behavior. Using a subquery or a view will not make any difference: the optimizer will flatten the query into a simple one. Possibly a subquery with a NO_MERGE hint could work: but it would have the above effect of forcing a full table scan and a full pass of all geometries through the spatial filter. Not a good thing.
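If you want to experiment with it anyway, a hedged sketch might look like the following, with features and boundaries standing in for the real table names from the question (and the full-scan caveat above still applies):
SELECT f.tls_da_location, ntp.boundary
  FROM features f,
       (SELECT /*+ NO_MERGE */ b.*
          FROM boundaries b
         WHERE b.boundary IS NOT NULL
           AND sdo_util.getnumvertices(b.boundary) >= 4) ntp
 WHERE sdo_relate(f.tls_da_location, ntp.boundary, 'MASK=ANYINTERACT') = 'TRUE';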
Second: polygons with fewer than 4 vertices are incorrect. The simplest polygon is a triangle. It has three points (A-B-C), but all polygons must close, i.e. be encoded as four vertices: A-B-C-A. That is one of the rules defined by the OGC Simple Features for SQL specification. There are other rules that polygons must adhere to:
Absence of redundant vertices
Orientation (counter-clockwise for outer rings, clockwise for inner rings, i.e. holes)
Absence of self-touching rings
Ordering of the rings (an outer ring must be followed by its inner rings)
Shapes that do not adhere to the rules are invalid. What happens when you use invalid shapes is actually undefined. Depending on the nature of the error and the action you do on this shape (query, measure, buffer, clip, merge ...), you may get any of the following behaviors:
the error is ignored and you get a correct result
you get an exception (that is your case)
you get no error, but the result is incorrect
The worst possible outcome is #3: you cannot trust the results of your application. It may return the wrong area in m2 of a parcel. Or it may say that two adjacent parcels do not overlap, when in reality they do ... This is very bad.
Data quality is of prime importance when manipulating and processing spatial data. Note that errors are generally not visible: most mapping tools are resilient enough to still show the shapes, and defects are for the most part impossible to detect visually.
The solution is simple: make sure your data is valid. For that you can use the SDO_GEOM.VALIDATE_GEOMETRY_WITH_CONTEXT(). Run it over each shape. It will tell you which shapes are incorrect, and what the error is.
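For example, something along these lines reports every invalid shape and the reason (boundaries, id and the 0.005 tolerance are placeholder assumptions):
SELECT b.id,
       sdo_geom.validate_geometry_with_context(b.boundary, 0.005) AS validation_error
  FROM boundaries b
 WHERE sdo_geom.validate_geometry_with_context(b.boundary, 0.005) <> 'TRUE';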
There is also SDO_UTIL.RECTIFY_GEOMETRY(). This one will attempt to correct the most common errors:
Removes redundant vertices
Reorients and re-orders rings
Corrects some self-intersections
It does not correct the errors you see (less than four points) because it is uncertain of what is actually wrong. You need to look at what those shapes are, and more important where they came from. Then either correct them or remove them.

Librato composite error: What does: Unable to execute composite: ["error": "Requested MD data from SD endpoint"]. mean?

I want to create an alert that triggers whenever one of the following counter statistics is not zero:
a.b.c.failed
a.b.e.failed
I already use these statistics separately on a dashboard page, but as they occur rarely, I'd like an alert.
It appears I have to make a sum composite so that I can trigger the alert when the sum is above zero. I think the composite would look something like:
sum(series("a.b.*.failed",{}))
However, every attempt I make gives the error:
Unable to execute composite: ["error": "Requested MD data from SD endpoint"]
There is another thread that suggested replacing the {} with "*" (including the quotes). This no longer gives an error, but gives a bizarre result: it's above zero all the time, even though there are only very rarely any 'failed' statistics above zero.
The correct expression for my case is:
sum(derive(series("a.b.*.failed","*")))
Using "*" works to select the source.
Derive gives the change of each statistic instead of the cumulative total (but I'm not sure why the cumulative total was showing up - it is not shown normally for these statistics).
Sum adds the change of the different statistics.
I don't understand why {} doesn't work - I think that is related to the mystery of the meaning of the error message, which uses undocumented terminology (MD and SD endpoints). Librato's documentation of their composite statistics function language is very minimal and provides few examples and few explanations of the meaning of terms and technical foundations.

SQL Server 2008+ : Best method for detecting if two polygons overlap?

We have an application that has a database full of polygons (currently stored as points) that a .net app pulls out and checks if they overlap.
It occurred to me that it would be much nicer to convert these point arrays to polygon / polyline objects within the database and use SQL to get a bool of whether they overlap or not.
I have seen different methods suggested to do this, but none of the examples given were quite in line with my needs.
I would be very happy to receive input from those kind enough to offer their experience.
Additional:
In response to questions: it is indeed 2D, and yes, any crossover of the two is considered true. The polygons have n points and can be concave. The polygons will be saved as one per row (after the data conversion task) as polygons (i.e. the polygon type .. it might be called something else, spatial / geom, my memory is not on my side right now).
You can use .STIntersection with .STAsText() to test for overlapping polygons. (I really hate the terminology Microsoft has used (or whoever set the standard terms). "Touching," in my mind, should be a test for whether or not two geometry/geography shapes overlap at all, not just share a border.)
Anyway....
If @RadiusGeom is a geometry representing a radius from a point, the following will return a list of any polygons whose intersection with it (a geometry that represents the area where two geometries overlap) is not empty.
SELECT CT.ID AS CTID, CT.[Geom] AS CensusTractGeom
FROM CensusTracts CT
WHERE CT.[Geom].STIntersection(@RadiusGeom).STAsText() <> 'GEOMETRYCOLLECTION EMPTY'
If your geometry field is spatially indexed, this runs pretty quickly. I ran this on 66,000 US CT records in about 3 seconds. There may be a better way, but since no one else had an answer, this was my attempt at an answer for you. Hope it helps!
Calculate and store the bounding rectangle of each polygon in a set of new fields within the row which is associated with that polygon. (I assume you have one; if not, create one.) When your dotnet app has a polygon and is looking for overlapping polygons, it can fetch from the database only those polygons whose bounding rectangles overlap, using a relatively simple SQL SELECT statement. Those polygons should be relatively few, so this will be efficient. Then, your dotnet app can perform the finer polygon overlap calculations in order to determine which ones of those really overlap.
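A rough sketch of that prefilter, assuming a hypothetical Polygons table with MinX/MinY/MaxX/MaxY columns filled in during the conversion, and @qMinX/@qMinY/@qMaxX/@qMaxY holding the bounding box of the polygon your app is testing:
SELECT p.ID
  FROM Polygons p
 WHERE p.MinX <= @qMaxX AND p.MaxX >= @qMinX
   AND p.MinY <= @qMaxY AND p.MaxY >= @qMinY;
Only the rows returned here need the exact (and more expensive) overlap test in the .net code.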
Okay, I got another idea, so I am posting it as a different answer. I think my previous answer with the bounding polygons probably has some merit on its own, even if it was to reduce the number of polygons fetched from the database by a small percentage, but this one is probably better.
MSSQL supports integration with the CLR since version 2005. This means that you can define your own data type in an assembly, register the assembly with MSSQL, and from that moment on MSSQL will be accepting your user-defined data type as a valid type for a column, and it will be invoking your assembly to perform operations with your user-defined data type.
An example article for this technique on the CodeProject: Creating User-Defined Data Types in SQL Server 2005
I have never used this mechanism, so I do not know details about it, but I presume that you should be able to either define a new operation on your data type, or perhaps overload some existing operation like "less-than", so that you can check if one polygon intersects another. This is likely to speed things up a lot.

Efficient way to compute accumulating value in sqlite3

I have an sqlite3 table that tells when I gain/lose points in a game. Sample/query result:
SELECT time,p2 FROM events WHERE p1='barrycarter' AND action='points'
ORDER BY time;
1280622305|-22
1280625580|-9
1280627919|20
1280688964|21
1280694395|-11
1280698006|28
1280705461|-14
1280706788|-13
[etc]
I now want my running point total. Given that I start w/ 1000 points,
here's one way to do it.
SELECT DISTINCT(time), (SELECT
1000+SUM(p2) FROM events e WHERE p1='barrycarter' AND action='points'
AND e.time <= e2.time) AS points FROM events e2 WHERE p1='barrycarter'
AND action='points' ORDER BY time
but this is highly inefficient. What's a better way to write this?
MySQL has @variables so you can do things like:
SELECT time, @tot := @tot+points ...
but I'm using sqlite3 and the above isn't ANSI standard SQL anyway.
More info on the db if anyone needs it: http://ccgames.db.94y.info/
EDIT: Thanks for the answers! My dilemma: I let anyone run any
single SELECT query on "http://ccgames.db.94y.info/". I want to give
them useful access to my data, but not to the point of allowing
scripting or allowing multiple queries with state. So I need a single
SQL query that can do accumulation. See also:
Existing solution to share database data usefully but safely?
SQLite is meant to be a small embedded database. Given that definition, it is not unreasonable to find many limitations with it. The task at hand is not solvable using SQLite alone, or it will be terribly slow as you have found. The query you have written is a triangular cross join that will not scale, or rather, will scale badly.
The most efficient way to tackle the problem is through the program that is making use of SQLite, e.g. if you were using Web SQL in HTML5, you can easily accumulate in JavaScript.
There is a discussion about this problem in the sqlite mailing list.
Your 2 options are:
Iterate through all the rows with a cursor and calculate the running sum on the client.
Store sums instead of, or as well as, storing points. (If you only store sums, you can get the points back by doing sum(n) - sum(n-1), which is fast.)
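A minimal sketch of the second option, assuming a hypothetical running_total column that the game code writes on every insert (starting from the 1000-point baseline); the per-event points fall out as the difference from the previous row's total:
SELECT e.time,
       e.running_total,
       e.running_total - COALESCE(
         (SELECT prev.running_total
            FROM events prev
           WHERE prev.p1 = e.p1 AND prev.action = e.action AND prev.time < e.time
           ORDER BY prev.time DESC
           LIMIT 1), 1000) AS points
  FROM events e
 WHERE e.p1 = 'barrycarter' AND e.action = 'points'
 ORDER BY e.time;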

long running queries: observing partial results?

As part of a data analysis project, I will be issuing some long running queries on a mysql database. My future course of action is contingent on the results I obtain along the way. It would be useful for me to be able to view partial results generated by a SELECT statement that is still running.
Is there a way to do this? Or am I stuck with waiting until the query completes to view results which were generated in the very first seconds it ran?
Thank you for any help : )
In the general case, a partial result cannot be produced. For example, if you have an aggregate function with a GROUP BY clause, then all the data must be analysed before the first row is returned. A LIMIT clause will not help you, because it is applied after the output is computed. Maybe you can give concrete data and the SQL query?
One thing you may consider is sampling your tables down. This is good practice in data analysis in general to get your iteration speed up when you're writing code.
For example, if you have table create privileges and you have some mega-huge table X with key unique_id and some data data_value
If unique_id is numeric, in nearly any database
create table sample_table as
select unique_id, data_value
from X
where mod(unique_id, <some_large_prime_number_like_1013>) = 1
will give you a random sample of data to work your queries out on, and you can inner join your sample_table against the other tables to improve the speed of testing / query results. Thanks to the sampling, your query results should be roughly representative of what you will get. Note that the number you're modding with has to be prime, otherwise it won't give a correct sample. The example above will shrink your table down to about 0.1% of the original size (0.0987% to be exact).
Most databases also have better sampling and random number methods than just using mod. Check the documentation to see what's available for your version.
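For instance, in MySQL (the database mentioned in the question) a roughly 0.1% sample can be drawn with RAND() instead of mod():
CREATE TABLE sample_table AS
SELECT unique_id, data_value
  FROM X
 WHERE RAND() < 0.001;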
Hope that helps,
McPeterson
It depends on what your query is doing. If it needs to have the whole result set before producing output - such as might happen for queries with group by or order by or having clauses, then there is nothing to be done.
If, however, the reason for the delay is client-side buffering (which is the default mode), then that can be adjusted using "mysql-use-result" as an attribute of the database handler rather than the default "mysql-store-result". This is true for the Perl and Java interfaces: I think in the C interface, you have to use an unbuffered version of the function that executes the query.