remove_duplicate_vertices in Oracle Spatial converted polygon/multipolygon types to collections. How do I convert these back to multipolygons? - oracle-spatial

I have run REMOVE_DUPLICATE_VERTICES on a spatial table which contained polygons and multipolygons. Now there are 6 geometries tagged as Collection (2004). How can I safely change these back to multipolygons?
Example of one (with a shortened coordinate array):
MDSYS.SDO_GEOMETRY(2004,8307,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1,5,1003,1,13,1003,1,21,1003,1,279,2003,1,581,2003,1,961,2003,1,1551,2003,1,2073,2003,1,2215,2003,1,2277,2003,1,2349,2003,1,2379,2003,1,2671,2003,1,2847,2003,1,3033,2003,1,3145,2003,1,3263,2003,1,3271,2003,1,3403,2003,1),MDSYS.SDO_ORDINATE_ARRAY(-89.60549292,-1.30359069399998,...,-89.6571104179999,-0.900517332999925))

One way to do this would be:
-- polygons_only is a placeholder name; ingeom should be a valid geometry
create or replace function polygons_only(ingeom in mdsys.sdo_geometry)
return mdsys.sdo_geometry
is
  wgeom mdsys.sdo_geometry;
  outgeom mdsys.sdo_geometry;
begin
  for i in 1 .. nvl(sdo_util.getNumElem(ingeom), 0)
  loop
    -- extract element i in full (ring 0 = the whole element)
    wgeom := sdo_util.extract(ingeom, i, 0);
    -- keep polygons (3) and multipolygons (7), skip points and lines
    if wgeom.get_gtype() in (3,7) then
      outgeom := sdo_util.append(outgeom, wgeom);
    end if;
  end loop;
  return outgeom;
end;
It loops through the geometry's elements and appends each one that is a polygon to a geometry variable. Here it is wrapped in a function which you 'feed' with your geometry (which should be a valid geometry) and which returns the filtered outgeom.
I use this approach because it is easy to put logic in the loop to filter out, for example, elements under a certain area. I use it in production, with millions of geometries.
If you need further assistance, let me know.
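To see why the geometry in the question is tagged as a collection, it can help to decode its SDO_ELEM_INFO_ARRAY, which is read as triplets of (starting offset, element type, interpretation). The Python sketch below is only an illustration: decode_elem_info and ETYPE_LABELS are names made up for this example, and only the first five triplets from the question are used. It shows that the first element is a line string (etype 2) rather than a polygon ring, presumably a polygon that collapsed when its duplicate vertices were removed.

```python
# Illustrative only: decode an SDO_ELEM_INFO_ARRAY into triplets.
# ETYPE_LABELS covers just a few simple element types.
ETYPE_LABELS = {
    1: "point",
    2: "line string",
    1003: "exterior polygon ring",
    2003: "interior polygon ring",
}


def decode_elem_info(elem_info):
    """Return (starting_offset, etype, label) for each triplet."""
    if len(elem_info) % 3 != 0:
        raise ValueError("elem_info length must be a multiple of 3")
    decoded = []
    for i in range(0, len(elem_info), 3):
        offset, etype, _interpretation = elem_info[i:i + 3]
        decoded.append((offset, etype, ETYPE_LABELS.get(etype, "other")))
    return decoded


# First five triplets of the example geometry from the question.
sample = [1, 2, 1, 5, 1003, 1, 13, 1003, 1, 21, 1003, 1, 279, 2003, 1]
for offset, etype, label in decode_elem_info(sample):
    print(offset, etype, label)
```

The line-string element is the kind of element that the gtype filter in the loop above leaves out.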

Related

How to speed up custom window function with arrays and loops in PostgreSQL?

I'm currently learning UDFs and wrote the PostgreSQL UDF below to calculate the mean absolute deviation (MAD). It is the average absolute difference between the mean and each value in the window, and it can be computed over any window.
In python pandas/numpy, to find the MAD, we could write something like this:
series_mad = abs(series - series.mean()).mean()
Where series is a set of numbers and series_mad is a single numeric value representing the MAD of the series.
I'm trying to write this in PostgreSQL using window functions and a UDF. So far, this is what I have:
CREATE TYPE misc_tuple AS (
arr_store numeric[],
ma_period integer
);
CREATE OR REPLACE FUNCTION mad_func(prev misc_tuple, curr numeric, ma_period integer)
RETURNS misc_tuple AS $$
BEGIN
IF curr is null THEN
RETURN (null::numeric[], -1);
ELSEIF prev.arr_store is null THEN
RETURN (ARRAY[curr]::numeric[], ma_period);
ELSE
-- accumulate new values in array
prev.arr_store := array_append(prev.arr_store, curr);
RETURN prev;
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION mad_final(prev misc_tuple)
RETURNS numeric AS $$
DECLARE
total_len integer;
count numeric;
mad_val numeric;
mean_val numeric;
BEGIN
count := 0;
mad_val := 0;
mean_val := 0;
total_len := array_length(prev.arr_store, 1);
-- first loop to find the mean of the series
FOR i IN greatest(1,total_len-prev.ma_period+1)..total_len
LOOP
mean_val := mean_val + prev.arr_store[i];
count := count + 1;
END LOOP;
mean_val := mean_val/NULLIF(count,0);
-- second loop to subtract mean from each value
FOR i IN greatest(1,total_len-prev.ma_period+1)..total_len
LOOP
mad_val := mad_val + abs(prev.arr_store[i]-mean_val);
END LOOP;
RETURN mad_val/NULLIF(count, 0);
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE AGGREGATE mad(numeric, integer) (
SFUNC = mad_func,
STYPE = misc_tuple,
FINALFUNC = mad_final
);
This is how I'm testing the performance:
-- find rolling 12-period MAD
SELECT x,
mad(x, 12) OVER (ROWS 12-1 PRECEDING)
FROM generate_series(0,1000000) as g(x);
Currently, it takes ~45-50 secs on my desktop (i5 4670, 3.4 GHz, 16 GB RAM). I'm still learning UDFs, so I'm not sure what else I could do to my function to make it faster. I have a few other similar UDFs - but ones which don't use arrays and they take <15 secs on the same 1m rows. My guess is maybe I'm not efficiently looping the arrays or something could be done about the 2 loops in the UDF.
Are there any changes I can make to this UDF to make it faster?
Your example code does not work, you have an extra comma in the type definition, and you use an undefined variable cnt in one of the functions.
Why are you specifying 12 as both an argument to the aggregate itself, and in the ROWS PRECEDING? That seems redundant.
Your comparison to numpy doesn't seem very apt, as that is not a sliding window function.
I have a few other similar UDFs - but ones which don't use arrays and they take <15 secs on the same 1m rows
Are they also used as sliding window functions? Also written in plpgsql? Could you show one and its usage?
pl/pgsql is not generally a very efficient language, especially not when manipulating large arrays. Although in your usage, the arrays never get very large, so I would not expect that to be particularly a problem.
One way to make it more efficient would be to write the code in C rather than pl/pgsql, using INTERNAL datatype rather than an SQL composite type.
Another way to improve this particular usage (a large number of windows each of which is small) might be to implement the MINVFUNC function and friends for this aggregate, so that it doesn't have to keep restarting the aggregation from scratch for every row.
Here is an example inverse function, which doesn't change the output at all, but does cut the run time in about half:
CREATE OR REPLACE FUNCTION mad_invfunc(prev misc_tuple, curr numeric, ma_period integer)
RETURNS misc_tuple AS $$
BEGIN
-- remove prev value
prev.arr_store := prev.arr_store[2:];
RETURN prev;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE AGGREGATE mad(numeric, integer) (
SFUNC = mad_func,
STYPE = misc_tuple,
FINALFUNC = mad_final,
MSFUNC = mad_func,
MSTYPE = misc_tuple,
MFINALFUNC = mad_final,
MINVFUNC = mad_invfunc
);
If I change the type from numeric to double precision everywhere, that cuts the run time in half again. So while the loops over the array might not be terribly efficient, when using only 12-member windows they are not the main bottleneck.
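The effect of the inverse transition function can be pictured outside the database. The following Python sketch is an analogy, not what PostgreSQL runs internally, and rolling_mad and its parameters are names invented for this illustration. It keeps a single window that gains a value on the right (the forward transition) and loses one on the left (the inverse transition), instead of rebuilding the window from scratch for every row.

```python
from collections import deque


def rolling_mad(values, period):
    """Yield the mean absolute deviation of the last `period` values."""
    window = deque()
    for value in values:
        window.append(value)      # forward transition: add the new row
        if len(window) > period:
            window.popleft()      # inverse transition: drop the oldest row
        mean = sum(window) / len(window)
        # final function: average absolute distance from the window mean
        yield sum(abs(v - mean) for v in window) / len(window)


result = list(rolling_mad([1.0, 2.0, 3.0, 4.0], 2))
print(result)
```

As in the SQL version, the final step still loops over the whole window for every row; only the maintenance of the window itself becomes incremental.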

How to remove multiple entries of same polygon from a multipolygon?

I'm trying to remove exactly identical polygons from a multipolygon geometry. I tried ST_RemoveRepeatedPoints but it doesn't seem to remove any of the geometry. Can anyone tell me how I can remove them?
It is difficult to do with a single SQL query but easy with a plpgsql function.
The function ST_Dump() expands a multigeometry into single geometries. Then you can iterate over the single geometries and check for uniqueness:
CREATE or REPLACE FUNCTION clean_multipoly(input_multipoly geometry)
RETURNS GEOMETRY AS $$
DECLARE
single_poly geometry;
polygons_array geometry[];
poly_array_element GEOMETRY;
is_contained BOOLEAN;
BEGIN
-- initialize the array to a empty array
polygons_array = array[]::geometry[];
-- now iterate over the single polygons
FOR single_poly IN SELECT (ST_Dump(input_multipoly)).geom LOOP
is_contained = false;
-- Now you need the iterate again over the array you are building
-- and check every element if is equal to the actual single polygon.
-- You cannot use a array operator for checking if a element is already contained in the array,
-- because this would eliminate polygons which are different but have the same extent.
-- Only ST_Equals checks properly for geometries equality
FOREACH poly_array_element IN ARRAY polygons_array LOOP
IF ST_equals(single_poly, poly_array_element) THEN
is_contained = TRUE;
END IF;
END LOOP;
IF is_contained = FALSE THEN
polygons_array = array_append(polygons_array, single_poly);
END IF;
END LOOP;
RETURN ST_collect(polygons_array);
END;
$$
LANGUAGE plpgsql;
Use the function like this:
SELECT clean_multipoly(your_geom) FROM your_table;
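As a language-neutral illustration of the "dump, compare, keep the first occurrence" loop above, here is a small Python sketch. It is not PostGIS. It assumes each polygon is a single closed ring of (x, y) tuples without holes, and it treats two rings as the same polygon only when they list the same vertices up to starting point and direction, which is a much narrower test than ST_Equals. It also replaces the pairwise comparison with a set lookup on a canonical form. canonical_ring and remove_duplicate_polygons are names made up for this example.

```python
def canonical_ring(ring):
    """Canonical form of a closed ring, independent of start vertex and direction."""
    pts = list(ring)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]                      # drop the repeated closing vertex
    if not pts:
        return ()
    candidates = []
    for seq in (pts, pts[::-1]):            # try both directions
        smallest = min(seq)
        for start, pt in enumerate(seq):    # rotate to each smallest vertex
            if pt == smallest:
                candidates.append(tuple(seq[start:] + seq[:start]))
    return min(candidates)


def remove_duplicate_polygons(polygons):
    """Keep the first occurrence of each polygon, preserving order."""
    seen = set()
    unique = []
    for ring in polygons:
        key = canonical_ring(ring)
        if key not in seen:
            seen.add(key)
            unique.append(ring)
    return unique


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
same_square_rotated = [(1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]
triangle = [(0, 0), (2, 0), (0, 2), (0, 0)]
unique = remove_duplicate_polygons([square, same_square_rotated, triangle, square])
print(len(unique))
```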

SQL to walk a graph and identify disconnected edges

I have a graph modelled in a table. Each row is an edge and contains two ints, being the from and to vertices.
Given a starting edge, I want to identify the edges which are not walkable - i.e. disconnected edges.
I'd like to do this with a loop function in Postgres, but I'm not sure how to manipulate the arrays while looping.
This is what I have so far. The string literals ending in ??? are the expressions I'm not sure how to do. Additionally, I've not created functions or used loops in PL/pgSQL before so I might have bad assumptions about how they work.
create or replace function find_connected_links(from_link_id int8)
returns table (link_id int8)
language plpgsql
as $$
declare
found_link_ids int8[];
examine_link_ids int8[] = ARRAY[from_link_id];
begin
while array_length(examine_link_ids, 1) > 0 LOOP
found_link_ids := array_prepend(
'get first element from examine_link_ids array ???', found_link_ids);
examine_link_ids := array_cat(
'get tail of examine_link_ids array ??? ',
'get a new array of link_ids that are connected but not already in found_link_ids or examine_link_ids ???');
return query select 'get first element from examine_link_ids array ???';
end LOOP
end $$
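The loop the question is reaching for is a breadth-first walk: take the head of the "to examine" queue, record it, and enqueue every neighbouring link not seen yet. The Python sketch below shows that logic outside the database. It rests on assumptions that are not in the question: each row is given as a (link_id, from_vertex, to_vertex) tuple, and links can be walked in both directions. In PostgreSQL itself the same walk could presumably be written with a WITH RECURSIVE query instead of explicit array handling.

```python
from collections import defaultdict, deque


def find_connected_links(edges, from_link_id):
    """Return the ids of all links reachable from from_link_id.

    edges is an iterable of (link_id, from_vertex, to_vertex) tuples.
    Assumption: links are walkable in both directions.
    """
    vertices_of = {}                 # link_id -> (from_vertex, to_vertex)
    links_at = defaultdict(set)      # vertex  -> ids of links touching it
    for link_id, a, b in edges:
        vertices_of[link_id] = (a, b)
        links_at[a].add(link_id)
        links_at[b].add(link_id)

    if from_link_id not in vertices_of:
        raise KeyError(from_link_id)

    found = {from_link_id}
    to_examine = deque([from_link_id])
    while to_examine:
        current = to_examine.popleft()            # head of the queue
        for vertex in vertices_of[current]:
            # the set difference is a new set, so found can be updated safely
            for neighbour in links_at[vertex] - found:
                found.add(neighbour)
                to_examine.append(neighbour)      # examine it later
    return found


edges = [(1, 10, 11), (2, 11, 12), (3, 12, 13), (4, 20, 21)]
connected = find_connected_links(edges, 1)
disconnected = {link_id for link_id, _, _ in edges} - connected
print(sorted(connected), sorted(disconnected))
```

The disconnected edges are then simply all edges minus the connected set.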

SQL not finding results

This query currently returns no results, and it should. Can you see anything wrong with it?
The field names are NEED_2_TARGET, ID, and CARD:
NEED_2_TARGET = integer
CARD = string
ID = integer
value of name is 'Ash Imp'
{this will check if a second target is needed}
//**************************************************************************
function TFGame.checkIf2ndTargetIsNeeded(name: string):integer;
//**************************************************************************
var
targetType : integer; //1 is TCard , 2 is TMana , 0 is no second target needed.
begin
TargetType := 0;
Result := targetType;
with adoquery2 do
begin
close;
sql.Clear;
sql.Add('SELECT * FROM Spells WHERE CARD = '''+name+''' and NEED_2_TARGET = 1');
open;
end;
if adoquery2.RecordCount < 1 then
Result := 0
else
begin
Adoquery2.First;
TargetType := adoquery2.FieldByName(FIELD_TARGET_TYPE).AsInteger;
result := TargetType;
end;
end;
The table in the database looks like this:
ID CARD TRIGGER_NUMBER CATEGORY_NUMBER QUANTITY TARGET_NUMBER TYPE_NUMBER PLUS_NUMBER PERCENT STAT_TARGET_NUMBER REPLACEMENT_CARD_NUMBER MAX_RANDOM LIFE_TO_ADD REPLACED_DAMAGE NEED_2_TARGET TYPE_OF_TARGET
27 Ash Imp 2 2 15 14 1 1
There are a number of things that could be going wrong.
First and most important in your trouble-shooting is to take your query and run it directly against your database. I.e. first confirm your query is correct by eliminating possibilities of other things going wrong. More things confirmed working, the less "noise" to distract you from solving the problem.
As others have pointed out, if you're not clearing your SQL statement, you could be returning zero rows in your first result set.
Yes I know, you've since commented that you are clearing your previous query. The point is: if you're having trouble solving your problem, how can you be sure where the problem lies? So, don't leave out potentially relevant information!
Which brings us neatly to the second possibility. I can't see the rest of your code, so I have to ask: are you refreshing your data after changing your query? If you don't Close and Open your query, you may be looking at a previous execution's result set.
I'm unsure whether you're even allowed to change your query text while the component is Active, or even whether that depends on exactly which data access component you're using. The point is, it's worth checking.
Is your application connecting to the correct database? Since you're using Access, it's very easy to be connected to a different database file without realising it.
You can check this by changing your query to return all rows (i.e. delete the WHERE clause).
You may want to change the quotes used in your SQL query. Instead of: ...CARD = "'+name+'" ORDER... rather use ...CARD = '''+name+''' ORDER...
As far as I'm aware, single quotes are the ANSI standard for string literals. Even if some databases permit double quotes, using them limits portability, and may produce unexpected results when passed through certain data access drivers.
Check the datatype of your CARD column. If it's a fixed length string, then the data values will be padded. E.g. if CARD is char(10), then you might actually need to look for 'Ash Imp '.
Similarly, the actual value may contain spaces before / after the words. Use select without WHERE and check the actual value of the column. You could also check whether SELECT * FROM Spells WHERE CARD LIKE '%Ash Imp%' works.
Finally, as others have suggested, you're better off using a parameterised query rather than dynamically building the query up yourself:
Your code will be more readable and flexible.
You can make your code strongly typed; and so avoid converting things like numbers and dates into strings.
You won't need to worry about the peculiarities of date formatting.
You eliminate some security concerns.
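To make the parameterised-query advice concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than Delphi and ADO, so it only illustrates the idea; with TADOQuery the equivalent would go through the component's own Parameters collection. The table contents and the needs_second_target helper are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Spells (ID INTEGER, CARD TEXT, NEED_2_TARGET INTEGER)")
conn.execute("INSERT INTO Spells VALUES (27, 'Ash Imp', 1)")  # made-up sample row


def needs_second_target(connection, card_name):
    """Return True if the card has a row with NEED_2_TARGET = 1."""
    # The ? placeholders are bound by the driver, so the card name is never
    # spliced into the SQL text and needs no manual quoting.
    row = connection.execute(
        "SELECT ID FROM Spells WHERE CARD = ? AND NEED_2_TARGET = ?",
        (card_name, 1),
    ).fetchone()
    return row is not None


print(needs_second_target(conn, "Ash Imp"))
print(needs_second_target(conn, "O'Brien's Imp"))
```

Note that the second call contains an apostrophe, which would break a query built by string concatenation but is harmless here.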
@GordonLinoff all fields in db are all caps
If that is true then that may be your problem. Depending on the database and its collation, SQL may perform case-sensitive comparisons of character/string values unless you tell it not to, for example with STRCMP() (MySQL 4+) or by normalising case with LOWER() or UPPER() (SQL Server, Firebird). I would also go as far as wrapping the conditions in parentheses:
sql.Text := 'SELECT * FROM Spells WHERE (NEED_2_TARGET = 1) AND (STRCMP(CARD, "'+name+'") = 0) ORDER by ID';
sql.Text := 'SELECT * FROM Spells WHERE (NEED_2_TARGET = 1) AND (LOWER(CARD) = "'+LowerCase(name)+'") ORDER by ID';
sql.Text := 'SELECT * FROM Spells WHERE (NEED_2_TARGET = 1) AND (UPPER(CARD) = "'+UpperCase(name)+'") ORDER by ID';
The problem turned out to be the with block:
With Adoquery2 do
begin
...
end
Inside it, name in the SQL string was really resolving to adoquery2.name (the component's own property), not to the function's name parameter. I fixed this by renaming the parameter to Cname and had no more issues after that.

Oracle SQL and PL/SQL : how to minimize execution time of retrieving members of the object (returned by user's function)

I wrote a function to get a number of values in an Oracle view. As functions can't return more than one value, I have used an object (with a signature of 8 numbers). This works, but not well.
The execution time of a select query (and of selecting from a view based on this query) is proportional to the number of attributes retrieved, i.e.:
retrieving 1 attribute consumes 1 second (it's equal to retrieve a WHOLE object, but object value is unusable for report),
retrieving 2 attributes consumes 2 seconds,
and so on...
It looks as if Oracle executes the PL/SQL function once for every attribute retrieved from the returned object.
I think a function returning a varray(8) of numbers will not solve the problem either: eight implicit calls would just be replaced by eight explicit subqueries. Can anybody solve this problem? (Other than rewriting it to use a function returning one string, which I will try myself now...)
Here is the type declaration:
create or replace type "ARD"."PAY_FINE_FR_12_" AS object
(fed1 number
, reg1 number
, fed_nach number
, reg_nach number
, fed_upl number
, reg_upl number
, fed2 number
, reg2 number);
I will assume you have given meaningful names to your type's attributes. In which case you are returning not eight numbers but four pairs of numbers. This suggests a possible way of improving things. Whether it could actually solve your problem will depend on the precise details of your situation (which you have not provided).
Here is a type representing those number pairs, and a nested table type we can use for array processing.
create or replace type pay_pair as object
( pay_cat varchar2(4)
, fed number
, reg number )
/
create or replace type pay_pair_nt as table of pay_pair
/
This is a function which populates an array with four pairs of numbers. In the absence of any actual business rule I have plumped for the simplest possible example.
create or replace function get_pay_pairs
return pay_pair_nt
is
return_value pay_pair_nt;
begin
select
pay_pair (
case col1
when 1 then 'one'
when 2 then 'nach'
when 3 then 'upl'
when 4 then 'two'
else null
end
, fed
, pay )
bulk collect into return_value
from v23;
return return_value;
end;
/
If you need the signature of the original type you can rewrite your function like this:
create or replace function get_pay_fine
return PAY_FINE_FR_12_
is
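-- initialise the object first; assigning attributes of a null object raises an error
return_value PAY_FINE_FR_12_ := PAY_FINE_FR_12_(null, null, null, null, null, null, null, null);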
l_array pay_pair_nt;
begin
l_array := get_pay_pairs;
for i in 1..4 loop
case l_array(i).pay_cat
when 'one' then
return_value.fed1 := l_array(i).fed;
return_value.reg1 := l_array(i).reg;
when 'nach' then
return_value.fed_nach := l_array(i).fed;
return_value.reg_nach := l_array(i).reg;
when 'upl' then
return_value.fed_upl := l_array(i).fed;
return_value.reg_upl := l_array(i).reg;
else
return_value.fed2 := l_array(i).fed;
return_value.reg2 := l_array(i).reg;
end case;
end loop;
return return_value;
end;
I'll repeat, this is a demonstration of available techniques rather than a proposed solution. The crux is how your view supplies the values.