ST_Within as a condition of INSERT - SQL

I have built myself a tracker, and as part of the spec I set myself, for security reasons I don't want people knowing where I leave my car overnight.
So I have a concept of exclusion zones. The web map only shows data outside those zones, but I also only want to save transmitted data that isn't within an exclusion zone (there can be more than one, so I am thinking a subquery).
Can anyone help?
It is possible, perhaps even preferable, that this be a stored procedure. Any ideas? (I am useless when it comes to subqueries, hence asking.)
The SQL I am using to get the data (retrospective exclusion zones) is this:
SELECT geom
FROM public.data
WHERE layer = %layer_id% AND NOT EXISTS (
    SELECT *
    FROM public.exclusion_zone
    WHERE layer = %layer_id% AND ST_Contains(the_geom, geom)
);

For example, this code returns all geometries (say, points) from public.data that are not completely inside any geometry (say, polygons) from public.exclusion_zone:
SELECT *
FROM public.data
WHERE the_geom NOT IN (
    SELECT d.the_geom
    FROM public.data d, public.exclusion_zone e
    WHERE ST_Within(d.the_geom, e.the_geom)
);
or even better (assuming that comparing integer IDs is faster than comparing geometries):
SELECT *
FROM public.data
WHERE id NOT IN (
    SELECT d.id
    FROM public.data d, public.exclusion_zone e
    WHERE ST_Within(d.the_geom, e.the_geom)
);
See more: ST_Within
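Since the goal is to reject points at insert time rather than filter them out afterwards, the same NOT EXISTS test can guard the INSERT itself. A minimal sketch, reusing the table and column names above (geom in public.data, the_geom in public.exclusion_zone); the WKT parameter and the SRID 4326 are assumptions, not from the question:

```sql
-- Insert the incoming point only if it falls outside every exclusion zone
-- for this layer. %point_wkt% and %layer_id% are placeholders filled in
-- by the application, as in the query above.
INSERT INTO public.data (layer, geom)
SELECT %layer_id%, ST_GeomFromText(%point_wkt%, 4326)
WHERE NOT EXISTS (
    SELECT 1
    FROM public.exclusion_zone e
    WHERE e.layer = %layer_id%
      AND ST_Within(ST_GeomFromText(%point_wkt%, 4326), e.the_geom)
);
```

The same check could also live in a BEFORE INSERT trigger on public.data, which keeps the rule enforced no matter which client writes to the table.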

Related

Teiid not performing optimal join

For our Teiid Springboot project we use a row filter in a where clause to determine what results a user gets.
Example:
SELECT * FROM very_large_table WHERE id IN ('01', '03')
We want the context in the IN clause to be dynamic like so:
SELECT * FROM very_large_table WHERE id IN (SELECT other_id from very_small_table)
The problem now is that Teiid fetches all the data from very_large_table and only then applies the WHERE clause, which makes the query 10-20 times slower. very_small_table holds only about 1-10 records, based on the user context we get from Java.
very_large_table is located on an Oracle database and very_small_table is on the Teiid Pod/Container. Somehow I can't force Teiid to ship the small table's data to Oracle and perform the filtering there.
Things that I have tried:
I have specified the foreign data wrapper as follows:
CREATE FOREIGN DATA WRAPPER "oracle_override" TYPE "oracle" OPTIONS (EnableDependentsJoins 'true');
CREATE SERVER server_name FOREIGN DATA WRAPPER "oracle_override";
I also tried an EXISTS statement, and a JOIN instead of the WHERE clause, to see whether pushdown happened. Join hints don't seem to matter either.
Sadly, the performance impact is currently so high that we can't reach our performance targets.
Are there any cardinalities set on very_small_table and very_large_table? If not, the planner will assume a default plan.
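Cardinality metadata can be supplied in the virtual database DDL so the planner knows how lopsided the two tables are. A hedged sketch (the option name follows the Teiid DDL reference; the row counts are purely illustrative):

```sql
-- Give the planner realistic row-count estimates so it can choose a
-- dependent join (ship the small key set to Oracle) instead of a full fetch.
ALTER FOREIGN TABLE very_large_table OPTIONS (SET CARDINALITY '10000000');
ALTER FOREIGN TABLE very_small_table OPTIONS (SET CARDINALITY '10');
```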
You can also use a dependent join hint:
SELECT * FROM very_large_table WHERE id IN /*+ dj */ (SELECT other_id from very_small_table)
Often, exists performs better than in:
SELECT vlt.*
FROM very_large_table vlt
WHERE EXISTS (SELECT 1 FROM very_small_table vst WHERE vst.other_id = vlt.id);
However, this might end up scanning the large table.
If id is unique in vlt and there are no duplicates in vst, then a JOIN might optimize better:
SELECT vlt.*
FROM very_small_table vst
JOIN very_large_table vlt
  ON vst.other_id = vlt.id;

How to find tree nodes that don't have child nodes

A Firebird DB stores chart-of-accounts records in this table:
CREATE TABLE CHARTACC
(
ACCNTNUM Char(8) NOT NULL, -- Account ID (Primary Key)
ACCPARNT Char(8), -- Parent ID
ACCCOUNT Integer, -- account count
ACCORDER Integer, -- order of children in nodes
ACCTITLE varchar(150),
ACDESCRP varchar(4000),
DTCREATE timestamp -- date and time of creation
)
I must write a query which selects only the last nodes, i.e. nodes which have no child nodes (child2, child3, subchild1, subchild2, subchild3 and subchild4).
The NOT IN approach suggested by Jerry is typically quite slow in the InterBase/Firebird/Yaffil/RedDatabase family: no indices are used, etc.
The same goes for the other possible representation, SELECT x FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t2.a = t1.b); it can turn out really slow too.
I agree that those queries better represent what the human wanted and hence are more readable, but they're still not recommended on Firebird. I was badly bitten in the 1990s when building a Herbalife-like app: I chose this type of request, wrapped in a loop, to do monthly bottom-up tallying (UPDATE ... WHERE NOT EXISTS ...), and every iteration scaled as O(n^2) in InterBase 5.5. Granted, Firebird 3 has come a long way since then, but this "direct" approach is still not recommended.
A more SQL-traditional and Firebird-friendly way to express it, albeit less direct and harder to read, would be SELECT t1.x FROM t1 LEFT JOIN t2 ON t1.a = t2.b WHERE t2.y IS NULL.
Your query needs to work something like:
select * from CHARTACC where ACCNTNUM not in (select ACCPARNT from CHARTACC where ACCPARNT is not null)
To put it into words: select items from this table whose identifier is not found anywhere in the same table's parent field. (The IS NOT NULL filter matters: root rows have a NULL ACCPARNT, and NOT IN against a list containing NULL returns no rows at all.)
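As the other answer notes, Firebird tends to optimize the anti-join form better than NOT IN. A sketch of that LEFT JOIN version applied to this table:

```sql
-- Leaf accounts = accounts that never appear as anyone's parent.
-- For each account c, look for rows p whose parent is c; keep c only
-- when no such row exists.
SELECT c.*
FROM CHARTACC c
LEFT JOIN CHARTACC p ON p.ACCPARNT = c.ACCNTNUM
WHERE p.ACCNTNUM IS NULL;
```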

Reusing results from a SQL query in a following query in Sqlite

I am using a recursive WITH statement to select all children of a given parent in a table representing tree-structured entries. This is in SQLite (which now supports recursive WITH).
This lets me select thousands of records in this tree very quickly, without the huge performance loss of preparing thousands of SELECT statements from the calling application.
WITH RECURSIVE q(Id) AS
(
SELECT Id FROM Entity
WHERE Parent = ?
UNION ALL
SELECT m.Id FROM Entity AS m
JOIN q ON m.Parent = q.Id
)
SELECT Id FROM q;
Now, suppose I have data related to these entities in an arbitrary number of other tables, which I want to load next. Because the number of related tables is arbitrary (in a modular fashion), it is not possible to include the data fetching directly in this one query; it must follow it.
But if, for each related table, I then issue a SELECT statement, all the performance gained by selecting the whole tree directly inside SQLite is almost lost, because I will still stall on thousands of subsequent requests, each of which prepares and issues its own SELECT.
So, two questions:
The first solution is to formulate a similar recursive statement for each of the related tables, one that recursively gathers the entities of this tree again and this time selects their related data by joining to it.
This sounds much more efficient, but it's really tricky to formulate such a statement and I'm a bit lost here.
Now the real mystery is: would there be an even more efficient solution, somehow keeping the results of the last query cached somewhere (the rows with the IDs from the entity tree) and joining them to the related tables in the following statement, without having to recursively iterate over the tree again?
Here is a try at the first option, supposing I want to select a field Data from a related table Component: is the second UNION ALL legal?
WITH RECURSIVE q(Data) AS
(
SELECT Id FROM Entity
WHERE Parent=(?)
UNION ALL
SELECT m.Id FROM Entity AS m
JOIN Entity ON m.Id=q.Parent
UNION ALL
SELECT Data FROM Component AS c
JOIN Component ON c.Id=q.Id
)
SELECT Data FROM q;
The documentation says:
 2. The table named on the left-hand side of the AS keyword must appear exactly once in the FROM clause of the right-most SELECT statement of the compound select, and nowhere else.
So your second query is not legal.
However, the CTE behaves like a normal table/view, so you can just join it to the related table:
WITH RECURSIVE q(Id) AS
( ... )
SELECT q.Id, c.Data
FROM q JOIN Component AS c ON q.Id = c.Id
If you want to reuse the computed values of q in multiple queries, there's nothing you can do with plain CTEs, but you can store them in a temporary table:
CREATE TEMPORARY TABLE q_123 AS
WITH RECURSIVE q(Id) AS
( ... )
SELECT Id FROM q;
SELECT * FROM q_123 JOIN Component ...;
SELECT * FROM q_123 JOIN Whatever ...;
DROP TABLE q_123;
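Putting the pieces together, a complete sketch with the recursive part filled in (walking down from the given parent, as in the question) and joined to the related table:

```sql
-- All descendants of the given parent, each joined to its Component row.
WITH RECURSIVE q(Id) AS
(
    SELECT Id FROM Entity WHERE Parent = ?
    UNION ALL
    SELECT m.Id FROM Entity AS m
    JOIN q ON m.Parent = q.Id
)
SELECT q.Id, c.Data
FROM q
JOIN Component AS c ON c.Id = q.Id;
```

The CTE is referenced exactly once, in the final SELECT's FROM clause, so this stays within the restriction quoted from the documentation.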

Will a LEFT JOIN View with GROUP BY inside of it do the right thing?

Can I count on SQLite on doing "the right thing (TM)"?
CREATE TABLE IF NOT EXISTS user.Log (
term TEXT,
seen DATETIME
);
CREATE INDEX IF NOT EXISTS user.Log_term ON Log(term);
CREATE VIEW IF NOT EXISTS user.History AS
SELECT term, COUNT(1) as timesseen, MAX(seen) as lastseen
FROM user.Log GROUP BY term;
And then later
INNER JOIN History h ON h.term = t.term
Log could be in the 100,000s of rows. I would like to know whether SQLite will push the h.term = t.term condition into the view, so that it only groups the terms which match the ON clause, instead of grouping the whole table and then applying the ON.
If this is a bad idea, suggestions for a better way are welcome. (Maybe the better way is to keep two tables: the Log and a summarized history.)
Typically, a GROUP BY clause makes a statement lose track of the originating rows, so it won't use an index if you constrain your view with a join: you'll read the whole table regardless.
What you need to do is constrain first and then group. Avoid aggregate functions in views.
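Concretely, "constrain first and then group" means skipping the view and aggregating only the rows that survive the join. A sketch using the names from the question (t stands for whatever table supplies the terms, as in the INNER JOIN fragment above):

```sql
-- Filter Log down to the matching terms first, then aggregate per term.
-- The index on Log(term) can now drive the lookup.
SELECT t.term,
       COUNT(*)    AS timesseen,
       MAX(l.seen) AS lastseen
FROM t
JOIN Log AS l ON l.term = t.term
GROUP BY t.term;
```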

Sub-query Optimization Talk with an example case

I need advice and want to share my experience with query optimization. This week I found myself stuck in an interesting dilemma.
I'm a novice in MySQL (2 years of theory, less than one of practice).
Environment :
I have a table that contains articles with a column 'type', another table article_version that contains the date when an article was added to the DB, and a third table that contains all the article types along with type labels and such.
The first 2 tables are huge (800,000+ rows and growing daily); the 3rd one is naturally small. The article tables have a lot of columns, but we will only need 'ID' and 'type' in articles and 'dateAdded' in article_version, to simplify things...
What I want to do :
A query that, for a specified 'dateAdded', returns the number of articles of each type (there are ~50 types to scan).
What was already in place was 50 separate counts, one per document type oO (not efficient, long: ~5 sec in general).
I wanted to do it all in one query and I came up with this:
SELECT type,
       (SELECT COUNT(DISTINCT articles.ID)
        FROM articles
        INNER JOIN article_version
            ON article_version.ARTI_ID = articles.ID
        WHERE type = td.NEW_ID
          AND dateAdded = '2009-01-01 00:00:00') AS nbrArti
FROM type_document td
WHERE td.NEW_ID != ''
GROUP BY td.NEW_ID;
The outer select (on type_document) gets me the 55 document types I need.
The sub-query counts the articles of each type_document for the given date '2009-01-01'.
A common result is like :
*  type  * nbrArti *
********************
* 123456 *   23    *
* 789456 *    5    *
*  16578 *   98    *
*  ....  *  ....   *
********************
This query gets the job done, but the join in the sub-query makes it extremely slow. The reason, if I'm right, is that the server performs the join once per type, so 50+ times; this solution is even slower than running the 50 queries independently. Awesome :/
A Solution
I came up with a solution myself that drastically improves performance with the same result: I just created a view corresponding to the sub-query, making the join on IDs for each type... and boom, it's f.a.s.t.
I think, correct me if I'm wrong, that the reason is that the server only runs the JOIN statement once.
This solution is ~5 times faster than the one that was already there, and ~20 times faster than my first attempt. Sweet.
Questions / thoughts
With yet another view, I'll now need to check whether I don't lose more than I gain when documents are inserted...
Is there a way to improve the original query by getting the JOIN statement out of the sub-query (and getting rid of the view)?
Any other tips/thoughts? (On server optimization, for example...)
Apologies for my approximate English; it's not my first language.
You cannot create a single index on (type, date_added), because these fields are in different tables.
Without the view, the subquery most probably selects articles as the leading table and uses the index on type, which is not very selective.
By creating the view, you force the subquery to calculate the counts for all types first (using the selective index on date) and then use a join buffer (which is fast enough for only 55 types).
You can achieve similar results by rewriting your query as this:
SELECT new_id, COALESCE(cnt, 0) AS cnt
FROM type_document td
LEFT JOIN
(
    SELECT type, COUNT(DISTINCT article_id) AS cnt
    FROM article_versions av
    JOIN articles a
        ON a.id = av.article_id
    WHERE av.date = '2009-01-01 00:00:00'
    GROUP BY type
) q
ON q.type = td.new_id
Unfortunately, MySQL is not able to do table spools or hash joins, so to improve the performance you'll need to denormalize your tables: add type to article_version and create a composite index on (date, type).
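The suggested denormalization might look like this (the index name is illustrative; the backfill assumes articles.id matches article_version.article_id, as in the rewritten query above):

```sql
-- Copy each article's type onto its version rows, then index (date, type)
-- so the per-date, per-type counts become a single index range scan.
ALTER TABLE article_version ADD COLUMN type INT;

UPDATE article_version av
JOIN articles a ON a.id = av.article_id
SET av.type = a.type;

CREATE INDEX ix_article_version_date_type
    ON article_version (date, type);
```

After that, the counts for one date can be read from the index alone, without touching the articles table at all.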