PostGIS spatial query assistance - indexing

Assuming I have three tables:
A. Municipalities (MultiPolygon)
B. Postcode centroids (Point)
C. User data (Point)
Entries in (C) match entries in (B) via the FK (code).
I am looking for an efficient way to:
count the number of user data points (C) in each municipality (A) using ST_Contains.
BUT
here is the catch:
if an entry in C has a NULL geometry (or matches another condition), use the matching entry in B via the FK, if one exists!
I have tried various patterns. Although spatially querying A & B and A & C each takes under a second, once I combine them all into one query (the goal) it takes over 4 seconds.
Samples of what I've tried:
This is the worst (60+ seconds):
SELECT
A.*,
(SELECT COUNT(*) FROM
(SELECT CASE WHEN C.GEOM IS NULL THEN B.GEOM ELSE C.GEOM END AS GEOM
FROM C LEFT JOIN B ON C.ID = B.ID) AS b
WHERE ST_Contains(A.GEOM, b.GEOM)
) AS count
FROM
A
This one takes 15 seconds:
SELECT
A.ID, ..., -- other A fields
COUNT(b.GEOM)
FROM
A,
(SELECT CASE WHEN C.GEOM IS NULL THEN B.GEOM ELSE C.GEOM END AS GEOM
FROM C LEFT JOIN B ON C.ID = B.ID) AS b
WHERE
ST_Contains(A.GEOM, b.GEOM)
GROUP BY
A.ID, ... -- other A fields
As I said
SELECT COUNT(*) FROM A LEFT JOIN B ON ST_Contains(A.GEOM, B.GEOM)
and
SELECT COUNT(*) FROM A LEFT JOIN C ON ST_Contains(A.GEOM, C.GEOM)
both return in under a second.
All indexes are in place, including the one for the foreign key (B.ID = C.ID).
Thanks
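The fallback rule (use C's own point, otherwise the B centroid reached through the FK) can be checked on toy data. The sketch below uses Python's built-in sqlite3, not PostGIS: municipalities are hypothetical axis-aligned boxes, points are plain x/y columns, and box_contains is a hypothetical stand-in for ST_Contains (strict interior, so boundary points are not counted). It runs the CASE-based shape from the question next to an alternative split into two UNION ALL branches. The split is not from the thread; the idea is that in PostGIS each branch would test a real geometry column (C.GEOM or B.GEOM) rather than a CASE-computed value, which an index on those columns cannot serve. Whether that is actually faster on real data is something EXPLAIN ANALYZE would have to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical toy schema: boxes stand in for municipality polygons (A),
-- x/y columns stand in for postcode centroids (B) and user points (C).
CREATE TABLE a (id INTEGER PRIMARY KEY, minx REAL, miny REAL, maxx REAL, maxy REAL);
CREATE TABLE b (code TEXT PRIMARY KEY, x REAL, y REAL);
CREATE TABLE c (id INTEGER PRIMARY KEY, code TEXT REFERENCES b (code), x REAL, y REAL);
INSERT INTO a VALUES (1, 0, 0, 10, 10), (2, 10, 0, 20, 10);
INSERT INTO b VALUES ('P1', 5, 5), ('P2', 15, 5);
INSERT INTO c VALUES
  (1, 'P1', 2, 2),         -- own point, in municipality 1
  (2, 'P1', NULL, NULL),   -- no point: falls back to centroid P1 (municipality 1)
  (3, 'P2', NULL, NULL),   -- no point: falls back to centroid P2 (municipality 2)
  (4, 'P2', 12, 3),        -- own point, in municipality 2
  (5, 'P2', 3, 3),         -- own point wins over its centroid: municipality 1
  (6, NULL, NULL, NULL);   -- no point and no centroid: not counted
""")


def box_contains(minx, miny, maxx, maxy, x, y):
    """Hypothetical stand-in for ST_Contains: strict interior test on a box."""
    if x is None or y is None:
        return 0
    return int(minx < x < maxx and miny < y < maxy)


conn.create_function("box_contains", 6, box_contains)

# Shape of the attempts in the question: pick C's point, else B's, then test it.
COMBINED = """
SELECT a.id, COUNT(u.x)
FROM a
LEFT JOIN (
    SELECT CASE WHEN c.x IS NULL THEN b.x ELSE c.x END AS x,
           CASE WHEN c.x IS NULL THEN b.y ELSE c.y END AS y
    FROM c LEFT JOIN b ON b.code = c.code
) AS u ON box_contains(a.minx, a.miny, a.maxx, a.maxy, u.x, u.y)
GROUP BY a.id
"""

# Alternative split: each branch tests raw columns, never a computed point.
SPLIT = """
SELECT a.id, COUNT(u.id)
FROM a
LEFT JOIN (
    SELECT c.id, c.x, c.y FROM c WHERE c.x IS NOT NULL
    UNION ALL
    SELECT c.id, b.x, b.y FROM c JOIN b ON b.code = c.code WHERE c.x IS NULL
) AS u ON box_contains(a.minx, a.miny, a.maxx, a.maxy, u.x, u.y)
GROUP BY a.id
"""

combined = dict(conn.execute(COMBINED).fetchall())
split = dict(conn.execute(SPLIT).fetchall())
```

Both forms count 3 points in municipality 1 and 2 in municipality 2; the split only changes which columns each spatial test touches.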

Did you create GiST indexes on A.geom and B.geom?
That would be:
CREATE INDEX idx_A ON A USING GIST ( GEOM );
VACUUM ANALYZE A (GEOM);
CREATE INDEX idx_B ON B USING GIST ( GEOM );
VACUUM ANALYZE B (GEOM);


Wrong plan when inner-joining a view/subquery that has left join

I'm trying to build a query that inner joins a view (which exists for reusability), but apparently the fact that this view has an internal left join is somehow messing up the optimizer, and I can't really understand why (index statistics are up to date).
Below is an MCVE. It's actually very simple. You can picture it as a simple customer (B) - order (C) design where the customer's address (optional) is in another table (A). And then we have a view to join the customer to its address (vw_B).
Metadata and example data:
create table A (
id int not null,
fieldA char(10) not null,
constraint pk_A primary key (id)
);
create table B (
id int not null,
fieldB char(10) not null,
idA int,
constraint pk_B primary key (id),
constraint fk_A foreign key (idA) references A (id)
);
create view VW_B as
select b.*, a.fieldA from B
left join A on a.id = b.idA;
create table C (
id int not null,
mydate date not null,
idB int not null,
constraint pk_C primary key (id),
constraint fk_B foreign key (idB) references B (id)
);
create index ix_C on C (mydate);
insert into A (id, fieldA)
with recursive n as (
select 1 as n from rdb$database
union all
select n.n + 1 from n
where n < 10
)
select n.n, 'A' from n;
SET STATISTICS INDEX PK_A;
insert into B (id, fieldB, idA)
with recursive n as (
select 1 as n from rdb$database
union all
select n.n + 1 from n
where n < 100
)
select n.n, 'B', IIF(MOD(n.n, 5) = 0, null, MOD(n.n, 10)+1) from n;
SET STATISTICS INDEX PK_B;
SET STATISTICS INDEX FK_A;
insert into C (id, mydate, idB)
with recursive n as (
select 1 as n from rdb$database
union all
select n.n + 1 from n
where n < 1000
)
select n.n, cast('01.01.2020' as date) + 100*rand(), mod(n.n, 100)+1 from n;
SET STATISTICS INDEX PK_C;
SET STATISTICS INDEX FK_B;
SET STATISTICS INDEX IX_C;
With this design, I want to have a query that can join all tables in such a way that I can efficiently search orders by date (c.mydate) or any indexed customer information (table B). The obvious choice is an inner join between B and C, and it works fine. But if I want to add customer's address to the result, by using vw_B instead of B, the optimizer no longer selects the best plan.
Here are some queries to show this:
Manually joining all tables and filtering by date. Optimizer works fine.
select c.*, b.fieldB, a.fieldA from C
inner join B on b.id = c.idB
left join A on a.id = b.idA
where c.mydate = '01.01.2020'
PLAN JOIN (JOIN (C INDEX (IX_C), B INDEX (PK_B)), A INDEX (PK_A))
Reusing vw_B to have A table joined automatically. Optimizer selects a NATURAL plan on (VW_B B).
select c.*, b.fieldB, b.fieldA from C
inner join VW_B b on b.id = c.idB
where c.mydate = '01.01.2020'
PLAN JOIN (JOIN (B B NATURAL, B A INDEX (PK_A)), C INDEX (FK_B, IX_C))
Why does that happen? I thought these two queries should produce the exact same operation in the engine. This is a very simple MCVE; I have much more complex views that are very reusable, and with larger tables, joining with those views is causing performance issues.
Do you have any suggestions to improve performance/PLAN selection, but preserving the convenience of reusability that views provide?
Server version is WI-V3.0.4.33054.
The Firebird optimizer is not intelligent enough to consider the queries equivalent.
Your query with view is equivalent to:
select c.*, b.fieldB, a.fieldA from C
inner join (B left join A on a.id = b.idA)
on b.id = c.idB
where c.mydate = '01.01.2020'
This will produce (almost) the same plan. So the problem is not whether a view is used, but how the table expressions are nested: the nesting changes how the engine evaluates them and which join reorderings it considers possible.
As BrakNicku indicated in the comments, there is no general solution for this.
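That the two queries return the same rows can be checked on toy data. A minimal sketch using Python's built-in sqlite3 (not Firebird, so it says nothing about Firebird's plans), with hypothetical rows and ISO date strings in place of Firebird DATE values: the flat join and the join through the view return the same result, including the customer without an address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, fieldA TEXT NOT NULL);
CREATE TABLE b (id INTEGER PRIMARY KEY, fieldB TEXT NOT NULL,
                idA INTEGER REFERENCES a (id));
CREATE TABLE c (id INTEGER PRIMARY KEY, mydate TEXT NOT NULL,
                idB INTEGER NOT NULL REFERENCES b (id));
-- Same view shape as in the question: customer left-joined to its address
CREATE VIEW vw_b AS
    SELECT b.*, a.fieldA FROM b LEFT JOIN a ON a.id = b.idA;

-- Hypothetical rows: customer 2 has no address (idA is NULL)
INSERT INTO a VALUES (1, 'A1'), (2, 'A2');
INSERT INTO b VALUES (1, 'B1', 1), (2, 'B2', NULL), (3, 'B3', 2);
INSERT INTO c VALUES (1, '2020-01-01', 1), (2, '2020-01-01', 2), (3, '2020-01-02', 3);
""")

# Flat form: (C inner join B) left join A
flat = conn.execute("""
    SELECT c.id, b.fieldB, a.fieldA FROM c
    INNER JOIN b ON b.id = c.idB
    LEFT JOIN a ON a.id = b.idA
    WHERE c.mydate = '2020-01-01'
    ORDER BY c.id
""").fetchall()

# Nested form: C inner join (B left join A), via the view
nested = conn.execute("""
    SELECT c.id, v.fieldB, v.fieldA FROM c
    INNER JOIN vw_b v ON v.id = c.idB
    WHERE c.mydate = '2020-01-01'
    ORDER BY c.id
""").fetchall()
```

Equal results are what make the reordering legal in principle; whether a given optimizer recognizes that is a separate question.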

SELECT JOIN Table Operations

In my query below, I joined tables b, c and d to a by the base_id column.
However, I need to do some calculations in my SELECT statement.
Since b, c and d are not joined to each other,
is my (a.qty - (b.qty - (c.qty + d.qty))) formula computed only from rows that have the same base_id?
SELECT (a.qty - (b.qty - (c.qty + d.qty))) AS qc_in
FROM receiving a
LEFT JOIN (
SELECT SUM(qty) AS qty, base_id
FROM quality_control bb
WHERE location_to = 6
AND is_canceled = 0
GROUP BY base_id
) b
ON b.base_id = a.base_id
LEFT JOIN (
SELECT SUM(qty) AS qty, base_id
FROM quality_control ba
WHERE location_from = 6
AND is_canceled = 0
GROUP BY base_id
) c
ON c.base_id = a.base_id
LEFT JOIN (
SELECT SUM(qty) AS qty, base_id
FROM issuance
WHERE location_from = 6
AND is_canceled = 0
GROUP BY base_id
) d
ON d.base_id = a.base_id
WHERE a.is_canceled = 0
I think you're confused by how joining works (if I'm reading the question correctly). If you have:
select *
from table1 a
join table2 b
on a.Id = b.Id
join table3 c
on a.Id = c.Id
Then yes, a is joined to b and a is joined to c, which also means that b and c are matched to each other through a. One way to think of it is as one giant in-memory table that has all of the a columns, then all of the b columns, then all of the c columns, in one result.
If a.Id is 1, and you select the row from b where Id is the same (1) and then you join to c where the Id is the same as a (1), then a, b and c all have the same id.
So yes, (a.qty - (b.qty - (c.qty + d.qty))) will only be calculated for rows where a, b, c and d all have the same base_id.
Surely. In the nested statements you join all the tables on base_id. That means the result of those joins is one large table containing columns from all of the joined tables, with the common column you join on (base_id in your case).
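Because b, c and d are all LEFT JOINed, a receiving row with no matching quality_control or issuance rows gets NULL for that qty, and arithmetic involving NULL yields NULL, so qc_in becomes NULL for that base_id. A minimal sketch with Python's sqlite3 and hypothetical rows compares the original expression with a COALESCE-to-zero variant; treating "no rows" as a quantity of 0 is an assumption about the intended semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receiving (base_id INTEGER, qty INTEGER, is_canceled INTEGER);
CREATE TABLE quality_control (base_id INTEGER, qty INTEGER,
    location_from INTEGER, location_to INTEGER, is_canceled INTEGER);
CREATE TABLE issuance (base_id INTEGER, qty INTEGER,
    location_from INTEGER, is_canceled INTEGER);
-- Hypothetical rows: base_id 1 has movements, base_id 2 has none
INSERT INTO receiving VALUES (1, 100, 0), (2, 50, 0);
INSERT INTO quality_control VALUES (1, 30, 1, 6, 0), (1, 10, 6, 2, 0);
INSERT INTO issuance VALUES (1, 5, 6, 0);
""")

rows = conn.execute("""
    SELECT a.base_id,
           -- original expression: NULL as soon as any joined qty is NULL
           (a.qty - (b.qty - (c.qty + d.qty))) AS qc_in_raw,
           -- variant that treats a missing subtotal as 0
           (a.qty - (COALESCE(b.qty, 0)
                     - (COALESCE(c.qty, 0) + COALESCE(d.qty, 0)))) AS qc_in
    FROM receiving a
    LEFT JOIN (SELECT SUM(qty) AS qty, base_id FROM quality_control
               WHERE location_to = 6 AND is_canceled = 0 GROUP BY base_id) b
           ON b.base_id = a.base_id
    LEFT JOIN (SELECT SUM(qty) AS qty, base_id FROM quality_control
               WHERE location_from = 6 AND is_canceled = 0 GROUP BY base_id) c
           ON c.base_id = a.base_id
    LEFT JOIN (SELECT SUM(qty) AS qty, base_id FROM issuance
               WHERE location_from = 6 AND is_canceled = 0 GROUP BY base_id) d
           ON d.base_id = a.base_id
    WHERE a.is_canceled = 0
    ORDER BY a.base_id
""").fetchall()
```

For base_id 1 both columns give 100 - (30 - (10 + 5)) = 85; for base_id 2 the original expression is NULL while the COALESCE variant gives 50.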

Joining tables based on value

I'm having a hard time joining the two tables below. I have simplified the example dataset; there are additional WHERE clauses on the first table, but they don't seem to be the problem.
This is how I would write the query joining the two tables:
select a.prod_code, a.prod_name, b.ref_value from Product_code a
left join Product_reference b on a.prod_code = b.pref_code
where a.prod_code <> 'CURTAIN' and b.ref_type = 'MAN'
The problem I'm facing is that I want to join the tables conditionally, i.e. if the ref_type value in the Product_reference table is 'MAN', I want to join that row, otherwise not.
For example, this query does not include "Chair" in the result, because Chair has no ref_type 'MAN' row in Product_reference. I still need Chair in the result, just without a joined value from Product_reference (since no 'MAN' row exists for it), rather than leaving it out altogether.
Meanwhile, the 'CURTAIN' record of the Product_code table should be left out, regardless of whether a Product_reference row with ref_type 'MAN' exists for it.
Any recommendations?
Product_code
prod_code prod_name
A Table
B Chair
C Window
D Door
E Curtain
Product_reference
pref_code ref_type ref_value
A MAN x
A AUTO y
B AUTO z
C AUTO z1
C MAN x1
D AUTO zxc
E AUTO abc
E MAN cba
Move b.ref_type = 'MAN' to the join predicate:
SELECT a.prod_code, a.prod_name, b.ref_value
FROM Product_code a
LEFT JOIN Product_reference b ON a.prod_code = b.pref_code AND b.ref_type = 'MAN'
WHERE a.prod_code <> 'CURTAIN'
This accomplishes what you want: it only left joins the rows from table b where b.ref_type = 'MAN', instead of removing the products without such a row from the result set altogether.
Side note: thanks for including your query and sample data in your very well-made question. We appreciate it.
You could use an inner join on the distinct products that have 'MAN':
select
a.prod_code
, a.prod_name
, b.ref_value
from Product_code a
inner join (
select distinct pref_code, ref_value
from Product_reference
where ref_type = 'MAN') b on b.pref_code = a.prod_code
and a.prod_code <> 'CURTAIN'
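The difference between filtering in WHERE and in ON can be checked against the sample data from the question. A minimal sketch using Python's sqlite3; one assumption: in the sample data the curtain row has prod_code 'E' and prod_name 'Curtain', so this sketch filters on prod_name instead of prod_code <> 'CURTAIN'. The inner-join variant is included to show which rows it keeps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product_code (prod_code TEXT PRIMARY KEY, prod_name TEXT);
CREATE TABLE Product_reference (pref_code TEXT, ref_type TEXT, ref_value TEXT);
INSERT INTO Product_code VALUES
  ('A', 'Table'), ('B', 'Chair'), ('C', 'Window'), ('D', 'Door'), ('E', 'Curtain');
INSERT INTO Product_reference VALUES
  ('A', 'MAN', 'x'), ('A', 'AUTO', 'y'), ('B', 'AUTO', 'z'), ('C', 'AUTO', 'z1'),
  ('C', 'MAN', 'x1'), ('D', 'AUTO', 'zxc'), ('E', 'AUTO', 'abc'), ('E', 'MAN', 'cba');
""")


def run(sql):
    return conn.execute(sql).fetchall()


# Filter in WHERE: unmatched rows (all b columns NULL) fail b.ref_type = 'MAN'
in_where = run("""
    SELECT a.prod_code, a.prod_name, b.ref_value
    FROM Product_code a
    LEFT JOIN Product_reference b ON a.prod_code = b.pref_code
    WHERE a.prod_name <> 'Curtain' AND b.ref_type = 'MAN'
    ORDER BY a.prod_code
""")

# Filter in ON: it only decides which b rows attach; every a row survives
in_on = run("""
    SELECT a.prod_code, a.prod_name, b.ref_value
    FROM Product_code a
    LEFT JOIN Product_reference b
           ON a.prod_code = b.pref_code AND b.ref_type = 'MAN'
    WHERE a.prod_name <> 'Curtain'
    ORDER BY a.prod_code
""")

# Inner join against the 'MAN' rows only
inner = run("""
    SELECT a.prod_code, a.prod_name, b.ref_value
    FROM Product_code a
    INNER JOIN (SELECT DISTINCT pref_code, ref_value
                FROM Product_reference WHERE ref_type = 'MAN') b
            ON b.pref_code = a.prod_code AND a.prod_name <> 'Curtain'
    ORDER BY a.prod_code
""")
```

Only the ON placement keeps Chair and Door (with a NULL ref_value); the WHERE placement and the inner join both return just Table and Window.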

SQL query construction: checking if query result is subset of another

I have a (legacy) table relationship that works like this:
A has many B and B has many C;
A has many C as well.
I am having trouble writing SQL that gets all B (just the Id of B, to keep it simple) mapped to a certain A (by Id), AND any B whose collection of C is a subset of the Cs of that A.
I have failed to come up with decent SQL, especially for the second part, and would appreciate any tips or suggestions on how to do that.
Thanks
EDIT:
Table A
Id |..
------------
1 |..
Table B
Id |..
--------------
2 |..
Table A_B_rel
A_id | B_id
-----------------
1 | 2
C is a strange table: its data (a single column) is simply duplicated in two relation tables, one for A and one for B, like this:
Table B_C_Table
B_Id| C_Value
-----------------
2 | 'SomeValue'
Table A_C_Table
A_Id| C_Value
-------------
1 | 'SomeValue'
So I am looking for the Bs whose C_Values are a subset of a certain A's C_Values.
Yes, the second part of your problem is a bit tricky. We've got B_C_Table on the one hand, and a subset of A_C_Table where A_ID is a specific ID, on the other.
Now, if we use an outer join, we'll be able to see which rows in B_C_Table have no match in A_C_Table:
SELECT *
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
Note that it is important to put ac.A_ID = #A_ID in the ON clause rather than in WHERE: in the latter case we would filter out every row that does not match #A_ID, which is exactly the information we need to keep.
The next step toward the final query is to group rows by B and count them, calculating both the total number of rows and the number of matching rows.
SELECT
bc.B_ID,
COUNT(*) AS TotalCount,
COUNT(ac.A_ID) AS MatchCount
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
GROUP BY bc.B_ID
As you can see, to count matches, we simply count ac.A_ID values: in case of no match the corresponding column will be NULL and thus not counted. And if indeed some rows in B_C_Table do not match any rows in the subset of A_C_Table, we will see different values of TotalCount and MatchCount.
And that logically leads us towards the final step: comparing those counts. (For, obviously, if we can obtain values, we can also compare them.) But not in the WHERE clause, of course, because aggregate functions aren't allowed in WHERE. It's the HAVING clause that is used to compare values of grouped rows, including aggregated values too. So...
SELECT
bc.B_ID,
COUNT(*) AS TotalCount,
COUNT(ac.A_ID) AS MatchCount
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
GROUP BY bc.B_ID
HAVING COUNT(*) = COUNT(ac.A_ID)
The count values aren't really needed, of course, and when you drop them you will be able to UNION the above query with the one selecting B_ID from A_B_rel:
SELECT B_ID
FROM A_B_rel
WHERE A_ID = #A_ID
UNION
SELECT bc.B_ID
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = #A_ID
GROUP BY bc.B_ID
HAVING COUNT(*) = COUNT(ac.A_ID)
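The ON-versus-WHERE point and the final UNION can be checked on toy data. A minimal sketch using Python's sqlite3; the rows are hypothetical, and the #A_ID placeholder from the answer is replaced by a named parameter :a. It also runs the variant with the A_ID filter in WHERE, which wrongly accepts B 30: its 'v3' row matches only A 2, so WHERE discards that row instead of counting it as a non-match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A_B_rel (A_ID INTEGER, B_ID INTEGER);
CREATE TABLE A_C_Table (A_ID INTEGER, C_Value TEXT);
CREATE TABLE B_C_Table (B_ID INTEGER, C_Value TEXT);
INSERT INTO A_B_rel VALUES (1, 10);
INSERT INTO A_C_Table VALUES (1, 'v1'), (1, 'v2'), (2, 'v3');
INSERT INTO B_C_Table VALUES
  (10, 'v9'),               -- mapped to A 1 directly, not a subset
  (20, 'v1'),               -- subset of A 1's values
  (30, 'v1'), (30, 'v3'),   -- v3 belongs only to A 2: not a subset
  (40, 'v1'), (40, 'v2');   -- exactly A 1's values: a subset
""")

# Subset test with the A_ID filter in the ON clause
SUBSET_ON = """
SELECT bc.B_ID
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value AND ac.A_ID = :a
GROUP BY bc.B_ID
HAVING COUNT(*) = COUNT(ac.A_ID)
"""

# Same query with the A_ID filter in WHERE instead of ON
SUBSET_WHERE = """
SELECT bc.B_ID
FROM B_C_Table bc
LEFT JOIN A_C_Table ac ON bc.C_Value = ac.C_Value
WHERE ac.A_ID = :a
GROUP BY bc.B_ID
HAVING COUNT(*) = COUNT(ac.A_ID)
"""

# Directly mapped Bs plus subset Bs
FULL = "SELECT B_ID FROM A_B_rel WHERE A_ID = :a UNION " + SUBSET_ON + " ORDER BY 1"


def ids(sql, a_id=1):
    return [row[0] for row in conn.execute(sql, {"a": a_id})]


subset_on = sorted(ids(SUBSET_ON))
subset_where = sorted(ids(SUBSET_WHERE))
full = ids(FULL)
```

With A 1 as input, the ON version finds Bs 20 and 40, the WHERE version also (wrongly) includes 30, and the UNION adds the directly mapped B 10.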
Sounds like you need to think in terms of double negation, i.e. there should not exist any B_C that does not have a matching A_C (and I'm guessing there should be at least one B_C).
So, try something like
select B.Id
from Table_B B
where exists (select 1 from B_C_Table BC
where BC.B_id = B.Id)
and not exists (select 1 from B_C_Table BC
where BC.B_id = B.Id
and not exists(select 1 from A_C_Table AC
join A_B_Rel ABR on AC.A_id = ABR.A_id
where ABR.B_id = B.Id
and BC.C_Value = AC.C_Value))
Perhaps this is what you're looking for:
SELECT B_id
FROM A_B_rel
WHERE A_id = <A ID>
UNION
SELECT a.B_Id
FROM B_C_Table a
LEFT JOIN A_C_Table b ON a.C_Value = b.C_Value AND b.A_Id = <A ID>
GROUP BY a.B_Id
HAVING COUNT(CASE WHEN b.A_Id IS NULL THEN 1 END) = 0
The first SELECT gets all Bs that are mapped to a particular A (<A ID> being the input parameter for the A ID); then we add to that result set any additional Bs whose entire set of C_Values is within the C_Values of that particular A (again, <A ID> being the input parameter).

Combining tables SQL Server 2005

Table 1: LocID, Prod_ID, Metric_ID, Metric_Data
Table 2: LocID, Metric_ID, Metric_Data
I need a result table with the columns LocID, Prod_ID, Metric_ID and Metric_Data, where the following conditions are met:
When the Metric_IDs match, the Metric_Data values are added.
When the Metric_IDs do not match, the Metric_Data that has a value is shown.
Note that some Metric_IDs are common to Table 1 and Table 2 and some are not.
How do I generate this third table? I have tried all kinds of joins: full, left, right, etc.
EDIT
select
A.LocID,
A.Prod_ID,
B.Metric_ID,
coalesce(C.Metric_Data + D.Metric_Data, C.Metric_Data, D.Metric_Data) Metric_Data
from (
select LocID, Prod_ID from table1 group by LocID, Prod_ID) A
inner join (
select LocID, Metric_ID from table1 group by LocID, Metric_ID
union
select LocID, Metric_ID from table2 group by LocID, Metric_ID) B on A.LocID = B.LocID
left join table1 C on C.LocID = A.LocID and C.Prod_ID = A.Prod_ID and C.Metric_ID = B.Metric_ID
left join table2 D on D.LocID = A.LocID and D.Metric_ID = B.Metric_ID
Notes:
A: produces all the location and ProdID combinations
B: produces, for each location, all the possible MetricIDs from both tables
C and D: left joins to the data tables to get the Metric Data
Coalesce: returns C + D, or, if one of them is NULL, the other one
select
coalesce(a.LocID, b.LocID) LocID,
a.Prod_ID,
coalesce(a.Metric_ID, b.Metric_ID) Metric_ID,
coalesce(a.Metric_Data + b.Metric_Data, a.Metric_Data, b.Metric_Data) Metric_Data
from table1 a
full outer join table2 b
on a.LocID = b.LocID and a.Metric_ID = b.Metric_ID
This assumes
You are matching by the tuple (LocID, Metric_ID)
Either A or B may lack a (LocID, Metric_ID) pair that exists in the other
The result of Metric_Data is either A+B (if both exist), or A or B if only one exists for a (LocID, Metric_ID) combination
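A sketch of the FULL OUTER JOIN answer using Python's sqlite3, with hypothetical rows. SQLite only added FULL OUTER JOIN in version 3.39.0, so to run on older builds this sketch emulates it as a LEFT JOIN plus a UNION ALL branch for the table2 rows that have no table1 match; the COALESCE expression in the first branch is the one from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (LocID INTEGER, Prod_ID TEXT, Metric_ID TEXT, Metric_Data INTEGER);
CREATE TABLE table2 (LocID INTEGER, Metric_ID TEXT, Metric_Data INTEGER);
-- Hypothetical rows: M1 is in both tables, M2 only in table1, M3 only in table2
INSERT INTO table1 VALUES (1, 'P1', 'M1', 10), (1, 'P1', 'M2', 5);
INSERT INTO table2 VALUES (1, 'M1', 3), (1, 'M3', 7);
""")

rows = conn.execute("""
    -- Every table1 row, plus the matching table2 data when it exists
    SELECT a.LocID, a.Prod_ID, a.Metric_ID,
           COALESCE(a.Metric_Data + b.Metric_Data, a.Metric_Data, b.Metric_Data)
    FROM table1 a
    LEFT JOIN table2 b ON a.LocID = b.LocID AND a.Metric_ID = b.Metric_ID
    UNION ALL
    -- table2 rows with no table1 match (the right-only half of a full join)
    SELECT b.LocID, NULL, b.Metric_ID, b.Metric_Data
    FROM table2 b
    WHERE NOT EXISTS (
        SELECT 1 FROM table1 a
        WHERE a.LocID = b.LocID AND a.Metric_ID = b.Metric_ID
    )
    ORDER BY 1, 3
""").fetchall()
```

M1 exists in both tables, so its values are added (10 + 3 = 13); M2 and M3 each exist in only one table and keep their own value, with Prod_ID NULL for the table2-only row since table2 has no Prod_ID.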