How to find tree nodes that don't have child nodes - sql

A Firebird database stores chart-of-accounts records in this table:
CREATE TABLE CHARTACC
(
ACCNTNUM Char(8) NOT NULL, -- Account ID (Primary Key)
ACCPARNT Char(8), -- Parent ID
ACCCOUNT Integer, -- account count
ACCORDER Integer, -- order of children in nodes
ACCTITLE varchar(150),
ACDESCRP varchar(4000),
DTCREATE timestamp -- date and time of creation
)
I need to write a query that selects only the leaf nodes, i.e. nodes that have no child nodes (child2, child3, subchild1, subchild2, subchild3 and subchild4).

The NOT IN approach suggested by Jerry typically runs quite slowly in the InterBase/Firebird/Yaffil/Red Database family: no indices are used, and so on.
The same goes for another possible form, Select X from T1 where
NOT EXISTS ( select * from t2 where t2.a = t1.b), which can turn out really slow too.
I agree that those queries better express what a human wants and hence are more readable, but they are still not recommended on Firebird. I was badly bitten in the 1990s when building a Herbalife-like app: I chose this type of query, wrapped in a loop, to do monthly bottom-up tallying (update ... where not exists ...), and every iteration scaled as O(n^2) in InterBase 5.5. Granted, Firebird 3 has come a long way since then, but this "direct" approach is still not recommended.
A more SQL-traditional and FB-friendly way to express it, albeit less direct and harder to read, would be Select t1.x from t1 LEFT JOIN t2 on t1.a=t2.b WHERE t2.y IS NULL
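Applied to the CHARTACC table from the question, the LEFT JOIN form might look like the sketch below. It runs against an in-memory SQLite database as a stand-in for Firebird (an assumption made only so the example is runnable), keeps just the two key columns, and uses made-up sample rows.

```python
import sqlite3

# SQLite stands in for Firebird here; only the columns needed for the join are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CHARTACC (
    ACCNTNUM CHAR(8) NOT NULL PRIMARY KEY,  -- account ID
    ACCPARNT CHAR(8)                        -- parent ID, NULL for the root
);
INSERT INTO CHARTACC VALUES
    ('root',      NULL),
    ('child1',    'root'),
    ('child2',    'root'),
    ('subchild1', 'child1');
""")

# Anti-join: pair every node with its children, then keep the nodes
# for which no child row was found.
leaf_rows = conn.execute("""
    SELECT p.ACCNTNUM
    FROM CHARTACC p
    LEFT JOIN CHARTACC c ON c.ACCPARNT = p.ACCNTNUM
    WHERE c.ACCNTNUM IS NULL
    ORDER BY p.ACCNTNUM
""").fetchall()
leaves = [row[0] for row in leaf_rows]
print(leaves)
```

The IS NULL test is made on the child's primary key, which can only be NULL when the join found no match.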

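Your query needs to work something like:
select * from CHARTACC where ACCNTNUM not in (select ACCPARNT from CHARTACC where ACCPARNT is not null)
In other words, select the rows of this table whose identifier does not appear anywhere in the table's parent field. The IS NOT NULL filter matters: if the subquery returns a NULL (the parent of a root account), NOT IN never evaluates to true and the query returns no rows.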
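The effect of a NULL parent on NOT IN can be checked with a small sketch. It uses an in-memory SQLite database rather than Firebird (an assumption for runnability; the three-valued NULL logic is standard SQL), and the sample rows and the leaf_ids helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CHARTACC (ACCNTNUM TEXT PRIMARY KEY, ACCPARNT TEXT);
INSERT INTO CHARTACC VALUES ('root', NULL), ('child1', 'root'), ('child2', 'root');
""")

def leaf_ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# The subquery returns a NULL (the root's parent), so NOT IN is never true.
naive = leaf_ids(
    "SELECT ACCNTNUM FROM CHARTACC "
    "WHERE ACCNTNUM NOT IN (SELECT ACCPARNT FROM CHARTACC)"
)

# Filtering the NULLs out of the subquery gives the intended result.
filtered = leaf_ids(
    "SELECT ACCNTNUM FROM CHARTACC "
    "WHERE ACCNTNUM NOT IN "
    "(SELECT ACCPARNT FROM CHARTACC WHERE ACCPARNT IS NOT NULL)"
)
print(naive, filtered)
```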

Related

st_within as a condition of insert

I have built myself a tracker, and as part of the spec I set myself, for security reasons, I don't want people to know where I leave my car overnight.
So I have a concept of exclusion zones. The web map already shows only data outside those zones, but I also want to save transmitted data only when it isn't within an exclusion zone (there can be more than one, so I am thinking of a subquery).
Can anyone help?
It is possible, if not preferable, for this to be a stored procedure. Any ideas? (I am useless when it comes to subqueries, hence asking.)
The SQL I am using to get the data (retrospective exclusion zones) is:
SELECT geom
FROM public.data
WHERE layer = %layer_id% and not exists(
SELECT *
FROM public.exclusion_zone
WHERE layer = %layer_id% and ST_CONTAINS(the_geom, geom))
For example, this code returns all geometries (say, points) from public.data, which are not completely inside geometries (say, polygons) from public.exclusion_zone:
SELECT *
FROM public.data
WHERE the_geom NOT IN (
SELECT d.the_geom
FROM public.data d, public.exclusion_zone e
WHERE ST_Within (d.the_geom, e.the_geom)
);
Or, even better (assuming that comparing integer IDs is faster than comparing geometries):
SELECT * FROM public.data
WHERE id NOT IN (
SELECT d.id
FROM public.data d, public.exclusion_zone e
WHERE ST_Within (d.the_geom, e.the_geom)
);
See more: ST_Within
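For the insert side of the question, one common pattern is INSERT ... SELECT ... WHERE NOT EXISTS, which writes the row only when no exclusion zone matches. The sketch below illustrates just that pattern under heavy assumptions: it uses SQLite instead of PostGIS, models each zone as an axis-aligned box (min_x, min_y, max_x, max_y are hypothetical columns), and save_point is a made-up helper. In PostGIS the BETWEEN test would be replaced by the ST_Within or ST_Contains call from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exclusion_zone (layer INTEGER, min_x REAL, min_y REAL, max_x REAL, max_y REAL);
CREATE TABLE data (layer INTEGER, x REAL, y REAL);
INSERT INTO exclusion_zone VALUES (1, 0, 0, 10, 10), (1, 50, 50, 60, 60);
""")

def save_point(layer, x, y):
    """Insert the point unless it lies inside an exclusion zone of the same layer.

    Returns True when a row was written.
    """
    before = conn.total_changes
    conn.execute("""
        INSERT INTO data (layer, x, y)
        SELECT :layer, :x, :y
        WHERE NOT EXISTS (
            SELECT 1 FROM exclusion_zone z
            WHERE z.layer = :layer
              AND :x BETWEEN z.min_x AND z.max_x   -- stand-in for ST_Within
              AND :y BETWEEN z.min_y AND z.max_y
        )
    """, {"layer": layer, "x": x, "y": y})
    return conn.total_changes > before

# Inside a zone, outside every zone, and inside a box that belongs to another layer.
results = [save_point(1, 5, 5), save_point(1, 20, 20), save_point(2, 5, 5)]
stored = conn.execute("SELECT layer, x, y FROM data ORDER BY layer").fetchall()
print(results, stored)
```

Because the check and the insert are a single statement, the same shape could be wrapped in a stored procedure or used directly from the application.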

Does EXCEPT execute faster than a JOIN when the table columns are the same

To find all the changes between two databases, I am left joining the tables on the pk and using a date_modified field to choose the latest record. Will using EXCEPT improve performance, since the tables have the same schema? I would like to rewrite it with an EXCEPT, but I'm not sure whether the implementation of EXCEPT would outperform a JOIN in every case. Hopefully someone has a more technical explanation of when to use EXCEPT.
There is no way anyone can tell you that EXCEPT will always or never out-perform an equivalent OUTER JOIN. The optimizer will choose an appropriate execution plan regardless of how you write your intent.
That said, here is my guideline:
Use EXCEPT when at least one of the following is true:
The query is more readable (this will almost always be true).
Performance is improved.
And BOTH of the following are true:
The query produces semantically identical results, and you can demonstrate this through sufficient regression testing, including all edge cases.
Performance is not degraded (again, in all edge cases, as well as environmental changes such as clearing buffer pool, updating statistics, clearing plan cache, and restarting the service).
It is important to note that it can be a challenge to write an equivalent EXCEPT query as the JOIN becomes more complex and/or you are relying on duplicates in part of the columns but not others. Writing a NOT EXISTS equivalent, while slightly less readable than EXCEPT should be far more trivial to accomplish - and will often lead to a better plan (but note that I would never say ALWAYS or NEVER, except in the way I just did).
In this blog post I demonstrate at least one case where EXCEPT is outperformed by both a properly constructed LEFT OUTER JOIN and of course by an equivalent NOT EXISTS variation.
In the following example, the LEFT JOIN takes roughly a third of the time of EXCEPT (about 66% less)
(PostgreSQL 9.4.3)
Example:
There are three tables. suppliers, parts, shipments.
We need to get all parts not supplied by any supplier in London.
Database (with indexes on all involved columns):
CREATE TABLE suppliers (
id bigint primary key,
city character varying NOT NULL
);
CREATE TABLE parts (
id bigint primary key,
name character varying NOT NULL
);
CREATE TABLE shipments (
id bigint primary key,
supplier_id bigint NOT NULL,
part_id bigint NOT NULL
);
Records count:
db=# SELECT COUNT(*) FROM suppliers;
count
---------
1281280
(1 row)
db=# SELECT COUNT(*) FROM parts;
count
---------
1280000
(1 row)
db=# SELECT COUNT(*) FROM shipments;
count
---------
1760161
(1 row)
Query using EXCEPT.
SELECT parts.*
FROM parts
EXCEPT
SELECT parts.*
FROM parts
LEFT JOIN shipments
ON (parts.id = shipments.part_id)
LEFT JOIN suppliers
ON (shipments.supplier_id = suppliers.id)
WHERE suppliers.city = 'London'
;
-- Execution time: 3327.728 ms
Query using LEFT JOIN with table, returned by subquery.
SELECT parts.*
FROM parts
LEFT JOIN (
SELECT parts.id
FROM parts
LEFT JOIN shipments
ON (parts.id = shipments.part_id)
LEFT JOIN suppliers
ON (shipments.supplier_id = suppliers.id)
WHERE suppliers.city = 'London'
) AS subquery_tbl
ON (parts.id = subquery_tbl.id)
WHERE subquery_tbl.id IS NULL
;
-- Execution time: 1136.393 ms
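Before comparing timings, it is worth checking that the candidate rewrites really return the same rows. The sketch below does that for EXCEPT, LEFT JOIN ... IS NULL and NOT EXISTS on a tiny, made-up data set. It uses SQLite rather than PostgreSQL (an assumption for runnability), compares ids only, and says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
CREATE TABLE parts     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE shipments (id INTEGER PRIMARY KEY,
                        supplier_id INTEGER NOT NULL,
                        part_id INTEGER NOT NULL);
INSERT INTO suppliers VALUES (1, 'London'), (2, 'Paris');
INSERT INTO parts     VALUES (1, 'nut'), (2, 'bolt'), (3, 'screw');
INSERT INTO shipments VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

# Parts that have at least one shipment from a London supplier.
london_parts = """
    SELECT sh.part_id FROM shipments sh
    JOIN suppliers su ON su.id = sh.supplier_id
    WHERE su.city = 'London'
"""

queries = {
    "except": f"SELECT id FROM parts EXCEPT {london_parts}",
    "left_join": f"""
        SELECT p.id FROM parts p
        LEFT JOIN ({london_parts}) lp ON lp.part_id = p.id
        WHERE lp.part_id IS NULL""",
    "not_exists": """
        SELECT p.id FROM parts p
        WHERE NOT EXISTS (
            SELECT 1 FROM shipments sh
            JOIN suppliers su ON su.id = sh.supplier_id
            WHERE sh.part_id = p.id AND su.city = 'London')""",
}

results = {name: sorted(row[0] for row in conn.execute(sql))
           for name, sql in queries.items()}
print(results)
```

Part 1 is shipped from London and drops out; part 2 (Paris only) and part 3 (never shipped) remain in all three variants.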

Optimising CTE for recursive queries

I have a table with a self join. You can think of the structure as a standard table representing an organisational hierarchy, e.g. the table has:
MemberId
MemberName
RelatedMemberId
This table contains 50000 sample records. I wrote a recursive CTE query and it works absolutely fine. However, processing just 50000 records takes about 3 minutes on my machine (4 GB RAM, 2.4 GHz Core 2 Duo, 7200 RPM HDD).
How can I improve the performance? 50000 is not a huge number, and it will keep increasing over time. This is exactly the query I have in my stored procedure. Its purpose is to select all the members that come under a specific member. E.g. every person comes under the Owner of the company; for a Manager, all records except the Owner are returned. I hope you understand the query's purpose.
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
Alter PROCEDURE spGetNonVirtualizedData
(
@MemberId int
)
AS
BEGIN
With MembersCTE As
(
Select parent.MemberId As MemberId, 0 as Level
From Members as parent Where IsNull(MemberId,0) = IsNull(@MemberId,0)
Union ALL
Select child.MemberId As MemberId , Level + 1 as Level
From Members as child
Inner Join MembersCTE on MembersCTE.MemberId = child.RelatedMemberId
)
Select Members.*
From MembersCTE
Inner Join Members On MembersCTE.MemberId = Members.MemberId
option(maxrecursion 0)
END
GO
As you can see, to improve performance I have even moved the joins to the last step, when selecting the records, so that unnecessary data does not get inserted into the temp table. If I put the joins in the base step and the recursive step of the CTE (instead of the final Select), the query takes 20 minutes to execute!
MemberId is primary key in the table.
Thanks in advance :)
In your anchor condition you have Where IsNull(MemberId,0) = IsNull(@MemberId,0). I assume this is just because, when you pass NULL as a parameter, = doesn't bring back the IS NULL rows. Wrapping the column in IsNull will cause a scan rather than a seek.
Use WHERE MemberId = @MemberId OR (@MemberId IS NULL AND MemberId IS NULL) instead, which is sargable.
I'm also assuming that you don't have an index on RelatedMemberId. If not, you should add one:
CREATE NONCLUSTERED INDEX ix_name ON Members(RelatedMemberId) INCLUDE (MemberId)
(though you can skip the included column if MemberId is the clustered index key, as it will be included automatically)
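Both suggestions can be combined in a small sketch: a plain equality in the anchor step and an index on the parent column that the recursive step joins on. It uses SQLite's WITH RECURSIVE instead of SQL Server (an assumption for runnability), the sample rows and the subtree helper are hypothetical, and it does not attempt to reproduce the timings from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Members (
    MemberId        INTEGER PRIMARY KEY,
    MemberName      TEXT,
    RelatedMemberId INTEGER            -- parent member, NULL for the owner
);
-- The recursive step looks children up by parent, so index that column.
CREATE INDEX ix_members_related ON Members (RelatedMemberId);
INSERT INTO Members VALUES
    (1, 'Owner', NULL), (2, 'Manager', 1), (3, 'Clerk', 2),
    (4, 'Intern', 3), (5, 'Auditor', 1);
""")

def subtree(member_id):
    """Return (MemberId, Level) for a member and everyone below it."""
    return conn.execute("""
        WITH RECURSIVE MembersCTE (MemberId, Level) AS (
            -- Anchor: plain equality on the key, no function around the column.
            SELECT MemberId, 0 FROM Members WHERE MemberId = :id
            UNION ALL
            SELECT child.MemberId, MembersCTE.Level + 1
            FROM Members AS child
            JOIN MembersCTE ON MembersCTE.MemberId = child.RelatedMemberId
        )
        SELECT MemberId, Level FROM MembersCTE ORDER BY Level, MemberId
    """, {"id": member_id}).fetchall()

manager_tree = subtree(2)
owner_tree = subtree(1)
print(manager_tree)
print(owner_tree)
```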

sql join question

I have the following tables
nid timestamp title
82 1245157883 Home
61 1245100302 Minutes
132 1245097268 Sample Form
95 1245096985 Goals & Objectives
99 1245096952 Members
AND
pid src dst language
70 node/82 department/34-section-2
45 node/61/feed department/22-section-2/feed
26 node/15 department/department1/15-department1
303 node/101 department/101-section-4
These are fragments of the tables, with the rest of the data missing (they are both quite large). I am trying to join the dst column from the second table onto the first one. The rows should match on nid, but the second table stores it as node/[nid], which makes this more complicated. I also want to ignore the rows that end in "feed", since they are not needed for what I am doing.
Much thanks
EDIT: I feel bad for not mentioning this, but the first table is an sql result from
select nid, MAX(timestamp) as timestamp, title from node_revisions group by nid ORDER BY timestamp DESC LIMIT 0,5
The second table has the name "url_alias"
try
select * from table1 inner join table2 on src=concat('node/',nid)
Edit
edited to reflect change in OP
select `nid`, MAX(`timestamp`) as `timestamp`, `title` from `node_revisions` inner join `url_alias` on `src`=concat('node/',`nid`) group by `nid` ORDER BY `timestamp` DESC LIMIT 0,5
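The join on a concatenated key can be tried out on a few rows from the question. The sketch below uses SQLite (an assumption for runnability), where || replaces MySQL's concat(), and adds the dst column to the output. Because the join compares the whole src value, the node/61/feed alias never matches and the "feed" rows drop out without an extra filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node_revisions (nid INTEGER, timestamp INTEGER, title TEXT);
CREATE TABLE url_alias (pid INTEGER, src TEXT, dst TEXT);
INSERT INTO node_revisions VALUES
    (82, 1245157883, 'Home'),
    (61, 1245100302, 'Minutes'),
    (99, 1245096952, 'Members');
INSERT INTO url_alias VALUES
    (70, 'node/82', 'department/34-section-2'),
    (45, 'node/61/feed', 'department/22-section-2/feed');
""")

rows = conn.execute("""
    SELECT r.nid, MAX(r.timestamp) AS timestamp, r.title, a.dst
    FROM node_revisions r
    JOIN url_alias a ON a.src = 'node/' || r.nid   -- exact match on 'node/<nid>'
    GROUP BY r.nid, r.title, a.dst
    ORDER BY timestamp DESC
    LIMIT 5
""").fetchall()
print(rows)
```

Node 99 has no alias at all in this sample, so the inner join leaves it out; a LEFT JOIN would keep it with a NULL dst.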
I don't know what database you are using. However, I suggest you write a parsing function that returns the nid from that column. Then, you can have this kind of query (assuming GET_NID is the function you defined):
SELECT * from T1, T2
WHERE T1.nid = GET_NID( T2.src)
You have a few options.
write a function that converts src to an nid and join on t1.nid = f(t2.src) -- you didn't say what DBMS you use, but most have a way to do that. It will be slow, but that depends on how big the tables are.
Similar to that, make a view that has a computed field using that function -- same speed, but might be easier to understand.
Create a new nid field in t2 and use the function to populate it. Make insert and update triggers to keep it up to date, then join on that. This is better if you query this frequently.
Convert t2 so that it has a nid field and compute the src from that and another field that is a template that the nid needs to be inserted into.
I'd pull the node id in the second table into a separate column. Otherwise any attempt to join the two tables will result in a table scan with some processing on the src field (I assume you meant the src field and not the dst field) and performance will be problematic.
SELECT *
FROM (SELECT *, CONCAT('node/', nid) AS src FROM table1) t1
INNER JOIN table2 t2
ON t1.src = t2.src
You haven't specified which DBMS you are using. Several engines support the SQL:1999 standard SIMILAR TO predicate, which uses regular expressions for matching. Other engines implement the same idea under different keywords.
FirebirdSQL:
http://wiki.firebirdsql.org/wiki/index.php?page=SIMILAR+TO
PostgreSQL:
http://www.network-theory.co.uk/docs/postgresql/vol1/SIMILARTORegularExpressions.html
MySQL:
http://dev.mysql.com/doc/refman/5.0/en/regexp.html
Depending on the scenario you want to do this for (for example, if you are going to perform this JOIN regularly and your second table is rather large), you may want to look into a materialized view.
Write a function that performs all the logic to extract the nid into a separate column. Aside from initial m-view creation, the function will only need to run when the basetable changes (insert, update, delete) compared to running the function against every row each time you query.
This allows a fairly simple join to the materialized view with standard benefits of tables such as Indexing.
NB: looks like I was beaten to it while writing :)

Best way to perform dynamic subquery in MS Reporting Services?

I'm new to SQL Server Reporting Services, and was wondering the best way to do the following:
Query to get a list of popular IDs
Subquery on each item to get properties from another table
Ideally, the final report columns would look like this:
[ID] [property1] [property2] [SELECT COUNT(*)
FROM AnotherTable
WHERE ForeignID=ID]
There may be ways to construct a giant SQL query to do this all in one go, but I'd prefer to compartmentalize it. Is the recommended approach to write a VB function to perform the subquery for each row? Thanks for any help.
I would recommend using a SubReport. You would place the SubReport in a table cell.
Depending on how you want the output to look, a subreport could do, or you could group on ID, property1, property2 and show the items from your other table as detail items (assuming you want to show more than just count).
Something like
select t1.ID, t1.property1, t1.property2, t2.somecol, t2.someothercol
from table1 t1 left join AnotherTable t2 on t1.ID = t2.ForeignID
@Carlton Jenke I think you will find an outer join a better performer than the correlated subquery in the example you gave. Remember that the subquery needs to be run for each row.
Simplest method is this:
select *,
(select count(*) from tbl2 t2 where t2.tbl1ID = t1.tbl1ID) as cnt
from tbl1 t1
here is a workable version (using table variables):
declare @tbl1 table
(
tbl1ID int,
prop1 varchar(1),
prop2 varchar(2)
)
declare @tbl2 table
(
tbl2ID int,
tbl1ID int
)
select *,
(select count(*) from @tbl2 t2 where t2.tbl1ID = t1.tbl1ID) as cnt
from @tbl1 t1
Obviously this is just a raw example - standard rules apply like don't select *, etc ...
UPDATE from Aug 21 '08 at 21:27:
@AlexCuse - Yes, totally agree on the performance.
I started to write it with the outer join, but then saw in his sample output the count and thought that was what he wanted, and the count would not return correctly if the tables are outer joined. Not to mention that joins can cause your records to be multiplied (1 entry from tbl1 that matches 2 entries in tbl2 = 2 returns) which can be unintended.
So I guess it really boils down to the specifics on what your query needs to return.
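The row-multiplication concern, and one way around it, can be seen in a short sketch. It uses SQLite instead of SQL Server (an assumption for runnability) with made-up rows: a bare outer join repeats the tbl1 row once per match, while grouping on the tbl1 key and counting the tbl2 key gives the same counts as the correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl1 (tbl1ID INTEGER PRIMARY KEY, prop1 TEXT, prop2 TEXT);
CREATE TABLE tbl2 (tbl2ID INTEGER PRIMARY KEY, tbl1ID INTEGER);
INSERT INTO tbl1 VALUES (1, 'a', 'x'), (2, 'b', 'y');
INSERT INTO tbl2 VALUES (10, 1), (11, 1);
""")

# Correlated subquery: one count per tbl1 row.
correlated = conn.execute("""
    SELECT t1.tbl1ID,
           (SELECT COUNT(*) FROM tbl2 t2 WHERE t2.tbl1ID = t1.tbl1ID) AS cnt
    FROM tbl1 t1
    ORDER BY t1.tbl1ID
""").fetchall()

# A bare outer join repeats tbl1 row 1 once per matching tbl2 row.
plain_join = conn.execute("""
    SELECT t1.tbl1ID
    FROM tbl1 t1
    LEFT JOIN tbl2 t2 ON t2.tbl1ID = t1.tbl1ID
    ORDER BY t1.tbl1ID
""").fetchall()

# Grouping collapses the repeats; COUNT(t2.tbl2ID) ignores the NULL
# produced for tbl1 rows without a match, so they count as 0.
grouped = conn.execute("""
    SELECT t1.tbl1ID, COUNT(t2.tbl2ID) AS cnt
    FROM tbl1 t1
    LEFT JOIN tbl2 t2 ON t2.tbl1ID = t1.tbl1ID
    GROUP BY t1.tbl1ID
    ORDER BY t1.tbl1ID
""").fetchall()
print(correlated, plain_join, grouped)
```

Using COUNT(*) in the grouped query would report 1 for the unmatched row, which is the incorrect count mentioned above.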
UPDATE from Aug 21 '08 at 22:07:
To answer the other parts of your question - is a VB function the way to go? No. Absolutely not. Not for something this simple.
Functions are very bad for performance: the function is executed once for each row in the result set.
If you want to "compartmentalize" the different parts of the query you have to approach it more like a stored procedure. Build a temp table, do part of the query and insert the results into the table, then do any further queries you need and update the original temp table (or insert into more temp tables).