How to get max of "not exists" query - sql

I'm not super great with SQL, but I'm using it for a project. Below is the query I would like to write, but of course it isn't valid SQL:
select * from a
where not exists (
select * from b
where a.name = b.name) common
where a.id > max(common.id)
My goal is to get the rows in a that do not join with those in b, but only those with a greater id than any of the ones that do join. The point of this is so that I can begin filling a database with values, stop, and then continue later where I left off.
I'm using sqlite with python; I know I could do two queries with python, but I'm guessing there is a way to do it with SQL (and I'm assuming that's 'better')
Dump:
BEGIN TRANSACTION;
CREATE TABLE "a" (
`Id` INTEGER NOT NULL UNIQUE,
`Name` TEXT,
PRIMARY KEY(`Id`)
);
INSERT INTO a VALUES(16,'Bob');
INSERT INTO a VALUES(17,'George');
INSERT INTO a VALUES(18,'Jimmy');
INSERT INTO a VALUES(19,'Billy');
INSERT INTO a VALUES(20,'Johnny');
INSERT INTO a VALUES(21,'James');
INSERT INTO a VALUES(22,'Bart');
CREATE TABLE "b" (
`Id` INTEGER NOT NULL UNIQUE,
`Name` TEXT NOT NULL,
PRIMARY KEY(`Id`)
);
INSERT INTO b VALUES(16,'Bob');
INSERT INTO b VALUES(19,'Billy');
COMMIT;
There are other columns, so these aren't identical tables, but the query should get Johnny, James, and Bart from table a.

I propose:
make a convenient common table expression ab for the inner join of a and b (first line)
select all rows from a which are not in ab (second line and fourth line)
restrict according to the desired condition (third line)
Note that the ids in ab are identical to the ids of the rows in a which join with b, so their max is the max id among the rows in a which join. The CTE joins on id, which works for the sample dump because the matching rows share ids; join on name instead if the ids differ between the tables.
I simply output *; selecting only the desired fields should be easy.
Code:
with ab(id, field) as (select id, a.name from a join b using(id))
select * from a
where id > (select max(id) from ab)
except select * from ab;
Output:
20|Johnny
21|James
22|Bart
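For reference, the same idea can be run from Python against the dump in the question. This is a minimal sketch, assuming the tables are matched on name as in the question's pseudo-query. The function name rows_to_resume and the COALESCE fallback of -1 (so that an empty b returns every row of a, assuming ids are never negative) are my own additions, not part of the answer above.

```python
import sqlite3

# Rows of a with no match in b, restricted to ids above the largest matched id.
# COALESCE(..., -1) is an assumption: with nothing matched yet, start from the beginning.
RESUME_QUERY = """
SELECT a.Id, a.Name
FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.Name = a.Name)
  AND a.Id > COALESCE(
        (SELECT MAX(a2.Id) FROM a AS a2 JOIN b ON b.Name = a2.Name),
        -1)
ORDER BY a.Id
"""

def rows_to_resume(conn):
    """Return the (Id, Name) rows of a that still need to be processed."""
    return conn.execute(RESUME_QUERY).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE b (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
INSERT INTO a VALUES (16,'Bob'),(17,'George'),(18,'Jimmy'),(19,'Billy'),
                     (20,'Johnny'),(21,'James'),(22,'Bart');
INSERT INTO b VALUES (16,'Bob'),(19,'Billy');
""")
print(rows_to_resume(conn))
```

On the sample data this prints the Johnny, James, and Bart rows, so a single query is enough and no second round trip from Python is needed.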

The following gets all rows from a whose id is bigger than the largest corresponding id in table b:
select a.*
from a
where a.id > (select max(b.id) from b where a.a_field = b.b_field)
It assumes that at least one record matches in b.
If you want to include all values, even when none exist, then I think this will do:
select a.*
from a
where a.id > all (select b.id from b where a.a_field = b.b_field)

Related

Optimizing sql query: check for all rows in table B if any rows in table C reference the same row in table A

I have 3 tables, A, B and C structured like this
CREATE TABLE a (
id SERIAL NOT NULL PRIMARY KEY
);
CREATE TABLE b (
id SERIAL NOT NULL PRIMARY KEY,
a_id INT REFERENCES a(id) ON DELETE CASCADE
);
CREATE TABLE c (
id SERIAL NOT NULL PRIMARY KEY,
a_id INT REFERENCES a(id) ON DELETE CASCADE
);
Where the relationships are many-to-one. What I want is, for every row in table b, to check whether any row in table c references the same row in table a. I already have this query:
SELECT
b.id,
true
FROM
b
WHERE EXISTS (
SELECT 1
FROM c
WHERE b.a_id = c.a_id
)
UNION
SELECT
b.id,
false
FROM
b
WHERE NOT EXISTS (
SELECT 1
FROM c
WHERE b.a_id = c.a_id
)
ORDER BY id
Though I am not certain, I think this is doing double work, and going through the table twice, and I am wondering how I could optimize it to only traverse the table once.
Is it possible with a simple query, or do I have to do anything complex?
Simply move the EXISTS clause into your SELECT clause.
SELECT
b.id,
EXISTS (SELECT null FROM c WHERE c.a_id = b.a_id) AS c_exists
FROM b;
The same with an IN clause, which I prefer for being even a tad simpler:
SELECT
id,
a_id IN (SELECT c.a_id FROM c) AS c_exists
FROM b;
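To see both forms side by side, here is a small sketch using Python's sqlite3 module rather than PostgreSQL (an assumption made purely so the example is self-contained). The sample rows are made up for illustration. SQLite reports the boolean as 1 or 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id));
INSERT INTO a VALUES (1),(2),(3);
INSERT INTO b VALUES (10,1),(11,2),(12,3);
INSERT INTO c VALUES (20,1),(21,1),(22,3);
""")

# EXISTS evaluated once per row of b, directly in the select list.
with_exists = conn.execute("""
SELECT b.id, EXISTS (SELECT 1 FROM c WHERE c.a_id = b.a_id) AS c_exists
FROM b ORDER BY b.id
""").fetchall()

# The IN variant; equivalent here because no a_id in the sample data is NULL.
with_in = conn.execute("""
SELECT id, a_id IN (SELECT c.a_id FROM c) AS c_exists
FROM b ORDER BY id
""").fetchall()

print(with_exists)
print(with_in)
```

Each row of b appears exactly once, which is what the original UNION of two queries produced with two passes.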
This can be done with a subquery, a left join, and a case.
The subquery gets you a list of distinct c.a_id values.
SELECT DISTINCT a_id FROM c;
Then do this
SELECT b.id,
CASE WHEN distinct_ids.a_id IS NULL THEN 'false'
ELSE 'true' END has_c_row
FROM b
LEFT JOIN (
SELECT DISTINCT a_id FROM c
) distinct_ids ON b.a_id = distinct_ids.a_id
This shape of query, a LEFT JOIN combined with an IS NULL test, is the same one used for an antijoin. The IS NULL test detects the rows in the first table that don't match rows in the second table.
The subquery gives us a view of the data in table c with at most one row per each distinct a_id value. Without the subquery, we might get duplicate rows in the result query.
This eliminates your WHERE EXISTS correlated subqueries; even though PostgreSQL's query planner is pretty smart, sometimes it does the slow thing with subqueries like that.
If it is still too slow for you, create these indexes on the a_id columns.
CREATE INDEX ON b (a_id);
CREATE INDEX ON c (a_id);
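The role of the DISTINCT subquery described above can be checked on a small example. This is a sketch using Python's sqlite3 module instead of PostgreSQL, with made-up rows. The helper name has_c_rows and its dedupe flag are hypothetical and exist only to compare the two variants.

```python
import sqlite3

def has_c_rows(conn, dedupe=True):
    """Left join b against c's a_id values, optionally de-duplicated first."""
    sub = "SELECT DISTINCT a_id FROM c" if dedupe else "SELECT a_id FROM c"
    sql = f"""
    SELECT b.id, CASE WHEN d.a_id IS NULL THEN 0 ELSE 1 END AS has_c_row
    FROM b
    LEFT JOIN ({sub}) AS d ON b.a_id = d.a_id
    ORDER BY b.id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
CREATE TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER);
INSERT INTO b VALUES (10,1),(11,2),(12,3);
INSERT INTO c VALUES (20,1),(21,1),(22,3);
""")

print(has_c_rows(conn, dedupe=True))   # one row per b.id
print(has_c_rows(conn, dedupe=False))  # b.id 10 shows up twice
```

Without DISTINCT, the two rows in c that point at a_id 1 each join to b.id 10, so that row is duplicated in the result.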
I think I understand what you are after.
This is how I would do it:
SELECT b.id, COALESCE(res.result,0) as result
FROM b
LEFT JOIN (
SELECT c.id, 1 as result
FROM c
INNER JOIN a on a.id = c.id
) res on b.id = res.id
I don't think you need to worry about DISTINCT if they are all unique ids.

SQL query to append values not contained in second table

I have table A and table B with different number of columns but both containing a column with IDs. Table A contains more complete list of IDs and table B contains some of the IDs from the table A.
I would like to return resulting table B with original information plus appended IDs that are missing in B but contained in A. For these appended rows, other columns should be blank while column with IDs in B should just contain missing ID values.
Simple solution UNION ALL, with NOT EXISTS:
select b.id, b.c1, ..., b.cn
from b
UNION ALL
select distinct a.id, null, ..., null -- should be same number of columns as in the above select
from a
where not exists (select 1 from b where b.id = a.id)
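As a concrete illustration of the UNION ALL approach, here is a sketch with Python's sqlite3 module. The column names x, c1, and c2 and the sample rows are assumptions standing in for the unnamed columns in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, x TEXT);
CREATE TABLE b (id INTEGER, c1 TEXT, c2 TEXT);
INSERT INTO a VALUES (1,'a1'),(2,'a2'),(3,'a3'),(4,'a4');
INSERT INTO b VALUES (2,'p','q'),(4,'r','s');
""")

# All rows of b, plus one padded row for every id that exists only in a.
appended = conn.execute("""
SELECT b.id, b.c1, b.c2
FROM b
UNION ALL
SELECT DISTINCT a.id, NULL, NULL
FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id)
ORDER BY 1
""").fetchall()

print(appended)
```

The rows for ids 1 and 3 come back with None in the non-ID columns, while the original rows of b are unchanged.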
I think you are describing a left join, with the more complete table a on the left so that its unmatched IDs are kept:
select *
from a left join
b
using (id)

Postgis/SQL Select tuples such that the first tuple item is unique and the items geometries intersect

This question is particularly for Postgres 9.4
Let's say I have two tables:
CREATE TABLE A(id INT);
CREATE TABLE B(id INT);
I'd like to select all tuples (A, B) that satisfy a certain condition, such that
all selected tuples have a different A.id:
SELECT DISTINCT ON (A.id) A.id, B.id FROM A, B WHERE condition(A,B);
However, DISTINCT ON will perform sorting in memory after all the tuples have been selected, and I would like to avoid selecting tuples with a duplicate A.id in the first place.
How can this be done in an efficient way?
EDIT:
both A and B have unique ids
EDIT2:
Here is the complete setup:
CREATE EXTENSION postgis;
DROP TABLE A;
DROP TABLE B;
CREATE TABLE A(shape Geometry, id INT);
CREATE TABLE B(shape Geometry, id INT, kind INT);
CREATE INDEX ON A USING GIST (shape);
I would like to do the following:
SELECT A.id, B.id FROM A, B
WHERE B.id = (SELECT B.id FROM B WHERE
ST_Intersects(A.shape, B.shape)
AND ST_Length(ST_Intersection(A.shape, B.shape)) / ST_Length(A.shape) >= 0.5 AND B.kind != 1 LIMIT 1)
which works (I believe) but is not necessarily the most efficient way. Table A has orders of magnitude more rows than table B, so I am not even sure the GiST index is right.
I am also aware that the order of arguments in ST_Intersects can have a significant effect on run time. What should the correct order be?
If you want just one row for each "A", you can use a correlated subquery (or lateral join):
select a.id,
(select b.id
from b
where condition(a, b)
limit 1
) as b_id
from a;
This should stop testing for rows from b when the first one is found -- which I imagine is the best approach performance-wise.
If none are found, you will get a NULL value. You can wrap this in a subquery and filter out NULLs.
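The pattern can be tried without PostGIS. The sketch below uses Python's sqlite3 module and replaces condition(a, b) with a made-up range test (a.x between b.lo and b.hi, and b.kind != 1); the columns x, lo, and hi are hypothetical stand-ins for the geometry test in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, x INTEGER);
CREATE TABLE b (id INTEGER, lo INTEGER, hi INTEGER, kind INTEGER);
INSERT INTO a VALUES (1, 5), (2, 15), (3, 50);
INSERT INTO b VALUES (100, 0, 10, 0), (101, 0, 20, 0),
                     (102, 10, 20, 1), (103, 40, 60, 1);
""")

# One b per a via a correlated subquery with LIMIT 1, wrapped in an outer
# query that drops the rows of a for which no b qualified (NULL b_id).
pairs = conn.execute("""
SELECT id, b_id
FROM (
    SELECT a.id AS id,
           (SELECT b.id
            FROM b
            WHERE a.x BETWEEN b.lo AND b.hi AND b.kind != 1
            LIMIT 1) AS b_id
    FROM a
)
WHERE b_id IS NOT NULL
ORDER BY id
""").fetchall()

print(pairs)
```

Each a.id appears at most once. Since there is no ORDER BY inside the subquery, which qualifying b is returned for a given a is not specified.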
Try something like:
WITH distinct_a AS (
SELECT DISTINCT A.id
FROM A)
SELECT distinct_a.id, B.id
FROM distinct_a, B
WHERE condition(distinct_a, B)
The CTE (WITH ...) selects all distinct A.id values first. The selected values are then used in the main query.

Insert new/Changes from one table to another in Oracle SQL

I have two tables with the same number of columns: Table A and Table B.
Every day I insert data from Table B into Table A. Right now the insert query is working:
insert into table_a (select * from table_b);
But by this insert the same data which was inserted earlier that is also getting inserted. I only want those rows which are new or are changed from the old data. How can this be done ?
You can use minus:
insert into table_a
select *
from table_b
minus
select *
from table_a;
This assumes that by "duplicate" you mean that all the columns are duplicated.
If you have a timestamp field, you could use it to limit the records to those created after the last copy.
Another option, assuming you have a primary key (the id column in my example) that tells you whether a record has already been copied, is to create a table c (with the same structure as a and b) and do the following, where a is the source table and b is the destination:
insert into c
select a.* from a
left join b on (a.id=b.id)
where b.id is null;
insert into b select * from c;
truncate table c;
You need to adjust this query to use your actual primary key.
Hope this helps!
If the tables have a primary or unique key, then you could leverage that in an anti-join:
insert into table_a
select *
from table_b b
where not exists (
select null
from table_a a
where
a.pk_field_1 = b.pk_field_1 and
a.pk_field_2 = b.pk_field_2
)
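The anti-join insert can be exercised end to end with a small sketch. It uses Python's sqlite3 module rather than Oracle, and assumes a single key column named id plus a val column, both made up for the example. The helper copy_new_rows is hypothetical. Note that it only picks up rows with new keys, not rows whose non-key columns changed.

```python
import sqlite3

# Insert only those rows of table_b whose key is not yet present in table_a.
ANTI_JOIN_INSERT = """
INSERT INTO table_a
SELECT *
FROM table_b AS b
WHERE NOT EXISTS (
    SELECT NULL FROM table_a AS a WHERE a.id = b.id
)
"""

def copy_new_rows(conn):
    """Run the anti-join insert and return how many rows were added to table_a."""
    before = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
    conn.execute(ANTI_JOIN_INSERT)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
    return after - before

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (id INTEGER PRIMARY KEY, val TEXT);
CREATE TABLE table_b (id INTEGER PRIMARY KEY, val TEXT);
INSERT INTO table_a VALUES (1, 'x');
INSERT INTO table_b VALUES (1, 'x'), (2, 'y'), (3, 'z');
""")

first = copy_new_rows(conn)   # copies ids 2 and 3
second = copy_new_rows(conn)  # nothing new, so nothing is copied
print(first, second)
```

Running the insert a second time is harmless, which is the property the daily job needs.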
You don't say what your key is. Assuming you have a key column ID and only want the IDs that are not already in Table A, you can also use a MERGE statement for this:
MERGE INTO A USING B ON (A.ID = B.ID)
WHEN NOT MATCHED THEN INSERT (... columns of A) VALUES (... columns of B)

Issues with SQL Select utilizing Except and UNION All

Select *
From (
Select a
Except
Select b
) x
UNION ALL
Select *
From (
Select b
Except
Select a
) y
This sql statement returns an extremely wrong amount of data. If Select a returns a million, how does this entire statement return 100,000? In this instance, Select b contains mutually exclusive data, so there should be no elimination due to the except.
As already stated in the comment, EXCEPT does an implicit DISTINCT, and the ALL in your UNION ALL cannot re-create the duplicates. Hence you cannot use your approach if you want to keep duplicates.
As you want to get the data that is contained in exactly one of the tables a and b, but not in both, a more efficient way to achieve that would be the following (I am just assuming the tables have columns id and c where id is the primary key, as you did not state any column names):
SELECT CASE WHEN a.id IS NULL THEN 'from b' ELSE 'from a' END as source_table
,coalesce(a.id, b.id) as id
,coalesce(a.c, b.c) as c
FROM a
FULL OUTER JOIN b ON a.id = b.id AND a.c = b.c -- use all columns of both tables here!
WHERE a.id IS NULL OR b.id IS NULL
This makes use of a FULL OUTER JOIN, excluding the matching records via the WHERE conditions, as the primary key cannot be null except if it comes from the OUTER side.
If your tables do not have primary keys - which is bad practice anyway - you would have to check across all columns for NULL, not just the one primary key column.
And if you have records completely consisting of NULLs, this method would not work.
Then you could use an approach similar to your original one, just using
SELECT ...
FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE <join by all columns>)
UNION ALL
SELECT ...
FROM b
WHERE NOT EXISTS (SELECT 1 FROM a WHERE <join by all columns>)
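The difference in row counts is easy to reproduce. The following sketch uses Python's sqlite3 module with single-column tables a(v) and b(v), which are assumptions for illustration, and compares the EXCEPT-based statement from the question with the NOT EXISTS variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (v INTEGER);
CREATE TABLE b (v INTEGER);
INSERT INTO a VALUES (1), (1), (2);   -- contains a duplicate
INSERT INTO b VALUES (3), (3);        -- disjoint from a, also duplicated
""")

# EXCEPT removes duplicates on each side before UNION ALL combines them.
with_except = sorted(conn.execute("""
SELECT * FROM (SELECT v FROM a EXCEPT SELECT v FROM b)
UNION ALL
SELECT * FROM (SELECT v FROM b EXCEPT SELECT v FROM a)
""").fetchall())

# NOT EXISTS filters row by row, so duplicates survive.
with_not_exists = sorted(conn.execute("""
SELECT v FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.v = a.v)
UNION ALL
SELECT v FROM b WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.v = b.v)
""").fetchall())

print(len(with_except), len(with_not_exists))
```

Five source rows shrink to three with EXCEPT even though the tables share no values, while the NOT EXISTS form returns all five.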
If you're trying to get any data that is in one table and not in the other regardless of which table, I would try something like the following:
select id, 'table a data not in b' from a where id not in (select id from b)
union
select id, 'table b data not in a' from b where id not in (select id from a)