Determining whether a foreign key value exists in any of multiple tables that have thousands of records - sql

When the user clicks a delete button in my UI to delete a product, I need to do a fast check to see whether it is a foreign key in any of 4 tables (Table1, Table2, Table3, Table4). If it isn't, I can proceed with the delete. If it is referenced in even one of them, I can't delete it.
Some of these tables have thousands of records and I already learned the hard way that using joins is not the best way because the query takes minutes to complete.
I figured union might be the best way but I am wondering if there is a way I can further enhance it. Or even possibly send back which tables it is involved in so I can give the user a descriptive message on why they can't delete the Product.
Here is what I have so far and it is really fast, but returns thousands of 1's when the product exists all over the place. I suppose I can just do a single or default and if not null then don't let them delete.
select 1
from (
select ProductId from Table1
union all
select ProductId from Table2
union all
select ProductId from Table3
union all
select ProductId from Table4
) tbl
where ProductId = 1000

Here is a method with exists and case:
select (case when exists (select 1 from table1 where productId = 1000) then 1
when exists (select 1 from table2 where productId = 1000) then 1
when exists (select 1 from table3 where productId = 1000) then 1
when exists (select 1 from table4 where productId = 1000) then 1
else 0
end)
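The question also asks which tables are involved, so that the user can get a descriptive message. A minimal sketch of one way to get that, assuming a hypothetical SQLite schema in which each of the four tables has an indexed ProductId column (the index names and the `referencing_tables` helper are made up for illustration):

```python
import sqlite3

# Hypothetical schema: four child tables that each carry a ProductId column.
conn = sqlite3.connect(":memory:")
for name in ("Table1", "Table2", "Table3", "Table4"):
    conn.execute(f"CREATE TABLE {name} (ProductId INTEGER)")
    conn.execute(f"CREATE INDEX idx_{name}_product ON {name} (ProductId)")
conn.executemany("INSERT INTO Table2 VALUES (?)", [(1000,), (1000,), (7,)])
conn.execute("INSERT INTO Table4 VALUES (1000)")

# At most one row per referencing table, however many matching rows it holds.
QUERY = """
SELECT 'Table1' AS tbl WHERE EXISTS (SELECT 1 FROM Table1 WHERE ProductId = :id)
UNION ALL
SELECT 'Table2' WHERE EXISTS (SELECT 1 FROM Table2 WHERE ProductId = :id)
UNION ALL
SELECT 'Table3' WHERE EXISTS (SELECT 1 FROM Table3 WHERE ProductId = :id)
UNION ALL
SELECT 'Table4' WHERE EXISTS (SELECT 1 FROM Table4 WHERE ProductId = :id)
"""

def referencing_tables(product_id):
    """Return the names of the tables that still reference product_id."""
    return [row[0] for row in conn.execute(QUERY, {"id": product_id})]

blockers = referencing_tables(1000)
print(blockers)
```

An empty list means the product is safe to delete; otherwise the list can be turned into the message shown to the user.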

One way is to not define any "on delete" clauses when creating the foreign keys. Then just go ahead and try to delete the record. If there is no foreign key in any table referencing that record, the delete will succeed and you go about business as normal. If there is a foreign key in any table referencing that record, the delete will fail. Catch the error and try to give the user a meaningful message: "This customer cannot be deleted as they have at least one order still pending."
This way the system itself will perform the check for you, which will assuredly be more efficient than any check you can perform at the SQL level.
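A rough sketch of that "just try the delete" approach, here using SQLite through Python's sqlite3 module. The Product table, the Id column, and the `try_delete_product` helper are assumptions for illustration; other databases raise a different error type for the same condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Product (ProductId INTEGER PRIMARY KEY);
CREATE TABLE Table1 (
    Id INTEGER PRIMARY KEY,
    ProductId INTEGER NOT NULL REFERENCES Product(ProductId)  -- no ON DELETE clause
);
INSERT INTO Product VALUES (1000), (2000);
INSERT INTO Table1 (ProductId) VALUES (1000);
""")

def try_delete_product(product_id):
    """Attempt the delete; return True on success, False if a foreign key blocks it."""
    try:
        with conn:  # commits on success, rolls back if the statement fails
            conn.execute("DELETE FROM Product WHERE ProductId = ?", (product_id,))
        return True
    except sqlite3.IntegrityError:
        # The database refused the delete because a child row still references it.
        return False

deleted_unused = try_delete_product(2000)  # not referenced anywhere
deleted_used = try_delete_product(1000)    # still referenced by Table1
print(deleted_unused, deleted_used)
```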

Related

Select rows from table where a certain value in a joined table does not exist

I have two tables, playgrounds and maintenance, which are linked with a foreign key. Whenever there is a maintenance on a playground, it will be saved in the table and connected to the respective playground.
Table A (playgrounds):
playground_number
Table B (maintenance):
playground_number (foreign key),
maintenance_type (3 different types),
date
What I now want is to retrieve all the playgrounds on which a certain type of maintenance has NOT been performed yet IN a certain year. For instance all playgrounds that do not have a maintenance_type = 1 in the year 2022 connected yet, although there could be multiple other maintenance_types because they are more frequent.
This is what I have tried (pseudo):
SELECT DISTINCT A.playground_number
FROM table A
JOIN table B ON A.playground_number = B.playground_number (FK)
WHERE NOT EXISTS (SELECT B.maintenance_type FROM table B
WHERE B.maintenance_type = 1 AND year(B.date) = 2022
However, this returns nothing as soon as there is even one entry with maintenance_type 1 anywhere in the table.
I am struggling with this query for a while, so would appreciate some thoughts :) Many thanks.
You need to correlate the exists subquery to the outer B table. Also, you don't even need the join.
SELECT DISTINCT a.playground_number
FROM table_a a
WHERE NOT EXISTS (
SELECT 1
FROM table_b b
WHERE b.playground_number = a.playground_number AND
b.maintenance_type = 1 AND
YEAR(b.date) = 2022
);
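To see the correlated NOT EXISTS at work, here is a small sketch on made-up sample data, run against SQLite. SQLite has no YEAR() function, so `strftime('%Y', ...)` stands in for it; the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (playground_number INTEGER PRIMARY KEY);
CREATE TABLE table_b (
    playground_number INTEGER REFERENCES table_a(playground_number),
    maintenance_type INTEGER,
    date TEXT
);
INSERT INTO table_a VALUES (1), (2), (3);
INSERT INTO table_b VALUES
    (1, 1, '2022-03-01'),  -- type 1 in 2022: playground 1 is excluded
    (2, 2, '2022-05-10'),  -- a different type: does not exclude playground 2
    (2, 1, '2021-07-04');  -- type 1, but in 2021: does not exclude it either
""")

rows = conn.execute("""
SELECT a.playground_number
FROM table_a a
WHERE NOT EXISTS (
    SELECT 1
    FROM table_b b
    WHERE b.playground_number = a.playground_number
      AND b.maintenance_type = 1
      AND strftime('%Y', b.date) = '2022'  -- SQLite stand-in for YEAR(b.date) = 2022
)
ORDER BY a.playground_number
""").fetchall()

missing = [r[0] for r in rows]
print(missing)  # [2, 3]
```

Playground 3 has no maintenance rows at all and is still returned, which is why the join in the original attempt is not needed.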
Please consider this. I don't think you need JOIN.
SELECT DISTINCT A.playground_number
FROM table A
WHERE A.playground_number NOT IN (SELECT B.playground_number FROM table B
WHERE B.maintenance_type = 1 AND year(B.date) = 2022)
Please let me know if I understand it incorrectly.

Optimisation of sql query for deleting duplicate items from large table

Could anyone please help me optimise one of my queries? It takes more than 20 minutes to run against 3 million rows.
Table Structure
-----------------------------------------------------------------------------------------
|id [INT Auto Inc]| name_id (uuid) | name (varchar)| city (varchar) | name_type(varchar)|
-----------------------------------------------------------------------------------------
Query
The purpose of the query is to eliminate duplicates; here a duplicate means a row that has the same name_id and name as another row.
DELETE
FROM records
WHERE id NOT IN
(SELECT DISTINCT
ON (name_id, name) id
FROM records);
I would write your delete using exists logic:
DELETE
FROM records r1
WHERE EXISTS (SELECT 1 FROM records r2
WHERE r2.name_id = r1.name_id AND r2.name = r1.name AND
r2.id < r1.id);
This delete query will spare the duplicate having the smallest id value. To speed this up, you may try adding the following index:
CREATE INDEX idx ON records (name_id, name, id);
You probably already have a primary key on the identity column, then you can use it to exclude redundant rows by id in the following way:
WITH cte AS (
SELECT MIN(id) AS id FROM records GROUP BY name_id, name)
DELETE FROM records
WHERE NOT EXISTS (SELECT id FROM cte WHERE id=records.id)
Even without the index, this should work relatively fast, probably because of merge join strategy.
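A small sketch of the "keep the smallest id" delete, run against SQLite on made-up rows. The sample data is an assumption, and the table name is used in place of the r1 alias to keep the statement portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name_id TEXT, name TEXT, city TEXT, name_type TEXT
);
CREATE INDEX idx ON records (name_id, name, id);
INSERT INTO records (name_id, name, city, name_type) VALUES
    ('u1', 'Ann', 'Oslo', 'x'),   -- id 1: kept
    ('u1', 'Ann', 'Rome', 'y'),   -- id 2: duplicate of id 1
    ('u2', 'Bob', 'Oslo', 'x'),   -- id 3: kept
    ('u1', 'Ann', 'Lima', 'z'),   -- id 4: duplicate of id 1
    ('u1', 'Amy', 'Oslo', 'x');   -- id 5: same name_id, different name, kept
""")

with conn:
    # Delete every row for which an older row with the same (name_id, name) exists.
    cur = conn.execute("""
    DELETE FROM records
    WHERE EXISTS (
        SELECT 1 FROM records r2
        WHERE r2.name_id = records.name_id
          AND r2.name = records.name
          AND r2.id < records.id
    )
    """)

deleted = cur.rowcount
remaining = [r[0] for r in conn.execute("SELECT id FROM records ORDER BY id")]
print(deleted, remaining)  # 2 [1, 3, 5]
```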

Automatically remove a row without foreign references

I am using sqlite3.
I have one "currencies" table, and two tables that reference the currencies table using a foreign key, as follows:
CREATE TABLE currencies (
currency TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE table1 (
currency TEXT NOT NULL PRIMARY KEY,
FOREIGN KEY(currency)
REFERENCES currencies(currency)
);
CREATE TABLE table2 (
currency TEXT NOT NULL PRIMARY KEY,
FOREIGN KEY(currency)
REFERENCES currencies(currency)
);
I would like to make sure that rows in the "currencies" table that are not referenced by any row from "table1" and "table2" will be removed automatically. This should behave like some kind of ref-counted object. When the reference count reaches zero, the relevant row from the "currencies" table should be erased.
What is the "SQL way" to solve this problem?
I am willing to redesign my tables if it could lead to an elegant solution.
I prefer to avoid solutions that require extra work from the application side, or solutions that require periodic cleanup.
Create an AFTER DELETE TRIGGER in each of table1 and table2:
CREATE TRIGGER remove_currencies_1 AFTER DELETE ON table1
BEGIN
DELETE FROM currencies
WHERE currency = OLD.currency
AND NOT EXISTS (SELECT 1 FROM table2 WHERE currency = OLD.currency);
END;
CREATE TRIGGER remove_currencies_2 AFTER DELETE ON table2
BEGIN
DELETE FROM currencies
WHERE currency = OLD.currency
AND NOT EXISTS (SELECT 1 FROM table1 WHERE currency = OLD.currency);
END;
Every time that you delete a row in either table1 or table2, the trigger involved will check the other table if it contains the deleted currency and if it does not contain it, it will be deleted from currencies.
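The two triggers can be exercised with Python's sqlite3 module roughly as follows. The sample currencies and the `currencies` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE currencies (currency TEXT NOT NULL PRIMARY KEY);
CREATE TABLE table1 (
    currency TEXT NOT NULL PRIMARY KEY,
    FOREIGN KEY(currency) REFERENCES currencies(currency)
);
CREATE TABLE table2 (
    currency TEXT NOT NULL PRIMARY KEY,
    FOREIGN KEY(currency) REFERENCES currencies(currency)
);
CREATE TRIGGER remove_currencies_1 AFTER DELETE ON table1
BEGIN
    DELETE FROM currencies
    WHERE currency = OLD.currency
      AND NOT EXISTS (SELECT 1 FROM table2 WHERE currency = OLD.currency);
END;
CREATE TRIGGER remove_currencies_2 AFTER DELETE ON table2
BEGIN
    DELETE FROM currencies
    WHERE currency = OLD.currency
      AND NOT EXISTS (SELECT 1 FROM table1 WHERE currency = OLD.currency);
END;
INSERT INTO currencies VALUES ('USD'), ('EUR');
INSERT INTO table1 VALUES ('USD'), ('EUR');
INSERT INTO table2 VALUES ('USD');
""")

def currencies():
    """Return the currencies that are still present, sorted."""
    return [r[0] for r in conn.execute("SELECT currency FROM currencies ORDER BY currency")]

conn.execute("DELETE FROM table1 WHERE currency = 'USD'")
after_first = currencies()   # USD is still referenced by table2, so it stays
conn.execute("DELETE FROM table2 WHERE currency = 'USD'")
after_second = currencies()  # last reference gone, so the trigger removes USD
print(after_first, after_second)  # ['EUR', 'USD'] ['EUR']
```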
There is no automatic way of doing this. The reverse can be handled using cascading delete foreign key references: when a currency is deleted, all related rows are deleted too.
You could schedule a job daily running something like:
delete from currencies c
where not exists (select 1 from table1 t1 where t1.currency = c.currency) and
not exists (select 1 from table2 t2 where t2.currency = c.currency);
If you need an automatic way of doing that, most DBMSs provide a trigger mechanism. You can create a trigger on update and delete operations that runs the following query.
You can use a left join for that:
https://www.w3schools.com/sql/sql_join_left.asp
It returns a row for every row of the left table, even if there is no corresponding row in the right table, filling the right table's columns with null. You can then test a right-table column that is declared NOT NULL with IS NULL. This filters for the rows that have no counterpart in the right table.
For example:
SELECT currencies.currency FROM currencies LEFT JOIN table1 ON table1.currency = currencies.currency WHERE table1.currency IS NULL
will show the relevant rows for table1.
You can do the same with table two.
This will give you two queries that show which rows have no counterpart.
You can then use intersect on the results, so that you have the rows that have no counterpart in either table:
SELECT * FROM query1 INTERSECT SELECT * FROM query2
Now you have the list of currencies to be deleted.
You can finish this by using a subqueried delete:
DELETE FROM currencies WHERE currency IN (SELECT ...)
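Put together, the left join, intersect, and subqueried delete could look like the sketch below, run against SQLite. The sample currencies and the aliases c, t1, and t2 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE currencies (currency TEXT NOT NULL PRIMARY KEY);
CREATE TABLE table1 (currency TEXT NOT NULL PRIMARY KEY REFERENCES currencies(currency));
CREATE TABLE table2 (currency TEXT NOT NULL PRIMARY KEY REFERENCES currencies(currency));
INSERT INTO currencies VALUES ('USD'), ('EUR'), ('JPY'), ('GBP');
INSERT INTO table1 VALUES ('USD'), ('EUR');
INSERT INTO table2 VALUES ('USD'), ('JPY');
""")

# Currencies with no counterpart in table1, intersected with those
# that have no counterpart in table2.
ORPHANS = """
SELECT c.currency FROM currencies c
LEFT JOIN table1 t1 ON t1.currency = c.currency
WHERE t1.currency IS NULL
INTERSECT
SELECT c.currency FROM currencies c
LEFT JOIN table2 t2 ON t2.currency = c.currency
WHERE t2.currency IS NULL
"""

orphans = [r[0] for r in conn.execute(ORPHANS)]
with conn:
    # The same query, used as the subquery of the delete.
    conn.execute(f"DELETE FROM currencies WHERE currency IN ({ORPHANS})")
remaining = [r[0] for r in conn.execute("SELECT currency FROM currencies ORDER BY currency")]
print(orphans, remaining)  # ['GBP'] ['EUR', 'JPY', 'USD']
```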

Append Query Doesn't Append Missing Items

I have 2 tables. Table 1 has data from the bank account. Table 2 aggregates data from multiple other tables; to keep things simple, we will just have 2 tables. I need to append the data from table 1 into table 2.
I have a field in table2, "SrceFk". The concept is that when a record from Table1 appends, it will fill the table2.SrceFk with the table1 primary key and the table name. So record 302 will look like "BANK/302" after it appends. This way, when I run the append query, I can avoid duplicates.
The query is not working. I deleted the record from table2, but when I run the query, it just says "0 records appended". Even though the foreign key is not present.
I am new to SQL, Access, and programming in general. I understand basic concepts. I have googled this issue and looked on stackOverflow, but no luck.
This is my full statement:
INSERT INTO Main ( SrceFK, InvoDate, Descrip, AMT, Ac1, Ac2 )
SELECT Bank.ID &"/"& "BANK", Bank.TransDate, Bank.Descrip, Bank.TtlAmt, Bank.Ac1, Bank.Ac2
FROM Bank
WHERE NOT EXISTS
(
SELECT * FROM Main
WHERE Main.SrceFK = Bank.ID &"/"& "BANK"
);
I expect the query to add records that aren't present in the table, as needed.

Delete duplicates with no primary key

I want to delete rows that have a duplicated value in one column (Product), which will then be used as the primary key.
The column is of type nvarchar and we don't want to have 2 rows for one product.
The database is large, with thousands of rows that we need to remove.
When querying for the duplicates, we want to keep the first item and remove the second one as the duplicate.
There is no primary key yet; we want to create one after removing the duplicates.
Then the Product column could be our primary key.
The database is SQL Server CE.
I tried several methods, and mostly getting error similar to :
There was an error parsing the query. [ Token line number = 2,Token line offset = 1,Token in error = FROM ]
A method which I tried :
DELETE FROM TblProducts
FROM TblProducts w
INNER JOIN (
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
)Dup ON w.Product = Dup.Product
The approach I would prefer, and am trying to adapt my code to, is something like this
(it's not correct yet):
SELECT Product, COUNT(*) TotalCount
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC
--
;WITH cte -- These 3 lines are the lines I have more doubt on them
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Product
ORDER BY ( SELECT 0)) RN
FROM Word)
DELETE FROM cte
WHERE RN > 1
If you have two DIFFERENT records with the same Product column, then you can SELECT the unwanted records with some criterion, e.g.
CREATE TABLE victims AS
SELECT MAX(entryDate) AS date, Product, COUNT(*) AS dups FROM ProductsTable WHERE ...
GROUP BY Product HAVING dups > 1;
Then you can do a DELETE JOIN between ProductsTable and victims.
Or also you can select Product only, and then do a DELETE for some other JOIN condition, for example having an invalid CustomerId, or EntryDate NULL, or anything else. This works if you know that there is one and only one valid copy of Product, and all the others are recognizable by the invalid data.
Suppose you instead have IDENTICAL records (or you have both identical and non-identical, or you may have several dupes for some product and you don't know which). You run exactly the same query. Then, you run a SELECT query on ProductsTable and SELECT DISTINCT all products matching the product codes to be deduped, grouping by Product, and choosing a suitable aggregate function for all fields (if identical, any aggregate should do. Otherwise I usually try for MAX or MIN). This will "save" exactly one row for each product.
At that point you run the DELETE JOIN and kill all the duplicated products. Then, simply reimport the saved and deduped subset into the main table.
Of course, between the DELETE JOIN and the INSERT SELECT, the DB will be in an unstable state: every product that had at least one duplicate will simply have disappeared.
Another way which should work in MySQL:
-- Create an empty table
CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE false;
CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);
-- DROP duplicate rows, Joe the Butcher's way
INSERT IGNORE INTO deduped SELECT * FROM ProductsTable;
ALTER TABLE ProductsTable RENAME TO ProductsBackup;
ALTER TABLE deduped RENAME TO ProductsTable;
-- TODO: Copy all indexes from ProductsTable on deduped.
NOTE: the way above DOES NOT WORK if you want to distinguish "good records" and "invalid duplicates". It only works if you have redundant DUPLICATE records, or if you do not care which row you keep and which you throw away!
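The same unique-index trick can be tried out in SQLite, where `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`. This is only a sketch on made-up rows, not the MySQL script itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductsTable (Product TEXT, Description TEXT);
INSERT INTO ProductsTable VALUES
    ('CBPD10', 'C-Beam Prj'),
    ('CBPD10', 'C-Beam Prj'),
    ('CBPD11', 'C Proj Mk2');

-- Empty copy of the table structure, then a unique index on Product.
CREATE TABLE deduped AS SELECT * FROM ProductsTable WHERE 0;
CREATE UNIQUE INDEX deduped_ndx ON deduped(Product);

-- Rows that collide with the unique index are skipped silently.
INSERT OR IGNORE INTO deduped SELECT * FROM ProductsTable;

ALTER TABLE ProductsTable RENAME TO ProductsBackup;
ALTER TABLE deduped RENAME TO ProductsTable;
""")

rows = conn.execute(
    "SELECT Product, Description FROM ProductsTable ORDER BY Product"
).fetchall()
backup_count = conn.execute("SELECT COUNT(*) FROM ProductsBackup").fetchone()[0]
print(rows, backup_count)
```

The original rows survive in ProductsBackup, so the result can be checked before the backup is dropped.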
EDIT:
You say that "duplicates" have invalid fields. In that case you can modify the above with a sorting trick:
SELECT * FROM ProductsTable ORDER BY Product, FieldWhichShouldNotBeNULL IS NULL;
Then if you have only one row for a product, all well and good, it will get selected. If you have more, the one for which (FieldWhichShouldNotBeNULL IS NULL) is FALSE (i.e. the one where FieldWhichShouldNotBeNULL is actually not null, as it should be) will be selected first, and inserted. All others will bounce, silently due to the IGNORE clause, against the uniqueness of Product. Not a really pretty way to do it (and check I didn't mix true with false in my clause!), but it ought to work.
EDIT
actually more of a new answer
This is a simple table to illustrate the problem
CREATE TABLE ProductTable ( Product varchar(10), Description varchar(10) );
INSERT INTO ProductTable VALUES ( 'CBPD10', 'C-Beam Prj' );
INSERT INTO ProductTable VALUES ( 'CBPD11', 'C Proj Mk2' );
INSERT INTO ProductTable VALUES ( 'CBPD12', 'C Proj Mk3' );
There is no index yet, and no primary key. We could still declare Product to be primary key.
But something bad happens. Two new records get in, and both have NULL description.
Yet, the second one is a valid product since we knew nothing of CBPD14 before now, and therefore we do NOT want to lose this record completely. We do want to get rid of the spurious CBPD10 though.
INSERT INTO ProductTable VALUES ( 'CBPD10', NULL );
INSERT INTO ProductTable VALUES ( 'CBPD14', NULL );
A rude DELETE FROM ProductTable WHERE Description IS NULL is out of the question, it would kill CBPD14 which isn't a duplicate.
So we do it like this. First get the list of duplicates:
SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1;
We assume that: "There is at least one good record for every set of bad records".
We check this assumption by positing the opposite and querying for it. If all is copacetic we expect this query to return nothing.
SELECT Dups.Product FROM ProductTable
RIGHT JOIN ( SELECT Product, COUNT(*) AS Dups FROM ProductTable GROUP BY Product HAVING Dups > 1 ) AS Dups
ON (ProductTable.Product = Dups.Product
AND ProductTable.Description IS NOT NULL)
WHERE ProductTable.Description IS NULL;
To further verify, I insert two records that represent this mode of failure; now I do expect the query above to return the new code.
INSERT INTO ProductTable VALUES ( "AC5", NULL ), ( "AC5", NULL );
Now the "check" query indeed returns,
AC5
So, the generation of Dups looks good.
I proceed now to delete all duplicate records that are not valid. If there are duplicate, valid records, they will stay duplicate unless some condition may be found, distinguishing among them one "good" record and declaring all others "invalid" (maybe repeating the procedure with a different field than Description).
But ay, there's a rub. Currently, you cannot delete from a table and select from the same table in a subquery ( http://dev.mysql.com/doc/refman/5.0/en/delete.html ). So a little workaround is needed:
CREATE TEMPORARY TABLE Dups AS
SELECT Product, COUNT(*) AS Duplicates
FROM ProductTable GROUP BY Product HAVING Duplicates > 1;
DELETE ProductTable FROM ProductTable JOIN Dups USING (Product)
WHERE Description IS NULL;
Now this will delete all invalid records, provided that they appear in the Dups table.
Therefore our CBPD14 record will be left untouched, because it does not appear there. The "good" record for CBPD10 will be left untouched because it's not true that its Description is NULL. All the others - poof.
Let me state again that if a record has no valid records and yet it is a duplicate, then all copies of that record will be killed - there will be no survivors.
To avoid this, you can first SELECT (using the query above, the check "which should return nothing") the rows representing this mode of failure into another TEMPORARY TABLE, then INSERT them back into the main table after the deletion (using transactions might be in order).
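The whole workflow, including the step that saves products with no valid copy and restores them afterwards, might look like this in SQLite. The NoGoodCopy table name is made up, and a NOT EXISTS check is swapped in for the RIGHT JOIN check query because older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductTable (Product TEXT, Description TEXT);
INSERT INTO ProductTable VALUES
    ('CBPD10', 'C-Beam Prj'), ('CBPD11', 'C Proj Mk2'), ('CBPD12', 'C Proj Mk3'),
    ('CBPD10', NULL), ('CBPD14', NULL),
    ('AC5', NULL), ('AC5', NULL);

CREATE TEMPORARY TABLE Dups AS
SELECT Product, COUNT(*) AS Duplicates
FROM ProductTable GROUP BY Product HAVING COUNT(*) > 1;

-- Duplicated products whose copies are all invalid (hypothetical helper table).
CREATE TEMPORARY TABLE NoGoodCopy AS
SELECT d.Product FROM Dups d
WHERE NOT EXISTS (SELECT 1 FROM ProductTable p
                  WHERE p.Product = d.Product AND p.Description IS NOT NULL);
""")

with conn:  # delete and restore inside one transaction
    conn.execute("""
        DELETE FROM ProductTable
        WHERE Description IS NULL
          AND Product IN (SELECT Product FROM Dups)
    """)
    # Put back a single row for each product that lost every copy.
    conn.execute("INSERT INTO ProductTable (Product) SELECT Product FROM NoGoodCopy")

rows = conn.execute(
    "SELECT Product, Description FROM ProductTable ORDER BY Product"
).fetchall()
for row in rows:
    print(row)
```

CBPD14 is untouched because it is not in Dups, the valid CBPD10 row survives, and AC5 comes back as a single row.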
Create a new table by scripting the old one out and renaming it. Also script all objects (indexes etc.) from the old table to the new one. Insert the keepers into the new table. If your database is in the bulk-logged or simple recovery model, this operation will be minimally logged. Drop the old table and then rename the new one to the old name.
The advantage of this over a delete will be that the insert can be minimally logged. Deletes do double work because not only does the data get deleted, but the delete has to be written to the transaction log. For big tables, minimally logged inserts will be much faster than deletes.
If it's not that big and you have some downtime, and you have SQL Server Management Studio, you can put an identity field on the table using the GUI. Now you have a situation like your CTE, except the rows themselves are truly distinct. So now you can do the following
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 etc
This gives you all the 'good' duplicates. Now you can wrap this query with a DELETE FROM query.
DELETE FROM lhs
FROM table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
and lhs.MyTempIDField not in (
SELECT MIN(lhs.MyTempIDField)
FROM
table_a lhs
join table_a rhs
on lhs.field1 = rhs.field1
and lhs.field2 = rhs.field2 [etc]
WHERE
lhs.MyTempIDField <> rhs.MyTempIDField
GROUP BY
lhs.field1, lhs.field2 etc
)
Try this:
DELETE FROM TblProducts
WHERE Product IN
(
SELECT Product
FROM TblProducts
GROUP BY Product
HAVING COUNT(*) > 1)
This suffers from the defect that it deletes ALL the records with a duplicated Product. What you probably want to do is delete all but one of each group of records with a given Product. It might be worthwhile to copy all the duplicates to a separate table first, and then somehow remove duplicates from that table, then apply the above, and then copy remaining products back to the original table.
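One possible reading of that copy, delete, and copy-back idea, sketched in SQLite. The Price column, the Keepers table, and the choice of MIN() to pick the surviving row are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TblProducts (Product TEXT, Price REAL);
INSERT INTO TblProducts VALUES
    ('apple', 1.0), ('apple', 1.0), ('pear', 2.0), ('plum', 3.0), ('plum', 3.5);

-- 1. Copy one row per duplicated Product to a side table.
--    MIN() decides which values survive; pick whatever rule fits your data.
CREATE TEMP TABLE Keepers AS
SELECT Product, MIN(Price) AS Price
FROM TblProducts GROUP BY Product HAVING COUNT(*) > 1;
""")

with conn:  # steps 2 and 3 run in a single transaction
    # 2. Delete ALL rows of every duplicated Product.
    conn.execute("""
        DELETE FROM TblProducts
        WHERE Product IN (SELECT Product
                          FROM TblProducts
                          GROUP BY Product
                          HAVING COUNT(*) > 1)
    """)
    # 3. Copy the saved rows back, one per Product.
    conn.execute("INSERT INTO TblProducts SELECT Product, Price FROM Keepers")

rows = conn.execute("SELECT Product, Price FROM TblProducts ORDER BY Product").fetchall()
print(rows)  # [('apple', 1.0), ('pear', 2.0), ('plum', 3.0)]
```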