Database consistency check using SQL

I have the following problem:
The database has denormalized tables, with a "CompanyID" field held on almost every table. This was done for business-rule reasons and should stay that way. Sometimes the data becomes inconsistent, e.g. a Customer with CompanyID == 1 has an Order with CompanyID == 2.
My suggestion is to write a specialized stored procedure which would be fired every once in a while and analyse some basic 'relation chains' on this property (meaning a Customer with some CompanyID should only have Orders with the same CompanyID, and those Orders in turn should have Articles with the corresponding CompanyID).
Question:
Is there any generic way in SQL to fetch the tables having a CompanyID field and then check them for consistency? Any other solutions to this problem?
I get the tables with the given column name using this SQL:
select c.column_name, c.is_nullable, c.table_schema, c.table_name, t.table_type, t.table_catalog
from information_schema.columns c
join information_schema.tables t
  on c.table_schema = t.table_schema
 and c.table_name = t.table_name
where c.column_name = 'CompanyID'
  and t.table_type not in ('VIEW')
  and t.table_name not like 'MsMerge%'
order by c.ordinal_position
After that, I have in mind traversing up the relation tree via the foreign keys, checking each record for equality of the CompanyID value.
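Something along these lines could list the foreign-key pairs to traverse (a sketch against SQL Server's sys.foreign_keys / sys.foreign_key_columns catalog views; the filtering would need adapting to your schema):
select fk.name as fk_name,
       object_name(fkc.parent_object_id) as child_table,
       col_name(fkc.parent_object_id, fkc.parent_column_id) as child_column,
       object_name(fkc.referenced_object_id) as parent_table,
       col_name(fkc.referenced_object_id, fkc.referenced_column_id) as parent_column
from sys.foreign_keys fk
join sys.foreign_key_columns fkc
  on fkc.constraint_object_id = fk.object_id
order by child_table, fk_name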

I would not do that via a generic SQL job that runs every few minutes - that is performance death for big databases.
Instead you could use an INSERT/UPDATE trigger on every affected table, coded like this:
CREATE TRIGGER chk_tablename
ON tablename
FOR INSERT, UPDATE
AS
BEGIN
    -- your checks go here
    -- log inconsistent data
END
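For instance, with the customer/orders tables shown in the next answer, a minimal sketch of such a check trigger could look like this (inconsistency_log is a hypothetical logging table you would have to create):
CREATE TRIGGER chk_orders
ON orders
FOR INSERT, UPDATE
AS
BEGIN
    -- log inserted/updated orders whose CompanyID differs from their customer's
    INSERT INTO inconsistency_log (table_name, row_id)
    SELECT 'orders', i.orderId
    FROM inserted i
    JOIN customer c ON c.id = i.customerid
    WHERE c.companyID <> i.companyID;
END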

Example of a query:
Let's say these are our tables:
create table customer (
    id int,
    companyID int
)

create table orders (
    orderId int,
    customerid int,
    companyID int -- the "wrong" (denormalized) column
)
You should run queries like this one:
update orders
set companyID=(select companyID from customer where id=customerid)
to correct the data - but you should also eliminate all usages of the companyID column on the orders table.
If this is happening in loads of places and you want an automated way of running the above query, you can look for the companyID column in sys.columns, get the table name from it, and build a loop to generate the queries.
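A sketch of that loop as dynamic SQL (it assumes every affected table links back to customer via a customerid column, which may not hold for all of them):
DECLARE @sql nvarchar(max);
SET @sql = N'';

SELECT @sql = @sql
    + N'update ' + QUOTENAME(t.name)
    + N' set companyID = (select companyID from customer where id = customerid);'
    + NCHAR(10)
FROM sys.columns c
JOIN sys.tables t ON t.object_id = c.object_id
WHERE c.name = 'companyID'
  AND t.name <> 'customer';

EXEC sp_executesql @sql;
The same loop can just as well generate the consistency-check SELECTs from the edit below instead of the UPDATEs.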
EDIT (based on your answer in the comments):
The logic is pretty much the same.
Loop through sys.columns to get the tables where the column appears, and for each table run:
select *
from orders o
where companyID != (select companyID from customer where id=customerid)

Related

Postgres - How to find id's that are not used in different multiple tables (inactive id's) - badly written query

I have a table towns, which is the main table. This table contains very many rows and became so 'dirty' (someone inserted 5 million rows) that I would like to get rid of the unused towns.
There are 3 referencing tables that use town_id as a reference to towns.
I know there are many towns that are not used in these tables; only if a town_id is not found in any of them do I consider the town inactive, and then I would like to remove it (because it's not used).
As you can see, towns is referenced by these 2 tables:
banks
employees
And for vendors there is a vendor_id column in towns itself, since one vendor can have multiple towns.
So if vendor_id in towns is null and the town_id is not found in either of these 2 tables, it is safe to remove it :)
I created a query which might work, but it takes far too long to execute. It looks something like this:
select count(*)
from towns
where vendor_id is null
and id not in (select town_id from banks)
and id not in (select town_id from employees)
So basically I said: if vendor_id is null, this town is definitely not related to vendors, and if at the same time the town is not in banks or employees, then it is safe to remove it. But the query took too long and never executed successfully, since towns has 5 million rows (which is the reason it is so dirty).
In fact I'm not able to execute the given query at all, since the server terminates abnormally.
Here is full error message:
ERROR: server closed the connection unexpectedly This probably means
the server terminated abnormally before or while processing the
request.
Any kind of help would be awesome
Thanks!
You can join the tables using LEFT JOIN and identify, in the WHERE clause, the town ids for which there is no row in banks or employees:
WITH list AS
( SELECT t.id
  FROM towns AS t
  LEFT JOIN banks AS b ON b.town_id = t.id
  LEFT JOIN employees AS e ON e.town_id = t.id
  WHERE t.vendor_id IS NULL
    AND b.town_id IS NULL
    AND e.town_id IS NULL
  LIMIT 1000
)
DELETE FROM towns AS t
USING list AS l
WHERE t.id = l.id ;
Before launching the DELETE, you can check the indexes on your tables.
Adding an index like the following can be useful (it targets vendor_id, the column tested for NULL):
CREATE INDEX towns_vendor_id_nulls ON towns (vendor_id NULLS FIRST) ;
Last but not least, you can add a LIMIT clause in the CTE to limit the number of rows deleted per execution of the DELETE and avoid the unexpected termination. As a consequence, you will have to relaunch the DELETE several times until there are no more rows to delete.
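A sketch of that relaunch loop in one go (assumes PostgreSQL 11+, where COMMIT is allowed inside a DO block run outside an explicit transaction):
DO $$
DECLARE
    deleted bigint;
BEGIN
    LOOP
        WITH list AS (
            SELECT t.id
            FROM towns AS t
            LEFT JOIN banks AS b ON b.town_id = t.id
            LEFT JOIN employees AS e ON e.town_id = t.id
            WHERE t.vendor_id IS NULL
              AND b.town_id IS NULL
              AND e.town_id IS NULL
            LIMIT 1000
        )
        DELETE FROM towns AS t
        USING list AS l
        WHERE t.id = l.id;

        GET DIAGNOSTICS deleted = ROW_COUNT;
        EXIT WHEN deleted = 0;
        COMMIT;  -- release locks and keep each batch in its own transaction
    END LOOP;
END $$;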
You can try a JOIN on the big tables; it should be faster than the two IN clauses.
You could also try UNION ALL and live with the duplicates, as it is faster than UNION.
Finally, you can use a combined index on id and vendor_id to speed up the query.
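That combined index could be as simple as (the index name is arbitrary):
CREATE INDEX towns_id_vendor_id_idx ON towns (id, vendor_id);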
create table towns (id int, vendor_id int);
create table banks (town_id int);
create table employees (town_id int);

select count(*)
from towns t1
left join (select town_id from banks
           union all
           select town_id from employees) t2 on t2.town_id = t1.id
where t1.vendor_id is null
  and t2.town_id is null;
The trick is to first make a list of all the town_ids you want to keep, and then remove the towns that are not in it.
By looking in 2 tables you're making life harder for the server, so let's create 1 single list first.
-- build empty temp-table
CREATE TEMPORARY TABLE TEMP_must_keep
AS
SELECT id AS town_id
FROM towns
WHERE 1 = 2;
-- get ids from first table
INSERT INTO TEMP_must_keep (town_id)
SELECT DISTINCT town_id
FROM banks;
-- add index to speed up the EXCEPT below
CREATE UNIQUE INDEX idx_uq_must_keep_town_id ON TEMP_must_keep (town_id);
-- add new ones from second table
INSERT INTO TEMP_must_keep (town_id)
SELECT town_id
FROM employees
EXCEPT -- auto-distincts
SELECT town_id
FROM TEMP_must_keep;
-- rebuild index simply to ensure little fragmentation
REINDEX TABLE TEMP_must_keep;
-- optional, but might help: a temporary partial index on towns to speed up the delete
CREATE INDEX idx_towns_id_where_vendor_null ON towns (id) WHERE vendor_id IS NULL;
-- now do the actual delete
-- (you can run a SELECT COUNT(*) rather than the DELETE first if you feel like it;
-- both will probably take some time depending on your hardware)
DELETE
FROM towns AS del
WHERE del.vendor_id IS NULL
  AND NOT EXISTS ( SELECT *
                   FROM TEMP_must_keep mk
                   WHERE mk.town_id = del.id);
-- cleanup
DROP INDEX idx_towns_id_where_vendor_null;
DROP TABLE TEMP_must_keep;
The idx_towns_id_where_vendor_null index is optional and I'm not sure it will actually lower the total time, but IMHO it will help with the DELETE operation, if only because the index gives the Query Optimizer a better view of what volumes to expect.

Copy a dependent table

I have this customers table (I'm using SQL Server):
About 300 customers were registered in this table. I created another table in another database and inserted these customers into the new database.
Here is the new customers table:
But I also have an operations table, and I didn't change that one. The problem is the foreign key: since PhoneNumber is no longer the primary key of the Customers table, customerId should no longer be filled with the phone number. I want to know how I can insert about 1000 operations into the new operations table, using each customer's ID as the foreign key in customerId instead of the phone number.
Here is the operations table:
You can use the following query to update the old data in the operations table (SQL Server needs the UPDATE alias ... FROM form here):
UPDATE OT
SET CustomerID = CT.ID
FROM OperationTable AS OT
JOIN CustomerTable AS CT
    ON CT.PhoneNumber = OT.CustomerID;
Assuming Counter in the old database is the same as CustomerId in the new database:
In the operations table of the old database, write a query like the one below.
select 'insert into OperationsTable values ('
    + cast(CT.Counter as varchar(20)) + ', '
    + cast(OT.TypeID as varchar(20)) + ', '
    + cast(OT.Amount as varchar(20)) + ', '''
    + convert(varchar(30), OT.[DateTime], 121) + ''')' as newquery
from OperationTable OT
left outer join Customers CT on CT.PhoneNumber = OT.CustomerID
This will give you the INSERT queries, which can be copied and run against the new OperationsTable in the new database.
Now, if the above assumption is not correct, then:
You will have to write a query against the new customers table to insert the new Id from the new database into the Counter column (or a new CID column) of the old customers table, and then repeat the above step.
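A sketch of that back-fill, assuming both databases sit on the same server (OldDB/NewDB are placeholders for the real database names):
UPDATE OldCT
SET Counter = NewCT.Id
FROM OldDB.dbo.Customers AS OldCT
JOIN NewDB.dbo.Customers AS NewCT
    ON NewCT.PhoneNumber = OldCT.PhoneNumber;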
If we assume that AFDB is the old database and StoreDB is the new one, the following query helped me out:
INSERT INTO StoreDB.dbo.Operations
SELECT StoreDB.dbo.Customers.Id, TypeID, Amount, [DateTime]
FROM AFDB.dbo.Operations JOIN StoreDB.dbo.Customers
ON StoreDb.dbo.Customers.PhoneNumber = AFDB.dbo.Operations.CustomerId;

Create a field in Firebird which displays data from another table

I didn't find a working solution for creating a "lookup column" in a Firebird database.
Here is an example:
Table1: Orders
[OrderID] [CustomerID] [CustomerName]
Table2: Customers
[ID] [Name]
When I run SELECT * FROM ORDERS I want to get OrderID, CustomerID and CustomerName... but CustomerName should automatically be computed by looking up the CustomerID in the ID column of the Customers table and returning the content of the Name column.
Firebird has calculated fields (generated always as/computed by), and these allow selecting from other tables (contrary to an earlier version of this answer, which stated that Firebird doesn't support this).
However, I suggest you use a view instead, as I think it performs better (I haven't verified this, so test it yourself if performance is important).
Use a view
The common way would be to define a base table and an accompanying view that gathers the necessary data at query time. Instead of using the base table, people would query from the view.
create view order_with_customer
as
select orders.id, orders.customer_id, customer.name
from orders
inner join customer on customer.id = orders.customer_id;
Or you could just skip the view and use above join in your own queries.
Alternative: calculated fields
I label this as an alternative and not the main solution, as I think using a view would be the preferable solution.
To use calculated fields, you can use the following syntax (note the double parentheses around the query):
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name generated always as ((select name from customer where id = customer_id))
)
Updates to the customer table will be automatically reflected in the orders table.
As far as I'm aware, the performance of this option is less than when using a join (as used in the view example), but you might want to test that for yourself.
FB3+ with function
With Firebird 3, you can also create calculated fields using a function; this makes the expression itself shorter.
To do this, create a function that selects from the customer table:
create function lookup_customer_name(customer_id integer)
returns varchar(50)
as
begin
return (select name from customer where id = :customer_id);
end
And then create the table as:
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name generated always as (lookup_customer_name(customer_id))
);
Updates to the customer table will be automatically reflected in the orders table. This solution can be relatively slow when selecting a lot of records, as the function will be executed for each row individually, which is a lot less efficient than performing a join.
Alternative: use a trigger
However if you want to update the table at insert (or update) time with information from another table, you could use a trigger.
I'll be using Firebird 3 for my answer, but it should translate - with some minor differences - to earlier versions as well.
So assuming a table customer:
create table customer (
id integer generated by default as identity primary key,
name varchar(50) not null
);
with sample data:
insert into customer(name) values ('name1');
insert into customer(name) values ('name2');
And a table orders:
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name varchar(50) not null
)
You then define a trigger:
create trigger orders_bi_bu
active before insert or update
on orders
as
begin
new.customer_name = (select name from customer where id = new.customer_id);
end
Now when we use:
insert into orders(customer_id) values (1);
the result is:
id   customer_id   customer_name
1    1             name1
Update:
update orders set customer_id = 2 where id = 1;
Result:
id   customer_id   customer_name
1    2             name2
The downside of a trigger is that updating the name in the customer table will not automatically be reflected in the orders table. You would need to keep track of these dependencies yourself and create an after-update trigger on customer that updates the dependent records, which can lead to update/lock conflicts.
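A sketch of such an after-update trigger on customer, using the tables from this example:
create trigger customer_au
active after update
on customer
as
begin
  -- push the new name to all dependent orders rows
  if (new.name is distinct from old.name) then
    update orders
    set customer_name = new.name
    where customer_id = new.id;
end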
No need for a complex lookup field here.
No need to add a persistent CustomerName field on Table1.
As Gordon said, a simple join is enough:
Select T1.OrderID, T2.ID, T2.Name
From Customers T2
Join Orders T1 On T1.CustomerID = T2.ID
That said, if you want to use lookup fields (as we do in a dataset) with SQL, you can use something like:
Select T1.OrderID, T2.ID,
    (Select T3.YourLookupField From YourLookupTable T3 Where T3.ID = T2.ID)
From Customers T2
Join Orders T1 On T1.CustomerID = T2.ID

Call function that returns table in a view in SQL Server 2000

SQL Server - Compatibility Level 2000
Person table - PersonId, PersonName, etc.. (~1200 records)
Two user functions - GetPersonAddress(@PersonId), GetPaymentAddress(@PersonId)
These two functions return data in a table with Street, City, etc. (one record in the returned table per PersonId)
I have to create a view that joins the person table with these two user functions by passing in the person id.
Limitations:
Cross Apply is not supported on a function in SQL Server 2000
Cursors, temp tables and table variables are not supported in views, so I cannot loop over the person table and call the functions.
Can someone help?
You could create functions GetPeopleAddresses() and GetPaymentsAddresses() which return PersonId as a field and then you can use them in JOIN:
SELECT t.PersonId, PersonName, etc..., a1.Address, a2.Address
FROM YourTable t
LEFT JOIN GetPeopleAddresses() a1 ON a1.PersonId = t.PersonId
LEFT JOIN GetPaymentsAddresses() a2 ON a2.PersonId = t.PersonId
Of course, your functions have to return only unique records
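A sketch of such a set-based function as an inline table-valued function, which SQL Server 2000 supports (PersonAddress is a placeholder for wherever the address data actually lives):
CREATE FUNCTION GetPeopleAddresses()
RETURNS TABLE
AS
RETURN
(
    SELECT p.PersonId, a.Street, a.City
    FROM Person p
    JOIN PersonAddress a ON a.PersonId = p.PersonId
);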
I'm afraid that you can't do that with a view in SQL Server 2000 because of the limitations you listed. The next best option, as suggested in the comments, is a stored procedure that returns the rows the view would have returned.
If you need to use the results of the procedure in another query, you can insert the values returned by the procedure into a temporary table. It's not pretty, and you have to make two DB calls (one to create/populate the temp table, and the other to use it), but it works. For example:
create table #TempResults (
    PersonID int not null,
    Name varchar(100),
    Street varchar(100),
    City varchar(100),
    <all the other fields>,
    constraint PK_TempResults primary key (PersonID)
)
insert into #TempResults
exec spTheProcedureThatReplaceTheView #thePersonID
go -- end of the first DB call
select <fields>
from AnotherTable
join #TempResults on <condition>
-- don't forget to drop table when you don't need its current data anymore
drop table #TempResults

How to get an attribute value from a table to use in a query?

I have a table to store table names (let's call it "CUSTOM_TABLES").
I have a table to register data tracking (call it "CONSUMPTIONS").
I also have tables that the user created at runtime, whose names I don't know in advance; the system creates each table (executes the DDL) and stores its name in "CUSTOM_TABLES". Let's call one of them "USER_TABLE" for now.
When data is produced in a "USER_TABLE", I register in the tracking table ("CONSUMPTIONS") the row ID of the data and the ID of the "USER_TABLE" as found in "CUSTOM_TABLES".
Now I need to find, given a consumption, which table and which row the data is in. Remember: in the "CONSUMPTIONS" table I only have an ID (FK) pointing to "CUSTOM_TABLES".
CREATE TABLE consumptions
(
    id_consumption serial NOT NULL,
    id_row integer,
    id_table integer
);

CREATE TABLE custom_tables
(
    id_table serial NOT NULL,
    description character varying(250),
    name character varying(150)
);
The query I need is here:
select * from consumptions c
join insurance i on c.id_row = i.index_id
In this case, "insurance" is the "USER_TABLE".
But I don't know "insurance" in advance. I need to find it in "CUSTOM_TABLES" by using its ID:
select name from custom_tables where
id_table = consumption.id_table
The final query must be something like:
select * from consumptions c
join
(select name from custom_tables t where t.id_table = c.id_table) i
on c.id_row = i.index_id
I guarantee that the user tables all have an "index_id" attribute as their PK.
I would prefer not to use functions.
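Plain SQL cannot use a table name that is itself stored in a row, so without functions the lookup has to happen in two steps, with the client (or dynamic SQL) building the second query. A sketch, using the schema above (42 is a placeholder consumption id):
-- step 1: find which user table holds the row for the given consumption
select t.name
from consumptions c
join custom_tables t on t.id_table = c.id_table
where c.id_consumption = 42;

-- step 2: plug the name fetched in step 1 (say, 'insurance') into the real query
select c.*, u.*
from consumptions c
join insurance u on u.index_id = c.id_row
where c.id_consumption = 42;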