I have a column in one of my tables which is supposed to hold the total sum of the rows from a number of other tables. Is there a way I can have a default query that runs on the total sum column, so that every time a row is added to one of the other tables the total sum column is updated?
Thanks
You might want to look at using a view instead of a table for this, something like the following might help.
Select table.*, sum(otherTable.column)
from table
inner join otherTable on table.something = otherTable.something
group by table.something
Sounds like you want to add a trigger.
http://dev.mysql.com/doc/refman/5.0/en/triggers.html
You want to update the total sum column every time one of the columns in the other tables is changed? Then a trigger may serve your purposes.
-- SQL Server syntax; SumTrigger is a placeholder name
Create Trigger SumTrigger
On OtherTable
For Insert, Update, Delete
As
Update s Set
SumColumn =
(Select Sum(Column)
From OtherTable
Where something = s.Something)
From SumTable s
Where s.Something In
(Select Distinct something From inserted
Union
Select Distinct Something From deleted)
Or you can separate the code for a delete from the code for an insert or update, either by writing separate triggers or like this:
-- SumTrigger is a placeholder name
Create Trigger SumTrigger
On OtherTable
For Insert, Update, Delete
As
If Exists(Select * From inserted) And Update(Column)
Update s Set
SumColumn =
(Select Sum(Column)
From OtherTable
Where something = s.Something)
From SumTable s
Where s.Something In
(Select Distinct Something
From inserted)
Else If Exists(Select * From deleted)
Update s Set
SumColumn =
(Select Sum(Column)
From OtherTable
Where something = s.Something)
From SumTable s
Where s.Something In
(Select Distinct Something
From deleted)
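For anyone who wants to try the trigger idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite's trigger syntax differs from the T-SQL above (one trigger per event, NEW/OLD row references instead of the inserted/deleted tables), and the table and column names (sum_table, other_table, amount) are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sum_table (something INTEGER PRIMARY KEY,
                        sum_column INTEGER NOT NULL DEFAULT 0);
CREATE TABLE other_table (id INTEGER PRIMARY KEY,
                          something INTEGER, amount INTEGER);

-- Recompute the total of the affected group after every change.
CREATE TRIGGER other_ai AFTER INSERT ON other_table BEGIN
    UPDATE sum_table
    SET sum_column = (SELECT COALESCE(SUM(amount), 0) FROM other_table
                      WHERE other_table.something = sum_table.something)
    WHERE something = NEW.something;
END;
CREATE TRIGGER other_au AFTER UPDATE ON other_table BEGIN
    UPDATE sum_table
    SET sum_column = (SELECT COALESCE(SUM(amount), 0) FROM other_table
                      WHERE other_table.something = sum_table.something)
    WHERE something IN (OLD.something, NEW.something);
END;
CREATE TRIGGER other_ad AFTER DELETE ON other_table BEGIN
    UPDATE sum_table
    SET sum_column = (SELECT COALESCE(SUM(amount), 0) FROM other_table
                      WHERE other_table.something = sum_table.something)
    WHERE something = OLD.something;
END;
""")

conn.executemany("INSERT INTO sum_table (something) VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO other_table (something, amount) VALUES (?, ?)",
                 [(1, 10), (1, 5), (2, 7)])
conn.execute("UPDATE other_table SET amount = 20 WHERE amount = 10")
conn.execute("DELETE FROM other_table WHERE amount = 7")

totals = dict(conn.execute("SELECT something, sum_column FROM sum_table"))
print(totals)
```

The COALESCE keeps the total at 0 rather than NULL once the last row of a group is deleted.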
As Charles said, a trigger works well in this situation. If the sum of rows from other tables changes frequently however, I'm not sure if a trigger would cause performance issues. There are two other approaches:
Views - A view is essentially a saved query, and you query it just like a table. If the sum data is only needed for reporting, you may be better off removing the sum column from your main table and using the view for reporting.
Stored Procedure - If you prefer to keep the column in the main table, you could run a stored procedure on a regular basis that keeps the sum information up-to-date for all rows.
I would compare performance between the view idea and the trigger idea before deciding which to use. Do this against the full data set you expect the view to have, not just a small test set of data. Make sure to index the view if it is possible.
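A rough sketch of the view approach, again using Python's sqlite3 module so it can be run anywhere. The names (main_table, other_table, main_with_total, amount) are assumptions for the example. SQLite cannot index a view, so the sketch indexes the join column of the underlying table instead; indexed or materialized views are a feature of other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_table (something INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE other_table (id INTEGER PRIMARY KEY,
                          something INTEGER, amount INTEGER);
-- Index the join column so the view's aggregate stays cheap.
CREATE INDEX other_something_idx ON other_table (something);

-- The total is computed at query time, so it can never go stale.
CREATE VIEW main_with_total AS
SELECT m.something, m.name, COALESCE(SUM(o.amount), 0) AS total
FROM main_table m
LEFT JOIN other_table o ON o.something = m.something
GROUP BY m.something, m.name;
""")

conn.executemany("INSERT INTO main_table VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO other_table (something, amount) VALUES (?, ?)",
                 [(1, 10), (1, 5)])

rows = conn.execute(
    "SELECT something, name, total FROM main_with_total ORDER BY something"
).fetchall()
print(rows)
```

The LEFT JOIN keeps rows of the main table that have no matching rows yet, which an inner join would drop.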
I always had a query in the form of:
UPDATE
users
SET
col = 1
WHERE
user_id IN (
SELECT
user_id
FROM
users
WHERE
...
LIMIT 1
FOR UPDATE
);
And I was pretty sure that it generates a lock on the affected row until the update is done.
Now I have written the same query using a CTE:
WITH query AS (
select
user_id
FROM
users
WHERE
...
LIMIT 1
FOR UPDATE
)
UPDATE
users
SET
col = 1
WHERE
user_id IN (
SELECT
user_id
FROM
query
);
I’m actually having some doubts that it is applying a row lock because of the results I get, but I couldn’t find anything documented about this.
Can someone make it clear? Thanks
Edit:
I managed to find this:
If specific tables are named in FOR UPDATE or FOR SHARE, then only rows coming from those tables are locked; any other tables used in the SELECT are simply read as usual. A FOR UPDATE or FOR SHARE clause without a table list affects all tables used in the statement. If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects all tables used in the view or sub-query. However, FOR UPDATE/FOR SHARE do not apply to WITH queries referenced by the primary query. If you want row locking to occur within a WITH query, specify FOR UPDATE or FOR SHARE within the WITH query.
https://www.postgresql.org/docs/9.0/sql-select.html#SQL-FOR-UPDATE-SHARE
So I guess it only works if the FOR UPDATE is inside the WITH query, and not in the query that uses the WITH?
Consider two very large tables, Table A with 20 million rows in, and Table B which has a large overlap with TableA with 10 million rows. Both have an identifier column and a bunch of other data. I need to move all items from Table B into Table A updating where they already exist.
Both table structures
- Identifier int
- Date DateTime,
- Identifier A
- Identifier B
- General decimal data.. (maybe 10 columns)
I can get the items in Table B that are new, and get the items in Table B that need to be updated in Table A very quickly, but I can't get an update or a delete insert to work quickly. What options are available to merge the contents of TableB into TableA (i.e. updating existing records instead of inserting) in the shortest time?
I've tried pulling out existing records in TableB and running a large update on table A to update just those rows (i.e. an update statement per row), and performance is pretty bad, even with a good index on it.
I've also tried doing a one shot delete of the different values out of TableA that exist in TableB and performance of the delete is also poor, even with the indexes dropped.
I appreciate that this may be difficult to perform quickly, but I'm looking for other options that are available to achieve this.
Since you are dealing with two large tables, in-place updates, inserts, or a merge can be time-consuming operations. I would recommend using a bulk-logged operation to load the desired content into a new table and then performing a table swap:
Example using SELECT INTO:
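SELECT *
INTO NewTableA
FROM (
-- every row from TableB wins, whether it is new or an update
SELECT * FROM dbo.TableB b
UNION ALL
-- plus the rows of TableA that TableB does not replace
SELECT * FROM dbo.TableA a WHERE NOT EXISTS (SELECT * FROM dbo.TableB b WHERE b.id = a.id)
) d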
exec sp_rename 'TableA', 'BackupTableA'
exec sp_rename 'NewTableA', 'TableA'
The Simple or at least the Bulk-Logged recovery model is highly recommended for this approach. Also, I assume it has to be done outside business hours, since plenty of objects will be missing from the new table and have to be recreated: indexes, default constraints, the primary key, etc.
A MERGE is probably your best bet if you want to do both inserts and updates.
-- TableA is the target being updated; TableB supplies the new data
MERGE TableA AS Tgt
USING (SELECT * FROM TableB) AS Src
ON (Tgt.Identifier = Src.Identifier)
WHEN MATCHED THEN
UPDATE SET Date = Src.Date, ...
WHEN NOT MATCHED THEN
INSERT (Identifier, Date, ...)
VALUES (Src.Identifier, Src.Date, ...);
Note that the merge statement must be terminated with a ;
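If MERGE is not available, the same effect can be had with an UPDATE of the matching rows followed by an INSERT of the missing ones, inside one transaction. Below is a small sketch of that pattern using Python's sqlite3 module; it is not SQL Server code, and the tables (table_a, table_b) and their three columns are simplified stand-ins for the real ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (identifier INTEGER PRIMARY KEY, date TEXT, value REAL);
CREATE TABLE table_b (identifier INTEGER PRIMARY KEY, date TEXT, value REAL);
""")
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)",
                 [(1, "2020-01-01", 1.0), (2, "2020-01-01", 2.0)])
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?)",
                 [(2, "2021-06-01", 20.0), (3, "2021-06-01", 30.0)])

with conn:  # both statements commit together or not at all
    # Step 1: overwrite rows of table_a that also exist in table_b.
    conn.execute("""
        UPDATE table_a
        SET date  = (SELECT b.date  FROM table_b b
                     WHERE b.identifier = table_a.identifier),
            value = (SELECT b.value FROM table_b b
                     WHERE b.identifier = table_a.identifier)
        WHERE EXISTS (SELECT 1 FROM table_b b
                      WHERE b.identifier = table_a.identifier)
    """)
    # Step 2: add the rows of table_b that table_a does not have yet.
    conn.execute("""
        INSERT INTO table_a (identifier, date, value)
        SELECT b.identifier, b.date, b.value
        FROM table_b b
        WHERE NOT EXISTS (SELECT 1 FROM table_a a
                          WHERE a.identifier = b.identifier)
    """)

merged = conn.execute(
    "SELECT identifier, date, value FROM table_a ORDER BY identifier"
).fetchall()
print(merged)
```

Whether this beats MERGE or the table swap on tens of millions of rows is something to measure on the real data, not something this toy example can show.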
I want to update multiple rows. I have a lot of ids that specify which row to update (around 12k ids).
What would be the best way to achieve this?
I know I could do
UPDATE table SET col='value' WHERE id = 1 OR id = 24 OR id = 27 OR id = ....repeatx10000
But I figure that would give bad performance, right? So is there a better way to specify which ids to update?
Postgresql version is 9.1
In terms of strict update performance not much will change. All rows with given IDs must be found and updated.
One thing that may simplify your call is to use the in keyword. It goes like this:
UPDATE table SET col='value' WHERE id in ( 1,24,27, ... );
I would also suggest listing the IDs in the same order as the index on id, which is probably ascending.
Put your IDs in a table. Then do something like this:
UPDATE table SET col='value' WHERE id in (select id from table_of_ids_to_update)
Or if the source of your ids is some other query, use that query to get the ids you want to update.
UPDATE table SET col='value' WHERE id in (
select distinct id from some_other_table
where some_condition_for_updating is true
... etc. ...
)
For more complex cases of updating based on another table, this question gives a good example.
UPDATE table SET col='value' WHERE id in ( select id from table);
Also put an index on your id field, so that you get better performance.
It's worth noting that if you do reference a table as @dan1111 suggests, don't use in (select ..), and certainly avoid distinct! Instead, use exists -
update table
set col = 'value'
where exists (
select 1 from other_table
where other_table.id = table.id
)
This ensures that the reference table is only scanned as much as it is needed.
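Here is a minimal sketch of the "put the ids in a table, then use exists" idea, written with Python's sqlite3 module so it runs without a PostgreSQL server. The names items and ids_to_update are hypothetical, and the four ids stand in for the roughly 12k from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO items (id, col) VALUES (?, ?)",
                 [(i, "old") for i in range(1, 101)])

ids = [1, 24, 27, 99]  # stand-in for the long list of ids

# Load the ids once into an indexed temporary table.
conn.execute("CREATE TEMP TABLE ids_to_update (id INTEGER PRIMARY KEY)")

with conn:
    conn.executemany("INSERT INTO ids_to_update (id) VALUES (?)",
                     [(i,) for i in ids])
    cur = conn.execute("""
        UPDATE items
        SET col = 'value'
        WHERE EXISTS (SELECT 1 FROM ids_to_update u WHERE u.id = items.id)
    """)
    updated = cur.rowcount

changed = [row[0] for row in conn.execute(
    "SELECT id FROM items WHERE col = 'value' ORDER BY id")]
print(updated, changed)
```

Binding the ids as parameters also avoids building a 12k-term SQL string by hand.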
I have two tables with following columns:
SUMMARY(sum_id, sum_number) and DETAILS(det_id, det_number, sum_id)
I want to delete rows from table DETAILS with det_id in list of IDs, which can be done by:
DELETE FROM details WHERE det_id in (1,2,3...)
BUT
At the same time I need to update table SUMMARY if summary.sum_id=details.sum_id
UPDATE summary SET sum_number-=somefunction(details.det_number)
WHERE summary.sum_id=details.sum_id
Moreover, afterwards it would be great to delete rows from the SUMMARY table if sum_number<=0
How can I do all this in an intelligent way?
What if I know, from the very beginning, both IDs: details.det_id (to delete) AND the summary.sum_id which corresponds to details.det_id?
You did not specify a DBMS so I'm assuming PostgreSQL.
You can do this with a single statement using the new writeable CTE feature:
with deleted as (
delete from details
where det_id in (1,2,3...)
returning details.*
),
new_summary as (
update summary
set sum_number = summary.sum_number - some_function(deleted.det_number)
from deleted
where deleted.sum_id = summary.sum_id
returning summary.sum_id
)
delete from summary
where sum_number <= 0
and sum_id in (select sum_id from new_summary);
The in condition in the outer delete is not strictly necessary, but it ensures that the new_summary CTE is actually referenced in the statement. Additionally it might improve performance a bit, because only the changed summary rows are checked (not all). Be aware that all parts of such a statement run against the same snapshot, so the outer delete sees the sum_number values from before the update; test the result before relying on it.
It is not possible to perform all of these operations in a single statement. You would have to do something like this:
UPDATE summary SET sum_number = summary.sum_number - somefunction(details.det_number)
FROM summary INNER JOIN details ON summary.sum_id = details.sum_id
WHERE details.det_id IN (1,2,3,...);
DELETE FROM details WHERE det_id IN (1,2,3,...);
DELETE FROM summary WHERE sum_number <= 0;
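The three-statement version can be tried out with Python's sqlite3 module as sketched below. Two assumptions are baked in: somefunction is taken to be a plain SUM of the deleted det_number values per summary row, and the sample data is invented. The order matters: the summary has to be adjusted while the detail rows still exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE summary (sum_id INTEGER PRIMARY KEY, sum_number INTEGER);
CREATE TABLE details (det_id INTEGER PRIMARY KEY,
                      det_number INTEGER, sum_id INTEGER);
""")
conn.executemany("INSERT INTO summary VALUES (?, ?)", [(10, 8), (20, 5)])
conn.executemany("INSERT INTO details VALUES (?, ?, ?)",
                 [(1, 3, 10), (2, 5, 10), (3, 2, 20), (4, 3, 20)])

ids = [1, 2, 3]                      # det_id values to delete
marks = ",".join("?" * len(ids))     # one placeholder per id

with conn:  # all three statements succeed or fail together
    # 1. Subtract the doomed detail values (assumed: somefunction = SUM).
    conn.execute(f"""
        UPDATE summary
        SET sum_number = sum_number - (
            SELECT SUM(d.det_number) FROM details d
            WHERE d.sum_id = summary.sum_id AND d.det_id IN ({marks}))
        WHERE sum_id IN (SELECT sum_id FROM details
                         WHERE det_id IN ({marks}))
    """, ids + ids)
    # 2. Remove the detail rows themselves.
    conn.execute(f"DELETE FROM details WHERE det_id IN ({marks})", ids)
    # 3. Drop summaries that are now empty or negative.
    conn.execute("DELETE FROM summary WHERE sum_number <= 0")

summary_rows = conn.execute("SELECT sum_id, sum_number FROM summary").fetchall()
detail_rows = conn.execute(
    "SELECT det_id, det_number, sum_id FROM details").fetchall()
print(summary_rows, detail_rows)
```

Wrapping the statements in one transaction gives the all-or-nothing behaviour that a single statement would have provided.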
I would use a trigger... then the database is responsible for the deletes.
Using an update trigger, once the update succeeds it fires the trigger, which can do as much or as little as you need, i.e. it can do your 2 deletes.
For an example, have a read of this tutorial:
http://www.mysqltutorial.org/create-the-first-trigger-in-mysql.aspx
This answer (http://stackoverflow.com/questions/6296313/mysql-trigger-after-update-only-if-row-has-changed) from Stack Overflow also provides a good example.
Many years ago, I was asked during a phone interview to delete duplicate rows in a database. After giving several solutions that do work, I was eventually told the restrictions are:
Assume table has one VARCHAR column
Cannot use rowid
Cannot use temporary tables
The interviewer refused to give me the answer. I've been stumped ever since.
After asking several colleagues over the years, I'm convinced there is no solution. Am I wrong?!
And if you did have an answer, would a new restriction suddenly present itself? Since you mention ROWID, I assume you were using Oracle. The solutions are for SQL Server.
Inspired by SQLServerCentral.com http://www.sqlservercentral.com/scripts/T-SQL/62866/
while(1=1) begin
delete top (1)
from MyTable
where VarcharColumn in
(select VarcharColumn
from MyTable
group by VarcharColumn
having count(*) > 1)
if @@rowcount = 0
break
end
Deletes one row at a time. When the second to last row of a set of duplicates disappears then the remaining row won't be in the subselect on the next pass through the loop. (BIG Yuck!)
Also, see http://www.sqlservercentral.com/articles/T-SQL/63578/ for inspiration. There RBarry Young suggests a way that might be modified to store the deduplicated data in the same table, delete all the original rows, then convert the stored deduplicated data back into the right format. He had three columns, so not exactly analogous to what you are doing.
And then it might be do-able with a cursor. Not sure and don't have time to look it up. But create a cursor to select everything out of the table, in order, and then a variable to track what the last row looked like. If the current row is the same, delete, else set the variable to the current row.
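This is a completely jacked-up way to do it, but given the asinine requirements, here is a workable solution assuming SQL Server 2005 or later:
-- window functions are not allowed in WHERE, so number the rows in a CTE first
WITH Numbered AS (
SELECT ROW_NUMBER() OVER (PARTITION BY [MyField] ORDER BY MyField) AS rn FROM MyTable)
DELETE FROM Numbered WHERE rn > 1;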
I would put a unique number of fixed size in the VARCHAR column for the duplicated rows, then parse out the number and delete all but the minimum row. Maybe that's what his VARCHAR constraint is for. But that stinks because it assumes that your unique number will fit. Lame question. You didn't want to work there anyway. ;-)
Assume you are implementing the DELETE statement for a SQL engine. How will you delete two rows from a table that are exactly identical? You need something to distinguish one from the other!
You actually cannot delete entirely duplicate rows (ALL columns being equal) under the following constraints (as provided to you):
No use of ROWID or ROWNUM
No Temporary Table
No procedural code
It can, however, be done if even one of the conditions is relaxed. Here are solutions that each relax one of the three conditions.
Assume table is defined as below
Create Table t1 (
col1 varchar2(100),
col2 number(5),
col3 number(2)
);
Duplicate rows identification:
Select col1, col2, col3
from t1
group by col1, col2, col3
having count(*) >1
Duplicate rows can also be identified using this:
select col1, col2, col3, row_number() over (partition by col1, col2, col3 order by col1, col2, col3) rn from t1
NOTE: The row_number() analytic function cannot be used in a DELETE statement as suggested by JohnFx at least in Oracle 10g.
Solution using ROWID
Delete from t1 where rowid > (select min(t1_inner.rowid) from t1 t1_inner where t1_inner.col1 = t1.col1 and t1_inner.col2 = t1.col2 and t1_inner.col3 = t1.col3)
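SQLite happens to expose a rowid as well, so the ROWID technique can be tried with Python's sqlite3 module. This is only an illustration of the idea on a different engine than Oracle, with invented sample rows; it keeps the copy with the lowest rowid in each group of identical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [("a", 1, 1), ("a", 1, 1), ("a", 1, 1),
     ("b", 2, 2),
     ("c", 3, 3), ("c", 3, 3)])

with conn:
    # Delete every row whose rowid is above the smallest rowid
    # among the rows that have identical column values.
    cur = conn.execute("""
        DELETE FROM t1
        WHERE rowid > (SELECT MIN(i.rowid) FROM t1 i
                       WHERE i.col1 = t1.col1
                         AND i.col2 = t1.col2
                         AND i.col3 = t1.col3)
    """)
    removed = cur.rowcount

remaining = conn.execute(
    "SELECT col1, col2, col3 FROM t1 ORDER BY col1").fetchall()
print(removed, remaining)
```

Note that the equality comparisons never match NULLs, so rows containing NULL values would need separate handling.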
Solution using temp table
create table t1_dups as (
-- write the query here to find the duplicate rows, as listed above
);
delete from t1
where (t1.col1, t1.col2, t1.col3) in (select col1, col2, col3 from t1_dups);
insert into t1
(select col1, col2, col3 from t1_dups);
Solution using procedural code
This will use an approach similar to the case where we use a temp table.
create table temp as
select c1,c2
from table
group by c1,c2
having count(*) >= 1;
Now drop the base table.
Rename the temp table to the base table's name.
Mine was resolved using this query:
delete from where in (select from group by having count(*) >1)
in PLSQL