How to detect database data changes - SQL

In a SQL Server database, is there a way to detect data changes? I tried to do that using CHECKSUM and CHECKSUM_AGG, but only on a single table; what I want is one overall checksum (hash) covering all tables.
The query I used for a single table is
select isnull(CHECKSUM_AGG(CHECKSUM(*)),0) from mytable

It sounds like you need INSERT/UPDATE/DELETE triggers.
I'm not sure that this will be efficient. As the table grows the computation will become more costly. Are you sure you need to do this every time data changes? How often are changes made? You're throwing a lot of work away with every operation. Maybe you can do it on request when needed rather than with every write operation.
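For example, a trigger per table could bump a row in a small bookkeeping table whenever anything changes. A minimal sketch in SQL Server syntax, where the data_versions table and the trigger name are hypothetical:

-- Hypothetical bookkeeping table: one row per tracked table.
CREATE TABLE data_versions (
    table_name   sysname PRIMARY KEY,
    last_changed datetime NOT NULL
);
INSERT INTO data_versions VALUES ('mytable', GETDATE());
GO

-- Fires after any insert, update or delete on mytable and records the change time.
CREATE TRIGGER trg_mytable_changed
ON mytable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE data_versions
    SET last_changed = GETDATE()
    WHERE table_name = 'mytable';
END;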

You can use the undocumented sp_MSforeachtable procedure to loop through all of your tables and aggregate the checksums:

SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) as test FROM your_table_name WITH (NOLOCK);
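A minimal sketch of combining the two, assuming SQL Server and a temporary table to hold the per-table values (note that BINARY_CHECKSUM ignores text, ntext, image and some other column types):

-- Collect one checksum per table, then fold them into a single value.
CREATE TABLE #checksums (table_name sysname, cs int);

EXEC sp_MSforeachtable
    'INSERT INTO #checksums SELECT ''?'', ISNULL(CHECKSUM_AGG(BINARY_CHECKSUM(*)), 0) FROM ? WITH (NOLOCK);';

-- One value for the whole database: if any table changes, this value should change too.
SELECT CHECKSUM_AGG(cs) AS db_checksum FROM #checksums;

DROP TABLE #checksums;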
You can detect changes in SQL Server with this kind of code: think of it as a pool of numbers where a new number is produced for each change, and the same number does not come up twice in a row.

Related

Select part of a table for later use

I'm currently trying to optimize my program. I have a large database consisting of timestamped data. The data I need to update is only the current day's data, so I don't want to search the entire database more than once just to find today's entries. Is there a way to select something and then use it later in several different (MERGE INTO) commands?
I want to select all of today's data, then run a while loop (in Java) over every single entry for today, updating them all. So is this even possible? Or do I have to traverse the entire database for each while-loop iteration?
If you are optimizing your program and your data is timestamped, the first thing you can do is create an index on the timestamp field. This will reduce your query execution time because your filter criterion is that timestamp field.
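A minimal sketch, assuming a hypothetical events table with a created_at timestamp column (adjust the names and the date expression to your schema and database):

-- Index the timestamp so the daily filter does not scan the whole table.
CREATE INDEX idx_events_created_at ON events (created_at);

-- Fetch today's rows once, then keep that result set (or just its keys) in the
-- application for the later MERGE INTO statements.
SELECT id, payload
FROM events
WHERE created_at >= CURRENT_DATE;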
Use a proper data caching technology, like memcached, in order to minimize database hits for read-heavy, slowly changing data.

How to find number of rows inserted/deleted in MySQL

Is there a way to find out the number of rows inserted/deleted in a table in MySQL? Is this kind of statistics kept somewhere in the database? If not, what would be the best way to implement something to keep track of these statistics?
When I say how many, I mean within a certain period (last 24 hours, or since server was up, or last week etc)
When I need to keep track of deleted things, I just don't delete.
I change a column value so that the row is excluded from normal user results.
If space is an issue, you can set the contents you no longer care about to empty.
For inserted rows you can use COUNT().
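A minimal sketch of that approach in MySQL-flavoured SQL; the deleted_at and created_at columns are hypothetical additions:

-- Flag rows instead of deleting them; normal user queries filter on deleted_at IS NULL.
ALTER TABLE my_table ADD COLUMN deleted_at DATETIME NULL;

-- "Delete" a row by stamping it.
UPDATE my_table SET deleted_at = NOW() WHERE id = 42;

-- Rows "deleted" in the last 24 hours.
SELECT COUNT(*) FROM my_table WHERE deleted_at >= NOW() - INTERVAL 1 DAY;

-- Rows inserted in the last 24 hours (assumes created_at is populated on insert).
SELECT COUNT(*) FROM my_table WHERE created_at >= NOW() - INTERVAL 1 DAY;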
The Binary Log contains records of all queries that update or insert data. I don't know if it stores the number of affected rows, however.
There is also a General Query Log, which tracks all queries that were run.
(Information current for MySQL 5.0. If you're using an older version ymmv)
If I want to handle logging my SQL queries, I have 2 possibilities:
Turning the MySQL Log function on
Writing my own 'trace' class
I prefer doing number 2.
Why?
Because it is more controllable. You can easily differentiate between INSERT, DELETE, UPDATE, and other queries.
But that is not the only advantage of your own trace class, because creating trace files (so-called "logs") makes administrative tasks much easier.
You can structure the trace output, put it into a separate database, store it into some XML or JSON file.
You can order things as you want them to be.
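For instance, the trace output could land in a table like this (a sketch; the query_trace table and its columns are hypothetical), which turns the original per-period counts into a simple query:

-- Table the application-side trace class writes one row into per statement.
CREATE TABLE query_trace (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    logged_at  DATETIME NOT NULL,
    query_type VARCHAR(10) NOT NULL,   -- 'INSERT', 'UPDATE', 'DELETE', ...
    query_text TEXT NOT NULL
);

-- Number of inserts recorded in the last 24 hours.
SELECT COUNT(*) FROM query_trace
WHERE query_type = 'INSERT'
  AND logged_at >= NOW() - INTERVAL 1 DAY;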

how to automatically determine which tables need a vacuum / reindex in postgresql

I've written a maintenance script for our database and would like to run that script on whichever tables most need vacuuming/reindexing during our downtime each day. Is there any way to determine that within Postgres?
I would classify tables needing attention like this:
tables that need vacuuming
tables that need reindexing (we find this makes a huge difference to performance)
I see something roughly promising here
It sounds like you are trying to re-invent auto-vacuum. Any reason you can't just enable that and let it do its job?
For the actual information you want, look at pg_stat_all_tables and pg_stat_all_indexes.
For a good example of how to use the data in it, look at the source for auto-vacuum. It doesn't query the views directly, but it uses that information.
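For example, a rough first cut over pg_stat_all_tables might look like this (the dead-tuple threshold is an arbitrary placeholder):

-- Tables carrying the most dead rows are the likeliest vacuum candidates.
SELECT schemaname, relname, n_live_tup, n_dead_tup
FROM pg_stat_all_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;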
I think you really should consider auto-vacuum.
However, if I understood your needs correctly, here's what I'd do:
For every table (how many tables do you have?) define the criteria;
for example, table 'foo' needs to be reindexed every X new records and vacuumed every X updates, deletes, or inserts.
Write your own application to do that.
Every day it checks the table statistics, saves them in a log (to compare the row differences over time), and then reindexes/vacuums the tables that match your criteria.
Sounds a little hackish, but I think it's a good way to do a custom autovacuum with custom 'trigger' criteria.
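A sketch of the daily snapshot such an application could keep, pulling from the statistics views mentioned above (the maintenance_log table is hypothetical):

-- Record per-table write counters once a day; the criteria then compare consecutive snapshots.
CREATE TABLE maintenance_log (
    snapshot_date date,
    schemaname    name,
    relname       name,
    n_tup_ins     bigint,
    n_tup_upd     bigint,
    n_tup_del     bigint
);

INSERT INTO maintenance_log
SELECT current_date, schemaname, relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_all_tables;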
How about adding the same trigger function, one that runs after any CRUD action, to all the tables?
The function receives the table name, checks the status of the table, and then runs VACUUM or REINDEX on that table.
Should be a "simple" PL/pgSQL trigger, but then those are never simple...
Also, if your DB machine is strong enough, and your downtime long enough, just run a script every night to reindex it all and vacuum it all... that way, even if your criterion was not met at test time (night) but was close to it (a few records short), it will not pose an issue the next day when it does reach the criterion...
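The blunt version of that nightly script is just the following, run during the downtime window ("mydb" is a placeholder for your database name):

-- Vacuum every table and refresh planner statistics, then rebuild every index.
VACUUM ANALYZE;
REINDEX DATABASE mydb;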

SQL, selecting and updating

I am trying to select 100s of rows from a DB that contains 100,000s of rows and update those rows afterwards.
The problem is that I don't want to go to the DB twice for this, since the update only marks those rows as "read".
Is there any way I can do this in Java using plain JDBC? (Hopefully without stored procedures.)
Update: OK, here is some clarification.
There are a few instances of the same application running on different servers; they all need to select 100s of "UNREAD" rows sorted by the creation_date column, read the BLOB data within them, write it to a file, and FTP that file to some server. (I know, prehistoric, but requirements are requirements.)
The read-and-update part is to ensure each instance gets a different set of data. (Because of the ordering, tricks like odds and evens won't work :/)
We SELECT the data FOR UPDATE, the data transfers over the wire (we wait and wait), then we mark the rows as "READ" and release the lock. This whole thing takes too long. By reading and updating at the same time, I would like to reduce the lock time (from the moment we issue the SELECT FOR UPDATE to the actual UPDATE) so that using multiple instances would increase the rows read per second.
Still have ideas?
It seems to me there might be more than one way to interpret the question here.
1. You are selecting the rows for the sole purpose of updating them and not reading them.
2. You are selecting the rows to show to somebody, and marking them as read either one at a time or all as a group.
3. You want to select the rows and mark them as read at the time you select them.
Let's take Option 1 first, as that seems to be the easiest. You don't need to select the rows in order to update them, just issue an update with a WHERE clause:
update table_x
set read = 'T'
where date > sysdate-1;
Looking at option 2, you want to mark them as read when a user has read them (or a downstream system has received them, or whatever). For this, you'll probably have to do another update. If you query for the primary key, in addition to the other columns you'll need in the first select, you will probably have an easier time of updating, as the DB won't have to do table or index scans to find the rows.
In JDBC (Java) there is a facility to do a batch update, where you execute a set of updates all at once. That's worked out well when I need to perform a lot of updates that are of the exact same form.
Option 3, where you want to select and update all in one shot. I don't find much use for this, personally, but that doesn't mean others don't. I suppose some kind of stored procedure would reduce the round trips. I'm not sure what db you are working with here and can't really offer specifics.
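That said, if the database happens to support it, option 3 can sometimes be done in a single statement: SQL Server's OUTPUT clause (PostgreSQL and Oracle have RETURNING) lets an UPDATE hand back the rows it touched. A sketch with hypothetical table and column names; note that TOP in an UPDATE does not guarantee any particular order:

-- Mark up to 100 unread rows as read and get their data back in the same round trip.
UPDATE TOP (100) messages
SET status = 'READ'
OUTPUT inserted.id, inserted.blob_data
WHERE status = 'UNREAD';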
Going to the DB isn't so bad. If you aren't returning anything 'across the wire', then an update shouldn't do you too much damage, and it's only a few hundred thousand rows. What is your worry?
If you're doing a SELECT in JDBC and iterating over the ResultSet to UPDATE each row, you're doing it wrong. That's an (n+1) query problem that will never perform well.
Just do an UPDATE with a WHERE clause that determines which of those rows needs to be updated. It's a single network round trip that way.
Don't be too code-centric. Let the database do the job it was designed for.
Can't you just use the same connection without closing it?

Oracle SQL technique to avoid filling trans log

Newish to Oracle programming (from Sybase and MS SQL Server). What is the "Oracle way" to avoid filling the trans log with large updates?
In my specific case, I'm doing an update of potentially a very large number of rows. Here's my approach:
UPDATE my_table
SET a_col = null
WHERE my_table_id IN
(SELECT my_table_id FROM my_table WHERE some_col < some_val and rownum < 1000)
...where I execute this inside a loop until the updated row count is zero.
Is this the best approach?
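For concreteness, the loop looks roughly like this in PL/SQL (just a sketch; I've added an a_col IS NOT NULL predicate, an assumption on my part, so that already-updated rows drop out of the subquery and the row count can actually reach zero):

BEGIN
  LOOP
    UPDATE my_table
       SET a_col = NULL
     WHERE my_table_id IN
           (SELECT my_table_id
              FROM my_table
             WHERE some_col < some_val
               AND a_col IS NOT NULL
               AND rownum < 1000);
    EXIT WHEN SQL%ROWCOUNT = 0;   -- stop once no matching rows remain
    COMMIT;
  END LOOP;
END;
/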
Thanks,
The amount of updates to the redo and undo logs will not be reduced at all if you break up the UPDATE into multiple runs of, say, 1000 records. On top of that, the total query time will most likely be higher compared to running a single large SQL statement.
There's no real way to address the UNDO/REDO log issue in UPDATEs. With INSERTs and CREATE TABLEs you can use a DIRECT aka APPEND option, but I guess this doesn't easily work for you.
It depends on the percentage of rows almost as much as the number. It also depends on whether the update makes the rows longer than before, i.e. going from NULL to 200 bytes in every row. This could have an effect on your performance - chained rows.
Either way, you might want to try this.
Build a new table with the column corrected as part of the select instead of an update. You can build that new table via CTAS (Create Table as Select) which can avoid logging.
Drop the original table.
Rename the new table.
Reindex, repoint constraints, rebuild triggers, recompile packages, etc.
You can avoid a lot of logging this way.
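A sketch of that, reusing the names from the question (the column list is abbreviated; every column of the original table has to be carried across):

-- Build the corrected copy without conventional redo logging.
CREATE TABLE my_table_new NOLOGGING AS
SELECT my_table_id,
       CASE WHEN some_col < some_val THEN NULL ELSE a_col END AS a_col,
       some_col
       -- ...remaining columns of my_table...
FROM my_table;

DROP TABLE my_table;
RENAME my_table_new TO my_table;
-- Then recreate indexes, constraints, grants, triggers, etc.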
Any UPDATE is going to generate redo. Realistically, a single UPDATE that updates all the rows is going to generate the smallest total amount of redo and run for the shortest period of time.
Assuming you are updating the vast majority of the rows in the table, if there are any indexes that use A_COL, you may be better off disabling those indexes before the update and then doing a rebuild of those indexes with NOLOGGING specified after the massive UPDATE statement. In addition, if there are any triggers or foreign keys that would need to be fired/ validated as a result of the update, getting rid of those temporarily might be helpful.
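A sketch of the index piece, assuming a single hypothetical index named a_col_idx on A_COL and that skip_unusable_indexes is enabled (the default in recent Oracle versions):

-- Stop maintaining the index during the mass update...
ALTER INDEX a_col_idx UNUSABLE;

UPDATE my_table SET a_col = NULL WHERE some_col < some_val;
COMMIT;

-- ...then rebuild it with minimal redo.
ALTER INDEX a_col_idx REBUILD NOLOGGING;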