skip-lock-tables and mysqldump - locking

Daily we run mysqldump on about 50 individual databases, package the dumps up, and store them offsite. Some of these databases are rather large and contain MyISAM tables (which cannot be changed, so suggesting that is pointless). I have been reading up on using the --skip-lock-tables option when doing a dump, but have not read what the downside would be. All I see are different iterations of "it could have adverse effects if data is inserted into a table while it is dumping."
What are these adverse effects? Does it just mean we will miss those writes upon a restore, or will it mean the dump file will be broken and useless? I honestly couldn't care less if we lose NEW data posted after the dump has started, as I am just looking for a snapshot in time.
Can I rely on these dumps to contain all the data that was saved before issuing the dump?

The --skip-lock-tables option tells mysqldump not to issue a LOCK TABLES statement before taking the dump; LOCK TABLES would acquire a READ lock on every table. Locking all tables in the database gives better consistency for a backup. Even with --skip-lock-tables, a table will not receive any INSERTs or UPDATEs while it is being dumped, because the SELECT used to read all of its rows keeps it locked for the duration. That statement looks like this:
SELECT SQL_NO_CACHE * FROM my_large_table
You can see it in the process list with the SHOW PROCESSLIST command.
Since the MyISAM engine is non-transactional, locking the tables will not guarantee referential integrity or data consistency in any case, so I personally use the --skip-lock-tables option almost always. With InnoDB, use the --single-transaction option to get a consistent snapshot.
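For InnoDB, here is a rough sketch of what --single-transaction amounts to under the hood; the exact statements vary by mysqldump version, so treat it as illustrative only, and the table name is just the one from the example above:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
-- each table is then read inside this single snapshot, e.g.:
SELECT SQL_NO_CACHE * FROM my_large_table;
COMMIT;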
Hope this helps.

Related

DB2: Working with concurrent DDL operations

We are working on a data warehouse using IBM DB2, and we want to load data by partition exchange. That means we prepare a temporary table with the data we want to load into the target table, and then attach that entire table as a data partition of the target table. If there is previous data, we simply discard the old partition.
Basically you just do "ALTER TABLE target_table ATTACH PARTITION pname [starting and ending clauses] FROM temp_table".
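For illustration, a fuller sketch of such a load, with made-up table, partition, and boundary values (the exact clause syntax may need adjusting for your DB2 version, and DB2 also wants a SET INTEGRITY step before the attached data becomes queryable):
-- staging table with the same layout as the target
CREATE TABLE stage_2013_01 LIKE sales_target;
-- ... bulk-load stage_2013_01 ...
ALTER TABLE sales_target
  ATTACH PARTITION p_2013_01
  STARTING FROM ('2013-01-01') ENDING AT ('2013-01-31')
  FROM stage_2013_01;
SET INTEGRITY FOR sales_target IMMEDIATE CHECKED;
COMMIT;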
It works wonderfully, but only for one operation at a time. If we do multiple loads in parallel or try to attach multiple partitions to the same table it's raining deadlock errors from the database.
From what I understand, the problem isn't necessarily with parallel access to the target table itself (locking it changes nothing), but accesses to system catalog tables in the background.
I have combed through the DB2 documentation, but the only reference to concurrent DDL statements I found at all was advice to avoid them. Surely the answer can't simply be to never attempt it?
Does anyone know a way to deal with this problem?
I tried using a single global synchronization table that must be locked before attaching any partition, but that didn't help either. Either I'm missing something (implicit commits somewhere?) or some of the catalog updates happen asynchronously, which makes the whole problem much worse. If that is the case, is there any chance at all to query whether an attach is safe to perform at a given moment?

How to lock a table in SQL Server

How do I lock a table in SQL Server? I found queries with lock hints and also read about transactions, but I am confused about how to use them.
I have two processes which each read a table first and then update data in it. I want only one of them to perform the update, and the other to pick up that update in its read. My processes work as follows:
Lock table
Read data
Update data if it has not been updated by the other process
Release lock
Thanks
You can use the TABLOCKX hint to lock the entire table, but locking the entire table is usually a bad idea; you might want to reconsider whether you really need it.
If you want to ensure you're updating the latest data, you can use a rowversion column and double-check it before the update instead of locking the entire table for reading.
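A minimal sketch of the rowversion approach, with hypothetical table and column names:
-- hypothetical table; the rowversion column is updated automatically on every write
CREATE TABLE dbo.Jobs (
    JobId   int PRIMARY KEY,
    Payload nvarchar(200),
    RowVer  rowversion
);

DECLARE @ver binary(8);

-- read the row and remember its version
SELECT @ver = RowVer FROM dbo.Jobs WHERE JobId = 1;

-- update only if nobody else has changed the row in the meantime
UPDATE dbo.Jobs
SET Payload = N'processed'
WHERE JobId = 1 AND RowVer = @ver;

IF @@ROWCOUNT = 0
    PRINT 'Row was changed by the other process; re-read and retry.';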
In your SELECT statement you can provide a "select for update" table hint: WITH (UPDLOCK). Depending on what percentage of records you are updating and their physical distribution, this might perform better than a table lock.
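For example (again with hypothetical table and column names), a read-then-update sequence where the update lock taken by the SELECT is held until commit, so a second process cannot slip in between the read and the write:
BEGIN TRANSACTION;

SELECT Payload
FROM dbo.Jobs WITH (UPDLOCK, ROWLOCK)
WHERE JobId = 1;

-- ... decide whether the row still needs updating ...

UPDATE dbo.Jobs
SET Payload = N'processed'
WHERE JobId = 1;

COMMIT TRANSACTION;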
But as Fedor Hajdu pointed out, what you probably want is an optimistic locking scheme. Check out the documentation for the READ COMMITTED SNAPSHOT isolation level. You might also find this article useful as an introduction.

Automatically dropping PostgreSQL tables once per day

I have a scenario where I have a central server and a node. Both server and node are capable of running PostgreSQL but the storage space on the node is limited. The node collects data at a high speed and writes the data to its local DB.
The server needs to replicate the data from the node. I plan on accomplishing this with Slony-I or Bucardo.
The node needs to be able to delete all records from its tables at a set interval in order to minimize disk space used. Should I use pgAgent with a job consisting of a script like
DELETE FROM tablex, tabley, tablez;
where the actual batch file to run the script would be something like
@echo off
C:\Progra~1\PostgreSQL\9.1\bin\psql -d database -h localhost -p 5432 -U postgres -f C:\deleteFrom.sql
?
I'm just looking for opinions on whether this is the best way to accomplish the task, or whether anyone knows of a more efficient way to pull data from a remote DB and then clear that remote DB to save space on the node. Thanks for your time.
The most efficient command for you is the TRUNCATE command.
With TRUNCATE you can list several tables at once, as in your example:
TRUNCATE tablex, tabley, tablez;
Here's the description from the postgres docs:
TRUNCATE quickly removes all rows from a set of tables. It has the same effect as an unqualified DELETE on each table, but since it does not actually scan the tables it is faster. Furthermore, it reclaims disk space immediately, rather than requiring a subsequent VACUUM operation. This is most useful on large tables.
You may also add CASCADE as a parameter:
CASCADE: Automatically truncate all tables that have foreign-key references to any of the named tables, or to any tables added to the group due to CASCADE.
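For example, reusing the table names from the question:
TRUNCATE tablex, tabley, tablez CASCADE;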
The two best options, depending on your exact needs and workflow, would be TRUNCATE, as @Bohemian suggested, or to create a new table, rename it into place, then drop the old one.
We use something much like the latter create/rename/drop method in one of our major projects. It has the advantage of also working when you need to delete some, but not all, of the data from a table very quickly. The basic workflow is:
Create a new table with a schema identical to the old one
CREATE TABLE new_table (LIKE my_table INCLUDING ALL);
In a transaction, rename the old and new tables simultaneously:
BEGIN;
ALTER TABLE my_table RENAME TO old_table;
ALTER TABLE new_table RENAME TO my_table;
COMMIT;
[Optional] Now you can do stuff with the old table, while the new table is happily accepting new inserts. You can dump the data to your centralized server, run queries on it, or whatever.
Delete the old table
DROP TABLE old_table;
This is an especially useful strategy when you want to keep, say, 7 days of data around, and only discard the 8th day's data all at once. Doing a DELETE in this case can be very slow. By storing the data in partitions (one for each day), it is easy to drop an entire day's data at once.
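As a rough sketch of that partition-per-day layout (PostgreSQL 9.x inheritance-style partitioning; table and column names are made up):
CREATE TABLE readings (
    reading_time timestamptz NOT NULL,
    value        double precision
);

-- one child table per day
CREATE TABLE readings_2013_01_08 (
    CHECK (reading_time >= '2013-01-08' AND reading_time < '2013-01-09')
) INHERITS (readings);

-- ... the node inserts the day's data into readings_2013_01_08 ...

-- discarding a whole day later is a cheap metadata-only operation:
DROP TABLE readings_2013_01_08;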

Minimally Logged Insert Into

I have an INSERT statement that is eating a hell of a lot of log space, so much so that the hard drive is actually filling up before the statement completes.
The thing is, I really don't need this to be logged as it is only an intermediate data upload step.
For argument's sake, let's say I have:
Table A: Initial upload table (populated using bcp, so no logging problems)
Table B: Populated using INSERT INTO B from A
Is there a way that I can copy between A and B without anything being written to the log?
P.S. I'm using SQL Server 2008 with simple recovery model.
From Louis Davidson, Microsoft MVP:
There is no way to insert without logging at all. SELECT INTO is the best way to minimize logging in T-SQL; using SSIS you can do the same sort of light logging using Bulk Insert.
From your requirements, I would probably use SSIS: drop all constraints, especially unique and primary key ones, load the data in, and add the constraints back. I load about 100 GB in just over an hour like this, with fairly minimal overhead. I am using the BULK_LOGGED recovery model, which just logs the existence of new extents during the load, and you can then remove them later.
The key is to start with barebones tables, and it just screams. Building the indexes once afterwards leaves you with no indexes to maintain during the load, just the one index build per index.
If you don't want to use SSIS, the point still applies: drop all of your constraints and use the BULK_LOGGED recovery model. This greatly reduces the logging done by INSERT INTO statements and should solve your issue.
http://msdn.microsoft.com/en-us/library/ms191244.aspx
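A sketch of that approach, assuming a database named MyDb and with B stripped down to a bare heap for the load (per the Data Loading Performance Guide, an INSERT ... SELECT with TABLOCK into a heap is eligible for minimal logging):
ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;

-- B as a bare heap: constraints and indexes dropped beforehand, added back afterwards
INSERT INTO dbo.B WITH (TABLOCK)
SELECT *
FROM dbo.A;

-- switch back to whichever recovery model you normally run
ALTER DATABASE MyDb SET RECOVERY SIMPLE;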
Upload the data into tempdb instead of your database, and do all the intermediate transformations in tempdb. Then copy only the final data into the destination database. Use batches to minimize individual transaction size. If you still have problems, look into deploying trace flag 610; see The Data Loading Performance Guide and Prerequisites for Minimal Logging in Bulk Import:
Trace Flag 610
SQL Server 2008 introduces trace flag 610, which controls minimally logged inserts into indexed tables.
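If you do experiment with the flag, it can be enabled and disabled around the load; this is only a sketch, so test the effect in your own environment first:
DBCC TRACEON (610, -1);   -- enable minimally logged inserts into indexed tables

-- ... run the load ...

DBCC TRACEOFF (610, -1);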

Suspect someone is deleting records from my SQL Server '05 DB - any way to check?

I think someone with shared access to my SQL Server '05 DB is deleting records from a table in a DB for their own reasons.
Is there any audit table I can check to see manual delete queries which may have been run on the DB in the last X number of days?
Thanks for your help.
Ed
You may want to consider using a trigger temporarily.
Here's an example.
I'd add an ON DELETE trigger to the table in question. That would allow you to keep an exact log of deleted records (i.e., in the trigger you insert the deleted rows into another table, etc.).
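A minimal sketch of such a trigger, with hypothetical table and column names:
-- audit table to collect the deleted rows
CREATE TABLE dbo.Orders_DeleteAudit (
    OrderId   int,
    DeletedAt datetime DEFAULT GETDATE(),
    DeletedBy sysname  DEFAULT SUSER_SNAME()
);
GO

CREATE TRIGGER trg_Orders_AuditDelete
ON dbo.Orders
AFTER DELETE
AS
BEGIN
    -- "deleted" is the pseudo-table holding the rows being removed
    INSERT INTO dbo.Orders_DeleteAudit (OrderId)
    SELECT OrderId FROM deleted;
END;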
-- check the plan cache for recently executed statements
-- (only useful while the plans are still cached):
SELECT deqs.last_execution_time AS [Time], dest.TEXT AS [Query]
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
ORDER BY deqs.last_execution_time DESC
SQL Server Profiler is probably the easiest way to do this. You can set it to dump all executed queries to a table in the database, or to a file, which might be more suitable in your case. You can also set a filter to capture just the queries you're interested in; otherwise the log files become huge.
Unless you've set things up beforehand (via triggers, running Profiler traces, or the like) no, there is no simple native way to "pull out" commands that have been run against a SQL Server database.
@David's idea of querying the procedure cache is one possibility, but it would only work if the execution plan(s) are still in memory.
There are third-party transaction log readers available. They could be used to read the contents of the transaction log, but again that only helps if the data/commands are still in there, and after "X days" that seems unlikely.
Another workaround would depend on backups.
Restore a complete backup from before your problem time, and compare it with the current version. This would show whether data has been deleted, but not how.
If you are in the full recovery model and you have transaction log backups, you can perform various incremental restores and actually observe the deletions happening (if they are), but this would probably require a lot of point-in-time recoveries and would be very time intensive.