Generation of backup puts my website down - is there an easy solution? - backup

On my database server there is a cronjob that backups all the databases in a way that makes it easy to restore them.
It is something like this:
0 5 * * * /usr/local/bin/backup.php
The problem is that the website (using that db server) is very slow during that process.
Even, Pingdom sends me a 'website down' alert at the start of the process.
To solve the problem, I have tried this change:
0 5 * * * /bin/nice -n 19 /usr/local/bin/backup.php
but it doesn't seem to improve the situation.
How is that possible?
How would you solve the problem under these requirements?
1. no purchase of any hardware
2. easy to implement and maintain
3. no proprietary solutions

You could put a delay in your backup.php script that ensures a maximum backed up records per second rate or similar, ie using sleep().

The general strategy to solve the general situation of a task that takes a long time to run is to try and run it incrementally, over a longer period.
That is, perhaps your backup can take place in parts; or perhaps you can swap to a temporary database during the backup procedure, etc. It's hard to tell what would work for you, without knowing more.

Related

How to minimize the cost for a sql database

I have a website that need a database to store some user information and a blob storage to save some files.
I want to minimize the cost as much as possible so I played around in Microsoft Azure Pricing Calculator with a Azure SQL Database. For the database I think that over it's hole lifetime 2GB of storage would be enought.
I arrived to 2 options that where dirt cheap but I don't really understand what it gives me.
First is with a serverles computer for 3600 seconds (of runtime?)
Is that time the time that my database is processing the request? For example if I have a select statement that takes 1 sec to complete I'll be left with 3599 sec for that month?
If that's the case what happens if I run out of time?
Second option is using a Hardware Type: Gen 4
but for this one I don't have any other options to configure my needs. Is this obsolete? Can I rely on it for production?
If you need a very cheap one use the Basic or S0.
Keep in mind that Basic are very slow: try to connect to it through SSMS.
Serverless is for databases that you pause for 3/4 of the day. It might be the case for you but keep in mind that when you use them they will cost a lot. I don't think this will be suitable for you.

HA Database configuration that avoids split-brain issues?

I am looking for a (SQL/RDB) database setup that works something like this:
I will have 3+ databases in an active/active/active configuration
prior to doing any insert, the database will communicate with atleast a majority of the others, such that they all either insert at the same time or rollback (transaction)
this way I can write and read from any of the databases, and always get the same results (as long as the field wasn't updated very recently)
note: this is for a use case that will be very read-heavy and have few writes (and delay on the writes is an OK situation)
does anything like this exist? I see all sorts of solutions with database HA configurations, but most of them suggest writing to a primary node or having a passive backup
alternatively I could setup a custom application, and have each application talk to exactly 1 database, and achieve a similar result, but I was hoping something similar would already exist
So my questions is: does something like this exist? if not, are there any technical/architectural reasons why not?
P.S. - I will NOT be using a SAN where all databases can store/access the same data
edit: more clarifications as far as what I am looking for:
1. I have no database picked out yet, but I am more familiar with MySQL / SQL Server / Oracle, so I would have a minor inclination towards on of those
2. If a majority of the nodes are down (or a single node can't communicate with the collective), then I expect all writes from that node to fail, and accept that it may provide old/outdated information
failure / recover scenario expectations:
1. A node goes down: it will query and get updates from the other nodes when it comes back up
2. A node loses connection with the collective: it will provide potentially old data to read request, and refuse any writes
3. A node is in disagreement with the data stores in others: majority rule
4. 4. majority rule does not work: go with whomever has the latest data (although this really shouldn't happen)
5. The entries are insert/update/read only, i.e. there will be no deletes (except manually ofc), so I shouldn't need to worry about an update after a delete, however in that case I would choose to delete the record and ignore the update
6. Any other scenarios I missed?
update: I the closest I see to what I am looking for seems to be using a quorum + 2 DBs, however I would prefer if I could have 3 DBs instead, such that I can query any of them at any time (to further distribute the reads, and also to keep another copy of the data)
You need to define "very recently". In most environments with replication for inserts, all the databases will have the same data within a few seconds of an insert (and a few seconds seems pessimistic).
An alternative approach is a "read-from-one/write-to-all" approach. In this case, reads are spread through the system. Writes are then sent to all nodes by the application (or a common layer that the application uses).
Remember, though, that the issue with database replication is not how it works when it works. The issue is how it recovers when it fails and even how failures are identified. You need to decide what happens when nodes go down, how they recover lost transactions, how you decide that nodes are really synchronized. I would suggest that you peruse the documentation of the database that you are actually using and understand the replication mechanisms provided by that platform.

Why does a simple UPDATE statement sometimes take a few seconds to run?

Our web app automatically emails us when a page execution goes beyond a second or two with timings for running each SQL statement. We track what pages each user is browsing on each page load and this query sometimes takes a couple of seconds to run (we get a number of these automatic emails telling us a page has taken longer than a couple of seconds at the same time).
UPDATE whosonline
SET datetime = GETDATE(),
url = '/user/thepage'
WHERE username = 'companyname\theusername (0123456789)'
Any ideas what could be causing this? Normally it runs in a split second but say every week or so it takes about 2 or 3 seconds for probably a timespan of 10 seconds.
This is a very broad question and there could be a number of reasons:
Is there a pattern to what day/time in the week this happens? Maybe your db machine has just come up
How many users do you have? Are there indexes to the database?
What about the database cache? Is it configured?
How do you know it's a database delay and not a network delay? Have you tried accessing from the local database server and seen if the delays happen there too?
If you have access to SQL Profiler, you might want to run that on the statement to see if anything is happening on the server that might be causing issues. I'd also check the execution path in Management Studio/Query Analyzer if you can as well. Otherwise, if those don't turn up anything it probably is something to do with the web-side of things, not SQL.

FirebirdSQL queries gets stuck at 12:00PM

I'm running Firebird 2.5 (and have also tried earlier versions) on Windows. Every day after 12:00PM running insert/update queries on one specific table hang, but complete successfully by 12:35 or so, no matter when started. It does seem that Firebird is doing some kind of maintenance on the table and it takes half an hour to complete, during which time the table cannot be written to (but the reads are fast). The table itself is really small, some 10000 rows, compared to millions of rows we have in other tables - and other tables do not get stuck.
I haven't been able to find any reason or solution. I tried dumping the table and restoring it, which didn't help, I tried switching between superserver and classic, changed versions with no success.
Has anyone experienced a problem like this?
No. Firebird doesn't have any internal maintenance procedures bind to some specified time of a day. Seems, there is some task on your server scheduled to run at 12:00 PM. Or there are network users of the server who start doing some heavy access at 12:00 PM.
The only maintenance FB does is "garbage collection" (geting rid of old record versions) and this is done on "when needed" basis (usually when records were selected, see the GCPolicy in firebird.conf) not on some predefined time.
Do you experience this hang only on during these certain hours or is it always slow to insert to that table? Have you checked the server load during the slowdown (ie in the task manager, is the CPU maxed out)? Anyway, here is some ideas to check:
What constraints / triggers do you have on the table? If they involve some extensive checks (ie against the other tables which contain millions of rows) this could be the reason inserts take so long.
Perhaps there is some other service which is triggered at that time? Ie do you have a cron job to make backup of the DB at that time? Or perhaps some other system service which runs at that time with higher priority slows down the server?
Do you have trace service active for the table? See fbtrace.conf in FireBird root directory. If it is active, extensive logging might be the cause of slowdown, if it isn't active, using it might help you to find the cause.
What are the setings for ForcedWrites / UnflushedWrites (see firebird.conf)? Does changing them make difference?
Is there something logged for this troublesome timeframe in firebird.log?
To me it looks like you have a process which starts at 12:00 and does something which locks the entire table. Use the monitoring table or the trace manager to see if there is any connection or active transaction which looks suspicious.
I also think your own transaction are started with the WAIT clause without a LOCK TIMEOUT, you might want to change this to NO WAIT or WAIT with a LOCK TIMEOUT, so that your transactions either fail immediately or after the timeout.
My suggestion is to use the TRACE API in 2.5 to track down what is happening near or around that time. That should help get you more information as to what is happening.
I use this for debugging http://upscene.com/products.misc.fbtm.php kinda buggy itself, but when it is working it is a god send.
Are some Client-Connections going DOWN at 12:00 PM? I had a similar problem on a 70.000 records sized table:
Client "A" has a permanently open DB Connection like "select * from TABLE". This is a "read only transaction" but reason enough for the server to generate Record-Versions. Why?
Client "B" made massive Updates to this Table, the Server tries to preserve the world like it was when "A" startet her "select". This is normal for Transaction able DB-Servers, and its implemented by creating Record Copies of the record-data before its updated.
So in my case for this TABLE 170.000 Record Versions existed. You can measure this by
gstat -r -t TABLE db.fdb | grep versions
If Client "B" goes down, the count of Record-Versions is NOT growing any more. Client "A" is the guilty one, freezing all this versions, forces the server to hold it. Finally if Client "A" goes down (or for example a firewall rule cuts all pending connections) Firebird is happy to start the process of getting rid of the now useless Record-Versions.
This "sweep"?! is bad programmed (even 2.5.2) cpu is 3% it do only <10.000 Versions / Minute so this TABLE has a performance of about 2%.

How to kill/resolve a reeeeally long-running update in SQL Server

A colleague of mine (I promise it was a colleague!) has left an update running on our main SQL Server since last Thursday (yes that's right folks, we're pushing 100 hours now!). The SQL in question (in one transaction, I might add) is:
update daily_prices set min_date = (select min(a.date)
from daily_prices a
where a.key = daily_prices.key and
a.iid = daily_prices.iid)
(Yeah I know, heinous...)
The total cost in the query plan is coming out as 22186.7, the estimated number of rows to update is around 151 million.
We obviously need to resolve this query one way or another, we realise that if we are to kill the query we're going to generate some brutal rollback, but we've got no way of knowing how far it has gotten. The only thing we do know is this entry from sys.dm_exec_requests:
session_id status query_text cpu_time total_elapsed_time reads writes logical_reads
52 suspended update daily_prices... 2328469 408947075 13831137 42458588 151809497
So my question is, what would be our best course of action?
wait it out
kill it and roll back, and hope that it rolls back before the next ice age
something else?
I personally would want to wait it out unless I though it had no chance of finishing this week, the roll back at this stage could take far longer than the query has to date. If it's a production server, I really wouldn't take option 2 and kill it unless I absolutely had to.
In terms of regaining some control / working system if you have suitable backups, bring online another database restore the backup / tlog backups, but you will not want to restore to beyond when the transaction was started (or it will still have to roll it back.) This at least gives you a system you could continue dev work against, but unlikely to be the ideal situation for a prod system.
If it's a production server, have some kind words with the individual as to the suitability of testing queries and query plans prior to it being executed. I am sure many DBA's can suggest the less polite methods of instruction :)
So we got fed up with waiting for our transaction to complete, (after a full week on
one piece of SQL, who wouldn't?), and as it was interfering with our backup
process, we thought killing it was a necessary evil.
The database started to rollback the transaction.
5 days passed.
We noted with some posts elsewhere on the internet that sometimes some magic
happened when the database was restarted and the transaction would "go away",
although these are generally debunked*, and it makes no sense, we thought we
had nothing left to lose so we gave it a go. We knew the database would go into
recovery mode, but the database was becoming increasingly sick anyway and unable
to run anything but its current rollback work anyway, and we've seen SQL Server misbehave with hogging system resources and not diverting them to where it needs to do the work.
(* we also know enough database theory to know that the DB wouldn't just "forget"
about a transaction in progress, but we were also seeing stack dumps in the
SQL Server error logs which kind of told us that the SQL Server was getting
increasingly grumpy at the amount of rollback it was having to undertake)
So we restarted the database.
Sure enough the database went into recovery mode. However, the SQL Server event Log
was now giving us an update every 20 seconds or so as to how long it was going to
take (in all, it reckoned about 25 hours from the log messages, but it ended up being
just an hour and a half (!)).
Whether this method of recovery/rollback is faster, I would strongly doubt (as I expect
SQL Server had to do the same level of work to unwind the transaction as before), however it did finish within an hour and a half, either way, I don't want to make a habit of restarting my production database when it is halfway through a rollback). The update messages in the event log were an absolute godsend, as anyone who has written a batch program
will tell you; however inaccurate they turned out to be - at least they were a worst case.
As we had the luxury of being the only two people using this production box, choosing to
send the database into recovery mode worked for us, and gave us informational messages we
didn't have access to with just our previous rollback state (or at least nothing we could
interpret given our lacking DBA skills). Would I recommend doing this in future?
....Absolutely not, however, hopefully the concerned parties have learnt their lesson, and
we can ask the board for some money for a proper development server! (epic Joel-Test fail!)