We have a process in which several site servers send data to a central server (through a Linked Server). A new site has seen the job duration more than double in three weeks, and a couple of the other sites often fail due to run time overlap.
It is a two step process:
Insert new records
Update changed records
The insert only takes a few seconds, but the update takes anywhere from 5 to 20 minutes, depending on the site. I am able to change the query that drives the update and get it down to only a couple seconds, but still when put into an UPDATE statement it takes several minutes.
I am leaning towards moving the jobs to a single job on the central server, so it is a pull operation which, based on the testing I have done, should be much faster. However, my question is: What is considered "best practice" in this situation? I am going to have to change quite a bit to get this working properly, so I might as well do it right.
Related
I'm refactoring my current schema and it's too abstract for me.
I monitor my servers with a homemade monitoring software. This software sends HTTP requests to a Rails web server with about ten different fields worth of information so I can get a quick overview of everything.
My current implementation:
server [id, name, created_date, edited_date, ..., etc ]
status_update [id, server_id, field1, field2, field3, created_date, edited_date, ..., etc]
I treat the servers as Users and status updates as Tweets. I delete any status_update on a server_id older than the tenth one just to keep from growing to infinity.
Though I'm starting to run into a few complications. I need to display information from the most recent status_update on the index page, I need to sort the servers based on status_update info, I need to store info from certain status_updates that may be way older than 10 status_updates old. It also seems like I'm going to start needing to store information from status_updates in both the server and status_update, which would cause hitting the DB multiple times on an insert. Thus, I am looking to refactor.
My requirements:
I only need to display information from the most recent update.
Having the next 9 status_updates helps debug if the system goes offline.
I need to be able to sort based on some info from most recent status_update.
I need the database to remain small (Heroku free).
Ideal performance, IE not hitting database more than once unless necessary.
Non-Complicated DB structure so I can pass it along.
Edit: Additional Info => I am looking to ultimately monitor about 150-200 servers (a lot for a hobby dev, but I'm cheap). Each monitoring service posts every five minutes or so unless something goes wrong. So, worst case scenario has me reaching max capacity every four hours.
I was thinking it would be nice to track when the last time X event happened, and what the result was. Thus, tracking that information would have to be moved to the server model itself since I'm wiping out old records and would lose the information after an hour or so. Though in retrospect, I could just save that info in memory using the monitoring service and send it up every five minutes or only once each time it changes. I could also simply edit that information only when it changes, so as to process less information on each request. Hm!
Efficiency
All ORMs, including ActiveRecord, are designed and built around certain tradeoffs. It's commonplace for ORMs to use several simple SELECT statements to do what a SQL developer would do with a single SELECT statement. You're probably not overwhelming Heroku with your queries.
There's no reasonable structural solution to this problem.
Size
Your "status_update" table should be able to hold an enormous number of rows. Heroku's hobby-dev plan allows 10,000 rows. How many servers do you seriously expect to monitor on a free plan? If I were you, I would delete old rows from it no more than once a day, or when I got a permission error. (On Heroku, certain permission errors mean you're over the row limit.)
It also seems like I'm going to start needing to store information
from status_updates in both the server and status_update, which would
cause hitting the DB multiple times on an insert.
This really makes little sense. Tweets don't require updates to the user account; status updates don't require updates about the server. This might suggest refactoring is in order, but I'd want to see either your models or your CREATE TABLE statements to be sure. (You can paste those into your question, and leave a comment here.)
Alternatives
I'd seriously consider running this Rails app on a local machine, writing data to a database on the local machine, especially if you intend to target 200 web servers. This would eliminate all Heroku row limits, and you don't really need to run it 24 hours a day if this is just a hobby. If you're doing this professionally, your income from it should easily cover the cost of a hobby-basic plan on Heroku. (Currently $9.00/month.) But even then I'd think hard about hosting this locally.
I work for a fleet tracking company and this question is specifically about how I plan to do reports. Let me explain our environment. We have 1x Database, 1x Load Distributing process, and 3x Report Processing servers (let's assume these are equal in every way). When a customer requests a report, all the parameters of that report go in the database. I'm currently working on a load distributing app that will take pending reports from the database and delegate them to the 3 report processing servers that build and email the reports. When a server finishes a report (or an error arises), it notifies the load distributing app. Reports can come in all sizes, from 1 days worth of GPS data for 1 vehicles to 3 months of GPS data for hundreds of vehicles.
I can think of a few ways to do the load balancing but I'm not quite happy with them. I could have each server only do 5 reports at most, but 1 server might get 5 small reports while another gets 5 large reports. I could do a "Round Robin" approach and just hand out the reports sequentially across the servers, but this still doesn't protect against overloading any of the servers.
The best idea I think I have right now is to keep a count of how much GPS data is needed by each report (an easy task to do) and as I assign reports to each server I keep a running total for each server. When a server finishes a report (and notifies the load balancer), subtract that report's amount of GPS data from the running total for that server. This way, I could assign the next report to the server with the smallest amount of GPS data to work with. I could also set a max so that a server cannot get over worked (the problem that is causing us to refactor our whole reports process to begin with). If there are more reports when all servers hit their max, it can just queue them up and attempt them later when the servers finish a few of their reports.
I'm not convinced it's the best approach for finishing reports as quickly as possible. These are just the best I have come up with so far.
How can I optimize my approach to load balancing reports of different sizes across multiple servers?
Assuming that you have only one major table which you select data from, then I would configure one server to do all the big reports first and leave the other two to do smallest to largest. Otherwise big reports might never get done.
For the smaller reports, you want to try, in the absence of anything better, to have them try and run 'similar' reports, meaning those that cluster around similar values in the index mainly used. For example if a server has just completed a report for June 2011, then the next best report to run is same period, not jumping to November 2012. This is dependent on the actual table though, but I am presuming you have lots of date ordered data comprising the bulk of the selection. All you are really trying to do is group reports that are likely to reuse cached indexes/etc as this should give best throughput.
I have a similar scheduling problem, and any queries that are directed to major tables go one server (slow queue) and anything else goes to another ( fast queue), with some exceptions for special cases.
I have SQL job that runs every night which does various inserts/updates/deletes. The job contains 40 steps which mainly execute stored procedures.
It's been running fine up until a week ago when suddenly the run time went up from 2.5 hours to over 5 hours, sometimes even 8,9,10!
Could one you please give me any pointers?
First of all let me recommend you a valuable resource on Simple-Talk site. Is a detailed methodology of how to troubleshoot performance issues on SQL Server.
Does the insert you say was carried out was a huge bulk insert that could affect performance? Maybe if it was a huge load the query execution plans could be different and you need to re-tune your table structure, indexes, etc.
If the run time suddenlychanged and no changes where done in the queries or your database structure then I would ask myself several questions:
first, does the process is still taking so long or it was only one time it ran so slow? maybe now is running smoothly and the issue only arised once. Nevertheless, try to find what triggered that bad performance, it can happend again and take down your server
is the server a dedicated sql server? if not, check if some new tasks unrelated to the SQL engine had been configured, maybe a new tasks is doing some heavy I/O jobs and therefore your CRUD operations take longer
if it is a dedicated server, then check that no new job has been added and can take down your existing jobs. Check this SO link for details on jobs settled up from the SQL Agent
maybe low memory due to another process on same server?
And there is lot more to check, but before going deeper I would check that no external (non sql server related) was the reason of the delay on the process execution.
Our web app automatically emails us when a page execution goes beyond a second or two with timings for running each SQL statement. We track what pages each user is browsing on each page load and this query sometimes takes a couple of seconds to run (we get a number of these automatic emails telling us a page has taken longer than a couple of seconds at the same time).
UPDATE whosonline
SET datetime = GETDATE(),
url = '/user/thepage'
WHERE username = 'companyname\theusername (0123456789)'
Any ideas what could be causing this? Normally it runs in a split second but say every week or so it takes about 2 or 3 seconds for probably a timespan of 10 seconds.
This is a very broad question and there could be a number of reasons:
Is there a pattern to what day/time in the week this happens? Maybe your db machine has just come up
How many users do you have? Are there indexes to the database?
What about the database cache? Is it configured?
How do you know it's a database delay and not a network delay? Have you tried accessing from the local database server and seen if the delays happen there too?
If you have access to SQL Profiler, you might want to run that on the statement to see if anything is happening on the server that might be causing issues. I'd also check the execution path in Management Studio/Query Analyzer if you can as well. Otherwise, if those don't turn up anything it probably is something to do with the web-side of things, not SQL.
I'm running Firebird 2.5 (and have also tried earlier versions) on Windows. Every day after 12:00PM running insert/update queries on one specific table hang, but complete successfully by 12:35 or so, no matter when started. It does seem that Firebird is doing some kind of maintenance on the table and it takes half an hour to complete, during which time the table cannot be written to (but the reads are fast). The table itself is really small, some 10000 rows, compared to millions of rows we have in other tables - and other tables do not get stuck.
I haven't been able to find any reason or solution. I tried dumping the table and restoring it, which didn't help, I tried switching between superserver and classic, changed versions with no success.
Has anyone experienced a problem like this?
No. Firebird doesn't have any internal maintenance procedures bind to some specified time of a day. Seems, there is some task on your server scheduled to run at 12:00 PM. Or there are network users of the server who start doing some heavy access at 12:00 PM.
The only maintenance FB does is "garbage collection" (geting rid of old record versions) and this is done on "when needed" basis (usually when records were selected, see the GCPolicy in firebird.conf) not on some predefined time.
Do you experience this hang only on during these certain hours or is it always slow to insert to that table? Have you checked the server load during the slowdown (ie in the task manager, is the CPU maxed out)? Anyway, here is some ideas to check:
What constraints / triggers do you have on the table? If they involve some extensive checks (ie against the other tables which contain millions of rows) this could be the reason inserts take so long.
Perhaps there is some other service which is triggered at that time? Ie do you have a cron job to make backup of the DB at that time? Or perhaps some other system service which runs at that time with higher priority slows down the server?
Do you have trace service active for the table? See fbtrace.conf in FireBird root directory. If it is active, extensive logging might be the cause of slowdown, if it isn't active, using it might help you to find the cause.
What are the setings for ForcedWrites / UnflushedWrites (see firebird.conf)? Does changing them make difference?
Is there something logged for this troublesome timeframe in firebird.log?
To me it looks like you have a process which starts at 12:00 and does something which locks the entire table. Use the monitoring table or the trace manager to see if there is any connection or active transaction which looks suspicious.
I also think your own transaction are started with the WAIT clause without a LOCK TIMEOUT, you might want to change this to NO WAIT or WAIT with a LOCK TIMEOUT, so that your transactions either fail immediately or after the timeout.
My suggestion is to use the TRACE API in 2.5 to track down what is happening near or around that time. That should help get you more information as to what is happening.
I use this for debugging http://upscene.com/products.misc.fbtm.php kinda buggy itself, but when it is working it is a god send.
Are some Client-Connections going DOWN at 12:00 PM? I had a similar problem on a 70.000 records sized table:
Client "A" has a permanently open DB Connection like "select * from TABLE". This is a "read only transaction" but reason enough for the server to generate Record-Versions. Why?
Client "B" made massive Updates to this Table, the Server tries to preserve the world like it was when "A" startet her "select". This is normal for Transaction able DB-Servers, and its implemented by creating Record Copies of the record-data before its updated.
So in my case for this TABLE 170.000 Record Versions existed. You can measure this by
gstat -r -t TABLE db.fdb | grep versions
If Client "B" goes down, the count of Record-Versions is NOT growing any more. Client "A" is the guilty one, freezing all this versions, forces the server to hold it. Finally if Client "A" goes down (or for example a firewall rule cuts all pending connections) Firebird is happy to start the process of getting rid of the now useless Record-Versions.
This "sweep"?! is bad programmed (even 2.5.2) cpu is 3% it do only <10.000 Versions / Minute so this TABLE has a performance of about 2%.