I have an application that connects to a SQL Server database and cycles through batches of records, performing various tasks on each and then updating the database accordingly (i.e. "success", "error", etc.).
The potential problem I'm running into is that since it takes roughly a minute or so to get through each record (long story), if I have more than one user running the application there's a high chance of "data collisions", i.e. users trying to process the same records at the same time, which cannot happen if the process is to execute properly.
Initially, I thought about adding a LOCKED column to help the application determine whether a record was already opened by another user. However, if the app were to crash or be exited without completing the record it was currently on, that record would show as opened by another user indefinitely... right? Or am I missing an easy solution here?
Anyway, what would be ideal is if the application could SELECT 100 records at a time and "lock them out" on the database while it processes them, so that other users running the application SELECT a different set of 100 and never overlap. Is that possible? I've tried to do some research on the matter, but to be honest my experience with SQL Server is very limited. Thanks for any and all help!
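For reference, one common way to do this in SQL Server uses locking hints on the SELECT. A minimal sketch of the kind of query meant here, assuming a hypothetical dbo.Records table with a Status column:

BEGIN TRANSACTION;

-- claim up to 100 unclaimed rows; READPAST skips rows other sessions have locked,
-- UPDLOCK holds the selected rows for this session until COMMIT/ROLLBACK
SELECT TOP (100) Id, Payload
FROM dbo.Records WITH (UPDLOCK, ROWLOCK, READPAST)
WHERE Status = 'pending';

-- ... process the rows, mark them 'success'/'error', then COMMIT to release the locks

If the application crashes, its connection dies and SQL Server rolls the transaction back, so the rows become available again; that avoids the "locked forever" problem a plain LOCKED column would have.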
Related
Given an SQL table with timestamped records, every once in a while an application App0 does something like foreach record in since(certainTimestamp) do process(record); commitOffset(record.timestamp), i.e. it periodically consumes a batch of "fresh" data, processes it sequentially, commits success after each record, and then just sleeps for a reasonable time (to accumulate yet another batch). That works perfectly with a single instance... but how do I load-balance multiple ones?
In exactly the same environment, App0 and App1 concurrently compete for the fresh data. The idea is that the read query executed by App0 must not overlap with the same read query executed by App1, such that they never try to process the same item. In other words, I need SQL-based guarantees that concurrent read queries return different data. Is that even possible?
P.S. Postgres is the preferred option.
The problem description is rather vague on what App1 should do while App0 is processing the previously selected records.
In this answer, I make the following assumptions:
all Apps somehow know what the last certainTimestamp is and it is the same for all Apps whenever they start a DB query.
while App0 is processing, say, the 10 records it found when it started working, new records come in. That means the pile of new records relative to certainTimestamp grows.
when App1 (or any further App) starts, it should process only those new records relative to certainTimestamp that are not yet being handled by other Apps.
yet, if an App fails/crashes, the unfinished records should be picked up the next time another App runs.
This can be achieved by locking records in many SQL databases.
One way to go about this is to use
SELECT ... FOR UPDATE SKIP LOCKED
This statement, in combination with the range selection since(certainTimestamp), selects and locks all records that match the condition and are not currently locked.
Whenever a new App instance runs this query, it only gets "what's left" to do and can work on that.
This solves the problem of "overlap", i.e. of working on the same data.
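A minimal sketch of that query in Postgres (9.5 or later), assuming a hypothetical records table with id, payload, and created_at columns (:certainTimestamp stands in for the bound parameter):

BEGIN;

-- each App runs this; rows locked by another App are silently skipped,
-- so no two Apps ever receive the same rows
SELECT id, payload
FROM records
WHERE created_at > :certainTimestamp
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED;

-- process the rows here, then COMMIT to release the locks
COMMIT;

The locks live only as long as the transaction, so if an App crashes mid-batch, its transaction rolls back and its rows become available to the next App, which covers the crash assumption above.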
What's left is then the definition and update of the certainTimestamp.
In order to keep this answer short, I won't go into that here; I'll just leave the OP with the pointer that this needs to be thought through properly, to avoid situations where, e.g., a single record that cannot be processed for some reason keeps certainTimestamp at a permanent minimum.
Background:
Two nights ago the old-as-hell and very poorly designed website for the company I work for got attacked by a bot that submitted about 5000+ phony orders. In the course of deleting all of those false orders from the database, SQL Management Studio crashed, and the application had to be stopped via Task Manager and restarted. After that I was getting optimistic concurrency control errors when trying to delete some of the fake records, and had to complete the cleanup via a DELETE statement.
(Yes, I KNOW it's generally bad practice to delete records from the results pane, but for people like me who aren't actually programmers but get stuck with the IT work because we're the only ones who know how to find the on switch, it makes me less paranoid about deleting a record I didn't mean to.)
Ever since then, there is a specific page in the admin section of the site that takes a VERY long time to perform a SELECT query for a specific range. The query will complete if you sit there long enough, but here's a screenshot of the ColdFusion error box that comes up with it:
ColdFusion error message
I suspect that between the bot attack and Management Studio crashing in the middle of a DELETE query, part of the table is corrupted, which is why it exceeds the allowable time limit. I don't know if our webhost has a backup of the database (I've been in contact with them the last couple of days).
What tools can I use to check for and repair errors on that table?
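For reference, SQL Server's built-in consistency checker is DBCC; a minimal sketch (the database and table names are hypothetical):

-- check one table for corruption
DBCC CHECKTABLE ('dbo.Orders');

-- or check the entire database
DBCC CHECKDB ('MyDatabase') WITH NO_INFOMSGS;

If these report errors, the usual advice is to restore from a known-good backup before reaching for DBCC's repair options, since REPAIR_ALLOW_DATA_LOSS can do exactly what its name says.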
I'm starting at a new company as a web applications developer. I'm not a DB guru, but being a developer I've obviously worked with databases my entire career. The company keeps randomly getting locking issues that hold up the internal application and lock everyone out. They then have to kill a service to correct the issue and free up SQL Server. I'm wondering: if a long query is running and the user thinks the system froze and closes the app, will that freeze up SQL Server? Basically, if the application that issued the query does a hard exit in the middle of it, does SQL Server dispose of that thread, or does it leave it hanging, in turn holding up the threads that are waiting on that lock?
At prior companies, if a table was currently locked, we alerted the end user so they were aware of it and didn't think it was an error. I'm wondering if we should do the same here?
Thanks
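One way to see whether an abandoned session is still holding things up is to query the blocking DMVs; a minimal sketch, assuming VIEW SERVER STATE permission:

-- list requests that are currently blocked and which session is blocking them
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;

A session found at the head of the blocking chain can then be removed with KILL <session_id>, which is a gentler alternative to killing the whole service.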
I've got an app at work I support that uses a SQL Server 2008 DB (vendor created/supported app). One of the things this app does is load records into ETL tables in the DB all day to be moved to a data warehouse.
Unfortunately, the app is having lots of problems with the ETL tables right now and the vendor has no monitoring solution. I have no access to the DB to add a stored procedure or anything, but I can run a COUNT(*) on the ETL tables to see if things are getting out of hand.
I have managed to write a VB.NET app that will return the COUNT of rows in these ETL tables so I can keep an eye on things, but it only returns the counts when I fire a button event.
I've never written an app that runs/updates "in real time" before, and I'm looking for some guidance on how I can create an app that would update these COUNT values in as close to real time as possible.
Any guidance would be greatly appreciated!
You could achieve that by writing a console application, since you seem used to .NET.
The console application runs and you can read the values by using Console.WriteLine() and Console.ReadLine() in your Program.cs. Or you could write the record counts to a table, or send an email.
As for "real time": the console application can be scheduled to run, e.g. via a task in Task Scheduler or a SQL Agent job, or it can be run by launching the exe directly. As a rough example, you could send yourself an email every 10 minutes by creating a task that launches the console app every 10 minutes.
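A minimal sketch of the query such a scheduled app (or Agent job) could fire; the ETL table names here are hypothetical:

-- snapshot the row counts of the ETL tables in one result set
SELECT 'etl_orders' AS table_name, COUNT(*) AS row_count, GETDATE() AS checked_at
FROM dbo.etl_orders
UNION ALL
SELECT 'etl_customers', COUNT(*), GETDATE()
FROM dbo.etl_customers;

Since this is read-only, it works even without rights to create objects in the vendor's database.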
If you're using a Windows Forms app, just add a Timer object that fires off the SQL query. As an added bonus, you could include fields on the form to control how often the timer fires, to get the resolution that's right for you.
You can use the Timer control in Console apps too, of course.
I'm running Firebird 2.5 (and have also tried earlier versions) on Windows. Every day after 12:00 PM, insert/update queries on one specific table hang, but complete successfully by about 12:35, no matter when they were started. It seems as if Firebird is doing some kind of maintenance on the table that takes half an hour to complete, during which time the table cannot be written to (though reads stay fast). The table itself is really small, some 10,000 rows, compared to the millions of rows in our other tables, and the other tables do not get stuck.
I haven't been able to find any reason or solution. I tried dumping the table and restoring it, which didn't help; I tried switching between SuperServer and Classic and changed versions, with no success.
Has anyone experienced a problem like this?
No. Firebird doesn't have any internal maintenance procedures bound to a specific time of day. It seems there is some task on your server scheduled to run at 12:00 PM, or there are network users of the server who start some heavy access at 12:00 PM.
The only maintenance FB does is "garbage collection" (getting rid of old record versions), and this is done on a "when needed" basis (usually when records are selected; see GCPolicy in firebird.conf), not at some predefined time.
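For reference, the relevant firebird.conf entry looks like this (a sketch; with Classic the server effectively always runs cooperative garbage collection):

# firebird.conf
# GCPolicy = cooperative | background | combined
GCPolicy = combined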
Do you experience this hang only during those certain hours, or is it always slow to insert into that table? Have you checked the server load during the slowdown (i.e. in Task Manager, is the CPU maxed out)? Anyway, here are some ideas to check:
What constraints/triggers do you have on the table? If they involve extensive checks (e.g. against other tables containing millions of rows), this could be the reason inserts take so long (see the query sketch after this list for a way to look them up).
Perhaps there is some other service which is triggered at that time? E.g. do you have a cron job that backs up the DB at that time? Or perhaps some other system service runs at that time with higher priority and slows the server down?
Do you have the trace service active for the table? See fbtrace.conf in the Firebird root directory. If it is active, extensive logging might be the cause of the slowdown; if it isn't active, using it might help you find the cause.
What are the settings for ForcedWrites/UnflushedWrites (see firebird.conf)? Does changing them make a difference?
Is there something logged for this troublesome timeframe in firebird.log?
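For the first point, a table's triggers can be listed from the system tables; a sketch (replace MYTABLE with the actual table name, which Firebird stores in uppercase):

-- list user-defined triggers on the table
SELECT RDB$TRIGGER_NAME, RDB$TRIGGER_TYPE, RDB$TRIGGER_INACTIVE
FROM RDB$TRIGGERS
WHERE RDB$RELATION_NAME = 'MYTABLE'
AND COALESCE(RDB$SYSTEM_FLAG, 0) = 0;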
To me it looks like you have a process which starts at 12:00 and does something which locks the entire table. Use the monitoring tables or the trace manager to see if there is any connection or active transaction which looks suspicious.
I also suspect your own transactions are started with the WAIT option and without a LOCK TIMEOUT; you might want to change this to NO WAIT, or to WAIT with a LOCK TIMEOUT, so that your transactions either fail immediately or fail after the timeout.
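A sketch of both suggestions, using the Firebird 2.5 monitoring tables and the transaction options:

-- who is connected and what are they actively running right now?
SELECT a.MON$ATTACHMENT_ID, a.MON$USER, a.MON$REMOTE_ADDRESS, s.MON$SQL_TEXT
FROM MON$ATTACHMENTS a
JOIN MON$STATEMENTS s ON s.MON$ATTACHMENT_ID = a.MON$ATTACHMENT_ID
WHERE s.MON$STATE = 1;  -- 1 = statement is executing

-- and for your own transactions, fail fast instead of waiting indefinitely:
SET TRANSACTION WAIT LOCK TIMEOUT 10;
-- or: SET TRANSACTION NO WAIT;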
My suggestion is to use the TRACE API in 2.5 to track down what is happening near or around that time. That should get you more information about what is going on.
I use this for debugging: http://upscene.com/products.misc.fbtm.php. It's kinda buggy itself, but when it works it's a godsend.
Are some client connections going DOWN at 12:00 PM? I had a similar problem on a table of 70,000 records:
Client "A" has a permanently open DB connection with something like "select * from TABLE". This is a read-only transaction, but reason enough for the server to keep record versions. Why?
Client "B" made massive updates to this table, and the server tries to preserve the world as it was when "A" started her "select". This is normal for transaction-capable DB servers, and it's implemented by creating copies of the record data before it is updated.
So in my case, 170,000 record versions existed for this TABLE. You can measure this with
gstat -r -t TABLE db.fdb | grep versions
If Client "B" goes down, the count of record versions stops growing. Client "A" is the guilty one, pinning all these versions and forcing the server to hold them. Finally, when Client "A" goes down (or, for example, a firewall rule cuts all pending connections), Firebird is happy to start getting rid of the now-useless record versions.
This "sweep"(?) is badly programmed (even in 2.5.2): CPU usage sits at 3% and it clears fewer than 10,000 versions per minute, so this TABLE ends up with a performance of about 2%.