I accidentally deleted records from a Virtuoso graph DB. Does anyone know how to undo the delete and restore the data?
In the simple default case, you cannot undo a delete such as you describe.
If you were taking backups, and have one from the right point in time, you can restore that.
If you were retaining transaction logs or other audit trail, you can replay these.
More details about your specific situation may change some of the above answers, but this is unlikely.
(ObDisclaimer: OpenLink Software produces Virtuoso, and employs me.)
Related
I am developing a brand new product at work that is supposed to go live soon. It's predominantly an ETL product that will be dealing with enormous volumes of data in a queue type operation. So records come in, we do work on them, and then they are picked up and sent back out. So everything that's being done in this system is done over and over again until all the records have completed processing.
I'm being told by my boss (who is open and reasonable, this isn't a demand) that I should add NOLOCK hints to all the queries.
I'm torn on this because I've always read that it's bad practice. I've also read about the READPAST hint and I'm thinking that might be a good alternative, but I want to get others' opinions as no one in my organization has used it before.
My understanding is that READPAST would pick up any records that aren't locked, and would just ignore locked records, which in some systems I could see being a problem. In this system where the job will run again a few minutes later and will likely pick up the previously locked record, I don't see that it would matter. It's not super time-sensitive, so if a record takes a few minutes longer because it's locked, that's acceptable.
Wondering what others thoughts on this are? I realize this is not a replacement for proper indexing and I'm working on that as well.
I need to synchronize tables between 2 databases daily, the source is MSSQL 2008, the target is MSSQL 2005. If I use UPDATE, INSERT, and DELETE statements (i.e. UPDATE rows that changed, INSERT new rows, DELETE rows no longer present), will there be performance improvements if I perform the DELETE statement first? i.e. so that the UPDATE statement doesn't look at rows that don't need to be updated, because they will be deleted.
Here are some other things I need to consider. The tables have 1-3 million+ rows, and because of the amount of transactions and business requirements, the source DB needs to remain online, and the query needs to be as efficient as possible. The job will be run daily in a SQL server agent job on the target DB. On top of that, I am a DB rookie!
Thanks StackOverflow community, you are awesome!
I'd say, first you do delete, then update then insert, so you don't have to update rows which will be deleted anyway and you'll not update rows which are just inserted.
But actually, have you seen SQL Server merge syntax? It could save you a great amount of code.
update I have not checked performance of MERGE statement against INSERT/UPDATE/DELETE, here's related link given by Aaron Bertrand for more details.
Rule of Thumb: DELETE, then UPDATE, then INSERT.
Performance aside, my main concern is to avoid any potential Deadlocks when:
Updating something you will immediately Delete.
Inserting something you may immediately try to Update.
If you only modify what is necessary and use transactions correctly, then you may use any order.
P.S. Someone suggested using MERGE - I've tried it a few times and my preference is to never use it.
I think Roman's answer is what you were looking for in your current situation: DELETE, UPDATE, INSERT (or MERGE.)
Now there are other possible routes which can make things even faster, but with a rather different process:
1. Consider saving all orders in a file that you, once in a while, run against the target
Assuming both databases are exactly the same, for each SQL order that modifies the 2008 database, save that order in a .sql file which you later execute against the 2005 database. You have to consider locking the file while writing to it, and maybe have some kind of redundancy. However, this means you need no access to the 2008 database at all while doing the work on the 2005 database. In other words, no side effects to the 2008 database speed.
Pitfall: you may miss a statement and the destination will not be an exact equivalent...
2. Ongoing replication
I do not know about MSSQL enough to tell you of a good tool to do automatic replication (see here: http://technet.microsoft.com/en-us/library/ms151198.aspx), but I'd bet you can find a good tool. MySQL (http://dev.mysql.com/doc/refman/5.0/en/replication.html) and PostgreSQL (http://wiki.postgresql.org/wiki/Streaming_Replication) have such tools and those are all free.
This would be the solution I would choose. Depending on the tool you use, it can be really very well optimized meaning that the impact on the live system will be minimal and the 2005 duplicate will be up to date within seconds (depending on whether it is a long distance remote connection or not, the amount of work, the setup of each server, the Internet connections, etc.)
The pitfall is obviously that it adds an ongoing process on the database, but if you find an MSSQL tool that works like the streaming replication of PostgreSQL, it makes use of a copy of the journal which means it is dead fast (no heavy use of disk I/O.)
3. Cluster Database (like Cassandra)
This would involve a change of database which I'm totally sure you're not ready to do (especially because most of those systems do not offer SQL,) but I thought that it would be a good thing to talk about in your situation.
A system like Cassandra (http://cassandra.apache.org/) automatically replicate its data on many computers. It can actually be setup to replicate all the data 100% or X% of data per computer with redundancy in case of failure (a computer that breaks down). This alleviates the need for a specific copy on a separate computer because the performance can be increased simply by adding a few nodes to your system. (At less than $1,000 a computer, it is worth it! Frankly you could create a Peta Byte system for $50k or less and end up with something a lot faster than any SQL database...)
The main problem is that the use of those clusters is completely different than SQL. But that could be a solution for big businesses having large databases that need to be really fast and they do not want to invest in a mini-computer (think Cobol and $250k computers that manage 100 million rows in a few milli-seconds...)
With Cassandra you can run extremely heavy batch processes on back end computers that do not make a dent to the front end system!
We all know about techniques to prevent db deadlocks - acquire locks in the same order, etc. But at some point, systems under pressure may simply suffer from deadlocks here and there. Should we simply accept that and always be prepared to retry when a deadlock occurs or should deadlocks be considered absolutely verboten and should we do everything in our power to prevent them?
The answer is yes.
You should do everything in your power to prevent them, but are you ever going to be satisfied that you've made them impossible?
Do everything in your power to prevent them, and be prepared to retry when they occur. :)
Keep in mind that "doing everything in your power" can mean things like queueing batch updates, making inserts into temp tables and then merging those into the main tables later and other non-trivial techniques. Be sure to check your transaction isolation level and your lock escalation policy.
This will probably be closed, but the world is trending to NoSQL solutions to this problem, breaking problems up so that guaranteed consistency isn't required from the datasource meaning that locks aren't required.
Facebook would be a good example of this, it doesn't matter when everyone sees your update, or if different users around the world see different versions of your profile. As long as the update works or eventually fails, that is good enough.
In essence this is a followup of this question. I'm beginning to feel that I should give up the whole idea, but I'll give it one more shot.
What I want is pretty much like a DB transaction. It should track my changes to the DB and then in the end allow me to either commit or rollback them. If I insert an object, I should get it back in my next (appropriate) SELECT query. If I delete it, future SELECT queries should not return it. Etc.
But there is one catch - this transaction would be very long running. It would start when the user opened a form (I'm talking about Windows Forms here), and the commit/rollback would be when the user closed it(with OK/Cancel). So it could take anywhere between seconds and days. This requirement rules out a standard DB transaction because that would lock the tables/rows it touched, and other users wouldn't be able to use the system. Also the transaction should not commit ANY changes to the DB until it was really committed. So if one user makes some changes, others don't see them until OK button is hit. This prevents errors in case the computer crashes or is disconnected from the network.
I'm quite OK if the solution puts constraints on my model (I'm using MSSQL 2008, btw). I can design the DB/code any way I like. I'm also fine with the idea that a commit could fail because someone already modified one of the objects my transaction touched.
Is there anything like this? I looked at NHibernate.Burrow, but I'm not sure that that's the thing I want.
Added: It's the very beginning of the project so I'm not tied to NHibernate. I started out with it but I can still change easily.
As far as I can judge, DataObjects.Net supports exactly this concept via DisconnectedState. The feature is very new (released just few weeks ago), its preliminary documentation is here. WPF sample for DataObjects.Net uses it for UI transactions.
I'm not sure if it is mentioned there, but DisconnectedState, as well as its OperationLog can be serialized. So its cached state can survive even application restarts.
I don't think anyone will implement this in the NHibernate core, because nobody will use it. Viewmodel is not the same model as domain model.
This is not a direct answer to your question, but this is the sort of thing that WWF (gotta love the name) was set out to solve (not that it did so at least by v 3.5).
If you're still following this, Ayende Rahien has an article in MSDN magazine http://msdn.microsoft.com/en-us/magazine/ee819139.aspx about the session per form/presenter approach. Also take a look at chapter 5 of the NHibernate book http://manning.com/kuate/ (sample chapter available), the one on transactions and conversations.
As long as you delay the flush/transaction till the ok button is pressed, it should work (depending on the flush mode). But complete isolation is a difficult ask because your session will be able to access data that has been committed by other sessions when dealing with multiple entities. You will have to think about handling such issues.
As an aside, how would you deal with this situation if you don't use NHibernate?
EclipseLink has limited supported for such a beast. They call it "Conforming" and they implemented it in the "unit of work" context.
A DBA that my company hired to troubleshoot deadlock issues just told me that our OLTP databases locking problems will improve if we set the transaction level to READ COMMITTED from READ UNCOMMITTED.
Isn't that just 100% false? READ COMMITTED will cause more locks, correct?
More Details:
Our data is very "siloed" and user specific. 99.9999999 % of all user interactions work with your own data and our dirty read scenarios, if they happen, can barely effect what the user is trying to do.
Thanks for all the answers, the dba in question ended up being useless, and we fixed the locking issues by adding a single index.
I regret that I didn't specify the locking problems were occurring for update statements and not regular selects. From my googing the two different query types have distinct solutions when dealing with locking issues.
That does sound like a bit of a rash decision, however without all the details of your environment it is difficult to say.
You should advise your DBA to consider the use of SQL Server's advanced isolation features, i.e. the use of Row Versioning techniques. This was introduced to SQL Server 2005 to specifically address issues with OLTP database that experience high locking.
The following white paper contains quite complicated subject matter but it is a must read for all exceptional DBA's. It includes example of how to use each of the additional isolation levels, in different types of environments i.e. OLTP, Offloaded Reporting Environment etc.
http://msdn.microsoft.com/en-us/library/ms345124.aspx
In summary it would be both foolish and rash to modify the transaction isolation for all of your T-SQL queries without first developing a solid understanding of how the excessive locking is occuring within your environment.
I hope this helps but please let me know if you require further clarification.
Cheers!
Doesn't it depend on what your problem is: for example if your problem is a deadlock, mightn't an increase in the locking level cause an earlier acquisition of locks and therefore a decreased possibility of deadly embrace?
It the data is siloed and you are still getting deadlocks then you may simply need to add rowlock hints to the queries that are causing the problem so that the locks are taken at the row level and not the page level (which is the default).
READ UNCOMMITTED will reduce the number of locks if you are locking data because of SELECT statements. If you are locking data because of INSERT, UPDATE and DELETE statements then changing the isolation level to READ UNCOMMITTED won't do anything for you. READ UNCOMMITTED has the same effect as adding WITH (NOLOCK) to your queries.
This sounds scary. Do you really want to just change these parameters to avoid deadlocks? Maybe the data needs to be locked?
That said, it could be that the DBA is referring to the new (as of SQL Server 2005) READ COMMITTED SNAPSHOT that uses row versioning and can eliminate some kinds of deadlocks.
http://www.databasejournal.com/features/mssql/article.php/3566746/Controlling-Transactions-and-Locks-Part-5-SQL-2005-Snapshots.htm