I am currently analyzing a database and happened to find two datasets whose origin is unknown to me. The problem is that they shouldn't even be in there...
I checked all insertion scripts and I'm sure these datasets are not inserted by them. They are also not in the original dump file.
The only explanation I can come up with at the moment is that they are inserted via some procedure call in the scripts.
Is there any way for me to track down the origin of a single dataset?
best regards,
daZza
Yes, you can.
Assuming your database is running in ARCHIVELOG mode, you can use LogMiner to find which transactions did the inserts. See Using LogMiner to Analyze Redo Log Files. It will take some serious time, but it might be worth it.
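For illustration, here is a rough sketch of the LogMiner calls involved; the archived log path and table name are placeholders you would replace with your own, covering the time window in which the rows could have appeared:

-- Add the archived log(s) to analyse and start LogMiner using the online catalog
BEGIN
  DBMS_LOGMNR.ADD_LOGFILE(
    logfilename => '/u01/app/oracle/arch/arch_0001.log',  -- placeholder path
    options     => DBMS_LOGMNR.NEW);
  DBMS_LOGMNR.START_LOGMNR(
    options => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
END;
/

-- SQL_REDO shows the statement that was executed, SQL_UNDO how to reverse it
SELECT timestamp, username, session_info, sql_redo
FROM   v$logmnr_contents
WHERE  operation = 'INSERT'
AND    seg_name  = 'YOUR_TABLE';  -- placeholder table name

BEGIN
  DBMS_LOGMNR.END_LOGMNR;
END;
/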
I have a new idea and a question that I would like to ask you.
We have an on-premise / in-house CRM application. We use the application more or less 24x7. We also run billing and payroll on the same CRM database, which is OLTP, and the same goes for SSRS reports.
It looks like whenever we do an operation in the front end that inserts and updates a couple of entities at the same time, our application freezes until that process finishes, e.g. extracting payroll for 500 employees for their activities during the last 2 weeks. Basically it summarizes total working hours, pulls those numbers from the database, and writes/updates the record that marks the extract as complete. So for 500 employees we are looking at around 40K-50K rows of INSERT/SELECT/UPDATE statements combined.
Nobody can do anything while this process runs! We are considering the following options to take care of this issue:
Running this process in off-hours
OR making a copy of the Dynamics CRM DB and doing these operations (extracting thousands of records and running multiple reports) on the copy.
My questions are:
How do we create the copy in the first place, and where should we create it (best practices)?
How do we keep it synchronized in real time?
If we only run SELECT statements on the copy DB, that's fine, but if we do any INSERT/UPDATE on the copy, how do we reflect that on the actual live DB? In short, how do we make sure the original and the copy stay synchronized with each other in real time?
I know I asked too many questions, but being a SQL person stepping into the CRM team and providing suggestions, you know what I am trying to say.
Thanks, folks, for any suggestions in advance.
Well, to answer your question regarding a live "copy" of a database, a good solution is an AlwaysOn availability group.
https://blogs.technet.microsoft.com/canitpro/2013/08/19/step-by-step-creating-a-sql-server-2012-alwayson-availability-group/
Though I don't think that is what you are going to want in this situation. AlwaysOn availability groups are typically for database instances that require very short failover windows. For example: if the primary DB server in the cluster goes down, it fails over to a secondary in a second or two at most, and the end users only notice a slight hiccup.
What I think you would find better is to look at the insert statements that are hitting your database server and see why they are preventing you from pulling data. If they are truly locking the table, changing a large portion of your reads to "NOLOCK" reads might help remedy your situation, as sketched below.
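For example, a reporting query could be written roughly like this (the table and column names are made up for illustration); WITH (NOLOCK) reads the data without waiting on the payroll transaction's locks, at the cost of possible dirty reads:

SELECT e.EmployeeId,
       SUM(a.HoursWorked) AS TotalHours
FROM   dbo.Employee AS e WITH (NOLOCK)
JOIN   dbo.Activity AS a WITH (NOLOCK)
       ON a.EmployeeId = e.EmployeeId
WHERE  a.ActivityDate >= DATEADD(DAY, -14, GETDATE())
GROUP BY e.EmployeeId;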
It would also be helpful to know what kind of resources you have allocated, and whether you have proper indexing on the core tables of your DB. If you don't have proper indexing, a lot of the queries can take longer than normal, causing the locking you're seeing.
Finally, I would recommend table partitioning if the tables you are pulling against are too large. This can potentially help with a lot of disk speed issues and also help optimize your queries if you partition by time segment (i.e. make a new partition every X months, so a query that pulls from one time segment only touches that one data file).
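As a rough sketch (all object names are placeholders, and the boundary dates would be whatever fits your data), monthly partitioning could look like this:

CREATE PARTITION FUNCTION pfActivityByMonth (date)
AS RANGE RIGHT FOR VALUES ('2016-01-01', '2016-02-01', '2016-03-01');

CREATE PARTITION SCHEME psActivityByMonth
AS PARTITION pfActivityByMonth ALL TO ([PRIMARY]);

-- The partitioning column must be part of the clustered key
CREATE TABLE dbo.Activity
(
    ActivityId   bigint       NOT NULL,
    EmployeeId   int          NOT NULL,
    ActivityDate date         NOT NULL,
    HoursWorked  decimal(5,2) NOT NULL,
    CONSTRAINT PK_Activity PRIMARY KEY (ActivityId, ActivityDate)
) ON psActivityByMonth (ActivityDate);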
https://msdn.microsoft.com/en-us/library/ms190787.aspx
I would say you need to focus on efficiency rather than on a "copy database", as your volumes aren't high enough to need anything like that, from the sounds of it. I currently have a SQL Server transactional database running with 10 million+ inserts a day, and I still run live reports against it. You just need the resources and proper indexing to accommodate it.
A friend asked me if there is a way to see past DML statements, and I wasn't really sure how to go about answering that question. What he wants to see is the last set of INSERT statements, which means it could be more than one record. At first I just said to check the latest identity value, but then he asked: what if more inserts were performed at the same time? Can you guys help me out? Is there a DMV I should use that I just don't know about? Thanks.
If you did not prepare for this question, then there is no built-in way to get that information. However, you could use third-party log reader tools to recover (all of) the last statements that were executed against the database. This requires the database to be in the FULL recovery model. You could potentially go back as far as you have log backups with this method.
If you want to be prepared for that question being asked in the future, you have several options.
The most obvious one is Change Data Capture (see the sketch below). You could also write a trigger yourself that records data changes.
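Enabling Change Data Capture is a couple of system procedure calls; the database and table names below are placeholders, and CDC requires an edition that supports it:

USE YourDatabase;  -- placeholder database name
GO
EXEC sys.sp_cdc_enable_db;
GO
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Orders',   -- placeholder table name
     @role_name     = NULL;        -- no gating role
GO
-- Inserts captured so far (__$operation = 2 means INSERT)
SELECT *
FROM   cdc.dbo_Orders_CT
WHERE  __$operation = 2;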
You could also run a trace capturing the SQL:BatchStarting event.
Finally, you could use a third-party network sniffer/logger to capture all statements sent to the server (this, however, requires that connection encryption is not used).
I am working on a simple transform SSIS package to import data from one server and load it to another. Only one table is used on each side.
I wanted to know: since it's just a refresh of the data, does the old data in the table need to be deleted before loading? I need expert advice about what I should do. Should I truncate the old table or use DELETE? What other concerns should I keep in mind?
Please give the justification for your answers; it will help me make the technical case to my lead.
It depends on what the requirements are.
Do you need to keep track of any changes to the data? If so, truncating the data each time will not allow you to track the history of your data. A good option in this case is to stage the source data in a separate table/database and load the required data into another structure (with possible history tracking, e.g., a fact table with slowly changing dimensions).
Truncating is the best option to remove all the data, as it's a minimally logged operation.
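As a quick illustration of the difference (the staging table name is just a placeholder): TRUNCATE is minimally logged, resets identity values, requires ALTER permission, and cannot be used on a table referenced by a foreign key; DELETE is fully logged and fires triggers, so it is slower but sometimes the only option.

TRUNCATE TABLE dbo.StagingCustomer;   -- fast, minimally logged

DELETE FROM dbo.StagingCustomer;      -- fully logged alternative when TRUNCATE is not allowed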
I'm looking to apply continuous delivery concepts to a web app we are building, and I'm wondering whether there is any solution for protecting the database from an accidental erroneous commit, for example a bug that erases a whole table instead of a single record.
How can the impact of this kind of issue be limited according to continuous delivery doctrine, where the application is deployed gradually over segments of the infrastructure?
Any ideas?
Well, first, you cannot tell just from looking at it whether a SQL statement is bad. You might have wanted to delete the entire contents of the table. Therefore it is not physically possible to have an automated tool that detects intent.
So to protect your database, first make sure you are in the FULL recovery model (not SIMPLE) and have full backups nightly and transaction log backups every 15 minutes or so. Now you cannot lose much information no matter how badly a process breaks. Your DBAs should be trained to recover to a point in time. If you don't have any DBAs, I'd suggest the best thing you can do to protect your data is to hire some. This is non-negotiable in any non-trivial database environment, and it is terribly risky not to have trained, experienced DBAs if your data is critical to the business.
Next, you need to treat SQL like any other code: it should be in source control as scripts. If you are terribly concerned about accidental deletions, then write the delete scripts to copy all deleted rows to a staging table, and clear the staging table once a week or so. Enforce this convention in code reviews. Or, better yet, set up an auditing process that runs through triggers (a sketch follows below). Once all records are audited, it is much easier to get back the 150 accidentally deleted rows without having to restore a database. I would never consider running any enterprise application without auditing.
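A delete-auditing trigger along these lines is all it takes (the table and column names here are invented for the example):

CREATE TABLE dbo.Customer_Audit
(
    AuditId    int IDENTITY(1,1) PRIMARY KEY,
    CustomerId int           NOT NULL,
    Name       nvarchar(100) NOT NULL,
    DeletedAt  datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    DeletedBy  sysname       NOT NULL DEFAULT SUSER_SNAME()
);
GO
CREATE TRIGGER trg_Customer_Delete_Audit
ON dbo.Customer
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Copy every deleted row into the audit table so it can be restored later
    INSERT INTO dbo.Customer_Audit (CustomerId, Name)
    SELECT CustomerId, Name
    FROM   deleted;
END;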
All SQL scripts, without exception, should be code-reviewed just like other code. All SQL scripts should be tested on QA and pass before moving to production. This will greatly reduce the possibility of error. No developer should have write rights to production; only DBAs should have that. Therefore each script should be written so that it can simply be run in its entirety, not run one chunk at a time where you could accidentally forget to highlight the WHERE clause. Train your developers to use transactions correctly in the scripts as well.
Your concern is bad data making it into the database. The solution is to use full logging of all transactions so you can back out of the transactions you need to. This would usually be used in the context of full backups/incremental backups/full logging.
SQL Server, for instance, allows you to restore to a point in time (http://msdn.microsoft.com/en-us/library/ms190982(v=sql.105).aspx), assuming you have full logging.
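A point-in-time restore looks roughly like this (the database name, file paths, and timestamp are placeholders; it requires the FULL recovery model and an unbroken log backup chain):

RESTORE DATABASE YourDb
    FROM DISK = N'D:\Backups\YourDb_full.bak'
    WITH NORECOVERY;

RESTORE LOG YourDb
    FROM DISK = N'D:\Backups\YourDb_log.trn'
    WITH STOPAT = '2016-05-01T10:15:00', RECOVERY;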
If you are creating and dropping tables, this could be an expensive solution, in terms of space needed for the log. However, it might meet your needs for development.
You may find that full logging is too expensive for such an application. In that case, you might want to make periodic backups (daily? hourly?) and just keep those around. For this purpose, I've found LightSpeed to be a good product for fast and efficient backups.
One strategy that is commonly adopted is to log incremental SQL change scripts rather than a single collective schema generation, so you can control changes at a much more granular level:
For example:
change 1: UP: add column / DOWN: remove column
change 2: UP: add trigger / DOWN: remove trigger
Once the changes are captured incrementally like this, you can have a simple but efficient script to upgrade (UP) from any version to any version without having to worry about what changes are happening. When the change numbers are linked to builds, it becomes even more effective: when you deploy a build, the database is also automatically upgraded (UP) or downgraded (DOWN) to that specific build. A minimal sketch of such a change pair is shown below.
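As a minimal sketch, change 1 could be a pair of scripts like the following (the table and column names are placeholders):

-- change_0001_up.sql: add the column
ALTER TABLE dbo.Customer ADD LoyaltyTier tinyint NULL;

-- change_0001_down.sql: the matching rollback
ALTER TABLE dbo.Customer DROP COLUMN LoyaltyTier;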
We have a pipeline app which does that at CloudMunch.
I have a table in an Access database.
This Access database is used on a regular basis, basically from 9 to 5.
Someone else has a copy of this exact table. Sometimes records are added, sometimes deleted, and sometimes data within the records is updated.
I need to update the Access database table with the offsite table every hour or so. What is the best algorithm for updating the data? There are about 5000 records.
Would it severely lock up the table for a few seconds every hour?
I would like to publicly apologize for my rude comment to David Fenton.
My impression is that this question ties together pieces you've been exploring with your previous questions:
a file "listener" to detect the presence of a new file and do something with it when found
list files with some extension in a folder
DoCmd.TransferText to pull file data into your database
Insert, Update, Delete records in a table based on an imported set of records
Maybe it's time to give us a more detailed picture of what you're dealing with.
Tony asked if both sites are on the same WAN (Wide Area Network). You replied they are on Windows. Elsewhere you said you're using a network. Please tell us about the network.
I'm still unsure whether you need a one-way or two-way data exchange. You've talked about importing changes from the remote table into the local master table. Do you need to do the same type of operation at the remote site: import changes made to the table at the master site?
Tell us what needs to happen regarding the issue James raised. Can local and remote users ever edit the same record? If they can, how will you resolve the conflict? Similarly, what should happen if a remote user updates a record and a local user deletes their copy of that record?
Based on what you've told us so far, this sounds like a real challenge for Access, made more challenging by the rate of record changes (5,000 per hour). I like the outline Kevin suggested. However your challenge will be more complicated since you also need to account for record deletions at both sites.
It seems like you may have to create something which duplicates Access' Replication feature. Maybe you should look at the Jet Replication Wiki to see if you can modify your design to take advantage of Replication. I can't help you there, and unfortunately you appear to have frustrated David Fenton who is a leading authority on Jet Replication.
If a few seconds of performance is critical, you'd rather move to a better database engine (like SQLite, MySQL, or MS SQL Server). If you want a single file, then SQLite is the best for you. All of these handle concurrent reads and writes far better than a shared Access file, with MySQL and SQL Server locking at the row level.
If you stay with Access, you will probably have to implement a timer to update only a few records at a time.
Before you do anything else, you need to establish the "rules" as far as collisions go.
If a row in the local copy is updated and the same row in the remote copy is also updated, which one is the "correct" version? Ditto for deletions; inserts are even more of a pain, as you can have the "same" set of values but perhaps a different key.
After you have worked out how to handle each of these cases you can then go on to thinking about the implementation.
As other posters have suggested, the way to completely avoid these issues is to switch to SQL Server or any other "proper" database which can be updated over the network by all users and where concurrency issues are handled by the DBMS when the updates are applied.
Other users have already suggested switching to a server-based database, i.e. SQL Server etc. I would echo this and say it is the best way to go. However, if you are stuck with Access and have no choice, then I would suggest you add a field (with an index) along the lines of "Last Updated". You could then export all records that have been modified within a particular time frame, export this file as a CSV, ship it over to the remote site, and import it into the "master" Access database. With a bit of scripting you could automate this process; a rough sketch of the export query follows.
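As a rough sketch, the hourly export query could look like this (the table and "Last Updated" field names are placeholders; the sync script would remember the timestamp of its last successful run and substitute it here):

SELECT *
FROM   tblContacts
WHERE  LastUpdated > #05/01/2016 14:00:00#;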