Alternative to indexed view on text column - sql

I have a database table I don't own or control being loaded from another system - and all the fields are Text - so obviously it's useless for any queries performance-wise.
I thought to solve the problem by creating an indexed view, which just converts every field to int, date or varchar... But apparently you can't create an indexed view on a text field.
I know I can do a create table as select... but that's a once off, and it won't automatically update if someone does another load into the underlying table.
Is there any way I can make a live table without text columns from one with text columns?

You don't own it or control it, so I guess a trigger is out of the question. I might give Change Tracking a try. You can use it to either sync changes as they come or trigger a reload of your version of the table. If you can't tolerate any delays in the sync, then this might not work for you.
If the updates come in large batches or via a complete reload only once in a while, then triggering a reload might be the way to go. Verify that there are no changes for a minute or so to insure the data is stable before reloading. A job scheduled to run every few minutes could handle the reload.
If doing a faster sync, then a job running a script using a loop and waitfor (seconds or minutes) could process new changes since the last run or loop.
There should be very little overhead for detecting the changes.

Related

Change Tracking w/o an Application to Sync with

I work as part of a two man DBA team running SQL Server 2008 R2 with me being somewhat of an accidental DBA. We recently had an issue where a small table we hardly ever use ended up getting truncated. Both of us swear we didn't do it, but it happened nonetheless.
To avoid the situation in the future, we're interested in implementing change tracking. It's not really necessary for us to preserve the data that was changed so we decided against using change data capture.
With that said, the things I'm reading about change tracking seem to be more about using it to synchronize data with an application rather than simply recording all the changes. Can I use change tracking to simply keep a list of all the changes made in the last 6 months or something? Once I enable it for each database in the SQL Server GUI, where is the info stored? Any other info you may have on implementing this correctly would be great.
Thanks!
From the documentation for change tracking:
The values of the primary key column is only information from the
tracked table that is recorded with the change information. These
values identify the rows that have been changed. To obtain the latest
data for those rows, an application can use the primary key column
values to join the source table with the tracked table.
So, if you're going to try to use this as a mechanism to somehow recover accidental deletions, I think you'll find it lacking. But don't take my word for it. Set up the following test:
In a test environment, set up a dummy table and enable change tracking on it.
Insert some data into the dummy table.
Delete the data
Recover the data using change tracking.
Despite having already dismissed CDC as an option, it sounds more in line with what you're after. CDC does keep track of non-primary key columns so if someone does a data modification accidentally, CDC will keep track of all of the values in the row(s) affected. It has the added benefit of not allowing table truncation because of how it's implemented (it uses the replication log reader).
Additionally, you can configure the CDC cleanup job to automatically purge data after any any amount of time you want (it sounds like 6 months is your retention period, which is completely doable).

Programmatically purge document deletion

I've a database with an agent that periodically delete (via Java agent, "removePermanently" method) all documents in a view and re-create them.
After some month, i've noticed that database size is considerably increased.
Showing database information through this command
sh database <dbpath>
it results that i've a lot of deleted documents (i suppose they are deletion stubs)
Document Type Live Deleted
Documents 1,922 817,378
Compacting database, 80% space was recovered.
Is there a way to programmatically delete stubs definitively to avoid "database explosion"? Or, is there a way to correctly manage this scenario (deletion and creation of documents)?
Don't delete the documents! Re-use them. That's the best answer. Seriously. Take the existing documents, clear the fields and set Form := "Obsolete". Modify the selection formula for all your views by appending & Form != "Obsolete" Create a new hidden view called "Obsolete" with selection formula Form = "Obsolete", and instead of creating new documents, change your code to go to the Obsolete view, grab an available document and set new field values (including changing the Form field). Only create new documents if there are not enough available in the Obsolete view. Any performance that you lose by doing this, which really should be minimal with the number of documents that you seem to have, will be more than offset by what you will gain by avoiding the growth and fragmentation of the NSF file that you are creating by doing all the deletions and creating new documents.
If, however, there's no possible way for you to do that -- maybe some third party tool that is outside of your control is creating the documents -- then it's important to know if the database you are talking about is replicated. If it is replicated, then you must be very careful because purging deletion stubs before all replicas are brought up to date will cause deleted documents to "come back to life" if a replica that has been off-line since before the delete occurs comes back on-line.
If the database is not replicated at all, or is reliably replicated across all replicas quickly, then you can reduce the purge interval. Go to the Replication Settings dialog, find the checkbox labeled "Remove documents not modified in the last __ days". Do not check the box, but enter a small number into the number of days. The purge interval for deletion stubs will be set to 1/3 of this number. So if you set it to 3 the effect will be that stubs are kept for 1 day and then purged, giving you 24 hours to assure that all replicas are up to date. If you need more, set the interval higher, maintaining the 3x multiple as needed. If a server is down for an extended period of time (longer than your purge interval), then adjust your operations procedures so that you will be sure to disable replication of the database to that server before it comes back on line and the replica can be deleted and recreated. Be aware, though, that user replicas pose the same problem, and it's not really possible to control or be aware of user replicas that might go off-line for longer than the purge interval. In any case, remember: do not check the box. To reduce the purge interval for deletion stubs only, just reduce the number.
Apart from this, the only way to programmatically delete deletion stubs requires use of the Notes C API. It's possible to call the required routines from LotusScript, but in my experience once the total number of stubs plus documents gets too high you will likely run into an error and may have to create and deploy a new non-replica copy of the database to get past it. You can find code along with my explanation in the answer to this previous question.
I have to second Richard's recommendation to reuse documents. I recently had a similar project, and started the way you did with deleting everything and importing half a million records every night. Deletion stubs and the growth of the FT index quickly became problems, eating up huge amounts of disk space and slowing performance significantly. I tried to manage the deletion stubs, but I was clearly going against the grain of Domino's architecture.
I read Richard's suggestion here, and adopted that approach. Here's what I did:
1) create 2 views based on form - one for 'active' records, and another for 'inactive' records
2) start the agent by setting autoupdate = false for both views
3) use stampall("form", "inactive") to change all fo the active records to inactive
4) manually refresh the 2 views using notesview.refresh()
5) start importing data. for each record, pull a document out of the pool of inactive records (by walking the 'inactive' view)
6) if if run out of inactive records in the pool, create new ones
7) when import is complete, manually refresh the views again
8) use db.createftindex(0, true) to re-create the FT index
the code is really not that complex, and it runs in about the same amount of time, if not faster, than my original approach.
Thanks Richard!
Also, look at the advanced db properties - several things there that will help optimize the db.
It sounds like you are "refreshing" the contents of the database by periodically deleting all the documents and creating new ones from some other source. Cut that out. If the data are in the Notes database already, leave the document alone. What you're doing is very inefficient.

Is there better way than this way to update listview every 1 second from DataBase?

i have in my application a listview that shows the transport events , this listview should be updated every one second , to follow up the events.
i simply do that by a timer (1000 interval) that declare one connetion object,dataReader... and then fill the listview ,finally, i dispose the connection and another objects (this is every one timer tick).
Now, is there any better way to do that ? maybe better for performance,memory or other somethings?
i'am not expert, so i thinked maybe that is declaring many conncetions every second may making some memory problems :) (correct me if that is wrong)
DataBase Access 2007
VS 2012
Thank You.
Assuming that you are using ADO.NET to access your database, your access model should be fine, because .NET uses connection pooling to minimize performance impacts of closing and re-opening DB connections.
Your overall architecture may be questioned, however: polling for updates on a timer is usually not the best option. A better approach would be maintaining an update sequence in a separate table. The table would have a single row, with a single int column initially set to zero. Every time an update to the "real" data is made, this number is bumped up by one. With this table in place, your program could read just this one number every second, rather than re-reading your entire data set. If your program detects that the number is the same as it was the previous time, it stops and waits for the next timer interval; otherwise, it re-reads the data set.

What's the easiest way to add an index on a live myISAM table?

I have a myISAM table running in production on mySQL, and by doing a few tests, we've found we can tremendously speed up a query by adding a certain compound index. So far so good. However, I am not really about the best way to add this index in a production environment without locking the table for a long time (it's got 27GBs of data, so not so much, but it does take a while).
Do you have any tips? If this was a more sophisticated setup of course we'd have a live replica of all of the data on another machine, and we could safely switch. Unfortunately, we're not there yet, and I would like to speed up this query as soon as possible (it's causing big customer headaches). Is there some simple way to replicate the data and then do a swap-out trick? Some other tricks that I am missing?
UPDATE: Reading about "Online Index Operations" in SQL Server makes me very jealous http://msdn.microsoft.com/en-us/library/ms191261.aspx :)
Thanks!
you can use replication to get downtime on the order of a couple minutes, instead of the hours it might take to create an index on that table.
to set up the slave, see http://dev.mysql.com/doc/refman/5.0/en/replication-howto-existingdata.html
a recommendation i can make to help speed up the process is in step 2 follow the "Creating a Data Snapshot Using Raw Data Files" method. but instead of copying over the wire to the slave, copy to a different location on the master. and bring the master back up as soon at the copy is done and you've made the necessary changes to the config file (set server-id and enabled binary logging). this will minimize your downtime to just a minute or two. once the server is back up, you can copy the copied files to the slave box.
once you have the slave up and running and you have verified everything is replicating properly, you can pause the slave. create the index on the salve. when the index creation is complete, resume the slave. this will catch the slave up to the master. on the master, use FLUSH TABLE WITH READ LOCK. check the slave status to make sure the log position on the master and the slave match. if they do, shut down the slave and copy the files for that table back to the master.
I'm with Randy. We've been in a similar situation, and there are two ways in MySQL to accomplish something like this:
Take down the server while it runs. This is what you'll probably do. It's simple, it's easy, it works. Time to do? Maybe a half hour/45 minutes, dependent on disk bandwidth. See below.
Make a new table with the new index, copy all the data over, pause the server delete the first table, alter the new one to the old name, start the server. Downtime? 10 minutes, maybe, but really complicated.
Option two works, and saves you the downtime of creating the index (if it takes a long time). But it takes more space, it's more complicated (since you have to deal with the new records inserted off the main table, and it will probably lock on MyISAM while copying the data out. Deleting a table will take some time, altering the table to the new name will take some time. It's just really complicated. If you had a 2TB table this might be useful, but for 27G it's probably overkill.
Do you have a second server that is close in specifications to your production server? Load up your most recent backup and do the index there, so you know about how long it will take to add. Then plan for downtime.
InnoDB is better about many things, but new indexes still lock the table. Those abilities that MSSQL (and I think PostgreSQL) have to do those kind of things without locking would be great.
Find your low usage window and take your application offline during the index build. Since you don't have replication or a multimaster or whatever, you're just going to have to bite the bullet on this one. See you at 1am. :-)
Not much you can do with one server here.
If you copy the table and do a dry run, at least you'll find out how long it's going to take without locking the live table, so you can schedule some maintenance time if necessary, or make a decision whether you can just push the button and leave users hanging for a couple of minutes :)
Or schedule it for a quiet time...
at 04:00 /usr/bin/mysql -uXXX -pXXX -e 'alter table mytable add key(col1, col2)'

Database safety: Intermediary "to_be_deleted" column/table?

Everyone has accidentally forgotten the WHERE clause on a DELETE query and blasted some un-backed up data once or twice. I was pondering that problem, and I was wondering if the solution I came up with is practical.
What if, in place of actual DELETE queries, the application and maintenance scripts did something like:
UPDATE foo SET to_be_deleted=1 WHERE blah = 50;
And then a cron job was set to go through and actually delete everything with the flag? The downside would be that pretty much every other query would need to have WHERE to_be_deleted != 1 appended to it, but the upside would be that you'd never mistakenly lose data again. You could see "2,349,325 rows affected" and say, "Hmm, looks like I forgot the WHERE clause," and reset the flags. You could even make the to_be_deleted field a DATE column, so the cron job would check to see if a row's time had come yet.
Also, you could remove DELETE permission from the production database user, so even if someone managed to inject some SQL into your site, they wouldn't be able to remove anything.
So, my question is: Is this a good idea, or are there pitfalls I'm not seeing?
That is fine if you want to do that, but it seems like a lot of work. How many people are manually changing the database? It should be very few, especially if your users have an app to work with.
When I work on the production db I put EVERYTHING I do in a transaction so if I mess up I can rollback. Just having a standard practice like that for me has helped me.
I don't see anything really wrong with that though other than ever single point of data manipulation in each applicaiton will have to be aware of this functionality and not just the data it wants.
This would be fine as long as your appliction does not require that the data is immediately deleted since you have to wait for the next interval of the cron job.
I think a better solution and the more common practice is to use a development server and a production server. If your development database gets blown out, simply reload it. No harm done. If you're testing code on your production database, you deserve anything bad that happens.
A lot of people have a delete flag or a row status flag. But if someone is doing a change through the back end (and they will be doing it since often people need batch changes done that can't be accomplished through the front end) and they make a mistake they will still often go for delete. Ultimately this is no substitute for testing the script before applying it to a production environment.
Also...what happens if the following query gets executed "UPDATE foo SET to_be_deleted=1" because they left off the where clause. Unless you have auditing columns with a time stamp how do you know which columns were deleted and which ones were done in error? But even if you have auditing columns with a time stamp, if the auditing is done via a stored procedure or programmer convention then these back end queries may not supply information letting you know that they were just applied.
Too complicated. The standard approach to this is to do all your work inside a transaction, so if you screw up and forget a WHERE clause, then you simply roll back when you see the "2,349,325 rows affected" result.
It may be easier to create a parallel table for deleted rows. A DELETE trigger (and UPDATE too if you want to undo changes as well) on the original table could copy the affected rows to the parallel table. Adding a datetime column to the parallel table to record the date & time of the change would let you permanently remove rows past a certain age using your cron job.
That way, you'd use normal DELETE statements on the original table, so there's no chance you'll forget to run your special "DELETE" statement. You also sidestep the to_be_deleted != 1 expression, which is just a bug waiting to happen when someone inevitably forgets.
It looks like you're describing three cases here.
Case 1 - maintenance scripts. Risk can be minimized by developing them and testing them in an environment other than your production box. For quick maintenance, do the maintenance in a single transaction, and check everything before committing. If you made a mistake, issue the rollback command. For more serious maintenance that you can't necessarily wait around for, or do in a single transaction, consider taking a backup directly before running the maintenance job, so that you can always restore back to the point before you ran your script if you encounter serious problems.
Case 2 - SQL Injection. This is an architecture issue. Your application shouldn't pass SQL into the database, access should be controlled through packages / stored procedures / functions, and values that are going to come from the UI and be used in a DDL statement should be applied using bind variables, rather than by creating dynamic SQL by appending strings together.
Case 3 - Regular batch jobs. These should have been tested before being deployed to production. If you delete too much, you have a bug, and are going to have to rely on your backup strategy.
Everyone has accidentally forgotten
the WHERE clause on a DELETE query and
blasted some un-backed up data once or
twice.
No. I always prototype my DELETEs as SELECTs and only if the latter gives the results I want to delete change the statement before WHERE to a DELETE. This let's me inspect in any needed detail the rows I want to affect before doing anything.
You could set up a view on that table that selects WHERE to_be_deleted != 1, and all of your normal selects are done on that view - that avoids having to put the WHERE on all of your queries.
The pitfall is that it's unnecessarily complicated and someone will inadvertently forget too check the flag in their query. There's also the issue of potentially needing to delete something immediately instead of wait for the scheduled job to run.
To avoid the to_be_deleted WHERE clause you could create a trigger before the delete command fires off to insert the deleted rows into a separate table. This table could be cleared out when you're sure everything in it really needs to be deleted, or you could keep it around for archive purposes.
You also get a "soft delete" feature so you can give the(certain) end-users the power of "undo" - there would have to be a pretty strong downside in the mix to cancel the benefits of soft deleting.
The "WHERE to_be_deleted <> 1" on every other query is a huge one. Another is once you've ran your accidentally rogue query, how will you determine which of the 2,349,325 were previously marked as deleted?
I think the practical solution is regular backups, and failing that, perhaps a delete trigger that captures the tuples to be axed.
The other option would be to create a delete trigger on each table. When anything is deleted, it would insert that "to be deleted" record into another table, ideally named TABLENAME_deleted.
The downside would be that the db would have twice as many tables.
I don't recommend triggers in general, but it might be what you are looking for.
This is why, whenever you are editing data by hand, you should BEGIN TRAN, edit your data, check that it looks good (for instance that you didn't delete more data than you were expecting) and then END TRAN. If you're using Postgres then you want to create lots of savepoints as well so that a typo doesn't wipe out your intermediate work.
But that said, in many applications it does make sense to have software mark records as invalid rather than deleting them. Add a last_modified date that is automatically updated, and you are all prepared to set up incremental updates into a data warehouse. Even if you don't have a data warehouse now, it never hurts to prepare for the future when preparing is cheap. Plus in the event of manual mistakes you still have the data, and can just find all of the records that got "deleted" when you made your mistake and fix them. (You should still use transactions though.)