How to clean a table of old values? - sql

Basically I have a table that grows very fast as it registers all user impressions. However, most of the data is useless; I only need the latest entry made for each user. (The table is used to authenticate users.)
I'm looking to delete the old data, so the table should end up having a stable number of rows around the total number of registered users.
I could use a cron job, or I could simply add a line at the end of the authentication script that deletes old rows. That line would run on every page load.
DELETE WHERE `Date` < NOW() - SOME INTERVAL
Is this efficient? Should I use a cron job, or something else?

Executing this from the page would add to the login time for every user.
That is a bad approach. Better to use a cron job or some other job-scheduling tool such as Jenkins.

I would say you could CREATE a temp table to hold your latest records,
and then DROP the old table altogether. Dropping is faster than deleting :)
Then rename your temp table to the old name.
This logic could go in your cron job if you prefer.
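The swap described above could be sketched like this for MySQL. The table name `impressions` and the `user_id` column are assumptions (the question only shows a `Date` column), and a regular table is used rather than a true TEMPORARY table so that it can be renamed into place. Rows written between the copy and the rename would be lost, so treat this as a sketch, not a drop-in job.

```sql
-- Hypothetical names: impressions, user_id. Only `Date` comes from the question.
CREATE TABLE impressions_new LIKE impressions;

-- Keep only the newest row per user.
-- (If a user has two rows with the same newest `Date`, both are kept.)
INSERT INTO impressions_new
SELECT i.*
FROM impressions AS i
JOIN (
    SELECT user_id, MAX(`Date`) AS latest
    FROM impressions
    GROUP BY user_id
) AS m
  ON m.user_id = i.user_id
 AND m.latest  = i.`Date`;

-- Swap both names in one statement, then discard the old data.
RENAME TABLE impressions     TO impressions_old,
             impressions_new TO impressions;
DROP TABLE impressions_old;
```

Renaming both tables in a single RENAME TABLE statement means readers never see a moment where `impressions` does not exist.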

Related

How to safely drop / delete a table?

I need to drop a table but I want to be 100% sure the table is unused first. How can I do so with complete certainty?
I've already:
Made sure there are no references to the table in the codebase
Dropped the table in the staging environment over a week ago
Renamed the table in production (I appended _to_delete at the end) over a week ago
Asked other engineers if table is needed
I suppose I can revoke permissions to the table from the application database user as a next step. What I would love is to be able to record table access to know for sure that table is not being referenced, but I wasn't able to find a way to do that over a specific timeframe.
And yes, I realize I'm being a bit paranoid (I could always restore the table from backup if it turns out it's needed) but I'm not a DBA so I'd prefer to be extra cautious.
Create a backup of the table and then drop the table. If the application breaks, you always have the option to re-create it from the backup table.
Paranoia is a virtue for a database administrator.
Revoking permissions seems like a good way to proceed.
To check if the table is used, observe the seq_scan and idx_scan columns of the pg_stat_user_tables entry for the table. If these values don't change, the table is not accessed. These values are not 100% accurate, since statistics are deliberately sent via a UDP socket, but if the numbers don't change at all, you can be pretty certain that the table is unused.
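As a rough PostgreSQL sketch of the two suggestions above: take a reading of the counters, wait for a representative period, and take another. The table name `my_table_to_delete` and the role `app_user` are placeholders, not names from the question.

```sql
-- Record these numbers, then run the same query again after a few days.
-- Unchanged scan and row counters suggest nothing is touching the table.
SELECT relname, seq_scan, idx_scan, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
WHERE relname = 'my_table_to_delete';   -- placeholder table name

-- Optional next step: make any remaining use fail loudly instead of silently.
REVOKE ALL PRIVILEGES ON TABLE my_table_to_delete FROM app_user;  -- placeholder role
```

Querying pg_stat_user_tables does not itself scan the table, so checking the counters does not disturb them.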

SQL Server: copy newly added rows from one table and insert into another automatically

I need to perform some calculations using a few columns from a table. This database table gets updated every couple of hours and generates duplicates in a couple of columns every other day. There is no way to tell which row was inserted first, which affects my calculations.
Is there a way to copy these rows into a new table automatically as data gets added every couple of hours and perform calculations on the fly? This way whatever comes first will be captured into a new table for a dashboard and for other business use cases.
I thought of creating a stored procedure and using a job scheduler to perform this. But I do not have admin access and can not schedule jobs. Is there another way of doing this efficiently? Much appreciated!
Edit: My request for admin access is being approved.
Another way, in addition to what is stated in the other answers:
Make a temp (staging) table.
Make a prod table.
Use a stored procedure to copy everything from the temp table into the prod table after each load has finished.
Use the same stored procedure to clean the temp table after the load is done.
I don't know if this will work for you, but this is in general how we deal with a huge amount of load on a daily basis.
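A minimal T-SQL sketch of such a procedure, assuming a staging table `dbo.StagingTable` and a target `dbo.ProdTable` with columns `Col1` and `Col2` (all hypothetical names). The copy and the cleanup run in one transaction so that a failure leaves the staging rows in place.

```sql
CREATE PROCEDURE dbo.usp_MoveStagingToProd   -- hypothetical name
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- any error rolls back the whole transaction

    BEGIN TRANSACTION;

    -- The exclusive lock keeps new rows out of staging until COMMIT,
    -- so nothing can arrive between the copy and the delete.
    INSERT INTO dbo.ProdTable (Col1, Col2)
    SELECT Col1, Col2
    FROM dbo.StagingTable WITH (TABLOCKX, HOLDLOCK);

    DELETE FROM dbo.StagingTable;

    COMMIT TRANSACTION;
END;
GO   -- batch separator (SSMS / sqlcmd); ends the procedure definition

-- Run after each load has finished:
EXEC dbo.usp_MoveStagingToProd;
```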

How do I lock out writes to a specific table while several queries execute?

I have a table set up in my sql server that keeps track of inventory items (in another database) that have changed. This table is fed by several different triggers. Every 15 minutes a scheduled task runs a batch file that executes a number of different queries that send updates on the items flagged in this table to update several ecommerce websites. The last query in the batch file resets the flags.
As you can imagine there is potential to lose changes if an item is flagged while this batch file is running. I have worked around this by replaying the last 25 hours of updates every 24 hours, just in case this scenario happened. It works, but IMO is kind of clumsy.
What I would like to do is delay any writes to this table until my script finishes, and resets the flags on all the rows that were flagged when the script started. Then allow all of these delayed writes to happen.
I've looked into doing this with table hints (TABLOCK) but this seems to be limited to one query--unless I'm misunderstanding what I have read, which is certainly possible. I have several that run in succession. TIA.
Alex
Could you modify your script into a stored procedure that extracts all the data into a temporary table, using a select statement that applies a lock to the production table? You could then release the lock on the main table and do all your processing in the temporary table (or a permanent table built for the purpose), away from the live system. It will be a lot slower and put more load on your SQL box, but speed shouldn't be an issue if you have a point-in-time snapshot of the data.
If that option is not applicable then maybe you could play with wrapping the whole thing in a transaction and putting a table lock on your production table with the first select statement.
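The transaction variant might look roughly like this in T-SQL. The table `dbo.ChangedItems` and the columns `ItemID` and `Flagged` are assumed names. The lock only helps if every query runs in the same session and transaction, and while it is held the triggers (and the inventory writes that fire them) will wait.

```sql
SET XACT_ABORT ON;
BEGIN TRANSACTION;

-- TABLOCKX takes an exclusive table lock that is held until COMMIT,
-- so writes from the triggers queue up behind this transaction.
SELECT ItemID
INTO #flagged
FROM dbo.ChangedItems WITH (TABLOCKX, HOLDLOCK)
WHERE Flagged = 1;

-- ... run the queries that send updates to the websites, reading #flagged ...

-- Reset only the rows that were flagged when the script started.
UPDATE c
SET c.Flagged = 0
FROM dbo.ChangedItems AS c
JOIN #flagged AS f ON f.ItemID = c.ItemID;

COMMIT TRANSACTION;   -- delayed writes proceed from here
```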
Good luck mate

How to delete a large record from SQL Server?

In a database for a forum I mistakenly set the body to nvarchar(MAX). Well, someone posted the Encyclopedia Britannica, of course. So now there is a forum topic that won't load because of this one post. I have identified the post and run a delete query on it, but for some reason the query just sits and spins. I have let it go for a couple of hours and it just sits there. Eventually it will time out.
I have tried editing the body of the post as well, but that also hangs. While my query runs the entire database hangs, so I shut down the site in the meantime to prevent further requests while it does its thinking. If I cancel my query, the site resumes as normal and all queries for records that don't involve the one in question work fantastically.
Has anyone else had this issue? Is there an easy way to smash this evil record to bits?
Update: Sorry, the version of SQL Server is 2008.
Here is the query I am running to delete the record:
DELETE FROM [u413].[replies] WHERE replyID=13461
I have also tried deleting the topic itself which has a relationship to replies and deletes on topics cascade to the related replies. This hangs as well.
Option 1. Whether this is practical depends on how big the table is and how big the rows are.
Copy data to a new table:
SELECT *
INTO tempTable
FROM replies WITH (NOLOCK)
WHERE replyID != 13461
Although it will take time, the table should not be locked during the copy process.
Drop the old table:
DROP TABLE replies
Before you drop it:
- script the current indexes and triggers so you are able to recreate them later
- script and drop all the foreign keys that reference the table
Rename the new table:
EXEC sp_rename 'tempTable', 'replies'
Recreate all the foreign keys, indexes and triggers.
Option 2. Partitioning.
Add a new bit column, called, say, 'Partition', set to 0 for all rows except the bad one and to 1 for the bad one.
Create a partition function so that there are two partitions, 0 and 1.
Create a temp table with the same structure as the original table.
Switch partition 1 from the original table to the new temp table.
Drop the temp table.
Remove partitioning from the source table and remove the new column.
Partitioning is not a simple topic. There are some examples on the internet, e.g. "Partition switching in SQL Server 2005".
Start by checking whether your transaction is being blocked by another process. To do this, you can run this command:
SELECT * FROM sys.dm_os_waiting_tasks WHERE session_id = {spid}
Replace {spid} with the spid number of the connection running your DELETE command. To get that value, run SELECT @@SPID in that connection before the DELETE command.
If the column sys.dm_os_waiting_tasks.blocking_session_id has a value, you can use activity monitor to see what that process is doing.
To open activity monitor, right-click on the server name in SSMS' Object Explorer and choose Activity Monitor. The Processes and Resource Waits sections are the ones you want.
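If you prefer a query to Activity Monitor, something along these lines should show the same information. The session id 57 is only a placeholder for whatever SELECT @@SPID returned in the DELETE's connection.

```sql
-- In the connection that will run the DELETE:
SELECT @@SPID AS delete_session_id;

-- In a second connection, while the DELETE is hanging:
SELECT session_id,
       blocking_session_id,   -- non-NULL means another session is in the way
       wait_type,
       wait_duration_ms,
       resource_description
FROM sys.dm_os_waiting_tasks
WHERE session_id = 57;        -- placeholder: use the value from @@SPID
```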
Since you're having issues deleting the record and recreating the table, have you tried updating the record?
Something like (changing "body" field name to whatever it is in the table):
update [u413].[replies] set body='' WHERE replyID=13461
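Once you clear out the text from that single reply record you should be able to alter the data type of the column to set an upper bound. The bound has to be at least as long as the longest remaining body, otherwise the ALTER fails with a truncation error. Something like: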
alter table [u413].[replies] alter column body nvarchar(100)
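Put together, the sequence might look like the sketch below. The bound of 4000 is an assumption for illustration (it is the largest non-MAX nvarchar length), and the nullability should match whatever the column currently uses.

```sql
-- 1. Empty the oversized post.
UPDATE [u413].[replies] SET body = N'' WHERE replyID = 13461;

-- 2. Find the longest remaining body, in characters
--    (nvarchar stores 2 bytes per character).
SELECT MAX(DATALENGTH(body)) / 2 AS longest_body
FROM [u413].[replies];

-- 3. Only if longest_body fits, cap the column.
--    NULL here is an assumption; use NOT NULL if that is what the column has now.
ALTER TABLE [u413].[replies] ALTER COLUMN body nvarchar(4000) NULL;
```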

Need to alter column types in production database (SQL Server 2005)

I need help writing a TSQL script to modify two columns' data type.
We are changing two columns:
uniqueidentifier -> varchar(36) * * * has a primary key constraint
xml -> nvarchar(4000)
My main concern is production deployment of the script...
The table is actively used by a public website that gets thousands of hits per hour. Consequently, we need the script to run quickly, without affecting service on the front end. Also, we need to be able to automatically rollback the transaction if an error occurs.
Fortunately, the table only contains about 25 rows, so I am guessing the update will be quick.
This database is SQL Server 2005.
(FYI - the type changes are required because of a 3rd-party tool which is not compatible with SQL Server's xml and uniqueidentifier types. We've already tested the change in dev and there are no functional issues with the change.)
As David said, executing a script against a production database without taking a backup or stopping the site is not the best idea. That said, if you want to change only one table with a small number of rows, you can prepare a script to:
Begin a transaction
Create a new table with the final structure you want
Copy the data from the original table to the new table
Rename the old table to, for example, original_name_old
Rename the new table to original_table_name
End the transaction
This will end with a table that is named as the original one but with the new structure you want, and in addition you maintain the original table with a backup name, so if you want to rollback the change you can create a script to do a simple drop of the new table and rename of the original one.
If the table has foreign keys the script will be a little more complicated, but is still possible without much work.
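A rough T-SQL sketch of that script, using only features available in SQL Server 2005. The table name `dbo.Settings` and the columns `Id` and `Payload` are invented for illustration, and foreign keys are not handled.

```sql
SET XACT_ABORT ON;   -- any error rolls the whole transaction back
BEGIN TRANSACTION;

-- New table with the target column types (hypothetical names).
CREATE TABLE dbo.Settings_new (
    Id      varchar(36)    NOT NULL CONSTRAINT PK_Settings_new PRIMARY KEY,
    Payload nvarchar(4000) NULL
);

-- Copy the data, converting uniqueidentifier and xml explicitly.
-- The table lock keeps writers out until COMMIT.
INSERT INTO dbo.Settings_new (Id, Payload)
SELECT CONVERT(varchar(36), Id),
       CONVERT(nvarchar(4000), Payload)   -- errors if the XML does not fit
FROM dbo.Settings WITH (TABLOCKX, HOLDLOCK);

-- Swap the names; the old table stays around as a fallback.
EXEC sp_rename 'dbo.Settings',     'Settings_old';
EXEC sp_rename 'dbo.Settings_new', 'Settings';

COMMIT TRANSACTION;

-- Manual rollback later, if ever needed:
--   DROP TABLE dbo.Settings;
--   EXEC sp_rename 'dbo.Settings_old', 'Settings';
```

With about 25 rows the lock should be held only briefly, but requests that arrive during the swap will wait until the COMMIT.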
"Consequently, we need the script to run quickly, without affecting service on the front end."
This is just an opinion, but it's based on experience: that's a bad idea. It's better to have a short scheduled downtime (pre-announced if possible) than to take the risk.
The only exception is if you really don't care if the data in these tables gets corrupted, and you can be down for an extended period.
In this situation, based on the types of changes you're making and the testing you've already performed, the risk sounds minimal, and you SHOULD be able to do it safely, but nothing is guaranteed.
First, you need to have a fall-back plan in case something goes wrong. The short version of a MINIMAL reasonable plan would include:
Shut down the website
Make a backup of the database
Run your script
Test the DB for integrity
Bring the website back online
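The database-side steps of that plan could be sketched as follows. The database name `MyDb` and the backup path are placeholders.

```sql
-- With the website shut down: take a full backup first.
BACKUP DATABASE [MyDb]
TO DISK = N'D:\Backups\MyDb_before_type_change.bak'   -- placeholder path
WITH INIT;

-- ... run the column-change script here ...

-- Check integrity before bringing the site back online.
DBCC CHECKDB (N'MyDb') WITH NO_INFOMSGS;
```

If the script fails badly, restoring from this backup returns the database to its state at the start of the maintenance window.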
It would be very unwise to attempt to make such an update while the website is live. You run the risk of being down for an extended period if something goes wrong.
A GOOD plan would also have you testing this against a copy of the database and a copy of the website (a test/staging environment) first and then taking the steps outlined above for the live server update. You have already done this. Kudos to you!
There are even better methods for making such an update, but the trade-off of down time for safety is a no-brainer in most cases.
And if you absolutely need to do this in live then you might consider this:
1) Build an offline version of the table with the new datatypes and copied data.
2) Build all the required keys and indexes on the offline tables.
3) Swap the tables out in a transaction. You could rename the old table to something else as an emergency backup.
sp_help 'sp_rename'
But TEST FIRST all of this in a prod like environment. And make sure your backups are up to date. AND do this when you are least busy.