I'm running the following SAS command:
Proc SQL;
Delete From Server003.CustomerList;
Quit;
Which is taking over 8 minutes... when it takes only a few seconds to read that file. What could cause a delete to take so long, and what can I do to make it go faster?
(I do not have access to drop the table, so I can only delete all rows)
Thanks,
Dan
Edit: I also apparently cannot Truncate tables.
This is NOT regular SQL. SAS' Proc SQL does not support the Truncate statement. Ideally, you want to figure out what's going on with the performance of the delete from; but if what you really need is truncate functionality, you could always just use pure SAS and not mess with SQL at all.
data Server003.CustomerList;
set Server003.CustomerList (obs=0);
run;
This effectively behaves like a Truncate would: it keeps the dataset/table structure but does not populate it with any data (because of the OBS=0 option).
Are there a lot of other tables which have foreign keys to this table? If those tables don't have indexes on the foreign key column(s), it could take a while for SQL to determine whether it's safe to delete the rows, even if none of the other tables actually has a value in the foreign key column(s).
Try adding this to your LIBNAME statement:
DIRECT_EXE=DELETE
According to SAS/ACCESS(R) 9.2 for Relational Databases: Reference,
Performance improves significantly by using DIRECT_EXE=, because the SQL delete statement is passed directly to the DBMS, instead of SAS reading the entire result set and deleting one row at a time.
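For example, on a LIBNAME statement it might look like this (a sketch only: the ORACLE engine and connection options are placeholders for whatever Server003 actually points at):
libname server003 oracle user=myuser password=mypass path=mydb direct_exe=delete;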
I would also mention that in general SQL commands run slower in SAS PROC SQL. Recently I did a project and moved the TRUNCATE TABLE statements into a Stored Procedure to avoid the penalty of having them inside SAS and being handled by their SQL Optimizer and surrounding execution shell. In the end this increased the performance of the TRUNCATE TABLE substantially.
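One way to keep that work on the DBMS side while still driving it from SAS is explicit pass-through, which hands the statement straight to the database instead of letting PROC SQL process it. This is only a sketch; the ODBC DSN and the stored procedure name are made-up placeholders:
proc sql;
  connect to odbc (dsn="MyServer");
  execute (exec dbo.TruncateCustomerList) by odbc;
  disconnect from odbc;
quit;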
It might be slower because disk writes are typically slower than reads.
As for a way around it without dropping/truncating, good question! :)
You also could consider the elegant:
proc sql; create table libname.tablename like libname.tablename; quit;
It will produce a new table with the same name and the same metadata as your previous table, and delete the old one in the same operation.
Related
Working on redesigning some databases in my SQL SERVER 2012 instance.
I have databases where I put my raw data (from vendors) and then I have client databases where I will (based on client name) create a view that only shows data for a specific client.
Because this data is volatile (Google Adwords & Google DFA), I typically just delete the last 6 days and insert 7 days every day from the vendor databases. Doing this gives me comfort in knowing that Google has had time to solidify its data.
The question I am trying to answer is:
1. Instead of using views, would it be better to use a 'SELECT INTO' statement and DROP the table every day in the client database?
I'm afraid that automating my process using the 'DROP TABLE' method will not scale well long-term. While testing it myself, it seems that performance is improved because it does not have to scan the entire table for the date range. I've also tested this with an index on the 'date' column and performance still seemed better with the 'DROP TABLE' method.
I am looking for best practices here.
NOTE: This is my first post. So I am not too familiar with how to format correctly. :)
Deleting rows from a table is a time-consuming process. All the deleted records get logged, and performance of the server suffers.
Instead, databases offer truncate table. This removes all the rows of the table without logging the rows, but keeps the structure intact. Also, triggers, indexes, constraints, stored procedures, and so on are not affected by the removal of rows.
In some databases, if you delete all rows from a table, then the operation is really truncate table. However, SQL Server is not one of those databases. In fact the documentation lists truncate as a best practice for deleting all rows:
To delete all the rows in a table, use TRUNCATE TABLE. TRUNCATE TABLE
is faster than DELETE and uses fewer system and transaction log
resources. TRUNCATE TABLE has restrictions, for example, the table
cannot participate in replication. For more information, see TRUNCATE
TABLE (Transact-SQL)
You can drop the table. But then you lose auxiliary metadata as well -- all the things listed above.
I would recommend that you truncate the table and reload the data using insert into or bulk insert.
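A rough sketch of that truncate-and-reload pattern (the table, database, and column names here are placeholders, not your actual schema):
TRUNCATE TABLE dbo.ClientData;

INSERT INTO dbo.ClientData (AccountId, ActivityDate, Clicks)
SELECT AccountId, ActivityDate, Clicks
FROM VendorDB.dbo.RawAdwordsData
WHERE ActivityDate >= DATEADD(DAY, -7, CAST(GETDATE() AS date));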
I have to load a text file into a database on a daily basis that is about 50MB in size. I am using Perl DBI to load the file using insert statements into a SQL Server. It is not very performant, and I was wondering if there are better/faster ways of loading from DBI into SQL Server.
You should probably use the BULK INSERT statement. No reason you couldn't run that from DBI.
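A minimal sketch of what the server-side statement might look like (the file path, table name, and delimiters are assumptions about your feed); you can send it through DBI's do() like any other statement:
BULK INSERT dbo.DailyLoad
FROM 'C:\feeds\daily_extract.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 2, TABLOCK);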
When doing large INSERT/UPDATE operations, it's generally useful to disable any indexes on the target table(s), make the changes, and re-enable the indexes. This way, the indexes only have to be rebuilt once instead of rebuilding them after each INSERT/UPDATE statement runs.
(This can also be applied in a zero-downtime way by copying the original table to an unindexed temp table, doing your work on the temp table, adding indexes, dropping the original table, and renaming the temp table to replace it.)
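On SQL Server, for example, the nonclustered indexes can be disabled and rebuilt around the load (the index and table names here are hypothetical; don't disable the clustered index, or the table becomes inaccessible):
ALTER INDEX IX_DailyLoad_Date ON dbo.DailyLoad DISABLE;
-- load the data here, e.g. with the BULK INSERT above
ALTER INDEX IX_DailyLoad_Date ON dbo.DailyLoad REBUILD;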
Another way to speed things up (if not already done) is to use prepared statements and bind-values.
I have the following Oracle SQL:
Begin
-- tables
for c in (select table_name from user_tables) loop
execute immediate ('drop table '||c.table_name||' cascade constraints');
end loop;
-- sequences
for c in (select sequence_name from user_sequences) loop
execute immediate ('drop sequence '||c.sequence_name);
end loop;
End;
It was given to me by another dev, and I have no idea how it works, but it drops all tables in our database.
It works, but it takes forever!
I don't think dropping all of my tables should take that long. What's the deal? And, can this script be improved?
Note: There are somewhere around 100 tables.
"It works, but it takes forever!"
Forever in this case meaning less than three seconds a table :)
There is more to dropping a table than just dropping the table. There are dependent objects to drop as well - constraints, indexes, triggers, lob or nested table storage, etc. There are views, synonyms, and stored procedures to invalidate. There are grants to be revoked. The table's space (and that of its indexes, etc.) has to be de-allocated.
All of this activity generates recursive SQL, queries which select from or update the data dictionary, and which can perform badly. Even if we don't use triggers, views, stored procs, the database still has to run the queries to establish their absence.
Unlike normal SQL, we cannot tune recursive SQL, but we can shape the environment to make it run more quickly.
I'm presuming that this is a development database, in which objects get built and torn down on a regular basis, and that you're using 10g or higher.
Clear out the recycle bin.
SQL> purge recyclebin;
Gather statistics for the data dictionary (will require DBA privileges). These may already be gathered, as that is the default behaviour in 10g and 11g. Find out more.
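On 10g and later the dictionary statistics can be gathered (or refreshed) with the supplied package, for example:
SQL> exec dbms_stats.gather_dictionary_stats;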
Once you have dictionary stats ensure you're using the cost-based optimizer. Ideally this should be set at the database level, but we can fix it at the session level:
SQL> alter session set optimizer_mode=choose;
I would try changing the DROP TABLE statement to use the Purge keyword. Since you are dropping all tables, you don't really need to cascade the constraints at the same time. This action is probably what is causing it to be slow. I don't have an instance of Oracle to test this with though, so it may throw an error.
If it does throw an error, or doesn't go any faster, I would remove the Sequence drop commands to figure out which command is taking so much time.
Oracle's documentation on the DROP TABLE command is here.
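As a sketch of what that change might look like (untested, per the caveat above), here is the table loop from the original block with PURGE added so the dropped tables skip the recycle bin; CASCADE CONSTRAINTS is kept in case other tables still reference the one being dropped:
begin
  for c in (select table_name from user_tables) loop
    execute immediate ('drop table '||c.table_name||' cascade constraints purge');
  end loop;
end;
/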
One alternative is to drop the user instead of the individual tables etc., and recreate them if needed. It's generally more robust, as it drops all of the tables, views, procedures, sequences, etc., and would probably be faster.
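A minimal sketch of that approach, assuming DBA privileges and placeholder user name, password, and roles:
SQL> drop user app_owner cascade;
SQL> create user app_owner identified by some_password;
SQL> grant connect, resource to app_owner;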
We have a table with 150+ million records. We need to clear/delete all rows. A delete operation would take forever because it writes to the transaction logs, and we cannot change our recovery model for the whole DB. We have tested the truncate table option.
What we realized is that truncate deallocates pages from the table and, if I am not wrong, makes them available for reuse, but it doesn't shrink the DB automatically. So, if we want to reduce the DB size, we would really need to run the shrink DB command after truncating the table.
Is this normal procedure? Anything we need to be careful or aware about, or are there any better alternatives?
truncate is what you're looking for. If you need to slim down the db afterwards, run a shrink.
This MSDN reference (if you're talking T-SQL) compares the behind-the-scenes behavior of deleting rows versus truncating.
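For example (the table name, logical data file name, and target size are placeholders; the size argument to DBCC SHRINKFILE is in MB):
TRUNCATE TABLE dbo.BigTable;
DBCC SHRINKFILE (MyDatabase_Data, 10240);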
"Delete all rows"... wouldn't DROP TABLE (and re-recreate an empty one with same schema / indices) be preferable ? (I personally like "fresh starts" ;-) )
This said TRUNCATE TABLE is quite OK too, and yes, DBCC SHRINKFILE may be required afterwards if you wish to recover the space.
Depending on the size of the full database, the shrink may take a while; I've found it to go faster if it is shrunk in smaller chunks, rather than trying to get it back all at once.
One thing to remember with Truncate Table (as well as Drop Table) is that, going forward, it will not work if you ever have foreign keys referencing the table.
As pointed out, if you can't use truncate or drop, you can delete in batches until nothing is left:
SELECT 1
WHILE @@ROWCOUNT <> 0
    DELETE TOP (100000) FROM MyTable
You have a normal solution (truncate + shrink db) to remove all the records from a table.
As Irwin pointed out, the TRUNCATE command won't work while the table is referenced by a foreign key constraint. So first drop the constraints, truncate the table, and then recreate the constraints.
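A sketch of that sequence, with hypothetical table and constraint names:
ALTER TABLE dbo.ChildTable DROP CONSTRAINT FK_ChildTable_BigTable;

TRUNCATE TABLE dbo.BigTable;

ALTER TABLE dbo.ChildTable
    ADD CONSTRAINT FK_ChildTable_BigTable
    FOREIGN KEY (BigTableId) REFERENCES dbo.BigTable (Id);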
If you're concerned about performance and this is a regular routine for your system, you might want to look into moving this table to its own data file, then run shrink only against the target data file!
How do we insert about 2 million rows into an Oracle database table that has many indexes on it?
I know that one option is disabling the indexes and then inserting the data. Can anyone tell me what the other options are?
bulk load with presorted data in index key order
Check SQL*Loader out (especially the paragraph about performance optimization) : it is the standard bulk loading utility for Oracle, and it does a good job once you know how to use it (as always with Oracle).
There are many tricks to speed up the insert; below I list some of them.
If you use sequence.nextval for the insert, make sure the sequence has a big cache value (1000 is usually enough).
Drop indexes before the insert and create them afterwards (make sure you save the create scripts of the indexes before dropping them); while creating them you can use the parallel option.
If the target table has FK dependencies, disable them before the insert and enable them again afterwards. If you are sure of your data you can use the novalidate option (the novalidate option is valid for Oracle; other RDBMS systems probably have a similar option).
If you select and insert, you can give a parallel hint for the select statement and use the append hint (direct-path insert) for the insert itself, as shown in the sketch after this list (the direct-path insert concept is valid for Oracle; other RDBMS systems probably have a similar option).
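Here is a sketch of the parallel/append combination on Oracle (the table names and degree of parallelism are made up; a direct-path insert also needs a commit before the table can be queried again in the same session):
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(t, 4) */ INTO big_target t
SELECT /*+ PARALLEL(s, 4) */ *
FROM staging_rows s;

COMMIT;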
Not sure how you are inserting the records; if you can, insert the data in smaller chunks. In my experience, 50 sets of 20k records is often quicker than 1 x 1000000.
Make sure your database files are large enough before you start; it saves you from database growth during the insert.
If you are sure about the data, besides the index you can disable referential and constraint checks. You can also lower the transaction isolation level.
All these options come with a price, though. Each option increases your risk of having corrupt data in the sense that you may end up with null FK's etc.
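As an illustration of disabling a referential check around the load on Oracle (the constraint and table names are hypothetical):
ALTER TABLE child_table DISABLE CONSTRAINT fk_child_parent;
-- run the bulk insert here
ALTER TABLE child_table ENABLE NOVALIDATE CONSTRAINT fk_child_parent;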
As another option, one can use Oracle's more advanced and faster Data Pump (expdp, impdp) utilities, available from 10g onward. Oracle still supports the old export/import (exp, imp), though.
Oracle provides us with many choices for data loading, some way faster than others:
The Oracle 10g Data Pump import utility
SQL INSERT and MERGE statements
PL/SQL bulk loads using the FORALL operator (see the sketch below)
SQL*Loader
The pros and cons of each can be found here.
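A small sketch of the FORALL approach mentioned above, batching the rows with BULK COLLECT ... LIMIT (the table names are placeholders, and the staging table is assumed to have the same column layout as the target):
DECLARE
  CURSOR c_src IS SELECT * FROM staging_rows;
  TYPE t_rows IS TABLE OF c_src%ROWTYPE;
  l_rows t_rows;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src BULK COLLECT INTO l_rows LIMIT 10000;
    EXIT WHEN l_rows.COUNT = 0;
    FORALL i IN 1 .. l_rows.COUNT
      INSERT INTO big_target VALUES l_rows(i);
  END LOOP;
  CLOSE c_src;
  COMMIT;
END;
/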