Updating & selecting from a huge table where the ID starts from 30000 takes more than 2 min - SQL

I have a database containing 2 tables: the first one with more than 2 million records, and the second a history table of the first.
When we try to do a SELECT or UPDATE and the ID (primary key) is more than 30,000, the query takes more than 1 minute to execute.
If I do a truncate and re-import the data so the ID starts from 1, the query takes 1 second.
Why the latency when the ID > 30,000, and is there another solution? Indexes exist on the PK, on a date column, and on an integer priority field.
Even a simple SELECT TOP 10 * FROM table1 has the 2-minute latency.
PS: lots of transactions are done at the same time on the first table (SELECT, INSERT, UPDATE & DELETE).
Thank you,

If you are suffering from lock contention, you have essentially two choices:
Stop all other activity while you do the query. This guarantees a consistent view of the data. (Technically, other steps are required to guarantee consistency with other tables.) You can do this by specifying the WITH (TABLOCKX) query hint.
(You can also use the "serializable" transaction isolation level, which will progressively lock the table from end to end as you read it. You can do clever things with indexes to make this less painful, but that qualifies as "advanced".)
Allow other activity to continue, but accept that the answer you get may be inconsistent with activity that took place while the query was in progress. That is, rows being added during the query may or may not be included, rows being deleted may or may not be included, and rows being updated may show either their previous or their subsequent values. You may also find that some later values are included while some earlier ones are not, so the result won't be temporally consistent. You can do this with the WITH (NOLOCK) hint.
(You can also use the "read uncommitted" transaction isolation level. Generally it is easier just to specify NOLOCK when you need it.)
Depending on what you are doing, either option might be best. If you want to provide rough numbers, or a sort of rolling "top 10" for reporting/monitoring purposes, use option 2. If you must provide something like an accounting number, bank balance, etc, so you must have a checkpoint where you know the exact correct number at an instant in time, use option 1. This is rare - if in doubt you probably want option 2.
The "in between" options, read committed, read snapshot etc, are rarely actually needed, in spite of being the default, and should probably only be used if you understand the situation well enough to know whether you need them or not.

Related

Performance bottleneck at target in informatica mapping

I have an Informatica mapping wherein the soft-delete condition is as follows:
Pk_Src is null and Pk_Tgt is not null then set the active_flag to N.
Based on this condition, the mapping determined that there are 400k records which need to be updated. It's a simple update, but it is taking more than 3 hours using an Update Strategy transformation.
Appreciate your valuable inputs.
Dex.
How many records are in the table, and how many indexes does active_flag appear in? If active_flag is in many indexes, you should consider dropping those indexes before the session starts and recreating them after the session ends. Have you looked through the session log to see which steps are taking the time? There may be more besides the update query. Another strategy to try is increasing your commit interval to 500,000 (so long as your DB undo can withstand that).
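If you do drop and recreate the index around the session, a rough sketch of the pre/post-session SQL (the index and table names are hypothetical, and exact DROP INDEX syntax varies by database):

-- Pre-session SQL: drop the index containing active_flag so the bulk update
-- doesn't pay for index maintenance on every row.
DROP INDEX idx_tgt_active_flag;

-- ... the Informatica session performs its 400k updates here ...

-- Post-session SQL: rebuild the index once the updates are done.
CREATE INDEX idx_tgt_active_flag ON target_table (active_flag);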
In general you should expect very slow performance when doing updates/deletes from an Informatica mapping: 10-100 times slower than inserts. This is due to the fact that there is no 'array'-based update/delete, so each update has to be handled separately and sequentially, and you end up in a situation where 90% of the time is spent sending handshakes back and forth. With inserts, the handshake is only done once per 'array-size' records (say 10,000), so the overhead is negligible.
The best solutions I have found so far is:
use pushdown optimization (the drawback is that the mapping needs to be 100% pushdown-able, which means: no variable ports, both source and target behind the same connection, and much more)
use a stage/apply approach:
In the pre-SQL of the session, drop & create a STAGE table with all the keys you wish to delete
Override the table name for your target to point to this STAGE table
Let the mapping do inserts into the STAGE table
In the post-SQL, do a
DELETE FROM TABLE WHERE EXISTS (SELECT * FROM STAGE WHERE TABLE.ID = STAGE.ID)
I hope you can follow me; a rough sketch of the whole approach is below.
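Putting the stage/apply steps together (STAGE_KEYS, TARGET_TABLE and ID are placeholder names; adjust them to your schema):

-- Pre-SQL of the session: recreate the staging table that will hold the affected keys
-- (ignore the DROP error the very first time, when the table doesn't exist yet).
DROP TABLE STAGE_KEYS;
CREATE TABLE STAGE_KEYS (ID INTEGER);

-- The mapping itself only does fast, array-based INSERTs of the keys into STAGE_KEYS.

-- Post-SQL of the session: apply the change in one set-based statement.
-- For a hard delete:
DELETE FROM TARGET_TABLE
WHERE EXISTS (SELECT 1 FROM STAGE_KEYS WHERE STAGE_KEYS.ID = TARGET_TABLE.ID);

-- For the soft delete in the original question, an UPDATE works the same way:
UPDATE TARGET_TABLE
SET active_flag = 'N'
WHERE EXISTS (SELECT 1 FROM STAGE_KEYS WHERE STAGE_KEYS.ID = TARGET_TABLE.ID);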

What is the best way to ensure consistent ordering in an Oracle query?

I have a program that needs to run queries on a number of very large Oracle tables (the largest with tens of millions of rows). The output of these queries is fed into another process which (as a side effect) can record the progress of the query (i.e., the last row fetched).
It would be nice if, in the event that the task stopped half way through for some reason, it could be restarted. For this to happen, the query has to return rows in a consistent order, so it has to be sorted. The obvious thing to do is to sort on the primary key; however, there is probably going to be a penalty for this in terms of performance (an index access) versus a non-sorted solution. Given that a restart may never happen this is not desirable.
Is there some trick to ensure consistent ordering in another way? Any other suggestions for maintaining performance in this case?
EDIT: I have been looking around and seen "order by rowid" mentioned. Is this useful or even possible?
EDIT2: I am adding some benchmarks:
With no order by: 17 seconds.
With order by PK: 46 seconds.
With order by rowid: 43 seconds.
So any ORDER BY has a savage effect on performance, and using rowid makes little difference. The accepted answer is that there is no easy way to do it.
The best advice I can think of is to reduce the chance of a problem occurring that might stop the process, and that means keeping the code simple. No cursors, no commits, no trying to move part of the data, just straight SQL statements.
Unless a complete restart would be a completely unacceptable disaster, I'd go for simplicity without any part-way restart code at all.
If you want the results in some order and the queried data is unsorted, then you have to sort it anyway and spend some resources on the sorting.
So, there are at least two variants for optimization:
Minimize resources spent on sorting;
Query already sorted data.
For the first variant, Oracle on its own calculates the best plan to minimize data access and overall query time. It may be possible to choose a sort order that matches a unique index the optimizer already uses, but it's a very questionable tactic.
The second variant is about index-organized tables and about forcing Oracle, with hints, to use a specific index. It seems OK if you need to process nearly all records in a specific table, but if the selectivity of the query is high it significantly slows the process, even on a single table.
Think about a table with a surrogate primary key which holds 10 years of transaction history. If you need data only for the previous year and you force ordering by the primary key, then Oracle needs to process the records of all 10 years one by one to find all the records that belong to a single year.
But if you need 9 years of data from this table, then a full table scan may be faster than the index-based choice.
So the selectivity of your query is the key to choosing between a full table scan and sorting the result.
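As a hedged illustration of the second variant (the table name big_table, the index name big_table_pk, and the restart bind variable are all made up), an index hint that returns rows already ordered by the key and allows resuming from the last fetched value:

-- Force the optimizer to walk the primary-key index, which returns rows already
-- ordered by id and lets the consumer resume from the last processed key.
SELECT /*+ INDEX(t big_table_pk) */ *
FROM big_table t
WHERE id > :last_processed_id   -- restart point recorded by the downstream process
ORDER BY id;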
For storing results and restarting the query, a good solution is to use Oracle Streams Advanced Queuing to feed the other process.
All unprocessed messages in the queue are redirected to the exception queue, where they may be processed separately.
Because you don't specify an exact ordering for the selected messages, I suppose you need ordering only to keep track of the unprocessed part of the records. If that's true, then with AQ you don't need ordering at all and may even process records in parallel.
So, finally, from my point of view a buffered queue is what you really need.
You could skip ordering and just update the records you processed with something like SET is_processed = 'Y' or SET date_processed = sysdate. Complete restartability and no ordering.
For performance you can partition by is_processed. Yes, partition key changes might be slow, but it is all about trade-offs.
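A minimal sketch of that approach (the table and column names are assumed):

-- Pick up only rows not yet handled; no ORDER BY needed for restartability.
SELECT *
FROM big_table
WHERE is_processed = 'N';

-- After each row (or batch) is handled, mark it so a restart skips it.
UPDATE big_table
SET is_processed = 'Y',
    date_processed = SYSDATE
WHERE id = :current_id;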

Slow queries on 'transaction' table - sql partition as a solution?

I have a table with 281,433 records in it, ranging from March 2010 to the current date (Sept 2014). It's a transaction table consisting of records that track which stock is currently in and out of the warehouse.
When making picks from the warehouse, the system needs to look over every transaction from a particular customer that was ever made (based on the AccountListID field, which identifies the customer; a customer might on average have about 300 records in the table). This happens 2-3 times per request from the particular .NET application when a picking run is done.
There are times when the database seemingly locks up. Some requests complete with no bother, within about 3 seconds. Others hang for 'up to 4 minutes' according to the end users.
My guess is with 4-5 requests at the same time all looking at this one transaction table things are getting locked up.
I'm thinking about partitioning this table so that the primary transaction table only contains record from the last 2 years. The end user has agreed that any records past this date are unnecessary.
But I can't just delete them, they're used elsewhere in the system. I have indexes already in place and they make a massive difference (going from >30 seconds to <2, on the accountlistid field). It seems partitioning is the next step.
1) Am I going down the right route as a solution to my 'locking' problem?
2) When moving a set of records (e.g. records where the field DateTimeCheckedIn is more than 2 years old) is this a manual process or does partitioning automatically do this?
Partitioning shouldn't be necessary on a table with fewer than 300,000 rows, unless each record is really big. If a record occupies more than 4k bytes, only one row fits on an 8k page, so you have 300,000 pages (about 2,400,000,000 bytes), and that is getting larger.
Indexes are usually the solution for something like this. Taking more than a second to return 300 records from an indexed database seems like a long time (unless the records are really big or network overhead adds to the time). Your table and index should both fit into memory. Check your memory configuration.
The next question is about the application code. If it uses cursors, then these might be the culprit by locking rows under certain circumstances. For read-only cursors, FAST_FORWARD or FORWARD_ONLY READ_ONLY should be fast. It is possible that if the application code is locking all the historical records, you might get contention; after all, this would occur when two records (for different customers) are on the same data page. The solution is to not lock the historical records as you read them, or to avoid using cursors altogether.
I don't think partitioning will be necessary here. You can probably fix this with a well-placed index: I'm thinking a single index covering (in order) company, part number, and quantity. Or, if it's an old server, possibly just add ram. Finally, since this is reading a lot of older data for transactions, where individual transactions themselves are likely never (or at most very rarely) updated once written, you might do better with a READ UNCOMMITTED isolation level for this query.
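A hedged sketch of both suggestions; the table name dbo.Transactions, the index name, and the @AccountListID parameter are guesses based on the question, not the real schema:

-- Covering index so picking-run lookups by customer are resolved from the index alone.
CREATE NONCLUSTERED INDEX IX_Transactions_AccountListID
    ON dbo.Transactions (AccountListID)
    INCLUDE (DateTimeCheckedIn);

-- Read the rarely-updated history without taking shared locks (dirty reads become possible).
SELECT AccountListID, DateTimeCheckedIn
FROM dbo.Transactions WITH (NOLOCK)
WHERE AccountListID = @AccountListID;  -- parameter supplied by the .NET application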

Will one query run faster than multiple queries, if they are deleting the same amount of records

I have a table like this:
Table company
companyid | companyname  | owner
----------+--------------+------
        5 | coffecompany | Mike
        6 | juicecompany | Mike
For some reason, I need to use this:
DELETE FROM company WHERE companyid='5';
DELETE FROM company WHERE companyid='6';
instead of
DELETE FROM company WHERE owner='Mike';
But I wonder whether the second choice runs faster, and if it does, will it run much faster? In the future I may have to use it to delete a large number of records, so I really need to know.
delete from company where companyId in (5, 6); should always be faster, even though the difference might be negligible if, e.g., you've got proper indexes, no concurrent queries, no issues with locking, etc.
Note that my query is for MS SQL; if your database server allows the same construct (i.e., specifying all the values in such a concise way), you should probably use it. If not, go with something like delete from company where companyId = 5 or companyId = 6. Also, don't use string literals if companyid is a number (is the table column actually a number, or text?).
In any case, it gives the server more lee-way in implementing the actual operation, and DB servers tend to be very good at query optimization.
One possible bottle-neck for deletion could be in the transaction logs, however. It might very well be that if you're deleting a huge amount of rows at once, it would be better to do a few separate deletes in separate transactions to fit within transaction size limits.
Generally, SQL is a language that operates on sets of data, so the second query will be much faster for a huge number of rows.
The first choice may be slower because you'll have to send the query text as many times as there are rows to delete. Imagine the network traffic if you want to delete 1,000,000 rows.
On small numbers of rows you probably won't be able to see any difference.
If you are using Oracle, think of using a bind variable:
EXECUTE IMMEDIATE 'DELETE FROM company WHERE companyid = :id' USING 6;
But other than that, there is no specific answer to your question; you need to benchmark it yourself, as it depends on the amount of data, your indexes, etc.
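Wrapped in a complete PL/SQL block, that might look like this (a sketch; the COMMIT placement is up to you):

BEGIN
  -- The bind variable lets Oracle reuse the parsed statement for different IDs.
  EXECUTE IMMEDIATE 'DELETE FROM company WHERE companyid = :id' USING 6;
  COMMIT;
END;
/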
When using a WHERE clause in a query, the RDBMS has to find the result set that satisfies the condition.
Normally the RDBMS will do a full table scan to find the result set, meaning that every record is examined to see whether the condition matches. Depending on the table size, that can be time-consuming.
This approach changes when the column(s) listed in the WHERE condition are indexed.
Indexing is a way of sorting a number of records on multiple fields. Creating an index on a field in a table creates another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing Binary Searches to be performed on it.
As a simplified sample:
A linear search (full-table-scan) on the field A of table T containing N records would require an average of N/2 accesses to find a value.
If field A is indexed, then a sorted binary search requires an average of log2 N block accesses. Assuming N = 1,000,000, we have:
N/2 = 500,000
log2 1,000,000 ≈ 19.93, i.e. about 20
Instantly we can see this is a drastic improvement.
It looks like companyid is the primary key of the company table; if so, any primary key column will be indexed automatically by the RDBMS, and the search will be more efficient than searching by owner.
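So if you routinely delete by owner, an index on that column lets the delete locate the rows through the index instead of scanning the whole table (a sketch based on the table above; the index name is made up):

-- Without this index, DELETE ... WHERE owner = 'Mike' examines every row;
-- with it, the matching rows are found via a range scan on the index.
CREATE INDEX ix_company_owner ON company (owner);

DELETE FROM company WHERE owner = 'Mike';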

T-SQL Optimize DELETE of many records

I have a table that can grow to millions of records (50 million, for example). Every 20 minutes, records that are older than 20 minutes are deleted.
The problem is that if the table has that many records, such a deletion can take a lot of time, and I want to make it faster.
I can't do a TRUNCATE TABLE because I want to remove only the records that are older than 20 minutes. I suppose that when doing the DELETE and filtering the records that need to be deleted, the server is writing a log file or something and this takes a lot of time?
Am I right? Is there a way to disable some flag or option to optimize the delete, and then turn that option back on?
To expand on the batch delete suggestion, I'd suggest you do this far more regularly (every 20 seconds, perhaps); batch deletions are easy:
-- Delete in small batches until no rows older than 20 minutes remain.
WHILE 1 = 1
BEGIN
    DELETE TOP (4000)
    FROM YOURTABLE
    WHERE YourIndexedDateColumn < DATEADD(MINUTE, -20, GETDATE());

    IF @@ROWCOUNT = 0
        BREAK;
END
Your inserts may lag slightly whilst they wait for the locks to release, but they should insert rather than error.
As for the table itself, with this much traffic I'd expect to see it on a very fast RAID 10 array, perhaps even partitioned. Are your disks up to it? Are your transaction logs on different disks from your data files? They should be.
EDIT 1 - Response to your comment
To put a database into SIMPLE recovery:
ALTER DATABASE YourDatabaseName SET RECOVERY SIMPLE;
This basically turns off transaction logging on the given database, meaning that in the event of data loss you would lose all data since your last full backup. If you're OK with that, this should save a lot of time when running large transactions. (Note that while a transaction is running, logging still takes place in SIMPLE, to enable rolling back the transaction.)
If there are tables within your database where you can't afford to lose data, you'll need to leave your database in FULL recovery mode (i.e., every transaction gets logged, and hopefully flushed to *.trn files by your server's maintenance plans). As I stated above, though, there is nothing stopping you from having two databases, one in FULL and one in SIMPLE: the FULL database for tables where you can't afford to lose any data (i.e., you could apply the transaction logs to restore data to a specific point in time), and the SIMPLE database for these massive, high-traffic tables where you can accept data loss in the event of a failure.
All of this is relevant assuming you're creating full (*.bak) backups every night and flushing your log files to *.trn files every half hour or so.
In regards to your index question, it's imperative that your date column is indexed; if you check your execution plan and see a TABLE SCAN, that is an indicator of a missing index.
Your date column, I presume, is a DATETIME with a constraint setting the DEFAULT to GETDATE()?
You may find that you get better performance by replacing that with a BIGINT in the form YYYYMMDDHHMMSS and then applying a CLUSTERED index to that column. Note, however, that you can only have one clustered index per table, so if the table already has one you'll need to use a non-clustered index. (In case you didn't know, a clustered index basically tells SQL to store the data in that order, meaning that when you delete rows older than 20 minutes, SQL can delete them sequentially rather than hopping from page to page.)
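A hedged sketch of that layout (the table, column, and constraint names are placeholders, and FORMAT requires SQL Server 2012 or later):

-- Sortable BIGINT timestamp (YYYYMMDDHHMMSS), populated by a default at insert time.
ALTER TABLE dbo.YourTable
    ADD CreatedKey BIGINT NOT NULL
        CONSTRAINT DF_YourTable_CreatedKey
        DEFAULT (CONVERT(BIGINT, FORMAT(GETDATE(), 'yyyyMMddHHmmss')));

-- Cluster on it (only possible if the table has no clustered index yet), so rows
-- older than 20 minutes sit together and can be deleted sequentially.
CREATE CLUSTERED INDEX CIX_YourTable_CreatedKey
    ON dbo.YourTable (CreatedKey);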
The log problem is probably due to the number of records deleted in the transaction; to make things worse, the engine may be requesting a lock per record (or per page, which is not as bad).
The one big thing here is how you determine the records to be deleted. I'm assuming you use a datetime field; if so, make sure you have an index on that column, otherwise it's a sequential scan of the table that will really penalize your process.
There are two things you may do, depending on the concurrency of users and the timing of the delete:
If you can guarantee that no one is going to read or write while you delete, you can lock the table in exclusive mode, delete (this takes only one lock from the engine), and release the lock.
You can use batch deletes: make a script with a cursor that provides the rows you want to delete, begin a transaction and commit every X records (ideally 5,000), so you keep the transactions short and don't take that many locks.
Take a look at the query plan for the delete process and see what it shows; a sequential scan of a big table is never good.
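A sketch of the exclusive-lock variant (table and column names are assumed; the batched variant looks like the TOP-based loop shown earlier in the thread):

-- Take a single exclusive table lock, delete the old rows, then release the lock at COMMIT.
-- Nobody else can read or write the table while this transaction is open.
BEGIN TRANSACTION;
DELETE FROM dbo.YourTable WITH (TABLOCKX)
WHERE YourDateColumn < DATEADD(MINUTE, -20, GETDATE());
COMMIT TRANSACTION;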
Unfortunately for the purpose of this question, and fortunately for the sake of consistency and recoverability of databases in SQL Server, putting a database into SIMPLE recovery mode DOES NOT disable logging.
Every transaction still gets logged before being committed to the data file(s); the only difference is that the space in the log gets released (in most cases) right after the transaction is either rolled back or committed in SIMPLE recovery mode. This is not going to affect the performance of the DELETE statement one way or another.
I had a similar problem when I needed to delete more than 70% of the rows from a big table with 3 indexes and a lot of foreign keys.
For this scenario, I saved the rows I wanted in a temp table, truncated the original table and reinserted the rows, something like:
SELECT * INTO #tempuser FROM [User] WHERE [Status] >= 600;
TRUNCATE TABLE [User];
INSERT [User] SELECT * FROM #tempuser;
I learned this technique from this link, which explains:
DELETE is a fully logged operation and can be rolled back if something goes wrong
TRUNCATE removes all rows from a table without logging the individual row deletions
In the article you can explore other strategies to resolve the delay in deleting many records; that one worked for me.