I want to update the cache size of an existing sequence, and I want to describe a sequence in Oracle the way I can describe a table. How do I do it?
And what are the drawbacks of increasing the cache value of a sequence?
Alter sequence seq_name cache 20;
See the docs.
To get the DDL you can use the dbms_metadata package, which can be used for any object:
select dbms_metadata.get_ddl('SEQUENCE','SEQ_NAME') from dual;
Increasing the cache size is useful when you fetch heavily from the sequence. It has no drawback as long as the cached values actually get used.
But if you cache 1 million values at a time and only 10 are used before the cache is lost (for example at an instance restart), it may not be a good idea, because 999,990 values are lost. The next refill will reserve another 1,000,000 values.
I think the engine does some work to generate them and allocate the values.
For example, in my opinion, a cache 10 times smaller than what you normally use in a session is OK.
UPDATE: Adding David Aldridge's comment:
The usefulness of a large cache is really related to the rate at
which it is used in general, so not just for large selects but for
systems with many sessions all using one value at a time. As
background, the performance problem with a small cache is caused by
the need for the SEQ$ system table to be modified when the cache is
exhausted. It's a small operation but obviously you don't want to be
doing it 100 times a second.
So, by increasing the cache you'll have fewer concurrent sessions contending for the same resource.
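To make the comment above concrete, here is a rough Python sketch that estimates how often the cache has to be refilled for a given request rate. It assumes one SEQ$ update per refill and treats a cache size of 1 as a stand-in for NOCACHE; the function name and the sample rates are made up for illustration.

```python
import math


def seq_dictionary_updates(nextval_calls: int, cache_size: int) -> int:
    """Estimate cache refills (assumed: one SEQ$ update each) for a number of NEXTVAL calls."""
    if cache_size < 1:
        raise ValueError("cache_size must be at least 1")
    return math.ceil(nextval_calls / cache_size)


# Hypothetical workload: 100 NEXTVAL calls per second across all sessions.
calls_per_second = 100
for cache in (1, 20, 1000):
    refills = seq_dictionary_updates(calls_per_second, cache)
    print(f"cache {cache}: about {refills} refill(s) per second")
```

With these assumed numbers, the default-sized cache of 20 already cuts the refills from 100 per second to 5.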
Related
I have an Oracle table containing 900 million records. The table is partitioned into 24 partitions and has indexes:
I tried using a hint, and I set fetch_buffer to 100000:
select /*+ parallel(8) */
* from table
It takes 30 minutes to get 100 million records.
My question is:
Is there any faster way to get all 900 million records (all the data in the table)? Should I use the partitions and run 24 sequential queries? Or should I use the indexes and split my query into, for example, 10 queries?
The network is almost certainly the bottleneck here. Oracle parallelism only impacts the way the database retrieves the data, but data is still sent to the client with a single thread.
Assuming a single thread doesn't already saturate your network, you'll probably want to build a concurrent retrieval solution. It helps that the table is already partitioned, because then you can read large chunks of data without re-reading anything.
I'm not sure how to do this in Scala, but you want to run multiple queries like this at the same time, to use all the client and network resources possible:
select * from table partition (p1);
select * from table partition (p2);
...
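A minimal sketch of that idea, written in Python rather than Scala. The table name my_table, the partition names beyond p1 and p2, and fetch_partition are all placeholders: a real fetch_partition would open its own connection, run the query, and stream the rows to a file.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_queries(table, partitions):
    """Build one full-partition query per partition name."""
    return [f"select * from {table} partition ({p})" for p in partitions]


def fetch_partition(sql):
    """Hypothetical stand-in: a real version would run sql on its own connection."""
    return sql


def run_concurrently(queries, workers=4):
    """Run the queries on a small thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_partition, queries))


# Assumed partition names p1..p24 for the 24 partitions.
queries = partition_queries("my_table", [f"p{i}" for i in range(1, 25)])
results = run_concurrently(queries, workers=4)
```

The worker count is something to experiment with; once the network or the client is saturated, more workers will not help.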
Not really an answer but too long for a comment.
Too many variables can affect this for me to give informed advice, so the following are just some general hints.
Is this over a network or local on the server? If the database is on a remote server, you are paying a heavy network price. I would suggest (if possible) running the extract on the server using the BEQUEATH protocol to avoid the network. Once the file(s) are complete, it will be quicker to compress and transfer them to the destination than to transfer the data directly from the database to a local file via JDBC row processing.
With JDBC, remember to set the cursor fetch size to reduce round trips: setFetchSize. The default value is tiny (10, I think); try something like 1000 to see how that helps.
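As a back-of-the-envelope illustration of why the fetch size matters, this Python sketch counts round trips under the simplifying assumption that each fetch is exactly one round trip. The 100 million row figure comes from the question; the function name is made up.

```python
import math


def round_trips(total_rows: int, fetch_size: int) -> int:
    """Number of fetches needed, assuming one network round trip per fetch."""
    if fetch_size < 1:
        raise ValueError("fetch_size must be at least 1")
    return math.ceil(total_rows / fetch_size)


rows = 100_000_000
for size in (10, 1000):
    print(f"fetch size {size}: {round_trips(rows, size):,} round trips")
```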
As for the query, you are writing to a file, so even though Oracle might process the query in parallel, your write-to-file process probably doesn't, so it's a bottleneck.
My approach would be to write the Java program to operate on a range of values passed as command line parameters, and experiment to find which range size and number of concurrent instances give optimal performance. The range will likely fall within discrete partitions, so you will benefit from partition pruning (assuming the range value is on an indexed column, ideally the partition key).
Roughly speaking, I would start with a range of 5m and run a number of concurrent instances equal to the number of CPU cores minus 2; this is not a scientifically derived number, just one that I tend to use as my first stab to see what happens.
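The starting point described above could be sketched like this in Python. It assumes a dense numeric key running from 1 to 900 million, which the question does not actually state; key_ranges and initial_worker_count are illustrative names, not part of any library.

```python
import os


def key_ranges(min_key, max_key, range_size):
    """Yield inclusive (low, high) ranges that cover min_key..max_key without overlap."""
    low = min_key
    while low <= max_key:
        high = min(low + range_size - 1, max_key)
        yield (low, high)
        low = high + 1


def initial_worker_count(cores=None):
    """First-stab concurrency: CPU cores minus 2, but never fewer than 1."""
    cores = cores or os.cpu_count() or 1
    return max(1, cores - 2)


# Assumed key domain; each range would become the parameters of one extract run.
ranges = list(key_ranges(1, 900_000_000, 5_000_000))
print(len(ranges), "ranges, starting with", ranges[0])
print("initial workers:", initial_worker_count())
```

Each (low, high) pair would be passed to one instance of the extract program, and both the range size and the worker count are meant to be tuned by experiment.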
I am facing an issue with an ever slowing process which runs every hour and inserts around 3-4 million rows daily into an SQL Server 2008 Database.
The schema consists of a large table which contains all of the above data and has a clustered index on a datetime field (by day), a unique index on a combination of fields in order to exclude duplicate inserts, and a couple more indexes on 2 varchar fields.
The typical behavior as of late, is that the insert statements get suspended for a while before they complete. The overall process used to take 4-5 mins and now it's usually well over 40 mins.
The inserts are executed by a .net service which parses a series of xml files, performs some data transformations and then inserts the data into the DB. The service has not changed at all; it's just that the inserts take longer than they used to.
At this point I'm willing to try everything. Please, let me know whether you need any more info and feel free to suggest anything.
Thanks in advance.
Sounds like you have exhausted the buffer pool's ability to cache all the pages needed for the insert process. Append-style inserts (like with your date table) have a very small working set of just a few pages. Random-style inserts have basically the entire index as their working set. If you insert a row at a random location, the existing page that row is supposed to be written to must be read first.
This probably means tons of disk seeks for inserts.
Make sure to insert all rows in one statement. Use bulk insert or TVPs. This allows SQL Server to optimize the query plan by sorting the inserts by key value, making IO much more efficient.
This alone will, however, not get you all the way back (I have seen about 5x in similar situations). To regain the original performance you must bring the working set back into memory. Add RAM, purge old data, or partition such that you only need to touch very few partitions.
Drop the indexes before the insert and re-create them on completion.
I have a large table (~170 million rows, 2 nvarchar and 7 int columns) in SQL Server 2005 that is constantly being inserted into. Everything works ok with it from a performance perspective, but every once in a while I have to update a set of rows in the table which causes problems. It works fine if I update a small set of data, but if I have to update a set of 40,000 records or so it takes around 3 minutes and blocks on the table which causes problems since the inserts start failing.
If I just run a select to get back the data that needs to be updated, I get back the 40k records in about 2 seconds. It's just the updates that take forever. This is reflected in the execution plan for the update, where the clustered index update takes up 90% of the cost and the index seek and top operator to get the rows take up 10% of the cost. The column I'm updating is not part of any index key, so it's not like it's reorganizing anything.
Does anyone have any ideas on how this could be sped up? My thought now is to write a service that will just see when these updates have to happen, pull back the records that have to be updated, and then loop through and update them one by one. This will satisfy my business needs but it's another module to maintain and I would love if I could fix this from just a DBA side of things.
Thanks for any thoughts!
Actually it might reorganise pages if you update the nvarchar columns.
Depending on what the update does to these columns they might cause the record to grow bigger than the space reserved for it before the update.
(See the explanation of how nvarchar is stored at http://www.databasejournal.com/features/mssql/physical-database-design-consideration.html.)
So say a record has a string of 20 characters saved in the nvarchar - this takes 20*2+2 bytes of space (the extra 2 are for the pointer). This is written at the initial insert into your table (based on the index structure). SQL Server will only use as much space as your nvarchar really takes.
Now comes the update and inserts a string of 40 characters. And oops, the space for the record within your leaf structure of your index is suddenly too small. So off goes the record to a different physical place with a pointer in the old place pointing to the actual place of the updated record.
This then causes your index to go stale, and because the whole physical structure requires changing you see a lot of index work going on behind the scenes. This very likely causes an escalation to an exclusive table lock.
Not sure how best to deal with this. Personally if possible I take an exclusive table lock, drop the index, do the updates, reindex. Because your updates sometimes cause the index to go stale this might be the fastest option. However this requires a maintenance window.
You should batch up your update into several updates (say 10000 at a time, TEST!) rather than one large one of 40k rows.
This way you can avoid a table lock. SQL Server will only take out about 5000 locks (page or row) before escalating to a table lock, and even this is not very predictable (memory pressure etc.). Smaller updates made in this fashion will at least avoid the concurrency issues you are experiencing.
You can batch the updates using a service or firehose cursor.
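As a sketch of what such a batching service might do, here is a small Python helper that splits the keys to update into fixed-size chunks. The key list and the batch size of 4,000 are assumptions chosen to stay under the roughly 5000-lock figure mentioned above, on the further assumption of about one row lock per updated row; the right size has to be found by testing.

```python
def batches(keys, batch_size):
    """Yield consecutive slices of keys, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


# Hypothetical primary keys of the 40k rows that need updating.
keys_to_update = list(range(1, 40_001))
chunks = list(batches(keys_to_update, 4_000))

# Each chunk would be sent as its own UPDATE in its own short transaction.
print(len(chunks), "batches of up to", len(chunks[0]), "rows")
```

Committing after every chunk keeps each transaction short, so the concurrent inserts get a chance to run between batches.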
Read this for more info:
http://msdn.microsoft.com/en-us/library/ms184286.aspx
Hope this helps
Robert
The most brute-force (and simplest) way is to have a basic service, as you mentioned. That has the advantage of being able to scale with the load on the server and/or the data load.
For example, if you have a set of updates that must happen ASAP, then you could turn up the batch size. Conversely, for less important updates, you could have the update "server" slow down if each update is taking "too long" to relieve some of the pressure on the DB.
This sort of "heartbeat" process is rather common in systems and can be very powerful in the right situations.
It's weird that your analyzer says it takes time to update the clustered index. Did the size of the data change when you updated? It seems like the varchar is driving the data to be reorganized, which might need updates to index pointers (as KMB has already pointed out). In that case you might want to increase the free-space percentage on the data and index pages so that they can grow without relinking/reallocation. Since an update is an IO-intensive operation (unlike a read, which can be buffered), the performance also depends on several factors:
1) Are your tables partitioned by data? 2) Does the entire table lie on the same SAN disk (or is the SAN striped well)? 3) How verbose is the transaction logging? Can the buffer size of the transaction log be increased to support larger writes to the log for massive inserts?
It's also important which API/language you are using, e.g. JDBC supports a batch update feature, which makes the updates a little more efficient if you are doing multiple updates.
I have created a table in APEX that has a PK that is incremented by a SQL sequence:
CREATE SEQUENCE seq_increment
MINVALUE 1
START WITH 880
INCREMENT BY 1
CACHE 10
This seems to work perfectly. The issue is that sometimes, usually when I get on in the morning and run a process to enter a new row, it skips a bunch of numbers. I only care because these numbers are being used as the ID# of documents in my company and losing/skipping blocks of numbers is not going to be acceptable when this tool goes live.
It does seem to jump to the next '10' number. i.e. yesterday my last test assigned 883 and this morning it assigned 890 as the next number. Looking at my code for creation of the sequence I notice that I have set it up to cache 10 values so that it will process quicker. Is it possible that this cache is getting dumped over night and that it is pulling 890 because it had 880-889 in cache and it was dumped?
Are there other potential causes and solutions?
Sequences will not and cannot generate gap-free values. So you'd expect that numbers will occasionally be skipped. That's perfectly normal when you're using sequences.
As you've surmised, the most likely scenario is that the sequence cache is aging out of the shared pool overnight when the APEX application isn't being used. You can reduce the frequency of gaps by declaring your sequence NOCACHE, but that will decrease performance, and it will not eliminate gaps; it will just make them less frequent.
Oracle sequences are never guaranteed to be contiguous. If you need an absolutely contiguous set of values, you'll need to implement a custom solution.
Odds are that CACHE 10 is why you're losing numbers in this case. The cache value is how many sequence values are stored in memory for future use. Rebooting will clear the cache and cause 10 new values to be retrieved. Similarly, if the sequence is not used for long enough, the current set of values may be flushed out of the shared pool, also causing a new set of values to be retrieved.
This is clearly not the case in your instance, but sequence numbers can also be lost due to rollbacks. A rolled back transaction involving one or more sequences discards the sequence value(s).
Some sequence numbers have been aged out of one of the in-memory structures (shared pool I think?). This is expected behaviour for sequences. The only guarantee that you have is that they are unique. If you need to present gap-free sequences you'll have to do this at reporting time using e.g. the rownum pseudo-column. It is made this way deliberately; otherwise you would have to serialise all inserts, i.e. lock the table. And even that wouldn't work properly if an insert was rolled back!
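The jump from 883 to 890 described in the question can be reproduced with a toy model. This Python sketch is only an illustration of the caching behaviour the answers describe, not how Oracle implements sequences internally; the class and method names are made up.

```python
class CachedSequence:
    """Toy model of a sequence that reserves values in blocks of `cache`."""

    def __init__(self, start, cache):
        self.high_water = start  # first value not yet reserved; this is what survives a flush
        self.cache = cache
        self.cached = []         # values reserved but not yet handed out (in memory only)

    def nextval(self):
        if not self.cached:
            # Reserve a new block and advance the persistent high-water mark.
            self.cached = list(range(self.high_water, self.high_water + self.cache))
            self.high_water += self.cache
        return self.cached.pop(0)

    def flush(self):
        """Model the cache being aged out of memory or lost at a restart."""
        self.cached = []


seq = CachedSequence(start=880, cache=10)
issued = [seq.nextval() for _ in range(4)]  # hands out 880 through 883
seq.flush()                                 # 884 through 889 are discarded
after_flush = seq.nextval()                 # the next block starts at 890
print(issued, after_flush)
```

In this model the unused values 884 to 889 are simply gone after the flush, which matches the behaviour seen overnight.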
Let’s say you have a table with about 5 million records and a nvarchar(max) column populated with large text data. You want to set this column to NULL if SomeOtherColumn = 1 in the fastest possible way.
The brute-force UPDATE does not work very well here because it will create a large implicit transaction and take forever.
Doing updates in small batches of 50K records at a time works, but it's still taking 47 hours to complete on a beefy 32 core/64GB server.
Is there any way to do this update faster? Are there any magic query hints / table options that sacrifice something else (like concurrency) in exchange for speed?
NOTE: Creating temp table or temp column is not an option because this nvarchar(max) column involves lots of data and so consumes lots of space!
PS: Yes, SomeOtherColumn is already indexed.
From everything I can see it does not look like your problems are related to indexes.
The key seems to be in the fact that your nvarchar(max) field contains "lots" of data. Think about what SQL has to do in order to perform this update.
Since the values in the column you are updating are likely more than 8000 bytes each, they are stored off-page, which implies additional effort in reading this column when it is not NULL.
When you run a batch of 50000 updates SQL has to place this in an implicit transaction in order to make it possible to roll back in case of any problems. In order to roll back it has to store the original value of the column in the transaction log.
Assuming (for simplicity's sake) that each value contains on average 10,000 bytes of data, that means 50,000 rows will contain around 500MB of data, which has to be stored temporarily (in simple recovery mode) or permanently (in full recovery mode).
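The estimate above is plain multiplication, shown here in Python for a few batch sizes. The 10,000-byte average is the same simplifying assumption as in the text, and the result ignores any per-row log overhead.

```python
def log_bytes_per_batch(rows: int, avg_bytes_per_value: int) -> int:
    """Old values that must be logged for one batch, ignoring per-row overhead."""
    return rows * avg_bytes_per_value


AVG_BYTES = 10_000  # assumed average size of the nvarchar(max) value
for batch_rows in (50_000, 10_000, 1_000):
    megabytes = log_bytes_per_batch(batch_rows, AVG_BYTES) / 1_000_000
    print(f"{batch_rows:>6} rows per batch: about {megabytes:,.0f} MB of old values to log")
```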
There is no way to disable the logs as it will compromise the database integrity.
I ran a quick test on my dog slow desktop, and running batches of even 10,000 becomes prohibitively slow, but bringing the size down to 1000 rows, which implies a temporary log size of around 10MB, worked just nicely.
I loaded a table with 350,000 rows and marked 50,000 of them for update. This completed in around 4 minutes, and since it scales linearly you should be able to update your entire 5 million rows in around 6 to 7 hours on my dog-slow 1-processor, 2GB desktop, so I would expect something much better on your beefy server backed by a SAN or something.
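For reference, the linear extrapolation works out as follows. This Python sketch takes the measured figures from the test above and assumes, as the answer does, that the time scales linearly with the number of rows updated.

```python
def extrapolate_minutes(measured_rows, measured_minutes, target_rows):
    """Scale a measured duration linearly to a larger row count."""
    return measured_minutes * target_rows / measured_rows


minutes = extrapolate_minutes(50_000, 4, 5_000_000)
print(f"{minutes:.0f} minutes, about {minutes / 60:.1f} hours")
```

That gives 400 minutes, so somewhat under 7 hours on the test machine.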
You may want to run your update statement as a select, selecting only the primary key and the large nvarchar column, and ensure this runs as fast as you expect.
Of course the bottleneck may be other users locking things or contention on your storage or memory on the server, but since you did not mention other users I will assume you have the DB in single user mode for this.
As an optimization you should ensure that the transaction logs are on a different physical disk/disk group than the data to minimize seek times.
Hopefully you already dropped any indexes on the column you are setting to null, including full text indexes. As said before, turning off transactions and the log file temporarily would do the trick. Backing up your data will usually truncate your log files too.
You could set the database recovery mode to Simple to reduce logging, BUT do not do this without considering the full implications for a production environment.
What indexes are in place on the table? Given that batch updates of approx. 50,000 rows take so long, I would say you require an index.
Have you tried placing an index or statistics on someOtherColumn?
This really helped me. I went from 2 hours to 20 minutes with this.
/* I'm using database recovery mode to Simple */
/* Update table statistics */
set transaction isolation level read uncommitted
/* Your 50k update, just to have a measures of the time it will take */
set transaction isolation level READ COMMITTED
In my experience, working in MSSQL 2005, automatically moving 4 million 46-byte records every day (no nvarchar(max) though) from one table in a database to another table in a different database takes around 20 minutes on a quad-core, 8GB, 2GHz server, and it doesn't hurt application performance. By moving I mean INSERT INTO SELECT and then DELETE. The CPU usage never goes over 30%, even when the table being deleted from has 28M records and constantly receives around 4K inserts per minute but no updates. Well, that's my case; it may vary depending on your server load.
READ UNCOMMITTED
"Specifies that statements (your updates) can read rows that have been modified by other transactions but not yet committed." In my case, the records are readonly.
I don't know what rg-tsql means but here you'll find info about transaction isolation levels in MSSQL.
Try indexing 'SomeOtherColumn'...50K records should update in a snap. If there is already an index in place see if the index needs to be reorganized and that statistics have been collected for it.
If you are running a production environment with not enough space to duplicate all your tables, I believe that you are looking for trouble sooner or later.
If you provide some info about the number of rows with SomeOtherColumn=1, perhaps we can think of another way, but I suggest:
0) Back up your table
1) Index the flag column
2) Set the table option to "no log transactions" ... if possible
3) Write a stored procedure to run the updates