Implementing Round Robin insertions to oracle rac db with the help of sequence - sql

Problem
My system inserts records to oracle rac DB at a rate of 600tps. During the insertion-procedure-call each record is assigned a sequence, so that each record should get distributed among 20 different batch ids (implementation of a round robin mechanism).
Following is the step for selecting batch
1) A record comes. Assigns nextValue from a sequence.
2) Do MOD(sequence,20). It gives values from 0 to 19.
Issue:
3 records comes to DB simultaneously and hits 3 different nodes in RAC
Comes out with sequences 2,102,1002.
MOD for all happens to be same.
All try to get into the same batch.
Round Robin fails here.
Please help to resolve the issue.

This is due to the implementation of Sequences on RAC. When a node is first asked for the next value of a sequence it get a bunch of them (e.g. 100 to 119) and then hands them out until it needs a new lot, when it gets another bunch (160 - 179). While Node 1 is handing out 100 then 101, Node 2 will be handing out 121, 122 etc etc.
The size of the 'bunch' is controlled by as I remember the Cache size defined on a Sequence. If you set a cache size of 0, then you will get no caching, and the sequences will be handed out sequentially. However, doing that will involve the Nodes is management overhead while they work out what the next one actually is, and with 600tps this might not be a good idea: you'd have to try it and see,

Related

Negamax: what to do with "partial" results after canceling a search?

I'm implementing negamax with alpha/beta transposition table based on the pseudo code here, with roughly this algorithm:
NegaMax():
1. Transposition Table lookup
2. Loop through moves
2a. **Bail if I'm out of time**
2b. Make move, call -NegaMax, undo move
2c. Update bestvalue, alpha/beta but if appropriate
3. Transposition table store/update
4. Return bestvalue
I'm also using iterative deepening, calling NegaMax with progressively higher depths.
My question is: when I determine I've run out of time (2a. in the beginning of move loop) what is the right thing to do? Do I bail immediately (not updating the transposition table) or do I just break the loop (saving whatever partial work I've done)?
Currently, I return null at that point, signifying that the search was canceled before "completing" that node (whether by trying every move or the alpha/beta cut). The null gets propagated up and up the stack, and each node on the way up bails by return, so step 3 never runs.
Essentially, I only store values in the TT if the node "completed". The scenario I keep seeing with the iterative deepening:
I get through depths 1-5 really quick, so the TT has a depth = 5, type = Exact entry.
The depth = 6 search is taking a long time, so I bail.
I ultimately return the best move in the transposition table, which is the move I found during the depth = 5 search. The problem is, if I start a new depth = 6 search, it feels like I'm starting it from scratch. However, if I save whatever partial results I found, I worry that I'll have corrupted my TT, potentially by overwriting the completed depth = 5 entry with an incomplete depth = 6 entry.
If the search wasn't completed, the score is inaccurate and should likely not be added to the TT. If you have a best move from the previous ply and it is still best and the score hasn't dropped significantly, you might play that.
On the other hand, if at depth 6 you discover that the opponent has a mate in 3 (oops!) or could win your queen, you might have to spend even more time to try to resolve that.
That would leave you with less time for the remaining moves (if any...), but it might be better to be slightly short on time than to get mated with plenty of time remaining. :-)

Target Based commit point while updating into table

One of my mappings is running for a really long time (2 hours).From the session log i can see the statment "Time out based commit poin" which is tking most of the time and Busy percentage for the SQL tranfsormation is very high(which is taking time,I ran the SQL query manually in DB,its working fine ).So, basically there is a router which splits the record between insert and update.And the update stream is taking long.It has a SQL transforamtion,Update statrtergy and aggregator.I added an sorter before aggregator but no luck.
Also changed comit interval ,Lins Sequential Buffer lenght and Maximum memory allowed by checking some of the other blogs.Could you please help me with this.
If possible try to avoid the transformations which are creating cache because in future if the input records increase. Cache size will also increase and decrease the throughput
1) Aggregator : Try to use the Aggregation in SQL override itself
2) Sorter : Try to do the same in the SQL Override itself
Generally SQL transformation is slow for huge data loads, because for each input record an SQL session is invoked and a connection is established to database and the row is fetched. Say for example there are 1 million records, 1 million SQL sessions are initiated in the backend and the database is called.
What the SQL transformation doing ? Is it just generating a Surrogate key or its fetching a value from a table based on derived value from the stream
For fetching a value from a table based on derived value from the stream:
Try to use lookup
For generating Surrogate key, Use Oracle Sequence instead
Let me know if its purpose is any thing other than that
Also do the below checks
Sort the session log on thread and just make a note of start and end times of
the following
1) lookup caches creation (time between Query issued --> First row returned --> Cache creation completed)
2) Reader thread first row return time
Regards,
Raj

how to get next 1000 records the fastest way

I'm using Azure Table Storage.
Let's say i have a Partition in my Table with 10,000 records, and I would like to get records number 1000 to 1999. And next time i would like to get records number 4000 to 4999 etc.
What is the fastest way of doing that?
All I can find till now are two options, which I don't like very much:
1. run a query which returns all 10,000 records, and filter out what I want when I get all 10,000 records.
2. Run a query whichs returns 1000 records at a time, and use a continuation token to get the next 1000 records.
Is it possible to get a continuation token without downloading all corresponding records? It would be great if i can get Continuation Token 1, than get Continuation token 2, and with CT2 get records 2000 to 2999.
Theoretically you should be able to use continuation tokens without downloading the actual data for the first 1000 recors by closing the connection you have after the first request. And I mean closing it at TCP level. And before you read all data. Then open a new connection and use continuation token there. Two WebRequests will not do it since the HTTP implementation will likely use keep alive wchich means all your data is going to be read in the background even though you don't read it in your code. Actually you can configure your HTTP requests to not use keep alive.
However, another way is naturally if you know the RowKey and can search on that but I assume you don't know which row keys will be in each 1000 entity batch.
Last I would ask why you have this problem in the first place. And what your access pattern is. If inserts are common and getting these records is rare I wouldn't bother making it more efficient. if this is like a paging problem i would probably get all data on the first request and cache it (in the cloud). if inserts are rare but you need to run this query often I would consider making the insertion of data have one partion for every 1000 entities and rebalance as needed (due to sorting) as entities are inserted.

NHibernate Hi/Lo - gaps in ids

Scenario: Hi/Lo is initialized for MyEntity with Lo 100
The table is empty.
Two sessions with different connections have both inserted three items.
TableIds
1
2
3
100
101
102
If a third one comes in at a later time and inserts three items:
TableIds
...
200
201
202
Is there a way to not get these gaps?
The Lo value is stored against the SessionFactory, not the Session. You only ever get gaps when you restart your application and create a new instance of the SessionFactory.
A new Hi value is pulled from the database and stored in the SessionFactory, so if you have a web farm, each site will have it's own instance of the SessionFactory which will each have it's own 'next' Hi value from the database. When it runs out of Lo values it will update the Hi to the next available Hi from the database.
Edit:
If you have client applications then I would recommend not using HiLo at all, instead use a GuidComb, it's a sequential Guid and you wont have an issue with gaps.
Since it's an existing app however and you can't really go changing the identifier, I would recommend requiring the client applications to insert via a Web Service which has it's own SF, that way you can maintain a single Hi instead of multiple Hi's per client app.
If you can't do that then you're going to have to lower your Lo.
As Phill quite rightly pointed out the gaps appear when you build a sessionFactory e.g. the application restarts etc or you are running on the cloud and each node has its own set of Hi and lo.
You could reduce the lo to 10 rather than 100 to reduce the gaps and/or I recommend using int64 rather than int32 for ease of mind.
However does the gaps actually matter? Would you ever see yourself running out?
I have never read anything negative about performance (database issues) with a large number of gaps when using Hilo. The only thing I have seen is that some people complain that they run out when using int32 or when the lo has been set to high.

SQL connection lifetime

I am working on an API to query a database server (Oracle in my case) to retrieve massive amount of data. (This is actually a layer on top of JDBC.)
The API I created tries to limit as much as possible the loading of every queried information into memory. I mean that I prefer to iterate over the result set and process the returned row one by one instead of loading every rows in memory and process them later.
But I am wondering if this is the best practice since it has some issues:
The result set is kept during the whole processing, if the processing is as long as retrieving the data, it means that my result set will be open twice as long
Doing another query inside my processing loop means opening another result set while I am already using one, it may not be a good idea to start opening too much result sets simultaneously.
On the other side, it has some advantages:
I never have more than one row of data in memory for a result set, since my queries tend to return around 100k rows, it may be worth it.
Since my framework is heavily based on functionnal programming concepts, I never rely on multiple rows being in memory at the same time.
Starting the processing on the first rows returned while the database engine is still returning other rows is a great performance boost.
In response to Gandalf, I add some more information:
I will always have to process the entire result set
I am not doing any aggregation of rows
I am integrating with a master data management application and retrieving data in order to either validate them or export them using many different formats (to the ERP, to the web platform, etc.)
There is no universal answer. I personally implemented both solutions dozens of times.
This depends of what matters more for you: memory or network traffic.
If you have a fast network connection (LAN) and a poor client machine, then fetch data row by row from the server.
If you work over the Internet, then batch fetching will help you.
You can set prefetch count or your database layer properties and find a golden mean.
Rule of thumb is: fetch everything that you can keep without noticing it
if you need more detailed analysis, there are six factors involved:
Row generation responce time / rate(how soon Oracle generates first row / last row)
Row delivery response time / rate (how soon can you get first row / last row)
Row processing response time / rate (how soon can you show first row / last row)
One of them will be the bottleneck.
As a rule, rate and responce time are antagonists.
With prefetching, you can control the row delivery response time and row delivery rate: higher prefetch count will increase rate but decrease response time, lower prefetch count will do the opposite.
Choose which one is more important to you.
You can also do the following: create separate threads for fetching and processing.
Select just ehough rows to keep user amused in low prefetch mode (with high response time), then switch into high prefetch mode.
It will fetch the rows in the background and you can process them in the background too, while the user browses over the first rows.