Could the STAN number be repeatable and random? - iso8583

I'm developing a connector with a bank, and we're using the ISO 8583 protocol. Right now I'm setting the STAN (field 11) to a number produced by a random generator, but sometimes I get collisions. Can I safely use this generator, or do I need to make the STAN a sequential number?
Thanks in advance.

The System Trace Audit Number (STAN) in ISO 8583 can take different values and is maintained per relationship along the transaction path. That is, it can stay the same end to end, or the same transaction can carry several STANs along its path, but it SHOULD be the same between any two endpoints. Whose STAN is used is usually controlled by configuration.
For example:
Terminal -> Terminal Driver -> Switch 1 -> Switch 2 -> Issuer
Say the STAN is assigned by the terminal driver. It then remains constant, at a minimum, within each of the following relationships, though it may change from one relationship to the next:
Terminal Driver -> Switch 1
Switch 1 -> Switch 2
Switch 2 -> Issuer
Note that each system may also use its own unique STAN internally, but it needs to keep a unique STAN for each relationship. The STAN shouldn't change between the request and the response, as it is needed for multi-part transactions (Single PA, Multiple Completions & Multi-PA, Single Completion) as well as for reversals and the like, where it is carried in Data Element 90.

It depends on your remote endpoint, but I've seen many that require sequential numbers and detect duplicates.

Usually the STAN is a number that is incremented for each request.
Random STAN generation is not the best choice for network message sequences.
Duplicate STANs can come from different sources, e.g. host clients or terminals.
The STAN by itself cannot be the only field used to identify unique transaction requests. It must be combined with other fields such as RRN, Terminal ID, and Merchant ID.
See also "In ISO message, what's the use of stan and rrn ?"
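As a rough illustration of the two points above, here is a minimal Python sketch of a sequential STAN source plus a composite key for duplicate detection. The names `StanGenerator` and `transaction_key` are hypothetical, and the wrap-around from 999999 back to 000001 is an assumption (field 11 is six numeric digits, but whether 000000 is allowed depends on your endpoint's specification).

```python
import threading


class StanGenerator:
    """Sequential 6-digit STAN source (hypothetical helper).

    Assumption: the counter wraps from 999999 back to 000001.
    """

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()  # safe to share between sender threads

    def next_stan(self):
        with self._lock:
            stan = self._next
            self._next = 1 if stan >= 999999 else stan + 1
        return f"{stan:06d}"  # zero-padded to six digits for field 11


def transaction_key(stan, rrn, terminal_id, merchant_id):
    """Combine the STAN with other fields, since the STAN alone is not unique."""
    return (stan, rrn, terminal_id, merchant_id)
```

Two requests from different terminals may then share a STAN and still produce different keys.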

How does Basic Paxos proposer know when to increment roundId / proposal number

Looking at the screenshot from this video, around 27:20
https://www.youtube.com/watch?v=JEpsBg0AO6o
When server S5 sends out the prepare RPC, it uses the roundId (proposal number) 4 and server number 5 (hence 4.5) as well as value "Y".
But how does it know 4 is the roundId to use? Earlier, S1 used up roundId 3, but there's no way for S5 to know about that, as there hasn't been any communication between S5 and anybody else at the time S5 chose 4 as its roundId.
In theory, there is no need to know the latest number, as every proposer will keep increasing its round number until it succeeds.
In your example S5 knows nothing, so it will start with the smallest number and then keep going up.
In practical applications, when a proposer proposes a number and the proposal is declined by an acceptor, the rejection message contains the largest round number seen so far by that acceptor. This helps the proposer retry with a larger round number (instead of increasing its current number one by one).
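The retry hint described above can be sketched in Python as follows. This is a simplified illustration, not a full Paxos implementation: the `Acceptor` class, the `("PROMISE", n)` / `("REJECT", highest_seen)` reply tuples, and `next_round` are hypothetical names chosen for the example.

```python
class Acceptor:
    """Tracks only the highest round number promised so far."""

    def __init__(self):
        self.promised = 0

    def on_prepare(self, n):
        # Promise only for a strictly greater round number;
        # otherwise report the highest round seen so the proposer can jump ahead.
        if n > self.promised:
            self.promised = n
            return ("PROMISE", n)
        return ("REJECT", self.promised)


def next_round(current, replies):
    """Pick a round above the current one and above every round reported in a REJECT."""
    rejected = [seen for kind, seen in replies if kind == "REJECT"]
    return max([current] + rejected) + 1
```

A proposer that starts at round 1 and is told an acceptor has already seen round 3 would retry directly with round 4.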
-- Edit: Max posted a link to an answer claiming (as of now) that N has to be unique per proposer.
Let me explain why there is no global uniqueness requirement by an example.
Let's say we have a system with two proposers and three acceptors (and a few learners):
Both proposers send PREPARE(1), the same number, to all acceptors.
Based on the Paxos rules, only one of the proposers will get a majority of PROMISE messages. This follows from the rule that an acceptor promises only if the PREPARE carries a number strictly greater than any number it has seen before.
Now we are in a state where one proposer has two (or three) PROMISES for N=1 and the other proposer has one (or zero) PROMISES for N=1.
Only the first proposer may issue ACCEPT(1, V), as it got the majority. The other proposer does not have a majority of PROMISES and has to retry with a larger N.
When the other proposer retries, it will use an N larger than any it has seen before, hence it will try with N=2.
From then on, it all works the same way: a proposer sends PREPARE and, if it gets a majority of PROMISES for its N, issues ACCEPT(N, VALUE_ACCORDING_TO_PROTOCOL).
The key insight for Paxos is that two ACCEPT(N, V) messages with the same N but different V can never be sent, hence there is no issue with two proposers using the same N.
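The example above can be replayed with a small Python simulation. This is only a sketch of the PREPARE phase under an assumed message arrival order (two acceptors hear from P1 first, one hears from P2 first); the names `prepare`, `arrival`, and `promises` are made up for the illustration.

```python
def prepare(acceptor, n):
    """Promise only if n is strictly greater than anything promised before."""
    if n > acceptor["promised"]:
        acceptor["promised"] = n
        return True
    return False


acceptors = [{"promised": 0} for _ in range(3)]

# Assumed arrival order of the two PREPARE(1) messages at each acceptor.
arrival = [("P1", "P2"), ("P1", "P2"), ("P2", "P1")]

promises = {"P1": 0, "P2": 0}
for acceptor, order in zip(acceptors, arrival):
    for proposer in order:
        if prepare(acceptor, 1):
            promises[proposer] += 1

majority = len(acceptors) // 2 + 1
winners = [p for p, count in promises.items() if count >= majority]

# The losing proposer retries with N=2 and is now promised by every acceptor.
retry_promises = sum(prepare(acceptor, 2) for acceptor in acceptors)
```

Because each acceptor promises a given N at most once and any two majorities share an acceptor, at most one proposer can collect a majority for N=1, whatever the arrival order.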
As for initializing every node with some unique ID, that's OK. Whether it improves performance is a big question, and I haven't seen a formal proof for that yet.

Data Science: Using Inferential Statistics to label train dataset

A lack of high schools in remote areas is a problem for students in developing countries. Students in some locations perform better than those in others, so I have to find those locations. The main problem is defining "BETTER". I have made some rules that define the profile of a location.
Right now, I am concerned with the good students.
So, what I have done is:
1. Used some inferential statistics and made some rules to conclude that locations A, B, C, etc. are the most promising locations for new high schools, because according to my rules these locations contain quality students.
I did all of the above because I needed to define "BETTER" and label the data, so that I can now use a machine learning algorithm to learn the factors that make a location promising. Given a data point from the test data, the model should then immediately tell whether the location is better or not.
Overview of the method:
For each location, I have these 4 pieces of information:
total_students_staying_for_high_school_education(A),
total_students_leaving_for_high_school_education_in_another_place(B),
mean_grade_point_of_students_of_type_B,
ratio (calculated as B/A),
For the locations whose ratio > 1,
I applied the chi-squared significance test to get a statistic that tells me whether significantly more students are leaving that place than staying. I used ANOVA and then the Tukey test to compare mean grade points and to find the pairs of locations whose means differ, and which of the two is greater.
I then wrote a Python program with a custom comparator that first checks whether the mean grades of two locations differ and, if so, returns the one with the greater mean. If the means don't differ, the comparator returns the location whose chi-squared value is greater.
This is how the whole process comes up with a few suggested locations, and I call those locations "BETTER".
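The ranking step described above might look roughly like the following Python sketch. It is not the original program: the null hypothesis of an even stay/leave split, the dictionary keys (`name`, `mean_grade`, `chi2`), and the idea of passing in the set of pairs that the Tukey test flagged as different are all assumptions made for illustration.

```python
from functools import cmp_to_key


def chi_squared_leave_vs_stay(staying, leaving):
    """Goodness-of-fit statistic (1 degree of freedom).

    Assumption: the null hypothesis is an even split between staying and leaving.
    """
    expected = (staying + leaving) / 2
    return ((staying - expected) ** 2 + (leaving - expected) ** 2) / expected


def make_comparator(significant_pairs):
    """significant_pairs: frozensets of location names whose means differ (e.g. from Tukey)."""

    def compare(a, b):
        if frozenset((a["name"], b["name"])) in significant_pairs:
            # Means differ significantly: the greater mean ranks first.
            return -1 if a["mean_grade"] > b["mean_grade"] else 1
        # Means do not differ: the greater chi-squared value ranks first.
        return (b["chi2"] > a["chi2"]) - (b["chi2"] < a["chi2"])

    return compare


locations = [
    {"name": "A", "mean_grade": 3.80, "chi2": 5.0},
    {"name": "B", "mean_grade": 3.20, "chi2": 9.0},
    {"name": "C", "mean_grade": 3.79, "chi2": 7.0},
]
pairs = {frozenset(("A", "B")), frozenset(("B", "C"))}
ranked = sorted(locations, key=cmp_to_key(make_comparator(pairs)))
```

One thing worth checking with a comparator like this is transitivity: because it switches criteria depending on the pair, it can define a cyclic order on some data, in which case the sorted result depends on the input order.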
What I am concerned about is:
1. How do I verify that my rules are valid? Or do I even need to verify them?
2. Most importantly, is mixing statistics with machine learning as described above an appropriate approach? Is there any major leakage in the method? Can anyone suggest a more general method?

What is difference between equivalence class testing and input domain partitioning?

I'm learning software testing and wondering what the difference is between equivalence class testing and input domain partitioning. Both seem to be about partitioning the input domain.
Frankly speaking, during my career as a software testing engineer I haven't come across many mentions of input domain partitioning.
Nevertheless, the term exists, so let's look at whether there is a difference between equivalence class testing and input domain partitioning.
The equivalence class technique divides the possible test data for, say, an application module into partitions of equivalent data. They're "equivalent" because any member of a partition can represent any other member of that partition, so in theory a single test using one member is enough to cover that partition. Moreover, the partitions should not overlap.
Yes, I know that's a bit cumbersome, so let's look at an example: you have an input field on a web page that accepts all kinds of characters, but no more than 256 of them. That gives you the following equivalence partitions (simplified):
Char types:
only letters
only numbers
only special chars
mixed chars (letters + numbers + spec. chars)
Char quantity:
0
>0
<256
256
Each of those equivalence partitions has sub-partitions, e.g. "letters":
Big letters
Small letters
Mixed letters
That means that to sufficiently test the "letters" partition, you have to design a test case that includes at least one of those sub-partitions. Say it is "letters -> Big letters": "TEST INPUT STRING". Note that here we've also combined our test string with the "Char quantity: >0" equivalence partition.
So, basically, by combining sub-partitions of the "Char types" and "Char quantity" partitions, you can design a minimum test set for the input data of that field.
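To make the combination step concrete, here is a small Python sketch that picks one representative per partition and builds a test input for every pairing of a "Char types" partition with a "Char quantity" partition. The representative lengths (10 for ">0", 255 for "<256") and the particular special characters are assumptions chosen for the example.

```python
import itertools
import string

SPECIAL = "!@#$%"

# One representative alphabet per "Char types" partition.
CHAR_TYPES = {
    "letters": string.ascii_letters,
    "numbers": string.digits,
    "special": SPECIAL,
    "mixed": "a1!",  # cycles letter, digit, special character
}

# One representative length per "Char quantity" partition (assumed values).
LENGTHS = {"0": 0, ">0": 10, "<256": 255, "256": 256}


def build_input(alphabet, length):
    """Repeat the alphabet until the requested length is reached."""
    return "".join(itertools.islice(itertools.cycle(alphabet), length))


cases = [
    (char_type, quantity, build_input(alphabet, length))
    for (char_type, alphabet), (quantity, length) in itertools.product(
        CHAR_TYPES.items(), LENGTHS.items()
    )
]
```

This yields 16 inputs. The four zero-length cases all collapse to the same empty string, so in practice one of them would be enough.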
On the other hand, the input domain of a program contains all possible inputs to that program, which is fairly close to the equivalence classes of possible inputs of the application module.
Those who speak about the input domain of a program sometimes also speak about regions, which are the same thing as sub-partitions of equivalence partitions. Moreover, those input domains (and accordingly regions) must not overlap, just as partitions must not overlap in equivalence partition testing.
With all that said, I would consider the two terms to describe the same thing in different words.

How predictable is NEWSEQUENTIALID?

According to Microsoft's documentation on NEWSEQUENTIALID, the output of NEWSEQUENTIALID is predictable. But how predictable is predictable? Say I have a GUID that was generated by NEWSEQUENTIALID, how hard would it be to:
Calculate the next value?
Calculate the previous value?
Calculate the first value?
Calculate the first value, even without knowing any GUIDs at all?
Calculate the number of rows? E.g. when using integers, /order?id=842 tells me that there are 842 orders in the application.
Below is some background information about what I am doing and what the various tradeoffs are.
One of the security benefits of using GUIDs over integers as primary keys is that GUIDs are hard to guess. E.g. if a hacker sees a URL like /user?id=845, he might try to access /user?id=0, since the first user in the database is probably an administrative user. Moreover, a hacker can iterate over /user?id=0..1..2 to quickly gather all users.
Similarly, a privacy downside of integers is that they leak information. /order?id=482 tells me that the web shop has had 482 orders since it went live.
Unfortunately, using GUIDs as primary keys has well-known performance downsides. To address this, SQL Server introduced the NEWSEQUENTIALID function. In this question, I would like to learn how predictable the output of NEWSEQUENTIALID is.
The underlying OS function is UuidCreateSequential. The value is derived from the MAC address of one of your network cards and a per-OS-boot incremental value. See RFC 4122. SQL Server does some byte-shuffling to make the result sort properly. So the value is highly predictable, in a sense. Specifically, if you know one value you can immediately predict a range of similar values.
However, one cannot predict the equivalent of id=0, nor can one infer that 52DE358F-45F1-E311-93EA-00269E58F20D means the store sold at least 482 items.
The only 'approved' random generation is CRYPT_GEN_RANDOM (which wraps CryptGenRandom), but that is obviously a horrible key candidate.
In most cases, the next newsequentialid can be predicted by taking the current value and adding one to the first hex pair.
In other words:
1E29E599-45F1-E311-80CA-00155D008B1C
is followed by
1F29E599-45F1-E311-80CA-00155D008B1C
is followed by
2029E599-45F1-E311-80CA-00155D008B1C
Occasionally, the sequence will restart from a new value.
So, it's very predictable.
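The "add one to the first hex pair" rule can be written down as a short Python sketch. It treats the first group of the displayed GUID as a little-endian 32-bit counter, which matches the three values listed above. The function name `predict_next` is made up, and what happens when the counter overflows the first group (or when the sequence restarts) is not modeled here.

```python
def predict_next(guid):
    """Guess the GUID that follows `guid` in a NEWSEQUENTIALID run.

    Assumption: the first group is a little-endian counter and nothing
    else changes between consecutive values.
    """
    first, rest = guid.split("-", 1)
    counter = int.from_bytes(bytes.fromhex(first), "little")
    counter = (counter + 1) % 2**32  # overflow beyond the first group is not modeled
    return counter.to_bytes(4, "little").hex().upper() + "-" + rest


print(predict_next("1E29E599-45F1-E311-80CA-00155D008B1C"))
```

A carry out of the first hex pair moves into the second pair, so FF29E599 would be followed by 002AE599 under this model.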
NewSequentialID is a wrapper around the Windows function UuidCreateSequential.
You can try this code:
-- Table variable whose key defaults to a sequential GUID
DECLARE @tbl TABLE (
PK uniqueidentifier DEFAULT NEWSEQUENTIALID(),
Num int
)
INSERT INTO @tbl (Num) VALUES (1),(2),(3),(4),(5)
SELECT * FROM @tbl
On my machine, the result at this time is:
PK Num
52DE358F-45F1-E311-93EA-00269E58F20D 1
53DE358F-45F1-E311-93EA-00269E58F20D 2
54DE358F-45F1-E311-93EA-00269E58F20D 3
55DE358F-45F1-E311-93EA-00269E58F20D 4
56DE358F-45F1-E311-93EA-00269E58F20D 5
You should try it several times at different times and dates to work out the behaviour.
I ran it several times, and the first part changes every time (as you can see in the results: 52..., 53..., 54..., etc.). I waited a while to check, and after some time the second part is incremented too. I suppose the incrementing carries on through all the parts. Basically it looks like a simple += 1 increment transformed into a GUID.
EDIT:
If you want sequential GUIDs and want to have control over the values, you can use Sequences.
Sample code:
select cast(cast(next value for [dbo].[MySequence] as varbinary(max)) as uniqueidentifier)
• Calculate the next value? Yes
Microsoft says:
If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID and, therefore, access data associated with that GUID.
So it is possible to get the next value. I couldn't find any information on whether it is possible to get the previous one.
from: http://msdn.microsoft.com/en-us/library/ms189786.aspx
edit: another few words about NEWSEQUENTIALID and security: http://vadivel.blogspot.com/2007/09/newid-vs-newsequentialid.html
Edit:
NewSequentialID contains the server's MAC address (or one of them), therefore knowing a sequential ID gives a potential attacker information that may be useful as part of a security or DoS attack.
from: Are there any downsides to using NewSequentialID?

Experimental/private branch for OID numbers in LDAP Schemas?

Attributes and object classes in LDAP schemas are identified by a unique number called an OID. OIDs are also used in the SNMP protocol. Anyone can apply to IANA for an enterprise number and then define their own subnumbers, but processing the application can take up to 30 days.
Does anyone know if there is a "test" branch of OID numbers that could be used for experimental purposes while waiting for an official enterprise number?
Apparently the OID branch 2.25 can be used with UUIDs without registration.
The detailed explanation can be found here:
http://www.oid-info.com/get/2.25 and there is also a link to a UUID generator.
=> I think it's a good solution for unregistered OIDs. Simply generate one such OID with the UUID generator. You will get something like 2.25.178307330326388478625988293987992454427 and can then make your own subnumbers by adding .1, .2, ... at the end.
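If you would rather not rely on the online generator, the same construction can be sketched with Python's standard `uuid` module: the UUID is written as a single decimal integer under the 2.25 arc. The helper names `uuid_to_oid` and `sub_oid` are made up for this example.

```python
import uuid


def uuid_to_oid(u):
    """Express a UUID as an OID under the 2.25 arc (UUID as one decimal integer)."""
    return f"2.25.{u.int}"


def sub_oid(base, *arcs):
    """Append your own subnumbers (.1, .2, ...) to a base OID."""
    return ".".join([base, *map(str, arcs)])


base = uuid_to_oid(uuid.uuid4())  # a fresh random UUID each run
print(base)
print(sub_oid(base, 1))
```

Generate the base once and store it. Calling `uuid.uuid4()` again would give a different root for your schema.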
It is also possible to register such a 2.25 OID, but human intervention is still needed and uniqueness isn't totally guaranteed, as it is still possible (although unlikely) that someone else is using the same OID unregistered. For registered OIDs I would still prefer registering a private enterprise number with IANA.
Here is also a list of how to get an OID assigned: http://www.oid-info.com/faq.htm#10. But the main answers are already listed here.
No. However, if nothing from your work is published, no one will know.
Some LDAP server companies will hand out sub-OID numbers if you want to try something. But you could just make up anything.
The currently assigned numbers only start with 0, 1, or 2. If you started with 4 or something, any savvy person would know you were faking it.
We put some info together on OIDs here:
http://ldapwiki.willeke.com/wiki/HowToGetYourOwnLDAPOID
-jim
I don't know where you're based. In the UK, each company gets its own OID branch to play with as it wishes: http://www.oid-info.com/get/1.2.826.0
(Not sure if there are similar setups in other countries.)
You could try the following for internal prototyping (check the "Object Identifiers (OIDs)" paragraph).