How to increase the "uniqueness" of the apache session - apache

We are running an application with a lot of visitors on an Apache 2.5 with php5.6.
After considering our sessionids as unique for the longest time we discovered, that after about 12 months duplicates of sessionids are generated, which mess up our saved records in the database, which is connected to the sessionid as identifier.
Is there a possibility to make the sessionid "more" unique to reduce the possibility of duplicates?

Looks like I created a duplicate with
How unique is the php session id
So increasing the entropy length should help.
My system has as current configuration
session.entropy_length = 32;
session.entropy_file = /dev/urandom;
the other solution suggests 512 instead of 32.
However the main key to avoid duplicates (or make them less probable) is to increase the length of the session string. I guess we will add 4 characters more, that should help with our problem.

Related

REDIS usecase using large keys with small values

I have a use-case for using redis that is a little bit different.
In my MySQL I have an entity, let's call it HumanEntity. this HumanEntity has many to many relations.
HumanEntity.Urls - Many URLs per HumanEntity.
HumanEntity.UserNames - Many UserNames per HumanEntity.
HumanEntity.Phones ...
HumanEntity.Emails ...
in a normal one hour, the application creates hundreds of these many values.
The use-case is that, the application receives an HTTP call (100 per one second) with a HumanEntity value (Url or UserName or Phone or Email).
I need to scan my MySQL (1,000,000 records) and return back the HumanEntity.Id(integer) .
Since its ok to have some latency in the data integrity I thought about REDIS.
Can I store the values as a Redis key and the and the HumanEntity.Id(integer) as the value.
My API needs to return back the HumanEntity.Id(integer).
does it make sense to have such long key and such short value? The URL, for example, maybe 1500 bytes and the value can be 1 byte.
What is the best redis method to implement that?
Thanks
If the values are not unique then you may have some problem. Phones, emails or usernames maybe unique for user but i am not sure about url or any other property stored in your database. You may overwrite the value of an identifier with another user's.
If you don't have any problem like that; you may proceed with string types, Time complexity of GET and SET is O(1) - that's the best you may get.
In some cases such as checking whether the user used any coupon, you may use long(let's say 64 chars) user id as key, and 1 as value and use EXISTS to determine it. So it's valid to use long key and short value.

bitcoin block solving, all nonces used but no hit

I'm trying to understand how bitcoin block solving attempts works.
I see a nonce is a 32-bit number, so around 4 billion values to try.
Also, I saw a famous mining pool having 500 Ph/s power at hand. And I found there one particular block solved in 40 minutes.
So, that is (40 x 3600) x (500 x 10^15) = 7.2 x 10^22 hashes calculated
on that pool, to solve one block.
That means the nonces has been "cycled" 16763 billion times during those 40 minutes.
So I'm wondering what are those 16763 billion more things done after each nonce cycle? ("1 cycle of nonces" is going from 0 to 4294967295) ?
I see that we can change the timestamp at a certain proportion, and the merkel root hash also.
Aren't merkel hashes and timestamps more strict to calculate and use than nonces?
Those 16763 billion things are changes of the timestamp and merkel only? Can we have as much unique merkel hashes re-generated and timestamps changes as needed?
Can you give me examples? sorry if my view is a bit biased, I'm starting with this.
Apparently, I've found that when the nonces have cycled (overflow), an extraNonce value is incremented, and that requires the Merkel hash to be recalculated based on that extraNonce value.
a link here

How predictable is NEWSEQUENTIALID?

According to Microsoft's documentation on NEWSEQUENTIALID, the output of NEWSEQUENTIALID is predictable. But how predictable is predictable? Say I have a GUID that was generated by NEWSEQUENTIALID, how hard would it be to:
Calculate the next value?
Calculate the previous value?
Calculate the first value?
Calculate the first value, even without knowing any GUID's at all?
Calculate the amount of rows? E.g. when using integers, /order?id=842 tells me that there are 842 orders in the application.
Below is some background information about what I am doing and what the various tradeoffs are.
One of the security benefits of using GUID's over integers as primary keys is that GUID's are hard to guess. E.g. say a hacker sees a URL like /user?id=845 he might try to access /user?id=0, since it is probable that the first user in the database is an administrative user. Moreover, a hacker can iterate over /user?id=0..1..2 to quickly gather all users.
Similarly, a privacy downside of integers is that they leak information. /order?id=482 tells me that the web shop has had 482 orders since its implementation.
Unfortunately, using GUID's as primary keys has well-known performance downsides. To this end, SQL Server introduced the NEWSEQUENTIALID function. In this question, I would like to learn how predictable the output of NEWSEQUENTIALID is.
The underlying OS function is UuidCreateSequential. The value is derived from one of your network cards MAC address and a per-os-boot incremental value. See RFC4122. SQL Server does some byte-shuffling to make the result sort properly. So the value is highly predictable, in a sense. Specifically, if you know a value you can immediately predict a range of similar value.
However one cannot predict the equivalent of id=0, nor can it predict that 52DE358F-45F1-E311-93EA-00269E58F20D means the store sold at least 482 items.
The only 'approved' random generation is CRYPT_GEN_RANDOM (which wraps CryptGenRandom) but that is obviously a horrible key candidate.
In most cases, the next newsequentialid can be predicted by taking the current value and adding one to the first hex pair.
In other words:
1E29E599-45F1-E311-80CA-00155D008B1C
is followed by
1F29E599-45F1-E311-80CA-00155D008B1C
is followed by
2029E599-45F1-E311-80CA-00155D008B1C
Occasionally, the sequence will restart from a new value.
So, it's very predictable
NewSequentialID is a wrapper around the windows function UuidCreateSequential
You can try this code:
DECLARE #tbl TABLE (
PK uniqueidentifier DEFAULT NEWSEQUENTIALID(),
Num int
)
INSERT INTO #tbl(Num) values(1),(2),(3),(4),(5)
select * from #tbl
On my machine in this time is result:
PK Num
52DE358F-45F1-E311-93EA-00269E58F20D 1
53DE358F-45F1-E311-93EA-00269E58F20D 2
54DE358F-45F1-E311-93EA-00269E58F20D 3
55DE358F-45F1-E311-93EA-00269E58F20D 4
56DE358F-45F1-E311-93EA-00269E58F20D 5
You should try it several times in different time/date to interpolate the behaviour.
I tried it run several times and the first part is changing everytime (you see in results: 52...,53...,54...,etc...). I waited some time to check it, and after some time the second part is incremented too. I suppose the incementation continues to the all parts. Basically it look like simple +=1 incementation transformed into Guid.
EDIT:
If you want sequential GUID and you want have control over the values, you can use Sequences.
Sample code:
select cast(cast(next value for [dbo].[MySequence] as varbinary(max)) as uniqueidentifier)
• Calculate the next value? Yes
Microsoft says:
If privacy is a concern, do not use this function. It is possible to guess the value of the next generated GUID and, therefore, access data associated with that GUID.
SO it's a possibility to get the next value. I don't find information if it is possible to get the prevoius one.
from: http://msdn.microsoft.com/en-us/library/ms189786.aspx
edit: another few words about NEWSEQUENTIALID and security: http://vadivel.blogspot.com/2007/09/newid-vs-newsequentialid.html
Edit:
NewSequentialID contains the server's MAC address (or one of them), therefore knowing a sequential ID gives a potential attacker information that may be useful as part of a security or DoS attack.
from: Are there any downsides to using NewSequentialID?

ROR - Generate an alpha-numeric string for a DB ID

In our DB, every Person has an ID, which is the DB generated, auto-incremented integer. Now, we want to generate a more user-friendly alpha-numeric ID which can be publicly exposed. Something like the Passport number. We obviously don't want to expose the DB ID to the users. For the purpose of this question, I will call what we need to generate, the UID.
Note: The UID is not meant to replace the DB ID. You can think of the UID as a prettier version of the DB ID, which we can give out to the users.
I was wondering if this UID can be a function of the DB ID. That is, we should be able to re-generate the same UID for a given DB ID.
Obviously, the function will take a 'salt' or key, in addition to the DB ID.
The UID should not be sequential. That is, two neighboring DB IDs should generate visually different-looking UIDs.
It is not strictly required for the UID to be irreversible. That is, it is okay if somebody studies the UID for a few days and is able to reverse-engineer and find the DB ID. I don't think it will do us any harm.
The UID should contain only A-Z (uppercase only) and 0-9. Nothing else. And it should not contain characters which can be confused with other alphabets or digits, like 0 and O, l and 1 and so on. I guess Crockford's Base32 encoding takes care of this.
The UID should be of a fixed length (10 characters), regardless of the size of the DB ID. We could pad the UID with some constant string, to bring it to the required fixed length. The DB ID could grow to any size. So, the algorithm should not have any such input limitations.
I think the way to go about this is:
Step 1: Hashing.
I have read about the following hash functions:
SHA-1
MD5
Jenkin's
The hash returns a long string. I read here about something called XOR folding to bring the string down to a shorter length. But I couldn't find much info about that.
Step 2: Encoding.
I read about the following encoding methods:
Crockford Base 32 Encoding
Z-Base32
Base36
I am guessing that the output of the encoding will be the UID string that I am looking for.
Step 3: Working around collisions.
To work around collisions, I was wondering if I could generate a random key at the time of UID generation and use this random key in the function.
I can store this random key in a column, so that we know what key was used to generate that particular UID.
Before inserting a newly generated UID into the table, I would check for uniqueness and if the check fails, I can generate a new random key and use it to generate a new UID. This step can be repeated till a unique UID is found for a particular DB ID.
I would love to get some expert advice on whether I am going along the correct lines and how I go about actually implementing this.
I am going to be implementing this in a Ruby On Rails app. So, please take that into consideration in your suggestions.
Thanks.
Update
The comments and answer made me re-think and question one of the requirements I had: the need for us to be able to regenerate the UID for a user after assigning it once. I guess I was just trying to be safe, in the case where we lose a user's UID and we will able to get it back if it is a function of an existing property of the user. But we can get around that problem just by using backups, I guess.
So, if I remove that requirement, the UID then essentially becomes a totally random 10 character alphanumeric string. I am adding an answer containing my proposed plan of implementation. If somebody else comes with a better plan, I'll mark that as the answer.
As I mentioned in the update to the question, I think what we are going to do is:
Pre-generate a sufficiently large number of random and unique ten character alphanumeric strings. No hashing or encoding.
Store them in a table in a random order.
When creating a user, pick the first these strings and assign it to the user.
Delete this picked ID from the pool of IDs after assigning it to a user.
When the pool reduces to a low number, replenish the pool with new strings, with uniqueness checks, obviously. This can be done in a Delayed Job, initiated by an observer.
The reason for pre-generating is that we are offloading all the expensive uniqueness checking to a one-time pre-generation operation.
When picking an ID from this pool for a new user, uniqueness is guaranteed. So, the operation of creating user (which is very frequent) becomes fast.
Would db_id.chr work for you? It would take the integers and generate a character string from them. You could then append their initials or last name or whatever to it. Example:
user = {:id => 123456, :f_name => "Scott", :l_name => "Shea"}
(user.id.to_s.split(//).map {|x| (x.to_i + 64).chr}).join.downcase + user.l_name.downcase
#result = "abcdefshea"

When would you ever set the increment value on a database identity field?

Given the table:
CREATE TABLE Table1
(
UniqueID int IDENTITY(1,1)
...etc
)
Now why would you ever set the increment to something other than 1?
I can understand setting the initial seed value differently. For example if, say, you're creating one database table per month of data (e.g. Table1_082009, Table1_092009) and want to start the UniqueID of the new table where the old one left off. (I probably wouldn't use that strategy myself, but hey, I can see people doing it).
But for the increment? I can only imagine it being of any use in really odd situations, for example:
after the initial data is inserted, maybe later someone will want to turn identity insert on and insert new rows in the gaps, but for efficient lookup on the index will want the rows to be close to each other?
if you're looking up ids based directly off a URL, and want to make it harder for people to arbitrarily access the other items (for example, instead of the user being able to work out that changing the URL suffix from /GetData?id=1000 to /GetData?id=1001, you set an increment of 437 so that the next url is actually /GetData?id=1437)? Of course if this is your "security" then you're probably already in trouble...
I can't think of anything else. Has anyone used an increment that wasn't 1, and why? I'm really just curious.
One idea might be using this to facilitate partitionnement of data (though there might be more "automated" ways to do that) :
Considering you have two servers :
On one server, you start at 1 and increment by 2
On the other server, you start at 2 and increment by 2.
Then, from your application, your send half inserts to one server, and the other half to the second server
some kind of software load-balancing
This way, you still have the ability to identify your entries : the "UniqueID" is still unique, even if the data is split on two servers / tables.
But that's only a wild idea -- there are probably some other uses to that...
Once, for pure fun, (Oh yeah, we have a wild side to us) decided to negative increment. It was strange to see the numbers grow in size and smaller in value at the same time.
I could hardly sit still in my chair.
edit (afterthought):
You think the creator of the IDENTITY was in love with FOR loops? You know..
for (i = 0; i<=99; i+=17)
or for those non semi-colon folks out there
For i = 0 to 100 step 17
Only for entertainment. And you have to be REALLY bored.