I would like to store values as a score in a Redis sorted set that can be as big as 10^24 (and, if possible, even 2^256).
What are the integer size limits with ZRANGE?
For some context I'm trying to implement a ranking of top holders for a custom ethereum token. e.g. https://etherscan.io/token/0xdac17f958d2ee523a2206206994597c13d831ec7#balances
I want to hold the balances in a Redis DB and access them through node.js. I can retrieve the actual balances using web3 in case the DB crashes or something. The point is I would like to have the data sorted, and I would like to be able to access it blazingly fast.
Quotation from the Redis documentation about sorted sets:
Range of integer scores that can be expressed precisely
Redis sorted sets use a double 64-bit floating point number to represent the score. In all the architectures we support, this is represented as an IEEE 754 floating point number, that is able to represent precisely integer numbers between -(2^53) and +(2^53) included. In more practical terms, all the integers between -9007199254740992 and 9007199254740992 are perfectly representable. Larger integers, or fractions, are internally represented in exponential form, so it is possible that you get only an approximation of the decimal number, or of the very big integer, that you set as score.
So once you leave the precise range, and an approximation of the score is good enough for your use case: per Wikipedia, the largest binary exponent of a double is 1023, so the largest representable value is just under 2^1024 (about 1.8 × 10^308). Both 10^24 and 2^256 (≈ 1.16 × 10^77) fit comfortably as approximate scores.
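To see where exactness ends, here is a quick check in plain Python (a Python float is the same IEEE 754 binary64 as a Redis score, so the behaviour carries over):

```python
# Integers up to 2^53 are exact in a binary64 double; beyond that,
# neighbouring integers start to collapse onto the same value.
exact_limit = 2**53
assert float(exact_limit - 1) == exact_limit - 1      # still exact
assert float(exact_limit + 1) == float(exact_limit)   # precision lost

# 10^24 (and even 2^256) fit as *approximate* doubles:
score = float(10**24)
assert int(score) != 10**24                           # not exact...
assert abs(int(score) - 10**24) / 10**24 < 1e-15      # ...but relatively close
big = float(2**256)                                   # representable, ~1.16e77
assert big == 2.0**256
```

So the ranking order is preserved as long as the balances differ by more than about 1 part in 2^53; balances closer together than that may tie.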
Related
I'm adding values to my sorted set where the score could be as big as 38 digits, for example 5.5857766150356906e+37. From my tests it seems like redis can handle commands like zrangebyscore fine with them, but I still feel like I need to ask - is there a limit to how big these scores can be?
Am I doing something way too expensive for day-to-day redis when I start storing more values?
According to the Redis documentation, a sorted set score is:
the string representation of a double precision floating point number.
A double-precision floating point number:
allows the representation of numbers between 10^−308 and 10^308, with full 15–17 decimal digits precision.
So you're fine.
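A quick sanity check in plain Python (whose float is the same IEEE 754 double Redis uses for scores):

```python
import sys

# The double behind a sorted-set score spans roughly 1e-308 .. 1.8e308,
# with 15 decimal digits guaranteed to survive a round trip.
assert sys.float_info.max > 1e308
assert sys.float_info.dig == 15

# A 38-digit score like the one in the question is stored approximately,
# but it round-trips through its decimal form without drifting:
score = 5.5857766150356906e+37
assert float(repr(score)) == score
```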
For most numbers, we know there will be some precision error with any floating point value. For a 32-bit float, that works out to be roughly 6 significant digits which will be accurate before you can expect to start seeing incorrect values.
I'm trying to store a human-readable value which can be read back in to recreate a bit-accurate copy of the serialized value.
For example, the value 555.5555 is stored as 555.55548095703125; but when I serialize 555.55548095703125, I could theoretically serialize it as anything in the range (555.5554504395, 555.555511475) (exclusive) and still get the same byte pattern. (Actually, probably that's not the exact range, I just don't know that there's value in calculating it more accurately at the moment.)
What I'd like is to find the most human-readable string representation for the value -- which I imagine would be the fewest digits -- which will be deserialized as the same IEEE float.
This is exactly a problem which was initially solved in 1990 with an algorithm the creators called "Dragon": https://dl.acm.org/citation.cfm?id=93559
There is a more modern technique from 2018, notably faster, called "Ryū" (Japanese for "dragon"): https://dl.acm.org/citation.cfm?id=3192369
The GitHub for the library is here: https://github.com/ulfjack/ryu
According to their readme:
Ryu generates the shortest decimal representation of a floating point
number that maintains round-trip safety. That is, a correct parser can
recover the exact original number. For example, consider the binary
64-bit floating point number 00111110100110011001100110011010. The
stored value is exactly 0.300000011920928955078125. However, this
floating point number is also the closest number to the decimal number
0.3, so that is what Ryu outputs.
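For illustration, Python's repr offers the same shortest-round-trip guarantee (via a different algorithm than Ryū, but with the same contract), so the readme example can be reproduced in plain Python:

```python
import struct

# Decode the 32-bit pattern from the Ryu readme example.
bits = 0b00111110100110011001100110011010
value = struct.unpack('>f', bits.to_bytes(4, 'big'))[0]

# The exact stored value is 0.300000011920928955078125, but the shortest
# decimal that parses back to the same bits is what gets printed:
assert repr(value) == '0.30000001192092896'
assert struct.pack('>f', value) == bits.to_bytes(4, 'big')  # round-trips
```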
I'm using timestamps as the score. I want to prevent duplicates by appending a unique object-id to the score. Currently, this id is a 6 digit number (the highest id right now is 221849), but it is expected to increase over a million. So, the score will be something like
1407971846221849 (timestamp:1407971846 id:221849) and will eventually reach 14079718461000001 (timestamp:1407971846 id:1000001).
My concern is not being able to store scores because they've reached the max allowed.
I've read the docs, but I'm a bit confused. I know, basic math. But bear with me, I want to get this right.
Redis sorted sets use a double 64-bit floating point number to represent the score. In all the architectures we support, this is represented as an IEEE 754 floating point number, that is able to represent precisely integer numbers between -(2^53) and +(2^53) included. In more practical terms, all the integers between -9007199254740992 and 9007199254740992 are perfectly representable. Larger integers, or fractions, are internally represented in exponential form, so it is possible that you get only an approximation of the decimal number, or of the very big integer, that you set as score.
There's another thing bothering me right now. Would the increase in ids break the chronological sort sequence?
I will appreciate any insights, suggestions, different perspectives, or you flat out telling me if what I'm trying to do is nonsense.
Thanks for any help.
No, it won't break the "chronological" order, but you may lose precision in the last digits, so two members may end up having the same score (i.e. non-unique).
There is no problem with duplicate scores. It is just maintaining a sorted set in memory. Members are unique but the scores may be the same. If you want chronological processing I would just rely on the timestamp without adding an id to it.
Appending an id would break the chronological sort if your ids are mixed: with timestamps 1, 2, 3 (a simple example) and ids 100, 10, 1, you won't get the correct sort. If your ids are always assigned monotonically, you should just use the id as the score.
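To see the precision loss concretely, here is the question's own example in plain Python (a Python float is the same 64-bit double Redis uses for scores):

```python
# The concatenated timestamp+id values exceed 2^53, so as doubles they
# lose their last digits and adjacent ids collide:
a = 14079718461000000  # timestamp 1407971846, id 1000000
b = 14079718461000001  # timestamp 1407971846, id 1000001
assert b > 2**53
assert float(a) == float(b)  # two distinct scores collapse to one double
```

So two different members really can end up with the same effective score, exactly as the answer warns.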
I have a big numpy array with numbers like 1.01594734e+09.
I just want this data as integers, or rounded off to 5 decimals in the case of something like 1.01594734e+03.
You need to choose what you want. I assume you want to make the array smaller.
If you want to convert array a into integers, then:
a_int = a.astype('int')
However, keep in mind that this does not save any storage space, as int is (on typical 64-bit platforms) an 8-octet (64-bit) integer, and float is an 8-octet float.
If you know you have integer data which is limited in size, you may specify the storage format to be something shorter:
a_int = a.astype('int32')
If you have pure integer data which fits into the destination type, there is no loss of precision in this conversion.
On the other hand - depending on your data - you may have equally good results by using 4-octet (32-bit) floats:
a_shortfloat = a.astype('float32')
This conversion causes some loss of precision depending on the data.
For the second alternative you suggest, rounding a number to a given number of decimals, there are two quite different possibilities.
Simple rounding to 5 decimals:
a_rounded = a.round(decimals=5)
This, however, does not save any storage space, the numbers are only rounded (and they are not accurate even after that due to the limitations of the floating point representation).
Another possibility is to use a fixed point notation:
a_fixedpoint = (a * 100000 + .5).astype('int32')
With this representation your example number 1.01594734e+03 will become 101 594 734. Whether or not this is a useful representation depends on the application. Sometimes fixed-point numbers are very useful, but if your numbers have a wide dynamic range (e.g. from 1e-5 to 1e5), then floating point numbers are the correct way of handling them.
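Putting the options above together in one sketch (the sample values are just illustrative):

```python
import numpy as np

a = np.array([1.01594734e+03, 2.5])

a_int   = a.astype('int64')    # 8 bytes/element, truncates toward zero
a_int32 = a.astype('int32')    # 4 bytes/element: half the storage
a_f32   = a.astype('float32')  # 4 bytes/element, ~7 significant digits
assert a_int32.nbytes == a.nbytes // 2

# Fixed-point with 5 decimals (note: the +0.5 rounding trick assumes
# non-negative values; np.round would handle negatives correctly):
a_fixed = (a * 100000 + 0.5).astype('int64')
assert a_fixed[0] == 101594734
assert a_fixed[1] == 250000
```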
I am attempting to store a float in my SQLite3 database using Java. When I go to store the number 1.2 in the database, it is actually stored as 1.199999998, and the same occurs for every even number (1.4, 1.6, etc.).
This makes it really difficult to delete rows, because I delete a row according to its version column (whose type is float). So this line won't work:
"DELETE FROM tbl WHERE version=1.2"
That's because there is no 1.2, only 1.19999998. How can I make sure that when I store a float in my SQLite3 DB, it is the exact number I input?
Don't use a float if you need precise accuracy. Try a decimal instead.
Remember that the 1.2 you put in your source code or that the user entered into a textbox and ultimately ended up in the database is actually stored as a binary value (usually in a format known as IEEE754). To understand why this is a problem, try converting 1.2 (1 1/5) to binary by hand (binary .1 is 1/2, .01 is 1/4) and see what you end up with:
1.001100110011001100110011001100110011
You can save time by using this converter (ignore the last "1" that breaks the cycle at the site; it's there because the converter had to round the last digit).
As you can see, it's a repeating pattern. This goes on pretty much forever. It would be like trying to represent 1/3 as a decimal. To get around this problem, most programming languages have a decimal type (as opposed to float or double) that keeps a base 10 representation. However, calculations done using this type are orders of magnitude slower, and so it's typically reserved for financial transactions and the like.
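A small demonstration of the difference, using Python's decimal module as a stand-in for such a base-10 type:

```python
from decimal import Decimal

# Binary doubles: 1.2 is really 1.1999999999999999555..., and the error
# surfaces in plain arithmetic:
assert 1.2 * 3 != 3.6

# Base-10 decimals keep 1.2 exact, so the same arithmetic is exact:
assert Decimal('1.2') * 3 == Decimal('3.6')
```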
This is the very nature of floating point numbers. They are not exact.
I'd suggest you either use an integer, or text field to store a version.
You should never rely on the accuracy of a float or a double. A float should never be used for keys in a database or to represent money.
You should probably use decimal in this case.
Floats are not an accurate data type. They are designed to be fast, have a large range of values, and have a small memory footprint.
They are usually implemented using the IEEE standard
http://en.wikipedia.org/wiki/IEEE_754-2008
As Joel Coehoorn has pointed out, 1.2 is the recurring fraction 1.0011 0011 0011... in binary and can't be exactly represented in a finite number of bits.
The closest you can get with an IEEE 754 float is 1.2000000476837158203125. The closest you can get with a double is 1.1999999999999999555910790149937383830547332763671875. I don't know where you're getting 1.199999998 from.
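If you want to inspect these exact values yourself, Python's Decimal constructor exposes them (Decimal(x) converts the binary value exactly, with no rounding):

```python
from decimal import Decimal
from struct import pack, unpack

# Exact value of the double nearest to 1.2:
assert str(Decimal(1.2)) == (
    '1.1999999999999999555910790149937383830547332763671875')

# Exact value of the 32-bit float nearest to 1.2 (round-trip through
# the binary32 format to get it as a double):
f32 = unpack('>f', pack('>f', 1.2))[0]
assert str(Decimal(f32)) == '1.2000000476837158203125'
```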
Floating-point was designed for representing approximate quantities: Physical measurements (a swimming pool is never exactly 1.2 meters deep), or irrational-valued functions like sqrt, log, or sin. If you need a value accurate to 15 significant digits, it works fine. If you truly need an exact value, not so much.
For a version number, a more appropriate representation would be a pair of integers: One for the major version and one for the minor version. This would also correctly handle the sequence 1.0, 1.1, ..., 1.9, 1.10, 1.11, which would sort incorrectly in a REAL column.
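A quick sketch of why the integer-pair representation sorts correctly where floats don't (plain Python; tuples compare element-wise, like a (major, minor) pair would):

```python
# Integer pairs sort in proper version order:
pairs = [(1, 9), (1, 10), (1, 11), (1, 2)]
assert sorted(pairs) == [(1, 2), (1, 9), (1, 10), (1, 11)]

# As numbers, 1.10 and 1.1 are the same value, and 1.9 sorts after 1.11,
# so the sequence 1.9, 1.10, 1.11 cannot be ordered correctly:
assert 1.10 == 1.1
assert 1.9 > 1.11
```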