I want to efficiently search IPv6 subnet range using redis.
i thought of storing the IPv6 numeric addresses in redis and search them by range.
those are 128-bit ints, e.g:
import ipaddress
int(ipaddress.ip_address(u'113f:a:2:3:4:1::77'))
> 22923991422715307029586104612626104439L
and query by range:
ZRANGEBYSCORE numerics <subnet-S-start> <subnet-S-end>
HOWEVER, redis sorted-sets can hold score of up to 2^53, so all my large ints are being trimmed and I'm losing precision.
Is there a way to save such large numbers in redis without losing precision?
Do you have a better suggestion?
Thanks
You can use the lexical range API, it will suit you exactly. https://redis.io/commands/zrangebylex
Insert the addresses with a score of 0, I don't even think you need to encode them as numbers, just pad the individual bytes, and you should be able to query an range.
Related
I would like to store values as a score in a redis sorted set that be be as big as 10^24 (and if possible even 2^256)
What are the integer size limits with ZRANGE?
For some context I'm trying to implement a ranking of top holders for a custom ethereum token. e.g. https://etherscan.io/token/0xdac17f958d2ee523a2206206994597c13d831ec7#balances
I want to hold the balances in a Redis DB and access it through node.js. I can retrieve the actual balances using web3, in case the db crashes or something. The point is i would like to have the data sorted and i would like to be able to access the data blazingly fast.
Quotation from the Redis documentation about sorted sets:
Range of integer scores that can be expressed precisely
Redis sorted sets use a double 64-bit floating point number to represent the score. In all the architectures we support, this is represented as an IEEE 754 floating point number, that is able to represent precisely integer numbers between -(2^53) and +(2^53) included. In more practical terms, all the integers between -9007199254740992 and 9007199254740992 are perfectly representable. Larger integers, or fractions, are internally represented in exponential form, so it is possible that you get only an approximation of the decimal number, or of the very big integer, that you set as score.
So when leaving the precise range and an approximation of the score is good enough for your use case, wikipedia says that 2^1023 would be the highest exponent possible.
My goal is to store in Redis:
Plain IP addresses like 228.228.228.228
IP networks like 228.228.228.0/24
in order to check in request/response cycle whether or not
current IP xxx.yyy.xxx.vvv is inside( contained by):
Plain ips
or
Ip network ( for example 228.228.228.228 inside 228.228.228.0/24)
Overall amount of ips and networks - few 1000 items.
Question is – what is the best way (best structure) to store both plain ips and networks in Redis and make aforementioned check without fetching data from Redis?
Thanks.
P.S. Current IP is already known.
UPDATE
Ok, lets simplify it a bit with example.
I have 2 ips and 2 networks in where I want to check if certain ip is contained.
# 2 plain ip
202.76.250.29
37.252.145.1
# 2 networks
16.223.132.0/24
9.76.202.0/24
There are 2 possible ways where exact ip might be contained:
1)Just in plain ips. For example 202.76.250.29 contained in the structure above and 215.08.11.23 is not contained simply by definition.
2)Ip might be contained inside network. For example 9.76.202.100 contained inside networks 9.76.202.0/24 but not contained inside list of plain ips as there are no any exact ip = 9.76.202.100.
Little bit of of explanation about ip networks. Very simplified.
Ip network represents range of ips. For example ipv4 network "192.4.2.0/24" represents 256 ip addresses:
IPv4Address('192.4.2.1'), IPv4Address('192.4.2.2'),
…
…
…
IPv4Address('192.4.2.253'), IPv4Address('192.4.2.254')
In another words ip network is a range of ip addresses
from '192.4.2.1' up to '192.4.2.254'
In our example 9.76.202.100 contained inside networks 9.76.202.0/24 as one of this addresses inside the range.
My idea is like this:
Any ip address can be represented as integer. One of our ip addresses
202.76.250.29 converted to integer is 3394042397.
As ip network is a range of ips, so that it is possible to convert it in a range of integers by converting first and last ip in range in integers.
For example one of our networks 16.223.132.0/24 represents range between IPv4Address('16.223.132.1') and IPv4Address('16.223.132.254'). Or integers range from 283083777 up to 283083781 with step 1.
Individual ip can be represented as range between it’s integer and it’s integer + 1 (lower bound included, upper bound excluded).
Obviously search in plain ips can be done by putting them to SET and then using SISMEMBER. But what about searching inside networks. Can we do some trick with ranges maybe?
"Best" is subjective(in memory, in speed etc) but you may use two sets/hash to store them. Since they are unique both hashes and sets would be fine. If you prefer you can use a single set/hash to save both ip and network ip addresses but i would prefer separate since they are two different type of data sets(just like database tables).
Then you can use either of those
SISMEMBER with O(1) time complexity
HEXISTS with O(1) time complexity.
It can be handled on application level with multiple commands or lua script(in a single transaction).
Depending on your choice add to your keys with SADD and HSET(the field value would be 1).
--
Edit: (hope i get it right)
For the range of network addresses create sets from the integers surrounding two dots such as 12.345.67.1-12.345.67.254 range will be represented as 12.345.67 and you will add this to the set. When you want to search for 12.345.67.x it will be parsed into 12.345.67 in your application level and you will check with SISMEMBER. Same can be done with hash with HEXISTS.
Since ip addresses contain four different numbers with three dots, you will discard last dot and last number and the rest will be representing(i assume) the network range.
For IPs you can use Set and query by certain IP within O(1) time.
For IP range, I think you can use List with Lua Script for query. List will have O(n) time for searching, but since you only have 1000 items, O(N) and O(1) will not have a huge difference for Redis in memory query.
I'm adding values to my sorted set where the score could be as big as 38 digits, for example 5.5857766150356906e+37. From my tests it seems like redis can handle commands like zrangebyscore fine with them, but I still feel like I need to ask - is there a limit to how big these scores can be?
Am I doing something way too expensive for day-to-day redis when I start storing more values?
According to the Redis documentation, a sorted set score is:
the string representation of a double precision floating point number.
A double-precision floating point number:
allows the representation of numbers between 10−308 and 10308, with full 15–17 decimal digits precision.
So you're fine.
Hello I was trying to find a good way to hash a set of numerical numbers which its output would be under 20 characters that are positive and unique. Any one have any suggestions?
For hashing in general, I'd use the HASHBYTES function. You can then convert the binary data to a string and just pick the first 20 characters, that should still be unique enough.
To get around HASHBYTES limitations (8000 bytes for instance), you can incrementally hash, e.g. for each value concat the previous hash with the value to be added and hash that again. This will make it unique with order etc. and unless you append close to 8000 bytes in one value it will not cause data truncation for the hashing.
EDIT
Here is the problem I am trying to solve:
I have a string broken up into multiple parts. These parts are not of equal, or predictable length. Each part will have a hash value. When I concatenate parts I want to be able to use the hash values from each part to quickly get the hash value for the parts together. In addition the hash generated by putting the parts together must match the hash generated if the string were hashed as a whole.
Basically I want a hashing algorithm where the parts of the data being hashed can be hashed in parallel, and I do not want the order or length of the pieces to matter. I am not breaking up the string, but rather receiving it in unpredictable chunks in an unpredictable order.
I am willing to ensure an elevated collision rate, so long as it is not too elevated. I am also ok with a slightly slower algorithm as it is hardly noticeable on small strings, and done in parallel for large strings.
I am familiar with a few hashing algorithms, however I currently have a use-case for a hash algorithm with the property that the sum of two hashes is equal to a hash of the sum of the two items.
Requirements/givens
This algorithm will be hashing byte-strings with length of at least 1 byte
hash("ab") = hash('a') + hash('b')
Collisions between strings with the same characters in different order is ok
Generated hash should be an integer of native size (usually 32/64 bits)
String may contain any character from 0-256 (length is known, not \0 terminated)
The ascii alpha-numeric characters will be by far the most used
A disproportionate number of strings will be 1-8 ASCII characters
A very tiny percentage of the strings will actually contain bytes with values at or above 127
If this is a type of algorithm that has terminology associated with it, I would love to know that terminology. If I knew what a proper term/name for this type of hashing algorithm was it would be much easier to google.
I am thinking the simplest way to achieve this is:
Any byte's hash should be its value, normalized to <128 (if >128 subtract 128)
To get the hash of a string you normalize each byte to <128 and add it to the key
Depending on key size I may need to limit how many characters are used to hash to avoid overflow
I don't see anything wrong with just adding each (unsigned) byte value to create a hash which is just the sum of all the characters. There is nothing wrong with having an overflow: even if you reach the 32/64 bit limit (and it would have to be a VERY/EXTREMELY long string to do this) the overflow into a negative number won't matter in 2's complement arithmetic. As this is a linear process it doesn't matter how you split your string.