My goal is to store in Redis:
Plain IP addresses like 228.228.228.228
IP networks like 228.228.228.0/24
in order to check, during the request/response cycle, whether the
current IP xxx.yyy.xxx.vvv is contained in:
the plain IPs
or
an IP network (for example, 228.228.228.228 is inside 228.228.228.0/24).
The total number of IPs and networks is a few thousand items.
The question is: what is the best way (best structure) to store both plain IPs and networks in Redis and make the aforementioned check without fetching the data from Redis?
Thanks.
P.S. Current IP is already known.
UPDATE
OK, let's simplify it a bit with an example.
I have 2 IPs and 2 networks, and I want to check whether a certain IP is contained in them.
# 2 plain ip
202.76.250.29
37.252.145.1
# 2 networks
16.223.132.0/24
9.76.202.0/24
There are 2 possible ways in which a given IP might be contained:
1) Just in the plain IPs. For example, 202.76.250.29 is contained in the structure above, and 215.08.11.23 is not contained, simply by definition.
2) The IP might be contained inside a network. For example, 9.76.202.100 is contained inside the network 9.76.202.0/24 but not in the list of plain IPs, as there is no exact IP equal to 9.76.202.100.
A little bit of explanation about IP networks. Very simplified.
An IP network represents a range of IPs. For example, the IPv4 network "192.4.2.0/24" covers 256 IP addresses (192.4.2.0 through 192.4.2.255), of which 254 are usable host addresses:
IPv4Address('192.4.2.1'), IPv4Address('192.4.2.2'),
…
…
…
IPv4Address('192.4.2.253'), IPv4Address('192.4.2.254')
In other words, an IP network is a range of IP addresses; here the usable hosts run
from '192.4.2.1' up to '192.4.2.254'.
In our example, 9.76.202.100 is contained inside the network 9.76.202.0/24 because it is one of the addresses inside the range.
My idea is like this:
Any IP address can be represented as an integer. One of our IP addresses,
202.76.250.29, converted to an integer is 3394042397.
Since an IP network is a range of IPs, it can be converted into a range of integers by converting the first and last IP in the range to integers.
For example, one of our networks, 16.223.132.0/24, represents the host range between IPv4Address('16.223.132.1') and IPv4Address('16.223.132.254'), or the integer range from 283083777 up to 283084030 with step 1.
An individual IP can be represented as the range between its integer and its integer + 1 (lower bound included, upper bound excluded).
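The integer idea above can be sketched with Python's standard ipaddress module. This is only an illustration: the helper names ip_to_int and network_bounds are hypothetical, and the bounds returned here cover the whole network, including the network and broadcast addresses (.0 and .255 for a /24).

```python
import ipaddress
from typing import Tuple

def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 string to its integer value."""
    return int(ipaddress.IPv4Address(ip))

def network_bounds(cidr: str) -> Tuple[int, int]:
    """Return the (first, last) integers covered by a network, both inclusive."""
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.broadcast_address)

low, high = network_bounds("9.76.202.0/24")
# An address is inside the network when its integer falls within the bounds.
print(low <= ip_to_int("9.76.202.100") <= high)
```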
Obviously, searching the plain IPs can be done by putting them in a SET and then using SISMEMBER. But what about searching inside networks? Can we do some trick with ranges, maybe?
"Best" is subjective (in memory, in speed, etc.), but you may use two sets or hashes to store them. Since the entries are unique, both hashes and sets would be fine. If you prefer, you can use a single set/hash to hold both IPs and network addresses, but I would keep them separate since they are two different types of data (just like database tables).
Then you can use either of these:
SISMEMBER with O(1) time complexity
HEXISTS with O(1) time complexity.
It can be handled at the application level with multiple commands or with a Lua script (in a single transaction).
Depending on your choice, add to your keys with SADD or HSET (the field value would be 1).
--
Edit: (hope I got it right)
For the range of network addresses, build the set from the first three octets: the range 12.345.67.1-12.345.67.254 is represented as 12.345.67, and that is what you add to the set. When you want to look up 12.345.67.x, your application strips it down to 12.345.67 and you check it with SISMEMBER. The same can be done with a hash and HEXISTS.
Since IP addresses consist of four numbers separated by three dots, you discard the last dot and the last number, and the rest represents (I assume) the network range. Note that this only works when every stored network is a /24.
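A rough sketch of the two-set check described above, assuming every stored network is a /24 (other prefix lengths are not covered by this trick). The key names "ips" and "nets24" and the helper names are hypothetical, and client is assumed to be a redis-py style object exposing sismember(name, value).

```python
def prefix24(ip: str) -> str:
    """Drop the last octet: '9.76.202.100' -> '9.76.202'."""
    return ip.rsplit(".", 1)[0]

def is_listed(client, ip: str) -> bool:
    """True if ip is a stored plain IP or falls inside a stored /24 network."""
    # Two O(1) membership checks; no stored data is fetched and scanned.
    if client.sismember("ips", ip):
        return True
    return bool(client.sismember("nets24", prefix24(ip)))
```

With this layout, plain IPs would be added with SADD ips 202.76.250.29 and networks with SADD nets24 9.76.202.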
For plain IPs, you can use a Set and look up a given IP in O(1) time.
For IP ranges, I think you can use a List and query it with a Lua script. Searching the List takes O(N) time, but since you only have about 1000 items, O(N) versus O(1) will not make a huge difference for an in-memory Redis query.
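To give a feel for the O(N) scan, here is a minimal sketch of the same check in application code rather than in Lua. It assumes the CIDR strings have already been loaded into a Python list (for example, cached from the Redis List); the function name in_any_network is hypothetical.

```python
import ipaddress
from typing import Iterable

def in_any_network(ip: str, cidrs: Iterable[str]) -> bool:
    """Linear scan: True if ip belongs to at least one of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)

networks = ["16.223.132.0/24", "9.76.202.0/24"]
print(in_any_network("9.76.202.100", networks))
```

Unlike the first-three-octets trick, this handles any prefix length, at the cost of touching every entry.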
I'm dealing with a device that can send data over a UDP connection in either hex or ASCII form. As I couldn't find any comparison, could someone explain the difference in processing the two?
Hex data transfers each byte as two hex characters, so every transmitted character carries only 4 bits of information out of its 8 bits. ASCII data transfers 7 or 8 bits at a time, thus using the full range 0..255, while a hex character only conveys 0..15.
For example, the number 18 is transferred as 12 when hex-coded (taking up two bytes) but as 18 when ASCII-encoded (taking up one byte, 00010010).
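The difference can be illustrated with a few lines of Python. This is a sketch under the assumption that the "ASCII" mode means the raw byte value is sent as-is, while the hex mode sends two printable characters per byte; the variable names are made up for the example.

```python
value = 18

# Raw mode: the value itself is the single byte on the wire (0x12).
raw = bytes([value])

# Hex mode: the value is written as two ASCII characters, "1" and "2".
hex_text = format(value, "02x").encode("ascii")

print(len(raw), len(hex_text))

# Receiver side: each form decodes back to the same integer.
decoded_raw = raw[0]
decoded_hex = int(hex_text.decode("ascii"), 16)
print(decoded_raw == decoded_hex == value)
```

The hex form doubles the payload size but stays printable, which can make captures easier to read.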
I want to efficiently search an IPv6 subnet range using Redis.
I thought of storing the IPv6 addresses in numeric form in Redis and searching them by range.
These are 128-bit ints, e.g.:
import ipaddress
int(ipaddress.ip_address(u'113f:a:2:3:4:1::77'))
> 22923991422715307029586104612626104439L
and query by range:
ZRANGEBYSCORE numerics <subnet-S-start> <subnet-S-end>
However, Redis sorted-set scores are double-precision floats, which represent integers exactly only up to 2^53, so all my large ints are being truncated and I'm losing precision.
Is there a way to save such large numbers in redis without losing precision?
Do you have a better suggestion?
Thanks
You can use the lexical range API; it suits this case exactly. https://redis.io/commands/zrangebylex
Insert the addresses with a score of 0. I don't even think you need to encode them as numbers: just pad the individual bytes to a fixed width, and you should be able to query a range.
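A minimal sketch of the padding idea, assuming each address is stored as a 32-character, zero-padded, lowercase hex string so that lexicographic order matches numeric order. The helper names lex_key and lex_range are hypothetical; the "[" prefix is the inclusive-bound marker that ZRANGEBYLEX expects.

```python
import ipaddress
from typing import Tuple

def lex_key(ip: str) -> str:
    """Fixed-width hex form of an IPv6 address (32 lowercase hex digits)."""
    return format(int(ipaddress.IPv6Address(ip)), "032x")

def lex_range(cidr: str) -> Tuple[str, str]:
    """Inclusive (min, max) arguments for ZRANGEBYLEX covering a subnet."""
    net = ipaddress.IPv6Network(cidr)
    first = format(int(net.network_address), "032x")
    last = format(int(net.broadcast_address), "032x")
    return "[" + first, "[" + last

print(lex_key("2001:db8::1"))
print(lex_range("2001:db8::/32"))
```

Members would be added with ZADD numerics 0 <lex_key> and queried with ZRANGEBYLEX numerics <min> <max>, using the key name from the question.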
We have a requirement to log IP address information of all users who use a certain web application based on Java EE 5.
What would be an appropriate sql data type for storing IPv4 or IPv6 addresses in the following supported databases (h2, mysql, oracle)?
There is also a need to filter activity from certain IP addresses. Should I just treat the representation as a string field (say varchar(32) to hold ipv4, ipv6 addresses)?
I'd store the IP addresses in a varchar(15) (enough for dotted IPv4 only). This is easily readable, and you can filter for specific IPs with something like where ip = '1.2.3.4'.
If you have to filter on networks, like 1.2.3.4/24, it becomes a different story. In that case you're better off storing the IP address as a 4-byte binary.
If you have a huge amount of data to search through, it would be better for performance to convert the dotted string representation of the IPs to their integer values.
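As an illustration of the integer and 4-byte forms mentioned above, here is a short sketch using Python's standard ipaddress module. It only shows the conversions; the column types in the comments are assumptions, not a recommendation for any particular database.

```python
import ipaddress

addr = ipaddress.IPv4Address("1.2.3.4")
as_int = int(addr)       # value for an unsigned 32-bit integer column
as_bytes = addr.packed   # 4 raw bytes for a 4-byte binary column

# Filtering on a network becomes a range comparison on the integer value.
# strict=False lets the host bits in "1.2.3.4/24" be masked to 1.2.3.0/24.
net = ipaddress.IPv4Network("1.2.3.4/24", strict=False)
low, high = int(net.network_address), int(net.broadcast_address)

print(as_int, as_bytes, low, high)
print(low <= as_int <= high)
```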
Either of these is valid
4 bytes, perhaps a 5th byte for CIDR
varchar(15) or (18) to store full representation in one go
Saying that, varchar(48) for SQL Server's sys.dm_exec_connections...
I recognize that an email address can basically be indefinitely long so any size I impose on my varchar email address field is going to be arbitrary. However, I was wondering what the "standard" is? How long do you guys make it? (same question for Name field...)
update: Apparently the max length for an email address is 320 (<=64 name part, <= 255 domain). Do you use this?
The theoretical limit is really long, but do you really need to worry about such long email addresses? If someone can't log in with a 100-char email, do you really care? We actually prefer that they can't.
Some statistical data may shed some light on the issue. We analyzed a database with over 10 million email addresses. These addresses are not confirmed, so there are invalid ones. Here are some interesting facts:
The longest valid one is 89.
There are hundreds of longer ones, up to the limit of our column (255), but they are apparently fake by visual inspection.
The peak of the length distribution is at 19.
There isn't a long tail. Everything falls off sharply after 38.
We cleaned up the DB by throwing away anything longer than 40. The good news is that no one has complained, but the bad news is that not many records got cleaned out.
I've in the past just done 255 because that's the so-ingrained standard of short but not too short input. That, and I'm a creature of habit.
However, since the max is 319, I'd do nvarchar(320) on the column. Gotta remember the @!
nvarchar won't use the space that you don't need, so if you only have a 20-character email address, it will only take up about 40 bytes (nvarchar stores 2 bytes per character). This is in contrast to an nchar, which will always take up its maximum (it right-pads the value with spaces).
I'd also use nvarchar in lieu of varchar since it's Unicode. Given the volatility of email addresses, this is definitely the way to go.
The following email address is only 94 characters:
i.have.a.really.long.name.like.seetharam.krishnapillai@AReallyLongCompanyNameOfSomeKind.com.au
Would an organisation actually give you an email that long?
If they were stupid enough to, would you actually use an email address like that?
Would anyone? Of course not. Too long to type and too hard to remember.
Even a 92-year-old technophobe would figure out how to sign up for a nice short gmail address, and just use that, rather than type this into your registration page.
Disk space probably isn't an issue, but there are at least two problems with allowing user input fields to be many times longer than they need to be:
Displaying them could mess up your UI (at best they will be cut off, at worst they push your containers and margins around)
Malicious users can do things with them you can't anticipate (like those cases where hackers used a free online API to store a bunch of data)
I like 50 chars:
123456789.123456789.123456789@1234567890123456.com
If one user in a million has to use their other email address to use my app, so be it.
(Statistics show that no-one actually enters more than about 40 chars for email address, see e.g.: ZZ Coder's answer https://stackoverflow.com/a/1297352/87861)
According to this text, based on the proper RFC documents, it's not 320 but 254:
http://www.eph.co.uk/resources/email-address-length-faq/
Edit:
Using WayBack Machine:
https://web.archive.org/web/20120222213813/http://www.eph.co.uk/resources/email-address-length-faq/
What is the maximum length of an email address?
254 characters
There appears to be some confusion over the maximum valid email
address size. Most people believe it to be 320 characters (64
characters for the username + 255 characters for the domain + 1
character for the @ symbol). Other sources suggest 129 (64 + 1 + 64)
or 384 (128+1+255, assuming the username doubles in length in the
future).
This confusion means you should heed the 'robustness principle'
("developers should carefully write software that adheres closely to
extant RFCs but accept and parse input from peers that might not be
consistent with those RFCs." - Wikipedia) when writing software that
deals with email addresses. Furthermore, some software may be crippled
by naive assumptions, e.g. thinking that 50 characters is adequate
(examples). Your 200 character email address may be technically valid
but that will not help you if most websites or applications reject it.
The actual maximum email length is currently 254 characters:
"The original version of RFC 3696 did indeed say 320 was the maximum
length, but John Klensin (ICANN) subsequently accepted this was
wrong."
"This arises from the simple arithmetic of maximum length of a domain
(255 characters) + maximum length of a mailbox (64 characters) + the @
symbol = 320 characters. Wrong. This canard is actually documented in
the original version of RFC3696. It was corrected in the errata.
There's actually a restriction from RFC5321 on the path element of an
SMTP transaction of 256 characters. But this includes angled brackets
around the email address, so the maximum length of an email address is
254 characters." - Dominic Sayers
I use varchar(64); I do not think anyone could have a longer email address.
If you're really being pedantic about it, make the username a varchar(60) and the domain a varchar(255). Then you can do ridiculous statistics on domain usage slightly faster than with a single field. If you're feeling really gung-ho about optimization, that will also let your SMTP server send out emails with fewer connections / better batching.
RFC 5321 (the current SMTP spec, obsoletes RFC2821) states:
4.5.3.1.1. Local-part
The maximum total length of a user
name or other local-part is 64
octets.
4.5.3.1.2. Domain
The maximum total length of a
domain name or number is 255 octets.
This pertains to just localpart@domain, for a total of 320 ASCII (7-bit) characters.
If you plan to normalize your data, perhaps by splitting the localpart and domain into separate fields, additional things to keep in mind:
A technique known as VERP may result in full-length localparts for automatically generated mail (may not be relevant to your use case)
Domains are case insensitive; I recommend lowercasing the domain portion.
Localparts are case sensitive; user@domain.com and USER@domain.com are technically different addresses per the specs, although the policy at domain.com may be to treat the two addresses as equivalent. It's best to restrict localpart case folding to domains that are known to do this.
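The normalization points above can be sketched as follows. This is a hedged example, not a full address validator: the function name normalize_email is hypothetical, and the only checks are the two length limits quoted from RFC 5321 plus the presence of both parts.

```python
def normalize_email(address: str) -> str:
    """Lowercase only the domain; leave the case-sensitive local part as-is."""
    # Split on the last "@" so the domain is everything after it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected local-part@domain")
    # Limits quoted above from RFC 5321 sections 4.5.3.1.1 and 4.5.3.1.2 (octets).
    if len(local.encode("utf-8")) > 64:
        raise ValueError("local part longer than 64 octets")
    if len(domain.encode("utf-8")) > 255:
        raise ValueError("domain longer than 255 octets")
    return local + "@" + domain.lower()

print(normalize_email("USER@Domain.COM"))
```

Storing the result of rpartition in two columns instead of joining it back would give the split localpart/domain layout described above.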
For email, regardless of the spec, I virtually always go with 512 (nvarchar). Names and surnames are similar.
Really, you need to look at how much you care about having a little extra data. For me, mostly, it's not a worry, so I'll err on the conservative side. But if you've decided, through logical and accurate means, that you need to conserve space, then do so. In general, though, be conservative with field sizes, and life shall be good.
Note that probably not all email clients support the RFC, so regardless of what it says, you may encounter different things in the wild.