Indexed ranged search algorithm for IP Addresses

Given an ACL with 10 billion IPv4 ranges, each either in CIDR notation or given as a range between two IPs:
x.x.x.x/y
x.x.x.x - y.y.y.y
What is an efficient search/indexing algorithm for testing whether a given IP address meets the criteria of one or more ACL ranges?
Let's assume most ACL range definitions span a great number of class C blocks.
Indexing points via hash tables is easy, but try as I might, I have not been able to come up with a reasonable method for detecting which points are covered by a large list of "lines".
I had some thoughts, like indexing hints at a certain level of detail (say, pre-computing at the class C level each ACL that covers that point), but the table would be too large. Or some sort of k-d tree to dynamically set levels of detail.
I also had the thought that maybe there are collision detection algorithms out there that can address this.
Any hints or pointers in the right direction?

The simple radix tree, which has long been used for longest-prefix-match Internet route lookups, can be extended to hold nodes that represent larger CIDR subnets overlapping smaller ones. A longest-match lookup traverses these nodes on its way down, so they can be collected along the way to get the entire set of CIDR subnets that match an IP address.
To hold the IP ranges in the same tree, convert each range into a set of CIDR subnets. This can always be done, though the set may contain many subnets (and even some host IPs, that is, /32 CIDR addresses).
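As a rough illustration of the two ideas in this answer, the sketch below splits a range into CIDR blocks with Python's standard `ipaddress.summarize_address_range` and stores prefixes in a plain binary trie (an uncompressed stand-in for a radix tree). The names `range_to_cidrs`, `PrefixTrie`, `insert` and `lookup_all` are made up for this example, and a dict-per-node trie like this would not scale to billions of rules.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Split an inclusive IPv4 range into a minimal list of CIDR blocks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)))


class PrefixTrie:
    """Uncompressed binary trie over IPv4 bits (hypothetical helper).

    Each node is a dict with optional child keys 0 and 1 and an optional
    "labels" list naming the rules whose prefix ends at that node.
    """

    def __init__(self):
        self.root = {}

    def insert(self, network, label):
        net = ipaddress.ip_network(str(network))
        value = int(net.network_address)
        node = self.root
        # Walk one bit per prefix position, most significant bit first.
        for i in range(net.prefixlen):
            bit = (value >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node.setdefault("labels", []).append(label)

    def lookup_all(self, address):
        """Return the labels of every stored prefix that contains address."""
        value = int(ipaddress.IPv4Address(address))
        node = self.root
        found = list(node.get("labels", []))
        for i in range(32):
            bit = (value >> (31 - i)) & 1
            node = node.get(bit)
            if node is None:
                break
            found.extend(node.get("labels", []))
        return found
```

A range rule would be inserted once per block returned by `range_to_cidrs`, all under the same label, so a lookup reports the rule no matter which block matched.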

You have 10 billion rules to match 4 billion possible addresses?
Make a table of 4 billion addresses. For each of the 10 billion rules, 'paint' the addresses it applies to, doing something sensible when two or more rules apply to the same address.
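A toy version of the "painting" idea, assuming that the "sensible" way to combine overlapping rules is simply to record that at least one rule covers the address. The class `PaintedTable` and its 16-bit address space are inventions for this sketch; at one bit per address the full IPv4 space would need 2^32 bits, i.e. 512 MiB, and painting in a pure Python loop would be far too slow at that scale.

```python
class PaintedTable:
    """One bit per address; a set bit means some rule covers the address."""

    def __init__(self, bits=16):
        self.size = 1 << bits
        self.bitmap = bytearray(self.size // 8)

    def paint(self, first, last):
        """Mark every address in the inclusive integer range as covered."""
        for addr in range(first, last + 1):
            self.bitmap[addr >> 3] |= 1 << (addr & 7)

    def covered(self, addr):
        """Constant-time membership test for a single address."""
        return bool(self.bitmap[addr >> 3] & (1 << (addr & 7)))
```

Once the table is built, every lookup is a single array access, which is the appeal of the approach when lookups vastly outnumber rule changes.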

You can look at the interval tree, which finds all intervals that overlap a given interval or point.
For non-overlapping IP ranges, you can use a B-tree or a compact trie such as a Judy array (64-bit) for indexing and searching: store the start IP as the key and the end IP as the value.
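For the non-overlapping case, the "start IP as key, end IP as value" layout can be mimicked with a sorted list and binary search. This is only a sketch: `RangeIndex` and `find` are hypothetical names, and a sorted Python list stands in for the B-tree or Judy array.

```python
import bisect
import ipaddress


class RangeIndex:
    """Sorted start addresses with parallel end addresses (no overlaps)."""

    def __init__(self, ranges):
        pairs = sorted(
            (int(ipaddress.IPv4Address(start)), int(ipaddress.IPv4Address(end)))
            for start, end in ranges
        )
        self.starts = [start for start, _ in pairs]
        self.ends = [end for _, end in pairs]

    def find(self, address):
        """Return the (start, end) range containing address, or None."""
        value = int(ipaddress.IPv4Address(address))
        # Last range whose start is <= value; only that one can contain it.
        i = bisect.bisect_right(self.starts, value) - 1
        if i >= 0 and value <= self.ends[i]:
            return (str(ipaddress.IPv4Address(self.starts[i])),
                    str(ipaddress.IPv4Address(self.ends[i])))
        return None
```

Because the ranges do not overlap, one predecessor search plus one comparison against the stored end is enough; overlapping ranges need the interval tree instead.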

Related

Matching network subnets with BigQuery

I have a huge set of objects (traceroutes) hosted in Google BigQuery. Each object can be considered an array of IP addresses (called hops). I am trying to write a query that identifies all the traceroutes crossing a small set of prefixes of interest (prefixes_of_interest). Those prefixes are stored in another dataset, also hosted in BigQuery. I have been able to write a query that finds all the traceroutes crossing a specific set of IP addresses (ip_of_interests) after start_date:
WITH
  traceroutes AS (
    SELECT
      hops,
      start_time
    FROM
      `traceroute` AS t
    WHERE
      DATE(start_time) >= start_date )
SELECT
  hops,
  h.hop_addr,
  start_time
FROM
  traceroutes,
  UNNEST(hops) AS h
WHERE
  h.hop_addr IN (
    SELECT
      IP
    FROM
      `ip_addresses`)
Now, in practice, I could translate all of my prefixes to IP addresses and reiterate a similar query but that feels like a very inefficient way to proceed. I would like to know if there is a way to match the h.hop_addr prefixes with the one from my database prefixes_of_interest. In practice, my subnetworks have different lengths (most are /24 but I have a few /20) but a solution that would match /24 would do the trick and allow me to save tons of space and processing time I reckon. Hopefully the question makes sense!
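The matching the question asks for boils down to masking each hop address to a prefix length and looking the result up in a set of network addresses, rather than expanding every prefix into its individual IPs. The sketch below shows that logic in Python, outside BigQuery; `build_prefix_index` and `matches` are hypothetical helper names, and the sample prefixes in the usage note are invented.

```python
import ipaddress


def build_prefix_index(prefixes):
    """Group network addresses by prefix length: {prefixlen: {network_int}}."""
    index = {}
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        index.setdefault(net.prefixlen, set()).add(int(net.network_address))
    return index


def matches(hop_addr, index):
    """True if hop_addr falls inside any indexed prefix."""
    value = int(ipaddress.IPv4Address(hop_addr))
    for prefixlen, networks in index.items():
        # Keep the top prefixlen bits and zero the host bits.
        mask = (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF
        if (value & mask) in networks:
            return True
    return False
```

With only /24 and /20 prefixes, as described above, each hop costs two mask-and-lookup steps, e.g. `matches("192.0.2.77", build_prefix_index(["192.0.2.0/24", "10.16.0.0/20"]))`.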

Redis | Best data structure to store IPs and networks

My goal is to store in Redis:
Plain IP addresses like 228.228.228.228
IP networks like 228.228.228.0/24
in order to check, during the request/response cycle, whether the
current IP xxx.yyy.xxx.vvv is contained in:
Plain IPs
or
An IP network (for example, 228.228.228.228 is inside 228.228.228.0/24)
Overall number of IPs and networks: a few thousand items.
The question is: what is the best structure for storing both plain IPs and networks in Redis so that the check above can be made without fetching the data from Redis?
Thanks.
P.S. Current IP is already known.
UPDATE
OK, let's simplify it a bit with an example.
I have 2 IPs and 2 networks, and I want to check whether a certain IP is contained in them.
# 2 plain ip
202.76.250.29
37.252.145.1
# 2 networks
16.223.132.0/24
9.76.202.0/24
There are 2 possible ways an exact IP might be contained:
1) In the plain IPs. For example, 202.76.250.29 is contained in the structure above, and 215.08.11.23 is not.
2) Inside a network. For example, 9.76.202.100 is contained in the network 9.76.202.0/24 but not in the list of plain IPs, as there is no exact IP equal to 9.76.202.100.
A little bit of explanation about IP networks, very simplified.
An IP network represents a range of IPs. For example, the IPv4 network "192.4.2.0/24" represents 256 IP addresses (192.4.2.0 through 192.4.2.255), of which the usable hosts are:
IPv4Address('192.4.2.1'), IPv4Address('192.4.2.2'),
…
…
…
IPv4Address('192.4.2.253'), IPv4Address('192.4.2.254')
In other words, the hosts of this IP network form a range of IP addresses
from '192.4.2.1' up to '192.4.2.254'
In our example, 9.76.202.100 is contained in the network 9.76.202.0/24 as one of the addresses inside the range.
My idea is like this:
Any IP address can be represented as an integer. One of our IP addresses,
202.76.250.29, converted to an integer is 3394042397.
Since an IP network is a range of IPs, it can be converted to a range of integers by converting the first and last IP in the range to integers.
For example, one of our networks, 16.223.132.0/24, represents the range between IPv4Address('16.223.132.1') and IPv4Address('16.223.132.254'), or the integer range from 283083777 up to 283084030 with step 1.
An individual IP can be represented as the range between its integer and its integer + 1 (lower bound included, upper bound excluded).
Obviously, searching the plain IPs can be done by putting them in a SET and then using SISMEMBER. But what about searching inside networks? Can we do some trick with ranges, maybe?
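The integer conversion described in the question can be checked with Python's standard `ipaddress` module. This sketch only covers the arithmetic and does not talk to Redis; the helper names are made up, and it uses the full network range (network address through broadcast address) rather than only the usable hosts.

```python
import ipaddress


def ip_to_int(address):
    """Dotted IPv4 string -> integer."""
    return int(ipaddress.IPv4Address(address))


def network_to_int_range(network):
    """CIDR string -> (first, last) integers, both bounds inclusive."""
    net = ipaddress.ip_network(network)
    return int(net.network_address), int(net.broadcast_address)


def is_listed(address, plain_ips, network_ranges):
    """plain_ips: set of ints; network_ranges: list of (first, last) ints."""
    value = ip_to_int(address)
    if value in plain_ips:
        return True
    return any(first <= value <= last for first, last in network_ranges)
```

With the data from the example, `plain_ips` would hold `ip_to_int("202.76.250.29")` and `ip_to_int("37.252.145.1")`, and `network_ranges` would hold the ranges for `16.223.132.0/24` and `9.76.202.0/24`.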

"Best" is subjective (in memory, in speed, etc.), but you could use two sets or hashes to store them. Since the values are unique, both hashes and sets would be fine. If you prefer, you can use a single set or hash for both plain IPs and network addresses, but I would keep them separate since they are two different kinds of data (just like database tables).
Then you can use either of these:
SISMEMBER with O(1) time complexity
HEXISTS with O(1) time complexity.
The check can be handled at the application level with multiple commands, or with a Lua script (in a single transaction).
Depending on your choice, add to your keys with SADD or HSET (the field value would be 1).
--
Edit: (hope i get it right)
For the network ranges, build a set from the first three numbers of each address: the range 12.345.67.1-12.345.67.254 is represented as 12.345.67, and that is what you add to the set. When you want to search for 12.345.67.x, your application parses it into 12.345.67 and checks it with SISMEMBER. The same can be done with a hash and HEXISTS.
Since IP addresses contain four numbers separated by three dots, you discard the last dot and the last number, and the rest represents the network range (I assume every network is a /24; other prefix lengths would need a different key).
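A small sketch of the "drop the last number" key, with Python sets standing in for the two Redis SETs that SISMEMBER would query. It only works when every network is a /24; `slash24_key` and `is_listed` are hypothetical names.

```python
def slash24_key(address):
    """Drop the last octet: '9.76.202.100' -> '9.76.202'."""
    return address.rsplit(".", 1)[0]


def is_listed(address, plain_ips, network_keys):
    """Membership test against the plain-IP set, then the /24 key set."""
    return address in plain_ips or slash24_key(address) in network_keys


# Stand-ins for the two Redis sets, filled from the question's example data.
plain_ips = {"202.76.250.29", "37.252.145.1"}
network_keys = {slash24_key("16.223.132.0"), slash24_key("9.76.202.0")}
```

Both checks are exact-match lookups, so in Redis they stay O(1) per request regardless of how many entries are stored.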

For plain IPs, you can use a Set and check for a given IP in O(1) time.
For IP ranges, I think you can use a List with a Lua script for the query. Searching a List takes O(n) time, but since you only have about 1000 items, O(n) and O(1) will not differ hugely for an in-memory Redis query.

LDAP search for multiple complete DNs?

Assume I have an array of N DNs (distinguished names), e.g.:
cn=foo,dc=capmon,dc=lan
cn=bar,dc=capmon,dc=lan
cn=Fred Flintstone,ou=CapMon,dc=capmon,dc=lan
cn=Clark Kent,ou=yada,ou=whatnot,dc=capmon,dc=lan
They are not related and I cannot reduce/simplify the search. I have N complete DNs and want N records.
Can I write a single LDAP search that will return exactly N records, one for each DN? The assumption being that performance of both client and server will be better if I do it all in one search. Had it been SQL, it would be:
SELECT *
FROM dc=capmon,dc=lan
WHERE dn IN (
"cn=foo,dc=capmon,dc=lan",
"cn=bar,dc=capmon,dc=lan",
"cn=Fred Flintstone,ou=CapMon,dc=capmon,dc=lan",
"cn=Clark Kent,ou=yada,ou=whatnot,dc=capmon,dc=lan"
)
rather than doing individual LDAP searches in a for loop (which I do know how to do).
I tried against an MS Active Directory. There, all entries (seem to) have a distinguishedName attribute, and a search filter like this works (I added some newlines for readability):
(|
(distinguishedName=cn=ppolicy,dc=capmon,dc=lan)
(distinguishedName=cn=Users,dc=capmon,dc=lan)
<more ORed terms>
)
But this doesn't work:
(|
(dn=cn=ppolicy,dc=capmon,dc=lan)
(dn=cn=Users,dc=capmon,dc=lan)
<more ORed terms>
)
even though the returned records look like they contain dn attributes. :-(
An OpenLDAP server's records don't have distinguishedName attributes, and neither of the filters above works against it.
Can I do something that will work against most major LDAP servers?
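The OR filter that worked against Active Directory above can be generated from the list of DNs. The sketch below builds the filter string only; it does not contact a server, and it escapes the characters that RFC 4515 reserves in assertion values. The function names are invented for this example, and `distinguishedName` is searchable only on servers that expose it, as the question observed.

```python
def escape_filter_value(value):
    """Escape the characters RFC 4515 reserves inside a filter value."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def build_dn_filter(dns, attribute="distinguishedName"):
    """Build (|(attr=dn1)(attr=dn2)...) for a list of complete DNs."""
    terms = "".join(
        f"({attribute}={escape_filter_value(dn)})" for dn in dns
    )
    return f"(|{terms})"
```

Servers may cap the number of terms in a filter, so a long DN list might have to be split into several searches.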

It's not possible to "Read" several entries in a single operation.
You can do a single search operation that will match and return several entries, but you cannot search on the "DN" itself.
I've seen several applications that try to get several entries by using complex filters such as "(|(cn=foo)(cn=bar)(cn=Fred Flintstone))", but this may return more entries than expected unless all CN values are unique. It's not really good practice either, as there are limits on the number of elements you can have in a filter, and such requests are usually not optimized in terms of I/O.
It will be faster to read each individual entry, as LDAP servers are optimized for such operations. If you want to reduce latency, you can issue multiple asynchronous search operations on the same connection.

Best Way To Index & Search for Value Between Hi & Lo Byte[] Columns?

I know about text indexing, but this is different. I have 2 byte array columns in a table, labeled StartByteArray & EndByteArray. The Start column is a starting IP address in byte array form, and the same with the End column, except it is the stop IP. You can think of the high & low columns as boundaries of IP Addresses. It looks like this (just 10 rows shown):
StartIPAddress  StartByteArray  EndIPAddress   EndByteArray
41.0.0.0        0x29000000      41.31.255.255  0x291FFFFF
41.32.0.0       0x29200000      41.47.255.255  0x292FFFFF
41.48.0.0       0x29300000      41.55.255.255  0x2937FFFF
41.56.0.0       0x29380000      41.56.255.255  0x2938FFFF
41.57.0.0       0x29390000      41.57.63.255   0x29393FFF
41.57.64.0      0x29394000      41.57.79.255   0x29394FFF
41.57.80.0      0x29395000      41.57.95.255   0x29395FFF
41.57.96.0      0x29396000      41.57.111.255  0x29396FFF
41.57.112.0     0x29397000      41.57.115.255  0x293973FF
41.57.116.0     0x29397400      41.57.119.255  0x293977FF
That's it. The reason I did this was to make it easier to search for a row that "contains", or bounds, a given IP address. It sounds harder than it is.
Put another way, I want to search for the row whose range my given IP address (once also converted to a byte array) falls within.
Now, writing the usual SQL is easy (example on SO here, for example), but I've got a feeling there is a clever way to index these columns so that the search is efficient. All I have done so far is text indexing, and here there are 2 columns that I'm comparing numerically, not letters of words over x characters long.
I'm using SQL Server 2012, and can also convert the data to anything better suited, as I own the DB.
Any thoughts?

I sense there are some misunderstandings here. I hope I'll find them.
Indexing text columns is not different from indexing any other data type. A B-tree based index can index any data type that has a sort order. All it does is keep all index rows sorted by the key columns. This allows for range and point lookups. Binary data, string data and integer data are all fully supported.
"Now writing the usual SQL is easy (example on SO here, for example)"
This query does not solve your problem. It returns all rows where the StartByteArray is in a given range. You want the opposite: you want the search argument to be in the range that a certain row specifies.
I already answered how to look up an IP range.
"I've got a feeling there is a clever way to index these columns in such a way that it will be efficient"
Just index on StartByteArray. That lets you seek to the last row whose StartByteArray is less than or equal to the given IP; comparing the IP with that row's EndByteArray then tells you whether the row matches.
"but all I have done is text indexing"
Not sure what you mean, but whatever it is, it's probably a misunderstanding.
Using a binary(4) to store IPs is clever. I never thought of doing that. I used a bigint in the past. That takes twice the amount of storage that would strictly be needed, though.
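The lookup that an index on StartByteArray enables can be imitated in memory: big-endian 4-byte values sort in the same order as the addresses they encode, so a binary search over the start column finds the only candidate row. The sketch uses three rows from the table in the question; `ROWS`, `STARTS` and `find_row` are names made up for this example, and a sorted Python list stands in for the B-tree index.

```python
import bisect
import ipaddress

# (StartByteArray, EndByteArray) pairs from the question, sorted by start.
ROWS = [
    (bytes.fromhex("29390000"), bytes.fromhex("29393FFF")),
    (bytes.fromhex("29394000"), bytes.fromhex("29394FFF")),
    (bytes.fromhex("29395000"), bytes.fromhex("29395FFF")),
]
STARTS = [start for start, _ in ROWS]


def find_row(address):
    """Return the row whose byte range contains address, or None."""
    key = ipaddress.IPv4Address(address).packed  # 4 bytes, big-endian
    # Last row with start <= key: the equivalent of an index seek.
    i = bisect.bisect_right(STARTS, key) - 1
    if i >= 0 and key <= ROWS[i][1]:
        return ROWS[i]
    return None
```

The final comparison against the end column matters: without it, an address that falls in a gap between rows would be reported as belonging to the preceding row.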

capturing user IP address information for audit

We have a requirement to log IP address information of all users who use a certain web application based on Java EE 5.
What would be an appropriate sql data type for storing IPv4 or IPv6 addresses in the following supported databases (h2, mysql, oracle)?
There is also a need to filter activity from certain IP addresses. Should I just treat the representation as a string field (say varchar(32) to hold ipv4, ipv6 addresses)?

I'd store the IP addresses in a varchar(15), which fits a dotted IPv4 address. This is easily readable, and you can filter for specific IPs with something like where ip = '1.2.3.4'.
If you have to filter on networks, like 1.2.3.4/24, it becomes a different story. In that case you're better off storing the IP address as a 4-byte binary.

If you have huge amounts of data to search through, it would be better for performance to convert the dotted string representation of the IPs to their integer values.
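To make the storage options in these answers concrete, the sketch below derives the integer and packed-binary forms of an address with Python's standard `ipaddress` module, plus the integer bounds needed to filter on a network. The function names are hypothetical, and how the values map to column types in H2, MySQL or Oracle is left open.

```python
import ipaddress


def to_storage_forms(text):
    """Return the version, integer value and packed bytes of an address."""
    addr = ipaddress.ip_address(text)
    return {
        "version": addr.version,   # 4 or 6
        "integer": int(addr),
        "packed": addr.packed,     # 4 bytes for IPv4, 16 for IPv6
    }


def network_bounds(cidr):
    """Inclusive integer bounds of a network, e.g. for a BETWEEN filter."""
    # strict=False accepts input with host bits set, such as 1.2.3.4/24.
    net = ipaddress.ip_network(cidr, strict=False)
    return int(net.network_address), int(net.broadcast_address)
```

Filtering on a network then becomes a range comparison on the stored integer (or on the packed bytes, which sort the same way within one address family).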

Either of these is valid
4 bytes, perhaps a 5th byte for CIDR
varchar(15) or (18) to store full representation in one go
Saying that, varchar(48) for SQL Server's sys.dm_exec_connections...
