How can I optimize finding the shortest description of a set?

The goal is to find the optimal description of a subset of countries, using a combination of individual countries and predefined groups of countries.
The set of countries comprises all two-letter ISO country codes (US/DE/NL/etc.)
Groups of countries are fixed, for example:
GROUP1: NL,BE,LU
GROUP2: NL,BE,LU,CZ,US,FI,NO,GI,FR,DK
I want to have a representation of this list of countries:
CZ,US,FI,NO,GI,FR,DK,DE
I would like to shorten this using the defined groups, to the representation with the fewest elements. For this example that would be:
+GROUP2 +DE -GROUP1
Because if we expand this back out we get:
Add GROUP2: NL,BE,LU,CZ,US,FI,NO,GI,FR,DK
Add DE
Remove GROUP1: NL,BE,LU
Which brings us back to CZ,US,FI,NO,GI,FR,DK,DE
Pointers to libraries or examples in any programming language are welcome.
I'm at a loss as to what a good approach would be. I've tried looking into existing algorithms and comparable problems, but I can't seem to wrap my head around the problem and find a good fit.
Just to illustrate, here is a simple brute force solution that obviously will not scale:
let countries = ['C1', 'C2', 'C3', 'C4', 'C5', 'C6']
let groups = {
  G1: ['C1', 'C2', 'C3', 'C4', 'C5'],
  G2: ['C1', 'C4'],
}

let output = getBest(['C2', 'C3', 'C5', 'C6'])
// output == ["+C6", "+G1", "-G2"]

// Assumes `input` is sorted
function getBest(input) {
  const ids = countries.concat(Object.keys(groups))
  // Worst case: add every country individually
  let best = input.map((c) => '+' + c)
  for (const t of combinations(ids)) {
    if (expand(t, groups).sort().toString() == input.toString()) {
      if (t.length < best.length) best = [...t]
    }
  }
  return best
}

// Expands short form to long form
function expand(s, groups) {
  return Array.from(
    // Sorting a copy puts '+' entries before '-' entries, so all additions
    // happen before removals (and the caller's array is left untouched)
    [...s].sort().reduce((acc, curr) => {
      let symbol = curr[0]
      let id = curr.slice(1)
      if (groups[id]) {
        curr = groups[id]
      } else {
        curr = [id]
      }
      if (symbol == '+') {
        return new Set([...acc, ...curr])
      } else {
        return new Set([...acc].filter((a) => !curr.includes(a)))
      }
    }, new Set())
  )
}

// Yields all possible short form options
function* combinations(array) {
  array = ['+', '-'].reduce((acc, curr) => {
    return acc.concat(array.map((s) => curr + s))
  }, [])
  for (const subset of subsets(array)) {
    yield subset
  }
}

// Creates powerset of array
function* subsets(array, offset = 0) {
  while (offset < array.length) {
    let first = array[offset++]
    for (let subset of subsets(array, offset)) {
      subset.push(first)
      yield subset
    }
  }
  yield []
}

To me, this problem does not sound like it fits a classical model with a well-known polynomial-time algorithm.
A reasonable approach looks as follows.
Consider formulas in a formal language, like (G1 subtract G2) unite (G3 intersect G4), where Gi are predefined groups (and perhaps individual countries, but that will slow the solution down a lot).
Each formula's score is its length plus the size of its symmetric difference with the desired answer (so that adding or subtracting individual elements is the last step of the formula).
Now, for formulas of lengths 0, 1, 2, ..., (iterative deepening), recursively generate all possible formulas of such length and consider their score.
Stop when the length reaches the score of the best answer so far.
There's some room to optimize (for example, prune clearly stupid branches like Gi symdiff Gi, or perhaps memoize the shortest formula for each set we can obtain), but the solution is nevertheless exponential in the number of items and groups.
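To make the idea concrete, here is a rough JavaScript sketch of such a search (the group names, the scoring helper, and the depth cutoff are mine; this does breadth-first expansion with an iterative-deepening-style cutoff, not a tuned implementation, and it only builds flat +/- sequences rather than the full formula language):

```javascript
// Score = (ops used so far) + (size of symmetric difference with the target),
// i.e. we assume individual +/- fixes are applied as the last step.
function symDiffSize(a, b) {
  let n = 0
  for (const x of a) if (!b.has(x)) n++
  for (const x of b) if (!a.has(x)) n++
  return n
}

function searchBest(groups, target, maxDepth = 3) {
  const targetSet = new Set(target)
  // Baseline: no group ops at all, add every element individually
  let best = { score: targetSet.size, ops: [] }
  let frontier = [{ set: new Set(), ops: [] }]
  const seen = new Map() // canonical set string -> fewest ops that reached it

  for (let depth = 0; depth < maxDepth; depth++) {
    const next = []
    for (const state of frontier) {
      for (const [name, members] of Object.entries(groups)) {
        for (const sign of ['+', '-']) {
          const s = new Set(state.set)
          for (const m of members) sign === '+' ? s.add(m) : s.delete(m)
          const key = [...s].sort().join(',')
          const opsLen = state.ops.length + 1
          if (seen.has(key) && seen.get(key) <= opsLen) continue // memoized: reached cheaper before
          seen.set(key, opsLen)
          const ops = [...state.ops, sign + name]
          const score = opsLen + symDiffSize(s, targetSet)
          if (score < best.score) best = { score, ops }
          next.push({ set: s, ops })
        }
      }
    }
    frontier = next
    if (depth + 1 >= best.score) break // deeper formulas cannot beat the best score
  }
  return best
}
```

With the groups from the brute-force example, `searchBest({ G1: [...], G2: [...] }, ['C2', 'C3', 'C5', 'C6'])` finds `+G1 -G2` with score 3 (two group ops plus the lone `+C6` fix).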


How to make an ordered list of boolean array permutations with a given number of trues

Is there an efficient method to generate all possible arrays of booleans with a given number of "true" values?
Right now I'm incrementing a number and checking whether its binary representation has the given number of 1s (and if so, adding that array). But this becomes extremely slow for larger lengths.
This is the kind of input-output that I'm looking for:
(length: 4, trues: 2) -> [[1,1,0,0],[1,0,1,0],[0,1,1,0],[1,0,0,1],[0,1,0,1],[0,0,1,1]]
The trouble is doing it in less than O(2^N) time, while keeping the arrays ordered by their little-endian binary values.
If it helps the length would be a fixed number at compile time (currently it's 64). I wrote it as an input because I might have to increase it to 128, but it won't vary during runtime.
You can define a recursive solution to this problem.
fn solution(length: u32, trues: u32) -> Vec<Vec<bool>>;
How do we formulate this function recursively? Let's think about the last element of the output arrays. If that last element is false, then the number of true elements among the first length-1 elements must be trues. If that last element is true, then the number of true elements among the first length-1 elements must be trues-1.
So we can answer the problem for (length-1, trues) and extend each of those arrays with false, answer the problem for (length-1, trues-1) and extend each with true, and then concatenate the results (putting the ends-with-false case first, so the little-endian values come out in ascending order). Adding in some base cases, we get the following code:
fn solution(length: u32, trues: u32) -> Vec<Vec<bool>> {
    if trues > length {
        // no candidate arrays exist
        return vec![];
    }
    if length == 0 {
        // one array exists: the empty array
        return vec![vec![]];
    }
    if trues == 0 {
        // one array exists: the all-false array
        return vec![vec![false; length as usize]];
    }
    let mut result = Vec::new();
    // false-extension first, so the values ascend in little-endian order
    for mut zeroes in solution(length - 1, trues) {
        zeroes.push(false);
        result.push(zeroes);
    }
    for mut ones in solution(length - 1, trues - 1) {
        ones.push(true);
        result.push(ones);
    }
    result
}
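As a quick cross-check of the recursion, here is a JavaScript sketch (mine, not part of the answer) that emits 0/1 arrays in ascending little-endian order and reproduces the question's (length: 4, trues: 2) example:

```javascript
// 0/1 arrays of `length` with exactly `trues` ones; index 0 is the least
// significant bit, and extending with 0 first keeps the order ascending.
function solution(length, trues) {
  if (trues > length) return [] // no candidate arrays exist
  if (length === 0) return [[]] // one array: the empty array
  if (trues === 0) return [new Array(length).fill(0)] // the all-zero array
  const withZero = solution(length - 1, trues).map((a) => [...a, 0])
  const withOne = solution(length - 1, trues - 1).map((a) => [...a, 1])
  return [...withZero, ...withOne]
}

console.log(JSON.stringify(solution(4, 2)))
// [[1,1,0,0],[1,0,1,0],[0,1,1,0],[1,0,0,1],[0,1,0,1],[0,0,1,1]]
```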
Let S(L, T) denote the cost of solution(L, T). It satisfies the recurrence S(L, T) = S(L-1, T-1) + S(L-1, T), with base cases S(0, 0) = 1, S(L, L+1) = 1 and S(L, 0) = L. This is the best achievable time complexity, since the same recurrence also counts how many arrays of length L with T true values exist; it is the binomial recurrence, just with different base cases. I will leave computing the closed form as an exercise for the reader. There are also a couple of trivial optimizations to the above code that can bring the computation time down further.

Run a regex on a Supply or other stream-like sequence?

Suppose I have a Supply, Channel, IO::Handle, or similar stream-like source of text, and I want to scan it for substrings matching a regex. I can't be sure that matching substrings do not cross chunk boundaries. The total length is potentially infinite and cannot be slurped into memory.
One way this would be possible is if I could instantiate a regex matching engine and feed it chunks of text while it maintains its state. But I don't see any way to do that -- I only see methods to run the match engine to completion.
Is this possible?
After some more searching, I may have answered my own question. Specifically, it seems Supply.comb is capable of combining chunks and lazily processing them:
my $c = supply {
    whenever Supply.interval(1.0) -> $v {
        my $letter = do if ($v mod 2 == 0) { "a" } else { "b" };
        my $chunk = $letter x ($v + 1);
        say "Pushing {$chunk}";
        emit($chunk);
    }
};

my $c2 = $c.comb(/a+b+/);

react {
    whenever $c2 -> $v {
        say "Got {$v}";
    }
}
See also the concurrency features used to construct this example.
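If you would rather keep the matcher state by hand, the chunk-buffering idea from the question can also be sketched outside Raku. A JavaScript illustration (the feed/flush names are invented; the /a+b+/ pattern is the one from the example above):

```javascript
// Buffer chunks and emit matches that can no longer grow. A match that ends
// exactly at the end of the buffer is held back, since more input might
// extend it. This is safe for patterns like /a+b+/, where a completed match
// not touching the buffer end cannot be extended by later input; a fully
// general streaming matcher needs more care than this sketch.
function makeScanner(re) {
  let buf = ''
  const g = new RegExp(re.source, 'g')
  return {
    feed(chunk) {
      buf += chunk
      const out = []
      g.lastIndex = 0
      let m
      let keepFrom = 0
      while ((m = g.exec(buf)) !== null) {
        if (m.index + m[0].length === buf.length) break // may still grow
        out.push(m[0])
        keepFrom = m.index + m[0].length
      }
      buf = buf.slice(keepFrom)
      return out
    },
    flush() {
      // end of stream: any pending match is final now
      const out = buf.match(new RegExp(re.source, 'g')) || []
      buf = ''
      return out
    },
  }
}
```

Feeding the chunks "a", "bb", "aaa", "bbbb" and then flushing yields "abb" and "aaabbbb", even though both matches cross chunk boundaries.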

What is the most efficient way to generate random numbers from a union of disjoint ranges in Kotlin?

I would like to generate random numbers from a union of ranges in Kotlin. I know I can do something like
((1..10) + (50..100)).random()
but unfortunately this creates an intermediate list, which can be rather expensive when the ranges are large.
I know I could write a custom function to randomly select a range with a weight based on its width, followed by randomly choosing an element from that range, but I am wondering if there is a cleaner way to achieve this with Kotlin built-ins.
Suppose your ranges are non-overlapping and sorted; if not, you can preprocess them by merging and sorting.
This comes down to choosing an algorithm:
O(1) time complexity and O(N) space complexity, where N is the total number of values: expand the range objects into a flat collection of numbers and randomly pick one. To be compact, an array or list can be used as the container.
O(M) time complexity and O(1) space complexity, where M is the number of ranges: calculate the position with a linear scan.
O(M + log M) time complexity and O(M) space complexity, where M is the number of ranges: calculate the position using a binary search. You can separate the O(M) preparation from the O(log M) generation if there are multiple generations over the same set of ranges.
For the last algorithm, imagine a sorted list of all available numbers; this list can be partitioned into your ranges. There is no need to actually create the list: you just calculate the positions of your ranges relative to it. Given a position within this list, a binary search tells you which range it falls in.
import kotlin.random.Random

fun random(ranges: Array<IntRange>): Int {
    // preparation: positions[i] = how many values precede ranges[i]
    val positions = ranges.map {
        it.last - it.first + 1
    }.runningFold(0) { sum, item -> sum + item }
    // generation
    val randomPos = Random.nextInt(positions[ranges.size])
    val found = positions.binarySearch(randomPos)
    // binarySearch may return an "insertion point" as a negative value
    val range = if (found < 0) -(found + 1) - 1 else found
    return ranges[range].first + randomPos - positions[range]
}
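To sanity-check the index arithmetic, here is a small JavaScript sketch of the same prefix-sum idea (the function name and the [start, endInclusive] pair encoding are mine):

```javascript
// prefix[i] = how many values precede ranges[i] in the flattened union.
// A uniform index over [0, total) maps back to exactly one value.
function pickFromRanges(ranges, pos) {
  const prefix = [0]
  for (const [lo, hi] of ranges) prefix.push(prefix[prefix.length - 1] + (hi - lo + 1))
  // locate the last prefix entry <= pos (a binary search does the same in O(log M))
  let i = 0
  while (i + 1 < prefix.length && prefix[i + 1] <= pos) i++
  return ranges[i][0] + pos - prefix[i]
}
```

Feeding every pos from 0 to total - 1 enumerates each value of the union exactly once, which is what makes picking a uniform pos equivalent to picking a uniform value.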
Short solution
We can do it like this:
import kotlin.random.Random

fun main() {
    println(random(1..10, 50..100))
}

fun random(vararg ranges: IntRange): Int {
    var index = Random.nextInt(ranges.sumOf { it.last - it.first } + ranges.size)
    ranges.forEach {
        val size = it.last - it.first + 1
        if (index < size) {
            return it.first + index
        }
        index -= size
    }
    throw IllegalStateException()
}
It uses the same approach you described, but it asks for a random integer only once, not twice.
Long solution
As I said in the comment, I often miss utilities in the Java/Kotlin stdlib for creating collection views. If IntRange had something like asList() and we had a way to concatenate lists by creating a view, this would be really trivial, reusing existing building blocks. Views would do the trick for us: they would automatically calculate the size and translate the random number to the proper value.
I implemented a POC, maybe you will find it useful:
fun main() {
    val list = listOf(1..10, 50..100).mergeAsView()
    println(list.size) // 61
    println(list[20]) // 60
    println(list.random())
}

@JvmName("mergeIntRangesAsView")
fun Iterable<IntRange>.mergeAsView(): List<Int> = map { it.asList() }.mergeAsView()

@JvmName("mergeListsAsView")
fun <T> Iterable<List<T>>.mergeAsView(): List<T> = object : AbstractList<T>() {
    override val size = this@mergeAsView.sumOf { it.size }

    override fun get(index: Int): T {
        if (index < 0 || index >= size) {
            throw IndexOutOfBoundsException(index)
        }
        var remaining = index
        this@mergeAsView.forEach { curr ->
            if (remaining < curr.size) {
                return curr[remaining]
            }
            remaining -= curr.size
        }
        throw IllegalStateException()
    }
}

fun IntRange.asList(): List<Int> = object : AbstractList<Int>() {
    override val size = endInclusive - start + 1

    override fun get(index: Int): Int {
        if (index < 0 || index >= size) {
            throw IndexOutOfBoundsException(index)
        }
        return start + index
    }
}
This code does almost exactly the same thing as the short solution above, only indirectly.
Once again: this is just a POC. This implementation of asList() and mergeAsView() is not at all production-ready. We should implement more methods, for example iterator(), contains() and indexOf(), because right now they are much slower than they could be. Still, it should already work efficiently for your specific case; you should probably test it at least a little. Also, mergeAsView() assumes the provided lists are immutable (they have a fixed size), which may not be true.
It would probably be good to implement asList() for IntProgression and for other primitive types as well. You may also prefer a varargs version of mergeAsView() over the extension function.
As a final note: I guess there are libraries that do this already, probably some related to immutable collections. But if you are looking for a relatively lightweight solution, this should work for you.

Find a subsequence in a list

Let's assume that we are looking for a sequence in a list, and this sequence should satisfy some conditions. For example, I have a series of numbers like this:
[1,2,4,6,7,8,12,13,14,15,20]
I need to find the longest sequence such that its consecutive elements have a difference of 1, so what I expect to get is:
[12,13,14,15]
I'm curious whether there is any way to do this with Kotlin sequences or built-in functions like groupBy or something else.
PS: I know how to create sequences; the question is how to evaluate and extract subsequences that satisfy given conditions.
There is no built-in functionality for this kind of "sequence" recognition, but you can solve it with the fold operation:
val result = listOf(1, 2, 3, 12, 13, 14, 15)
    .distinct() // remove duplicates
    .sorted() // lowest first
    .fold(mutableListOf<Int>() to mutableListOf<List<Int>>()) { (currentList, allLists), currentItem ->
        if (currentList.isEmpty()) { // Applies only to the very first item
            mutableListOf(currentItem) to allLists
        } else {
            if (currentItem - currentList.max()!! == 1) { // Your custom condition: 'difference of 1'
                currentList.apply { add(currentItem) } to allLists
            } else {
                mutableListOf(currentItem) to allLists.apply { add(currentList) } // Start the next run
            }
        }
    }
    .let { it.second.apply { add(it.first) } } // Add the last list
    .maxBy { it.size } // We want the longest run; if several tie, the first one wins
// If you need a more specific list, sort the runs by your own criteria
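For comparison, the same run-finding logic as a minimal JavaScript sketch (the function name is mine); it scans once, tracking the current run and the best run seen so far:

```javascript
// Find the longest run of consecutive elements differing by exactly 1.
function longestRun(xs) {
  let best = []
  let cur = []
  for (const x of xs) {
    // extend the current run, or start a new one
    if (cur.length && x - cur[cur.length - 1] === 1) cur.push(x)
    else cur = [x]
    if (cur.length > best.length) best = cur
  }
  return best
}

console.log(longestRun([1, 2, 4, 6, 7, 8, 12, 13, 14, 15, 20]))
// [ 12, 13, 14, 15 ]
```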

Specman/e constraint (for each in) iteration

Can I iterate through only a part of a list in e, in a constraint?
For example, this code will go through the whole layer_l list:
<'
struct layer_s {
    a : int;
    keep soft a == 3;
};

struct layer_gen_s {
    n_layers : int;
    keep soft n_layers == 8;
    layer_l : list of layer_s;
    keep layer_l.size() == read_only(n_layers);
};

extend sys {
    layer_gen : layer_gen_s;
    run() is also {
        messagef(LOW, "n_layers = %0d", layer_gen.n_layers);
        for each in layer_gen.layer_l {
            messagef(LOW, "layer[%2d]: a = %0d", index, it.a);
        };
    };
};

-- this will go through all layer_l
extend layer_gen_s {
    keep for each (layer) using index (i) in layer_l {
        layer.a == 7;
    };
};
But I would like the for each to iterate through only, for example, 2 items. I tried the code below, but it doesn't work:
-- this produces an error
extend layer_gen_s {
    keep for each (layer) using index (i) in [layer_l.all(index < 2)] {
        layer.a == 7;
    };
};
Also I don't want to use implication, so this is not what I want:
-- not what I want, I want to specify directly in iterated list
extend layer_gen_s {
keep for each (layer) using index (i) in layer_l {
(i < 2) => {
layer.a == 7;
};
};
};
Using the list slicing operator doesn't work either, because the path in a for..each constraint is limited to a simple path (e.g. a list variable). So the following also fails:
keep for each (layer) using index (i) in layer_l[0..2] {
    //...
};
This is a Specman limitation.
To force looping over a sub-list, your only bet is to create that sub-list as a separate variable:
layer_subl : list of layer_s;
keep layer_subl.size() == 3;
keep for each (layer) using index (i) in layer_subl {
    layer == layer_l[i];
};
Now you can loop on only the first 3 elements within your for..each constraint:
keep for each (layer) in layer_subl {
    layer.a == 7;
};
This avoids using implication inside the constraint; whether that is worth it is for you to decide. Also note that both lists will contain the same objects (this is good): no extra struct objects get created.
Creating the sub-list like this is boilerplate code that could be handled by the tool itself, which would make the code much more concise and readable. You could contact your vendor and request this feature.