Algorithm to group consecutive words minimizing length per group - objective-c

From an input of space-delimited words, how to concatenate consecutive words so that:
each group has a minimum length L (spaces don't count)
longest group length is minimal (spaces don't count)
Example input:
would a cat eat a mouse
Example minimum length:
L = 5
Naive algorithm that solves the first condition but not the second one:
while length of a group is less than L, concatenate next word to group
if last group is shorter than L, concatenate last two groups together
This naive algorithm produces:
group 1: would
group 2: acateat
group 3: amouse
longest group length: 7
Second condition is not solved because a better solution would be:
group 1: woulda
group 2: cateat
group 3: amouse
longest group length: 6
Which algorithm would solve the second condition (minimal longest group) with relatively fast execution as a program? (by fast, I'd like to avoid testing all possible combinations)
I know C, ObjC, Swift, Javascript, Python, but pseudocode is fine.

This can be done with dynamic programming approach. Let's count a function F(i) - the minimum length of the longest group among correct divisions of the first i words into groups.
F(0) = 0
F(i) = Min(Max(F(j), totalLen(j+1, i))), for j in [0..i-1]
Where
totalLen(i, j) = total length of words from i to j, if the length is at least L
totalLen(i, j) = MAX, if total length is less than L
The answer is the value of F(n). To get the groups themselves we can save the indices of the best j for every i.
There is a implementation from the scratch in c++:
const vector<string> words = {"would", "a", "cat", "eat", "a", "mouse"};
const int L = 5;
int n = words.size();
vector<int> prefixLen = countPrefixLen(words);
vector<int> f(n+1);
vector<int> best(n+1, -1);
int maxL = prefixLen[n];
f[0] = 0;
for (int i = 1; i <= n; ++i) {
f[i] = maxL;
for (int j = 0; j < i; ++j) {
int totalLen = prefixLen[i] - prefixLen[j];
if (totalLen >= L) {
int maxLen = max(f[j], totalLen);
if (f[i] > maxLen) {
f[i] = maxLen;
best[i] = j;
}
}
}
}
output(f[n], prev, words);
Preprocessing and output details:
vector<int> countPrefixLen(const vector<string>& words) {
int n = words.size();
vector<int> prefixLen(n+1);
for (int i = 1; i <= n; ++i) {
prefixLen[i] = prefixLen[i-1] + words[i-1].length();
}
return prefixLen;
}
void output(int answer, const vector<int>& best, const vector<string>& words) {
cout << answer << endl;
int j = best.size()-1;
vector<int> restoreIndex(1, j);
while (j > 0) {
int i = best[j];
restoreIndex.push_back(i);
j = i;
}
reverse(restoreIndex.begin(), restoreIndex.end());
for (int i = 0; i+1 < restoreIndex.size(); ++i) {
for (int j = restoreIndex[i]; j < restoreIndex[i+1]; ++j) {
cout << words[j] << ' ';
}
cout << endl;
}
}
Output:
6
would a
cat eat
a mouse
Runnable: https://ideone.com/AaV5C8
Further improvement
The complexity of this algorithm is O(N^2). If it is too slow for your data I can suggest a simple optimization:
Let's inverse the inner loop. First, this allows to get rid of the prefixLen array and it's preprocessing, because now we add words one by one to the group (actually, we could get rid of this preprocessing in the initial version, but at the expense of simplicity). What is more important we can break our loop when totalLen would be not less than already computed f[i] because further iterations will never lead to an improvement. The code of the inner loop could be changed to:
int totalLen = 0;
for (int j = i-1; j >= 0; --j) {
totalLen += words[j].length();
if (totalLen >= L) {
int maxLen = max(f[j], totalLen);
if (f[i] > maxLen) {
f[i] = maxLen;
best[i] = j;
}
}
if (totalLen >= f[i]) break;
}
This can drastically improve the performance for not very big values of L.

Related

What is the complexity of this for loop, for (int j = i; j < n; j++)?

what is the complexity of the second for loop? would it be n-i? from my understanding a the first for loop will go n times, but the index in the second for loop is set to i instead.
//where n is the number elements in an array
for (int i = 0; i < n; i++) {
for (int j = i; j < n; j++) {
// Some Constant time task
}
}
In all, the inner loop iterates sum(1..n) times, which is n * (n + 1) / 2, which is O(n2)
If you try to visualise this as a matrix where lines represents i and each columns represents j you'll see that this forms a triangle with the sides n
Example with n being 4
0 1 2 3
1 2 3
2 3
3
The inner loop has (on average) complexity n/2 which is O(n).
The total complexity is n*(n+1)/2 or O(n^2)
The number of steps this takes is a Triangle Number. Here's a bit of code I put together in LINQpad (yeah, sorry about answering in C#, but hopefully this is still readable):
void Main()
{
long k = 0;
// Whatever you want
const int n = 13;
for (int i = 0; i < n; i++)
{
for (int j = i; j < n; j++)
{
k++;
}
}
k.Dump();
triangleNumber(n).Dump();
(((n * n) + n) / 2).Dump();
}
int triangleNumber(int number)
{
if (number == 0) return 0;
else return number + triangleNumber(number - 1);
}
All 3 print statements (.Dump() in LINQpad) produce the same answer (91 for the value of n I selected, but again you can choose whatever you want).
As others indicated, this is O(n^2). (You can also see this Q&A for more details on that).
We can see that the total iteration of the loop is n*(n+1)/2. I am assuming that you are clear with that from the above explanations.
Now let's find the asymptotic time complexity in an easy logical way.
Big Oh, comes to play when the value of n is a large number, in such cases we need not consider the dividing by 2 ( 2 is a constant) because (large number / 2) is also a large number.
This leaves us with n*(n+1).
As explained above, since n is a large number, (n+1) can be approximated to (n).
thus leaving us with (n*n).
hence the time complexity O(n^2).

Time complexity of two nested loops depending on n and k

I have a string of n chars and a k length of unique substrings
I'm trying to understand the time complexity of this code:
for (int i = 0; i <= inputStr.length() - k; i++) {
String substr = inputStr.substring(i, i + k);
Set<Character> setChars = new HashSet<Character>();
for (int j = 0; j < k; j++) {
setChars.add(substr.charAt(j));
}
if (setChars.size() == num) {
set.add(substr);
}
}
If I correctly understood the time complexity might be expressed by the formula:
f((n-k+1)*k)
I believe that the worse case I can have is when k = n/2, so:
f((n-k+1)k) = nn/2 - n/2*n/2 + n/2 = 1/2*n*n - 1/4*n*n + 1/2*n =>
O(n)
You are correct until the implication.
f((n-k+1)k) = nn/2 - n/2*n/2 + n/2 = 1/2*n*n - 1/4*n*n + 1/2*n
= 1/4n*n + 1/2*n => O(n*n)

Find nth int with 10 set bits

Find the nth int with 10 set bits
n is an int in the range 0<= n <= 30 045 014
The 0th int = 1023, the 1st = 1535 and so on
snob() same number of bits,
returns the lowest integer bigger than n with the same number of set bits as n
int snob(int n) {
int a=n&-n, b=a+n;
return b|(n^b)/a>>2;
}
calling snob n times will work
int nth(int n){
int o =1023;
for(int i=0;i<n;i++)o=snob(o);
return o;
}
example
https://ideone.com/ikGNo7
Is there some way to find it faster?
I found one pattern but not sure if it's useful.
using factorial you can find the "indexes" where all 10 set bits are consecutive
1023 << x = the (x+10)! / (x! * 10!) - 1 th integer
1023<<1 is the 10th
1023<<2 is the 65th
1023<<3 the 285th
...
Btw I'm not a student and this is not homework.
EDIT:
Found an alternative to snob()
https://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
int lnbp(int v){
int t = (v | (v - 1)) + 1;
return t | ((((t & -t) / (v & -v)) >> 1) - 1);
}
I have built an implementation that should satisfy your needs.
/** A lookup table to see how many combinations preceeded this one */
private static int[][] LOOKUP_TABLE_COMBINATION_POS;
/** The number of possible combinations with i bits */
private static int[] NBR_COMBINATIONS;
static {
LOOKUP_TABLE_COMBINATION_POS = new int[Integer.SIZE][Integer.SIZE];
for (int bit = 0; bit < Integer.SIZE; bit++) {
// Ignore less significant bits, compute how many combinations have to be
// visited to set this bit, i.e.
// (bit = 4, pos = 5), before came 0b1XXX and 0b1XXXX, that's C(3, 3) + C(4, 3)
int nbrBefore = 0;
// The nth-bit can be only encountered after pos n
for (int pos = bit; pos < Integer.SIZE; pos++) {
LOOKUP_TABLE_COMBINATION_POS[bit][pos] = nbrBefore;
nbrBefore += nChooseK(pos, bit);
}
}
NBR_COMBINATIONS = new int[Integer.SIZE + 1];
for (int bits = 0; bits < NBR_COMBINATIONS.length; bits++) {
NBR_COMBINATIONS[bits] = nChooseK(Integer.SIZE, bits);
assert NBR_COMBINATIONS[bits] > 0; // Important for modulo check. Otherwise we must use unsigned arithmetic
}
}
private static int nChooseK(int n, int k) {
assert k >= 0 && k <= n;
if (k > n / 2) {
k = n - k;
}
long nCk = 1; // (N choose 0)
for (int i = 0; i < k; i++) {
// (N choose K+1) = (N choose K) * (n-k) / (k+1);
nCk *= (n - i);
nCk /= (i + 1);
}
return (int) nCk;
}
public static int nextCombination(int w, int n) {
// TODO: maybe for small n just advance naively
// Get the position of the current pattern w
int nbrBits = 0;
int position = 0;
while (w != 0) {
final int currentBit = Integer.lowestOneBit(w); // w & -w;
final int bitPos = Integer.numberOfTrailingZeros(currentBit);
position += LOOKUP_TABLE_COMBINATION_POS[nbrBits][bitPos];
// toggle off bit
w ^= currentBit;
nbrBits++;
}
position += n;
// Wrapping, optional
position %= NBR_COMBINATIONS[nbrBits];
// And reverse lookup
int v = 0;
int m = Integer.SIZE - 1;
while (nbrBits-- > 0) {
final int[] bitPositions = LOOKUP_TABLE_COMBINATION_POS[nbrBits];
// Search for largest bitPos such that position >= bitPositions[bitPos]
while (Integer.compareUnsigned(position, bitPositions[m]) < 0)
m--;
position -= bitPositions[m];
v ^= (0b1 << m--);
}
return v;
}
Now for some explanation. LOOKUP_TABLE_COMBINATION_POS[bit][pos] is the core of the algorithm that makes it as fast as it is. The table is designed so that a bit pattern with k bits at positions p_0 < p_1 < ... < p_{k - 1} has a position of `\sum_{i = 0}^{k - 1}{ LOOKUP_TABLE_COMBINATION_POS[i][p_i] }.
The intuition is that we try to move back the bits one by one until we reach the pattern where are all bits are at the lowest possible positions. Moving the i-th bit from position to k + 1 to k moves back by C(k-1, i-1) positions, provided that all lower bits are at the right-most position (no moving bits into or through each other) since we skip over all possible combinations with the i-1 bits in k-1 slots.
We can thus "decode" a bit pattern to a position, keeping track of the bits encountered. We then advance by n positions (rolling over in case we enumerated all possible positions for k bits) and encode this position again.
To encode a pattern, we reverse the process. For this, we move bits from their starting position forward, as long as the position is smaller than what we're aiming for. We could, instead of a linear search through LOOKUP_TABLE_COMBINATION_POS, employ a binary search for our target index m but it's hardly needed, the size of an int is not big. Nevertheless, we reuse our variant that a smaller bit must also come at a less significant position so that our algorithm is effectively O(n) where n = Integer.SIZE.
I remain with the following assertions to show the resulting algorithm:
nextCombination(0b1111111111, 1) == 0b10111111111;
nextCombination(0b1111111111, 10) == 0b11111111110;
nextCombination(0x00FF , 4) == 0x01EF;
nextCombination(0x7FFFFFFF , 4) == 0xF7FFFFFF;
nextCombination(0x03FF , 10) == 0x07FE;
// Correct wrapping
nextCombination(0b1 , 32) == 0b1;
nextCombination(0x7FFFFFFF , 32) == 0x7FFFFFFF;
nextCombination(0xFFFFFFEF , 5) == 0x7FFFFFFF;
Let us consider the numbers with k=10 bits set.
The trick is to determine the rank of the most significant one, for a given n.
There is a single number of length k: C(k, k)=1. There are k+1 = C(k+1, k) numbers of length k + 1. ... There are C(m, k) numbers of length m.
For k=10, the limit n are 1 + 10 + 55 + 220 + 715 + 2002 + 5005 + 11440 + ...
For a given n, you easily find the corresponding m. Then the problem is reduced to finding the n - C(m, k)-th number with k - 1 bits set. And so on recursively.
With precomputed tables, this can be very fast. 30045015 takes 30 lookups, so that I guess that the worst case is 29 x 30 / 2 = 435 lookups.
(This is based on linear lookups, to favor small values. By means of dichotomic search, you reduce this to less than 29 x lg(30) = 145 lookups at worse.)
Update:
My previous estimates were pessimistic. Indeed, as we are looking for k bits, there are only 10 determinations of m. In the linear case, at worse 245 lookups, in the dichotomic case, less than 50.
(I don't exclude off-by-one errors in the estimates, but clearly this method is very efficient and requires no snob.)

Calc CRC8 for objective c

I need the method to check sum CRC8.
I found this code, but it's not working:
- (int)crc8Checksum:(NSString*)dataFrame{
char j;
int crc8 = 0;
int x = 0;
for (int i = 0; i < [dataFrame length]; i++){
x = [dataFrame characterAtIndex:i];
for (int k = 0; k < 8; k++){
j = 1 & (x ^ crc8);
crc8 = floor0(crc8 / 2) & 0xFF;
x = floor0(x / 2) & 0xFF;
if (j != 0 ){
crc8 = crc8 ^ 0x8C;
}
}
}
return crc8;
}
Help me please!
What do you mean "it's not working"? There are 14 different CRC-8 definitions in this catalog, and probably many more out there in the wild. Do you have some CRC values you are comparing to? Is there documentation on what CRC you actually need? What are your test messages and corresponding expected CRCs?
You can't just pick some random CRC-8 code and expect it to do what you need.
That particular code computes a CRC-8/MAXIM in the linked catalog. However it is truly awful code. With unnecessary divides and floors. Here is a better, simpler, faster inner loop:
crc8 ^= x;
for (int k = 0; k < 8; k++)
crc8 = crc8 & 1 ? (crc8 >> 1) ^ 0x8c : crc8 >> 1;
You can get it faster still with tables and algorithms that compute the CRC a byte at a time or a machine word at a time.
The x in the code has its own problems, since an NSString can be a string of unicode characters, so characterAtIndex may not return a byte, and length may not return the number of bytes. You need a way to get the message as a series of bytes.

Finding the big theta bound

Give big theta bound for:
for (int i = 0; i < n; i++) {
if (i * i < n) {
for (int j = 0; j < n; j++) {
count++;
}
}
else {
int k = i;
while (k > 0) {
count++;
k = k / 2;
}
}
}
So here's what I think..Not sure if it's right though:
The first for loop will run for n iterations. Then the for for loop within the first for loop will run for n iterations as well, giving O(n^2).
For the else statement, the while loop will run for n iterations and the k = k/ 2 will run for logn time giving O(nlogn). So then the entire thing will look like n^2 + nlogn and by taking the bigger run time, the answer would be theta n^2 ?
I would say the result is O(nlogn) because i*i is typically not smaller than n for a linear n. The else branch will dominate.
Example:
n= 10000
after i=100 the else part will be calculated instead of the inner for loop