In J (using J503, not J6 or 7), normally when I want to see if the elements of an array are smaller than their predecessor, I use this:
smaller =: }:<:}.
This results in n-1 items:
smaller 1 2 3 4 5
1 1 1 1
smaller 1 2 4 3
1 1 0
Internally, }: and }. each create a temporary array (one omits the last item, the other omits the first), finally allowing the <: comparison. The total memory usage is roughly 2(n-1) items for the two temporary arrays.
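(As a rough analogy for readers who don't know J, the same computation in Python/NumPy terms would be the following; the two slices play the roles of the two temporary arrays.)

import numpy as np

def smaller(a):
    # a[:-1] drops the last item, a[1:] drops the first; the <= comparison
    # then produces the (n-1)-item boolean result.
    return a[:-1] <= a[1:]

print(smaller(np.array([1, 2, 4, 3])))   # [ True  True False]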
memuse =: 7!:2
i =: ,(10000#1)?10000
memuse 'smaller i'
148480
Another approach which should intuitively work better takes more memory:
smaller2 =: 13 : '2<:/\y.'
memuse 'smaller2 i'
214400
(J6 handles this much better. But I'm stuck with J5).
What would be a leaner alternative for the same operation?
Edit:
In J504, using rotate (1|.) seems to be the best choice so far:
smallerS =: <:1&|.
l =: ?. 10000$1000
;memuse &.> 'smaller l';'smaller2 l';'smallerS l'
148480 214400 82880
I would expect
2<:/\y
to be much better, given that 2 f/\y has been special code since J 4.06. On the off chance that <: is the reason for the poor performance, try:
0>:2-/\ y
I have read this: https://www.topcoder.com/community/competitive-programming/tutorials/binary-search.
I can't understand some parts:
What we can call the main theorem states that binary search can be
used if and only if for all x in S, p(x) implies p(y) for all y > x.
This property is what we use when we discard the second half of the
search space. It is equivalent to saying that ¬p(x) implies ¬p(y) for
all y < x (the symbol ¬ denotes the logical not operator), which is
what we use when we discard the first half of the search space.
But I think this condition does not hold when we want to find an element in an array by checking for equality only. It only holds when we are testing an inequality, for example when we're searching for an element greater than or equal to our target value.
Example: We are finding 5 in this array.
indexes = 0 1 2 3 4 5 6 7 8
values  = 1 3 4 4 5 6 7 8 9
we define p(x)=>
if(a[x]==5) return true else return false
step one ==> middle index = (0 + 8) / 2 = 4 ==> a[4] = 5
so p(4) is true, and by the main theorem the result should be that
p(5), ..., p(8) are all true, but they are not.
So what is the problem?
We CAN use that theorem when looking for an exact value, because we
only use it when discarding one half. If we are looking for, say, 5,
and we find, say, 6 in the middle, then we can discard the upper half,
because we now know (due to the theorem) that all items in there are > 5.
Also notice that if we have a sorted sequence and want to find any element
that satisfies an inequality, looking at the end elements is enough.
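A small Python sketch of the idea (the names are mine): pick the monotone predicate p(x) = a[x] >= target instead of plain equality, binary search for the first index where it holds, then test that single position for equality.

def lower_bound(a, target):
    # p(x) = (a[x] >= target) is monotone on a sorted array:
    # once true, it stays true, so binary search applies.
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] >= target:   # p(mid) true: answer is at mid or to its left
            hi = mid
        else:                  # p(mid) false: discard the first half
            lo = mid + 1
    return lo                  # first index with a[index] >= target, or len(a)

a = [1, 3, 4, 4, 5, 6, 7, 8, 9]
i = lower_bound(a, 5)
print(i, i < len(a) and a[i] == 5)   # 4 True: 5 is present at index 4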
I'm trying to understand the time complexity of different data structures and began with heap sort. From what I've read, I think people collectively agree heap sort has a time complexity of O(nlogn); however, I have difficulty understanding how that came to be.
Most people seem to agree that the heapify method takes O(logn) and the buildmaxheap method takes O(n), thus O(nlogn) overall, but why does heapify take O(logn)?
From my perspective, heapify is just a method that compares a node with its left and right children and swaps them appropriately, depending on whether it is a min- or max-heap. Why does that take O(logn)?
I think I'm missing something here and would really appreciate it if someone could explain this better to me.
Thank you.
It seems you are confused about the time complexity of heap sort. It is true that building a max-heap from an unsorted array takes O(n) time, and popping one element out takes O(1). However, after you pop the top element off the heap, you need to move the last element (call it A) in the heap to the top and heapify to maintain the heap property. Element A sifts down at most log(n) times, which is the height of the heap. So you need at most log(n) time to get the next maximum value after you pop the current maximum out. Here is an example of the process of heapsort.
          18
      15        8
    7    11   1   2
   3 6  4 9
After you pop out 18, you put the last element, 9, on the top and heapify it.
           9
      15        8
    7    11   1   2
   3 6  4
We need to sift 9 down because 9 < 15:
          15
       9        8
    7    11   1   2
   3 6  4
We need to sift 9 down again because 9 < 11:
          15
      11        8
    7    9    1   2
   3 6  4
Now 9 > 4, which means the heapify process is done. You get the next maximum value, 15, without breaking the heap property.
You have n numbers, and for each one you need to do the heapify process, so the time complexity of heapsort is O(nlogn).
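A runnable Python sketch of this process (the function names are mine; it assumes the array is already a max-heap, which, as the answer says, costs O(n) to build):

def sift_down(heap, i, size):
    # Move heap[i] down until both children are smaller (max-heap property).
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < size and heap[left] > heap[largest]:
            largest = left
        if right < size and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest   # descends one level per swap: at most log(n) levels

def pop_all(heap):
    # n pops, each followed by an O(log n) sift-down: O(n log n) total.
    for size in range(len(heap) - 1, 0, -1):
        heap[0], heap[size] = heap[size], heap[0]   # move max to the end
        sift_down(heap, 0, size)                    # re-heapify the rest
    return heap

print(pop_all([18, 15, 8, 7, 11, 1, 2, 3, 6, 4, 9]))
# [1, 2, 3, 4, 6, 7, 8, 9, 11, 15, 18]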
You are missing the recursive call at the end of the heapify method.
Heapify(A, i) {
    le <- left(i)
    ri <- right(i)
    if (le <= heapsize) and (A[le] > A[i])
        largest <- le
    else
        largest <- i
    if (ri <= heapsize) and (A[ri] > A[largest])
        largest <- ri
    if (largest != i) {
        exchange A[i] <-> A[largest]
        Heapify(A, largest)
    }
}
In the worst case, after each step largest will be about two times i, since the children of node i live at 2i and 2i+1. For i to reach the end of the heap, it takes O(logn) steps.
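To see that concretely, here is a tiny Python check (mine, not from the answer) of how many doubling steps it takes for i to fall off a heap of size n:

from math import log2

def worst_case_calls(n, i=1):
    # Worst case: each recursive Heapify call roughly doubles i,
    # since left(i) = 2i. Count doublings until i leaves the heap.
    calls = 0
    while 2 * i <= n:
        i *= 2
        calls += 1
    return calls

print(worst_case_calls(1_000_000), int(log2(1_000_000)))   # 19 19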
I am interested in using Redis to check if an IP address (converted into an integer) falls within a range of IPs. It is very likely that the ranges will overlap.
I have found this question/answer, although I am not able to fully understand the logic behind it.
Thank you for your help!
@DidierSpezia's answer in your linked question is a good one, but it is not trivial (and is expensive) to build and maintain if you are adding and removing ranges.
I have an answer that is easier to maintain, but it can get slow and memory-expensive to compute with many ranges, as it requires cloning a set of all the ranges.
You need to save all ranges twice, in two sorted sets: in one set each range is scored by its low border, in the other by its high border.
Going with the sets in @DidierSpezia's example:
A 2-8
B 4-6
C 2-9
D 7-10
Your two sets will be:
ZADD ranges:low 2 "2-8" 4 "4-6" 2 "2-9" 7 "7-10"
ZADD ranges:high 8 "2-8" 6 "4-6" 9 "2-9" 10 "7-10"
To query which ranges a value belongs to, you need to drop the ranges whose low border is higher than the queried value, and the ranges whose high border is lower than it.
The most efficient way I can think of is: clone one of the sets, trim one of its sides by the rule given above, change the scores of the ranges to reflect the other border, and then trim the second side.
Here's how to find the ranges 5 belongs to:
ZUNIONSTORE tmp 1 ranges:low
ZREMRANGEBYSCORE tmp (5 +inf
ZINTERSTORE tmp 2 tmp ranges:high WEIGHTS 0 1
ZREMRANGEBYSCORE tmp -inf (5
ZRANGE tmp 0 -1
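A sketch of the same five commands via redis-py (assuming a local Redis server; the key names are from the example above):

import redis

r = redis.Redis()

def ranges_containing(value):
    # Clone ranges:low into tmp, then trim both sides as described above.
    r.zunionstore('tmp', ['ranges:low'])
    r.zremrangebyscore('tmp', '(%d' % value, '+inf')    # low border > value: out
    r.zinterstore('tmp', {'tmp': 0, 'ranges:high': 1})  # re-score by high border
    r.zremrangebyscore('tmp', '-inf', '(%d' % value)    # high border < value: out
    return r.zrange('tmp', 0, -1)

print(ranges_containing(5))   # [b'4-6', b'2-8', b'2-9']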
In this discussion, Dvir Volk and @antirez suggested using a sorted set in which each entry represents a range and has the following form:
Member = "min-max" range
Score = max value
For example:
ZADD z 10 "0-10"
ZADD z 20 "10-20"
ZADD z 100 "50-100"
And in order to check if a value falls within a range, you can use ZRANGEBYSCORE and parse the member returned.
For example, to check value 5:
ZRANGEBYSCORE z 5 +inf LIMIT 0 1
This will return the "0-10" member; you then only need to parse the string and check whether your value lies in between.
To check value 25:
ZRANGEBYSCORE z 25 +inf LIMIT 0 1
will return "50-100", but 25 does not lie within that range, so there is no match.
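The parse-and-validate step could look like this in Python (a hypothetical helper, not part of the Redis answer):

def in_range(member, value):
    # member is e.g. b"50-100"; check that value actually falls inside.
    lo, hi = map(int, member.split(b'-'))
    return lo <= value <= hi

# ZRANGEBYSCORE z 25 +inf LIMIT 0 1 returns b"50-100"; 25 is rejected:
print(in_range(b'50-100', 25))   # False
print(in_range(b'0-10', 5))      # True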
Full disclosure, this is for an assignment. I don't think I'm looking for spoon-feeding, it's more of a general question: am I allowed to break the map below into one group of 8 and 2 groups of 4, or do all group sizes have to be equal, i.e. 4 groups of 4?
1 0 1 1
0 0 0 0
1 1 1 1
1 1 1 1
Sorry if this is obvious, but my searches haven't turned up anything explicit and my teacher was quite vague. Thanks!
TL;DR: Groups don't have to be equal in size.
Let's see what happens if, in your case, you take eleven groups of one. Then you will have an equation of eleven terms (i.e. case_1 or case_2 or ... or case_11).
By making big groups, in your case 1 group of 8 and 2 groups of 4, you will have a very short, simplified equation: case_group_8 or case_group_4_1 or case_group_4_2.
Both groupings are correct (we covered all the ones in the map), but the second is the most optimized (i.e. it cannot be simplified further).
Making 4 groups of 4 would give you an equation that can still be simplified more.
The best thing now is for you to try both groupings (4/4/4/4 vs 8/4/4) and compare the resulting expressions.
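For concreteness, here is what the 8/4/4 grouping reads off as, assuming the usual Gray-code labeling (rows AB = 00, 01, 11, 10; columns CD = 00, 01, 11, 10):

F = A + B'C + B'D'

The group of 8 (the two bottom rows, where A = 1) gives the term A, and the two wrap-around groups of 4 covering the remaining ones of the top row give B'C and B'D'. Three terms, which is as simple as this map gets.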
I have a question about creating a count variable using SAS.
Q R
----
1 a
1 a
1 b
1 b
1 b
2 a
3 a
3 c
4 c
4 c
4 c
I need to create a variable S that counts the rows that have the same combination of Q and R. The following is the desired output:
Q R S
-------------------
1 a 1
1 a 2
1 b 1
1 b 2
1 b 3*
2 a 1
3 a 1
3 c 1
4 c 1
4 c 2
4 c 3
I tried using the following program:
data two;
  set one;
  S + 1;
  by Q R;
  if first.Q and first.R then S = 1;
run;
But this did not run correctly. For example, the starred row comes out as 1 instead of 3. I would appreciate any tips on how to make this counting variable work correctly.
Very close; your if statement should just test first.R (or change the AND to an OR, but that isn't as efficient). I usually prefer to have the increment after the reset to 1.
data two;
  set one;
  by Q R;
  *Retain S; *implicitly retained by using the +1 notation;
  if first.R then S = 1;
  else S + 1;
run;
Reese's example is certainly sufficient in this case, but given this was a simple question with a mostly uninteresting answer, I'll instead present a very small variant, simply from a programming-style standpoint.
data two;
  set one;
  by Q R;
  if first.R then s = 0;
  s + 1;
run;
This will likely function exactly the same as Reese's code and the original question's code once the first.Q is removed. However, it has two slight differences.
First off, I like to group first.-variable resetting code (that isn't otherwise dependent on location) in one place, as early as possible in the data step (after the by statement, where, "early" subsetting if, array, format, length, and retain statements). This is useful from an organizational standpoint because this code (usually) roughly parallels what SAS does between data step iterations, but for BY groups, so it's nice to have it at the start of the data step.
Second, I like initializing to zero. That avoids the need for the else conditional, which makes the code clearer; you're not doing anything different on the first row per BY group other than the re-initialization, so it makes sense to not have the increment be conditional.
These are both helpful in a more complex version of this data step; obviously in this particular step it doesn't matter much, but in a larger data step it can be helpful to organize your code more effectively. Imagine this data step:
data two;
  set one;
  by Q R;
  retain Z;
  format Z $12.;
  array CS[15];

  if first.R then do; *re-init block;
    S = 0;
    Z = ' ';
  end;

  S + 1; *increment counter;

  do _t = 1 to dim(CS);
    Z = cats(Z, ifc(CS[_t] > 0, put(CS[_t], 8.), ''));
  end;

  keep Q R S Z;
run;
That data step is nicely organized, and has a logical flow, even though almost every step could be moved anywhere else in the data step. Grouping the initializations together makes it more readable, particularly if you always group things that way.