Low Level Events in Solidity and log3 - uint to byte32

Low Level Events in Solidity and log3 - uint to byte32 - solidity

I have a question regarding low level events in solidity I can't really wrap my head around.
So in theory an event that looks like this:
event MyEvent(address indexed oneAddress, bool isTrueOrNot, uint256 myUnsingedNumber);
inside a function I would use it like that for example:
MyEvent(msg.sender, true, 5);
But now going to low-level events with log2 (log_i = i+1 parameters = 3). How would that be used there? I've tried around a bit but couldn't come up with the right solution...
log2(??, sha3("MyEvent(address,bool,uint256)"), msg.sender, ??)
In the samples in the Docs its quite straight forward, but I have real trouble putting that into this example here.
Here's the link to the docs: http://solidity.readthedocs.io/en/develop/contracts.html#events
Especially together with indexed, and uint256 to byte32 conversion, as all the parameters must be in byte32. Hope I haven't overlooked something...
Thanks!

I think the usage is log1( value, 'log_topic'); and then log2(value, 'log-topic1', 'log-topic2')
So if msg.sender is the value analyzed put it first in both cases.

Related

Why does this solidity function run into gas errors?

I'm trying to figure out some strange behavior. The function below takes in an array like [1,2,3,4,5], loops through it, and looks at another contract to verify ownership. I wrote it like this (taking in a controlled / limited array) to limit the amount of looping required (to avoid gas issues). The weird part (well, to me) is that I can run this a few times and it works great, mapping the unmapped values. It will always process as expected until I run about 50 items through it. After that, the next time it will gas out even if the array includes only one value. So, I'm wondering what's going on here...
function claimFreeNFTs (uint[] memory _IDlist) external payable noReentrant {
IERC721 OGcontract = IERC721(ERC721_contract);
uint numClaims = 0;
for (uint i = 0; i < _IDlist.length; i++) {
uint thisID = _IDlist[i];
require(OGcontract.ownerOf(thisID)==msg.sender, 'Must own token.' );
if ( !claimedIDList(thisID) ) { // checks mapping here...
claimIDset(thisID); // maps unmapped values here;
numClaims++;
}
}
if ( numClaims > 0 ) {
_safeMint(msg.sender, numClaims);
emit Mint(msg.sender, totalSupply());
}
}
Any thoughts / directions appreciated. :-)

Well, there was a bit more to the function, actually. I'd edited out some of what I thought was extraneous, but it turned out my error was in the extra stuff. The above does actually work. (Sorry.) After doing the mint, I was also reducing the supply of a reserve wallet on the contract -- one that held (suprise!) 50 NFTs. So, after this function processed 50, it was making that wallet hold negative NFTs, which screwed things up. Long story, but on Remix, I'd forgotten to set values in the constructor in the proper order, which is how I screwed it up in the first place. Anyway, solved.

nand2tetris HDL: Getting error "Sub bus of an internal node may not be used"

I am trying to make a 10-bit adder/subtractor. Right now, the logic works as intended. However, I am trying to set all bits to 0 iff there is overflow. To do this, I need to pass the output (tempOut) through a 10-bit Mux, but in doing so, am getting an error.
Here is the chip:
/**
* Adds or Subtracts two 10-bit values.
* Both inputs a and b are in SIGNED 2s complement format
* when sub == 0, the chip performs add i.e. out=a+b
* when sub == 1, the chip performs subtract i.e. out=a-b
* carry reflects the overflow calculated for 10-bit add/subtract in 2s complement
*/
CHIP AddSub10 {
IN a[10], b[10], sub;
OUT out[10],carry;
PARTS:
// If sub == 1, subtraction, else addition
// First RCA4
Not4(in=b[0..3], out=notB03);
Mux4(a=b[0..3], b=notB03, sel=sub, out=MuxOneOut);
RCA4(a=a[0..3], b=MuxOneOut, cin=sub, sum=tempOut[0..3], cout=cout03);
// Second RCA4
Not4(in=b[4..7], out=notB47);
Mux4(a=b[4..7], b=notB47, sel=sub, out=MuxTwoOut);
RCA4(a=a[4..7], b=MuxTwoOut, cin=cout03, sum=tempOut[4..7], cout=cout47);
// Third RCA4
Not4(in[0..1]=b[8..9], out=notB89);
Mux4(a[0..1]=b[8..9], b=notB89, sel=sub, out=MuxThreeOut);
RCA4(a[0..1]=a[8..9], b=MuxThreeOut, cin=cout47, sum[0..1]=tempOut[8..9], sum[0]=tempA, sum[1]=tempB, sum[2]=carry);
// FIXME, intended to solve overflow/underflow
Xor(a=tempA, b=tempB, out=overflow);
Mux10(a=tempOut, b=false, sel=overflow, out=out);
}

Instead of x[a..b]=tempOut[c..d] you need to use the form x[a..b]=tempVariableAtoB (creating a new internal bus) and combine these buses in your Mux10:
Mux10(a[0..3]=temp0to3, a[4..7]=temp4to7, ... );

Without knowing what line the compiler is complaining about, it is difficult to diagnose the problem. However, my best guess is that you can't use an arbitrary internal bus like tempOut because the compiler doesn't know how big it is when it first runs into it.
The compiler knows the size of the IN and OUT elements, and it knows the size of the inputs and outputs of a component. But it can't tell how big tempOut would be without parsing everything, and that's probably outside the scope of the compiler design.
I would suggest you refactor so that each RCA4 has a discrete output bus (ie: sum1, sum2, sum3). You can then use them and their individual bits as needed in the Xor and Mux10.

How to convert Greensock's CustomEase functions to be usable in CreateJS's Tween system?

I'm currently working on a project that does not include GSAP (Greensock's JS Tweening library), but since it's super easy to create your own Custom Easing functions with it's visual editor - I was wondering if there is a way to break down the desired ease-function so that it can be reused in a CreateJS Tween?
Example:
var myEase = CustomEase.create("myCustomEase", [
{s:0,cp:0.413,e:0.672},{s:0.672,cp:0.931,e:1.036},
{s:1.036,cp:1.141,e:1.036},{s:1.036,cp:0.931,e:0.984},
{s:0.984,cp:1.03699,e:1.004},{s:1.004,cp:0.971,e:0.988},
{s:0.988,cp:1.00499,e:1}
]);
So that it turns it into something like:
var myEase = function(t, b, c, d) {
//Some magic algorithm performed on the 7 bezier/control points above...
}
(Here is what the graph would look like for this particular easing method.)

I took the time to port and optimize the original GSAP-based CustomEase class... but due to license restrictions / legal matters (basically a grizzly bear that I do not want to poke with a stick...), posting the ported code would violate it.
However, it's fair for my own use. Therefore, I believe it's only fair that I guide you and point you to the resources that made it possible.
The original code (not directly compatible with CreateJS) can be found here:
https://github.com/art0rz/gsap-customease/blob/master/CustomEase.js (looks like the author was also asked to take down the repo on github - sorry if the rest of this post makes no sense at all!)
Note that CreateJS's easing methods only takes a "time ratio" value (not time, start, end, duration like GSAP's easing method does). That time ratio is really all you need, given it goes from 0.0 (your start value) to 1.0 (your end value).
With a little bit of effort, you can discard those parameters from the ease() method and trim down the final returned expression.
Optimizations:
I took a few extra steps to optimize the above code.
1) In the constructor, you can store the segments.length value directly as this.length in a property of the CustomEase instance to cut down a bit on the amount of accessors / property lookups in the ease() method (where qty is set).
2) There's a few redundant calculations done per Segments that can be eliminated in the ease() method. For instance, the s.cp - s.s and s.e - s.s operations can be precalculated and stored in a couple of properties in each Segments (in its constructor).
3) Finally, I'm not sure why it was designed this way, but you can unwrap the function() {...}(); that are returning the constructors for each classes. Perhaps it was used to trap the scope of some variables, but I don't see why it couldn't have wrapped the entire thing instead of encapsulating each one separately.
Need more info? Leave a comment!

Keeping time using timer interrupts an embedded microcontroller

This question is about programming small microcontrollers without an OS. In particular, I'm interested in PICs at the moment, but the question is general.
I've seen several times the following pattern for keeping time:
Timer interrupt code (say the timer fires every second):
...
if (sec_counter > 0)
sec_counter--;
...
Mainline code (non-interrupt):
sec_counter = 500; // 500 seconds
while (sec_counter)
{
// .. do stuff
}
The mainline code may repeat, set the counter to various values (not just seconds) and so on.
It seems to me there's a race condition here when the assignment to sec_counter in the mainline code isn't atomic. For example, in PIC18 the assignment is translated to 4 ASM statements (loading each byte at the time and selecting the right byte from the memory bank before that). If the interrupt code comes in the middle of this, the final value may be corrupted.
Curiously, if the value assigned is less than 256, the assignment is atomic, so there's no problem.
Am I right about this problem?
What patterns do you use to implement such behavior correctly? I see several options:
Disable interrupts before each assignment to sec_counter and enable after - this isn't pretty
Don't use an interrupt, but a separate timer which is started and then polled. This is clean, but uses up a whole timer (in the previous case the 1-sec firing timer can be used for other purposes as well).
Any other ideas?

The PIC architecture is as atomic as it gets. It ensures that all read-modify-write operations to a memory file are 'atomic'. Although it takes 4-clocks to perform the entire read-modify-write, all 4-clocks are consumed in a single instruction and the next instruction uses the next 4-clock cycle. It is the way that the pipeline works. In 8-clocks, two instructions are in the pipeline.
If the value is larger than 8-bit, it becomes an issue as the PIC is an 8-bit machine and larger operands are handled in multiple instructions. That will introduce atomic issues.

You definitely need to disable the interrupt before setting the counter. Ugly as it may be, it is necessary. It is a good practice to ALWAYS disable the interrupt before configuring hardware registers or software variables affecting the ISR method. If you are writing in C, you should consider all operations as non-atomic. If you find that you have to look at the generated assembly too many times, then it may be better to abandon C and program in assembly. In my experience, this is rarely the case.
Regarding the issue discussed, this is what I suggest:
ISR:
if (countDownFlag)
{
sec_counter--;
}
and setting the counter:
// make sure the countdown isn't running
sec_counter = 500;
countDownFlag = true;
...
// Countdown finished
countDownFlag = false;
You need an extra variable and is better to wrap everything in a function:
void startCountDown(int startValue)
{
sec_counter = 500;
countDownFlag = true;
}
This way you abstract the starting method (and hide ugliness if needed). For example you can easily change it to start a hardware timer without affecting the callers of the method.

Write the value then check that it is the value required would seem to be the simplest alternative.
do {
sec_counter = value;
} while (sec_counter != value);
BTW you should make the variable volatile if using C.
If you need to read the value then you can read it twice.
do {
value = sec_counter;
} while (value != sec_counter);

Because accesses to the sec_counter variable are not atomic, there's really no way to avoid disabling interrupts before accessing this variable in your mainline code and restoring interrupt state after the access if you want deterministic behavior. This would probably be a better choice than dedicating a HW timer for this task (unless you have a surplus of timers, in which case you might as well use one).

If you download Microchip's free TCP/IP Stack there are routines in there that use a timer overflow to keep track of elapsed time. Specifically "tick.c" and "tick.h". Just copy those files over to your project.
Inside those files you can see how they do it.

It's not so curious about the less than 256 moves being atomic - moving an 8 bit value is one opcode so that's as atomic as you get.
The best solution on such a microcontroller as the PIC is to disable interrupts before you change the timer value. You can even check the value of the interrupt flag when you change the variable in the main loop and handle it if you want. Make it a function that changes the value of the variable and you could even call it from the ISR as well.

Well, what does the comparison assembly code look like?
Taken to account that it counts downwards, and that it's just a zero compare, it should be safe if it first checks the MSB, then the LSB. There could be corruption, but it doesn't really matter if it comes in the middle between 0x100 and 0xff and the corrupted compare value is 0x1ff.

The way you are using your timer now, it won't count whole seconds anyway, because you might change it in the middle of a cycle.
So, if you don't care about it. The best way, in my opinion, would be to read the value, and then just compare the difference. It takes a couple of OPs more, but has no multi-threading problems.(Since the timer has priority)
If you are more strict about the time value, I would automatically disable the timer once it counts down to 0, and clear the internal counter of the timer and activate once you need it.

Move the code portion that would be on the main() to a proper function, and have it conditionally called by the ISR.
Also, to avoid any sort of delaying or missing ticks, choose this timer ISR to be a high-prio interrupt (the PIC18 has two levels).

One approach is to have an interrupt keep a byte variable, and have something else which gets called at least once every 256 times the counter is hit; do something like:
// ub==unsigned char; ui==unsigned int; ul==unsigned long
ub now_ctr; // This one is hit by the interrupt
ub prev_ctr;
ul big_ctr;
void poll_counter(void)
{
ub delta_ctr;
delta_ctr = (ub)(now_ctr-prev_ctr);
big_ctr += delta_ctr;
prev_ctr += delta_ctr;
}
A slight variation, if you don't mind forcing the interrupt's counter to stay in sync with the LSB of your big counter:
ul big_ctr;
void poll_counter(void)
{
big_ctr += (ub)(now_ctr - big_ctr);
}

No one addressed the issue of reading multibyte hardware registers (for example a timer.
The timer could roll over and increment its second byte while you're reading it.
Say it's 0x0001ffff and you read it. You might get 0x0010ffff, or 0x00010000.
The 16 bit peripheral register is volatile to your code.
For any volatile "variables", I use the double read technique.
do {
t = timer;
} while (t != timer);

What are you favorite low level code optimization tricks? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I know that you should only optimize things when it is deemed necessary. But, if it is deemed necessary, what are your favorite low level (as opposed to algorithmic level) optimization tricks.
For example: loop unrolling.

gcc -O2
Compilers do a lot better job of it than you can.

Picking a power of two for filters, circular buffers, etc.
So very, very convenient.
-Adam

Why, bit twiddling hacks, of course!

One of the most useful in scientific code is to replace pow(x,4) with x*x*x*x. Pow is almost always more expensive than multiplication. This is followed by
for(int i = 0; i < N; i++)
{
z += x/y;
}
to
double denom = 1/y;
for(int i = 0; i < N; i++)
{
z += x*denom;
}
But my favorite low level optimization is to figure out which calculations can be removed from a loop. Its always faster to do the calculation once rather than N times. Depending on your compiler, some of these may be automatically done for you.

Inspect the compiler's output, then try to coerce it to do something faster.

I wouldn't necessarily call it a low level optimization, but I have saved orders of magnitude more cycles through judicious application of caching than I have through all my applications of low level tricks combined. Many of these methods are applications specific.
Having an LRU cache of database queries (or any other IPC based request).
Remembering the last failed database query and returning a failure if re-requested within a certain time frame.
Remembering your location in a large data structure to ensure that if the next request is for the same node, the search is free.
Caching calculation results to prevent duplicate work. In addition to more complex scenarios, this is often found in if or for statements.
CPUs and compilers are constantly changing. Whatever low level code trick that made sense 3 CPU chips ago with a different compiler may actually be slower on the current architecture and there may be a good chance that this trick may confuse whoever is maintaining this code in the future.

++i can be faster than i++, because it avoids creating a temporary.
Whether this still holds for modern C/C++/Java/C# compilers, I don't know. It might well be different for user-defined types with overloaded operators, whereas in the case of simple integers it probably doesn't matter.
But I've come to like the syntax... it reads like "increment i" which is a sensible order.

Using template metaprogramming to calculate things at compile time instead of at run-time.

Years ago with a not-so-smart compilier, I got great mileage from function inlining, walking pointers instead of indexing arrays, and iterating down to zero instead of up to a maximum.
When in doubt, a little knowledge of assembly will let you look at what the compiler is producing and attack the inefficient parts (in your source language, using structures friendlier to your compiler.)

precalculating values.
For instance, instead of sin(a) or cos(a), if your application doesn't necessarily need angles to be very precise, maybe you represent angles in 1/256 of a circle, and create arrays of floats sine[] and cosine[] precalculating the sin and cos of those angles.
And, if you need a vector at some angle of a given length frequently, you might precalculate all those sines and cosines already multiplied by that length.
Or, to put it more generally, trade memory for speed.
Or, even more generally, "All programming is an exercise in caching" -- Terje Mathisen
Some things are less obvious. For instance traversing a two dimensional array, you might do something like
for (x=0;x<maxx;x++)
for (y=0;y<maxy;y++)
do_something(a[x,y]);
You might find the processor cache likes it better if you do:
for (y=0;y<maxy;y++)
for (x=0;x<maxx;x++)
do_something(a[x,y]);
or vice versa.

Don't do loop unrolling. Don't do Duff's device. Make your loops as small as possible, anything else inhibits x86 performance and gcc optimizer performance.
Getting rid of branches can be useful, though - so getting rid of loops completely is good, and those branchless math tricks really do work. Beyond that, try never to go out of the L2 cache - this means a lot of precalculation/caching should also be avoided if it wastes cache space.
And, especially for x86, try to keep the number of variables in use at any one time down. It's hard to tell what compilers will do with that kind of thing, but usually having less loop iteration variables/array indexes will end up with better asm output.
Of course, this is for desktop CPUs; a slow CPU with fast memory access can precalculate a lot more, but in these days that might be an embedded system with little total memory anyway…

I've found that changing from a pointer to indexed access may make a difference; the compiler has different instruction forms and register usages to choose from. Vice versa, too. This is extremely low-level and compiler dependent, though, and only good when you need that last few percent.
E.g.
for (i = 0; i < n; ++i)
*p++ = ...; // some complicated expression
vs.
for (i = 0; i < n; ++i)
p[i] = ...; // some complicated expression

Optimizing cache locality - for example when multiplying two matrices that don't fit into cache.

Allocating with new on a pre-allocated buffer using C++'s placement new.

Counting down a loop. It's cheaper to compare against 0 than N:
for (i = N; --i >= 0; ) ...
Shifting and masking by powers of two is cheaper than division and remainder, / and %
#define WORD_LOG 5
#define SIZE (1 << WORD_LOG)
#define MASK (SIZE - 1)
uint32_t bits[K]
void set_bit(unsigned i)
{
bits[i >> WORD_LOG] |= (1 << (i & MASK))
}
Edit
(i >> WORD_LOG) == (i / SIZE) and
(i & MASK) == (i % SIZE)
because SIZE is 32 or 2^5.

Jon Bentley's Writing Efficient Programs is a great source of low- and high-level techniques -- if you can find a copy.

Eliminating branches (if/elses) by using boolean math:
if(x == 0)
x = 5;
// becomes:
x += (x == 0) * 5;
// if '5' was a base 2 number, let's say 4:
x += (x == 0) << 2;
// divide by 2 if flag is set
sum >>= (blendMode == BLEND);
This REALLY speeds things out especially when those ifs are in a loop or somewhere that is being called a lot.

The one from Assembler:
xor ax, ax
instead of:
mov ax, 0
Classical optimization for program size and performance.

In SQL, if you only need to know whether any data exists or not, don't bother with COUNT(*):
SELECT 1 FROM table WHERE some_primary_key = some_value
If your WHERE clause is likely return multiple rows, add a LIMIT 1 too.
(Remember that databases can't see what your code's doing with their results, so they can't optimise these things away on their own!)

Recycling the frame-pointer all of a sudden
Pascal calling-convention
Rewrite stack-frame tail call optimizarion (although it sometimes messes with the above)
Using vfork() instead of fork() before exec()
And one I am still looking for, an excuse to use: data driven code-generation at runtime

Liberal use of __restrict to eliminate load-hit-store stalls.

Rolling up loops.
Seriously, the last time I needed to do anything like this was in a function that took 80% of the runtime, so it was worth trying to micro-optimize if I could get a noticeable performance increase.
The first thing I did was to roll up the loop. This gave me a very significant speed increase. I believe this was a matter of cache locality.
The next thing I did was add a layer of indirection, and put some more logic into the loop, which allowed me to only loop through the things I needed. This wasn't as much of a speed increase, but it was worth doing.
If you're going to micro-optimize, you need to have a reasonable idea of two things: the architecture you're actually using (which is vastly different from the systems I grew up with, at least for micro-optimization purposes), and what the compiler will do for you.
A lot of the traditional micro-optimizations trade space for time. Nowadays, using more space increases the chances of a cache miss, and there goes your performance. Moreover, a lot of them are now done by modern compilers, and typically better than you're likely to do them.
Currently, you should (a) profile to see if you need to micro-optimize, and then (b) try to trade computation for space, in the hope of keeping as much as possible in cache. Finally, run some tests, so you know if you've improved things or screwed them up. Modern compilers and chips are far too complex for you to keep a good mental model, and the only way you'll know if some optimization works or not is to test.

In addition to Joshua's comment about code generation (a big win), and other good suggestions, ...
I'm not sure if you would call it "low-level", but (and this is downvote-bait) 1) stay away from using any more levels of abstraction than absolutely necessary, and 2) stay away from event-driven notification-style programming, if possible.
If a computer executing a program is like a car running a race, a method call is like a detour. That's not necessarily bad except there's a strong temptation to nest those things, because once you're written a method call, you tend to forget what that call could cost you.
If your're relying on events and notifications, it's because you have multiple data structures that need to be kept in agreement. This is costly, and should only be done if you can't avoid it.
In my experience, the biggest performance killers are too much data structure and too much abstraction.

I was amazed at the speedup I got by replacing a for loop adding numbers together in structs:
const unsigned long SIZE = 100000000;
typedef struct {
int a;
int b;
int result;
} addition;
addition *sum;
void start() {
unsigned int byte_count = SIZE * sizeof(addition);
sum = malloc(byte_count);
unsigned int i = 0;
if (i < SIZE) {
do {
sum[i].a = i;
sum[i].b = i;
i++;
} while (i < SIZE);
}
}
void test_func() {
unsigned int i = 0;
if (i < SIZE) { // this is about 30% faster than the more obvious for loop, even with O3
do {
addition *s1 = &sum[i];
s1->result = s1->b + s1->a;
i++;
} while ( i<SIZE );
}
}
void finish() {
free(sum);
}
Why doesn't gcc optimise for loops into this? Or is there something I missed? Some cache effect?

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas