Multiplying int by 65536 in shader does not compile on older ES 2 device - opengl-es-2.0

When I execute
int temp2 = temp1 * 65536;
in my vertex shader on my old Xperia J smartphone (about 8 or so years old), I get a shader compilation error; unfortunately the API does not report the reason for the error.
But when I run the same code on my modern smartphone, I get no compilation error.
A workaround on the old phone is to use
int temp2 = temp1 * int(65536.0);
instead.
I am using precision highp float; and I have tried precision highp int; but that didn't solve the problem.
Any info on why this is the case? Maybe it's just a bug in earlier GLSL implementations?
Another workaround I have thought about but not yet tried is just uploading 65536 as an integer uniform.
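For reference, a minimal sketch of that uniform workaround using the GL ES 2 C API; the uniform name u_Fixed and the program handle are made up for this example. Worth noting: GLSL ES 1.00 only guarantees highp int a range of (-2^16, 2^16), so the literal 65536 sits right at the edge of what a minimal implementation must represent, which may be related to the error.
#include <GLES2/gl2.h>

/* Shader side (GLSL ES):  uniform int u_Fixed;  ...  int temp2 = temp1 * u_Fixed; */
void upload_fixed_scale(GLuint prog)
{
    glUseProgram(prog);                                /* glUniform* applies to the current program */
    GLint loc = glGetUniformLocation(prog, "u_Fixed"); /* -1 if the uniform was optimised away */
    if (loc != -1)
        glUniform1i(loc, 65536);
}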

I thought I'd post my findings from the comments as an answer, as I found another "bug" in the older GLSL compilers.
First, for my OP... it seems that the older GLSL compiler can read literal values incorrectly, which I fixed by sending them up as uniforms.
The second weird behaviour I've found concerns the following code, which is from a vertex shader dealing with sprite data sent in as a texture. All the data is read correctly except the v_Color information. But if you move the line marked (*) to the very end of the code shown, the colour info arrives correctly. It also arrives correctly if you move the v_Color line higher up and change the offset orders accordingly. I guess this must be an old GLSL shader compiler bug too.
spriteId = int(a_vertexId) / 6;            // 6 vertices per sprite
offset = float(spriteId) * 6.0;            // 6 data texels per sprite
vertexId = int(a_vertexId) - spriteId * 6; // (*)
tx = getFloat(texture2D(u_texSpriteData, uv(offset, u_SpriteDataSize)));
ty = getFloat(texture2D(u_texSpriteData, uv(offset + 1.0, u_SpriteDataSize)));
angle = getFloat(texture2D(u_texSpriteData, uv(offset + 2.0, u_SpriteDataSize)));
scale = getFloat(texture2D(u_texSpriteData, uv(offset + 3.0, u_SpriteDataSize)));
texId = getFloat(texture2D(u_texSpriteData, uv(offset + 4.0, u_SpriteDataSize)));
v_Color = texture2D(u_texSpriteData, uv(offset + 5.0, u_SpriteDataSize));

Related

How to demonstrate a memory misalignment error in C on a macbook pro (Intel 64 bit processor)

In an attempt to understand C memory alignment or whatever the term is (data structure alignment?), I'm trying to write code that results in an alignment error. The original reason that brought me to learning about this is that I'm writing data-parsing code that reads binary data received over the network. The data contains some uint32s, uint64s, floats, and doubles, and I'd like to make sure they are never corrupted due to errors in my parsing code.
An unsuccessful attempt at causing some problem due to misalignment:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    uint32_t integer = 1027;
    uint8_t *pointer = (uint8_t *)&integer;
    uint8_t *bytes = malloc(5);
    bytes[0] = 23; // extra byte to misalign the uint32_t data
    bytes[1] = pointer[0];
    bytes[2] = pointer[1];
    bytes[3] = pointer[2];
    bytes[4] = pointer[3];
    uint32_t integer2 = *(uint32_t *)(bytes + 1); // unaligned read
    printf("integer: %u\ninteger2: %u\n", integer, integer2);
    free(bytes);
    return 0;
}
On my machine both integers print out the same. (MacBook Pro with an Intel 64-bit processor; not sure what exactly determines alignment behaviour: is it the architecture? The exact CPU model? Or maybe the compiler? I use Xcode, so clang.)
I guess my processor/machine/setup supports unaligned reads, so it takes the above code without any problems.
What would be a case where parsing of, say, a uint32_t would fail because of code not taking alignment into account? Is there a way to make it fail on a modern Intel 64-bit system? Or am I safe from alignment errors when using simple datatypes like integers and floats (no structs)?
Edit: If anyone's reading this later, I found a similar question with interesting info: Mis-aligned pointers on x86
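Incidentally, the usual alignment-safe way to parse such fields is memcpy, which has no alignment requirement; a minimal sketch (on targets that allow unaligned loads, compilers typically lower this to a single move, so it costs nothing):
#include <stdint.h>
#include <string.h>

// Alignment-safe read of a uint32_t from any byte offset (host byte order).
static uint32_t read_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v); // no alignment requirement, unlike *(uint32_t *)p
    return v;
}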
Normally, the x86 architecture doesn't have alignment requirements (except for some SIMD instructions like movdqa).
However, since you're trying to write code to cause such an exception ...
There is an alignment check exception bit that can be set in the x86 flags register. If you turn it on, an unaligned access will generate an exception which will show up [under Linux at least] as a bus error (i.e. SIGBUS).
See my answer here: any way to stop unaligned access from c++ standard library on x86_64? for details and some sample programs to generate an exception.
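To save a click, here is a small self-contained sketch along those lines; it assumes gcc or clang on x86-64 Linux. One caveat: once AC is set, library code that deliberately does unaligned accesses (glibc's memcpy, parts of printf) can trap too, so keep the window between setting the flag and the test access small.
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    printf("about to set AC and do a misaligned read...\n");

    // Set AC (bit 18) in RFLAGS; Linux sets CR0.AM, so #AC faults
    // are delivered to user space as SIGBUS.
    __asm__ volatile (
        "pushfq\n\t"
        "orq $0x40000, (%%rsp)\n\t"
        "popfq" ::: "cc", "memory");

    static uint32_t aligned[2] = { 0x11223344u, 0x55667788u };
    volatile uint32_t *p = (volatile uint32_t *)((uint8_t *)aligned + 1); // guaranteed misaligned

    uint32_t v = *p; // with AC set this should die with SIGBUS
    printf("read 0x%08x (AC apparently not honoured here)\n", v);
    return 0;
}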

Measuring Program Execution Time with Cycle Counters

I'm confused by this particular line:
result = (double) hi * (1 << 30) * 4 + lo;
of the following code:
void access_counter(unsigned *hi, unsigned *lo)
// Set *hi and *lo to the high and low order bits of the cycle counter.
{
    asm("rdtscp; movl %%edx,%0; movl %%eax,%1" // Read cycle counter
        : "=r" (*hi), "=r" (*lo)               // and move results to
        : /* No input */                       // the two outputs
        : "%edx", "%eax");
}
double get_counter()
// Return the number of cycles since the last call to start_counter.
{
    unsigned ncyc_hi, ncyc_lo;
    unsigned hi, lo, borrow;
    double result;

    /* Get cycle counter */
    access_counter(&ncyc_hi, &ncyc_lo);
    lo = ncyc_lo - cyc_lo;
    borrow = lo > ncyc_lo;
    hi = ncyc_hi - cyc_hi - borrow;
    result = (double) hi * (1 << 30) * 4 + lo;
    if (result < 0) {
        fprintf(stderr, "Error: counter returns neg value: %.0f\n", result);
    }
    return result;
}
The thing I cannot understand is why hi is being multiplied by 2^30 and then by 4, and then lo added to it. Can someone please explain what is happening in this line of code? I do know what hi and lo contain.
The short answer:
That line turns a 64bit integer that is stored as 2 32bit values into a floating point number.
Why doesn't the code just use a 64bit integer? Well, gcc has supported 64bit numbers for a long time, but presumably this code predates that. In that case, the only way to support numbers that big is to put them into a floating point number.
The long answer:
First, you need to understand how rdtscp works. When this assembler instruction is invoked, it does 2 things:
1) Sets ecx to the IA32_TSC_AUX MSR value. In my experience, this generally just means ecx gets set to zero.
2) Sets edx:eax to the current value of the processor’s time-stamp counter. This means that the lower 32bits of the counter go into eax, and the upper 32bits are in edx.
With that in mind, let's look at the code. When called from get_counter, access_counter is going to put edx in 'ncyc_hi' and eax in 'ncyc_lo.' Then get_counter is going to do:
lo = ncyc_lo - cyc_lo;
borrow = lo > ncyc_lo;
hi = ncyc_hi - cyc_hi - borrow;
What does this do?
Since the time is stored in 2 different 32bit numbers, if we want to find out how much time has elapsed, we need to do a bit of work to find the difference between the old time and the new. When it is done, the result is stored (again, using 2 32bit numbers) in hi / lo.
Which finally brings us to your question.
result = (double) hi * (1 << 30) * 4 + lo;
If we could use 64bit integers, converting 2 32bit values to a single 64bit value would look like this:
unsigned long long result = hi; // put hi into the 64bit number.
result <<= 32; // shift the 32 bits to the upper part of the number
result |= low; // add in the lower 32bits.
If you aren't used to bit shifting, maybe looking at it like this will help. If low = 1 and hi = 2, then expressed as hex numbers:
result = hi; 0x0000000000000002
result <<= 32; 0x0000000200000000
result |= low; 0x0000000200000001
But if we assume the compiler doesn't support 64bit integers, that won't work. While floating point numbers can hold values that big, they don't support shifting. So we need to figure out a way to shift 'hi' left by 32bits, without using left shift.
Ok then, shifting left by 1 is really the same as multiplying by 2. Shifting left by 2 is the same as multiplying by 4. Shifting left by [omitted...] Shifting left by 32 is the same as multiplying by 4,294,967,296.
By an amazing coincidence, 4,294,967,296 == (1 << 30) * 4.
So why write it in that complicated fashion? Well, 4,294,967,296 is a pretty big number. In fact, it's too big to fit in a 32bit integer. Which means if we put it in our source code, a compiler that doesn't support 64bit integers may have trouble figuring out how to process it. Written like this, the compiler can generate whatever floating point instructions it might need to work on that really big number.
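A quick way to convince yourself the two forms agree (a sketch, using a compiler that does have 64bit integers):
#include <stdio.h>

int main(void)
{
    unsigned hi = 2, lo = 1;
    double d = (double) hi * (1 << 30) * 4 + lo;                 // the questioned line
    unsigned long long u = ((unsigned long long) hi << 32) | lo; // the modern equivalent
    printf("%.0f\n%llu\n", d, u);                                // both print 8589934593
    return 0;
}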
Why the current code is wrong:
It looks like variations of this code have been wandering around the internet for a long time. Originally (I assume) access_counter was written using rdtsc instead of rdtscp. I'm not going to try to describe the difference between the two (google them), other than to point out that rdtsc does not set ecx, and rdtscp does. Whoever changed rdtsc to rdtscp apparently didn't know that, and failed to adjust the inline assembler stuff to reflect it. While your code might work fine despite this, it might do something weird instead. To fix it, you could do:
asm("rdtscp; movl %%edx,%0; movl %%eax,%1" // Read cycle counter
: "=r" (*hi), "=r" (*lo) // and move results to
: /* No input */ // the two outputs
: "%edx", "%eax", "%ecx");
While this will work, it isn't optimal. Registers are a valuable and scarce resource on i386. This tiny fragment uses 5 of them. With a slight modification:
asm("rdtscp" // Read cycle counter
: "=d" (*hi), "=a" (*lo)
: /* No input */
: "%ecx");
Now we have 2 fewer assembly statements, and we only use 3 registers.
But even that isn't the best we can do. In the (presumably long) time since this code was written, gcc has added both support for 64bit integers and a function to read the tsc, so you don't need to use asm at all:
unsigned int a;
unsigned long long result;
result = __builtin_ia32_rdtscp(&a);
'a' is the (useless?) value that was being returned in ecx. The function call requires it, but we can just ignore the returned value.
So, instead of doing something like this (which I assume your existing code does):
unsigned cyc_hi, cyc_lo;
access_counter(&cyc_hi, &cyc_lo);
// do something
double elapsed_time = get_counter(); // Find the difference between cyc_hi, cyc_lo and the current time
We can do:
unsigned int a;
unsigned long long before, after;
before = __builtin_ia32_rdtscp(&a);
// do something
after = __builtin_ia32_rdtscp(&a);
unsigned long long elapsed_time = after - before;
This is shorter, doesn't use hard-to-understand assembler, is easier to read and maintain, and produces the best possible code.
But it does require a relatively recent version of gcc.
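Putting it all together, a minimal runnable sketch of that approach (assumes gcc on x86-64; the loop is just a stand-in for "do something"):
#include <stdio.h>

int main(void)
{
    unsigned int aux; // receives IA32_TSC_AUX; we ignore it
    volatile unsigned long long sink = 0;

    unsigned long long before = __builtin_ia32_rdtscp(&aux);
    for (int i = 0; i < 1000000; i++)
        sink += i; // volatile keeps the loop from being optimised away
    unsigned long long after = __builtin_ia32_rdtscp(&aux);

    printf("elapsed: %llu cycles\n", after - before);
    return 0;
}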

iOS cocos2D Chipmunk Body's velocity is invalid

In cocos2d-spritebuilder 3.4 I've created my physics world and my player like in many of the official examples.
self.sprite.physicsBody = [CCPhysicsBody bodyWithCircleOfRadius:10 andCenter:self.anchorPoint];
self.sprite.physicsBody.type = CCPhysicsBodyTypeDynamic;
etc...
So, when I try to move my sprite to the right for example with
[self.sprite.physicsBody applyImpulse: CGPointMake(32,0)];
my app crashes, reporting this:
Aborting due to Chipmunk error: Body's velocity is invalid.
Failed condition: v.x == v.x && v.y == v.y
Source:.../Source/libs/cocos2d-iphone/external/Chipmunk/src/cpBody.c:106
Chipmunk source's have:
-(void)applyImpulse:(CGPoint)impulse {
    _body.velocity = cpvadd(_body.velocity, cpvmult(CCP_TO_CPV(impulse), 1.0f/_body.mass));
}
So I knew mass was set to 1.0f by default, but I wanted to try setting this value by hand:
self.sprite.physicsBody.mass = 1.0f
Same error, same results...
But if I try to set:
self.sprite.physicsBody.body.mass = 1.0f
It works great.
So, what's the difference between these two properties?
Why does applyImpulse only take body.mass?
As far as I know, Chipmunk doesn't play nice when applying a "big" impulse. What you should try to do is build up the velocity over several frames, as sketched below.
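A rough sketch of that idea, using names from the plain Chipmunk 6.x C API that cocos2d wraps (an illustration, not tested code; check the signatures against the Chipmunk version actually bundled with your cocos2d):
#include "chipmunk.h" // header path/version as bundled with cocos2d

// Apply an even slice of the remaining impulse each frame instead of all at once.
static void apply_impulse_gradually(cpBody *body, cpVect total, int frames)
{
    cpVect step = cpvmult(total, 1.0f / (cpFloat)frames);
    cpBodyApplyImpulse(body, step, cpvzero); // Chipmunk 6.x signature: (body, impulse, offset)
    // call again on each subsequent frame with frames - 1 until it reaches 0
}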
Edit: Check this link http://forum.cocos2d-swift.org/t/ccphysics-applyimpulse-not-working/12151/12

Science of Chords

I've been doing research on trying to understand the way sounds and sine waves work, particularly with chords. So far, my understanding is as follows:
b(t) = sin(A pi t) is the base note of the chord at frequency A.
T(t) = sin((5/4) A pi t) is the major third of the base b(t).
D(t) = sin((3/2) A pi t) is the dominant (fifth) of the base b(t).
A(t) = sin(2 A pi t) is the octave.
Each one alone is a separate frequency which is easy for a computer generator to sound. However, the major chord of the note with frequency A is as follows:
Major Chord = b+T+D+A
I was wondering if anyone has a way to make a computer synthesizer play this function so I can hear the result; most programs I have found only take Hz as an input, and while this function has a wavelength, it's different from the simple sine wave with the same wavelength.
Note: will post this in the physics and computer sections as well - just wondering if you musicians know something about this.
It's a bit unclear to me what you're trying to do, so I'm explaining a few things, and giving you a few options to investigate further, depending on what your purpose is.
The de facto means of synthesizing music on computers uses MIDI (Musical Instrument Digital Interface). Because musicians rarely think directly in terms of frequencies or wavelengths (they count steps or half-steps along a scale), MIDI represents each pitch with an integer, that represents a number of half-steps. By default, MIDI assumes these pitches are tuned using a standard called "12-Tone Equal Temperament" (12TET), and, unfortunately, it isn't particularly easy to make it use other tunings.
What does this mean? And why is it a problem? (I'm not sure how much of this you know already, so I apologize if I'm repeating what you know)
In theory what you say about tuning being based on frequency ratios is 100% absolutely correct -- this is a system called Just Tuning. The major third is a 5/4 ratio and the perfect fifth is a 3/2 ratio. However, instruments with fixed pitches (keyboards, fretted instruments, etc...) are rarely tuned that way in practice. Instead, they compromise by rounding each note in a chromatic scale to the nearest 12th of an octave. Since adding an octave is equivalent to multiplying the initial frequency by 2, adding a 12th of an octave is equivalent to multiplying the initial frequency by 2^(1/12). This is how all the half steps on a piano are usually tuned.
So instead of the pure ratios, you would actually have:
sin(A pi t)
sin(2^(4/12) A pi t)
sin(2^(7/12) A pi t)
sin(2^(12/12) A pi t)
Note: Compare 2^(4/12) ~ 1.26, with 5/4 = 1.25. Also compare 2^(7/12) ~ 1.498, with 3/2 = 1.5.
These are exactly the frequencies that any MIDI synthesizer will play, given the MIDI notes numbered n, n+4, n+7, and n+12. So, if you are only looking to play a chord, and don't care about the frequency ratios being pure (just), you can just use MIDI.
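If it helps, here is a small sketch of how those 12TET frequencies fall out of the MIDI note numbers (using the standard convention A4 = MIDI 69 = 440 Hz):
#include <math.h>
#include <stdio.h>

// Equal-tempered frequency of a MIDI note: each half-step multiplies by 2^(1/12).
static double midi_to_hz(int note)
{
    return 440.0 * pow(2.0, (note - 69) / 12.0);
}

int main(void)
{
    int n = 69; // A4, an arbitrary example root
    int chord[] = { n, n + 4, n + 7, n + 12 }; // root, major third, fifth, octave
    for (int i = 0; i < 4; i++)
        printf("MIDI %d -> %.2f Hz\n", chord[i], midi_to_hz(chord[i]));
    return 0; // prints 440.00, 554.37, 659.26, 880.00
}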
However, if you are looking for something that will play the actual just ratios, it will be a bit trickier. You might start with looking at some of the things here: https://music.stackexchange.com/questions/351/software-that-allows-playing-in-different-temperaments
If you just want to see what they sound like, you can check out this youtube video:
https://www.youtube.com/watch?v=6NlI4No3s0M
If you can write software, you might try writing your own, but I don't want to go into how to do that if that's not going to be helpful.
I'm not sure what kinds of programs you're describing that "only take Hz as an input". Is this a software library (an API call?) or something different? There are (obviously) API calls that can send more complex data to the sound card than a single-frequency wave.
EDIT: I've not used it, but it looks like perhaps this software is capable of doing what you want: https://www.youtube.com/watch?v=z0NZQMiDdNU
I think you are going at the problem from the wrong direction. You are using sinusoid signals as the basis of your "chords" and pure intervals.
The output of that is strictly periodic, with a period that is the least common multiple of the individual periods. So basically you have not so much a "chord" as a "tone".
Organs use that technique: you can combine an 8' pipe with a 5⅓' pipe in order to arrive at a tone sounding like it came from some funny 16' pipe. That's not a "chord", that's registration. Classical composition theory does not allow parallel fifths, to avoid that effect: fifths must only occur transitorily, and more often than not moving in a different direction or out of sync with the base note.
"Chords" play with aural ambiguities between tone colors and voices: they excite the same region in your inner ear. However, real "chords" and choral effects also have beats and interference from non-ideal frequency ratios, and the tones they are made of have harmonics of their own, making it possible to discern them as independent entities.
The experienced music listener perceives all that as an independent phenomenon. However, if you start with pure sinusoids or highly frequency-locked, comparatively pure sources like snare-less organ pipes, this becomes hard to do.
So I'm not sure you are doing yourself a favor by looking at sinusoids. It's like trying to understand a painting based on primary color components. It's more of a reproduction toolkit than anything tied to higher-level perception.
A very low-barrier way to play is to use wavepot
The code to do what you ask in your question is
var A = 440;
export function dsp(t) {
  var b = Math.sin(t * Math.PI * 2 * A);
  var T = Math.sin(t * Math.PI * 2 * 5/4 * A);
  var D = Math.sin(t * Math.PI * 2 * 3/2 * A);
  var O = Math.sin(t * Math.PI * 2 * 2 * A);
  return (b + T + D + O) * 1/4;
}
which is at this link.
Note that this might not sound much like a chord to you due to the fact that sine waves have no harmonics. Here is the same example, but using a saw-tooth waveform, which has many harmonics.
var A = 440;
//function f(x){ return Math.sin(x); }
function f(x){ return 2 * ((x/(2*Math.PI) % 1) - .5); } // sawtooth
export function dsp(t) {
  var b = f(t * Math.PI * 2 * A);
  var T = f(t * Math.PI * 2 * 5/4 * A);
  var D = f(t * Math.PI * 2 * 3/2 * A);
  var O = f(t * Math.PI * 2 * 2 * A);
  return (b + T + D + O) * 1/4;
}

ROL / ROR on variable using inline assembly only in Objective-C [duplicate]

This question already has answers here:
ROL / ROR on variable using inline assembly in Objective-C
(2 answers)
Closed 9 years ago.
A few days ago, I asked the question below. Because I was in need of a quick answer, I added:
The code does not need to use inline assembly. However, I haven't found a way to do this using Objective-C / C++ / C instructions.
Today, I would like to learn something. So I ask the question again, looking for an answer using inline assembly.
I would like to perform ROR and ROL operations on variables in an Objective-C program. However, I can't manage it – I am not an assembly expert.
Here is what I have done so far:
uint8_t v1 = ....;
uint8_t v2 = ....; // v2 is either 1, 2, 3, 4 or 5
asm("ROR v1, v2");
the error I get is:
Unknown use of instruction mnemonic with unknown size suffix
How can I fix this?
A rotate is just two shifts - some bits go left, the others right - and once you see this, rotating is easy without assembly. The pattern is recognised by some compilers and compiled using the rotate instructions. See Wikipedia for the code.
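For the uint8_t case in the question, the two-shift pattern looks like this (a sketch; the masking keeps the shift in the 0..7 range, which also covers the OP's values of 1 to 5):
#include <stdint.h>

static inline uint8_t ror8(uint8_t v, unsigned n)
{
    n &= 7; // rotate amount modulo the width
    return (uint8_t)((v >> n) | (v << (8 - n)));
}

static inline uint8_t rol8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)((v << n) | (v >> (8 - n)));
}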
Update: Xcode 4.6.2 (others not tested) on x86-64 compiles the double shift + or to a rotate for 32 & 64 bit operands; for 8 & 16 bit operands the double shift + or is kept. Why? Maybe the compiler understands something about the performance of these instructions, maybe they just didn't optimise - but in general, if you can avoid assembler, do so; the compiler invariably knows best! Also, using static inline on the functions, or using macros defined in the same way as the standard macro MAX (a macro has the advantage of adapting to the type of its operands), can be used to inline the operations.
Addendum after OP comment
Here is the x86-64 assembler as an example; for full details of how to use the asm construct, start here.
First the non-assembler version:
static inline uint32 rotl32_i64(uint32 value, unsigned shift)
{
    // assume shift is in range 0..31 or the subtraction would be wrong;
    // however we know the compiler will spot the pattern and replace
    // the expression with a single roll and there will be no subtraction,
    // so if the compiler changes this may break without:
    // shift &= 0x1f;
    return (value << shift) | (value >> (32 - shift));
}
void test_rotl32(uint32 value, unsigned shift)
{
    uint32 shifted = rotl32_i64(value, shift);
    NSLog(@"%8x <<< %u -> %8x", value & 0xFFFFFFFF, shift, shifted & 0xFFFFFFFF);
}
If you look at the assembler output for profiling (so the optimiser kicks in) in Xcode (Product > Generate Output > Assembly File, then select Profiling in the pop-up menu at the bottom of the window) you will see that rotl32_i64 is inlined into test_rotl32 and compiles down to a rotate (roll) instruction.
Now producing the assembler directly yourself is a bit more involved than for the ARM code FrankH showed. This is because, to take a variable shift value, a specific register, cl, must be used, so we need to give the compiler enough information to do that. Here goes:
static inline uint32 rotl32_i64_asm(uint32 value, unsigned shift)
{
    // i64 - shift must be in register cl, so create a register local assigned to cl;
    // no need to mask as i64 will do that
    register uint8 cl asm ("cl") = shift;
    uint32 shifted;
    // emit the rotate left long
    // %n values are replaced by args:
    //   0: "=r" (shifted) - any register (r), result (=), store in var (shifted)
    //   1: "0" (value)    - *same* register as %0 (0), load from var (value)
    //   2: "r" (cl)       - any register (r), load from var (cl - which is the cl register, so this one is used)
    __asm__ ("roll %2,%0" : "=r" (shifted) : "0" (value), "r" (cl));
    return shifted;
}
Change test_rotl32 to call rotl32_i64_asm and check the assembly output again - it should be the same, i.e. the compiler did as well as we did.
Further note that if the commented-out masking line in rotl32_i64 is included, it essentially becomes rotl32 - the compiler will do the right thing for any architecture, all for the cost of a single and instruction in the i64 version.
So asm is there if you need it; using it can be somewhat involved, and the compiler will invariably do as well or better by itself...
HTH
The 32bit rotate in ARM would be:
__asm__("MOV %0, %1, ROR %2\n" : "=r"(out) : "r"(in), "M"(N));
where N is required to be a compile-time constant.
But the output of the barrel shifter, whether used on a register or an immediate operand, is always full register width; you can shift a constant 8-bit quantity to any position within a 32bit word, or - as here - shift/rotate the value in a 32bit register any which way.
But you cannot rotate 16bit or 8bit values within a register using a single ARM instruction. None such exists.
That's why the compiler, on ARM targets, when you use the "normal" (portable [Objective-]C/C++) code (in << xx) | (in >> (w - xx)), will give you one assembler instruction for a 32bit rotate, but at least two (a normal shift followed by a shifted or) for 8/16bit ones.