Writing API called from Lua - 0 or 1 based? - api

If you were to write an API that is called from Lua (which is 1-based, e.g. table indices start at 1), would you apply the same rule to your API?
For example, say your API had a function called GetFoo(x, y) which returned a Foo at the coordinate (x,y). Would you start your coordinate axes at (0,0) or (1,1) for the API, assuming that in the system itself (say written in C or C++ which are 0-based) these things start at (0,0) (so if you used the Lua convention you would always have to subtract 1 when retrieving numbers for these kinds of operations from the lua stack).

I haven't used Lua, but I would say for a coordinate system specifically (0,0) would be preferred.
For everything else, as long as you state it clearly in the documentation, by all means start indices at 1.

You could also just use the 0 index in your table/arrays. The only inconvenience is the standard libraries use the 1-based convention. So things like table.sort, string operations, etc ... will ignore the table[0] element.

Related

Fortran arrays argument [duplicate]

I'm going through a Fortran code, and one bit has me a little puzzled.
There is a subroutine, say
SUBROUTINE SSUB(X,...)
REAL*8 X(0:N1,1:N2,0:N3-1),...
...
RETURN
END
Which is called in another subroutine by:
CALL SSUB(W(0,1,0,1),...)
where W is a 'working array'. It appears that a specific value from W is passed to the X, however, X is dimensioned as an array. What's going on?
This is non-uncommon idiom for getting the subroutine to work on a (rectangular in N-dimensions) subset of the original array.
All parameters in Fortran (at least before Fortran 90) are passed by reference, so the actual array argument is resolved as a location in memory. Choose a location inside the space allocated for the whole array, and the subroutine manipulates only part of the array.
Biggest issue: you have to be aware of how the array is laid out in memory and how Fortran's array indexing scheme works. Fortran uses column major array ordering which is the opposite convention from c. Consider an array that is 5x5 in size (and index both directions from 0 to make the comparison with c easier). In both languages 0,0 is the first element in memory. In c the next element in memory is [0][1] but in Fortran it is (1,0). This affects which indexes you drop when choosing a subspace: if the original array is A(i,j,k,l), and the subroutine works on a three dimensional subspace (as in your example), in c it works on Aprime[i=constant][j][k][l], but in Fortran in works on Aprime(i,j,k,l=constant).
The other risk is wrap around. The dimensions of the (sub)array in the subroutine have to match those in the calling routine, or strange, strange things will happen (think about it). So if A is declared of size (0:4,0:5,0:6,0:7), and we call with element A(0,1,0,1), the receiving routine is free to start the index of each dimension where ever it likes, but must make the sizes (4,5,6) or else; but that means that the last element in the j direction actually wraps around! The thing to do about this is not use the last element. Making sure that that happens is the programmers job, and is a pain in the butt. Take care. Lots of care.
in fortran variables are passed by address.
So W(0,1,0,1) is value and address. so basically you pass subarray starting at W(0,1,0,1).
This is called "sequence association". In this case, what appears to be a scaler, an element of an array (actual argument in caller) is associated with an array (implicitly the first element), the dummy argument in the subroutine . Thereafter the elements of the arrays are associated by storage order, known as "sequence". This was done in Fortran 77 and earlier for various reasons, here apparently for a workspace array -- perhaps the programmer was doing their own memory management. This is retained in Fortran >=90 for backwards compatibility, but IMO, doesn't belong in new code.

How to access net displacements in pyiron

Using pyiron, I want to calculate the mean square displacement of the ions in my system. How do I see the total displacement (i.e. not folded back by periodic boundary conditions) without dumping very frequently and checking when an atom passes over the boundary and gets wrapped?
Try to compare job['output/generic/unwrapped_positions'][-1] and job.structure.positions+job.output.total_displacements[-1]. If they deliver the same values, it's definitely fine both ways. If not, you can post the relevant lines in your notebook here.
I'd like to add a few comments to Jan's answer:
While job['output/generic/unwrapped_positions'] returns the unwrapped positions parsed from the output files, job.output.total_displacements returns the displacement of atoms calculated from each pair of consecutive snapshots. So if an atom moves more than half the box length in any direction, job.output.total_displacements will give wrong coordinates. Therefore, job['output/generic/unwrapped_positions'] is generally more trustworthy, but it is not available in all the codes (since some codes simply do not provide an output for unwrapped positions).
Moreover, if an interactive job is used, it is possible that job.structure.positions does not return the initial positions, i.e. job.structure.positions+job.output.total_displacements won't be initial positions + displacements.
So, in short, my answer to your question would be rather "Use job['output/generic/unwrapped_positions'] and if it's not available, use job.structure.positions+job.output.total_displacements but be aware of potential problems you might be running into."

What is option–operand separation?

I recently read that option-operand separation is a principle that was introduced in the Eiffel language (I've never used Eiffel).
From the Wikipedia article:
[Option–operand separation] states that an operation's arguments should contain only operands — understood as information necessary to its operation — and not options — understood as auxiliary information. Options are supposed to be set in separate operations.
Does this mean that a function should only contain "essential" arguments that are part of its functionality, and that there shouldn't be any arguments that change the functionality (which instead should be a separate function)?
Could someone explain it simply, preferably with pseudocode example(s)?
Yes, this is the idea: arguments should not be used to select particular behavior. Different methods (features in Eiffel terms) should be used instead.
Example. Suppose, there is a method that moves a 2-D figure to a given position. The position could be specified using either polar or Cartesian coordinates:
move (coordinate_1, coordinate_2: REAL_64; is_polar: BOOLEAN)
-- Move the figure to the position (coordinate_1, coordinate_2)
-- using polar system if is_polar is True, and Cartesian system otherwise.
According to the principle, it's better to define two functions:
cartesian_move (x, y: REAL_64)
-- Move the figure to the position with Cartesian coordinates (x, y).
polar_move (rho, phi: REAL_64)
-- Move the figure to the position with polar coordinates (rho, phi).
Although the principle seems to be universally applicable, some object-oriented languages does not provide sufficient means for that in certain cases. The obvious example are constructors that in many languages have the same name, so using options becomes the only choice (a workaround would be to use object factories in these cases).

A general-purpose warp-level std::copy-like function - what should it account for?

A C++ standard library implements std::copy with the following code (ignoring all sorts of wrappers and concept checks etc) with the simple loop:
for (; __first != __last; ++__result, ++__first)
*__result = *__first;
Now, suppose I want a general-purpose std::copy-like function for warps (not blocks; not grids) to use for collaboratively copying data from one place to another. Let's even assume for simplicity that the function takes pointers rather than an arbitrary iterator.
Of course, writing general-purpose code in CUDA is often a useless pursuit - since we might be sacrificing a lot of the benefit of using a GPU in the first place in favor of generality - so I'll allow myself some boolean/enum template parameters to possibly select between frequently-occurring cases, avoiding runtime checks. So the signature might be, say:
template <typename T, bool SomeOption, my_enum_t AnotherOption>
T* copy(
T* __restrict__ destination,
const T* __restrict__ source,
size_t length
);
but for each of these cases I'm aiming for optimal performance (or optimal expected performance given that we don't know what other warps are doing).
Which factors should I take into consideration when writing such a function? Or in other words: Which cases should I distinguish between in implementing this function?
Notes:
This should target Compute Capabilities 3.0 or better (i.e. Kepler or newer micro-architectures)
I don't want to make a Runtime API memcpy() call. At least, I don't think I do.
Factors I believe should be taken into consideration:
Coalescing memory writes - ensuring that consecutive lanes in a warp write to consecutive memory locations (no gaps).
Type size vs Memory transaction size I - if sizeof(T) is sizeof(T) is 1 or 2, and we have have each lane write a single element, the entire warp would write less than 128B, wasting some of the memory transaction. Instead, we should have each thread place 2 or 4 input elements in a register, and write that
Type size vs Memory transaction size II - For type sizes such that lcm(4, sizeof(T)) > 4, it's not quite clear what to do. How well does the compiler/the GPU handle writes when each lane writes more than 4 bytes? I wonder.
Slack due to the reading of multiple elements at a time - If each thread wishes to read 2 or 4 elements for each write, and write 4-byte integers - we might have 1 or 2 elements at the beginning and the end of the input which must be handled separately.
Slack due to input address mis-alignment - The input is read in 32B transactions (under reasonable assumptions); we thus have to handle the first elements up to the multiple of 32B, and the last elements (after the last such multiple,) differently.
Slack due to output address mis-alignment - The output is written in transactions of upto 128B (or is it just 32B?); we thus have to handle the first elements up to the multiple of this number, and the last elements (after the last such multiple,) differently.
Whether or not T is trivially-copy-constructible. But let's assume that it is.
But it could be that I'm missing some considerations, or that some of the above are redundant.
Factors I've been wondering about:
The block size (i.e. how many other warps are there)
The compute capability (given that it's at least 3)
Whether the source/target is in shared memory / constant memory
Choice of caching mode

Custom EQ AudioUnit on iOS

The only effect AudioUnit on iOS is the "iTunes EQ", which only lets you use EQ pre-sets. I would like to use a customized eq in my audio graph
I came across this question on the subject and saw an answer suggesting using this DSP code in the render callback. This looks promising and people seem to be using this effectively on various platforms. However, my implementation has a ton of noise even with a flat eq.
Here's my 20 line integration into the "MixerHostAudio" class of Apple's "MixerHost" example application (all in one commit):
https://github.com/tassock/mixerhost/commit/4b8b87028bfffe352ed67609f747858059a3e89b
Any ideas on how I could get this working? Any other strategies for integrating an EQ?
Edit: Here's an example of the distortion I'm experiencing (with the eq flat):
http://www.youtube.com/watch?v=W_6JaNUvUjA
In the code in EQ3Band.c, the filter coefficients are used without being initialized. The init_3band_state method initialize just the gains and frequencies, but the coefficients themselves - es->f1p0 etc. are not initialized, and therefore contain some garbage values. That might be the reason for the bad output.
This code seems wrong in more then one way.
A digital filter is normally represented by the filter coefficients, which are constant, the filter inner state history (since in most cases the output depends on history) and the filter topology, which is the arithmetic used to calculate the output given the input and the filter (coeffs + state history). In most cases, and of course when filtering audio data, you expect to get 0's at the output if you feed 0's to the input.
The problems in the code you linked to:
The filter coefficients are changed in each call to the processing method:
es->f1p0 += (es->lf * (sample - es->f1p0)) + vsa;
The input sample is usually multiplied by the filter coefficients, not added to them. It doesn't make any physical sense - the sample and the filter coeffs don't even have the same physical units.
If you feed in 0's, you do not get 0's at the output, just some values which do not make any sense.
I suggest you look for another code - the other option is debugging it, and it would be harder.
In addition, you'd benefit from reading about digital filters:
http://en.wikipedia.org/wiki/Digital_filter
https://ccrma.stanford.edu/~jos/filters/