Using bc from C code - yacc

I am writing an expression evaluator using lex and yacc which supports the following operations:
/ , * , + , - , pow(a,b) , sqrt(a) , log(a)
There can also be brackets in the expression.
The input expression is in the file "calculator.input".
I have to compare the running time of my code against bc, and I am facing the following problems:
1) bc doesn't accept pow(a,b) and log(a); it accepts a^b and l(a) instead. How do I change that?
2) How do I use bc from the main function in the yacc program? Or can that not be done?

I think it would be easier to change your code than to change bc, but if you want to try, you can find pointers to bc's source bundles on the GNU project page and in the FreeBSD source mirror. Of course, the end result would not strictly speaking be bc any more, so I don't know if it would still count, for the purposes of your assignment.
I don't know what the specifications are for the pow function you are supposed to implement, but note that bc's ^ operator only allows integer exponents, so it might not work on all your test cases (unless, of course, all your test cases have integer exponents.) You could compute a^b with e(l(a)*b), but it won't be as accurate for integer exponents:
e(l(10)*100)
99999999999999999920085453156357924020916787698393558126052191252537\
96016108317256511712576426623511.11829711443225035170
10^100
10000000000000000000000000000000000000000000000000000000000000000000\
000000000000000000000000000000000
You might want to consult with your tutor, professor, or teaching assistant.
If you don't want to (or are not allowed to) generate the bc equivalent test cases by hand, you might be able to automate the process with sed (if the exponential sub-expressions are not complicated), or by adapting your calculator to output the expression in bc's syntax. The latter would be a fairly easy project, and you'd probably learn something by implementing it.
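For the sed route, something along these lines might do for simple inputs (a sketch only: it assumes the arguments of pow and log contain no nested parentheses or commas, which is exactly the complication mentioned above; sqrt can stay as it is, since bc has it built in):
sed -E -e 's/pow\(([^,()]+),([^()]+)\)/((\1)^(\2))/g' -e 's/log\(/l(/g' calculator.input | bc -l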
If you are using a Unix-like system, you can easily run any command-line utility from a C program. (Indeed, you can do that on non-Unix-like systems, too, but the library functions will differ.) If you don't need to pass data to bc through its stdin, you can use the popen(3) library function, which is certainly the easiest solution.
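For example, here is a minimal sketch of the popen(3) route, reading the expressions from the "calculator.input" file mentioned in the question (error handling kept to a minimum):
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char line[256];
    /* popen runs the command through /bin/sh, so redirections work.
       -l loads bc's math library; with stdin redirected from /dev/null,
       bc exits as soon as it has processed the file. */
    FILE *bc = popen("bc -l calculator.input </dev/null", "r");
    if (bc == NULL) {
        perror("popen");
        return EXIT_FAILURE;
    }
    while (fgets(line, sizeof line, bc) != NULL)
        fputs(line, stdout);   /* bc prints one result per line */
    if (pclose(bc) == -1)
        perror("pclose");
    return EXIT_SUCCESS;
}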
Otherwise, you will have to set up a pair of pipe(2)s (one for writing to bc's stdin and the other for reading from its stdout), fork(2) a child process, and use one of the exec* function calls, probably execlp(3) or execvp(3), to run bc in the child. (Watch out for pipe deadlock while you are writing to and reading from the child.) Once the child process finishes (which you'll notice because you'll get an EOF on the pipe you're using to read from its stdout), you should use wait(3) or waitpid(3) to get its status code.
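A stripped-down sketch of that setup, for a single expression (writing everything and closing our end before reading sidesteps the deadlock for small amounts of data; bulk data would need select/poll or non-blocking I/O):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int to_bc[2], from_bc[2];
    if (pipe(to_bc) == -1 || pipe(from_bc) == -1) {
        perror("pipe");
        return EXIT_FAILURE;
    }
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {                       /* child: become bc */
        dup2(to_bc[0], STDIN_FILENO);
        dup2(from_bc[1], STDOUT_FILENO);
        close(to_bc[0]); close(to_bc[1]);
        close(from_bc[0]); close(from_bc[1]);
        execlp("bc", "bc", "-l", (char *)NULL);
        perror("execlp");                 /* reached only if exec failed */
        _exit(127);
    }
    close(to_bc[0]);                      /* parent keeps the other ends */
    close(from_bc[1]);

    const char *expr = "e(l(10)*100)\n";
    if (write(to_bc[1], expr, strlen(expr)) == -1)
        perror("write");
    close(to_bc[1]);                      /* EOF on bc's stdin makes it exit */

    char buf[512];
    ssize_t n;
    while ((n = read(from_bc[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    close(from_bc[0]);

    int status;
    waitpid(pid, &status, 0);             /* reap the child, fetch its status */
    return WIFEXITED(status) ? WEXITSTATUS(status) : EXIT_FAILURE;
}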
If all that seems too complicated, you could use the much simpler solution of running both your program and bc from your shell. (You can use the time built-in of Unix-like shells to get a measure of execution time, although it will not have microsecond resolution, which might be necessary for such a simple program.)

Is it possible to get the native CPU size of an integer in Rust?

For fun, I'm writing a bignum library in Rust. My goal (as with most bignum libraries) is to make it as efficient as I can. I'd like it to be efficient even on unusual architectures.
It seems intuitive to me that a CPU will perform arithmetic faster on integers with the native number of bits for the architecture (i.e., u64 for 64-bit machines, u16 for 16-bit machines, etc.) As such, since I want to create a library that is efficient on all architectures, I need to take the target architecture's native integer size into account. The obvious way to do this would be to use the cfg attribute target_pointer_width. For instance, to define the smallest type which will always be able to hold more than the maximum native int size:
#[cfg(target_pointer_width = "16")]
type LargeInt = u32;
#[cfg(target_pointer_width = "32")]
type LargeInt = u64;
#[cfg(target_pointer_width = "64")]
type LargeInt = u128;
However, while looking into this, I came across this comment. It gives an example of an architecture where the native int size is different from the pointer width. Thus, my solution will not work for all architectures. Another potential solution would be to write a build script which codegens a small module which defines LargeInt based on the size of a usize (which we can acquire like so: std::mem::size_of::<usize>().) However, this has the same problem as above, since usize is based on the pointer width as well. A final obvious solution is to simply keep a map of native int sizes for each architecture. However, this solution is inelegant and doesn't scale well, so I'd like to avoid it.
So, my questions: is there a way to find the target's native int size, preferably before compilation, in order to reduce runtime overhead? Is this effort even worth it? That is, is there likely to be a significant difference between using the native int size as opposed to the pointer width?
It's generally hard (or impossible) to get compilers to emit optimal code for BigNum stuff; that's why https://gmplib.org/ has its low-level primitive functions (mpn_... docs) hand-written in assembly for various target architectures, with tuning for different micro-architectures, e.g. https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/core2/mul_basecase.asm for the general case of multi-limb * multi-limb numbers, and https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/coreisbr/aors_n.asm for mpn_add_n and mpn_sub_n (Add OR Sub = aors), tuned for the Sandy Bridge family, which doesn't have partial-flag stalls and so can loop with dec/jnz.
Understanding what kind of asm is optimal can be helpful when writing code in a higher-level language. In practice, though, you can't even get close to that, so it sometimes makes sense to use a different technique, like only using values up to 2^30 in 32-bit integers (as CPython does internally, getting the carry-out via a right shift; see the section about Python in this). In Rust you do have access to overflowing_add to get the carry-out, but using it is still hard.
For practical use, writing Rust bindings for GMP is probably your best bet, unless that already exists.
Using the largest chunks possible is very good; on all current CPUs, add reg64, reg64 has the same throughput and latency as add reg32, reg32 or an 8-bit add, so you get twice as much work done per instruction as with 32-bit limbs, and carry propagates through 64 bits of result in 1 cycle of latency.
(There are alternate ways to store BigInteger data that can make SIMD useful; @Mysticial explains in Can long integer routines benefit from SSE?, e.g. 30 value bits per 32-bit int, allowing you to defer normalization until after a few addition steps. But every use of such numbers has to be aware of these issues, so it's not an easy drop-in replacement.)
In Rust, you probably want to just use u64 regardless of the target, unless you really care about small-number (single-limb) performance on 32-bit targets. Let the compiler build u64 operations for you out of add / adc (add with carry).
The only thing that might need to be ISA-specific is if u128 is not available on some targets. You want to use 64 * 64 => 128-bit full multiply as your building block for multiplication; if the compiler can do that for you with u128 then that's great, especially if it inlines efficiently.
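As a sketch of that building block (the function name is mine, not any crate's API): widening to u128 lets a 64-bit target do this with a single full multiply instead of four partial ones:
// Multiply two 64-bit limbs into a 128-bit product, returned as (low, high).
fn mul_wide(x: u64, y: u64) -> (u64, u64) {
    let wide = (x as u128) * (y as u128);
    (wide as u64, (wide >> 64) as u64)
}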
See also discussion in comments under the question.
One stumbling block for getting compilers to emit efficient BigInt addition loops (even inside the body of one unrolled loop) is writing an add that takes a carry input and produces a carry output. Note that x += 0xff..ff + carry=1 needs to produce a carry out even though 0xff..ff + 1 wraps to zero. So in C or Rust, x += y + carry has to check for carry out in both the y+carry and the x+= parts.
It's really hard (probably impossible) to convince compiler back-ends like LLVM to emit a chain of adc instructions. An add/adc pair is doable when you don't need the carry-out from the adc, or perhaps when the compiler is doing it for you for u128::overflowing_add.
Often compilers will materialize the carry flag as a 0 / 1 in a register instead of using adc. You can hopefully avoid that for at least pairs of u64 by combining the input u64 values into u128s for u128::overflowing_add. That will hopefully not cost any asm instructions, because a u128 already has to be stored across two separate 64-bit registers, just like two separate u64 values.
So combining up to u128 could just be a local optimization for a function that adds arrays of u64 elements, to get the compiler to suck less.
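As a sketch of that local trick (the function name is mine): a full adder for one pair of u64 limbs, widened to u128 so the compiler sees the whole 65-bit sum at once:
fn add_with_carry(x: u64, y: u64, carry: bool) -> (u64, bool) {
    // x + y + carry fits in 65 bits, so the high 64 bits of the
    // u128 sum are exactly the carry-out (0 or 1).
    let wide = x as u128 + y as u128 + carry as u128;
    (wide as u64, (wide >> 64) != 0)
}

fn main() {
    // The tricky case from above: 0xff..ff + 0 + carry=1 wraps to
    // zero and must still produce a carry-out.
    assert_eq!(add_with_carry(u64::MAX, 0, true), (0, true));
    assert_eq!(add_with_carry(1, 2, false), (3, false));
}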
In my library ibig what I do is:
Select architecture-specific size based on target_arch.
If I don't have a value for an architecture, select 16, 32 or 64 based on target_pointer_width.
If target_pointer_width is not one of these values, use 64.
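For reference, a minimal sketch of what such a cascade can look like with cfg attributes (illustrative only, not ibig's actual source; x86_64 stands in for "an architecture I have a value for"):
#[cfg(target_arch = "x86_64")]
type Word = u64;
#[cfg(all(not(target_arch = "x86_64"), target_pointer_width = "16"))]
type Word = u16;
#[cfg(all(not(target_arch = "x86_64"), target_pointer_width = "32"))]
type Word = u32;
// Fallback: 64-bit pointer width, or anything unexpected.
#[cfg(all(not(target_arch = "x86_64"),
          not(any(target_pointer_width = "16", target_pointer_width = "32"))))]
type Word = u64;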

Lua Spaghetti Modules

I am currently developing my own programming language. The codebase (in Lua) is composed of several modules, as follows:
The first, error.lua, has no dependencies;
lexer.lua depends only on error.lua;
prototypes.lua also has no dependencies;
parser.lua, instead, depends on all the modules above;
interpreter.lua is the fulcrum of the whole codebase. It depends on error.lua, parser.lua, and memory.lua;
memory.lua depends on functions.lua;
finally, functions.lua depends on memory.lua and interpreter.lua. It is required from inside memory.lua, so we can say that memory.lua also depends on interpreter.lua.
With "A depends on B" I mean that the functions declared in A need those declared in B.
The real problem, though, is when A depends on B which depends on A, which, as you can understand from the list above, happens quite frequently in my code.
To give a concrete example of my problem, here's what interpreter.lua looks like:
--first, I require the modules that DON'T depend on interpreter.lua
local parser, Error = table.unpack(require("parser"))
--(since error.lua is needed in the lexer, parser and interpreter modules alike,
--I only actually require it once, in lexer.lua, and then pass its result around)

--Then, I should require memory.lua. But since memory.lua and
--functions.lua need some functions from interpreter.lua to work, I just
--forward declare the variables those functions need, and then the functions themselves:

--forward declaration
local globals, new_memory, my_nil, interpret_statement

--functions I need to declare before requiring memory.lua
local function interpret_block()
  --uses interpret_statement and new_memory
end

local function interpret_expression()
  --uses new_memory, Error and my_nil
end

--Now I can safely require memory.lua:
globals, new_memory, my_nil = require("memory")(interpret_block, interpret_expression)
--(I'll explain later why it returns a function to call)

--Then I have to fulfill the forward declaration of interpret_statement:
function interpret_statement()
  --uses interpret_expression, new_memory and Error
end

--finally, the result is a function
return function()
  --uses parser, new_function and globals
end
The memory.lua module returns a function so that it can receive interpret_block and interpret_expression as arguments, like this:
--memory.lua
return function(interpret_block, interpret_expression)
  --declaration of globals, new_memory, my_nil
  return globals, new_memory, my_nil
end
Now, I got the idea of the forward declarations here and that of the functions-as-modules (like in memory.lua, to pass some functions from the requiring module to the required module) here. They're both great ideas, and I must say they work well. But you pay in readability.
In fact, breaking the code into smaller pieces this time made my work harder than it would have been had I coded everything in a single file; that isn't an option for me anyway, because it's over 1000 lines of code and I'm coding from a smartphone.
The feeling I have is that of working with spaghetti code, only on a larger scale.
So how could I solve the problem of my code being hard to understand because some modules need each other to work (without, of course, making all the variables global)? How would programmers in other languages solve this problem? How should I reorganize my modules? Are there any standard conventions for Lua modules that could help me with this?
If we look at your Lua files as a directed graph, where an edge points from a dependency to its user, the goal is to modify your graph to be a tree or forest, since you intend to get rid of the cycles.
A cycle is a set of nodes which, traversed in the direction of the edges, leads back to the starting node.
Now, the question is how to get rid of cycles?
The answer looks like this:
Let's consider node N and let's consider {D1, D2, ..., Dm} as its direct dependencies. If there is no Di in that set that depends on N either directly or indirectly, then you can leave N as it is. In that case, the set of problematic dependencies looks like this: {}
However, what if you have a non-empty set, like this: {PD1, ..., PDk} ?
You then need to analyze each PDi, for i between 1 and k, along with N, and see which subset of each PDi does not depend on N and which subset of N does not depend on any PDi. This way you can split N into N_base and N, and each PDi into PDi_base and PDi. N depends on N_base, and so do all the PDi; each PDi depends on its PDi_base as well as on N_base.
This approach minimizes cycles in the dependency graph. However, it is quite possible that a set of functions {f1, ..., fl} exists in this group which cannot be migrated into a _base module as discussed, because of its dependencies, and some cycles remain. In that case you need to give the group in question a name, create a module for it, and migrate all those functions into that module.
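To make this concrete in Lua (module names here are invented for illustration): suppose a.lua and b.lua require each other. Moving the part of a that b actually uses into a new a_base.lua breaks the cycle:
-- a_base.lua: the part of "a" that does not depend on "b"
local a_base = {}
function a_base.helper(x)
  return x + 1
end
return a_base

-- b.lua: now requires only a_base, not a
local a_base = require("a_base")
local b = {}
function b.twice(x)
  return a_base.helper(a_base.helper(x))
end
return b

-- a.lua: requires a_base and b; the cycle is gone
local a_base = require("a_base")
local b = require("b")
local a = {}
function a.run(x)
  return b.twice(x) + a_base.helper(x)
end
return a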

G_LLL_XD function in NTL library faulty

I am trying to use the G_LLL_XD function of the NTL library. Whenever I use the function in this format:
G_LLL_XD(B, delta);
the program works.
However, when I want to change the default deep or prune arguments and call the function in one of these ways:
G_LLL_XD(B, delta, deep, check, verbose);
G_LLL_XD(B, delta, prune, check, verbose);
I get this error at runtime:
R610
- abort() has been called
and in the command prompt it says:
"sorry...deep insertions not implemented"
I find this very weird: whenever I pass prune, I get this crash, which I shouldn't, since the function shouldn't be looking for deep insertions but for pruning; and when I do pass deep, I still get the error.
Can anybody help me understand what the problem is or how I can fix this? Thank you very much.
I could not find a prune argument for the LLL functions in NTL, but there is one for BKZ. Since they both accept positive integers, it's probably just a naming confusion.
From the documentation:
NOTE: use of "deep" is obsolete, and has been "deprecated". It is
recommended to use BKZ_FP to achieve higher-quality reductions.
Moreover, the Givens versions do not support "deep", and setting
deep != 0 will raise an error in this case.
So you cannot use G_LLL_XD with deep != 0, but LLL_XD should work (though "deep" is deprecated there).
But as mentioned, you should consider using BKZ_XD instead of LLL_XD.
A BKZ-reduced basis of a lattice is also LLL-reduced, so there should be no problem. BKZ is slower than LLL, but you can choose a small BlockSize (maybe 10 or 20; even 2 or 4 will work) to speed the reduction up.
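For instance, a minimal sketch along these lines (the argument order follows the BKZ_XD signature in NTL's LLL documentation; the basis values are arbitrary):
#include <NTL/LLL.h>
using namespace NTL;

int main() {
    mat_ZZ B;
    B.SetDims(3, 3);
    // An arbitrary small basis; replace it with your own lattice.
    B[0][0] = 4; B[0][1] = 1; B[0][2] = 0;
    B[1][0] = 1; B[1][1] = 3; B[1][2] = 1;
    B[2][0] = 0; B[2][1] = 1; B[2][2] = 5;

    double delta = 0.99;
    long BlockSize = 10;  // small block sizes are usually enough
    long prune = 0;       // 0 disables pruning

    // Reduces B in place; the result is BKZ (and hence LLL) reduced.
    BKZ_XD(B, delta, BlockSize, prune);
    return 0;
}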

Using fractional exponent with bc

bc, a Linux command-line calculator, is proficient enough to calculate
3^2
9
Even a negative exponent doesn't confuse it:
3^-2
0.11111
Yet it fails when it encounters
9^0.5
Runtime warning (func=(main), adr=8): non-zero scale in exponent
How could it be that bc can't handle this?
And what does the error message mean?
Yes, I've read this and the solution given there:
e(0.5*l(9))
2.99999999999999999998
And yes, it is no good because of precision loss and
A calculator is supposed to solve expressions. You are not supposed to
make life easier for the calculator, it is supposed to be the other
way around...
This feature was designed to encourage users to write their own functions, making it a unique calculator that requires a user-defined function to calculate a square root.
It doesn't really bother me to write a function for tangents or cotangents as it looks pretty straightforward given s(x) and c(x). But in my opinion calculating a square root through a user-defined function is a bit too much.
Why does anyone use bc if there's Python out there? Speed?
In bc, b must be an integer in a ^ b. However, you can add your own functions to bc like this:
Create a file ~/.bcrc and add the following function to it:
define pow(a, b) {
    if (scale(b) == 0) {
        return a ^ b;
    }
    return e(b*l(a));
}
Then you can start bc as follows:
bc ~/.bcrc -l
and use the pow function to do such calculations.
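For example, after starting bc as above (the fractional case goes through e(b*l(a)), so it shows the same precision loss as the e(0.5*l(9)) example quoted earlier):
pow(3, 2)
9
pow(9, 0.5)
2.99999999999999999998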
See more here; you can add some more functions to bc.
bc is very basic, and more complex functions not provided by the "math extension" must be implemented in the language itself; it has all you need to do that. In particular, "power" is a common example, even on Wikipedia.
But you may be also interested in reading for example this answer here on SO.

Quick divisibility check in ZX81 BASIC

Since many of the Project Euler problems require you to do divisibility checks a great many times, I've been trying to figure out the fastest way to perform this task in ZX81 BASIC.
So far I've compared (N/D) to INT(N/D) to check whether N is divisible by D or not.
I have been thinking about doing the test in Z80 machine code, but I haven't yet figured out how to use the BASIC variables from the machine code.
How can that be achieved?
You can do this very fast in machine code by subtracting repeatedly. Basically you have a procedure like:
set accumulator to N
subtract D
if carry flag is set then it is not divisible
if zero flag is set then it is divisible
otherwise repeat subtraction until one of the above occurs
The 8 bit version would be something like:
DIVISIBLE_TEST:
    LD B,10                   ; divisor D
    LD A,100                  ; dividend N
DIVISIBLE_TEST_LOOP:
    SUB B                     ; A = A - D
    JR C,END_DIVISIBLE_TEST   ; went below zero: not divisible
    JR Z,END_DIVISIBLE_TEST   ; hit exactly zero: divisible
    JR DIVISIBLE_TEST_LOOP
END_DIVISIBLE_TEST:
    LD B,A                    ; BC = 256*A, zero iff divisible
    LD C,0
    RET
Now you can call it from BASIC using USR. What USR returns is whatever's in the BC register pair, so you would probably want to do something like:
REM poke the memory addresses with the operands to load the registers
POKE X+1, D
POKE X+3, N
LET r = USR X
IF r = 0 THEN GOTO isdivisible
IF r <> 0 THEN GOTO isnotdivisible
This is an introduction I wrote to Z80 which should help you figure this out. This will explain the flags if you're not familiar with them.
There are loads more links to good Z80 stuff from the main site, although it is Spectrum- rather than ZX81-focused.
A 16 bit version would be quite similar but using register pair operations. If you need to go beyond 16 bits it would get a bit more convoluted.
How you load this is up to you - but the traditional method is using DATA statements and POKEs. You may prefer to have an assembler figure out the machine code for you though!
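For instance, here is a rough sketch of such a loader for the 8-bit routine above. It relies on the classic ZX81 trick of stashing the code inside a REM on line 1 (whose first character sits at address 16514) and, since ZX81 BASIC has no READ or DATA, takes the bytes via INPUT. The fifteen bytes to type in are my own hand-assembly of the listing above (6, 10, 62, 100, 144, 56, 4, 40, 2, 24, 249, 71, 14, 0, 201), so verify them before trusting the result:
1 REM ABCDEFGHIJKLMNO
10 FOR I=0 TO 14
20 INPUT B
30 POKE 16514+I,B
40 NEXT I
50 POKE 16515,7
60 POKE 16517,91
70 PRINT USR 16514
Lines 50 and 60 overwrite the immediate operands of LD B,n and LD A,n, so this run tests whether 91 is divisible by 7; USR returns 0 exactly when it is divisible.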
Your existing solution may be good enough. Only replace it with something faster if you find it to be a bottleneck in profiling.
(Said with a straight face, of course.)
And anyway, on the ZX81 you can just switch to FAST mode.
I don't know if RANDOMIZE USR is available on the ZX81, but I think it can be used to call routines in assembly. To pass arguments you might need to POKE some fixed memory locations before executing RANDOMIZE USR.
I remember finding a list of routines implemented in the ROM to support ZX BASIC. I'm sure there are a few that perform floating-point operations.
An alternative to floating point is to use fixed-point math. It's a lot faster in this kind of situation, where there is no math coprocessor.
You might also find more information in back issues of Sinclair User. They published some articles related to programming on the ZX Spectrum.
You should place the values in some agreed memory locations first, then use the same locations from within the Z80 assembly. There is no parameter passing between the two.
This is based on what I (still) remember of the ZX Spectrum 48. Good luck, but you might consider upgrading your hardware. ;/
The problem with Z80 machine code is that it has no floating point ops (and no integer divide or multiply, for that matter). Implementing your own FP library in Z80 assembler is not trivial. Of course, you can use the built-in BASIC routines, but then you may as well just stick with BASIC.